
How CKB Works

CKB analyzes your code through a multi-stage pipeline that builds a queryable knowledge graph. Here's what happens under the hood.

The Analysis Pipeline

Your Code → Parsing → Symbol Extraction → Graph Building → Enrichment → Query Engine → Answers

1. Parsing with Tree-sitter

CKB uses Tree-sitter to parse your source files into concrete syntax trees (CSTs). This gives us:

  • Language-agnostic parsing — Same approach works for Go, TypeScript, Python, Rust, Java, and more
  • Incremental updates — Only re-parse changed files
  • Error tolerance — Partially broken code still parses

// Your code
func ProcessOrder(order Order) error {
    return db.Save(order)
}

// Tree-sitter sees (simplified; a query can capture the name node as @function.name)
(function_declaration
  name: (identifier)
  parameters: (parameter_list ...)
  result: (type_identifier)
  body: (block ...))

2. Symbol Extraction

From the syntax tree, CKB extracts symbols — the meaningful identifiers in your code:

Symbol Type   Examples
-----------   --------
Functions     ProcessOrder, handleRequest
Types         Order, UserService
Variables     db, config
Imports       "github.com/your/package"

Each symbol gets a unique identifier based on its fully-qualified path, so pkg/orders.ProcessOrder is distinct from pkg/returns.ProcessOrder.
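
The extraction step can be sketched in a few lines. This is only an illustration: it uses Go's standard go/parser and go/ast packages as a stand-in for Tree-sitter, and the Symbol type, the extractSymbols helper, and the "package path + name" ID format are assumptions rather than CKB's actual internals.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// Symbol is a hypothetical record for one extracted identifier.
type Symbol struct {
	Pkg  string // package path, e.g. "pkg/orders"
	Name string // identifier, e.g. "ProcessOrder"
	Kind string // "function" or "type"
}

// ID joins the package path and the name, so equal names in
// different packages produce different identifiers.
func (s Symbol) ID() string { return s.Pkg + "." + s.Name }

// extractSymbols parses one Go source file and returns its top-level
// functions (including methods) and types. pkgPath is supplied by the caller.
func extractSymbols(pkgPath, src string) ([]Symbol, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	var syms []Symbol
	for _, decl := range file.Decls {
		switch d := decl.(type) {
		case *ast.FuncDecl:
			syms = append(syms, Symbol{Pkg: pkgPath, Name: d.Name.Name, Kind: "function"})
		case *ast.GenDecl:
			for _, spec := range d.Specs {
				if ts, ok := spec.(*ast.TypeSpec); ok {
					syms = append(syms, Symbol{Pkg: pkgPath, Name: ts.Name.Name, Kind: "type"})
				}
			}
		}
	}
	return syms, nil
}

// src is a small sample file used only for this example.
const src = `package orders

type Order struct{ ID int }

func ProcessOrder(order Order) error { return nil }
`

func main() {
	syms, err := extractSymbols("pkg/orders", src)
	if err != nil {
		panic(err)
	}
	for _, s := range syms {
		fmt.Println(s.Kind, s.ID())
	}
}
```

Calling the same helper with "pkg/returns" as the package path would yield pkg/returns.ProcessOrder, which is how the two functions stay distinct.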

3. Building the Knowledge Graph

CKB connects symbols into a directed graph:

┌─────────────┐     calls      ┌──────────────┐
│ProcessOrder │───────────────▶│   db.Save    │
└─────────────┘                └──────────────┘
       │                              │
       │ uses type                    │ uses type
       ▼                              ▼
┌─────────────┐                ┌──────────────┐
│    Order    │                │    Record    │
└─────────────┘                └──────────────┘

The graph captures:

  • Call relationships — What functions call what
  • Type usage — What types are used where
  • Import dependencies — What packages depend on what
  • File membership — What symbols live in what files
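
As a rough sketch, the diagram above can be modeled as a directed graph with labeled edges. The Graph, Edge, and EdgeKind names are hypothetical and only two of the four relationship kinds are shown; CKB's real data model may look quite different.

```go
package main

import "fmt"

// EdgeKind labels a relationship between two symbols.
type EdgeKind string

const (
	Calls    EdgeKind = "calls"
	UsesType EdgeKind = "uses type"
)

// Edge is a directed, labeled connection from one symbol ID to another.
type Edge struct {
	From, To string
	Kind     EdgeKind
}

// Graph indexes edges by source and by target so that both
// "what does X call?" and "who calls X?" are direct lookups.
type Graph struct {
	out map[string][]Edge
	in  map[string][]Edge
}

func NewGraph() *Graph {
	return &Graph{out: map[string][]Edge{}, in: map[string][]Edge{}}
}

// AddEdge records the edge in both indexes.
func (g *Graph) AddEdge(from, to string, kind EdgeKind) {
	e := Edge{From: from, To: to, Kind: kind}
	g.out[from] = append(g.out[from], e)
	g.in[to] = append(g.in[to], e)
}

// Targets returns the symbols that from points to with the given kind.
func (g *Graph) Targets(from string, kind EdgeKind) []string {
	var res []string
	for _, e := range g.out[from] {
		if e.Kind == kind {
			res = append(res, e.To)
		}
	}
	return res
}

// Sources returns the symbols that point to target with the given kind.
func (g *Graph) Sources(target string, kind EdgeKind) []string {
	var res []string
	for _, e := range g.in[target] {
		if e.Kind == kind {
			res = append(res, e.From)
		}
	}
	return res
}

// exampleGraph builds the three edges drawn in the diagram.
func exampleGraph() *Graph {
	g := NewGraph()
	g.AddEdge("ProcessOrder", "db.Save", Calls)
	g.AddEdge("ProcessOrder", "Order", UsesType)
	g.AddEdge("db.Save", "Record", UsesType)
	return g
}

func main() {
	g := exampleGraph()
	fmt.Println("ProcessOrder calls:", g.Targets("ProcessOrder", Calls))
	fmt.Println("db.Save is called by:", g.Sources("db.Save", Calls))
}
```

Keeping a reverse index alongside the forward one is a design choice that makes caller lookups as cheap as callee lookups.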

4. Enrichment Layers

On top of the base graph, CKB adds contextual information:

Ownership — Who owns what code, derived from:

  • Git blame history
  • CODEOWNERS files
  • Commit patterns

Complexity Metrics — How complex is each function:

  • Cyclomatic complexity
  • Cognitive complexity
  • Lines of code
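
To make the first metric concrete, here is a minimal sketch of a cyclomatic complexity estimate for Go code, built on the standard go/ast package. It counts 1 plus the number of branch points (if, for, range, non-default case, && and ||). The cyclomatic helper and its counting rules are assumptions for illustration; the document does not specify how CKB computes the metric.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// cyclomatic returns a complexity estimate per top-level function:
// 1 plus the number of branch points found in the function body.
func cyclomatic(src string) (map[string]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	scores := map[string]int{}
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Body == nil {
			continue
		}
		score := 1
		ast.Inspect(fn.Body, func(n ast.Node) bool {
			switch x := n.(type) {
			case *ast.IfStmt, *ast.ForStmt, *ast.RangeStmt:
				score++
			case *ast.CaseClause:
				if x.List != nil { // a nil list is the default clause
					score++
				}
			case *ast.BinaryExpr:
				if x.Op == token.LAND || x.Op == token.LOR {
					score++
				}
			}
			return true
		})
		scores[fn.Name.Name] = score
	}
	return scores, nil
}

// src is a small sample file used only for this example.
const src = `package orders

func ProcessOrder(items []int, rush bool) int {
	total := 0
	for _, it := range items {
		if it > 0 && rush {
			total += it
		}
	}
	return total
}

func Noop() {}
`

func main() {
	scores, err := cyclomatic(src)
	if err != nil {
		panic(err)
	}
	fmt.Println("ProcessOrder:", scores["ProcessOrder"])
	fmt.Println("Noop:", scores["Noop"])
}
```

In the sample, ProcessOrder has one range loop, one if, and one && operator, so it scores 4, while the empty Noop scores the baseline of 1.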

Change History — How often code changes:

  • Commit frequency (hotspots)
  • Recent modifications
  • Churn rate

Documentation — What's documented:

  • Doc comments
  • README references
  • ADR (Architecture Decision Record) links

5. The Query Engine

When you ask CKB a question, the query engine:

  1. Routes your query to the right analyzer
  2. Traverses the knowledge graph
  3. Merges results from multiple sources
  4. Compresses output for efficient LLM consumption

Example: Impact Analysis

Let's trace what happens when you ask "What breaks if I change ProcessOrder?"

ckb impact ProcessOrder

Step 1: Find the symbol

Query: "ProcessOrder"
Result: pkg/orders/handler.go:ProcessOrder (function)

Step 2: Traverse callers (reverse call graph)

ProcessOrder
├── called by: OrderController.Create
├── called by: BatchProcessor.Run
└── called by: OrderService.Submit

Step 3: Expand transitively

OrderController.Create
├── called by: HTTP handler /api/orders
└── tested by: orders_test.go

BatchProcessor.Run
├── called by: CronJob scheduler
└── tested by: batch_test.go

Step 4: Assess risk

Impact Score: HIGH
- 3 direct callers
- 2 test files affected
- 1 public API endpoint
- Last changed: 3 days ago (active code)

Step 5: Format response

Changing ProcessOrder affects:
- 3 functions in 2 packages
- Tests: orders_test.go, batch_test.go
- Risk: HIGH (public API surface)
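
Steps 2 and 3 amount to a breadth-first walk over the reverse call graph. The sketch below is one way to express that walk, assuming the graph is available as a simple map from a symbol to its callers. The callers map and the impacted function are hypothetical names, the data mirrors the walkthrough above, and the "tested by" edges are left out for brevity.

```go
package main

import (
	"fmt"
	"sort"
)

// callers maps a symbol to the symbols that call it (the reverse call graph).
var callers = map[string][]string{
	"ProcessOrder":           {"OrderController.Create", "BatchProcessor.Run", "OrderService.Submit"},
	"OrderController.Create": {"HTTP handler /api/orders"},
	"BatchProcessor.Run":     {"CronJob scheduler"},
}

// impacted returns every symbol that reaches target through one or more
// calls, mapped to its distance from target (1 means direct caller).
func impacted(callers map[string][]string, target string) map[string]int {
	dist := map[string]int{}
	seen := map[string]bool{target: true}
	queue := []string{target}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, c := range callers[cur] {
			if seen[c] {
				continue // already visited; this also guards against call cycles
			}
			seen[c] = true
			dist[c] = dist[cur] + 1 // dist[target] is absent, so it reads as 0
			queue = append(queue, c)
		}
	}
	return dist
}

func main() {
	dist := impacted(callers, "ProcessOrder")
	names := make([]string, 0, len(dist))
	for n := range dist {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		fmt.Printf("%s (distance %d)\n", n, dist[n])
	}
}
```

Symbols at distance 1 correspond to the three direct callers in Step 2. Larger distances are the transitive expansion from Step 3.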

Storage Architecture

CKB stores the knowledge graph in SQLite for portability:

.ckb/
├── index.db          # Symbol graph, relationships
├── ownership.db      # Code ownership data
├── cache/            # Query result cache
└── telemetry.db      # Usage analytics (opt-in)

Why SQLite?

  • No server to run — works offline
  • Fast queries — optimized for graph traversal
  • Portable — copy the folder to share the index
  • Incremental — update without full rebuild

Backends and Fallbacks

CKB uses multiple backends depending on what's available:

Backend       What it provides           When used
-------       ----------------           ---------
SCIP          Precise cross-references   When SCIP index exists
Tree-sitter   Syntax-based symbols       Always (fallback)
LSP           Real-time analysis         IDE integrations
Git           History and blame          Ownership queries

If you have a SCIP index (generated by scip-go, scip-typescript, etc.), CKB uses it for compiler-accurate references. Otherwise, Tree-sitter provides good-enough analysis without any setup.
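
The fallback rule for references can be pictured as a priority list in which the first available backend wins. The sketch below assumes exactly that. The Backend type, the pick function, and the hasSCIP flag are hypothetical, and CKB's real selection logic may weigh more factors.

```go
package main

import "fmt"

// Backend is a hypothetical description of one analysis source.
type Backend struct {
	Name      string
	Available func() bool
}

// pick returns the name of the first available backend in priority
// order, or "" if none is available.
func pick(backends []Backend) string {
	for _, b := range backends {
		if b.Available() {
			return b.Name
		}
	}
	return ""
}

// referenceBackends lists the backends for reference lookups, most
// precise first. Tree-sitter needs no setup, so it is always available.
func referenceBackends(hasSCIP bool) []Backend {
	return []Backend{
		{Name: "SCIP", Available: func() bool { return hasSCIP }},
		{Name: "Tree-sitter", Available: func() bool { return true }},
	}
}

func main() {
	fmt.Println("with SCIP index:   ", pick(referenceBackends(true)))
	fmt.Println("without SCIP index:", pick(referenceBackends(false)))
}
```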

What CKB Doesn't Do

Understanding limitations helps set expectations:

  • No runtime analysis — CKB analyzes static code, not execution paths
  • No modification — CKB is read-only; it never changes your code
  • No dynamic dispatch — Interface implementations require explicit hints
  • No external dependency analysis — CKB analyzes your code, not third-party packages such as node_modules

Performance Characteristics

Operation            Typical Time   Notes
---------            ------------   -----
Initial index        10-60s         Depends on codebase size
Incremental update   1-5s           Only changed files
Simple query         10-50ms        Single symbol lookup
Impact analysis      100-500ms      Graph traversal
Full-repo search     200ms-2s       Depends on result count

CKB is designed for interactive use — queries should feel instant.

Next Steps