How CKB Works
CKB analyzes your code through a multi-stage pipeline that builds a queryable knowledge graph. Here's what happens under the hood.
The Analysis Pipeline
Your Code → Parsing → Symbol Extraction → Graph Building → Query Engine → Answers
1. Parsing with Tree-sitter
CKB uses Tree-sitter to parse your source files into concrete syntax trees (CSTs). This gives us:
- Language-agnostic parsing — Same approach works for Go, TypeScript, Python, Rust, Java, and more
- Incremental updates — Only re-parse changed files
- Error tolerance — Partially broken code still parses
// Your code
func ProcessOrder(order Order) error {
    return db.Save(order)
}

// Tree-sitter sees
(function_declaration
  name: (identifier) @function.name
  parameters: (parameter_list ...)
  body: (block ...))
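To make the parsing step concrete, here is a minimal sketch that turns the same snippet into a syntax tree and reads the function name off it. It uses Go's standard `go/parser` as a stand-in for Tree-sitter so that it runs without third-party bindings; the helper `funcNames` and the file name `handler.go` are illustrative assumptions, not part of CKB.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src mirrors the ProcessOrder example, wrapped in a package clause so that
// it is a complete Go file. Order and db are undefined on purpose: parsing
// is purely syntactic and does not need them to resolve.
const src = `package orders

func ProcessOrder(order Order) error {
	return db.Save(order)
}
`

// funcNames (hypothetical helper) parses Go source and returns the names of
// its top-level function declarations in source order.
func funcNames(source string) ([]string, error) {
	fset := token.NewFileSet()
	// SkipObjectResolution: only syntax is needed, not identifier binding.
	file, err := parser.ParseFile(fset, "handler.go", source, parser.SkipObjectResolution)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, decl := range file.Decls {
		if fn, ok := decl.(*ast.FuncDecl); ok {
			names = append(names, fn.Name.Name)
		}
	}
	return names, nil
}

func main() {
	names, err := funcNames(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(names) // [ProcessOrder]
}
```

Unlike Tree-sitter, `go/parser` only understands Go, so this illustrates the shape of the step rather than the language-agnostic part.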
2. Symbol Extraction
From the syntax tree, CKB extracts symbols — the meaningful identifiers in your code:
| Symbol Type | Examples |
|---|---|
| Functions | ProcessOrder, handleRequest |
| Types | Order, UserService |
| Variables | db, config |
| Imports | "github.com/your/package" |
Each symbol gets a unique identifier based on its fully-qualified path, so pkg/orders.ProcessOrder is distinct from pkg/returns.ProcessOrder.
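The extraction step can be sketched as a walk over top-level declarations that emits one record per symbol. This is an illustration under assumptions: the `Symbol` struct, the `extractSymbols` helper, and the `<package path>.<name>` ID format are hypothetical, and Go's standard `go/ast` stands in for a Tree-sitter tree.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strconv"
)

// Symbol is a hypothetical record type; CKB's real schema may differ.
type Symbol struct {
	ID   string // fully-qualified: "<package path>.<name>"
	Kind string // "function", "type", "variable", or "import"
}

// extractSymbols collects top-level symbols from one Go source file.
// pkgPath is the path of the package the file belongs to.
func extractSymbols(pkgPath, source string) ([]Symbol, error) {
	file, err := parser.ParseFile(token.NewFileSet(), "", source, parser.SkipObjectResolution)
	if err != nil {
		return nil, err
	}
	var syms []Symbol
	add := func(name, kind string) {
		syms = append(syms, Symbol{ID: pkgPath + "." + name, Kind: kind})
	}
	for _, decl := range file.Decls {
		switch d := decl.(type) {
		case *ast.FuncDecl:
			add(d.Name.Name, "function")
		case *ast.GenDecl:
			for _, spec := range d.Specs {
				switch s := spec.(type) {
				case *ast.TypeSpec:
					add(s.Name.Name, "type")
				case *ast.ValueSpec:
					// For brevity, const declarations are lumped in here too.
					for _, n := range s.Names {
						add(n.Name, "variable")
					}
				case *ast.ImportSpec:
					// Import paths are quoted string literals in the AST.
					if path, err := strconv.Unquote(s.Path.Value); err == nil {
						syms = append(syms, Symbol{ID: path, Kind: "import"})
					}
				}
			}
		}
	}
	return syms, nil
}

func main() {
	src := `package orders

import "github.com/your/package"

type Order struct{}

var config string

func ProcessOrder(order Order) error { return nil }
`
	syms, err := extractSymbols("pkg/orders", src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, s := range syms {
		fmt.Printf("%-9s %s\n", s.Kind, s.ID)
	}
}
```

Because the package path is part of the ID, running the same extraction with `pkg/returns` yields `pkg/returns.ProcessOrder`, which cannot collide with `pkg/orders.ProcessOrder`.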
3. Building the Knowledge Graph
CKB connects symbols into a directed graph:
┌─────────────┐     calls      ┌──────────────┐
│ProcessOrder │───────────────▶│   db.Save    │
└─────────────┘                └──────────────┘
       │                              │
       │ uses type                    │ uses type
       ▼                              ▼
┌─────────────┐                ┌──────────────┐
│    Order    │                │    Record    │
└─────────────┘                └──────────────┘
The graph captures:
- Call relationships — What functions call what
- Type usage — What types are used where
- Import dependencies — What packages depend on what
- File membership — What symbols live in what files
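One way to picture such a graph in code is a pair of adjacency maps keyed by symbol ID, with each edge tagged by relationship kind. The sketch below is an assumption about shape only: `Graph`, `Edge`, `EdgeKind`, and the edge-kind names are hypothetical and not taken from CKB.

```go
package main

import "fmt"

// EdgeKind names the relationship types listed above (hypothetical labels).
type EdgeKind string

const (
	Calls     EdgeKind = "calls"
	UsesType  EdgeKind = "uses_type"
	Imports   EdgeKind = "imports"
	DefinedIn EdgeKind = "defined_in"
)

// Edge is one directed, typed relationship between two symbol IDs.
type Edge struct {
	From, To string
	Kind     EdgeKind
}

// Graph indexes edges in both directions so that "what does X call?" and
// "who calls X?" are both direct lookups.
type Graph struct {
	out map[string][]Edge
	in  map[string][]Edge
}

func NewGraph() *Graph {
	return &Graph{out: map[string][]Edge{}, in: map[string][]Edge{}}
}

// AddEdge records a directed edge from -> to of the given kind.
func (g *Graph) AddEdge(from, to string, kind EdgeKind) {
	e := Edge{From: from, To: to, Kind: kind}
	g.out[from] = append(g.out[from], e)
	g.in[to] = append(g.in[to], e)
}

// Outgoing returns the targets of edges of the given kind leaving id.
func (g *Graph) Outgoing(id string, kind EdgeKind) []string {
	var ids []string
	for _, e := range g.out[id] {
		if e.Kind == kind {
			ids = append(ids, e.To)
		}
	}
	return ids
}

// Incoming returns the sources of edges of the given kind arriving at id.
func (g *Graph) Incoming(id string, kind EdgeKind) []string {
	var ids []string
	for _, e := range g.in[id] {
		if e.Kind == kind {
			ids = append(ids, e.From)
		}
	}
	return ids
}

func main() {
	g := NewGraph()
	g.AddEdge("ProcessOrder", "db.Save", Calls)
	g.AddEdge("ProcessOrder", "Order", UsesType)
	g.AddEdge("db.Save", "Record", UsesType)
	g.AddEdge("ProcessOrder", "pkg/orders/handler.go", DefinedIn)

	fmt.Println(g.Outgoing("ProcessOrder", Calls)) // [db.Save]
	fmt.Println(g.Incoming("db.Save", Calls))      // [ProcessOrder]
}
```

Keeping the reverse index alongside the forward one costs extra memory but makes caller lookups, the basis of impact analysis, as cheap as callee lookups.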
4. Enrichment Layers
On top of the base graph, CKB adds contextual information:
Ownership — Who owns what code, derived from:
- Git blame history
- CODEOWNERS files
- Commit patterns
Complexity Metrics — How complex is each function:
- Cyclomatic complexity
- Cognitive complexity
- Lines of code
Change History — How often code changes:
- Commit frequency (hotspots)
- Recent modifications
- Churn rate
Documentation — What's documented:
- Doc comments
- README references
- ADR (Architecture Decision Record) links
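As an example of one enrichment metric, cyclomatic complexity is commonly approximated as one plus the number of decision points in a function. The sketch below applies that counting rule to Go source with the standard `go/ast` package. It is not CKB's implementation: the helper names are hypothetical, and tools differ on exactly which constructs they count.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

const src = `package demo

func classify(n int, strict bool) string {
	if n < 0 && strict {
		return "invalid"
	}
	for i := 0; i < n; i++ {
		if i%2 == 0 {
			continue
		}
	}
	return "ok"
}
`

// cyclomatic returns 1 plus the number of branch points in fn: if, for,
// range, non-default case/select clauses, and short-circuit && and ||.
func cyclomatic(fn *ast.FuncDecl) int {
	complexity := 1
	ast.Inspect(fn, func(n ast.Node) bool {
		switch x := n.(type) {
		case *ast.IfStmt, *ast.ForStmt, *ast.RangeStmt:
			complexity++
		case *ast.CaseClause:
			if x.List != nil { // a nil List marks the default clause
				complexity++
			}
		case *ast.CommClause:
			if x.Comm != nil { // a nil Comm marks the default clause
				complexity++
			}
		case *ast.BinaryExpr:
			if x.Op == token.LAND || x.Op == token.LOR {
				complexity++
			}
		}
		return true
	})
	return complexity
}

// complexityOf (hypothetical helper) maps each top-level function name in
// the source to its complexity score.
func complexityOf(source string) (map[string]int, error) {
	file, err := parser.ParseFile(token.NewFileSet(), "", source, parser.SkipObjectResolution)
	if err != nil {
		return nil, err
	}
	scores := map[string]int{}
	for _, decl := range file.Decls {
		if fn, ok := decl.(*ast.FuncDecl); ok {
			scores[fn.Name.Name] = cyclomatic(fn)
		}
	}
	return scores, nil
}

func main() {
	scores, err := complexityOf(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// 1 (base) + if + && + for + inner if = 5
	fmt.Println(scores["classify"]) // 5
}
```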
5. The Query Engine
When you ask CKB a question, the query engine:
- Routes your query to the right analyzer
- Traverses the knowledge graph
- Merges results from multiple sources
- Compresses output for efficient LLM consumption
Example: Impact Analysis
Let's trace what happens when you ask "What breaks if I change ProcessOrder?"
ckb impact ProcessOrder
Step 1: Find the symbol
Query: "ProcessOrder"
Result: pkg/orders/handler.go:ProcessOrder (function)
Step 2: Traverse callers (reverse call graph)
ProcessOrder
├── called by: OrderController.Create
├── called by: BatchProcessor.Run
└── called by: OrderService.Submit
Step 3: Expand transitively
OrderController.Create
├── called by: HTTP handler /api/orders
└── tested by: orders_test.go
BatchProcessor.Run
├── called by: CronJob scheduler
└── tested by: batch_test.go
Step 4: Assess risk
Impact Score: HIGH
- 3 direct callers
- 2 test files affected
- 1 public API endpoint
- Last changed: 3 days ago (active code)
Step 5: Format response
Changing ProcessOrder affects:
- 3 functions in 2 packages
- Tests: orders_test.go, batch_test.go
- Risk: HIGH (public API surface)
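Steps 2 and 3 of this trace amount to a breadth-first walk over the reverse call graph. A minimal sketch follows, using the caller names from the example as hard-coded data; the `impact` function is hypothetical, and the risk scoring from Step 4 is left out because the source does not define its rules.

```go
package main

import (
	"fmt"
	"sort"
)

// callers maps a function to the functions that call it directly (the
// reverse call graph). The entries mirror the trace above.
var callers = map[string][]string{
	"ProcessOrder":           {"OrderController.Create", "BatchProcessor.Run", "OrderService.Submit"},
	"OrderController.Create": {"HTTP handler /api/orders"},
	"BatchProcessor.Run":     {"CronJob scheduler"},
}

// impact returns the direct callers of target and every caller reachable
// transitively, found by breadth-first search. The seen set keeps the walk
// finite when the call graph contains cycles (for example, recursion).
func impact(rev map[string][]string, target string) (direct, transitive []string) {
	direct = append(direct, rev[target]...)
	seen := map[string]bool{target: true}
	queue := []string{target}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, c := range rev[cur] {
			if !seen[c] {
				seen[c] = true
				transitive = append(transitive, c)
				queue = append(queue, c)
			}
		}
	}
	sort.Strings(transitive) // stable output for display
	return direct, transitive
}

func main() {
	direct, all := impact(callers, "ProcessOrder")
	fmt.Println(len(direct), "direct callers")     // 3 direct callers
	fmt.Println(len(all), "callers in total")      // 5 callers in total
	for _, name := range all {
		fmt.Println(" -", name)
	}
}
```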
Storage Architecture
CKB stores the knowledge graph in SQLite for portability:
.ckb/
├── index.db # Symbol graph, relationships
├── ownership.db # Code ownership data
├── cache/ # Query result cache
└── telemetry.db # Usage analytics (opt-in)
Why SQLite?
- No server to run — works offline
- Fast queries — optimized for graph traversal
- Portable — copy the folder to share the index
- Incremental — update without full rebuild
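The source does not say how CKB decides which files to re-index. One common approach, shown here purely as an assumption, is to keep a content hash per file from the previous run and re-parse only files whose hash differs. The `changedFiles` helper and the in-memory maps are hypothetical stand-ins for whatever the index actually stores.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// changedFiles compares current file contents against the hashes recorded
// at the last index run. It returns the sorted paths that need re-parsing
// and the updated hash table. Deleted files are not handled in this sketch.
func changedFiles(prev map[string]string, files map[string][]byte) ([]string, map[string]string) {
	next := make(map[string]string, len(files))
	var changed []string
	for path, content := range files {
		sum := sha256.Sum256(content)
		h := hex.EncodeToString(sum[:])
		next[path] = h
		if prev[path] != h { // new file or modified content
			changed = append(changed, path)
		}
	}
	sort.Strings(changed)
	return changed, next
}

func main() {
	files := map[string][]byte{
		"pkg/orders/handler.go": []byte("package orders\n"),
		"pkg/returns/refund.go": []byte("package returns\n"),
	}

	// First run: nothing recorded yet, so every file is indexed.
	changed, hashes := changedFiles(nil, files)
	fmt.Println(changed)

	// Second run: only the edited file is picked up.
	files["pkg/orders/handler.go"] = []byte("package orders\n\nfunc ProcessOrder() {}\n")
	changed, _ = changedFiles(hashes, files)
	fmt.Println(changed) // [pkg/orders/handler.go]
}
```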
Backends and Fallbacks
CKB uses multiple backends depending on what's available:
| Backend | What it provides | When used |
|---|---|---|
| SCIP | Precise cross-references | When SCIP index exists |
| Tree-sitter | Syntax-based symbols | Always (fallback) |
| LSP | Real-time analysis | IDE integrations |
| Git | History and blame | Ownership queries |
If you have a SCIP index (generated by scip-go, scip-typescript, etc.), CKB uses it for compiler-accurate references. Otherwise, Tree-sitter provides good-enough analysis without any setup.
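The fallback rule described above can be sketched as a simple existence check. This is illustrative only: `chooseBackend` is a hypothetical function, and the file name `index.scip` at the repository root is an assumption about where the index lives, not something this page specifies.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// chooseBackend returns "scip" when an index file exists under repoRoot and
// "tree-sitter" otherwise. The index.scip name and location are assumptions.
func chooseBackend(repoRoot string) string {
	info, err := os.Stat(filepath.Join(repoRoot, "index.scip"))
	if err == nil && !info.IsDir() {
		return "scip"
	}
	return "tree-sitter"
}

func main() {
	dir, err := os.MkdirTemp("", "ckb-demo")
	if err != nil {
		fmt.Println("could not create temp dir:", err)
		return
	}
	defer os.RemoveAll(dir)

	fmt.Println(chooseBackend(dir)) // tree-sitter

	// Simulate a SCIP indexer having written its output.
	if err := os.WriteFile(filepath.Join(dir, "index.scip"), nil, 0o644); err != nil {
		fmt.Println("could not write index:", err)
		return
	}
	fmt.Println(chooseBackend(dir)) // scip
}
```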
What CKB Doesn't Do
Understanding limitations helps set expectations:
- No runtime analysis — CKB analyzes static code, not execution paths
- No modification — CKB is read-only; it never changes your code
- No dynamic dispatch — Interface implementations require explicit hints
- No external dependencies — CKB analyzes your code, not node_modules
Performance Characteristics
| Operation | Typical Time | Notes |
|---|---|---|
| Initial index | 10-60s | Depends on codebase size |
| Incremental update | 1-5s | Only changed files |
| Simple query | 10-50ms | Single symbol lookup |
| Impact analysis | 100-500ms | Graph traversal |
| Full-repo search | 200ms-2s | Depends on result count |
CKB is designed for interactive use — queries should feel instant.
Next Steps
- Quick Start — Index your first repository
- Architecture — Deep dive into system design
- Impact Analysis — Learn about blast radius queries