Incremental Indexing
Incremental indexing makes SCIP index updates O(changed files) instead of O(entire repo). After editing a file, the index updates in seconds instead of requiring a full reindex.
Availability: Go, TypeScript, JavaScript, Python, Dart, Rust (v7.5+). Other languages fall back to full reindexing.
v1.1 (v7.3): Adds incremental callgraph maintenance—outgoing calls from changed files are always accurate.
v2.0 (v7.3): Adds transitive invalidation—files depending on changed files can be automatically queued for rescanning.
v4.0 (v7.3): Adds CI-generated delta artifacts for O(delta) server-side ingestion.
v5.0 (v7.5): Adds multi-language support via indexer registry pattern.
Why Incremental Indexing?
Full SCIP indexing scans your entire codebase, which can take 30+ seconds for large projects. This creates friction:
- During development: You edit one file but wait 30s for the index to update
- In CI/CD: Every commit triggers a full reindex even if only one file changed
- With watch mode: Frequent reindexes burn CPU and slow down your machine
Incremental indexing solves this by only processing changed files.
How It Works
┌──────────────────┐     ┌───────────────────┐     ┌───────────────────┐     ┌───────────────────┐
│ Change Detection │ ──► │ SCIP Extraction   │ ──► │ Delta Application │ ──► │ Transitive        │
│ (git diff -z)    │     │ (symbols + calls) │     │ (delete+insert)   │     │ Invalidation (v2) │
└──────────────────┘     └───────────────────┘     └───────────────────┘     └───────────────────┘
1. Change Detection
CKB detects changes using git:
git diff --name-status -z <last-indexed-commit> HEAD
The -z flag uses NUL separators, correctly handling paths with spaces or special characters.
Tracked change types:
- Added - New .go files
- Modified - Changed .go files
- Deleted - Removed .go files
- Renamed - Moved/renamed .go files (tracks old path for cleanup)
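The parsing step can be sketched in Go. This is a minimal illustration, not CKB's code. The Change type and parseNameStatusZ are hypothetical names. The sketch assumes git's -z layout, in which each status token is followed by one path. The exception is renames and copies (R or C plus a similarity score, e.g. R100), which are followed by the old path and then the new path.

```go
package main

import (
	"fmt"
	"strings"
)

// Change is one entry from `git diff --name-status -z`.
// Hypothetical type for illustration; not CKB's actual API.
type Change struct {
	Status  byte   // 'A', 'M', 'D', 'R', ...
	Path    string // new path (or the only path)
	OldPath string // set only for renames/copies
}

// parseNameStatusZ splits NUL-separated name-status output into changes.
func parseNameStatusZ(out string) ([]Change, error) {
	fields := strings.Split(out, "\x00")
	var changes []Change
	for i := 0; i < len(fields); {
		status := fields[i]
		if status == "" { // the trailing NUL yields an empty final field
			i++
			continue
		}
		switch status[0] {
		case 'R', 'C': // rename/copy: score, then old path, then new path
			if i+2 >= len(fields) {
				return nil, fmt.Errorf("truncated %s entry", status)
			}
			changes = append(changes, Change{Status: status[0], OldPath: fields[i+1], Path: fields[i+2]})
			i += 3
		default: // A, M, D, ...: exactly one path follows
			if i+1 >= len(fields) {
				return nil, fmt.Errorf("truncated %s entry", status)
			}
			changes = append(changes, Change{Status: status[0], Path: fields[i+1]})
			i += 2
		}
	}
	return changes, nil
}

func main() {
	out := "M\x00pkg/a b.go\x00R100\x00old.go\x00new.go\x00D\x00gone.go\x00"
	changes, err := parseNameStatusZ(out)
	if err != nil {
		panic(err)
	}
	for _, c := range changes {
		fmt.Printf("%c old=%q new=%q\n", c.Status, c.OldPath, c.Path)
	}
}
```

The sketch splits on NUL rather than on whitespace or newlines. That is what keeps a path such as "pkg/a b.go" intact.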
Fallback: For non-git repos, CKB falls back to hash-based comparison against stored file hashes.
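A hash-based fallback could look like the sketch below. It assumes SHA-256 digests stored per path; this page does not say which hash CKB uses. The names hashOf and diffByHash are hypothetical. Unlike git, plain content hashes cannot see renames, so a moved file shows up as one deletion plus one addition.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// hashOf returns a hex SHA-256 digest (the hash choice is an assumption).
func hashOf(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// diffByHash compares current file contents with the hashes stored at the
// last index run and classifies each path.
func diffByHash(stored map[string]string, current map[string][]byte) (added, modified, deleted []string) {
	for path, content := range current {
		old, ok := stored[path]
		switch {
		case !ok:
			added = append(added, path)
		case old != hashOf(content):
			modified = append(modified, path)
		}
	}
	for path := range stored {
		if _, ok := current[path]; !ok {
			deleted = append(deleted, path)
		}
	}
	// Map iteration order is random; sort for stable output.
	sort.Strings(added)
	sort.Strings(modified)
	sort.Strings(deleted)
	return added, modified, deleted
}

func main() {
	stored := map[string]string{
		"a.go": hashOf([]byte("package a")),
		"b.go": hashOf([]byte("package b")),
	}
	current := map[string][]byte{
		"a.go": []byte("package a"),         // unchanged
		"b.go": []byte("package b // edit"), // modified
		"c.go": []byte("package c"),         // added
	}
	added, modified, deleted := diffByHash(stored, current)
	fmt.Println("added:", added, "modified:", modified, "deleted:", deleted)
}
```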
2. SCIP Extraction
CKB runs scip-go to regenerate the full SCIP index (protobuf doesn't support partial updates), but then:
- Loads the index into memory
- Iterates documents, only processing those in the changed set
- Extracts symbols, references, and call edges for changed files only
- Resolves caller symbols (which function contains each call site)
- Skips unchanged documents entirely
This means even though scip-go runs on the full codebase, CKB only does the expensive database work for changed files.
Call Edge Extraction (v1.1): For each reference to a callable symbol (function/method), CKB:
- Detects callables using the symbol kind or the (). pattern in symbol IDs
- Resolves the enclosing function as the caller
- Stores edges with location info: (caller_file, line, column, callee_id)
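The following Go sketch illustrates the extraction loop for one changed document, under simplifying assumptions:

- FuncRange, Ref, and CallEdge are hypothetical stand-ins for SCIP data.
- Callables are detected only by the (). suffix. A real implementation would consult the symbol kind first.
- The caller is taken to be the innermost function whose line range contains the call site.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical, simplified stand-ins for SCIP occurrences.
type FuncRange struct {
	Symbol             string
	StartLine, EndLine int // inclusive line range of the function body
}

type Ref struct {
	Symbol    string
	Line, Col int
}

type CallEdge struct {
	CallerFile string
	Line, Col  int
	CallerID   string
	CalleeID   string
}

// isCallable applies the symbol-ID heuristic: function/method IDs end in "().".
func isCallable(symbol string) bool {
	return strings.HasSuffix(symbol, "().")
}

// enclosingFunc returns the innermost function whose range contains line.
func enclosingFunc(funcs []FuncRange, line int) (string, bool) {
	var best FuncRange
	found := false
	for _, f := range funcs {
		if line < f.StartLine || line > f.EndLine {
			continue
		}
		// Prefer the narrowest range so nested functions win over outer ones.
		if !found || f.EndLine-f.StartLine < best.EndLine-best.StartLine {
			best, found = f, true
		}
	}
	return best.Symbol, found
}

// extractCalls builds call edges for one changed file only.
func extractCalls(file string, funcs []FuncRange, refs []Ref) []CallEdge {
	var edges []CallEdge
	for _, r := range refs {
		if !isCallable(r.Symbol) {
			continue // types, variables, etc.
		}
		caller, ok := enclosingFunc(funcs, r.Line)
		if !ok {
			continue // e.g. a call in a package-level initializer
		}
		edges = append(edges, CallEdge{CallerFile: file, Line: r.Line, Col: r.Col, CallerID: caller, CalleeID: r.Symbol})
	}
	return edges
}

func main() {
	funcs := []FuncRange{{Symbol: "pkg/main().", StartLine: 3, EndLine: 8}}
	refs := []Ref{
		{Symbol: "pkg/Helper().", Line: 5, Col: 2},
		{Symbol: "pkg/Config#", Line: 6, Col: 2}, // a type, not callable
	}
	for _, e := range extractCalls("main.go", funcs, refs) {
		fmt.Printf("%s:%d:%d %s -> %s\n", e.CallerFile, e.Line, e.Col, e.CallerID, e.CalleeID)
	}
}
```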
3. Delta Application
For each changed file, CKB applies updates using delete+insert:
Modified file.go:
1. DELETE FROM file_symbols WHERE file_path = 'file.go'
2. DELETE FROM indexed_files WHERE path = 'file.go'
3. DELETE FROM callgraph WHERE caller_file = 'file.go' -- v1.1
4. DELETE FROM file_deps WHERE dependent_file = 'file.go' -- v2
5. INSERT new symbols, file state, call edges, and dependencies
Renamed old.go → new.go:
1. DELETE using old path (including callgraph, file_deps)
2. INSERT using new path
This approach is simple and correct—no complex diffing logic. The caller-owned edges invariant means call edges are always deleted and rebuilt with their owning file.
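The in-memory Go sketch below shows why the caller-owned invariant makes delete+insert sufficient. Every row a file owns (symbols, call edges, dependencies) is stored under that file's path. Store, FileIndex, and the Apply methods are hypothetical. A real implementation would run the equivalent DELETE and INSERT statements, ideally inside one transaction.

```go
package main

import "fmt"

// FileIndex holds everything owned by one file (hypothetical shape).
type FileIndex struct {
	Symbols []string // symbols defined in the file
	Calls   []string // outgoing call edges (callee IDs), owned by this file
	Deps    []string // defining files this file depends on
}

type Store struct {
	files map[string]FileIndex
}

func NewStore() *Store { return &Store{files: map[string]FileIndex{}} }

// deleteFile drops every row owned by path in one step.
func (s *Store) deleteFile(path string) { delete(s.files, path) }

// ApplyModified mirrors delete+insert: no diffing of old vs new rows.
func (s *Store) ApplyModified(path string, fresh FileIndex) {
	s.deleteFile(path)
	s.files[path] = fresh
}

// ApplyRenamed deletes under the old path and inserts under the new one.
func (s *Store) ApplyRenamed(oldPath, newPath string, fresh FileIndex) {
	s.deleteFile(oldPath)
	s.files[newPath] = fresh
}

func (s *Store) ApplyDeleted(path string) { s.deleteFile(path) }

func main() {
	s := NewStore()
	s.ApplyModified("old.go", FileIndex{Symbols: []string{"A"}, Calls: []string{"B"}})
	s.ApplyRenamed("old.go", "new.go", FileIndex{Symbols: []string{"A"}})
	_, stillThere := s.files["old.go"]
	fmt.Println(len(s.files), stillThere) // prints: 1 false
}
```

Nothing owned by one file is stored under another file. Replacing a file's entry therefore can never leave a stale outgoing edge behind.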
4. Transitive Invalidation (v2)
When a file changes, other files that depend on it may have stale references. v2 adds transitive invalidation to track and optionally rescan these dependent files.
File Dependency Tracking:
- CKB maintains a file_deps table: (dependent_file, defining_file)
- When a.go references a symbol defined in b.go, CKB records a.go → b.go
- Only internal dependencies are tracked (not stdlib/external packages)
Rescan Queue:
- When b.go changes, files depending on it (a.go) are enqueued for rescanning
- The queue tracks: file path, reason, BFS depth, and attempt count
- Queue processing respects configurable budgets (max files, max time)
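A breadth-first walk over file_deps can produce the rescan queue. Dep, RescanItem, and enqueueDependents are hypothetical names. The sketch works in three steps:

- It inverts the table into defining file → dependents.
- It marks changed files as already seen, since they were just re-indexed.
- It stops at the configured depth.

```go
package main

import (
	"fmt"
	"sort"
)

// Dep is one file_deps row: Dependent references a symbol defined in Defining.
type Dep struct{ Dependent, Defining string }

// RescanItem mirrors the queue fields listed above (hypothetical names).
type RescanItem struct {
	Path     string
	Reason   string
	Depth    int
	Attempts int
}

// enqueueDependents walks dependents breadth-first from the changed files,
// up to maxDepth levels (1 = direct dependents only).
func enqueueDependents(deps []Dep, changed []string, maxDepth int) []RescanItem {
	rdeps := map[string][]string{} // defining file -> dependent files
	for _, d := range deps {
		rdeps[d.Defining] = append(rdeps[d.Defining], d.Dependent)
	}
	seen := map[string]bool{}
	var frontier []string
	for _, c := range changed {
		seen[c] = true // changed files were just re-indexed; never enqueue them
		frontier = append(frontier, c)
	}
	var queue []RescanItem
	for depth := 1; depth <= maxDepth && len(frontier) > 0; depth++ {
		var next []string
		for _, f := range frontier {
			dependents := append([]string(nil), rdeps[f]...)
			sort.Strings(dependents) // deterministic order
			for _, dep := range dependents {
				if seen[dep] {
					continue // already queued, or part of a cycle
				}
				seen[dep] = true
				queue = append(queue, RescanItem{Path: dep, Reason: "depends on " + f, Depth: depth})
				next = append(next, dep)
			}
		}
		frontier = next
	}
	return queue
}

func main() {
	deps := []Dep{{"a.go", "b.go"}, {"c.go", "a.go"}}
	for _, it := range enqueueDependents(deps, []string{"b.go"}, 1) {
		fmt.Println(it.Path, it.Depth, it.Reason)
	}
}
```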
Usage
Default Behavior (Supported Languages)
Incremental indexing is enabled by default for supported languages:
- Go - scip-go
- TypeScript/JavaScript - scip-typescript
- Python - scip-python
- Dart - scip_dart
- Rust - rust-analyzer
# Incremental by default for supported languages
ckb index
# Output for incremental update:
Incremental Index Complete
--------------------------
Files: 3 modified, 1 added, 0 deleted
Symbols: 15 added, 8 removed
Refs: 42 updated
Calls: 127 edges updated
Time: 1.2s
Commit: abc1234 (+dirty)
Pending: 5 files queued for rescan
Accuracy:
OK Go to definition - accurate
OK Find refs (forward) - accurate
!! Find refs (reverse) - may be stale
OK Callees (outgoing) - accurate
!! Callers (incoming) - may be stale
Run 'ckb index --force' for full accuracy (47 files since last full)
Force Full Reindex
# Full reindex (ignores incremental)
ckb index --force
Use --force when:
- You need 100% accurate reverse references
- You need accurate caller information (who calls a function)
- After major refactoring across many files
- When incremental reports issues
- To clear the rescan queue and start fresh
Transitive Invalidation Modes (v2)
CKB supports four invalidation modes:
| Mode | Behavior |
|---|---|
| none | Disabled: no dependency tracking or invalidation |
| lazy | Enqueue dependents, drain on next full reindex (default) |
| eager | Enqueue and drain immediately (with budgets) |
| deferred | Enqueue and drain periodically in background |
Lazy Mode (Default)
In lazy mode, dependent files are queued but not immediately rescanned:
- Low overhead during incremental indexing
- Queue drains automatically on the next ckb index --force
- Best for development workflows where occasional staleness is acceptable
Eager Mode
In eager mode, CKB rescans dependent files immediately:
- Higher accuracy after incremental updates
- Respects budget limits to prevent runaway processing
- Best when accuracy is critical
Configuration
{
  "incremental": {
    "threshold": 50,
    "indexTests": false,
    "excludes": ["vendor", "testdata"]
  },
  "transitive": {
    "enabled": true,
    "mode": "lazy",
    "depth": 1,
    "maxRescanFiles": 200,
    "maxRescanMs": 1500
  }
}
| Setting | Default | Description |
|---|---|---|
| enabled | true | Enable transitive invalidation |
| mode | lazy | Invalidation mode: none, lazy, eager, deferred |
| depth | 1 | BFS cascade depth (1 = direct dependents only) |
| maxRescanFiles | 200 | Max files to rescan per drain run |
| maxRescanMs | 1500 | Max time (ms) per drain run (0 = unlimited) |
Accuracy Guarantees
Incremental indexing keeps forward references accurate but may leave reverse references stale. v1.1 improves call graph accuracy: outgoing calls (callees) are always accurate. In v2 eager mode, once the rescan queue has drained, all queries are accurate.
| Query Type | After Incremental | After Queue Drained |
|---|---|---|
| Go to definition | Always accurate | Always accurate |
| Find refs FROM changed files | Always accurate | Always accurate |
| Find refs TO symbols in changed files | May be stale | Accurate |
| Call graph (callees) | Always accurate | Always accurate |
| Call graph (callers) | May be stale | Accurate |
| Symbol search | Always accurate | Always accurate |
Why Reverse References May Be Stale
Consider this scenario:
// utils.go (unchanged)
func Helper() { ... }
// main.go (changed - removed call to Helper)
func main() {
// Helper() <- removed this line
}
After incremental indexing:
- main.go is re-indexed correctly (no longer references Helper)
- utils.go is NOT re-indexed (unchanged)
- CKB's stored references still show main.go → Helper from utils.go's perspective
This is the "caller-owned edges" invariant: references are owned by the FROM file, not the TO file.
Impact: When you ask "what calls Helper?", CKB might still show the deleted call from main.go until you run ckb index --force.
With v2 eager mode: If you change utils.go, files that depend on it (such as main.go) are automatically rescanned, keeping reverse references accurate.
Index State Tracking
CKB tracks index state in the database:
Index State:
State: partial (3 files since last full)
Commit: abc1234
Dirty: yes (uncommitted changes)
Pending: 5 files queued for rescan
States:
- full - Complete reindex, all references accurate, queue empty
- partial - Incremental updates applied, reverse refs may be stale
- pending - Work queued in rescan queue (v2)
- full_dirty / partial_dirty - Uncommitted changes detected
When Full Reindex Is Required
CKB automatically triggers a full reindex when:
| Condition | Reason |
|---|---|
| No previous index | Nothing to diff against |
| Schema version mismatch | Database structure changed |
| No tracked commit | Can't compute git diff |
| >50% files changed | Incremental overhead exceeds full reindex |
You'll see messages like:
Full reindex required: schema version mismatch (have 7, need 8)
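The checks in the table can be expressed as a single decision function. This Go sketch uses hypothetical names (IndexMeta, needsFullReindex) and evaluates the conditions in table order. Only the schema-mismatch message format comes from the example above; the other reason strings are illustrative.

```go
package main

import "fmt"

// IndexMeta is hypothetical persisted state from the previous run.
type IndexMeta struct {
	HasIndex      bool
	SchemaVersion int
	LastCommit    string // empty if no commit was recorded
}

// needsFullReindex reports whether a full reindex is required, and why.
func needsFullReindex(m IndexMeta, wantSchema, changedFiles, totalFiles int) (bool, string) {
	switch {
	case !m.HasIndex:
		return true, "no previous index"
	case m.SchemaVersion != wantSchema:
		return true, fmt.Sprintf("schema version mismatch (have %d, need %d)", m.SchemaVersion, wantSchema)
	case m.LastCommit == "":
		return true, "no tracked commit"
	case totalFiles > 0 && changedFiles*2 > totalFiles: // strictly more than 50%
		return true, fmt.Sprintf("%d of %d files changed (>50%%)", changedFiles, totalFiles)
	}
	return false, ""
}

func main() {
	meta := IndexMeta{HasIndex: true, SchemaVersion: 7, LastCommit: "abc1234"}
	if full, reason := needsFullReindex(meta, 8, 3, 1000); full {
		// prints: Full reindex required: schema version mismatch (have 7, need 8)
		fmt.Println("Full reindex required:", reason)
	}
}
```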
Performance Characteristics
| Scenario | Full Index | Incremental |
|---|---|---|
| Small project (100 files) | ~2s | ~0.5s |
| Medium project (1000 files) | ~15s | ~1-2s |
| Large project (10000 files) | ~60s | ~2-5s |
| Single file change (large project) | ~60s | ~1s |
The key insight: incremental time is proportional to changed files, not total files.
Transitive invalidation overhead (v2):
- Lazy mode: negligible (~1ms to enqueue dependents)
- Eager mode: depends on cascade size and budgets
Limitations
Current limitations:
- Some languages unsupported - Java, Kotlin, C++, Ruby, C#, PHP always do full reindex (build complexity)
- Reverse refs may be stale in lazy mode - Use eager mode or --force when accuracy is critical
- Callers may be stale - Incoming calls to changed symbols may be outdated until the queue drains
- No partial SCIP - Still runs full indexer, just processes less output
- External deps not tracked - Only internal file dependencies are tracked
- Indexer must be installed - Missing indexers fall back to full reindex with install hint
Troubleshooting
"Full reindex required" every time
Check that:
- You're in a git repository
- The previous index completed successfully
- Schema version matches (may need --force after a CKB upgrade)
Incremental seems slow
If incremental takes as long as full reindex:
- Check how many files changed (git status)
- If >50% changed, CKB falls back to a full reindex automatically
- Large individual files still take time to process
Stale references causing issues
If you're seeing phantom references:
# Force full reindex (also clears rescan queue)
ckb index --force
This rebuilds all references from scratch.
Too many pending rescans
If the rescan queue grows large:
# Check queue status
ckb status
# Force full reindex to clear queue
ckb index --force
Or increase budgets in configuration to process more files per run.
Delta Artifacts (v4)
Delta artifacts enable O(delta) server-side ingestion by pre-computing the diff in CI. Instead of the server comparing databases, CI generates a manifest of exactly what changed.
Why Delta Artifacts?
Traditional incremental indexing computes diffs by comparing the staging DB to the current DB—O(N) over all symbols/refs/calls. For repos with 500k+ symbols, this becomes a bottleneck.
Delta artifacts solve this by having CI emit the diff alongside the index:
┌──────────┐     ┌──────────────┐     ┌─────────────┐
│ CI Build │ ──► │ ckb diff     │ ──► │ delta.json  │
│ (scip)   │     │ (compare DBs)│     │ (manifest)  │
└──────────┘     └──────────────┘     └─────────────┘
                                              │
                                              ▼
┌──────────────┐     ┌─────────────┐     ┌──────────────┐
│ CKB Server   │ ◄── │ POST /delta │ ◄── │ CI Upload    │
│ (apply delta)│     │ /ingest     │     │ (artifact)   │
└──────────────┘     └─────────────┘     └──────────────┘
Generating Delta Artifacts
Use ckb diff to generate a delta manifest:
# Compare two snapshot databases
ckb diff \
--base /path/to/old-snapshot.db \
--new /path/to/new-snapshot.db \
--output delta.json
# Output: delta.json with changes
Delta JSON Schema
{
  "delta_schema_version": 1,
  "base_snapshot_id": "sha256:abc123...",
  "new_snapshot_id": "sha256:def456...",
  "commit": "def456789",
  "timestamp": 1703260800,
  "deltas": {
    "symbols": {
      "added": ["scip-go...NewFunc()."],
      "modified": ["scip-go...ChangedFunc()."],
      "deleted": ["scip-go...RemovedFunc()."]
    },
    "refs": {
      "added": [{"pk": "f_abc:42:12:scip-go...Foo().", "data": {...}}],
      "deleted": ["f_abc:50:5:scip-go...Old()."]
    },
    "callgraph": { "added": [...], "deleted": [...] },
    "files": { "added": [...], "modified": [...], "deleted": [...] }
  },
  "stats": { "total_added": 45, "total_modified": 12, "total_deleted": 8 }
}
Ingesting Delta Artifacts
Upload delta artifacts to CKB server via the API:
# Validate delta without applying
curl -X POST http://localhost:8080/delta/validate \
-H "Content-Type: application/json" \
-d @delta.json
# Ingest delta artifact
curl -X POST http://localhost:8080/delta/ingest \
-H "Content-Type: application/json" \
-d @delta.json
Server Validation
Before applying a delta, the server validates:
- Schema version - delta_schema_version must be supported
- Base snapshot - base_snapshot_id must match the current active snapshot
- Counts - Entity counts must match stats
- Hashes - Spot-check hashes for modified entities
- Integrity - Foreign key relationships must be valid
If validation fails, the server rejects the delta and requires a full snapshot.
Configuration
{
  "ingestion": {
    "deltaArtifacts": true,
    "deltaValidation": "strict",
    "fallbackToStagingDiff": true
  }
}
| Setting | Default | Description |
|---|---|---|
| deltaArtifacts | true | Enable delta artifact ingestion |
| deltaValidation | strict | Validation mode: strict or permissive |
| fallbackToStagingDiff | true | Fall back to staging diff if delta fails |
CI Integration Example (GitHub Actions)
name: Index and Upload Delta
on:
  push:
    branches: [main]
jobs:
  index:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      # download-artifact only sees the current run's artifacts unless given
      # a run ID and token, so the previous snapshot is kept in the Actions cache.
      - name: Restore previous snapshot
        uses: actions/cache/restore@v4
        with:
          path: .ckb/previous.db
          key: ckb-snapshot-${{ github.sha }}
          restore-keys: ckb-snapshot-
      - name: Run SCIP indexer
        run: ckb index
      - name: Generate delta
        run: |
          if [ -f .ckb/previous.db ]; then
            ckb diff --base .ckb/previous.db --new .ckb/ckb.db --output delta.json
          fi
      - name: Upload delta to CKB server
        if: hashFiles('delta.json') != ''
        run: |
          curl -X POST ${{ secrets.CKB_SERVER_URL }}/delta/ingest \
            -H "Authorization: Bearer ${{ secrets.CKB_TOKEN }}" \
            -H "Content-Type: application/json" \
            -d @delta.json
      - name: Save snapshot for next run
        run: cp .ckb/ckb.db .ckb/previous.db
      - uses: actions/cache/save@v4
        with:
          path: .ckb/previous.db
          key: ckb-snapshot-${{ github.sha }}
Performance Impact
| Repo Size | Traditional Diff | Delta Artifact |
|---|---|---|
| 10k symbols | 50ms | 5ms |
| 100k symbols | 500ms | 10ms |
| 500k symbols | 5s | 20ms |
Delta artifacts shift the diff computation to CI (where it runs once) instead of the server (where it would run on every request).
Related
- CI-CD-Integration - Using incremental indexing in CI pipelines
- User Guide - CLI commands including ckb index
- Performance - Latency targets and benchmarks
- Configuration - All configuration options