Does Vectr require an API key to run?

No. The default embedding model (ibm-granite/granite-embedding-english-r2) runs locally via sentence-transformers. It downloads once (~290MB) and requires no API key. The codebase passport is synthesised by your AI code editor on the first vectr_map call, using the editor's existing API access; Vectr itself never asks for a key. After the one-time model download, all features (search, symbol graph, working memory) run entirely offline.

How does Vectr remember working context across sessions?

Vectr provides vectr_remember and vectr_recall MCP tools. The AI calls vectr_remember to save a working note — key files, edge cases, what's still missing — to a persistent SQLite store. Notes survive /compact and new sessions intact. vectr_recall retrieves them in <50ms, verbatim, any time. Notes have five kinds (directive, task, gotcha, finding, reference), each with distinct injection semantics, plus priority levels and time-based decay. A vectr_snapshot captures the full session state for exact restoration.
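To illustrate how priority levels and time-based decay can combine into a recall order, here is a small scoring sketch. It is not Vectr's formula: the priority weights, the level names "normal" and "low", the one-week half-life, and the `recall_order` helper are all assumptions chosen for the example.

```python
from __future__ import annotations

import time
from dataclasses import dataclass

# Hypothetical weights per priority level (assumed, not taken from Vectr).
PRIORITY_WEIGHT = {"high": 3.0, "normal": 2.0, "low": 1.0}
HALF_LIFE_DAYS = 7.0  # assumed: a note's score halves every week


@dataclass
class Note:
    text: str
    kind: str       # directive, task, gotcha, finding, or reference
    priority: str   # high, normal, or low (assumed level names)
    created: float  # Unix timestamp in seconds


def score(note: Note, now: float) -> float:
    """Priority weight scaled by exponential time decay."""
    age_days = max(0.0, (now - note.created) / 86400)
    decay = 0.5 ** (age_days / HALF_LIFE_DAYS)
    return PRIORITY_WEIGHT[note.priority] * decay


def recall_order(notes: list[Note], now: float | None = None) -> list[Note]:
    """Return notes ordered from most to least relevant right now."""
    now = time.time() if now is None else now
    return sorted(notes, key=lambda n: score(n, now), reverse=True)
```

With these assumed numbers, a two-week-old high-priority note (3.0 × 0.25 = 0.75) ranks below a fresh low-priority one (1.0), which is the kind of trade-off decay is meant to express.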

Can a team or a fleet of agents share one Vectr instance?

Yes. vectr start --host binds a central instance (a network-reachable bind refuses to start without an API key), and vectr connect --url configures each developer's or agent's editor to use it. The code index and working memory are shared — a note one connected agent stores, every other can recall — with per-client attribution on notes, constant-time API-key authentication, and optional encryption at rest for note contents and snapshots.

Does the AI have to call recall, or can Vectr deliver memory on its own?

Both. Notes can carry per-memory triggers: path globs, lifecycle events (such as session-start, pre-commit, or post-compaction), exact symbol references resolved against the code graph, semantic similarity to the current prompt, and temporal guards. Triggers are evaluated in the editor's session-hook pipeline with per-session fire ledgers and injection budgets, so a note resurfaces exactly when its condition matches. An opt-in localhost proxy (vectr proxy) goes further: it injects matched notes into the request context deterministically, without relying on the model to ask. Both channels are consent-gated and observable.

What does vectr_evict_hint do?

vectr_evict_hint lists the code chunks retrieved so far that Vectr can re-retrieve in <50ms, so the AI does not need to re-read those files. Vectr tracks everything retrieved in a session, estimates each chunk's token cost, and lists exactly which chunks are fully indexed and covered by the <50ms re-retrieval guarantee. This is the reverse signal in the Vectr protocol: the AI saves findings (vectr_remember), and Vectr signals what it can recall instantly (vectr_evict_hint).
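A minimal sketch of the bookkeeping behind such a hint, assuming a per-session retrieval log and a rough four-characters-per-token estimate. The `SessionLog` class, the heuristic, and the output format are illustrative assumptions, not Vectr's implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Chunk:
    path: str
    start: int  # first line (1-based, inclusive)
    end: int    # last line (inclusive)
    text: str


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


@dataclass
class SessionLog:
    indexed: set[tuple[str, int, int]]  # spans the index can serve again
    retrieved: list[Chunk] = field(default_factory=list)

    def record(self, chunk: Chunk) -> None:
        """Remember every chunk handed to the AI in this session."""
        self.retrieved.append(chunk)

    def evict_hint(self) -> tuple[int, list[str]]:
        """Total estimated tokens and locations of re-fetchable chunks."""
        seen: set[tuple[str, int, int]] = set()
        total, lines = 0, []
        for c in self.retrieved:
            key = (c.path, c.start, c.end)
            # Only indexed spans are safe to drop; count each span once.
            if key in self.indexed and key not in seen:
                seen.add(key)
                total += estimate_tokens(c.text)
                lines.append(f"{c.path} [lines {c.start}-{c.end}]")
        return total, lines
```

Deduplicating by span matters because the same chunk is often retrieved several times in one session; counting it twice would overstate the savings.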

Why is AST-aware chunking better than splitting by token count?

Token-based splitting cuts code at arbitrary boundaries — often mid-function or mid-argument-list. This produces chunks that are syntactically incomplete and embed poorly. AST-aware chunking splits at semantic boundaries (function, class, method) so each chunk is a complete, meaningful unit. Vectr uses tree-sitter for Python, JavaScript, TypeScript, Go, Rust, Java, Zig, C, and C++.

What is vectr_locate and how is it different from vectr_search?

vectr_locate finds where a symbol is defined — file path, line number, and kind (function/class/method) — without returning any code content. It costs almost no tokens. vectr_search returns the actual code chunk. The intended workflow is: locate first to know where to look, then search only when you need the code itself.

Vectr — Code Intelligence AI Tool

Problem

On a codebase with 40,000 files, the AI runs rg -l "authenticate", gets 200 results, reads 8 complete files — 12,000 tokens gone for one query. And the next session, it starts over from zero: no memory of what it found, no record of what's still missing.

Retrieve, Locate, Remember

Vectr replaces the grep-then-read loop with three knowledge layers. A structural map gives the AI a 300-token overview of the entire codebase at session start. A symbol graph answers "where is X defined?" without reading any code. A content index returns the exact function body only when needed.

And as discoveries happen, the AI saves working notes to Vectr. Next morning, one vectr_recall() brings it back to exactly where it stopped — key files, edge cases, what's still missing. The conversation is gone, but the knowledge survived the gap.

Architecture

Three Layers of Knowledge

Humans don't memorise codebases — they recall intelligently. Vectr applies the same principle to LLMs.

Layer 1

Codebase Map

The first time your AI editor calls vectr_map with no cached passport, Vectr returns raw metadata, and the CLAUDE.md written by vectr init prompts the editor to write a 300-token structural summary and save it with vectr_map_save. From then on, every session opens with the full picture (module purposes, tech stack, entry points, domain vocabulary) without reading a single file.

vectr_map

Layer 2

Symbol Graph

tree-sitter extracts every function, class, and method into a persistent call graph. Ask "where is EvaluateSegments defined?" and get targeting/segment/evaluator.go:45 — no code content, no tokens wasted.

vectr_locate, vectr_trace

Layer 3

Content Search

AST-aware chunks embedded with ibm-granite/granite-embedding-english-r2 (local, no API key). Every symbol is indexed twice — a full-body vector plus a body-stripped purpose vector (signature + docstring) — so intent-shaped queries surface the function whose doc answers them. Adaptive hybrid search: vector + BM25 weights tuned per codebase fingerprint.

vectr_search

How It's Built

Map

On the first vectr_map call with no saved passport, Vectr returns raw directory metadata and prompts the AI editor to write a structural summary. The editor calls vectr_map_save to store it. Vectr makes no LLM calls at any point — the AI editor uses its own API access for this one step.
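The first-call handshake can be modelled as a cache that is either empty (return raw metadata and ask the editor to summarise) or filled (return the saved passport). This is a sketch: `PassportCache`, its on-disk file name, and the metadata fields are hypothetical. As in the description above, nothing here calls an LLM.

```python
import json
from pathlib import Path


class PassportCache:
    def __init__(self, root: Path) -> None:
        self.root = root
        self.file = root / ".vectr_passport.json"  # hypothetical location

    def map(self) -> dict:
        """Saved passport if present, else raw metadata plus a request."""
        if self.file.exists():
            return {"passport": json.loads(self.file.read_text())["summary"]}
        dirs = sorted(
            p.name for p in self.root.iterdir()
            if p.is_dir() and not p.name.startswith(".")
        )
        return {
            "metadata": {"top_level_dirs": dirs},
            "request": "Write a ~300-token structural summary and call map_save.",
        }

    def map_save(self, summary: str) -> None:
        """Store the editor-written summary for future sessions."""
        self.file.write_text(json.dumps({"summary": summary}))
```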

Parse

tree-sitter reads every file and extracts functions, classes, and methods — both as embeddable chunks and as a call graph with caller/callee relationships. Unsupported languages fall back to 200-line windows.
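The split-at-boundaries rule with a window fallback can be sketched in a few lines. This sketch assumes Python input and uses the standard-library `ast` module in place of tree-sitter; `chunk_file` is hypothetical, and for brevity methods stay inside their class chunk rather than being emitted separately.

```python
import ast

WINDOW = 200  # fallback window size in lines, as described above


def window_chunks(lines: list[str]) -> list[tuple[int, int]]:
    """Fixed windows as (start, end) line numbers, 1-based inclusive."""
    return [(i + 1, min(i + WINDOW, len(lines))) for i in range(0, len(lines), WINDOW)]


def chunk_file(source: str) -> list[tuple[str, int, int]]:
    """Split at top-level function/class boundaries; fall back to windows."""
    lines = source.splitlines()
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Unparseable input is treated like an unsupported language.
        return [("window", s, e) for s, e in window_chunks(lines)]
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Include decorators so the chunk is a complete definition.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            chunks.append((node.name, start, node.end_lineno))
    return chunks or [("window", s, e) for s, e in window_chunks(lines)]
```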

Embed

Each chunk is embedded with ibm-granite/granite-embedding-english-r2. Runs locally — no API key needed, downloaded once (~290MB) and cached. A file watcher re-embeds only changed files on save. Override with VECTR_EMBED_MODEL=<hf-model-id> for any sentence-transformers compatible model.
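Re-embedding only changed files comes down to remembering a fingerprint per file. Below is a sketch that uses a SHA-256 content hash (an assumption; Vectr may track changes differently) and reads the `VECTR_EMBED_MODEL` override with the documented default. The `embed` callable is a hypothetical stand-in for the actual model call.

```python
import hashlib
import os
from pathlib import Path
from typing import Callable

DEFAULT_MODEL = "ibm-granite/granite-embedding-english-r2"


def model_id() -> str:
    """Embedding model: VECTR_EMBED_MODEL if set, else the default."""
    return os.environ.get("VECTR_EMBED_MODEL") or DEFAULT_MODEL


def file_digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def reembed_changed(
    paths: list[Path],
    seen: dict[str, str],
    embed: Callable[[Path], None],
) -> list[Path]:
    """Call embed() only for files whose content hash changed."""
    changed = []
    for p in paths:
        digest = file_digest(p)
        if seen.get(str(p)) != digest:
            embed(p)  # hypothetical embedding hook
            seen[str(p)] = digest
            changed.append(p)
    return changed
```

Hashing content rather than trusting modification times means a save that leaves the bytes unchanged triggers no re-embedding.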

Serve

15 MCP tools over HTTP (localhost:8765/mcp) or a foreground stdio transport (vectr mcp-stdio), covering map, locate, search, trace, remember, recall, promote, evict, snapshot, and forget. Memory tools are live from process start: remember and recall never wait on model load.
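MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method with a tool `name` and an `arguments` object. The sketch below only builds such a message for the HTTP endpoint. The argument key `symbol` is a hypothetical guess (a `tools/list` request returns the real input schema), and a real client must complete the MCP `initialize` handshake before calling tools.

```python
import itertools
import json

MCP_URL = "http://localhost:8765/mcp"  # default endpoint from the text
_ids = itertools.count(1)              # JSON-RPC request ids


def tools_call(name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request for the MCP tools/call method."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# "symbol" is an assumed parameter name, not a documented one.
body = tools_call("vectr_locate", {"symbol": "EvaluateSegments"})
```

Under the MCP streamable HTTP transport, this body is POSTed to the endpoint with an Accept header listing both application/json and text/event-stream.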

Features

What Makes It Different

Session memory that persists

vectr_remember saves a working note — key files, edge cases found, what's still missing — to a SQLite store. Five kinds (directive, task, gotcha, finding, reference), each with its own injection semantics: directives fire at every session start, gotchas resurface when their anchored file is touched. Notes survive /compact and new sessions; vectr_recall retrieves them in <50ms, verbatim, with a [STALE] flag if referenced files changed since.
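A stripped-down version of such a store can be built on Python's built-in `sqlite3`. The sketch assumes a hypothetical schema (one row per note, recording the referenced file and its modification time at save time) and flags a note `[STALE]` when that file's mtime has moved on. Vectr's actual schema and staleness check may differ.

```python
from __future__ import annotations

import os
import sqlite3

KINDS = {"directive", "task", "gotcha", "finding", "reference"}


class NoteStore:
    def __init__(self, db_path: str = ":memory:") -> None:
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS notes ("
            "id INTEGER PRIMARY KEY, kind TEXT, body TEXT,"
            " file TEXT, file_mtime REAL)"
        )

    def remember(self, kind: str, body: str, file: str | None = None) -> int:
        if kind not in KINDS:
            raise ValueError(f"unknown kind: {kind}")
        mtime = os.path.getmtime(file) if file else None
        cur = self.db.execute(
            "INSERT INTO notes (kind, body, file, file_mtime) VALUES (?, ?, ?, ?)",
            (kind, body, file, mtime),
        )
        self.db.commit()
        return cur.lastrowid

    def recall(self) -> list[str]:
        """Return notes verbatim, prefixed [STALE] if their file changed."""
        out = []
        for kind, body, file, mtime in self.db.execute(
            "SELECT kind, body, file, file_mtime FROM notes ORDER BY id"
        ):
            stale = file is not None and (
                not os.path.exists(file) or os.path.getmtime(file) != mtime
            )
            out.append(f"{'[STALE] ' if stale else ''}[{kind}] {body}")
        return out
```

Passing a file path to `NoteStore` instead of `":memory:"` is what lets notes outlive the process, and therefore /compact and new sessions.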

Bidirectional recall protocol

The AI saves findings; Vectr signals what it can recall instantly. vectr_evict_hint lists every retrieved chunk with its estimated token cost and a <50ms re-retrieval guarantee — the AI never needs to re-read those files.

Symbol graph, not just search

vectr_locate finds where a function is defined — file, line, kind — without returning any code content. vectr_trace follows the call graph: who calls this, what does this call. Navigate before you read.

300-token codebase passport

vectr_map returns a plain-English structural overview: module purposes, tech stack, entry points, domain vocabulary. One call at session start means the AI already knows where everything lives before asking a single question.

AST-aware chunking + hybrid search

tree-sitter splits code at function and class boundaries — never mid-logic. Adaptive hybrid search (vector + BM25, weights auto-tuned per codebase) finds verify_jwt_token when you ask about JWT validation, and surfaces exact symbol names for precise lookups.
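Hybrid ranking blends a lexical score with a vector score. The sketch below implements Okapi BM25 over tokenized chunks, scales it by its maximum, and mixes it with cosine similarity using a fixed weight `alpha`. Vectr tunes its weights per codebase; that tuning, the tokenizer, and the toy vectors used with it are assumptions here.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Split on anything non-alphanumeric, so verify_jwt_token -> verify, jwt, token.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def bm25(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each doc for the query (docs must be non-empty)."""
    toks = [tokenize(d) for d in docs]
    avgdl = sum(map(len, toks)) / len(toks)
    df = Counter(t for doc in toks for t in set(doc))
    n = len(docs)
    scores = []
    for doc in toks:
        tf = Counter(doc)
        s = 0.0
        for q in tokenize(query):
            if q in tf:
                idf = math.log(1 + (n - df[q] + 0.5) / (df[q] + 0.5))
                s += idf * tf[q] * (k1 + 1) / (tf[q] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(s)
    return scores


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def hybrid(query: str, docs: list[str], qvec: list[float],
           dvecs: list[list[float]], alpha: float = 0.6) -> list[int]:
    """Rank doc indices by alpha * cosine + (1 - alpha) * scaled BM25."""
    lex = bm25(query, docs)
    top = max(lex) or 1.0  # avoid dividing by zero when nothing matches lexically
    scores = [alpha * cosine(qvec, dv) + (1 - alpha) * (lx / top)
              for dv, lx in zip(dvecs, lex)]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

A higher `alpha` favours semantic matches (a question about "JWT validation" reaching verify_jwt_token); a lower one favours exact identifiers.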

Zero-config, zero cloud

Run vectr start. Vectr detects your git root, indexes the workspace, and automatically writes config files for Claude Code, Cursor, and VS Code/Copilot; Windsurf, Cline, and Continue need manual setup, documented in the README. The embedding model runs locally, so no cloud API key is required.

Per-memory triggers

vectr_remember accepts explicit triggers: path globs, lifecycle events (session-start, pre-edit, pre-commit, post-compaction), exact symbol references resolved against the code graph, semantic similarity to the current prompt, and temporal guards. Evaluated in the session-hook pipeline with fire ledgers and injection budgets — the right note resurfaces exactly when its condition matches, deterministically.
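Trigger evaluation with a fire ledger and an injection budget might look roughly like this. Only path-glob and lifecycle-event triggers are modelled (symbol, semantic, and temporal triggers are left out), and the `TriggeredNote`/`Session` types, the fire-once-per-session rule, and the token budget are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class TriggeredNote:
    note_id: int
    text: str
    tokens: int  # estimated size when injected
    globs: list[str] = field(default_factory=list)
    events: list[str] = field(default_factory=list)


@dataclass
class Session:
    budget: int  # tokens still available for injected notes
    fired: set[int] = field(default_factory=set)  # per-session fire ledger


def matches(note: TriggeredNote, event: str, path: str | None) -> bool:
    if event in note.events:
        return True
    # Note: fnmatch's "*" also matches "/", so "targeting/*.go" covers subdirectories.
    return path is not None and any(fnmatch(path, g) for g in note.globs)


def evaluate(notes: list[TriggeredNote], session: Session,
             event: str, path: str | None = None) -> list[str]:
    """Return note texts to inject for this hook, respecting ledger and budget."""
    out = []
    for note in notes:
        if note.note_id in session.fired or not matches(note, event, path):
            continue
        if note.tokens > session.budget:
            continue  # skip notes that would overrun the budget
        session.budget -= note.tokens
        session.fired.add(note.note_id)
        out.append(note.text)
    return out
```

The ledger is what keeps a matching gotcha from being injected on every edit of the same file within one session.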

Team mode — one memory, many agents

vectr start --host binds a central instance; vectr connect --url points each teammate's — or each agent's — editor at it. Shared index, shared working memory with per-client attribution, constant-time API-key auth, and optional encryption at rest for notes and snapshots. A network-reachable bind refuses to start unauthenticated.
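Two of the safety properties above map onto standard-library primitives. The sketch below uses `hmac.compare_digest` for a constant-time key comparison and `ipaddress` to refuse a non-loopback bind without a key. The function names and the rule that unknown hostnames count as network-reachable are assumptions, not Vectr's code.

```python
from __future__ import annotations

import hmac
import ipaddress


def is_loopback(host: str) -> bool:
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # unknown hostnames are treated as network-reachable


def check_bind(host: str, api_key: str | None) -> None:
    """Refuse to start on a network-reachable address without a key."""
    if not is_loopback(host) and not api_key:
        raise SystemExit(f"refusing to bind {host} without an API key")


def authorized(presented: str, expected: str) -> bool:
    """Compare keys in constant time to avoid timing side channels."""
    return hmac.compare_digest(presented.encode(), expected.encode())
```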

Proactive injection (opt-in)

vectr proxy runs a localhost API-shaped proxy that deterministically injects matched notes into the agent's request context — no reliance on the model choosing to call recall. Consent-gated, budgeted, deduplicated, and fail-open: an injection-path error never blocks the underlying request. Injection counts are fully observable in status.
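The consent-gated, deduplicated, fail-open behaviour can be illustrated with a wrapper around the injection step. The request shape (a dict with an optional `system` string), the `matched_notes` callback, the character budget, and the `stats` counters are hypothetical. The point is that any error on the injection path returns the untouched request.

```python
import copy
from typing import Callable


def inject_notes(
    request: dict,
    matched_notes: Callable[[dict], list[str]],
    consent: bool,
    budget_chars: int,
    stats: dict,
) -> dict:
    """Return a request with matched notes prepended, or the original on any error."""
    if not consent:
        return request
    try:
        notes = list(dict.fromkeys(matched_notes(request)))  # deduplicate, keep order
        block, used = [], 0
        for n in notes:
            if used + len(n) > budget_chars:
                break  # stay within the injection budget
            block.append(n)
            used += len(n)
        if not block:
            return request
        patched = copy.deepcopy(request)  # never mutate the caller's request
        header = "Relevant notes:\n" + "\n".join(block)
        existing = request.get("system")
        patched["system"] = header + ("\n\n" + existing if existing else "")
        stats["injected"] = stats.get("injected", 0) + len(block)
        return patched
    except Exception:
        stats["errors"] = stats.get("errors", 0) + 1
        return request  # fail open: never block the underlying request
```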

Quickstart

Up in Under 15 Minutes

First run downloads the embedding model (~290MB). Restart your AI editor once after vectr start — the installed CLAUDE.md will guide it through the first-session passport setup automatically.

Option A — pip (recommended)

Individual developers. Runs the embedding model locally. No API key required.

Option B — Docker

Servers and CI pipelines. No Python environment needed — docker-compose up api and you're indexing.

# Option A — pip install
pip install vectr
cd /path/to/your/project
vectr start

# Option B — Docker
git clone https://github.com/swapnanil/vectr
cd vectr
docker-compose up api

# Either way: stop and restart on a different workspace
vectr restart --path /path/to/other/project

# Write CLAUDE.md + .mcp.json without starting the server
vectr init

# Team mode: point your editor at a shared central instance
vectr connect --url http://vectr.internal:8765 --api-key=<key>

# Foreground stdio MCP transport (no HTTP port)
vectr mcp-stdio

# Clear working memory after a large refactor
vectr forget --path .

# Search from the terminal
vectr search "JWT token validation"

Example

A Session With Memory

What an AI assistant's session looks like with Vectr — from cold start to end-of-day handoff.

Morning — session start (3 calls, ~5 seconds)

# 1. Structural overview — free, ~247 tokens
vectr_map()
→ "Go DSP ad server. targeting/ (audience matching),
   bidder/ (bid logic), tracker/ events.
   Entry: bidder/pipeline.go:RunBidPipeline"

# 2. Recall yesterday's notes
vectr_recall()
→ "[HIGH] Entry: EvaluateSegments() in
   targeting/segment/evaluator.go. Missing tests
   for nil visitor_id case."

# 3. Find the symbol — no code read yet
vectr_locate("EvaluateSegments")
→ [function] EvaluateSegments
   targeting/segment/evaluator.go:45

End of day — context handoff

# After implementing, before closing
vectr_remember(
  "Segment targeting done. EvaluateSegments in
   evaluator.go:45. Added nil guard at line 61.
   Still need: integration test for multi-segment
   visitor with expired segments.",
  tags=["segment-targeting", "wip"],
  priority="high"
)
→ "Stored note #4. Related code chunks can be
   re-retrieved in <50ms — safe to drop."

# Vectr lists what it can re-retrieve instantly
vectr_evict_hint()
→ "Vectr has 6 chunks (~3,840 tokens) indexed
   and re-retrievable in <50ms — safe to drop:
   targeting/segment/evaluator.go [lines 40-110]
   bidder/auction.go [lines 88-134]
   Recall latency: <50ms. Nothing will be lost."

Benchmarks

Measured on Real Codebases

Two-phase benchmark: Phase 1 explores and stores notes; Phase 2 opens cold, calls vectr_recall(), and implements. Vanilla Phase 2 re-reads from scratch.

Django

Familiar codebase

Mixed results, stated plainly. The best figure, −24% tokens and −60% cost, comes from a single task (custom_field, ORM internals) in an early run (run 1, pre-upgrade build). Across all three Django tasks that run was net-negative (+30% tokens, +16% cost), pulled down by one strongly negative task. On well-known APIs, where the model already has training coverage, there was no benefit. Vectr helps in proportion to how much re-discovery work Phase 2 would otherwise do.

Camel

5,856-file enterprise Java

−40% Phase 2 input tokens and −58% Phase 2 cost across 3 tasks. On custom_component, vanilla produced 0 bytes after 51 tool calls, while Vectr produced a complete 5-file implementation. On route_policy both runs worked, but Vectr was about 3× cheaper and 2.4× faster.

Why

The mechanism

vectr_recall() at Phase 2 start returns structured notes in ~200 tokens, replacing dozens of re-discovery tool calls. The AI picks up mid-thought: no files re-read, no symbols re-grepped. On Apache Camel, this cut Phase 2 tool calls from 135 to 38 across the three tasks.

When

When it matters most

Large unfamiliar codebases, cross-session continuation tasks, and implementation work following a research phase. Single-session tasks on well-known codebases see minimal benefit — the model's training data already covers those.

Task	Vanilla Phase 2 (cost · time · tool calls)	Vectr Phase 2 (cost · time · tool calls)	Cost Δ	Tool calls Δ	Output
custom_component	$0.56 · 134s · 51 tools	$0.36 · 195s · 11 tools	−35%	−78%	0 bytes (failure) vs 9,398 bytes (5 files)
route_policy	$1.15 · 430s · 59 tools	$0.35 · 177s · 16 tools	−70%	−73%	both 280-line impl
type_converter	$0.48 · 187s · 25 tools	$0.20 · 86s · 11 tools	−57%	−56%	both working
Totals (Camel)	$2.19 · 751s · 135 tools	$0.92 · 458s · 38 tools	−58%	−72%	−40% input tokens

Beyond the benchmarks: "Delivery, Not Storage" (arXiv:2607.20972) — a controlled study of agent memory under repeated context compaction, with full run archives published in the repository.
