RAG Readiness

Describe your data and use case. Get a complete, opinionated RAG architecture. Diagnose what's wrong with the one you already have. Estimate what it'll cost.

$ python main.py diagnose --interactive

[1/5] Vector database? Pinecone
[2/5] Chunking strategy? 512-token fixed
[3/5] Embedding model? ada-002
[4/5] Retrieval method? dense only
[5/5] Problems? (blank to finish)
> misses exact clause references
> hallucinates contract terms
>

Diagnosis — overall severity: CRITICAL
chunking_strategy: critical
  Fixed chunks split mid-clause
  → switch to hierarchical chunking

retrieval_method: high
  Dense-only misses exact terms
  → add BM25 hybrid + RRF merge

Quick fix today:
  Enable 10% token overlap in fixed chunks as immediate patch
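
The quick fix in the transcript above, fixed-size chunks with a 10% overlap, can be sketched as follows. This is a minimal illustration, not the tool's own code: the function name `chunk_with_overlap` is hypothetical, and the input is assumed to be a list of tokens already produced by whatever tokenizer your embedding model uses.

```python
def chunk_with_overlap(tokens, chunk_size=512, overlap_ratio=0.10):
    """Split a token list into fixed-size windows that share a small overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")

    overlap = int(chunk_size * overlap_ratio)  # 51 tokens for 512 at 10%
    step = chunk_size - overlap                # how far each window advances

    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


if __name__ == "__main__":
    demo_tokens = list(range(1000))  # stand-in for real token ids
    demo_chunks = chunk_with_overlap(demo_tokens)
    print(len(demo_chunks), [len(c) for c in demo_chunks])
```

Overlap only softens the mid-clause split: text near a boundary appears in both neighbouring chunks, but long clauses can still be cut, which is why the diagnosis treats it as a patch rather than the fix.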

Every RAG architecture blog post ends with "it depends." Teams spend weeks evaluating options with no framework, ship the wrong choices, then discover root causes six months later — after ingesting 50,000 documents, signing a cloud contract that violates GDPR, and watching precision tank on exact-term queries.

RAG Readiness first scores the complexity of your use case with rules, then has Claude return one specific recommendation per component. If you already have a stack, diagnose it: you get root causes ordered by severity, each with one concrete fix. Every session persists, so you can iterate and refine against new constraints. Cost estimation, eval dataset generation, and implementation bundles are included.


Rules first.
Then Claude.

Complexity scoring before the LLM call means recommendations are calibrated to your actual constraints, not generic best practices.
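
To make the idea concrete, here is a minimal sketch of what a rule-based pre-scorer with conflict detection might look like. Everything in it is an assumption for illustration: the `UseCase` fields, the weights and thresholds, and the two conflict rules are hypothetical and are not taken from the tool's actual rule set.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    data_types: list[str]
    document_count: int
    updates_per_day: int
    compliance: list[str]
    self_hosted: bool
    team_ml_experience: str  # "low", "medium", or "high"


def score_complexity(uc: UseCase) -> int:
    """Return a complexity score from 1 to 10 using hypothetical weights."""
    score = 1
    score += min(max(len(uc.data_types) - 1, 0), 2)  # mixed formats
    if uc.document_count > 100_000:
        score += 2
    elif uc.document_count > 10_000:
        score += 1
    if uc.updates_per_day > 100:
        score += 1                                   # frequent re-indexing
    score += min(len(uc.compliance), 2)              # regulatory constraints
    if uc.self_hosted:
        score += 1
    if uc.team_ml_experience == "low":
        score += 1
    return max(1, min(score, 10))


def detect_conflicts(uc: UseCase) -> list[str]:
    """Flag constraint pairs that contradict each other (illustrative rules)."""
    conflicts = []
    if "GDPR" in uc.compliance and not uc.self_hosted:
        conflicts.append("GDPR applies but self-hosting is not preferred")
    if uc.self_hosted and uc.team_ml_experience == "low":
        conflicts.append("self-hosting requested with low team ML experience")
    return conflicts


if __name__ == "__main__":
    legal = UseCase(["pdf", "docx"], 50_000, 10, ["GDPR"], False, "low")
    print(score_complexity(legal), detect_conflicts(legal))
```

Because the score and the conflicts are computed deterministically, they can be passed into the LLM prompt as fixed context instead of being left for the model to infer.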

01
Describe
Data types, volume, update frequency, compliance requirements, self-hosting preference, team ML experience — or share an existing broken stack for diagnosis
02
Score
A rule-based pre-scorer computes a complexity score from 1 to 10 before any LLM call; conflict detection flags contradictory constraints
03
Recommend
Claude returns one specific choice per component with full reasoning — plus diagnosis severity, cost estimate, or eval questions depending on mode
04
Persist & Refine
Every session persists to SQLite. Refine against new constraints, track refinement history, generate implementation bundles when ready
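
Session persistence with a refinement history could be sketched with the standard-library `sqlite3` module as below. The table names, columns, and helper functions are hypothetical; the tool's real schema is not documented on this page.

```python
import json
import sqlite3
import uuid


def open_store(path=":memory:"):
    """Open (or create) a session store. The schema is an assumed example."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sessions (
               id TEXT PRIMARY KEY,
               created_at TEXT DEFAULT CURRENT_TIMESTAMP,
               use_case TEXT NOT NULL
           )"""
    )
    conn.execute(
        """CREATE TABLE IF NOT EXISTS refinements (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               session_id TEXT NOT NULL REFERENCES sessions(id),
               feedback TEXT NOT NULL,
               created_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def create_session(conn, use_case):
    """Store the use case as JSON and return a new session id."""
    session_id = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO sessions (id, use_case) VALUES (?, ?)",
        (session_id, json.dumps(use_case)),
    )
    conn.commit()
    return session_id


def add_refinement(conn, session_id, feedback):
    """Append one piece of feedback to a session's history."""
    conn.execute(
        "INSERT INTO refinements (session_id, feedback) VALUES (?, ?)",
        (session_id, feedback),
    )
    conn.commit()


def refinement_history(conn, session_id):
    """Return feedback strings in the order they were recorded."""
    rows = conn.execute(
        "SELECT feedback FROM refinements WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = open_store()
    sid = create_session(conn, {"data_types": ["pdf"], "compliance": ["GDPR"]})
    add_refinement(conn, sid, "Qdrant was too heavy")
    print(refinement_history(conn, sid))
```

Keeping refinements in their own append-only table, rather than overwriting the session row, is what makes the history trackable.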

Six modes.
One tool.

From blank-slate architecture to debugging a production system — all from the same CLI and API.

Architecture Recommendation
One primary choice per component — no "it depends." Weaviate or Pinecone. BM25 or dense. If GDPR applies, managed cloud options are eliminated before Claude is even called. The output is a decision, not a comparison table.
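
The pre-filtering step described above, where hard constraints remove options before the LLM is called, might look like this sketch. The candidate list, the `hosting` labels, and the function name are illustrative assumptions; the rule itself (GDPR removes managed cloud options) is the one stated above, not a general legal claim.

```python
# Hypothetical candidate table; hosting labels describe one assumed
# deployment per product and are for illustration only.
CANDIDATES = [
    {"name": "Pinecone", "hosting": "managed"},
    {"name": "Weaviate", "hosting": "self_hosted"},
    {"name": "Qdrant", "hosting": "self_hosted"},
]


def eligible_vector_dbs(compliance, candidates=CANDIDATES):
    """Drop managed options when GDPR applies; otherwise keep everything."""
    if "GDPR" in compliance:
        return [c["name"] for c in candidates if c["hosting"] != "managed"]
    return [c["name"] for c in candidates]


if __name__ == "__main__":
    print(eligible_vector_dbs(["GDPR"]))
    print(eligible_vector_dbs([]))
```

Only the surviving names would then be offered to the model, so it cannot recommend an option the rules have already excluded.
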
Architecture Diagnosis
Describe your existing broken stack and the problems you're seeing. Get a root-cause analysis per component with severity levels and one specific, actionable fix — not "improve your chunking" but a named strategy with exact parameters.
Multi-Use-Case Session
Run up to 5 parallel audits in a single request. Get cross-cutting insights: which components can be shared across use cases, where requirements conflict, and which use case to build first for the highest return.
Implementation Bundle
Generate a complete requirements.txt, docker-compose.yml, .env.example, and migration guide from any architecture recommendation. If you have an existing stack, get ordered migration steps with rollback notes.
Cost Estimation
Rule-based monthly cost breakdown per component — no LLM call needed. Lookup tables for vector DB tiers, embedding API pricing, reranker inference, and LLM costs. Includes optimization tips and a hosting model classification.
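
A lookup-table cost estimate of this kind can be sketched as below. All tier names, model names, and prices are placeholders invented for the example and do not reflect real vendor pricing; the reranker line item mentioned above is left out to keep the sketch short.

```python
# Placeholder lookup tables. Every number here is made up for illustration.
VECTOR_DB_MONTHLY_USD = {"self_hosted_small": 50.0, "managed_starter": 70.0}
EMBEDDING_USD_PER_1M_TOKENS = {"example-embedding-model": 0.10}
LLM_USD_PER_1M_TOKENS = {"example-llm": 3.00}


def estimate_monthly_cost(vector_db_tier, embedding_model, llm,
                          embedded_tokens_per_month, llm_tokens_per_month):
    """Return a per-component monthly breakdown in USD, with no LLM call."""
    breakdown = {
        "vector_db": VECTOR_DB_MONTHLY_USD[vector_db_tier],
        "embedding": embedded_tokens_per_month / 1_000_000
        * EMBEDDING_USD_PER_1M_TOKENS[embedding_model],
        "llm": llm_tokens_per_month / 1_000_000 * LLM_USD_PER_1M_TOKENS[llm],
    }
    breakdown["total"] = round(sum(breakdown.values()), 2)
    return breakdown


if __name__ == "__main__":
    print(estimate_monthly_cost(
        "managed_starter", "example-embedding-model", "example-llm",
        embedded_tokens_per_month=20_000_000,
        llm_tokens_per_month=5_000_000,
    ))
```
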
Eval Dataset Generation
Generate RAGAS-ready evaluation questions grounded in the actual use case and query patterns — not generic retrieval questions. Easy/medium/hard distribution, RAGAS metric mapping, annotation guide, and time estimate included.

Running in
three minutes.

Setup
git clone https://github.com/swapnanil/rag-readiness
cd rag-readiness
cp .env.example .env   # add your ANTHROPIC_API_KEY
docker-compose up api
# or: pip install -r requirements.txt && python api.py

New architecture audit
python main.py audit --interactive
python main.py audit --file examples/usecase_legal_contracts.json --with-cost

Diagnose existing broken stack
python main.py diagnose --interactive
python main.py diagnose --file examples/diagnosis_pinecone_fixed.json

Multi-use-case session + refinement
python main.py multi-audit examples/multi_usecase_lexvault.json
python main.py sessions
python main.py refine <session-id> --feedback "Qdrant was too heavy"
python main.py cost <session-id>
python main.py eval-dataset <session-id> --num-questions 20

Broken stack in.
Root causes out.

Input — diagnosis request
{
  "existing_architecture": {
    "vector_database": "Pinecone",
    "chunking_strategy": "512-token fixed",
    "embedding_model": "ada-002",
    "retrieval_method": "dense",
    "observed_problems": [
      "misses clause references",
      "hallucinates terms"
    ]
  }
}
Output — diagnosis result
overall_severity: critical

chunking_strategy: critical
  Fixed chunks split mid-clause in long legal documents
  Fix: parent-child hierarchical chunking, 512-token child nodes

retrieval_method: high
  Dense-only misses exact terms like dollar amounts in clauses
  Fix: hybrid BM25+dense + RRF

quick_fix:
  Enable 10% token overlap today
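
The "hybrid BM25+dense + RRF" fix merges two ranked lists with Reciprocal Rank Fusion. A minimal sketch follows, assuming you already have one ranked list of chunk ids from a BM25 index and one from a dense retriever. The chunk ids are made up, and `k=60` is the constant commonly used for RRF, not a value confirmed by this tool.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked id lists: each list adds 1 / (k + rank) to a document's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


if __name__ == "__main__":
    bm25_hits = ["c7", "c2", "c9"]   # exact-term matches (hypothetical ids)
    dense_hits = ["c2", "c4", "c7"]  # semantic matches (hypothetical ids)
    print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
```

RRF uses only rank positions, so BM25 scores and embedding similarities never need to be normalised onto a common scale; chunks that both retrievers rank highly rise to the top.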

Five more tools.
Same standard.

Each tool is a standalone CLI + REST API solving a real enterprise problem with Claude.