RAG Readiness

Describe your data and use case. Get a complete, opinionated RAG architecture. Diagnose what's wrong with the one you already have. Estimate what it'll cost.

$ python main.py diagnose --interactive

[1/5] Vector database? Pinecone
[2/5] Chunking strategy? 512-token fixed
[3/5] Embedding model? ada-002
[4/5] Retrieval method? dense only
[5/5] Problems? (blank to finish)
> misses exact clause references
> hallucinates contract terms
>

Diagnosis — overall severity: CRITICAL
chunking_strategy: critical
  Fixed chunks split mid-clause
  → switch to hierarchical chunking

retrieval_method: high
  Dense-only misses exact terms
  → add BM25 hybrid + RRF merge

Quick fix today:
Enable 10% token overlap in
fixed chunks as immediate patch
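The quick fix in the session above, fixed-size chunks with 10% token overlap, can be sketched in a few lines. This is a minimal illustration, not the tool's own code: `chunk_with_overlap` is a hypothetical helper, and `tokens` is assumed to be any list already produced by your tokenizer.

```python
def chunk_with_overlap(tokens, chunk_size=512, overlap_ratio=0.10):
    """Split a token list into fixed-size chunks that share a small overlap.

    Hypothetical helper for illustration; the defaults mirror the
    512-token / 10% figures from the diagnosis above.
    """
    overlap = int(chunk_size * overlap_ratio)
    step = chunk_size - overlap
    if chunk_size <= 0 or step <= 0:
        raise ValueError("chunk_size must be positive and overlap_ratio below 1")

    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a chunk has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

Each chunk starts `chunk_size - overlap` tokens after the previous one, so a clause cut at one boundary still appears whole in the neighbouring chunk as long as it is shorter than the overlap.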

Problem

Every RAG architecture blog post ends with "it depends." Teams spend weeks evaluating options with no framework, ship the wrong choices, then discover root causes six months later — after ingesting 50,000 documents, signing a cloud contract that violates GDPR, and watching precision tank on exact-term queries.

RAG Readiness pre-scores complexity from rules, then the LLM returns one specific recommendation per component. If you already have a stack, diagnose it — root causes ordered by severity, one concrete fix each. If you need to iterate, every session persists so you can refine against new constraints. Cost estimation, eval dataset generation, and implementation bundles included.

Architecture

Rules first.
Then the LLM.

Complexity scoring before the LLM call means recommendations are calibrated to your actual constraints, not generic best practices.

Describe

Data types, volume, update frequency, compliance requirements, self-hosting preference, team ML experience — or share an existing broken stack for diagnosis

Score

Rule-based pre-scorer computes complexity 1–10 before any LLM call; conflict detection flags contradictory constraints
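A rule-based pre-scorer of this kind might look like the sketch below. Everything in it is an assumption made for illustration: the `UseCase` fields, the individual rules, and their weights are hypothetical and are not taken from the project's source.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UseCase:
    # Hypothetical input shape, loosely following the "Describe" step.
    data_types: list[str]
    document_count: int
    updates_per_day: int
    compliance: list[str] = field(default_factory=list)
    self_hosted: bool = False
    team_ml_experience: str = "some"  # "none", "some", or "strong"


def complexity_score(uc: UseCase) -> int:
    """Return a complexity score from 1 to 10 using illustrative rules."""
    score = 1
    if len(uc.data_types) > 1:
        score += 1  # mixed formats need more than one parsing path
    if uc.document_count > 100_000:
        score += 2
    elif uc.document_count > 10_000:
        score += 1
    if uc.updates_per_day > 0:
        score += 1  # live data needs incremental re-indexing
    if uc.compliance:
        score += 2
    if uc.self_hosted:
        score += 1
    if uc.team_ml_experience == "none":
        score += 2
    return max(1, min(10, score))


def detect_conflicts(uc: UseCase) -> list[str]:
    """Flag constraint pairs that contradict each other (illustrative)."""
    conflicts = []
    if "GDPR" in uc.compliance and not uc.self_hosted:
        conflicts.append("GDPR applies but self-hosting is not preferred")
    if uc.self_hosted and uc.team_ml_experience == "none":
        conflicts.append("self-hosting requested with no ML experience on the team")
    return conflicts
```

Because both functions are plain rules, they run before any LLM call and return the same result for the same input, which is what lets the score calibrate the prompt that follows.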

Recommend

Returns one specific choice per component with full reasoning — plus diagnosis severity, cost estimate, or eval questions depending on mode

Persist & Refine

Every session persists to SQLite. Refine against new constraints, track refinement history, generate implementation bundles when ready
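Session persistence with refinement history can be sketched with Python's built-in `sqlite3` module. The table names, columns, and helper functions below are hypothetical; the project's actual schema may differ.

```python
import json
import sqlite3
import uuid


def open_store(path=":memory:"):
    """Open a SQLite store and create the (hypothetical) tables if missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sessions ("
        "id TEXT PRIMARY KEY, recommendation TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS refinements ("
        "session_id TEXT NOT NULL REFERENCES sessions(id), "
        "seq INTEGER NOT NULL, feedback TEXT NOT NULL, "
        "PRIMARY KEY (session_id, seq))"
    )
    return conn


def save_session(conn, recommendation):
    """Store a recommendation as JSON and return the new session id."""
    session_id = uuid.uuid4().hex
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO sessions VALUES (?, ?)",
            (session_id, json.dumps(recommendation)),
        )
    return session_id


def add_refinement(conn, session_id, feedback):
    """Append one feedback entry to a session's refinement history."""
    with conn:
        (next_seq,) = conn.execute(
            "SELECT COALESCE(MAX(seq), 0) + 1 FROM refinements WHERE session_id = ?",
            (session_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO refinements VALUES (?, ?, ?)",
            (session_id, next_seq, feedback),
        )
    return next_seq


def history(conn, session_id):
    """Return all feedback for a session in the order it was given."""
    rows = conn.execute(
        "SELECT feedback FROM refinements WHERE session_id = ? ORDER BY seq",
        (session_id,),
    )
    return [row[0] for row in rows]
```

Keeping refinements in their own table, ordered by a sequence number, is one simple way to make the refinement history replayable.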

Features

Six modes.
One tool.

From blank-slate architecture to debugging a production system — all from the same CLI and API.

Architecture Recommendation

One primary choice per component — no "it depends." Weaviate or Pinecone. BM25 or dense. If GDPR applies, managed cloud options are eliminated before the LLM is even called. The output is a decision, not a comparison table.
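Eliminating options by rule before the LLM sees them could work roughly as follows. The catalogue and its `self_hostable` flag are illustrative assumptions, and `eligible_vector_dbs` is a hypothetical name rather than a function from the project.

```python
# Illustrative catalogue; the flags are assumptions for this sketch only.
CANDIDATES = {
    "Pinecone": {"self_hostable": False},
    "Weaviate": {"self_hostable": True},
    "Qdrant": {"self_hostable": True},
}


def eligible_vector_dbs(compliance, candidates=CANDIDATES):
    """Drop managed-only options when GDPR is among the requirements."""
    if "GDPR" in compliance:
        return [name for name, meta in candidates.items() if meta["self_hostable"]]
    return list(candidates)
```

Only the surviving names would then be offered to the LLM, so it cannot recommend an option the rules have already ruled out.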

Architecture Diagnosis

Describe your existing broken stack and the problems you're seeing. Get a root-cause analysis per component with severity levels and one specific, actionable fix — not "improve your chunking" but a named strategy with exact parameters.

Multi-Use-Case Session

Run up to 5 parallel audits in a single request. Get cross-cutting insights: which components can be shared across use cases, where requirements conflict, and which use case to build first for the highest return.

Implementation Bundle

Generate a complete requirements.txt, docker-compose.yml, .env.example, and migration guide from any architecture recommendation. If you have an existing stack, get ordered migration steps with rollback notes.

Cost Estimation

Rule-based monthly cost breakdown per component — no LLM call needed. Lookup tables for vector DB tiers, embedding API pricing, reranker inference, and LLM costs. Includes optimization tips and a hosting model classification.
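A lookup-table estimate needs nothing more than a dictionary and some arithmetic. In the sketch below, every price is a placeholder chosen to make the arithmetic easy to follow, not real vendor pricing, and the table layout and function name are hypothetical.

```python
# Placeholder prices in USD. These are NOT real vendor prices.
PRICE_TABLE = {
    "vector_db_monthly": {"starter": 0.0, "standard": 70.0},
    "embedding_per_million_tokens": 0.10,
    "llm_per_million_tokens": 3.00,
}


def estimate_monthly_cost(vector_db_tier, embedding_tokens, llm_tokens,
                          prices=PRICE_TABLE):
    """Return a per-component monthly cost breakdown plus a total."""
    breakdown = {
        "vector_db": prices["vector_db_monthly"][vector_db_tier],
        "embedding": embedding_tokens / 1_000_000 * prices["embedding_per_million_tokens"],
        "llm": llm_tokens / 1_000_000 * prices["llm_per_million_tokens"],
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown
```

With the placeholder table, `estimate_monthly_cost("standard", 10_000_000, 5_000_000)` adds 70 for the database tier, 1 for embeddings, and 15 for LLM tokens. A reranker row would be one more key in the table and one more line in the breakdown.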

Eval Dataset Generation

Generate RAGAS-ready evaluation questions grounded in the actual use case and query patterns — not generic retrieval questions. Easy/medium/hard distribution, RAGAS metric mapping, annotation guide, and time estimate included.
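The easy/medium/hard distribution comes down to splitting a question count by ratio without losing any questions to rounding. The sketch below assumes a 40/40/20 split, which is an illustrative default and not a documented one; `split_by_difficulty` is a hypothetical helper.

```python
def split_by_difficulty(num_questions, ratios=None):
    """Allocate questions to difficulty levels using largest remainders."""
    # Assumed default split; the real tool may use different ratios.
    ratios = ratios or {"easy": 0.4, "medium": 0.4, "hard": 0.2}
    raw = {level: num_questions * share for level, share in ratios.items()}
    counts = {level: int(value) for level, value in raw.items()}

    # Hand out the questions lost to rounding, largest fraction first.
    leftover = num_questions - sum(counts.values())
    by_fraction = sorted(raw, key=lambda level: raw[level] - counts[level], reverse=True)
    for level in by_fraction[:leftover]:
        counts[level] += 1
    return counts
```

The counts always sum to the number requested, so `--num-questions 20` would map to 8 easy, 8 medium, and 4 hard questions under the assumed ratios.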

Quickstart

Running in
three minutes.

Setup

git clone https://github.com/swapnanil/rag-readiness
cd rag-readiness
cp .env.example .env   # add your ANTHROPIC_API_KEY
docker-compose up api
# or: pip install -r requirements.txt && python api.py

New architecture audit

python main.py audit --interactive
python main.py audit --file examples/usecase_legal_contracts.json --with-cost

Diagnose existing broken stack

python main.py diagnose --interactive
python main.py diagnose --file examples/diagnosis_pinecone_fixed.json

Multi-use-case session + refinement

python main.py multi-audit examples/multi_usecase_lexvault.json
python main.py sessions
python main.py refine <session-id> --feedback "Qdrant was too heavy"
python main.py cost <session-id>
python main.py eval-dataset <session-id> --num-questions 20

Example

Broken stack in.
Root causes out.

Input — diagnosis request

{
  "existing_architecture": {
    "vector_database": "Pinecone",
    "chunking_strategy": "512-token fixed",
    "embedding_model": "ada-002",
    "retrieval_method": "dense",
    "observed_problems": [
      "misses clause references",
      "hallucinates terms"
    ]
  }
}

Output — diagnosis result

overall_severity: critical

chunking_strategy: critical
  Fixed chunks split mid-clause in long legal documents
  Fix: parent-child hierarchical chunking, 512-token child nodes

retrieval_method: high
  Dense-only misses exact terms like dollar amounts in clauses
  Fix: hybrid BM25+dense + RRF

quick_fix:
  Enable 10% token overlap today
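The retrieval fix above merges a BM25 ranking and a dense ranking with Reciprocal Rank Fusion (RRF). A minimal sketch follows, assuming each retriever returns an ordered list of chunk ids; the function name and the ids in the usage note are made up, and `k=60` is the commonly used default constant rather than a value taken from this project.

```python
from collections import defaultdict


def rrf_merge(rankings, k=60):
    """Fuse several ranked lists of ids with Reciprocal Rank Fusion.

    Each id scores 1 / (k + rank) per list it appears in, with rank
    starting at 1; ids are returned from highest to lowest total score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
```

For example, `rrf_merge([["c7", "c2", "c9"], ["c2", "c4", "c7"]])` ranks `c2` and `c7` first because both retrievers returned them. RRF uses only rank positions, so the BM25 and dense scores never have to be normalised against each other.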