Skip to content

Hybrid RAG & NL-to-SQL Engine

Dual-mode enterprise Q&A - document RAG at 85%+ relevance + NL-to-SQL processing 500+ queries/day at 99.9% uptime

85%+Retrieval Relevance

Problem

Enterprise users needed to query both unstructured documents and structured databases using plain English. LLM-generated SQL is unreliable and dangerous without validation. Single-vector retrieval misses keyword matches. And any LLM dependency is a reliability risk without fallback.

Solution

  • Indexed 100K+ chunks in Aurora PostgreSQL with pgvector (1024-dim Titan embeddings) for semantic search.
  • Hybrid retrieval fuses pgvector semantic search with BM25 keyword search via Reciprocal Rank Fusion (RRF) — eliminates blind spots of single-mode retrieval.
  • LLM-based reranking pushes relevance to 85%+; RAGAS-aligned evaluation (MRR, correctness, groundedness, completeness) with golden test datasets validates quality continuously.
  • NL-to-SQL engine with circuit breaker and Sonnet-to-Haiku failover for 99.9% uptime.
  • Every generated SQL query parsed by sqlglot into an AST — structurally validated to prevent injection before execution.
  • PII detection and column-level access controls applied at the query layer; S3 audit logging on every query for compliance.

System Flow

Query

NL Input
Intent Classifier

Router

Query Analyzer
Search Type Router

RAG Branch

pgvector Search
BM25 + RRF

SQL Branch

Schema Mapper
sqlglot Validator

Validation

AST Validation
RAGAS Eval

Response

Answer Synth
Source Grounding

Architecture

  • 01pgvector (1024-dim Titan embeddings) + BM25 hybrid retrieval via RRF
  • 02LLM-based reranking for 85%+ relevance on golden test datasets
  • 03RAGAS evaluation: MRR, correctness, groundedness, completeness
  • 04S3 audit logging on every query for compliance tracking
  • 05Circuit breaker with Sonnet-to-Haiku failover for 99.9% uptime
  • 06sqlglot AST validation - SQL injection prevention on every query
  • 07PII detection + column-level access controls at query layer

Impact

  • 85%+ retrieval relevance across 100K+ indexed document chunks
  • 500+ NL-to-SQL queries processed daily at 99.9% uptime
  • Zero SQL injection incidents in production via sqlglot AST-level validation
  • S3 audit logging on every query for compliance

Tech Stack

pgvectorBM25RRFAWS BedrockClaudeTitansqlglotRAGASPostgreSQLAuroraSQL ServerPython