LongProbe

Sub-second RAG regression testing for production pipelines
Overview
"Did my last commit break retrieval?" Now you know in seconds.
LongProbe is a sub-second RAG regression harness. Define your Golden Questions once, run longprobe check on every commit, and get an exact diff of which document chunks were lost in your latest change, before your users notice.
Think pytest --watch for your RAG pipeline.
Why LongProbe?
Every RAG developer faces the same silent killer: you refactor your chunking strategy, upgrade LangChain, or add a new document, and retrieval quietly degrades. DeepEval and RAGChecker are heavyweight evaluation frameworks meant for batch analysis, not fast regression checks in a dev loop.
LongProbe gives you instant feedback:
- Sub-second checks on small golden sets
- Exact diffs showing which chunks were lost/gained
- Recall scores with per-question breakdown
- Baseline tracking to catch regressions over time
- pytest integration for existing test suites
- Pluggable adapters for any vector store
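The recall score and the lost/gained diff listed above come down to set arithmetic over chunk identifiers. The following is a minimal sketch of that idea, not LongProbe's implementation: the function names `recall` and `chunk_diff` and the chunk IDs are hypothetical, chosen only for illustration.

```python
def recall(required: set[str], retrieved: set[str]) -> float:
    """Fraction of required chunks that were retrieved (1.0 if none are required)."""
    if not required:
        return 1.0
    return len(required & retrieved) / len(required)


def chunk_diff(baseline: set[str], current: set[str]) -> dict[str, list[str]]:
    """Chunks present in the baseline run but missing now, and vice versa."""
    return {
        "lost": sorted(baseline - current),
        "gained": sorted(current - baseline),
    }


# Hypothetical data for a single golden question.
required = {"doc1#3", "doc2#7"}
baseline_run = {"doc1#3", "doc2#7", "doc5#1"}
current_run = {"doc1#3", "doc5#1", "doc9#2"}

print(recall(required, current_run))          # 0.5
print(chunk_diff(baseline_run, current_run))  # {'lost': ['doc2#7'], 'gained': ['doc9#2']}
```

Here the current run still retrieves one of the two required chunks, so recall drops to 0.5, and the diff names `doc2#7` as the chunk that disappeared.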
Quick Example
# Install
pip install longprobe
# Initialize
longprobe init
# Define your golden questions in goldens.yaml
# Configure your vector store in longprobe.yaml
# Run checks
longprobe check
# Save baseline
longprobe baseline save --label v1.0
# Compare after changes
longprobe diff --baseline v1.0
Part of the Long Suite
LongProbe is part of the EnDevSols Long Suite of RAG tools:
- LongParser - Document ingestion and chunking
- LongTrainer - RAG chatbot framework
- LongTracer - Hallucination detection
- LongProbe - Retrieval regression testing (you are here)
Together they cover the full RAG pipeline from ingestion to production monitoring.
Features
Core Capabilities
- Sub-second checks on small golden sets
- Golden Questions + Required Chunks defined in simple YAML
- Three match modes: exact ID, text substring, semantic similarity
- Recall Score with per-question breakdown
- Regression diff: exactly which chunks were lost/gained
- SQLite baseline store: compare against any previous run
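To make the three match modes concrete, here is a rough sketch of how each could decide whether a retrieved chunk satisfies a required one. It is an illustration under stated assumptions, not LongProbe's code: the function names, the `{"id", "text"}` chunk shape, and the 0.6 threshold are hypothetical, and the semantic mode uses a bag-of-words cosine similarity as a dependency-free stand-in for real embedding similarity.

```python
import math
import re
from collections import Counter


def match_exact_id(required_id: str, chunk: dict) -> bool:
    """Exact ID mode: the chunk's identifier must equal the required one."""
    return chunk["id"] == required_id


def match_substring(required_text: str, chunk: dict) -> bool:
    """Text substring mode: case-insensitive containment check."""
    return required_text.lower() in chunk["text"].lower()


def _bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two strings."""
    va, vb = _bag_of_words(a), _bag_of_words(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def match_semantic(required_text: str, chunk: dict, threshold: float = 0.6) -> bool:
    """Similarity mode: match when the score reaches the (hypothetical) threshold."""
    return cosine_similarity(required_text, chunk["text"]) >= threshold


# Hypothetical retrieved chunk.
chunk = {"id": "policy#12", "text": "Refunds are issued within 30 days of purchase."}

print(match_exact_id("policy#12", chunk))                         # True
print(match_substring("within 30 days", chunk))                   # True
print(match_semantic("refunds issued within 30 days", chunk))     # True
print(match_semantic("shipping costs for international orders", chunk))  # False
```

Exact ID matching is the strictest and breaks whenever re-chunking changes identifiers; substring and similarity matching tolerate such changes at the cost of looser guarantees.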
Developer Experience
- pytest plugin: integrate into existing test suites
- Beautiful CLI with Rich tables, JSON, and GitHub Actions output
- Watch mode: auto re-run on file changes
- CI/CD ready: fails pipeline on regression
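The idea of failing a pipeline on regression can be sketched with plain pytest. The example below deliberately does not use the LongProbe pytest plugin, whose fixtures are not described on this page. Everything in it is hypothetical: `retrieve` stands in for your own retrieval call, `_FAKE_INDEX` replaces a real vector store, and `MIN_RECALL` is an arbitrary threshold.

```python
# test_retrieval.py
# Plain pytest sketch with no LongProbe imports; all names here are illustrative.

# Hypothetical golden questions with the chunk IDs each one must retrieve.
GOLDENS = [
    {"question": "What is the refund window?", "required": {"policy#12"}},
    {"question": "How do I reset my password?", "required": {"account#4", "account#5"}},
]

# Stand-in for a vector store: question -> ranked chunk IDs.
_FAKE_INDEX = {
    "What is the refund window?": ["policy#12", "policy#13"],
    "How do I reset my password?": ["account#4", "account#5", "faq#1"],
}

MIN_RECALL = 1.0  # hypothetical threshold: every required chunk must come back


def retrieve(question: str, top_k: int = 5) -> list[str]:
    """Hypothetical retrieval call; replace with your pipeline's retriever."""
    return _FAKE_INDEX.get(question, [])[:top_k]


def question_recall(golden: dict) -> float:
    """Share of a golden question's required chunks found in the top-k results."""
    retrieved = set(retrieve(golden["question"]))
    return len(golden["required"] & retrieved) / len(golden["required"])


def test_golden_questions_meet_recall_threshold():
    failures = [g["question"] for g in GOLDENS if question_recall(g) < MIN_RECALL]
    assert not failures, f"recall below {MIN_RECALL} for: {failures}"
```

A failing assertion makes pytest exit with a non-zero status, which is what causes a CI job to stop on a retrieval regression.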
Integrations
- Pluggable adapters: LangChain, LlamaIndex, Chroma, Pinecone, Qdrant
- HTTP adapter: test any RAG API
- Python API: programmatic access to all features
Next Steps
- Quick Start: Get up and running in 5 minutes
- User Guide: Learn how to define golden questions and configure LongProbe
- Demos: See LongProbe in action with live demos
- API Reference: Detailed API documentation for Python integration
Community & Support
- GitHub Issues: Report bugs or request features
- Discussions: Ask questions and share ideas
- Contributing: Contribution guidelines
License
LongProbe is released under the MIT License.