Chat Engine

The LongParser chat engine provides RAG-powered Q&A with a 3-layer memory system (recent turns, rolling summary, long-term facts).

Architecture

graph LR
    Q[Question] --> R[Retriever]
    R --> B[Budget Trimmer]
    B --> L[LLM Chain]
    L --> V[Citation Validator]
    V --> A[Answer]
    M1[Recent Turns] --> B
    M2[Rolling Summary] --> B
    M3[Long-term Facts] --> B
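The stages in the diagram can be sketched as a simple composition. This is an illustrative sketch only; the function and stage names (`answer_question`, `retrieve`, `trim`, `llm`, `validate`) are hypothetical, not the engine's actual internals:

```python
def answer_question(question, retrieve, trim, llm, validate):
    """Hypothetical composition of the pipeline stages in the diagram:
    retrieve -> budget-trim -> LLM -> citation validation."""
    chunks = retrieve(question)       # Retriever
    context = trim(chunks)            # Budget Trimmer (memory layers feed in here)
    draft = llm(question, context)    # LLM Chain
    return validate(draft, chunks)    # Citation Validator -> Answer
```

Each stage is a plain callable, so individual stages can be stubbed out in tests.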

Session Management

# Create a session
POST /chat/sessions
{
  "job_id": "abc123"
}
# Returns: { "session_id": "...", "job_id": "..." }

# Get session history
GET /chat/sessions/{session_id}
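The two session endpoints above can be called with a small stdlib-only client. This is a minimal sketch: the `ChatClient` name and the `http://localhost:8000` base URL are assumptions, not part of the API:

```python
import json
from urllib import request

class ChatClient:
    """Illustrative client for the session endpoints.
    Base URL is an assumption; override it for your deployment."""

    def __init__(self, base_url="http://localhost:8000", opener=None):
        self.base_url = base_url.rstrip("/")
        self._open = opener or request.urlopen  # injectable for testing

    def create_session(self, job_id):
        # POST /chat/sessions with {"job_id": ...}
        req = request.Request(
            self.base_url + "/chat/sessions",
            data=json.dumps({"job_id": job_id}).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with self._open(req) as resp:
            return json.load(resp)  # {"session_id": ..., "job_id": ...}

    def get_history(self, session_id):
        # GET /chat/sessions/{session_id}
        url = self.base_url + "/chat/sessions/" + session_id
        with self._open(request.Request(url)) as resp:
            return json.load(resp)
```

The injectable `opener` keeps the client testable without a running server.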

Ask a Question

POST /chat
{
  "session_id": "sess_xyz",
  "job_id": "abc123",
  "question": "What are the key findings?",
  "config": {
    "llm_provider": "openai",
    "llm_model": "gpt-5.3",
    "top_k": 5
  }
}
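A helper can assemble the request body above so callers don't hand-build nested dicts. The function name `build_chat_request` and its defaults (mirroring the example values) are illustrative:

```python
def build_chat_request(session_id, job_id, question,
                       llm_provider="openai", llm_model="gpt-5.3", top_k=5):
    """Assemble a /chat request body; defaults mirror the example above.
    This helper is a sketch, not part of the API surface."""
    return {
        "session_id": session_id,
        "job_id": job_id,
        "question": question,
        "config": {
            "llm_provider": llm_provider,
            "llm_model": llm_model,
            "top_k": top_k,
        },
    }
```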

Memory Layers

| Layer | Description | Scope |
| --- | --- | --- |
| Recent turns | Last N questions + answers | Short-term (trimmed to token budget) |
| Rolling summary | LLM-condensed conversation summary | Medium-term (grows with conversation) |
| Long-term facts | Extracted persistent facts | Long-term (across sessions) |
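One way the "trimmed to token budget" behavior for recent turns could work is keeping the summary and facts whole, then adding turns newest-first until the budget is exhausted. This is a hypothetical sketch, including the whitespace token counter; the real trimmer's policy and tokenizer may differ:

```python
def trim_to_budget(recent_turns, summary, facts, budget,
                   count_tokens=lambda s: len(s.split())):
    """Illustrative budget trimmer: summary and long-term facts are kept
    whole; recent turns are added newest-first until the budget runs out.
    count_tokens is a stand-in for a real tokenizer."""
    used = count_tokens(summary) + sum(count_tokens(f) for f in facts)
    kept = []
    for turn in reversed(recent_turns):  # newest first
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))  # restore chronological order
```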

Citation Validation

Every answer's cited_chunk_ids are validated against the retrieved set. Any ID not present in the retrieval results is stripped automatically, which prevents hallucinated citations:

# If LLM cites chunk-999 but only chunk-1, chunk-2 were retrieved:
# → chunk-999 is removed
# → if all citations stripped + no docs retrieved:
#   answer → "I don't have enough information..."
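The rule above amounts to a set intersection plus a fallback check. A minimal sketch (the function name, return shape, and exact fallback string are assumptions):

```python
FALLBACK = "I don't have enough information to answer that."

def validate_citations(answer_text, cited_chunk_ids, retrieved_ids):
    """Sketch of the citation validator: drop any cited ID not in the
    retrieved set; fall back when nothing was retrieved and no
    citation survives. Return shape (text, kept_ids) is illustrative."""
    retrieved = set(retrieved_ids)
    kept = [cid for cid in cited_chunk_ids if cid in retrieved]
    if not kept and not retrieved:
        return FALLBACK, kept
    return answer_text, kept
```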

LLM Providers

| Provider | Key |
| --- | --- |
| OpenAI | OPENAI_API_KEY |
| Google Gemini | GEMINI_API_KEY |
| Groq | GROQ_API_KEY |
| OpenRouter | OPENROUTER_API_KEY |
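A small preflight check can confirm the right key is set before sending a request. The mapping below is taken from the table above, but the lowercase provider strings used as `llm_provider` values are an assumption:

```python
import os

# Assumed llm_provider values mapped to the env vars from the table above.
PROVIDER_ENV_KEYS = {
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "groq": "GROQ_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}

def missing_key(provider):
    """Return the env var name if the provider's key is unset, else None."""
    var = PROVIDER_ENV_KEYS[provider]
    return None if os.environ.get(var) else var
```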