Chat Engine

The LongParser chat engine provides RAG-powered Q&A with a 3-layer memory system (recent turns, rolling summary, long-term facts).

Architecture

graph LR
    Q[Question] --> R[Retriever]
    R --> B[Budget Trimmer]
    B --> L[LLM Chain]
    L --> V[Citation Validator]
    V --> A[Answer]
    M1[Recent Turns] --> B
    M2[Rolling Summary] --> B
    M3[Long-term Facts] --> B
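The stages in the diagram can be sketched as a simple composition. This is an illustrative sketch only; the function and stage names (`answer_question`, `retrieve`, `trim`, `llm`, `validate`) are hypothetical, not the engine's actual internals:

```python
def answer_question(question, retrieve, trim, llm, validate):
    """Hypothetical composition of the pipeline stages in the diagram:
    retrieve -> budget-trim -> LLM -> citation validation."""
    chunks = retrieve(question)       # Retriever
    context = trim(chunks)            # Budget Trimmer (memory layers feed in here)
    draft = llm(question, context)    # LLM Chain
    return validate(draft, chunks)    # Citation Validator -> Answer
```

Each stage is a plain callable, so individual stages can be stubbed out in tests.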

Session Management

# Create a session
POST /chat/sessions
{
  "job_id": "abc123"
}
# Returns: { "session_id": "...", "job_id": "..." }

# Get session history
GET /chat/sessions/{session_id}
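The two session endpoints above can be called with a small stdlib-only client. This is a minimal sketch: the `ChatClient` name and the `http://localhost:8000` base URL are assumptions, not part of the API:

```python
import json
from urllib import request

class ChatClient:
    """Illustrative client for the session endpoints.
    Base URL is an assumption; override it for your deployment."""

    def __init__(self, base_url="http://localhost:8000", opener=None):
        self.base_url = base_url.rstrip("/")
        self._open = opener or request.urlopen  # injectable for testing

    def create_session(self, job_id):
        # POST /chat/sessions with {"job_id": ...}
        req = request.Request(
            self.base_url + "/chat/sessions",
            data=json.dumps({"job_id": job_id}).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with self._open(req) as resp:
            return json.load(resp)  # {"session_id": ..., "job_id": ...}

    def get_history(self, session_id):
        # GET /chat/sessions/{session_id}
        url = self.base_url + "/chat/sessions/" + session_id
        with self._open(request.Request(url)) as resp:
            return json.load(resp)
```

The injectable `opener` keeps the client testable without a running server.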

Ask a Question

POST /chat
{
  "session_id": "sess_xyz",
  "job_id": "abc123",
  "question": "What are the key findings?",
  "config": {
    "llm_provider": "openai",
    "llm_model": "gpt-5.3",
    "top_k": 5
  }
}
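A helper can assemble the request body above so callers don't hand-build nested dicts. The function name `build_chat_request` and its defaults (mirroring the example values) are illustrative:

```python
def build_chat_request(session_id, job_id, question,
                       llm_provider="openai", llm_model="gpt-5.3", top_k=5):
    """Assemble a /chat request body; defaults mirror the example above.
    This helper is a sketch, not part of the API surface."""
    return {
        "session_id": session_id,
        "job_id": job_id,
        "question": question,
        "config": {
            "llm_provider": llm_provider,
            "llm_model": llm_model,
            "top_k": top_k,
        },
    }
```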

Memory Layers

| Layer | Description | Scope |
| --- | --- | --- |
| Recent turns | Last N questions + answers | Short-term (trimmed to token budget) |
| Rolling summary | LLM-condensed conversation summary | Medium-term (grows with conversation) |
| Long-term facts | Extracted persistent facts | Long-term (across sessions) |
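One way the "trimmed to token budget" behavior for recent turns could work is keeping the summary and facts whole, then adding turns newest-first until the budget is exhausted. This is a hypothetical sketch, including the whitespace token counter; the real trimmer's policy and tokenizer may differ:

```python
def trim_to_budget(recent_turns, summary, facts, budget,
                   count_tokens=lambda s: len(s.split())):
    """Illustrative budget trimmer: summary and long-term facts are kept
    whole; recent turns are added newest-first until the budget runs out.
    count_tokens is a stand-in for a real tokenizer."""
    used = count_tokens(summary) + sum(count_tokens(f) for f in facts)
    kept = []
    for turn in reversed(recent_turns):  # newest first
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))  # restore chronological order
```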

Citation Validation

Every answer's cited_chunk_ids are validated against the retrieved set. Any ID not present in the retrieval results is stripped automatically, which prevents hallucinated citations:

# If LLM cites chunk-999 but only chunk-1, chunk-2 were retrieved:
# → chunk-999 is removed
# → if all citations stripped + no docs retrieved:
#   answer → "I don't have enough information..."
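The rule above amounts to a set intersection plus a fallback check. A minimal sketch (the function name, return shape, and exact fallback string are assumptions):

```python
FALLBACK = "I don't have enough information to answer that."

def validate_citations(answer_text, cited_chunk_ids, retrieved_ids):
    """Sketch of the citation validator: drop any cited ID not in the
    retrieved set; fall back when nothing was retrieved and no
    citation survives. Return shape (text, kept_ids) is illustrative."""
    retrieved = set(retrieved_ids)
    kept = [cid for cid in cited_chunk_ids if cid in retrieved]
    if not kept and not retrieved:
        return FALLBACK, kept
    return answer_text, kept
```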

LLM Providers

| Provider | Key |
| --- | --- |
| OpenAI | OPENAI_API_KEY |
| Google Gemini | GEMINI_API_KEY |
| Groq | GROQ_API_KEY |
| OpenRouter | OPENROUTER_API_KEY |
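A small preflight check can confirm the right key is set before sending a request. The mapping below is taken from the table above, but the lowercase provider strings used as `llm_provider` values are an assumption:

```python
import os

# Assumed llm_provider values mapped to the env vars from the table above.
PROVIDER_ENV_KEYS = {
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "groq": "GROQ_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}

def missing_key(provider):
    """Return the env var name if the provider's key is unset, else None."""
    var = PROVIDER_ENV_KEYS[provider]
    return None if os.environ.get(var) else var
```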