Configuration

LongParser is configured entirely through environment variables, typically kept in a .env file, so there are no other config files to manage.

Core Variables

Copy .env.example to .env and edit:

cp .env.example .env

Required Variables

Variable               Description
LONGPARSER_API_KEY     API key for the REST server
LONGPARSER_MONGO_URL   MongoDB connection string
OPENAI_API_KEY         For OpenAI LLM provider
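
A startup check for these variables could look like the following minimal sketch. The variable names come from the table above; the helper `missing_required` is hypothetical and not part of LongParser.

```python
import os

# Names taken from the Required Variables table.
REQUIRED_VARS = ("LONGPARSER_API_KEY", "LONGPARSER_MONGO_URL", "OPENAI_API_KEY")


def missing_required(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; pass a dict to check
    something other than the real process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
example_env = {"LONGPARSER_API_KEY": "secret", "OPENAI_API_KEY": ""}
print(missing_required(example_env))
# prints ['LONGPARSER_MONGO_URL', 'OPENAI_API_KEY']
```

Calling `missing_required()` with no argument checks the current process environment, which is useful right after loading the .env file.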

Processing Options

Variable                   Default     Description
LONGPARSER_FORMULA_MODE    smart       fast / smart / full
LONGPARSER_MAX_TOKENS      512         Max tokens per chunk
LONGPARSER_CHUNK_OVERLAP   64          Token overlap between chunks
LONGPARSER_UPLOAD_DIR      ./uploads   Upload directory
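
To illustrate how the defaults in this table apply, here is a minimal sketch that reads the four options and falls back to the documented defaults. `ProcessingOptions` and `load_processing_options` are hypothetical names, and the check that the overlap is smaller than the chunk size is an assumption rather than documented LongParser behavior.

```python
import os
from dataclasses import dataclass

FORMULA_MODES = ("fast", "smart", "full")


@dataclass(frozen=True)
class ProcessingOptions:
    """Hypothetical container; not a LongParser class."""
    formula_mode: str
    max_tokens: int
    chunk_overlap: int
    upload_dir: str


def load_processing_options(env=None):
    """Read the processing options, using the documented defaults."""
    env = os.environ if env is None else env

    mode = env.get("LONGPARSER_FORMULA_MODE", "smart")
    if mode not in FORMULA_MODES:
        raise ValueError(f"LONGPARSER_FORMULA_MODE must be one of {FORMULA_MODES}")

    max_tokens = int(env.get("LONGPARSER_MAX_TOKENS", "512"))
    overlap = int(env.get("LONGPARSER_CHUNK_OVERLAP", "64"))
    # Assumption: an overlap at or above the chunk size is treated as an error.
    if max_tokens <= 0 or not 0 <= overlap < max_tokens:
        raise ValueError("chunk overlap must be >= 0 and smaller than max tokens")

    upload_dir = env.get("LONGPARSER_UPLOAD_DIR", "./uploads")
    return ProcessingOptions(mode, max_tokens, overlap, upload_dir)
```

With an empty environment, `load_processing_options({})` returns the defaults listed above: `smart`, `512`, `64`, and `./uploads`.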

LLM Providers

Variable                  Description
LONGPARSER_LLM_PROVIDER   openai / gemini / groq / openrouter
LONGPARSER_LLM_MODEL      Model name (uses provider default if unset)
GEMINI_API_KEY            For Google Gemini
GROQ_API_KEY              For Groq
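
The link between the selected provider and its API key can be sketched as below. This is an illustration only: `resolve_llm_settings` is a hypothetical helper, and `OPENROUTER_API_KEY` is an assumed variable name, since this page does not say which variable holds the OpenRouter key. The page also gives no default provider, so the sketch requires one to be set.

```python
import os

PROVIDER_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "groq": "GROQ_API_KEY",
    # Assumption: this variable name is not documented on this page.
    "openrouter": "OPENROUTER_API_KEY",
}


def resolve_llm_settings(env=None):
    """Return the provider, model, and matching key variable.

    Hypothetical helper. A model of None means the provider
    default applies, as described in the table.
    """
    env = os.environ if env is None else env
    provider = env.get("LONGPARSER_LLM_PROVIDER")
    if provider not in PROVIDER_KEY_VARS:
        raise ValueError(
            f"LONGPARSER_LLM_PROVIDER must be one of {sorted(PROVIDER_KEY_VARS)}"
        )
    key_var = PROVIDER_KEY_VARS[provider]
    return {
        "provider": provider,
        "model": env.get("LONGPARSER_LLM_MODEL") or None,
        "key_var": key_var,
        "has_key": bool(env.get(key_var)),
    }
```

A check like this makes a missing key visible at startup instead of at the first LLM call.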

Vector Store

Variable                Default      Description
LONGPARSER_VECTOR_DB    chroma       chroma / faiss / qdrant
LONGPARSER_COLLECTION   longparser   Collection name
QDRANT_URL              (none)       Qdrant server URL (if using Qdrant)
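
The conditional requirement on QDRANT_URL can be expressed as a small check. The sketch below uses only the variable names and defaults from the table; `load_vector_store_settings` is a hypothetical helper, and treating a missing QDRANT_URL as an error is an assumption.

```python
import os

VECTOR_DBS = ("chroma", "faiss", "qdrant")


def load_vector_store_settings(env=None):
    """Read the vector store settings, using the documented defaults."""
    env = os.environ if env is None else env

    backend = env.get("LONGPARSER_VECTOR_DB", "chroma")
    if backend not in VECTOR_DBS:
        raise ValueError(f"LONGPARSER_VECTOR_DB must be one of {VECTOR_DBS}")

    settings = {
        "backend": backend,
        "collection": env.get("LONGPARSER_COLLECTION", "longparser"),
    }

    # QDRANT_URL only matters when Qdrant is the selected backend.
    if backend == "qdrant":
        url = env.get("QDRANT_URL")
        if not url:
            # Assumption: a missing URL is rejected rather than defaulted.
            raise ValueError("QDRANT_URL must be set when LONGPARSER_VECTOR_DB=qdrant")
        settings["url"] = url

    return settings
```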

ProcessingConfig Defaults

When using the Python SDK directly, configure via ProcessingConfig:

from longparser import ProcessingConfig

config = ProcessingConfig(
    do_ocr=True,
    do_table_structure=True,
    formula_mode="smart",       # fast | smart | full
    formula_ocr=True,
    export_images=False,
    max_pages=None,             # None = all pages
)

ChunkingConfig Defaults

from longparser.schemas import ChunkingConfig

config = ChunkingConfig(
    max_tokens=512,
    overlap_tokens=64,
    detect_equations=True,
    exclude_headers_footers=True,
    generate_schema_chunks=True,    # table schema chunks
    table_chunk_format="row_record", # pipe | row_record
    wide_table_col_threshold=15,
)
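
To see how max_tokens and overlap_tokens interact, the sketch below computes chunk boundaries for a plain sliding window. This is a generic illustration and not LongParser's chunking algorithm, which also accounts for equations, tables, and headers as the options above suggest. `chunk_spans` is a hypothetical function.

```python
def chunk_spans(n_tokens, max_tokens=512, overlap_tokens=64):
    """Return (start, end) token offsets for a simple sliding window.

    Each chunk holds at most max_tokens tokens and shares
    overlap_tokens tokens with the chunk before it.
    """
    if max_tokens <= 0 or not 0 <= overlap_tokens < max_tokens:
        raise ValueError("need max_tokens > 0 and 0 <= overlap_tokens < max_tokens")

    stride = max_tokens - overlap_tokens  # 512 - 64 = 448 with the defaults
    spans = []
    start = 0
    while start < n_tokens:
        end = min(start + max_tokens, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break  # the last chunk reached the end of the text
        start += stride
    return spans


print(chunk_spans(1000))
# prints [(0, 512), (448, 960), (896, 1000)]
```

With the defaults, each new chunk starts 448 tokens after the previous one, so a 1000-token document yields three chunks under this simple scheme.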