Chunkers Reference¶
HybridChunker¶
The main chunking engine combining 6 strategies for RAG-optimized output.
from longparser.chunkers import HybridChunker
from longparser.schemas import ChunkingConfig
chunker = HybridChunker(config=ChunkingConfig())
chunks = chunker.chunk(blocks)
Constructor¶
If config is None, uses default ChunkingConfig().
Methods¶
chunk(blocks)¶
Run the full 6-strategy pipeline on a list of Block objects.
Steps:
- Strategy 0 — Autonomous equation detection pre-pass
- Filter — Remove header/footer blocks and separator-only blocks
- Strategy 1 — Group blocks by
hierarchy_path(sections) - Per section:
- Strategy 4 — Table-aware chunking (schema + data chunks)
- Strategy 5 — List-aware chunking (group bullet lists)
- Strategy 3 — Token-window packing for remaining blocks
- Merge small chunks below
min_chunk_tokens - Overlap — Add token overlap between consecutive chunks
Configuration Reference¶
| Config | Default | Description |
|---|---|---|
max_tokens |
512 |
Hard ceiling per chunk |
overlap_tokens |
64 |
Overlap between chunks |
detect_equations |
True |
Run equation detection pre-pass |
exclude_headers_footers |
True |
Remove page headers/footers |
generate_schema_chunks |
True |
Add schema chunk per table |
table_chunk_format |
"row_record" |
pipe or row_record |
wide_table_col_threshold |
15 |
Split columns into bands above this |
min_chunk_tokens |
20 |
Merge chunks smaller than this |