Summarization Strategies
Flow-Like provides a shared summarization engine used by both the Summarize node (raw text) and the Summarize Document node (document pages). This guide explains the available strategies, when to use each, and how to tune them.
How It Works
Long text is split into chunks, processed by the chosen strategy, and optionally post-processed with Chain of Density compression. The pipeline:
Input Text/Pages
  │
  ▼
Chunking (Markdown-aware with overlap)
  │
  ▼
Entity Extraction (optional)
  │
  ▼
Strategy (Refine / MapReduce / Hierarchical / Hybrid / SlidingWindow)
  │
  ▼
Iteration (if output still exceeds chunk size)
  │
  ▼
Chain of Density (optional post-processing)
  │
  ▼
Final Summary
Strategy Comparison
| Strategy | Parallelism | Coherence | Structure-Aware | LLM Calls | Best For |
|---|---|---|---|---|---|
| Refine | None | ★★★★★ | No | N | Narratives, meeting notes |
| MapReduce | Full | ★★★ | No | N + reduce | Speed-critical, large docs |
| Hierarchical | Partial | ★★★★ | Yes | Sections + merges | Reports, papers with headings |
| Hybrid | Map phase | ★★★★ | No | N + K refine | Balance of speed & quality |
| SlidingWindow | None | ★★★★ (recent) | No | N + compressions | Very long docs (100+ pages) |
Refine
Processes chunks sequentially. Each step receives the accumulated summary so far plus the next chunk, producing a rolling summary.
Pros:
- Best narrative coherence — the model always sees prior context
- Simple, predictable behavior
- Works well with small models
Cons:
- No parallelism — wall-clock time scales linearly with chunk count
- Later chunks may be under-represented as the summary grows
- Errors compound: a bad intermediate summary is carried into every later step
When to use: Documents where order matters (chronological reports, meeting transcripts, legal documents).
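The rolling behavior described above can be sketched in a few lines. This is a minimal illustration, not Flow-Like's implementation: the `LLM` callable type, the function name `refine_summarize`, and the prompt wording are all assumptions made for the example.

```python
from typing import Callable, List

# Hypothetical interface: a function that takes a prompt and returns text.
LLM = Callable[[str], str]


def refine_summarize(chunks: List[str], llm: LLM) -> str:
    """Fold chunks left to right into a single rolling summary."""
    summary = ""
    for chunk in chunks:
        if summary:
            # Later steps see the accumulated summary plus the next chunk.
            prompt = (
                f"Current summary:\n{summary}\n\n"
                f"New text:\n{chunk}\n\n"
                "Update the summary to include the new text."
            )
        else:
            # The first chunk has no prior context.
            prompt = f"Summarize:\n{chunk}"
        summary = llm(prompt)  # one call per chunk, strictly sequential
    return summary
```

Because each call depends on the previous result, the loop cannot be parallelized, which is where the linear wall-clock cost comes from.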
MapReduce
Summarizes each chunk independently in parallel, then recursively merges the partial summaries.
Pros:
- Fastest strategy — map phase is fully parallelizable
- Scales well with concurrency setting
- Each chunk gets equal attention
Cons:
- Chunks are isolated during mapping — information spanning boundaries is lost
- Reduce phase may distort relative importance
- Requires more total LLM calls than Refine
When to use: Large documents with uniformly important content, speed-critical pipelines, high-concurrency environments.
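A rough sketch of the map and reduce phases follows. It assumes a hypothetical `LLM` callable and an illustrative `fan_in` parameter (how many partial summaries are merged per reduce call); the engine's actual grouping rule is not documented here and may differ.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

LLM = Callable[[str], str]  # hypothetical: prompt in, completion out


def map_reduce_summarize(chunks: List[str], llm: LLM,
                         workers: int = 4, fan_in: int = 4) -> str:
    """Summarize chunks in parallel, then merge partials level by level."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 so each level shrinks")
    if not chunks:
        return ""

    # Map: every chunk is summarized in isolation, so calls can overlap.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: llm(f"Summarize:\n{c}"), chunks))

    # Reduce: merge groups of `fan_in` partials until one summary remains.
    while len(partials) > 1:
        merged = []
        for i in range(0, len(partials), fan_in):
            group = partials[i:i + fan_in]
            if len(group) == 1:
                merged.append(group[0])  # lone leftover is carried up unchanged
            else:
                merged.append(
                    llm("Merge these summaries:\n" + "\n\n".join(group))
                )
        partials = merged
    return partials[0]
```

The map calls never see each other's input, which is why information that spans a chunk boundary can be lost unless overlap or entity tracking compensates for it.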
Hierarchical
Detects document structure (markdown headings) and builds a summary tree. Sections are summarized independently, then merged level by level.
Pros:
- Respects the author’s original organization
- Produces section-level summaries as a byproduct
- Natural fit for technical documentation
Cons:
- Requires detectable headings — falls back to balanced tree if none found
- Deep hierarchies multiply LLM calls
- Less effective for unstructured text
When to use: Technical reports, academic papers, documentation with clear heading structure.
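One way to picture the summary tree is the sketch below, which parses ATX-style markdown headings into nested sections and summarizes bottom-up. The `Section` class, both function names, and the prompt are illustrative assumptions. The sketch does not implement the balanced-tree fallback and would misread `#` lines inside code fences as headings.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List

LLM = Callable[[str], str]  # hypothetical: prompt in, completion out
HEADING = re.compile(r"^(#{1,6})\s+(.+)$")


@dataclass
class Section:
    title: str
    level: int
    body: List[str] = field(default_factory=list)
    children: List["Section"] = field(default_factory=list)


def parse_sections(markdown: str) -> Section:
    """Build a heading tree; level 0 is a synthetic document root."""
    root = Section(title="(document)", level=0)
    stack = [root]
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            node = Section(title=match.group(2).strip(),
                           level=len(match.group(1)))
            # Pop until the top of the stack is a shallower heading.
            while stack[-1].level >= node.level:
                stack.pop()
            stack[-1].children.append(node)
            stack.append(node)
        else:
            stack[-1].body.append(line)
    return root


def summarize_tree(node: Section, llm: LLM) -> str:
    """Summarize children first, then merge them with the node's own text."""
    parts = [s for s in (summarize_tree(c, llm) for c in node.children) if s]
    text = "\n".join(node.body).strip()
    if text:
        parts.insert(0, text)
    if not parts:
        return ""
    return llm(f"Summarize the section '{node.title}':\n" + "\n\n".join(parts))
```

Each heading level adds one more round of merge calls, which is the cost behind the "deep hierarchies multiply LLM calls" caveat.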
Hybrid
Combines MapReduce (parallel map phase) with Refine (sequential polish over the map outputs).
Pros:
- Captures MapReduce speed for initial processing
- Refine pass restores narrative coherence
- Good trade-off for most real-world documents
Cons:
- Highest total LLM cost (map + refine)
- More complex failure modes
- Diminishing returns on very short documents
When to use: Large documents where you want both speed and a coherent final output.
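As a sketch of how the two phases might compose, the example below runs a parallel map and then refines over the map outputs one at a time. The function name, prompts, and `LLM` type are assumptions, and refining over every partial summary is only one possible choice; the comparison table's "N + K refine" suggests the engine may group partials differently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

LLM = Callable[[str], str]  # hypothetical: prompt in, completion out


def hybrid_summarize(chunks: List[str], llm: LLM, workers: int = 4) -> str:
    """Parallel map over chunks, then a sequential refine over the partials."""
    if not chunks:
        return ""

    # Map phase: independent per-chunk summaries, order preserved by map().
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: llm(f"Summarize:\n{c}"), chunks))

    # Refine phase: fold the (much shorter) partials into one narrative.
    summary = partials[0]
    for partial in partials[1:]:
        summary = llm(
            f"Current summary:\n{summary}\n\n"
            f"Next partial summary:\n{partial}\n\n"
            "Rewrite into one coherent summary."
        )
    return summary
```

The refine pass is still sequential, but it operates on short partial summaries rather than raw chunks, so each step is cheaper than in a pure Refine run.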
SlidingWindow
Maintains a fixed-size memory buffer that is compressed whenever it exceeds a budget. Each new chunk is integrated into the buffer, and a final synthesis pass produces the output.
Pros:
- Constant memory usage regardless of document length
- Handles arbitrarily long documents
- Good for streaming/real-time scenarios
Cons:
- Early content may be aggressively compressed and lose detail
- Recent chunks are over-represented in the final output
- Memory budget tuning affects quality significantly
When to use: Very long documents (100+ pages), streaming ingestion, memory-constrained environments.
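The buffer-and-compress loop can be sketched as follows. The `budget_chars` parameter, the prompts, and the choice to summarize each chunk before appending it are assumptions for illustration, not the engine's documented behavior.

```python
from typing import Callable, Iterable

LLM = Callable[[str], str]  # hypothetical: prompt in, completion out


def sliding_window_summarize(chunks: Iterable[str], llm: LLM,
                             budget_chars: int = 4000) -> str:
    """Keep a bounded memory of notes, compressing when it grows too large."""
    memory = ""
    for chunk in chunks:
        # Integrate the new chunk as a short note.
        note = llm(f"Summarize this passage:\n{chunk}")
        memory = f"{memory}\n\n{note}" if memory else note
        # Compress only when the buffer exceeds its budget.
        if len(memory) > budget_chars:
            memory = llm(
                f"Compress these notes to under {budget_chars} characters:\n"
                f"{memory}"
            )
    if not memory:
        return ""
    # Final synthesis pass over whatever the buffer holds.
    return llm(f"Synthesize a final summary from these notes:\n{memory}")
```

Since `chunks` is consumed as an iterable, the same loop works for streamed input. Early notes pass through every compression, while recent ones may pass through none, which explains the recency bias listed above.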
Chain of Density Post-Processing
After the main strategy produces a summary, Chain of Density (CoD) iteratively refines it to increase information density while maintaining the same length. Each step identifies 1–3 missing entities and rewrites the summary to include them.
Based on research by Adams et al. (2023), step 3 produces summaries closest to human preference:
| Step | Information Density | Readability | Recommendation |
|---|---|---|---|
| 1 | Low | Very easy | Casual overviews |
| 2 | Medium | Easy | General audiences |
| 3 | Optimal | Balanced | Default — most use cases |
| 4 | High | Moderate | Technical audiences |
| 5 | Very high | Dense | Maximum compression |
Tip: Skip CoD for small models — it requires nuanced revision ability that sub-14B models often lack.
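A minimal sketch of the densification loop is shown below. The function name, the `steps` parameter, and the prompt wording are assumptions; the prompt paraphrases the idea of adding 1–3 missing entities without growing the summary and is not the prompt the engine uses.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: prompt in, completion out


def chain_of_density(summary: str, source: str, llm: LLM,
                     steps: int = 3) -> str:
    """Rewrite a summary `steps` times, adding missing entities each time."""
    for _ in range(steps):
        summary = llm(
            "Identify 1-3 informative entities from the source that are "
            "missing from the summary, then rewrite the summary to include "
            "them without making it longer.\n\n"
            f"Source:\n{source}\n\nSummary:\n{summary}"
        )
    return summary
```

Each step costs one extra LLM call, so the default of three steps adds three calls on top of the main strategy.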
Entity Tracking
When enabled, the engine extracts named entities (people, organizations, dates, technical terms) from a sample of chunks before summarization begins. These entities are injected as context into every LLM call.
Impact:
- Adds 2–3 extra LLM calls for extraction
- Significantly improves factual preservation, especially with MapReduce
- Extracted entities are available in the output for downstream use
Best paired with: MapReduce and Hybrid (where chunks are processed independently and most likely to lose cross-chunk entities).
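The extract-then-inject flow might look like the sketch below. The sampling rule (evenly spaced chunks), the one-entity-per-line reply format, and both function names are assumptions chosen to keep the example small.

```python
from typing import Callable, List

LLM = Callable[[str], str]  # hypothetical: prompt in, completion out


def extract_entities(chunks: List[str], llm: LLM,
                     sample_size: int = 3) -> List[str]:
    """Ask the model for entities in a few evenly spaced sample chunks."""
    if not chunks:
        return []
    step = max(1, len(chunks) // sample_size)
    sample = chunks[::step][:sample_size]
    seen = {}  # lowercase key -> first spelling seen, insertion-ordered
    for chunk in sample:
        reply = llm("List the named entities in this text, one per line:\n"
                    + chunk)
        for line in reply.splitlines():
            name = line.strip("-• \t")
            if name:
                seen.setdefault(name.lower(), name)
    return list(seen.values())


def with_entities(prompt: str, entities: List[str]) -> str:
    """Prepend the entity list so every call sees the same context."""
    if not entities:
        return prompt
    header = "Preserve these entities where relevant: " + ", ".join(entities)
    return header + "\n\n" + prompt
```

With `sample_size=3` the extraction costs at most three calls, in line with the 2–3 extra calls noted above.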
Configuration Guide
Chunk Size & Overlap
| Setting | Default | Description |
|---|---|---|
| Chunk Size | 8000 chars | Maximum characters per chunk. Reduce for models with smaller context windows. |
| Chunk Overlap | 10% | Percentage of overlap between adjacent chunks (0–50%). Higher values prevent boundary information loss but increase total chunks. |
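Rule of thumb: Context windows are measured in tokens, so convert to characters first (roughly 4 characters per token is a common approximation for English text), then set chunk size to about 60–70% of that figure to leave room for the system prompt and output.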
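To make the interaction between the two settings concrete, here is a simplified character-based chunker using the defaults from the table. It is an assumption-laden sketch: the function name is hypothetical, and unlike the engine's chunker it is not Markdown-aware and will cut mid-sentence.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 8000,
               overlap_pct: int = 10) -> List[str]:
    """Split text into fixed-size windows that overlap by a percentage."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_pct <= 50:
        raise ValueError("overlap_pct must be between 0 and 50")

    overlap = chunk_size * overlap_pct // 100
    step = chunk_size - overlap  # how far each window advances

    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks
```

With the defaults, each window advances 7,200 characters, so a 20,000-character input yields three chunks instead of the two and a half that a zero-overlap split would suggest. Raising the overlap shrinks the step and increases the chunk count accordingly.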
Concurrency
Controls parallel requests for MapReduce and Hybrid strategies.
| Value | Behavior |
|---|---|
| 0 | Unlimited — all chunks processed at once |
| 1 | Sequential (equivalent to disabling parallelism) |
| 4 | Default — good balance for most API rate limits |
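The three values in the table could map onto a worker pool roughly as follows. This is a sketch with a hypothetical `run_map_phase` helper built on Python's thread pool, not the engine's scheduler.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

LLM = Callable[[str], str]  # hypothetical: prompt in, completion out


def run_map_phase(chunks: List[str], llm: LLM,
                  concurrency: int = 4) -> List[str]:
    """Apply `llm` to every chunk, honoring the concurrency setting."""
    if concurrency < 0:
        raise ValueError("concurrency must be 0 (unlimited) or positive")
    if not chunks:
        return []
    if concurrency == 1:
        # Sequential: no pool needed.
        return [llm(chunk) for chunk in chunks]
    # 0 means unlimited, so start one worker per chunk.
    workers = len(chunks) if concurrency == 0 else min(concurrency, len(chunks))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(llm, chunks))  # results keep chunk order
```

A setting of 0 sends every request at once, which is fastest on paper but the most likely to hit provider rate limits on large documents.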
Model Selection
- Large models (GPT-4o, Claude Sonnet, Gemini Pro): All strategies work well. Use MapReduce/Hybrid for speed.
- Medium models (GPT-4o-mini, Llama 70B): Refine or Hybrid recommended. CoD works at step 2–3.
- Small models (7B–13B): Prefer Refine or SlidingWindow. Skip CoD and entity tracking.
- No function calling required: All strategies use standard text completion — no tool use or structured output needed.
Quick Decision Tree
Is the document short (fits in one chunk)? → Any strategy works, skip CoD
Need maximum speed? → MapReduce (concurrency = 0)
Need best coherence? → Refine
Document has clear headings/sections? → Hierarchical
Very long document (100+ pages)? → SlidingWindow
Want both speed and quality? → Hybrid
Want maximum information density? → Any strategy + ChainOfDensity (step 3)
Working with small models? → Refine, skip CoD & entity tracking
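The questions above can also be encoded as a small helper. The guide does not specify a precedence between the questions, so the ordering below is an assumption, as are the `Plan` type, the argument names, and the fallback to Hybrid when no question applies.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    strategy: str
    cod_steps: int         # 0 means Chain of Density is skipped
    entity_tracking: bool


def choose_strategy(*, fits_one_chunk: bool = False, small_model: bool = False,
                    need_speed: bool = False, need_coherence: bool = False,
                    has_headings: bool = False, very_long: bool = False,
                    want_density: bool = False) -> Plan:
    """Pick a strategy; the precedence of the checks is an assumption."""
    if small_model:
        return Plan("Refine", cod_steps=0, entity_tracking=False)
    if fits_one_chunk:
        # Any strategy works for a single chunk; Refine is the simplest.
        return Plan("Refine", cod_steps=0, entity_tracking=False)

    if need_speed and need_coherence:
        strategy = "Hybrid"
    elif need_speed:
        strategy = "MapReduce"
    elif need_coherence:
        strategy = "Refine"
    elif has_headings:
        strategy = "Hierarchical"
    elif very_long:
        strategy = "SlidingWindow"
    else:
        strategy = "Hybrid"  # assumed default: the general-purpose trade-off

    # Entity tracking pairs best with strategies that isolate chunks.
    return Plan(strategy,
                cod_steps=3 if want_density else 0,
                entity_tracking=strategy in ("MapReduce", "Hybrid"))
```

For example, `choose_strategy(has_headings=True, want_density=True)` selects Hierarchical with three CoD steps and entity tracking off.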