Summarization Strategies

Flow-Like provides a shared summarization engine used by both the Summarize node (raw text) and the Summarize Document node (document pages). This guide explains the available strategies, when to use each, and how to tune them.

Long text is split into chunks, processed by the chosen strategy, and optionally post-processed with Chain of Density compression. The pipeline:

  1. Input Text/Pages
  2. Chunking (Markdown-aware with overlap)
  3. Entity Extraction (optional)
  4. Strategy (Refine / MapReduce / Hierarchical / Hybrid / SlidingWindow)
  5. Iteration (if output still exceeds chunk size)
  6. Chain of Density (optional post-processing)
  7. Final Summary
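
The iteration step in the pipeline can be pictured as a loop that re-runs the chosen strategy on its own output until the result fits in one chunk. The sketch below is illustrative only: `summarize_until_fits`, the `strategy` callable, and the `max_rounds` safety cap are hypothetical names and are not taken from the Flow-Like engine.

```python
from typing import Callable


def summarize_until_fits(
    text: str,
    strategy: Callable[[str], str],
    chunk_size: int = 8000,
    max_rounds: int = 5,  # assumed safety cap, not a documented setting
) -> str:
    """Re-apply `strategy` while the summary is still longer than one chunk."""
    summary = strategy(text)
    rounds = 1
    while len(summary) > chunk_size and rounds < max_rounds:
        # The previous summary becomes the input of the next round.
        summary = strategy(summary)
        rounds += 1
    return summary
```

The cap matters because a model is not guaranteed to shorten its input on every round.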

| Strategy | Parallelism | Coherence | Structure-Aware | LLM Calls | Best For |
|---|---|---|---|---|---|
| Refine | None | ★★★★★ | No | N | Narratives, meeting notes |
| MapReduce | Full | ★★★ | No | N + reduce | Speed-critical, large docs |
| Hierarchical | Partial | ★★★★ | Yes | Sections + merges | Reports, papers with headings |
| Hybrid | Map phase | ★★★★ | No | N + K refine | Balance of speed & quality |
| SlidingWindow | None | ★★★★ (recent) | No | N + compressions | Very long docs (100+ pages) |

Refine processes chunks sequentially. Each step receives the accumulated summary so far plus the next chunk and produces an updated rolling summary.
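
A minimal sketch of this rolling update, assuming a hypothetical `llm` callable that takes a prompt string and returns the model's text. The prompt wording and the function name are illustrative and are not the engine's own.

```python
from typing import Callable


def refine_summarize(chunks: list[str], llm: Callable[[str], str]) -> str:
    """Fold chunks into one rolling summary, one LLM call per chunk."""
    summary = ""
    for chunk in chunks:
        prompt = (
            "Update the running summary with the new text.\n\n"
            f"Running summary:\n{summary or '(empty)'}\n\n"
            f"New text:\n{chunk}"
        )
        # Each call sees everything summarized so far plus exactly one new chunk.
        summary = llm(prompt)
    return summary
```

Because every call depends on the previous result, the loop cannot be parallelized, which is where the N sequential calls come from.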

Pros:

  • Best narrative coherence — the model always sees prior context
  • Simple, predictable behavior
  • Works well with small models

Cons:

  • No parallelism — wall-clock time scales linearly with chunk count
  • Later chunks may be under-represented as the summary grows
  • Single point of failure: one bad intermediate summary carries forward into every later step

When to use: Documents where order matters (chronological reports, meeting transcripts, legal documents).

MapReduce summarizes each chunk independently and in parallel, then recursively merges the partial summaries.
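
The map and recursive reduce phases could look roughly like the sketch below. It assumes a hypothetical, thread-safe `llm` callable; the `fan_in` parameter (how many partial summaries are merged per reduce call) is an assumption made for illustration and is not a documented setting.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def map_reduce_summarize(
    chunks: list[str],
    llm: Callable[[str], str],
    max_workers: int = 4,
    fan_in: int = 4,  # assumed merge width
) -> str:
    """Summarize chunks in parallel, then merge partials until one remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not chunks:
        return ""
    # Map phase: chunks are independent, so they can run concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(lambda c: llm(f"Summarize:\n{c}"), chunks))
    # Reduce phase: merge groups of partial summaries, level by level.
    while len(partials) > 1:
        groups = [partials[i:i + fan_in] for i in range(0, len(partials), fan_in)]
        partials = [
            g[0] if len(g) == 1
            else llm("Merge these summaries:\n" + "\n---\n".join(g))
            for g in groups
        ]
    return partials[0]
```

With 5 chunks and a merge width of 4, this sketch makes 5 map calls and 2 merge calls.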

Pros:

  • Fastest strategy — map phase is fully parallelizable
  • Scales well with concurrency setting
  • Each chunk gets equal attention

Cons:

  • Chunks are isolated during mapping — information spanning boundaries is lost
  • Reduce phase may distort relative importance
  • Requires more total LLM calls than Refine

When to use: Large documents with uniformly important content, speed-critical pipelines, high-concurrency environments.

Hierarchical detects document structure (Markdown headings) and builds a summary tree. Sections are summarized independently, then merged level by level.
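
As a rough illustration, the heading tree and the bottom-up merge might be sketched as follows. The `Section` type, both function names, and the `llm` callable are hypothetical. The sketch only recognizes `#`-style headings, does not skip `#` lines inside code fences, and omits the balanced-tree fallback.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

HEADING = re.compile(r"^(#{1,6})\s+(.+)$")


@dataclass
class Section:
    title: str
    level: int
    body: list[str] = field(default_factory=list)
    children: list["Section"] = field(default_factory=list)


def parse_sections(markdown: str) -> Section:
    """Build a tree of sections from '#'-style headings."""
    root = Section(title="(document)", level=0)
    stack = [root]
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            # Close every open section at the same or a deeper level.
            while stack[-1].level >= level:
                stack.pop()
            node = Section(title=match.group(2).strip(), level=level)
            stack[-1].children.append(node)
            stack.append(node)
        else:
            stack[-1].body.append(line)
    return root


def summarize_tree(node: Section, llm: Callable[[str], str]) -> str:
    """Summarize children first, then merge them with the section's own text."""
    child_summaries = [s for s in (summarize_tree(c, llm) for c in node.children) if s]
    own_text = "\n".join(node.body).strip()
    parts = ([own_text] if own_text else []) + child_summaries
    if not parts:
        return ""
    if not own_text and len(parts) == 1:
        return parts[0]  # a lone child summary needs no extra merge call
    return llm(f"Summarize section '{node.title}':\n" + "\n\n".join(parts))
```

Each section costs at most one call, so deeper heading hierarchies translate directly into more calls.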

Pros:

  • Respects the author’s original organization
  • Produces section-level summaries as a byproduct
  • Natural fit for technical documentation

Cons:

  • Requires detectable headings; falls back to a balanced tree if none are found
  • Deep hierarchies multiply LLM calls
  • Less effective for unstructured text

When to use: Technical reports, academic papers, documentation with clear heading structure.

Hybrid combines the parallel map phase of MapReduce with a sequential Refine pass that polishes the map outputs.
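
One plausible reading of this combination is sketched below: N parallel map calls, followed by K sequential refine calls over batches of the map outputs. The `batch_size` parameter, the prompt wording, and the `llm` callable are assumptions made for illustration; the engine may group map outputs differently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def hybrid_summarize(
    chunks: list[str],
    llm: Callable[[str], str],
    max_workers: int = 4,
    batch_size: int = 3,  # assumed number of map outputs folded in per refine call
) -> str:
    """Parallel map over chunks, then a sequential refine over the partials."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Map phase (parallel): one independent summary per chunk.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(lambda c: llm(f"Summarize:\n{c}"), chunks))
    # Refine phase (sequential): fold the partials into a rolling summary in order.
    summary = ""
    for i in range(0, len(partials), batch_size):
        batch = "\n".join(partials[i:i + batch_size])
        summary = llm(
            f"Running summary:\n{summary or '(empty)'}\n\n"
            f"Fold in these partial summaries, in order:\n{batch}"
        )
    return summary
```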

Pros:

  • Captures MapReduce speed for initial processing
  • Refine pass restores narrative coherence
  • Good trade-off for most real-world documents

Cons:

  • Highest total LLM cost (map + refine)
  • More complex failure modes
  • Diminishing returns on very short documents

When to use: Large documents where you want both speed and a coherent final output.

SlidingWindow maintains a bounded memory buffer that is compressed whenever it exceeds its budget. Each new chunk is integrated into the buffer, and a final synthesis pass produces the output.
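
The integrate, compress, and synthesize steps can be sketched as below. The character-based `budget_chars`, the prompt wording, and the `llm` callable are hypothetical stand-ins; the engine's actual budget unit and prompts are not specified in this guide.

```python
from typing import Callable


def sliding_window_summarize(
    chunks: list[str],
    llm: Callable[[str], str],
    budget_chars: int = 2000,  # assumed budget unit: characters
) -> str:
    """Keep a bounded notes buffer, compressing it whenever it outgrows the budget."""
    memory = ""
    for chunk in chunks:
        # Integrate the next chunk into the buffer.
        memory = llm(
            "Integrate the new text into the notes.\n\n"
            f"Notes:\n{memory}\n\nNew text:\n{chunk}"
        )
        # Compress only when the buffer exceeds the budget.
        if len(memory) > budget_chars:
            memory = llm(
                f"Compress these notes to under {budget_chars} characters:\n{memory}"
            )
    # Final synthesis pass over whatever the buffer holds at the end.
    return llm(f"Write the final summary from these notes:\n{memory}")
```

Early content passes through more compression rounds than recent content, which is the source of the recency bias listed under Cons. A model may not respect the requested length, so a real implementation would likely need a retry or truncation step.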

Pros:

  • Constant memory usage regardless of document length
  • Handles arbitrarily long documents
  • Good for streaming/real-time scenarios

Cons:

  • Early content may be aggressively compressed and lose detail
  • Recent chunks are over-represented in the final output
  • Memory budget tuning affects quality significantly

When to use: Very long documents (100+ pages), streaming ingestion, memory-constrained environments.

After the main strategy produces a summary, Chain of Density (CoD) iteratively refines it to increase information density while maintaining the same length. Each step identifies 1–3 missing entities and rewrites the summary to include them.
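
A minimal sketch of the densification loop, assuming a hypothetical `llm` callable and assuming the source text (or a representative excerpt) is available to each step. The prompt wording is illustrative and is not the prompt from the paper or the engine.

```python
from typing import Callable


def chain_of_density(
    source: str,
    summary: str,
    llm: Callable[[str], str],
    steps: int = 3,
) -> str:
    """Rewrite the summary `steps` times, adding missing entities at equal length."""
    for _ in range(steps):
        prompt = (
            "Identify 1-3 informative entities from the source that are missing "
            "from the summary, then rewrite the summary to include them without "
            "making it longer.\n\n"
            f"Source:\n{source}\n\n"
            f"Current summary:\n{summary}\n\n"
            "Return only the rewritten summary."
        )
        # Each step starts from the previous step's denser summary.
        summary = llm(prompt)
    return summary
```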

In the research by Adams et al. (2023), step 3 produced the summaries closest to human preference:

| Step | Information Density | Readability | Recommendation |
|---|---|---|---|
| 1 | Low | Very easy | Casual overviews |
| 2 | Medium | Easy | General audiences |
| 3 | Optimal | Balanced | Default (most use cases) |
| 4 | High | Moderate | Technical audiences |
| 5 | Very high | Dense | Maximum compression |

Tip: Skip CoD for small models — it requires nuanced revision ability that sub-14B models often lack.

When entity tracking is enabled, the engine extracts named entities (people, organizations, dates, technical terms) from a sample of chunks before summarization begins. These entities are injected as context into every LLM call.

Impact:

  • Adds 2–3 extra LLM calls for extraction
  • Significantly improves factual preservation, especially with MapReduce
  • Extracted entities are available in the output for downstream use

Best paired with: MapReduce and Hybrid (where chunks are processed independently and most likely to lose cross-chunk entities).
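
The sample, extract, and inject flow described in this section might look like the sketch below. The sampling rule (up to three evenly strided chunks), the one-entity-per-line reply format, and both function names are assumptions made for illustration.

```python
from typing import Callable


def extract_entities(
    chunks: list[str],
    llm: Callable[[str], str],
    max_samples: int = 3,  # assumed sample size, matching the 2-3 extra calls
) -> list[str]:
    """Ask the model for entities in a few sampled chunks and deduplicate them."""
    if max_samples < 1:
        raise ValueError("max_samples must be at least 1")
    stride = max(1, len(chunks) // max_samples)
    sample = chunks[::stride][:max_samples]
    seen: dict[str, str] = {}
    for chunk in sample:
        reply = llm("List the named entities in this text, one per line:\n" + chunk)
        for line in reply.splitlines():
            name = line.strip(" -•\t")
            if name:
                # Case-insensitive dedupe that keeps the first spelling seen.
                seen.setdefault(name.lower(), name)
    return list(seen.values())


def with_entities(prompt: str, entities: list[str]) -> str:
    """Prefix a summarization prompt with the entities to preserve."""
    if not entities:
        return prompt
    return "Key entities to preserve: " + ", ".join(entities) + "\n\n" + prompt
```

Every map, refine, or merge prompt would then be passed through `with_entities` before being sent, so that isolated chunks still see names introduced elsewhere in the document.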

| Setting | Default | Description |
|---|---|---|
| Chunk Size | 8000 chars | Maximum characters per chunk. Reduce for models with smaller context windows. |
| Chunk Overlap | 10% | Percentage of overlap between adjacent chunks (0–50%). Higher values prevent boundary information loss but increase total chunks. |

Rule of thumb: Set chunk size to roughly 60–70% of your model’s context window, expressed in characters rather than tokens, to leave room for the system prompt and output.
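
To make the two settings and the rule of thumb concrete, here is a simplified sketch. It splits on raw character offsets and is not Markdown-aware like the engine's chunker. The figure of about 4 characters per token is a common approximation for English text and is an assumption here, as are both function names.

```python
def chunk_text(text: str, chunk_size: int = 8000, overlap_pct: float = 0.10) -> list[str]:
    """Split text into character windows that overlap by `overlap_pct`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0.0 <= overlap_pct <= 0.5:
        raise ValueError("overlap_pct must be between 0 and 0.5")
    overlap = int(chunk_size * overlap_pct)
    step = chunk_size - overlap  # always positive because overlap is at most 50%
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def chunk_size_for_context(
    context_tokens: int,
    chars_per_token: float = 4.0,  # rough approximation for English text
    fraction: float = 0.65,        # middle of the 60-70% rule of thumb
) -> int:
    """Convert a context window in tokens into a chunk size in characters."""
    return int(context_tokens * chars_per_token * fraction)
```

Raising the overlap shrinks the step between windows, which is why higher overlap values produce more chunks for the same text.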

The concurrency setting controls how many requests run in parallel for the MapReduce and Hybrid strategies.

| Value | Behavior |
|---|---|
| 0 | Unlimited: all chunks processed at once |
| 1 | Sequential (equivalent to disabling parallelism) |
| 4 | Default: good balance for most API rate limits |
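
The semantics of these values can be illustrated with a semaphore-bounded map. This is a sketch of the general technique, not the engine's implementation; `map_with_concurrency` and the async `summarize` callable are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable


async def map_with_concurrency(
    chunks: list[str],
    summarize: Callable[[str], Awaitable[str]],
    concurrency: int = 4,
) -> list[str]:
    """Run `summarize` over all chunks; 0 means unlimited, 1 means sequential."""
    if concurrency < 0:
        raise ValueError("concurrency must be 0 or greater")
    semaphore = asyncio.Semaphore(concurrency) if concurrency > 0 else None

    async def run(chunk: str) -> str:
        if semaphore is None:
            return await summarize(chunk)
        # At most `concurrency` calls are in flight at any moment.
        async with semaphore:
            return await summarize(chunk)

    # gather returns results in input order regardless of completion order.
    return list(await asyncio.gather(*(run(c) for c in chunks)))
```

A value of 0 sends every request at once, which is fastest on paper but the most likely to hit provider rate limits.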

Model recommendations:

  • Large models (GPT-4o, Claude Sonnet, Gemini Pro): All strategies work well. Use MapReduce/Hybrid for speed.
  • Medium models (GPT-4o-mini, Llama 70B): Refine or Hybrid recommended. CoD works at steps 2–3.
  • Small models (7B–13B): Prefer Refine or SlidingWindow. Skip CoD and entity tracking.
  • No function calling required: All strategies use standard text completion, so no tool use or structured output is needed.

Is the document short (fits in one chunk)? → Any strategy works, skip CoD
Need maximum speed? → MapReduce (concurrency = 0)
Need best coherence? → Refine
Document has clear headings/sections? → Hierarchical
Very long document (100+ pages)? → SlidingWindow
Want both speed and quality? → Hybrid
Want maximum information density? → Any strategy + ChainOfDensity (step 3)
Working with small models? → Refine, skip CoD & entity tracking
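
The questions above can be encoded as a small helper. The guide does not define a priority order between the questions, so the first-match ordering below is an assumption, as are the `Plan` type and all parameter names. For short documents the guide says any strategy works; Refine is an arbitrary pick here.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    strategy: str
    concurrency: int = 4  # documented default
    cod_steps: int = 0    # 0 means Chain of Density is skipped


def choose_plan(
    fits_in_one_chunk: bool = False,
    small_model: bool = False,
    need_max_speed: bool = False,
    need_best_coherence: bool = False,
    has_headings: bool = False,
    very_long: bool = False,
    want_max_density: bool = False,
) -> Plan:
    """Map the decision questions to a strategy, checked in an assumed order."""
    if small_model:
        return Plan("Refine")  # skip CoD (and entity tracking) for small models
    if fits_in_one_chunk:
        return Plan("Refine")  # any strategy works; skip CoD
    cod = 3 if want_max_density else 0
    if need_max_speed:
        return Plan("MapReduce", concurrency=0, cod_steps=cod)
    if need_best_coherence:
        return Plan("Refine", cod_steps=cod)
    if has_headings:
        return Plan("Hierarchical", cod_steps=cod)
    if very_long:
        return Plan("SlidingWindow", cod_steps=cod)
    return Plan("Hybrid", cod_steps=cod)
```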