Extract Document Node

AI/Processing

Extract Document

Extracts text and other content from documents (PDF, DOCX, XLSX, images, etc.) and converts it to Markdown.

Internal name: ai_processing_extract_document
Inputs: 3
Outputs: 2
Security exposure: 0/10
Package: processing

Ratings

Scores range from 0 to 10. Higher values mean more impact, exposure, or operational weight.

Security: 0/10. Attack surface and exposure impact.
Privacy: 0/10. Potential sensitivity of processed data.
Performance: 2/10. Runtime or resource pressure.
Governance: 0/10. Policy, audit, or compliance impact.
Reliability: 2/10. Operational stability considerations.
Cost: 0/10. External or compute cost impact.

Input Pins

3

Input

Execution
exec_in

Execution trigger to start document extraction.

File

Struct
file

Document file to extract (PDF, DOCX, XLSX, images, etc.).

FlowPath (3 fields)
path: string, required
store_ref: string, required
cache_store_ref: string | null
Schema enforced
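
To make the FlowPath shape concrete, here is a minimal Python sketch that mirrors the three documented fields and checks a plain dictionary against them. The `FlowPath` dataclass, the `parse_flow_path` helper, and the sample values (`reports/q1.pdf`, `store-1`) are hypothetical illustrations, not part of the platform's API.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class FlowPath:
    """Illustrative mirror of the documented FlowPath struct (hypothetical class)."""
    path: str                              # required
    store_ref: str                         # required
    cache_store_ref: Optional[str] = None  # string or null


def parse_flow_path(raw: Dict[str, Any]) -> FlowPath:
    """Check a dict against the three documented fields and build a FlowPath."""
    for key in ("path", "store_ref"):
        if not isinstance(raw.get(key), str):
            raise ValueError(f"'{key}' is required and must be a string")
    cache = raw.get("cache_store_ref")
    if cache is not None and not isinstance(cache, str):
        raise ValueError("'cache_store_ref' must be a string or null")
    return FlowPath(raw["path"], raw["store_ref"], cache)


# Sample values are made up for illustration.
example = parse_flow_path({"path": "reports/q1.pdf", "store_ref": "store-1"})
```

Leaving out `cache_store_ref` yields `None` here, which matches the nullable type listed above. A dictionary missing `path` or `store_ref` raises `ValueError`.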

Extract Images

Boolean
extract_images

Whether to extract and embed images from the document.

Default true
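
As a rough illustration of how the default behaves, the sketch below assembles the two data inputs and falls back to `true` when `extract_images` is not supplied. The `build_inputs` helper and the sample file values are assumptions made for this example. How inputs are actually wired depends on the host workflow tool.

```python
from typing import Any, Dict, Optional


def build_inputs(file: Dict[str, Any],
                 extract_images: Optional[bool] = None) -> Dict[str, Any]:
    """Assemble the node's data inputs, applying the documented default of true."""
    return {
        "file": file,
        "extract_images": True if extract_images is None else extract_images,
    }


# Hypothetical file reference used only for illustration.
sample_file = {"path": "reports/q1.pdf", "store_ref": "store-1"}

with_default = build_inputs(sample_file)
text_only = build_inputs(sample_file, extract_images=False)
```

Passing `extract_images=False` is how a caller would opt out of image extraction in this sketch. Omitting it leaves extraction enabled.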

Output Pins

2

Output

Execution
exec_out

Execution output after extraction completes.

Pages

Struct Array
pages

Extracted document pages with content and images.

DocumentPage (3 fields)
page_number: integer (uint32, min 0), required
content: string, required
images: Array<NodeImage>, required
  NodeImage (array item)
  image_ref: string, required
Schema enforced
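
A downstream step will often want the whole document as one string, plus the list of referenced images. The sketch below shows one way to do that, assuming the `pages` output arrives as a list of dictionaries with the fields listed above. The helper names and sample data are hypothetical. Whether page numbering starts at 0 or 1 is not stated here, so the code only relies on ordering.

```python
from typing import Any, Dict, List


def pages_to_markdown(pages: List[Dict[str, Any]]) -> str:
    """Join page contents in page_number order, separated by blank lines."""
    ordered = sorted(pages, key=lambda page: page["page_number"])
    return "\n\n".join(page["content"] for page in ordered)


def collect_image_refs(pages: List[Dict[str, Any]]) -> List[str]:
    """Gather every image_ref, in page order, then in the order listed per page."""
    ordered = sorted(pages, key=lambda page: page["page_number"])
    return [image["image_ref"] for page in ordered for image in page["images"]]


# Made-up sample data shaped like the DocumentPage struct.
sample_pages = [
    {"page_number": 1, "content": "Second page", "images": [{"image_ref": "img-b"}]},
    {"page_number": 0, "content": "# Title", "images": []},
]

markdown = pages_to_markdown(sample_pages)
image_refs = collect_image_refs(sample_pages)
```

Sorting by `page_number` first keeps the result stable even if pages arrive out of order.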

Node Info

Internal name
ai_processing_extract_document
Category
AI/Processing