AI Extract Document Node

AI Extract Document

Extracts text and content from documents, using AI to generate image descriptions and perform OCR.

ai_processing_extract_document_ai
processing
Long running
Inputs: 8
Outputs: 2
Security exposure: 5/10
Package: processing

Ratings

Scores range from 0 to 10. Higher values mean more impact, exposure, or operational weight.

Security: 5/10 (Medium). Attack surface and exposure impact.
Privacy: 5/10 (Medium). Potential sensitivity of processed data.
Performance: 4/10 (Medium). Runtime or resource pressure.
Governance: 5/10 (Medium). Policy, audit, or compliance impact.
Reliability: 3/10 (High). Operational stability considerations.
Cost: 6/10 (Medium). External or compute cost impact.

Input Pins (8)

Input

Execution
exec_in

Execution trigger to start AI-powered document extraction.

File

Struct
file

Document file to extract (PDF, DOCX, XLSX, images, etc.).

FlowPath (3 fields)
path: string, required
store_ref: string, required
cache_store_ref: string | null
Schema enforced
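
The FlowPath fields listed above can be mirrored in a small data class. This is a minimal sketch for illustration only: the class itself and the example values ("reports/q3.pdf", "store-123") are assumptions, not part of the node's API.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class FlowPath:
    """Hypothetical mirror of the FlowPath schema (3 fields)."""
    path: str                               # required
    store_ref: str                          # required
    cache_store_ref: Optional[str] = None   # string | null


# Example values are made up for illustration.
example = FlowPath(path="reports/q3.pdf", store_ref="store-123")
payload = asdict(example)
```

Leaving cache_store_ref unset yields None in the payload, which corresponds to the null allowed by the schema.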

Model

Struct
model

Vision-capable AI model for image analysis and OCR.

Bit (19 fields)
id: string, default ""
type: BitTypes, enum "Llm", "Vlm", "Tts", "Stt"..., default "Other"
meta: Map<string, Metadata>, default {}
  *: Metadata (map value)
    name: string, required
    description: string, required
    long_description: string | null
    release_notes: string | null
    tags: Array<string>, required
      items: string (array item)
    +11 more fields
authors: Array<string>, default []
  items: string (array item)
repository: string | null, default null
download_link: string | null, default null
file_name: string | null, default null
hash: string, default ""
size: integer | null, format uint64, default null, min 0
hub: string, default ""
parameters: value, default null
version: string | null, default null
license: string | null, default null
dependencies: Array<string>, default []
  items: string (array item)
dependency_tree_hash: string, default ""
created: string, default ""
updated: string, default ""
model_slug: string | null, default null
+1 more fields
Schema enforced

Extract Images

Boolean
extract_images

Whether to extract and embed images from the document.

Default true

Images Per Message

Integer
images_per_message

Number of images sent per LLM request. Higher values are faster but may hit token limits.

Default 1
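
To illustrate what batching images per request means, the sketch below splits a list of images into groups of images_per_message. The helper name batch_images is hypothetical, and the node's actual batching logic is not documented here.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def batch_images(images: Sequence[T], images_per_message: int = 1) -> List[List[T]]:
    """Split images into consecutive groups, one group per LLM request.

    Hypothetical helper; the last group may be smaller than the others.
    """
    if images_per_message < 1:
        raise ValueError("images_per_message must be at least 1")
    return [
        list(images[i:i + images_per_message])
        for i in range(0, len(images), images_per_message)
    ]
```

With the default of 1, every image gets its own request; larger values reduce the number of requests at the cost of larger prompts.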

Pages Per Batch

Integer
pages_per_batch

Number of PDF pages processed in parallel. Higher values are faster but use more memory.

Default 4
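
The speed and memory trade-off of pages_per_batch can be sketched with a thread pool that processes one batch of pages at a time. This is an assumption about the general technique, not the node's implementation; process_in_batches and process_page are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

P = TypeVar("P")
R = TypeVar("R")


def process_in_batches(
    pages: Sequence[P],
    process_page: Callable[[P], R],
    pages_per_batch: int = 4,
) -> List[R]:
    """Process pages batch by batch; pages inside a batch run concurrently."""
    if pages_per_batch < 1:
        raise ValueError("pages_per_batch must be at least 1")
    results: List[R] = []
    for start in range(0, len(pages), pages_per_batch):
        batch = pages[start:start + pages_per_batch]
        # At most len(batch) pages are held in flight at any time.
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            # pool.map returns results in input order.
            results.extend(pool.map(process_page, batch))
    return results
```

Only one batch is in flight at a time, so peak memory grows with pages_per_batch while total runtime shrinks.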

Temperature

Float
temperature

LLM temperature (0.0 = deterministic, 1.0 = creative). Lower is better for extraction.

Default 0.1

Max Tokens

Integer
max_tokens

Maximum output tokens per LLM call. Leave at 0 to use the model default; set a lower value for unreliable models.

Default 0
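
One way to read the "0 means model default" convention is that the limit is simply omitted from the request when it is 0. The sketch below assumes that interpretation; build_request_options and the option key names are hypothetical and do not describe a specific provider API.

```python
from typing import Any, Dict


def build_request_options(temperature: float = 0.1, max_tokens: int = 0) -> Dict[str, Any]:
    """Build per-call options from the Temperature and Max Tokens pins.

    Hypothetical helper: max_tokens is only included when it is positive,
    so 0 leaves the decision to the model.
    """
    if max_tokens < 0:
        raise ValueError("max_tokens must be 0 or greater")
    options: Dict[str, Any] = {"temperature": temperature}
    if max_tokens > 0:
        options["max_tokens"] = max_tokens
    return options
```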

Output Pins (2)

Output

Execution
exec_out

Execution output after extraction completes.

Pages

Struct Array
pages

Extracted document pages with AI-generated descriptions and images.

DocumentPage (3 fields)
page_number: integer (uint32), required, min 0
content: string, required
images: Array<NodeImage>, required
  items: NodeImage (array item)
    image_ref: string, required
Schema enforced
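
A downstream consumer of the pages output might combine page contents and collect image references as sketched below. The data classes mirror the schema above, while join_pages and collect_image_refs are hypothetical helpers, not functions provided by the node.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeImage:
    image_ref: str


@dataclass
class DocumentPage:
    page_number: int
    content: str
    images: List[NodeImage]


def join_pages(pages: List[DocumentPage]) -> str:
    """Concatenate page contents in page order, separated by blank lines."""
    ordered = sorted(pages, key=lambda p: p.page_number)
    return "\n\n".join(p.content for p in ordered)


def collect_image_refs(pages: List[DocumentPage]) -> List[str]:
    """Return every image_ref in the order the pages are given."""
    return [img.image_ref for p in pages for img in p.images]
```

Sorting by page_number guards against pages arriving out of order, which may happen if they were processed in parallel.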

Node Info

Internal name: ai_processing_extract_document_ai
Category: AI/Processing
Version: 2