Schemas In-Depth
This page covers schemas, data blocks, data points, data types, edges, collections, import/export, AI generation, and JSON Schema output.
A schema defines the structure of data that your workflows process. Schemas are reusable across multiple processes and serve as the blueprint for AI-powered data extraction (schematization).
Schema Structure
```
Schema
├── name: string
├── description: string
├── Data Blocks[]
│   ├── name: string
│   ├── description: string
│   ├── position: { x, y }
│   └── Data Points[]
│       ├── name: string
│       ├── type: string
│       ├── description: string
│       └── config: object
└── Edges[]
    ├── sourceDataBlockID
    ├── targetDataBlockID
    ├── sourceToTargetFunction (optional)
    └── targetToSourceFunction (optional)
```
Data Blocks
A data block is a named group of fields representing an entity, message, or document type.
Creating a Data Block
- Open a schema in the editor.
- Right-click the canvas or use the toolbar to add a data block.
- Set the name (used as a key in schematization output and template references).
- Set the description (fed to the LLM during extraction -- be specific).
Naming
Data block names become keys in the schematization output. If your data block is named Order Request, the output is:
```json
{
  "Order Request": {
    "field1": "value1",
    "field2": "value2"
  }
}
```
Downstream template references use this name: `{{input["Schematize Node"]["Order Request"]["field1"]}}`.
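As an illustration, a template reference like the one above resolves by walking nested keys in the node's output. A minimal Python sketch (the `resolve` helper is hypothetical, not part of the product):

```python
import re

def resolve(template: str, context: dict):
    """Resolve a {{input["..."]["..."]}} reference by walking nested dict keys."""
    match = re.fullmatch(r"\{\{input((?:\[\"[^\"]+\"\])+)\}\}", template)
    if not match:
        raise ValueError(f"not a template reference: {template}")
    keys = re.findall(r"\[\"([^\"]+)\"\]", match.group(1))
    value = context
    for key in keys:
        value = value[key]  # raises KeyError if the path does not exist
    return value

output = {"Schematize Node": {"Order Request": {"field1": "value1"}}}
print(resolve('{{input["Schematize Node"]["Order Request"]["field1"]}}', output))
```

This is why renaming a data block breaks downstream template references: the block name is part of the lookup path.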
Data Points (Fields)
Each data block contains data points. A data point defines a single field.
Data Point Properties
| Property | Required | Description |
|---|---|---|
| name | Yes | Field name (becomes the JSON key in output) |
| type | Yes | Data type (see table below) |
| description | Yes | What this field contains; read by the LLM during schematization, so be descriptive |
| config | No | Type-specific configuration |
Data Types
| Type | JSON Output | Config Options | Example Value |
|---|---|---|---|
| string | "text" | None | "John Smith" |
| number | 123 or 45.67 | None | 5000 |
| boolean | true / false | None | true |
| date | "YYYY-MM-DD" | None | "2026-04-12" |
| date-time | ISO 8601 string | None | "2026-04-12T14:30:00Z" |
| time | "HH:MM:SS" | None | "14:30:00" |
| enum | "value" | values: list of allowed values | "standard" |
| datablock | { ... } | datablockId: reference to another data block | nested object |
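As an illustration of how these types constrain values, here is a minimal validator sketch. It is hypothetical (not the product's validation logic) and assumes the config shapes shown in the table:

```python
from datetime import date, datetime, time

def validate_value(value, dp_type, config=None):
    """Return True if value matches the data point type (illustrative only)."""
    config = config or {}
    if dp_type == "string":
        return isinstance(value, str)
    if dp_type == "number":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if dp_type == "boolean":
        return isinstance(value, bool)
    if dp_type == "date":
        try:
            date.fromisoformat(value)  # "YYYY-MM-DD"
            return True
        except (TypeError, ValueError):
            return False
    if dp_type == "date-time":
        try:
            datetime.fromisoformat(value.replace("Z", "+00:00"))  # ISO 8601
            return True
        except (TypeError, ValueError, AttributeError):
            return False
    if dp_type == "time":
        try:
            time.fromisoformat(value)  # "HH:MM:SS"
            return True
        except (TypeError, ValueError):
            return False
    if dp_type == "enum":
        return value in config.get("values", [])
    if dp_type == "datablock":
        return isinstance(value, (dict, list))  # nested object or list of objects
    return False
```

For example, `validate_value("rush", "enum", {"values": ["standard", "rush"]})` is `True`, while `validate_value("bulk", "enum", {"values": ["standard", "rush"]})` is `False`.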
Enum Configuration
For enum-type data points, provide the allowed values in the config:
```json
{
  "values": ["standard", "rush", "emergency"]
}
```
The LLM will constrain its extraction to one of these values.
Nested Data Blocks (datablock type)
A data point of type datablock references another data block in the same schema, creating a nested structure. Set the datablockId in the config to the target data block's ID.
Example: An Invoice data block with a line_items field of type datablock referencing a Line Item data block produces:
```json
{
  "Invoice": {
    "invoice_number": "INV-001",
    "line_items": [
      {
        "description": "Carbon Steel",
        "quantity": 5000,
        "unit_price": 0.45
      }
    ]
  }
}
```
Schema Edges
Edges connect two data blocks within a schema. They define relationships and can carry coupling functions that transform data between related blocks.
Creating an Edge
- In the schema editor, drag from one data block's output port to another data block's input port.
- The edge appears as a connection line.
Edge Properties
| Property | Description |
|---|---|
sourceDataBlockID | The originating data block |
targetDataBlockID | The destination data block |
sourceToTargetFunction | Optional function that derives the target from the source |
targetToSourceFunction | Optional function that derives the source from the target |
How Edges Affect Schematization
When a schematization node extracts multiple data blocks from the same schema:
- Data blocks without edges (solo) are extracted independently, in parallel.
- Data blocks connected by edges (coupled) are extracted as a group. The engine:
  - Picks a root data block
  - Extracts the root using the LLM
  - Runs the coupling function on each edge to derive related blocks
  - Continues until all coupled blocks are extracted
This ensures related data stays consistent (e.g., an invoice header and its line items).
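The grouping logic above can be sketched as a graph traversal. This is an illustrative sketch with hypothetical structures and a stubbed LLM call, not the engine's actual implementation:

```python
from collections import defaultdict, deque

def extract(blocks, edges, llm_extract, coupling_fns, input_text):
    """Extract coupled blocks as a group from a root; solo blocks independently.

    blocks: list of block names; edges: list of (source, target) pairs;
    coupling_fns: {(source, target): fn(source_data) -> target_data}.
    """
    neighbors = defaultdict(list)
    for src, tgt in edges:
        neighbors[src].append(tgt)
        neighbors[tgt].append(src)  # an edge couples both blocks

    results, visited = {}, set()
    for block in blocks:
        if block in visited:
            continue
        # This block is the root of its coupled group: extract it with the LLM.
        results[block] = llm_extract(block, input_text)
        visited.add(block)
        queue = deque([block])
        while queue:  # derive the rest of the group via coupling functions
            current = queue.popleft()
            for other in neighbors[current]:
                if other in visited:
                    continue
                fn = coupling_fns.get((current, other))
                results[other] = fn(results[current]) if fn else llm_extract(other, input_text)
                visited.add(other)
                queue.append(other)
    return results
```

A solo block is simply a coupled group of one, so it falls out of the same loop; a real engine would additionally run the independent extractions in parallel.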
JSON Schema Output
Each data block can be exported as a JSON Schema. This is used internally by the schematization node for structured LLM output.
API endpoint: GET /schemas/data-blocks/by-id/{dataBlockId}/json-schema
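The mapping from data points to JSON Schema can be illustrated by building one by hand. The in-memory data point dicts below are hypothetical; the real endpoint does this server-side:

```python
import json

# Assumed type mapping, based on the data types table above.
TYPE_MAP = {"string": "string", "number": "number", "boolean": "boolean",
            "date": "string", "date-time": "string", "time": "string",
            "enum": "string"}

def to_json_schema(data_points):
    """Build a JSON Schema object from a list of data point dicts (sketch)."""
    properties = {}
    for dp in data_points:
        prop = {"type": TYPE_MAP.get(dp["type"], "object"),
                "description": dp["description"]}
        if dp["type"] == "enum":
            prop["enum"] = dp["config"]["values"]
        properties[dp["name"]] = prop
    return {"type": "object",
            "properties": properties,
            "required": [dp["name"] for dp in data_points]}

points = [
    {"name": "customer_name", "type": "string",
     "description": "Name of the requesting customer"},
    {"name": "urgency", "type": "enum",
     "description": "Urgency level of the order",
     "config": {"values": ["standard", "rush", "emergency"]}},
]
print(json.dumps(to_json_schema(points), indent=2))
```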
Example output for the Order Request block:
```json
{
  "type": "object",
  "properties": {
    "customer_name": {
      "type": "string",
      "description": "Name of the requesting customer"
    },
    "customer_email": {
      "type": "string",
      "description": "Email address of the customer"
    },
    "steel_type": {
      "type": "string",
      "description": "Type of steel requested (e.g., carbon, stainless, alloy)"
    },
    "weight_lbs": {
      "type": "number",
      "description": "Weight of steel requested in pounds"
    },
    "urgency": {
      "type": "string",
      "enum": ["standard", "rush", "emergency"],
      "description": "Urgency level of the order"
    }
  },
  "required": ["customer_name", "customer_email", "steel_type", "weight_lbs", "urgency"]
}
```
AI Schema Generation
You can generate schemas automatically in two ways: from a text description or from a sample PDF.
From a Text Description
- Click Generate with AI in the schema editor.
- Describe your data in plain English:
I need a schema for tracking steel orders. Each order has a customer name, email, the type of steel (carbon, stainless, alloy, galvanized, tool), weight in lbs, and an urgency level (standard, rush, emergency).
- The AI creates data blocks and data points with appropriate types and descriptions.
From a PDF
- Click Generate from PDF.
- Upload a sample document (invoice, order form, report).
- The AI analyzes the document structure and creates a matching schema.
In both cases, review and adjust the generated schema before using it in processes.
Schema Validation (Linting)
The lint endpoint checks a schema for issues:
API endpoint: POST /schemas/by-id/{schemaId}/lint
Checks performed:
- Data blocks have at least one data point
- Data point names are unique within a block
- Enum types have at least one value defined
- Datablock references point to valid blocks
- Edge source/target blocks exist
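The listed checks amount to structural assertions over the schema. A minimal sketch (hypothetical helper operating on an exported schema dict; the real checks run server-side):

```python
def lint_schema(schema):
    """Return a list of issue strings for a schema dict (illustrative sketch)."""
    issues = []
    blocks = schema.get("dataBlocks", [])
    block_ids = {b.get("id") for b in blocks}
    for block in blocks:
        points = block.get("dataPoints", [])
        if not points:
            issues.append(f"block '{block['name']}' has no data points")
        names = [p["name"] for p in points]
        if len(names) != len(set(names)):
            issues.append(f"block '{block['name']}' has duplicate data point names")
        for p in points:
            config = p.get("config", {})
            if p["type"] == "enum" and not config.get("values"):
                issues.append(f"enum '{p['name']}' has no values defined")
            if p["type"] == "datablock" and config.get("datablockId") not in block_ids:
                issues.append(f"datablock reference '{p['name']}' points to a missing block")
    for edge in schema.get("edges", []):
        if (edge.get("sourceDataBlockID") not in block_ids
                or edge.get("targetDataBlockID") not in block_ids):
            issues.append("edge references a missing data block")
    return issues
```

An empty issue list means the schema passes; each string describes one violation of the checks above.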
Collections
Schemas can be organized into collections (folders). Collections support:
- Nesting (a collection can have a parent collection)
- Multiple membership (a schema can be in multiple collections)
- Descriptions for documentation
Managing Collections
| Operation | How |
|---|---|
| Create | Schemas page > New Collection |
| Add schema to collection | Drag schema onto collection, or use the API |
| Remove schema from collection | Right-click > Remove from Collection |
| Nest collections | Set parentCollectionID when creating |
Import and Export
Export
Export a schema as JSON for backup or sharing:
API endpoint: GET /schemas/by-id/{schemaId}/export
The export includes:
- Schema metadata (name, description)
- All data blocks with positions
- All data points with types and configs
- All edges with coupling function references
Import
Import a schema from JSON:
API endpoint: POST /schemas/bulk-import/{orgId}
Request body:
```json
{
  "name": "Steel Order",
  "description": "Schema for steel ordering workflow",
  "dataBlocks": [
    {
      "name": "Order Request",
      "description": "Incoming steel order details",
      "position": { "x": 100, "y": 200 },
      "dataPoints": [
        {
          "name": "customer_name",
          "type": "string",
          "description": "Name of the requesting customer"
        },
        {
          "name": "weight_lbs",
          "type": "number",
          "description": "Weight of steel in pounds"
        }
      ]
    }
  ],
  "edges": []
}
```
Using Schemas in Processes
Schemas are referenced in two node types:
Schematization Node
The schematization node uses a schema to extract structured data from unstructured input. You select:
- Which schema to use
- Which data blocks to extract
- An LLM model and prompt
The node outputs one key per selected data block, with field values extracted by the LLM.
API Input Node (Webhook)
The API input node can reference a schema to validate incoming webhook payloads against a specific data block's structure.