Schemas In-Depth
This page covers schemas, data blocks, data points, data types, edges, collections, import/export, AI generation, and JSON Schema output.
A schema defines the structure of data that your workflows process. Schemas are reusable across multiple processes and serve as the blueprint for AI-powered data extraction (schematization).
Schema Structure
```
Schema
├── name: string
├── description: string
├── Data Blocks[]
│   ├── name: string
│   ├── description: string
│   ├── position: { x, y }
│   └── Data Points[]
│       ├── name: string
│       ├── type: string
│       ├── description: string
│       └── config: object
└── Edges[]
    ├── sourceDataBlockID
    ├── targetDataBlockID
    ├── sourceToTargetFunction (optional)
    └── targetToSourceFunction (optional)
```
Data Blocks
A data block is a named group of fields representing an entity, message, or document type.
Creating a Data Block
- Open a schema in the editor.
- Right-click the canvas or use the toolbar to add a data block.
- Set the name (used as a key in schematization output and template references).
- Set the description (fed to the LLM during extraction -- be specific).
Naming
Data block names become keys in the schematization output. If your data block is named Order Request, the output is:
```json
{
  "Order Request": {
    "field1": "value1",
    "field2": "value2"
  }
}
```
Downstream template references use this name: `{{input["Schematize Node"]["Order Request"]["field1"]}}`.
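As an illustration, a template reference like the one above resolves by walking nested keys in the node's output. A minimal Python sketch (the `resolve` helper is hypothetical, not part of the product):

```python
import re

def resolve(template: str, context: dict):
    """Resolve a {{input["..."]["..."]}} reference by walking nested dict keys."""
    match = re.fullmatch(r"\{\{input((?:\[\"[^\"]+\"\])+)\}\}", template)
    if not match:
        raise ValueError(f"not a template reference: {template}")
    keys = re.findall(r"\[\"([^\"]+)\"\]", match.group(1))
    value = context
    for key in keys:
        value = value[key]  # raises KeyError if the path does not exist
    return value

output = {"Schematize Node": {"Order Request": {"field1": "value1"}}}
print(resolve('{{input["Schematize Node"]["Order Request"]["field1"]}}', output))
```

This is why renaming a data block breaks downstream template references: the block name is part of the lookup path.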
Data Points (Fields)
Each data block contains data points. A data point defines a single field.
Data Point Properties
| Property | Required | Description |
|---|---|---|
| name | Yes | Field name (becomes the JSON key in output) |
| type | Yes | Data type (see table below) |
| description | Yes | What this field contains; read by the LLM during schematization, so be descriptive |
| config | No | Type-specific configuration |
Data Types
| Type | JSON Output | Config Options | Example Value |
|---|---|---|---|
| string | "text" | None | "John Smith" |
| number | 123 or 45.67 | None | 5000 |
| boolean | true / false | None | true |
| date | "YYYY-MM-DD" | None | "2026-04-12" |
| date-time | ISO 8601 string | None | "2026-04-12T14:30:00Z" |
| time | "HH:MM:SS" | None | "14:30:00" |
| enum | "value" | values: list of allowed values | "standard" |
| datablock | { ... } | datablockId: reference to another data block | nested object |
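As an illustration of how these types constrain values, here is a minimal validator sketch. It is hypothetical (not the product's validation logic) and assumes the config shapes shown in the table:

```python
from datetime import date, datetime, time

def validate_value(value, dp_type, config=None):
    """Return True if value matches the data point type (illustrative only)."""
    config = config or {}
    if dp_type == "string":
        return isinstance(value, str)
    if dp_type == "number":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if dp_type == "boolean":
        return isinstance(value, bool)
    if dp_type == "date":
        try:
            date.fromisoformat(value)  # "YYYY-MM-DD"
            return True
        except (TypeError, ValueError):
            return False
    if dp_type == "date-time":
        try:
            datetime.fromisoformat(value.replace("Z", "+00:00"))  # ISO 8601
            return True
        except (TypeError, ValueError, AttributeError):
            return False
    if dp_type == "time":
        try:
            time.fromisoformat(value)  # "HH:MM:SS"
            return True
        except (TypeError, ValueError):
            return False
    if dp_type == "enum":
        return value in config.get("values", [])
    if dp_type == "datablock":
        return isinstance(value, (dict, list))  # nested object or list of objects
    return False
```

For example, `validate_value("rush", "enum", {"values": ["standard", "rush"]})` is `True`, while `validate_value("bulk", "enum", {"values": ["standard", "rush"]})` is `False`.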
Enum Configuration
For enum-type data points, provide the allowed values in the config:
```json
{
  "values": ["standard", "rush", "emergency"]
}
```
The LLM will constrain its extraction to one of these values.
Nested Data Blocks (datablock type)
A data point of type datablock references another data block in the same schema, creating a nested structure. Set the datablockId in the config to the target data block's ID.
Example: An Invoice data block with a line_items field of type datablock referencing a Line Item data block produces:
```json
{
  "Invoice": {
    "invoice_number": "INV-001",
    "line_items": [
      {
        "description": "Carbon Steel",
        "quantity": 5000,
        "unit_price": 0.45
      }
    ]
  }
}
```
Schema Edges
Edges connect two data blocks within a schema. They define relationships and can carry coupling functions that transform data between related blocks.
Creating an Edge
- In the schema editor, drag from one data block's output port to another data block's input port.
- The edge appears as a connection line.
Edge Properties
| Property | Description |
|---|---|
sourceDataBlockID | The originating data block |
targetDataBlockID | The destination data block |
sourceToTargetFunction | Optional function that derives the target from the source |
targetToSourceFunction | Optional function that derives the source from the target |
How Edges Affect Schematization
When a schematization node extracts multiple data blocks from the same schema:
- Data blocks without edges (solo) are extracted independently, in parallel.
- Data blocks connected by edges (coupled) are extracted as a group. The engine:
  - Picks a root data block
  - Extracts the root using the LLM
  - Runs the coupling function on each edge to derive related blocks
  - Continues until all coupled blocks are extracted
This ensures related data stays consistent (e.g., an invoice header and its line items).
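The grouping logic above can be sketched as a graph traversal. This is an illustrative sketch with hypothetical structures and a stubbed LLM call, not the engine's actual implementation:

```python
from collections import defaultdict, deque

def extract(blocks, edges, llm_extract, coupling_fns, input_text):
    """Extract coupled blocks as a group from a root; solo blocks independently.

    blocks: list of block names; edges: list of (source, target) pairs;
    coupling_fns: {(source, target): fn(source_data) -> target_data}.
    """
    neighbors = defaultdict(list)
    for src, tgt in edges:
        neighbors[src].append(tgt)
        neighbors[tgt].append(src)  # an edge couples both blocks

    results, visited = {}, set()
    for block in blocks:
        if block in visited:
            continue
        # This block is the root of its coupled group: extract it with the LLM.
        results[block] = llm_extract(block, input_text)
        visited.add(block)
        queue = deque([block])
        while queue:  # derive the rest of the group via coupling functions
            current = queue.popleft()
            for other in neighbors[current]:
                if other in visited:
                    continue
                fn = coupling_fns.get((current, other))
                results[other] = fn(results[current]) if fn else llm_extract(other, input_text)
                visited.add(other)
                queue.append(other)
    return results
```

A solo block is simply a coupled group of one, so it falls out of the same loop; a real engine would additionally run the independent extractions in parallel.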
JSON Schema Output
Each data block can be exported as a JSON Schema. This is used internally by the schematization node for structured LLM output.
API endpoint: GET /schemas/data-blocks/by-id/{dataBlockId}/json-schema
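The mapping from data points to JSON Schema can be illustrated by building one by hand. The in-memory data point dicts below are hypothetical; the real endpoint does this server-side:

```python
import json

# Assumed type mapping, based on the data types table above.
TYPE_MAP = {"string": "string", "number": "number", "boolean": "boolean",
            "date": "string", "date-time": "string", "time": "string",
            "enum": "string"}

def to_json_schema(data_points):
    """Build a JSON Schema object from a list of data point dicts (sketch)."""
    properties = {}
    for dp in data_points:
        prop = {"type": TYPE_MAP.get(dp["type"], "object"),
                "description": dp["description"]}
        if dp["type"] == "enum":
            prop["enum"] = dp["config"]["values"]
        properties[dp["name"]] = prop
    return {"type": "object",
            "properties": properties,
            "required": [dp["name"] for dp in data_points]}

points = [
    {"name": "customer_name", "type": "string",
     "description": "Name of the requesting customer"},
    {"name": "urgency", "type": "enum",
     "description": "Urgency level of the order",
     "config": {"values": ["standard", "rush", "emergency"]}},
]
print(json.dumps(to_json_schema(points), indent=2))
```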
Example output for the Order Request block:
```json
{
  "type": "object",
  "properties": {
    "customer_name": {
      "type": "string",
      "description": "Name of the requesting customer"
    },
    "customer_email": {
      "type": "string",
      "description": "Email address of the customer"
    },
    "steel_type": {
      "type": "string",
      "description": "Type of steel requested (e.g., carbon, stainless, alloy)"
    },
    "weight_lbs": {
      "type": "number",
      "description": "Weight of steel requested in pounds"
    },
    "urgency": {
      "type": "string",
      "enum": ["standard", "rush", "emergency"],
      "description": "Urgency level of the order"
    }
  },
  "required": ["customer_name", "customer_email", "steel_type", "weight_lbs", "urgency"]
}
```
AI Schema Generation
You can generate schemas automatically in two ways: from a text description or from a sample PDF.
From a Text Description
- Click Generate with AI in the schema editor.
- Describe your data in plain English:
I need a schema for tracking steel orders. Each order has a customer name, email, the type of steel (carbon, stainless, alloy, galvanized, tool), weight in lbs, and an urgency level (standard, rush, emergency).
- The AI creates data blocks and data points with appropriate types and descriptions.
From a PDF
- Click Generate from PDF.
- Upload a sample document (invoice, order form, report).
- The AI analyzes the document structure and creates a matching schema.
In both cases, review and adjust the generated schema before using it in processes.
Schema Validation (Linting)
The lint endpoint checks a schema for issues:
API endpoint: POST /schemas/by-id/{schemaId}/lint
Checks performed:
- Data blocks have at least one data point
- Data point names are unique within a block
- Enum types have at least one value defined
- Datablock references point to valid blocks
- Edge source/target blocks exist
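The listed checks amount to structural assertions over the schema. A minimal sketch (hypothetical helper operating on an exported schema dict; the real checks run server-side):

```python
def lint_schema(schema):
    """Return a list of issue strings for a schema dict (illustrative sketch)."""
    issues = []
    blocks = schema.get("dataBlocks", [])
    block_ids = {b.get("id") for b in blocks}
    for block in blocks:
        points = block.get("dataPoints", [])
        if not points:
            issues.append(f"block '{block['name']}' has no data points")
        names = [p["name"] for p in points]
        if len(names) != len(set(names)):
            issues.append(f"block '{block['name']}' has duplicate data point names")
        for p in points:
            config = p.get("config", {})
            if p["type"] == "enum" and not config.get("values"):
                issues.append(f"enum '{p['name']}' has no values defined")
            if p["type"] == "datablock" and config.get("datablockId") not in block_ids:
                issues.append(f"datablock reference '{p['name']}' points to a missing block")
    for edge in schema.get("edges", []):
        if (edge.get("sourceDataBlockID") not in block_ids
                or edge.get("targetDataBlockID") not in block_ids):
            issues.append("edge references a missing data block")
    return issues
```

An empty issue list means the schema passes; each string describes one violation of the checks above.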
Collections
Schemas can be organized into collections (folders). Collections support:
- Nesting (a collection can have a parent collection)
- Multiple membership (a schema can be in multiple collections)
- Descriptions for documentation
Managing Collections
| Operation | How |
|---|---|
| Create | Schemas page > New Collection |
| Add schema to collection | Drag schema onto collection, or use the API |
| Remove schema from collection | Right-click > Remove from Collection |
| Nest collections | Set parentCollectionID when creating |
Import and Export
Export
Export a schema as JSON for backup or sharing:
API endpoint: GET /schemas/by-id/{schemaId}/export
The export includes:
- Schema metadata (name, description)
- All data blocks with positions
- All data points with types and configs
- All edges with coupling function references
Import
Import a schema from JSON:
API endpoint: POST /schemas/bulk-import/{orgId}
Request body:
```json
{
  "name": "Steel Order",
  "description": "Schema for steel ordering workflow",
  "dataBlocks": [
    {
      "name": "Order Request",
      "description": "Incoming steel order details",
      "position": { "x": 100, "y": 200 },
      "dataPoints": [
        {
          "name": "customer_name",
          "type": "string",
          "description": "Name of the requesting customer"
        },
        {
          "name": "weight_lbs",
          "type": "number",
          "description": "Weight of steel in pounds"
        }
      ]
    }
  ],
  "edges": []
}
```
Using Schemas in Processes
Schemas are referenced in two node types:
Schematization Node
The schematization node uses a schema to extract structured data from unstructured input. You select:
- Which schema to use
- Which data blocks to extract
- An LLM model and prompt
The node outputs one key per selected data block, with field values extracted by the LLM.
API Input Node (Webhook)
The API input node can reference a schema to validate incoming webhook payloads against a specific data block's structure.