Input

Load data from files or databases into your workflow.

Sockets

Socket	Direction	Description
`output`	Output	Loaded data as a LazyFrame

Supported Formats

CSV

Comma-separated values and other delimited text files.

Reading Modes:

Mode	Description	Best For
Scan	Lazy loading, streams data	Large files, memory efficiency
Read	Eager loading, loads entire file	Small files, when you need all data upfront

CSV Options:

Option	Default	Description
Source	(required)	Path to the CSV file
Separator	`,`	Field delimiter (comma, tab, pipe, etc.)
Has Header	`true`	First row contains column names
Quote Character	`"`	Character used to quote fields
Encoding	`utf-8`	File encoding
Skip Rows	`0`	Number of rows to skip at the start
Skip Rows After Header	`0`	Rows to skip after the header row
N Rows	(all)	Limit number of rows to read
Infer Schema Length	`100`	Rows to sample for type inference
Try Parse Dates	`false`	Attempt to parse date columns
Ignore Errors	`false`	Skip rows with parsing errors
Truncate Ragged Lines	`false`	Truncate lines that have more fields than expected instead of raising an error
Comment Prefix	(none)	Character that marks comment lines to skip
Null Values	(none)	String(s) to interpret as null
Decimal Comma	`false`	Use comma as decimal separator (European format)
Low Memory	`false`	Reduce memory usage at cost of speed
Rechunk	`false`	Rechunk to contiguous memory after reading

Excel

Microsoft Excel spreadsheet files (.xlsx, .xls, .xlsb).

Reading Mode:

Excel files only support eager reading (the entire file is loaded into memory). This is a limitation of the Excel format - unlike CSV or Parquet, Excel files cannot be lazily streamed.

Excel Options:

Option	Default	Description
Source	(required)	Path to the Excel file
Sheet Number	`1`	Sheet to read (1 = first sheet, 0 = all sheets)
Sheet Name	(none)	Alternative: specify sheet by name
Has Header	`true`	First row contains column names
Infer Schema Length	`100`	Rows to sample for type inference
Drop Empty Rows	`true`	Omit completely empty rows
Drop Empty Columns	`true`	Omit empty columns with no headers
Error if Empty	`true`	Raise error if sheet contains no data

Sheet Selection

You can select a sheet either by number (1-indexed) or by name, but not both. If neither is specified, the first sheet is read.
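
The selection rule can be sketched as a small validation helper. `sheet_kwargs` is a hypothetical function, not part of Sigilweaver Loom; the keys it returns follow the `sheet_id` / `sheet_name` parameters of Polars' `read_excel`, on the assumption that the tool maps its options onto them.

```python
from typing import Any, Optional


def sheet_kwargs(sheet_number: Optional[int] = None,
                 sheet_name: Optional[str] = None) -> dict[str, Any]:
    """Translate Sheet Number / Sheet Name into reader keyword arguments.

    Hypothetical helper: the tool's internals may differ.
    """
    if sheet_number is not None and sheet_name is not None:
        raise ValueError("Specify a sheet by number or by name, not both")
    if sheet_name is not None:
        return {"sheet_name": sheet_name}
    if sheet_number is None:
        return {"sheet_id": 1}  # neither given: read the first sheet
    if sheet_number < 0:
        raise ValueError("Sheet number must be 0 (all sheets) or a 1-based index")
    return {"sheet_id": sheet_number}  # 0 = all sheets, 1 = first sheet
```

For example, `sheet_kwargs(sheet_name="Q1")` yields `{"sheet_name": "Q1"}`, while passing both a number and a name raises a `ValueError`.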

Parquet

Columnar binary format with embedded schema.

Reading Modes:

Mode	Description	Best For
Scan	Lazy loading, predicate pushdown	Large files, memory efficiency
Read	Eager loading	Small files

Parquet Options:

Option	Default	Description
Source	(required)	Path to the Parquet file
N Rows	(all)	Limit number of rows to read
Columns	(all)	Specific columns to load
Use Statistics	`true`	Use statistics stored in the Parquet file to skip data that cannot match a filter
Parallel	`auto`	Parallel reading strategy
Low Memory	`false`	Reduce memory usage at cost of speed
Cache	`true`	Cache scan result (scan mode only)
Rechunk	`false`	Rechunk to contiguous memory
Retries	`2`	Number of retries on I/O errors

Advanced Parquet Options:

Option	Default	Description
Row Index Name	(none)	Add a row index column with this name
Row Index Offset	`0`	Start row index at this value
Hive Partitioning	`false`	Infer partition columns from directory structure
Parse Hive Dates	`true`	Try to parse date columns from Hive partitions

IPC / Arrow

Apache Arrow's native binary format (.arrow, .ipc, .feather). Designed for zero-copy data sharing and maximum performance.

Reading Modes:

Mode	Description	Best For
Scan	Lazy loading, memory-mapped	Large files, memory efficiency
Read	Eager loading	Small files

IPC Options:

Option	Default	Description
Source	(required)	Path to the Arrow file
N Rows	(all)	Limit number of rows to read
Memory Map	`true`	Memory-map the file for efficient access
Cache	`true`	Cache scan result (scan mode only)
Rechunk	`false` (Scan) / `true` (Read)	Rechunk to contiguous memory (default differs by mode)
Retries	`2`	Number of retries on I/O errors

Advanced Options:

Option	Default	Description
Row Index Name	(none)	Add a row index column with this name
Row Index Offset	`0`	Start row index at this value
Hive Partitioning	`false`	Infer partition columns from directory structure
Parse Hive Dates	`true`	Try to parse date columns from Hive partitions

Performance

IPC/Arrow is typically the fastest format to read and write, because its on-disk layout matches Arrow's in-memory layout. Use it for intermediate files in your workflow or for sharing data between applications that support Apache Arrow (DuckDB, Spark, R, Julia, etc.).

Avro

Apache Avro binary format (.avro). A row-based format commonly used in big data ecosystems like Hadoop, Kafka, and Spark.

Reading Mode:

Avro files only support eager reading (the entire file is loaded into memory). Unlike IPC or Parquet, Avro does not support lazy scanning in Polars.

Avro Options:

Option	Default	Description
Source	(required)	Path to the Avro file
N Rows	(all)	Limit number of rows to read
Columns	(all)	Specific columns to load (by name or index)

When to Use Avro

Avro is ideal when:

Integrating with Hadoop, Kafka, or Spark pipelines
You need schema evolution support
Working with row-based data patterns
Interoperability with Java/JVM ecosystems is important

For pure performance within Sigilweaver Loom, consider IPC/Arrow or Parquet instead.

ODS

OpenDocument Spreadsheet (.ods) files from LibreOffice and OpenOffice. Similar to Excel but uses an open standard format.

Reading Mode:

ODS files only support eager reading (the entire file is loaded into memory). Polars does not support lazy scanning or writing for ODS format - it is input-only.

ODS Options:

Option	Default	Description
Source	(required)	Path to the ODS file
Sheet ID	`1`	Which sheet to read (1-indexed, 0 = all sheets)
Sheet Name	(none)	Alternative to Sheet ID - read by name
Has Header	`true`	First row contains column names
Columns	(all)	Specific columns to load (by name or index)
Infer Schema Length	`100`	Number of rows to scan for type inference
Drop Empty Rows	`true`	Remove completely empty rows
Drop Empty Cols	`true`	Remove completely empty columns
Raise If Empty	`true`	Error if the sheet is empty

When to Use ODS

ODS is ideal when:

Working with LibreOffice or OpenOffice spreadsheets
You need an open standard alternative to Excel
Sharing data between open source office suites
Compatibility with government or academic systems using open formats

Note: ODS is input-only - use Excel Output or convert to CSV/Parquet for saving data.

PostgreSQL

Query data directly from PostgreSQL databases.

Connection Modes:

Mode	Description	Best For
Client Connection	Credentials stored on your local machine via Electron's secure storage	Personal databases, development
Server Connection	Credentials managed by server administrator	Shared databases, team environments

PostgreSQL Options:

Option	Default	Description
Connection	(required)	Select a configured database connection
Query	(required)	SQL query to execute

Setting Up Connections:

Connections are managed in Settings > Connections. See Managing Connections for details on creating and using database connections.

Security

Credentials are never stored in workflow files. Whether using client or server connections:

Client connections: Only a connection ID is stored in the workflow
Server connections: The connection ID references server-managed credentials that are completely opaque to users

When you export Python code or view workflow JSON, credentials are stripped or represented only as references.
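
The idea of storing only a reference can be sketched as follows. Everything here is hypothetical: the `CREDENTIAL_STORE` dictionary, the JSON field names, and both helper functions are stand-ins chosen for illustration, not Sigilweaver Loom's actual workflow schema or storage mechanism.

```python
import json

# Hypothetical credential store. In the tool this would be secure local storage
# (client connections) or the server (server connections), never the workflow file.
CREDENTIAL_STORE = {
    "conn-42": "postgresql://analyst:s3cret@db.example.com:5432/sales",
}


def serialize_input_node(connection_id: str, query: str) -> str:
    """Build the workflow entry for a database Input: a reference plus the query."""
    return json.dumps({"tool": "input", "connection_id": connection_id, "query": query})


def resolve_connection(node_json: str) -> tuple[str, str]:
    """At run time, swap the stored reference for the real connection URI."""
    node = json.loads(node_json)
    return CREDENTIAL_STORE[node["connection_id"]], node["query"]


saved = serialize_input_node("conn-42", "SELECT id, total FROM orders")
uri, query = resolve_connection(saved)
```

The serialized node contains the connection ID and the SQL text only, so sharing or exporting it does not expose the password.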

Lazy Evaluation

PostgreSQL queries are lazily evaluated like file sources. The query is executed when you preview data or run the workflow, not when you configure the tool.

MySQL

Query data directly from MySQL databases.

MySQL Options:

Option	Default	Description
Connection	(required)	Select a configured database connection
Query	(required)	SQL query to execute

Connection setup and security are the same as PostgreSQL. See Managing Connections for details.

SQLite

Query data from SQLite database files.

SQLite Options:

Option	Default	Description
Connection	(required)	Select a configured database connection
Query	(required)	SQL query to execute

SQLite Connections

SQLite connections point to a .db or .sqlite file on disk rather than a remote server. The file must be accessible from where Sigilweaver Loom Server is running.

MSSQL

Query data from Microsoft SQL Server databases.

MSSQL Options:

Option	Default	Description
Connection	(required)	Select a configured database connection
Query	(required)	SQL query to execute

Connection setup and security are the same as PostgreSQL. See Managing Connections for details.

JSON

JSON data files. Sigilweaver Loom supports two JSON formats, auto-detected by file extension:

Format	Extensions	Reading Mode	Description
Standard JSON	`.json`	Eager (Read)	Array of objects
NDJSON	`.jsonl`, `.ndjson`	Lazy (Scan)	Newline-delimited JSON

JSON Options (standard .json):

Option	Default	Description
Source	(required)	Path to the JSON file
Infer Schema Length	`100`	Rows to sample for type inference

NDJSON Options (.jsonl, .ndjson):

Option	Default	Description
Source	(required)	Path to the NDJSON file
Infer Schema Length	`100`	Rows to sample for type inference
N Rows	(all)	Limit number of rows to read
Batch Size	(default)	Processing batch size
Low Memory	`false`	Reduce memory usage at cost of speed
Rechunk	`false`	Rechunk to contiguous memory
Ignore Errors	`false`	Skip rows with parsing errors

NDJSON for Large Files

If your JSON data is large, use NDJSON format (.jsonl or .ndjson) for lazy streaming support. Standard .json files must be loaded entirely into memory.

YXDB

YXDB database files (.yxdb). Read using OpenYXDB, a high-performance Rust-based reader.

Reading Mode:

YXDB files are read eagerly (the entire file is loaded into memory). The row-based LZF compression used by the format prevents partial reads or predicate pushdown. The result is wrapped in a LazyFrame for workflow compatibility, so subsequent operations (filter, select, join) are still composed lazily.

YXDB Options:

Option	Default	Description
Source	(required)	Path to the YXDB file
N Rows	(all)	Limit number of rows to read

YXDB Support

Sigilweaver Loom supports all 17 YXDB field types including Bool, Int16/32/64, Float, Double, FixedDecimal, String/WString/VString/VWString, Date, Time, DateTime, Blob, and SpatialObj. Blob and SpatialObj fields are read as strings.

Configuration

Add an Input tool to the canvas
Select the data source type:
- File sources: CSV, Excel, Parquet, IPC/Arrow, Avro, ODS, JSON/NDJSON, or YXDB
- Database sources: PostgreSQL, MySQL, SQLite, or MSSQL
For file sources, choose the file using the file picker
For database sources, select a connection and write a SQL query
Configure format-specific options as needed

Examples

Loading a CSV

Drag Input tool to canvas
Select "CSV" as the data source
Click "Browse" and select your file
Adjust separator if not comma-delimited
Wire the output to downstream tools

Loading an Excel File

Drag Input tool to canvas
Select "Excel" as the data source
Click "Browse" and select your .xlsx, .xls, or .xlsb file
Select the sheet to read (by number or name)
Wire the output to downstream tools

Loading a Parquet File

Drag Input tool to canvas
Select "Parquet" as the data source
Click "Browse" and select your file
Wire the output to downstream tools

Loading an IPC/Arrow File

Drag Input tool to canvas
Select "IPC / Arrow" as the data source
Click "Browse" and select your .arrow, .ipc, or .feather file
Wire the output to downstream tools

Loading an Avro File

Drag Input tool to canvas
Select "Avro" as the data source
Click "Browse" and select your .avro file
Optionally limit rows with the "N Rows" option
Wire the output to downstream tools

Loading an ODS File

Drag Input tool to canvas
Select "ODS" as the data source
Click "Browse" and select your .ods file
Select the sheet to read (by number or name)
Configure options like drop empty rows/columns
Wire the output to downstream tools

Querying a PostgreSQL Database

Drag Input tool to canvas
Select "Database" as the data source category
Select "PostgreSQL" as the database type
Choose an existing connection or create one via Settings > Connections
Enter your SQL query
Wire the output to downstream tools

Notes

Lazy loading (Scan mode) is recommended for CSV, Parquet, IPC/Arrow, and NDJSON when working with large files
Excel, Avro, ODS, standard JSON, and YXDB files are always loaded entirely into memory - consider converting large files to CSV, Parquet, or IPC for better performance
IPC/Arrow is the fastest format and ideal for intermediate data or sharing between Arrow-compatible applications
NDJSON (.jsonl, .ndjson) supports lazy scanning unlike standard .json files
YXDB is read using OpenYXDB's Rust core for high performance
Avro is best for interoperability with big data ecosystems (Hadoop, Kafka, Spark)
ODS is read-only in Sigilweaver Loom (Polars limitation) - use Excel Output or convert to other formats for saving
All database types (PostgreSQL, MySQL, SQLite, MSSQL) use the connectorx engine for high-performance reads
Database credentials are never stored in workflow files - only connection IDs are saved
Schema inference may not always be accurate - use Select to cast types if needed
File paths must be accessible from where Sigilweaver Loom is running
