Ingestion Utilities

Shared helpers for Arrow-based ingestion, partitioned writes, and Parquet output.

Overview

Ingestion utilities provide reusable components used by:

They coordinate Arrow readers, partition logic, transformations, and file output.

| Component | Purpose |
| --- | --- |
| `ParquetIngestionQueue` | Buffered, batched Parquet writes |
| `BulkIngestQueue` | Time / size-based batching |
| `PostIngestionTask` | Metadata and registration hooks |
| `MappedReader` | Column transformation |
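
To make the time / size-based batching idea concrete, here is a minimal sketch in Python. It is not the actual `BulkIngestQueue` implementation: the class name `BatchBuffer`, its parameters, and the default thresholds are all hypothetical, and the age check runs only when `add` or `maybe_flush` is called rather than on a background timer.

```python
import time
from typing import Any, Callable, List, Optional


class BatchBuffer:
    """Hypothetical stand-in for a time/size-based batching queue."""

    def __init__(
        self,
        flush: Callable[[List[Any]], None],
        max_items: int = 1000,          # assumed size threshold
        max_age_seconds: float = 5.0,   # assumed time threshold
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._flush = flush
        self._max_items = max_items
        self._max_age = max_age_seconds
        self._clock = clock
        self._items: List[Any] = []
        self._first_at: Optional[float] = None  # arrival time of oldest buffered item

    def add(self, item: Any) -> None:
        """Buffer one item, then flush if a threshold has been reached."""
        if not self._items:
            self._first_at = self._clock()
        self._items.append(item)
        self.maybe_flush()

    def maybe_flush(self) -> bool:
        """Flush if the batch is full or its oldest item is too old."""
        if not self._items:
            return False
        too_big = len(self._items) >= self._max_items
        too_old = self._clock() - self._first_at >= self._max_age
        if too_big or too_old:
            self.flush_now()
            return True
        return False

    def flush_now(self) -> None:
        """Hand the current batch to the flush callback and reset the buffer."""
        if self._items:
            batch, self._items = self._items, []
            self._first_at = None
            self._flush(batch)
```

A caller would pass a `flush` callback that performs the actual write and call `maybe_flush` periodically so that a small, idle batch is still written once it ages out. Injecting `clock` keeps the time-based path testable without sleeping.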

Writes occur when:

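Ingestion uses DuckDB's native COPY statement. The query appends a computed partition column, sorts by it, and writes Parquet files partitioned on that column; RETURN_FILES asks DuckDB to report the files it wrote: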

COPY (
  SELECT *, expr AS col
  FROM read_arrow([...])
  ORDER BY col
)
TO 'path'
(FORMAT parquet, PARTITION_BY(col), RETURN_FILES);
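
The statement above is a template, so one way to fill it in is with a small string builder. The sketch below is an assumption about how that could look, not the project's own code: `build_partitioned_copy`, `quote_ident`, and `quote_literal` are hypothetical helpers, and `source_sql` and `partition_expr` are inserted verbatim, so they must come from trusted code rather than user input.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote an SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_partitioned_copy(
    source_sql: str,      # trusted table expression to read from
    partition_expr: str,  # trusted SQL expression that computes the partition value
    partition_col: str,   # name given to the computed partition column
    out_path: str,        # output directory for the partitioned Parquet files
) -> str:
    """Assemble a COPY statement with the same shape as the template above."""
    col = quote_ident(partition_col)
    return (
        "COPY (\n"
        f"  SELECT *, {partition_expr} AS {col}\n"
        f"  FROM {source_sql}\n"
        f"  ORDER BY {col}\n"
        ")\n"
        f"TO {quote_literal(out_path)}\n"
        f"(FORMAT parquet, PARTITION_BY ({col}), RETURN_FILES);"
    )


# Example with made-up names: a staged table partitioned by calendar date.
statement = build_partitioned_copy(
    source_sql="staged_events",
    partition_expr="CAST(ts AS DATE)",
    partition_col="event_date",
    out_path="out/events",
)
print(statement)
```

The resulting string could then be passed to a DuckDB connection for execution. Sorting by the partition column before writing keeps rows for the same partition adjacent, which is presumably why the template includes the ORDER BY.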