Usage (Spark SQL)

This section demonstrates how to query DazzleDuck-managed data from Spark SQL.


Creating a Temporary View

CREATE TEMP VIEW t (key STRING, value STRING, p INT)
USING io.dazzleduck.sql.spark.ArrowRPCTableProvider
OPTIONS (
  url 'jdbc:arrow-flight-sql://localhost:59307?disableCertificateVerification=true&user=admin&password=admin',
  path '/local-data/parquet/kv',
  partition_columns 'p',
  connection_timeout 'PT60M'
);

Querying the View

SELECT * FROM t;

Spark will:

  • Request the available partitions from the server
  • Open parallel Arrow streams
  • Process the data with the Spark execution engine
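
Because the view declares `p` in `partition_columns`, filters on that column are natural candidates for partition pruning. A sketch of such a query (assuming the connector prunes partitions based on the declared partition columns):

```sql
-- Filter on the declared partition column p; the connector can restrict
-- the Arrow streams it opens to matching partitions (assumed behavior).
SELECT key, value
FROM t
WHERE p = 1;
```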

Working with DuckLake

Install & Load Extension

INSTALL ducklake;
LOAD ducklake;

Attach Catalog

ATTACH 'ducklake:/warehouse/metadata'
AS my_catalog (DATA_PATH '/warehouse/data');
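
Once attached, `my_catalog` behaves like a regular DuckDB database. A minimal sketch of populating it from DuckDB (the table and column names here are illustrative, not part of the integration):

```sql
-- Create and populate an example table in the attached DuckLake catalog.
-- my_catalog comes from the ATTACH statement above; kv is a made-up name.
CREATE TABLE my_catalog.kv (key VARCHAR, value VARCHAR, partition INTEGER);
INSERT INTO my_catalog.kv VALUES ('a', '1', 0);
SELECT * FROM my_catalog.kv;
```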

Create Spark View

CREATE TEMP VIEW t (key STRING, value STRING, partition INT)
USING io.dazzleduck.sql.spark.ArrowRPCTableProvider
OPTIONS (
  url 'jdbc:arrow-flight-sql://localhost:59307?useEncryption=false&user=admin&password=admin',
  database 'catalog_name',
  schema 'schema_name',
  table 'table_name',
  partition_columns 'partition'
);
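
With the view in place, the DuckLake-backed table can be queried from Spark SQL like any other view. A simple aggregate as an illustration (column names match the view definition above):

```sql
-- Aggregate over the DuckLake-backed view; rows stream in via Arrow
-- and the aggregation runs in Spark.
SELECT partition, count(*) AS row_count
FROM t
GROUP BY partition;
```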

Notes & Limitations

  • The integration is read-only; writes from Spark are not supported
  • The declared view schema must exactly match the remote table schema
  • Queries using unsupported Spark expressions fail fast with an error

Troubleshooting

Issue              | Resolution
-------------------|-----------------------------------------------------
Connection refused | Check that the server is running and the port is reachable
TLS errors         | Disable certificate verification (development only)
Timeout            | Increase connection_timeout