Usage (Spark SQL)

This section demonstrates how to query DazzleDuck-managed data from Spark SQL.


Creating a Temporary View

CREATE TEMP VIEW t (key STRING, value STRING, p INT)
USING io.dazzleduck.sql.spark.ArrowRPCTableProvider
OPTIONS (
  url 'jdbc:arrow-flight-sql://localhost:59307?disableCertificateVerification=true&user=admin&password=admin',
  path '/local-data/parquet/kv',
  partition_columns 'p',
  connection_timeout 'PT60M'
);

Querying the View

SELECT * FROM t;

Spark will:

  • Request the available partitions from the server
  • Open parallel Arrow streams
  • Process the data with the Spark execution engine
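
Because the view declares `p` in `partition_columns`, filters on that column are natural candidates for partition pruning. A sketch of such a query (assuming the connector prunes partitions based on the declared partition columns):

```sql
-- Filter on the declared partition column p; the connector can restrict
-- the Arrow streams it opens to matching partitions (assumed behavior).
SELECT key, value
FROM t
WHERE p = 1;
```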

Working with DuckLake

Install & Load Extension

INSTALL ducklake;
LOAD ducklake;

Attach Catalog

ATTACH 'ducklake:/warehouse/metadata'
AS my_catalog (DATA_PATH '/warehouse/data');
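
Once attached, `my_catalog` behaves like a regular DuckDB database. A minimal sketch of populating it from DuckDB (the table and column names here are illustrative, not part of the integration):

```sql
-- Create and populate an example table in the attached DuckLake catalog.
-- my_catalog comes from the ATTACH statement above; kv is a made-up name.
CREATE TABLE my_catalog.kv (key VARCHAR, value VARCHAR, partition INTEGER);
INSERT INTO my_catalog.kv VALUES ('a', '1', 0);
SELECT * FROM my_catalog.kv;
```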

Create Spark View

CREATE TEMP VIEW t (key STRING, value STRING, partition INT)
USING io.dazzleduck.sql.spark.ArrowRPCTableProvider
OPTIONS (
  url 'jdbc:arrow-flight-sql://localhost:59307?useEncryption=false&user=admin&password=admin',
  database 'catalog_name',
  schema 'schema_name',
  table 'table_name',
  partition_columns 'partition'
);
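
With the view in place, the DuckLake-backed table can be queried from Spark SQL like any other view. A simple aggregate as an illustration (column names match the view definition above):

```sql
-- Aggregate over the DuckLake-backed view; rows stream in via Arrow
-- and the aggregation runs in Spark.
SELECT partition, count(*) AS row_count
FROM t
GROUP BY partition;
```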

Notes & Limitations

  • The integration is read-only; writes from Spark are not supported
  • The declared view schema must exactly match the remote table schema
  • Queries using unsupported Spark expressions fail fast with an error

Troubleshooting

Issue              | Resolution
-------------------|-----------------------------------------------------
Connection refused | Check that the server is running and the port is reachable
TLS errors         | Disable certificate verification (development only)
Timeout            | Increase connection_timeout