DazzleDuck SQL Logger

File-based log ingestion pipeline powered by Apache Arrow and DazzleDuck SQL Server.

Overview

DazzleDuck SQL Logger is a production-grade log ingestion pipeline that tails JSON log files on disk, converts the records into Apache Arrow format, and sends them to DazzleDuck SQL Server over HTTP, where they are persisted as Parquet for durable storage and analytics.

Unlike traditional logging systems that write line-oriented text files, this module produces structured, columnar log data that is immediately analytics-ready.

What This Module Does

Watches a directory for log files (*.log)
Safely tails newly appended log lines
Parses one-JSON-object-per-line log records
Converts records into Apache Arrow batches
Sends Arrow streams to DazzleDuck HTTP ingestion (/v1/ingest)
Persists logs as Parquet in the warehouse
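
The tailing step above can be illustrated with a minimal sketch. It is not the module's LogFileTailReader; the class name TailSketch and its method are hypothetical, and it assumes UTF-8 files small enough that each newly appended chunk fits in memory. It remembers a byte offset per file and advances only past the last complete line, so a record that is still being written is picked up on the next poll.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of incremental tailing; not the module's own reader. */
public class TailSketch {
    // Byte offset already consumed, per file.
    private final Map<Path, Long> offsets = new HashMap<>();

    /** Returns the complete lines appended to the file since the previous call. */
    public List<String> readNewLines(Path file) throws IOException {
        long start = offsets.getOrDefault(file, 0L);
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            if (length < start) {
                start = 0; // file was truncated or replaced: read it from the beginning
            }
            // Assumption: the unread chunk fits in an int-sized buffer.
            byte[] buf = new byte[(int) (length - start)];
            raf.seek(start);
            raf.readFully(buf);

            int lineStart = 0;
            for (int i = 0; i < buf.length; i++) {
                if (buf[i] == '\n') {
                    String line = new String(buf, lineStart, i - lineStart, StandardCharsets.UTF_8);
                    if (line.endsWith("\r")) {
                        line = line.substring(0, line.length() - 1);
                    }
                    if (!line.isEmpty()) {
                        lines.add(line);
                    }
                    lineStart = i + 1;
                }
            }
            // Skip only what ended in a newline; a partial trailing line is re-read later.
            offsets.put(file, start + lineStart);
        }
        return lines;
    }
}
```

Calling readNewLines on each *.log file at a fixed polling interval would yield every record once. A production reader would also need to persist offsets across restarts and handle file rotation, which this sketch does not attempt.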

Core Components

Log Processing

LogFileTailReader: Detects new files and incrementally tails appended lines without rereading old data.
LogTailToArrowProcessor: Orchestrates tailing → JSON parsing → Arrow conversion → sending.
JsonToArrowConverter: Converts validated JSON log records into Arrow vectors using a fixed schema.
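
To make the fixed-schema conversion concrete, here is a conceptual sketch of pivoting row-oriented records into typed columns. It deliberately uses plain Java arrays as a stand-in for Arrow vectors, so it shows the idea rather than the Arrow API or the actual JsonToArrowConverter. The class ColumnarSketch and the three fields (timestamp, level, message) are assumptions chosen for illustration; the module's real schema is not stated here.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of a row-to-column pivot under a fixed schema. */
public class ColumnarSketch {

    /** One array per column; stands in for an Arrow record batch. */
    public static final class Batch {
        public final long[] timestamp;
        public final String[] level;
        public final String[] message;

        public Batch(int rows) {
            timestamp = new long[rows];
            level = new String[rows];
            message = new String[rows];
        }
    }

    // Validates that a field is present and is a string.
    private static String requireString(Map<String, Object> record, String field, int row) {
        Object value = record.get(field);
        if (!(value instanceof String)) {
            throw new IllegalArgumentException("row " + row + ": '" + field + "' must be a string");
        }
        return (String) value;
    }

    /** Converts already-parsed records into columns, rejecting records that violate the schema. */
    public static Batch toBatch(List<Map<String, Object>> records) {
        Batch batch = new Batch(records.size());
        for (int i = 0; i < records.size(); i++) {
            Map<String, Object> record = records.get(i);
            Object ts = record.get("timestamp");
            if (!(ts instanceof Number)) {
                throw new IllegalArgumentException("row " + i + ": 'timestamp' must be numeric");
            }
            batch.timestamp[i] = ((Number) ts).longValue();
            batch.level[i] = requireString(record, "level", i);
            batch.message[i] = requireString(record, "message", i);
        }
        return batch;
    }
}
```

Because every column has a single known type, a query that filters on level only has to scan one array instead of re-parsing each text line, which is the property the Arrow representation provides at scale.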

Sending & Ingestion

HttpProducer: Sends Arrow IPC streams to the DazzleDuck SQL Server HTTP ingestion endpoint. Handles batching, retries, backpressure, and JWT authentication.
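
As a rough sketch of what sending to the ingestion endpoint with retries might look like, the example below uses the JDK's java.net.http client. It is not HttpProducer itself. The class IngestSketch, the Bearer authorization scheme, the Content-Type value (the registered Arrow IPC stream media type), the 30-second timeout, and the retry policy are all assumptions; the real endpoint may also require query parameters or headers that are not shown.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical illustration of posting an Arrow IPC stream; not the module's HttpProducer. */
public class IngestSketch {

    /** Builds a POST to /v1/ingest carrying the serialized Arrow stream and a JWT. */
    public static HttpRequest buildRequest(String baseUrl, String jwt, byte[] arrowStream) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/v1/ingest"))
                .timeout(Duration.ofSeconds(30)) // assumed timeout
                .header("Content-Type", "application/vnd.apache.arrow.stream") // assumed media type
                .header("Authorization", "Bearer " + jwt) // assumed auth scheme
                .POST(HttpRequest.BodyPublishers.ofByteArray(arrowStream))
                .build();
    }

    /**
     * Sends the request, retrying on I/O errors and 5xx responses with exponential backoff.
     * Returns the final HTTP status code.
     */
    public static int sendWithRetry(HttpClient client, HttpRequest request, int maxAttempts)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long backoffMillis = 200;
        for (int attempt = 1; ; attempt++) {
            try {
                HttpResponse<Void> response =
                        client.send(request, HttpResponse.BodyHandlers.discarding());
                int status = response.statusCode();
                if (status < 500 || attempt == maxAttempts) {
                    return status; // success, client error, or retries exhausted
                }
            } catch (IOException e) {
                if (attempt == maxAttempts) {
                    throw e;
                }
            }
            Thread.sleep(backoffMillis);
            backoffMillis *= 2;
        }
    }
}
```

Retrying only on I/O failures and 5xx responses avoids resending a batch that the server has rejected as invalid, since a 4xx response would fail the same way on every attempt.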

Log Generation (Development & Testing)

SimpleLogGenerator: Static logs for unit tests
LogFileGenerator: Realistic rolling log files for end-to-end tests
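
For readers who want to feed the pipeline by hand, the sketch below writes records in the one-JSON-object-per-line format the tailer expects. It is not either of the generators listed above; the class LogLineSketch and the field names (timestamp, level, message) are hypothetical. Escaping matters here because a raw newline inside a message would split one record across two lines.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration of producing one-JSON-object-per-line log records. */
public class LogLineSketch {

    // Escapes characters that are not allowed unescaped inside a JSON string.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Formats a single record as one line of JSON with no embedded newlines. */
    public static String toJsonLine(long timestampMillis, String level, String message) {
        return "{\"timestamp\":" + timestampMillis
                + ",\"level\":\"" + escape(level)
                + "\",\"message\":\"" + escape(message) + "\"}";
    }

    /** Appends the record to the file, creating the file if needed. */
    public static void append(Path file, String jsonLine) throws IOException {
        Files.writeString(file, jsonLine + "\n", StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```

Appending a few such lines to a file ending in .log inside the watched directory should be enough to exercise the tailing and parsing stages, assuming the records match the schema the converter expects.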

Why Arrow-Based Log Ingestion?

Traditional logging is:

❌ Line-based
❌ Text-only
❌ Expensive to parse
❌ Poor for analytics

Arrow-based ingestion is:

✅ Columnar
✅ Typed & schema-driven
✅ Zero-copy friendly
✅ Analytics-ready
✅ Efficient at scale

When to Use This

Use DazzleDuck SQL Logger when you need:

File-based log ingestion
Real-time or near-real-time analytics
Arrow-native transport
Reliable end-to-end validation
Logs stored as Parquet for SQL analytics
