Partition Pruning
Predicate-driven pruning for Hive and Delta Lake datasets.
Overview
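Partition pruning analyzes SQL predicates and eliminates partitions that cannot contain matching rows before the query executes.
Because skipped partitions are never read, pruning reduces I/O and lowers query latency, most noticeably when the predicates select a small fraction of the partitions.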
Supported Layouts
- Hive-style partitions (key=value/)
- Delta Lake directories
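A Hive-style partition path encodes each partition column as a `key=value` directory segment. The following minimal Java sketch shows how such a path can be decoded into column/value pairs. The class and method names are hypothetical and not taken from this project's code, and the sketch assumes values are not percent-escaped (Hive escapes special characters in partition values, which this sketch does not decode).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HivePartitionPath {

    // Extracts partition columns from a path such as "/sales/year=2024/month=1/part-0000.parquet".
    // Segments without '=' (the table root, file names) are ignored.
    // Hypothetical helper: escaped characters in values are NOT decoded.
    public static Map<String, String> parse(String path) {
        Map<String, String> partitions = new LinkedHashMap<>(); // keeps on-disk column order
        for (String segment : path.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                partitions.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return partitions;
    }

    public static void main(String[] args) {
        System.out.println(parse("/sales/year=2024/month=1/part-0000.parquet")); // prints {year=2024, month=1}
    }
}
```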
How It Works
Parse SQL WHERE clause
↓
Extract partition predicates
↓
Resolve matching directories
↓
Exclude irrelevant partitions
↓
Execute reduced scan
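The five steps above can be sketched in plain Java under strong simplifying assumptions: the WHERE clause is a single-line conjunction of `column = literal` terms, partition values are compared as strings, and any `OR` disables pruning entirely. That last choice is deliberately conservative, since skipping a partition that could match would return wrong results. `PruningPipeline`, `extractPartitionPredicates`, `parsePartitions`, and `prune` are hypothetical names; this is not the implementation behind the tools listed below, and a production version would use a real SQL parser rather than regular expressions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PruningPipeline {

    // Matches a single `column = literal` term: a bare word/number or a single-quoted string.
    private static final Pattern EQUALITY =
        Pattern.compile("(\\w+)\\s*=\\s*('([^']*)'|\\w+)");

    // Steps 1-2: split the WHERE clause on AND and keep equality predicates on partition columns.
    static Map<String, String> extractPartitionPredicates(String where, List<String> partitionColumns) {
        Map<String, String> predicates = new LinkedHashMap<>();
        if (where.matches("(?i).*\\bOR\\b.*")) {
            return predicates; // disjunctions are not analyzed, so nothing is pruned
        }
        for (String term : where.trim().split("(?i)\\s+AND\\s+")) {
            Matcher m = EQUALITY.matcher(term.trim());
            if (m.matches() && partitionColumns.contains(m.group(1))) {
                // group(3) is the unquoted string literal; group(2) is a bare literal
                predicates.put(m.group(1), m.group(3) != null ? m.group(3) : m.group(2));
            }
        }
        return predicates;
    }

    // Step 3 helper: decode the `key=value` segments of a Hive-style directory path.
    static Map<String, String> parsePartitions(String dir) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String segment : dir.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                values.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return values;
    }

    // Steps 3-4: keep directories whose values satisfy every predicate; exclude the rest.
    static List<String> prune(List<String> directories, Map<String, String> predicates) {
        List<String> kept = new ArrayList<>();
        for (String dir : directories) {
            Map<String, String> values = parsePartitions(dir);
            boolean matches = predicates.entrySet().stream()
                .allMatch(p -> p.getValue().equals(values.get(p.getKey())));
            if (matches) {
                kept.add(dir);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> dirs = List.of(
            "/sales/year=2023/month=12",
            "/sales/year=2024/month=1",
            "/sales/year=2024/month=2");
        Map<String, String> preds =
            extractPartitionPredicates("year = 2024 AND month = 1", List.of("year", "month"));
        // Step 5: only the surviving directories would be passed to the scan.
        System.out.println(prune(dirs, preds)); // prints [/sales/year=2024/month=1]
    }
}
```

Comparing values as strings means a directory named `month=01` would not match `month = 1`; a fuller implementation would coerce both sides to the partition column's type before comparing.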
Example
Query:
SELECT * FROM sales WHERE year = 2024 AND month = 1
Scans only:
/sales/year=2024/month=1/*
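When equality predicates cover the leading partition columns, the scan location can be built directly instead of listing every directory. This sketch uses a hypothetical `ScanPrefix.scanPrefix` helper and assumes the partition columns are nested in a known order (`year` before `month`). It stops at the first column without a predicate, since deeper levels then have to be listed and filtered.

```java
import java.util.List;
import java.util.Map;

public class ScanPrefix {

    // Builds the narrowest directory glob implied by equality predicates.
    // Walks partition columns in on-disk order and stops at the first one with no predicate.
    static String scanPrefix(String tableRoot, List<String> partitionColumns, Map<String, String> equalities) {
        StringBuilder prefix = new StringBuilder(tableRoot);
        for (String column : partitionColumns) {
            String value = equalities.get(column);
            if (value == null) {
                break; // deeper levels cannot be fixed without this column's value
            }
            prefix.append('/').append(column).append('=').append(value);
        }
        return prefix.append("/*").toString();
    }

    public static void main(String[] args) {
        String prefix = scanPrefix("/sales", List.of("year", "month"), Map.of("year", "2024", "month", "1"));
        System.out.println(prefix); // prints /sales/year=2024/month=1/*
    }
}
```

With only `month = 1`, the prefix falls back to `/sales/*`, and month-level pruning then has to happen by filtering the listed directories.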
Tools
Hive
./mvnw exec:java -Dexec.mainClass="io.github.tanejagagan.sql.commons.hive.HivePartitionPruning"
Delta Lake
./mvnw exec:java -Dexec.mainClass="io.github.tanejagagan.sql.commons.delta.PartitionPruning"
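For Delta Lake, the transaction log records each data file's partition values in the `partitionValues` field of its `add` action, so pruning can operate on log entries rather than on directory names. The sketch below models that with a hypothetical `AddFile` record. It does not read a real `_delta_log`, is not the API of any Delta library, and requires Java 16+ for records.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DeltaFilePruning {

    // Hypothetical, simplified view of a Delta log `add` action:
    // the data file path plus its partition values (stored as strings in the log).
    record AddFile(String path, Map<String, String> partitionValues) {}

    // Keeps only files whose recorded partition values satisfy every equality predicate.
    static List<AddFile> prune(List<AddFile> files, Map<String, String> equalities) {
        return files.stream()
            .filter(f -> equalities.entrySet().stream()
                .allMatch(e -> e.getValue().equals(f.partitionValues().get(e.getKey()))))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<AddFile> files = List.of(
            new AddFile("year=2023/month=12/part-0.parquet", Map.of("year", "2023", "month", "12")),
            new AddFile("year=2024/month=1/part-0.parquet", Map.of("year", "2024", "month", "1")),
            new AddFile("year=2024/month=2/part-0.parquet", Map.of("year", "2024", "month", "2")));
        prune(files, Map.of("year", "2024", "month", "1"))
            .forEach(f -> System.out.println(f.path())); // prints year=2024/month=1/part-0.parquet
    }
}
```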
Production Benefits
- Reduced disk I/O
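- Faster queries, since less data is read
- Lower memory usage
- Scales to very large datasets, because scan cost depends on the partitions selected rather than on total table size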