Partition Pruning

Predicate-driven pruning for Hive and Delta Lake datasets.


Overview

Partition pruning analyzes SQL predicates and eliminates irrelevant partitions before query execution.

Because files in the excluded partitions are never read, the query performs less I/O and returns sooner.


Supported Layouts

  • Hive-style partitions (key=value/)
  • Delta Lake directories

How It Works

  1. Parse the SQL WHERE clause
  2. Extract the predicates that reference partition columns
  3. Resolve the directories that match those predicates
  4. Exclude all other partitions
  5. Execute the reduced scan
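
The resolve and exclude steps can be sketched in a few lines of Java. This is an illustration only, not the API of the tools listed under Tools: the class name PruningSketch and its methods are hypothetical, and it assumes the partition predicates have already been extracted from the WHERE clause as simple column-to-value equalities.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the implementation used by HivePartitionPruning.
final class PruningSketch {

    // Reads the key=value segments of a Hive-style path,
    // e.g. "/sales/year=2024/month=1" becomes {year=2024, month=1}.
    static Map<String, String> partitionValues(String path) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String segment : path.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                values.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return values;
    }

    // Keeps only the directories whose partition values satisfy every
    // equality predicate. Assumes each predicate names a partition column.
    static List<String> prune(List<String> directories, Map<String, String> predicates) {
        List<String> kept = new ArrayList<>();
        for (String dir : directories) {
            Map<String, String> values = partitionValues(dir);
            boolean matches = true;
            for (Map.Entry<String, String> predicate : predicates.entrySet()) {
                if (!predicate.getValue().equals(values.get(predicate.getKey()))) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                kept.add(dir);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> dirs = List.of(
                "/sales/year=2023/month=12",
                "/sales/year=2024/month=1",
                "/sales/year=2024/month=2");
        Map<String, String> predicates = Map.of("year", "2024", "month", "1");
        System.out.println(prune(dirs, predicates)); // prints [/sales/year=2024/month=1]
    }
}
```

Only equality predicates are handled here. Range or OR predicates would need a real expression evaluator in place of the map lookup.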

Example

Query:

SELECT * FROM sales WHERE year = 2024 AND month = 1

Scans only:

/sales/year=2024/month=1/*
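
As a rough illustration of how the query above maps to that path, the following Java sketch pulls the equality predicates out of the WHERE clause with a regular expression and builds the scan pattern from them. Everything in it is an assumption made for the example: the class ExampleScanPath, its method names, and the regex shortcut, which only understands AND-ed "column = literal" comparisons and is no substitute for a real SQL parser.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; a real implementation would use a SQL parser.
final class ExampleScanPath {

    // Matches simple "column = literal" comparisons, with optional single quotes.
    private static final Pattern EQUALITY =
            Pattern.compile("(\\w+)\\s*=\\s*'?(\\w+)'?");

    // Collects column -> literal pairs from the text after WHERE.
    // Assumes the predicates are joined with AND only.
    static Map<String, String> equalityPredicates(String sql) {
        Map<String, String> predicates = new LinkedHashMap<>();
        int where = sql.toUpperCase(Locale.ROOT).indexOf(" WHERE ");
        if (where < 0) {
            return predicates;
        }
        Matcher m = EQUALITY.matcher(sql.substring(where + " WHERE ".length()));
        while (m.find()) {
            predicates.put(m.group(1), m.group(2));
        }
        return predicates;
    }

    // Builds a Hive-style scan pattern; a partition column without a
    // predicate becomes a wildcard so that all of its values are scanned.
    static String scanPath(String tableRoot, List<String> partitionColumns,
                           Map<String, String> predicates) {
        StringBuilder path = new StringBuilder(tableRoot);
        for (String column : partitionColumns) {
            String value = predicates.get(column);
            path.append('/').append(column).append('=')
                .append(value == null ? "*" : value);
        }
        return path.append("/*").toString();
    }

    public static void main(String[] args) {
        String sql = "SELECT * FROM sales WHERE year = 2024 AND month = 1";
        Map<String, String> predicates = equalityPredicates(sql);
        // prints /sales/year=2024/month=1/*
        System.out.println(scanPath("/sales", List.of("year", "month"), predicates));
    }
}
```

If the query filtered on year alone, the same sketch would produce /sales/year=2024/month=*/*, so every month of 2024 would still be scanned.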

Tools

Hive

./mvnw exec:java -Dexec.mainClass="io.github.tanejagagan.sql.commons.hive.HivePartitionPruning"

Delta Lake

./mvnw exec:java -Dexec.mainClass="io.github.tanejagagan.sql.commons.delta.PartitionPruning"

Production Benefits

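  • Reduced disk I/O: files in excluded partitions are never opened
  • Faster queries: less data to read and filter
  • Lower memory usage: fewer files and rows enter the scan
  • Scales to very large datasets: scan cost follows the partitions a query matches, not the total table size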