Stats Pruning
Vortex uses statistics to prove when a filter cannot match a row group, zone, or
file. The proof expression returns true when the input can be skipped. It
returns false or null when pruning is not proven.
Both false and null are non-pruning outcomes, but they mean different
things. false means the available stats disproved the skip proof. null means
the proof was unknown, usually because a required stat was missing or inexact.
The pruning pipeline has two phases:
1. Expression::falsify(scope, session) asks the session’s StatsRewriteRules to rewrite a filter into an abstract proof expression. Rules describe semantics in terms of vortex.stat(input, aggregate_fn) placeholders. These placeholders name the statistic needed by the proof, but not where that statistic is stored.
2. bind_stats lowers those abstract stat placeholders with a StatBinder. The binder maps stats to the representation used by the caller, such as zone-map table fields, file-level stat literals, or typed null literals for missing stats.
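The two phases can be sketched with a toy model. This is not the Vortex API: GreaterThan, AbstractProof, BoundProof, toy_falsify, toy_bind_stats, and evaluate are hypothetical names, and the only rewrite shown is the assumed rule that `column > value` cannot match when `max(column) <= value`.

```rust
use std::collections::HashMap;

/// Hypothetical aggregate kinds a proof may ask for.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Stat {
    Min,
    Max,
}

/// Toy filter: `column > value`.
struct GreaterThan {
    column: &'static str,
    value: i64,
}

/// Abstract proof `stat(column) <= value`. It names the stat it needs,
/// but says nothing about where that stat is stored.
#[derive(Debug, PartialEq)]
struct AbstractProof {
    column: &'static str,
    stat: Stat,
    value: i64,
}

/// Proof after binding: the placeholder is replaced by a concrete value,
/// or by `None` (a null) when the stat is missing.
#[derive(Debug, PartialEq)]
struct BoundProof {
    stat_value: Option<i64>,
    value: i64,
}

/// Phase 1 (assumed rule): `column > value` cannot match when `max(column) <= value`.
fn toy_falsify(filter: &GreaterThan) -> AbstractProof {
    AbstractProof { column: filter.column, stat: Stat::Max, value: filter.value }
}

/// Phase 2: resolve the placeholder against whatever stats the caller holds.
fn toy_bind_stats(
    proof: &AbstractProof,
    stats: &HashMap<(&'static str, Stat), i64>,
) -> BoundProof {
    BoundProof {
        stat_value: stats.get(&(proof.column, proof.stat)).copied(),
        value: proof.value,
    }
}

/// Evaluate the bound proof: `Some(true)` proves the skip, `None` is unknown.
fn evaluate(proof: &BoundProof) -> Option<bool> {
    proof.stat_value.map(|max| max <= proof.value)
}
```

In this sketch the same abstract proof can be bound against different stat sources, which is the reason the text gives for keeping the rewrite and the binding separate.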
Missing stats lower to typed null literals. This preserves the three-valued
logic used by pruning: only a non-null true value proves that the scope can be
skipped. A missing stat therefore cannot accidentally prune data.
Binding Targets
Zone maps bind stats to fields in their per-zone stats table. The lowered
expression is evaluated against that table and produces a mask where true
means the zone can be skipped.
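A rough illustration of the zone-map case, assuming a stats table with one nullable max value per zone and the proof `max <= bound` for a filter `column > bound`. ZoneStats and skip_mask are hypothetical names, not Vortex types.

```rust
/// Hypothetical row of a per-zone stats table for one integer column.
struct ZoneStats {
    /// `None` models a missing or inexact max stat for the zone.
    max: Option<i64>,
}

/// Evaluates the proof `max <= bound` against every zone.
/// `true` in the result means the zone can be skipped.
fn skip_mask(zones: &[ZoneStats], bound: i64) -> Vec<bool> {
    zones
        .iter()
        // A null proof result (missing stat) collapses to `false`: no pruning.
        .map(|zone| zone.max.map(|m| m <= bound) == Some(true))
        .collect()
}
```

For zones with max values 5, 20, and a missing stat, and a bound of 10, only the first zone is marked as skippable.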
File-level stats bind stats to literal values from the file footer. The lowered
expression is reduced and evaluated once for the full file. If it evaluates to
true, the file stats reader can return an all-false pruning mask without
reading child layouts.
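The file-level short circuit might look like the sketch below. It assumes a single file-wide max literal, the proof `max <= bound`, and a closure standing in for the work of reading child layouts; file_pruning_mask and read_children are hypothetical names.

```rust
/// Returns the pruning mask for a file with `row_count` rows.
/// `read_children` stands in for evaluating the filter through child layouts.
fn file_pruning_mask(
    file_max: Option<i64>,
    bound: i64,
    row_count: usize,
    read_children: impl FnOnce() -> Vec<bool>,
) -> Vec<bool> {
    // The proof is evaluated once for the whole file.
    let proof = file_max.map(|m| m <= bound);
    if proof == Some(true) {
        // Skip proven: return an all-false mask without touching the children.
        vec![false; row_count]
    } else {
        // Proof is false or null: fall through to the child layouts.
        read_children()
    }
}
```

The closure is only invoked when the proof is not a definite true, which mirrors the point that a proven skip avoids reading child layouts at all.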
For the layout model around these pruning points, see Layouts and Scanning.