Benchmarking#
Vortex has two categories of benchmarks: microbenchmarks for individual operations, and SQL benchmarks for end-to-end query performance.
Microbenchmarks#
Microbenchmarks use the Divan framework and live in benches/ directories within individual crates.
Run microbenchmarks for a specific crate with:
cargo bench -p <crate-name>
Best Practices#
Separate setup from profiled code#
Always use bencher.with_inputs(|| ...) so fixture construction is excluded from timing:
bencher
    .with_inputs(|| bench_fixture())
    .bench_refs(|(array, indices)| {
        array.take(indices.to_array()).unwrap()
    });
Exclude Drop from measurements#
Divan measures only the closure body, not the Drop of its return value.
Structure your benchmark so that expensive drops happen via the return value or
via bench_refs inputs.
Return the value from the closure — Divan will drop it after timing stops:
bencher
    .with_inputs(|| make_big_vec())
    .bench_values(|v| transform(v)); // drop of the result is NOT timed
Use bench_refs — the input is dropped after the entire sample loop, not per-iteration:
bencher
    .with_inputs(|| make_big_vec())
    .bench_refs(|v| v.sort()); // v is dropped outside the timed region
Black-box inputs to prevent compiler optimization#
The compiler can constant-fold or eliminate work if it can prove that inputs are known at compile time.
Values provided through with_inputs are automatically black-boxed by Divan — no action
needed:
// ✓ `array` and `indices` are automatically black-boxed by Divan
bencher
.with_inputs(|| (&prebuilt_array, &prebuilt_indices))
.bench_refs(|(array, indices)| array.take(indices.to_array()).unwrap());
Captured variables#
Variables captured from the surrounding scope are not black-boxed. Wrap them with
divan::black_box() or pass them through with_inputs instead:
let array = make_array();
// ✗ `array` is captured — the compiler may optimize based on its known contents
bencher.bench(|| process(&array));
// ✓ Option A: pass through with_inputs
bencher
.with_inputs(|| &array)
.bench_refs(|array| process(array));
// ✓ Option B: explicit black_box on the capture
bencher.bench(|| process(divan::black_box(&array)));
Return values and manual loops#
Return values are automatically black-boxed. You only need explicit
black_box for side-effect-free results inside manual loops:
bencher.with_inputs(|| &array).bench_refs(|array| {
for idx in 0..len {
divan::black_box(array.scalar_at(idx).unwrap());
}
});
Use deterministic, seeded RNG#
Always use StdRng::seed_from_u64(N) for reproducible data generation:
let mut rng = StdRng::seed_from_u64(0);
Parameterize with args, consts, and types#
Use Divan’s parameterization features and define parameter arrays as named constants:
const NUM_INDICES: &[usize] = &[1_000, 10_000, 100_000];
const VECTOR_SIZE: &[usize] = &[16, 256, 2048, 8192];
#[divan::bench(args = NUM_INDICES, consts = VECTOR_SIZE)]
fn my_bench<const N: usize>(bencher: Bencher, num_indices: usize) { ... }
Keep per-iteration execution time under ~1 ms#
Each individual iteration of the benchmarked closure should complete in under ~1 ms. This keeps benchmark runs fast, both locally and on CI.
Gate CodSpeed-incompatible benchmarks#
Use #[cfg(not(codspeed))] for benchmarks that are incompatible with CodSpeed.
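As an illustration, gating might look like the following sketch (the benchmark name and body are hypothetical; the cfg flag is the one named above):

```rust
// Hypothetical example: a benchmark dominated by file I/O, whose kernel-side
// cost CodSpeed's user-space simulation would not capture.
#[cfg(not(codspeed))]
#[divan::bench]
fn read_file(bencher: divan::Bencher) {
    bencher.bench(|| std::fs::read("data.bin").unwrap());
}
```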
CodSpeed’s single-run model#
CI benchmarks run under CodSpeed’s CPU simulation, which executes each benchmark exactly once and estimates CPU cycles from the instruction trace — including cache and memory access costs. This has several implications:
sample_count and sample_size have no effect — CodSpeed always runs one iteration.
Results are deterministic — the simulated cycle count is derived from the instruction trace, not wall-clock time, so there is no noise from system load or scheduling.
System calls are excluded — CodSpeed only measures user-space code. Benchmarks that rely on I/O or kernel interactions will not reflect those costs, so they should use the walltime instrument or be gated with #[cfg(not(codspeed))].
Prefer mimalloc for throughput benchmarks#
Throughput benchmarks should use mimalloc as the global allocator to reduce system allocator
noise:
use mimalloc::MiMalloc;
#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;
SQL Benchmarks#
SQL benchmarks measure end-to-end query performance across different engines and file formats.
The vortex-bench crate provides a common Benchmark trait that each benchmark suite
implements, defining its queries, data generation, and expected results.
Available suites include TPC-H, TPC-DS, ClickBench, FineWeb, and others. Each suite can be run against multiple engines (DataFusion, DuckDB) and formats (Parquet, Vortex, Vortex Compact, Lance, DuckDB native).
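The actual trait surface lives in vortex-bench and is not reproduced here; purely as an illustration, a toy suite implementing a hypothetical, simplified version of such a trait might look like:

```rust
// Hypothetical, simplified sketch of a benchmark-suite trait.
// Method names and types are illustrative, not the vortex-bench API.
trait Benchmark {
    /// Human-readable suite name, e.g. "tpch".
    fn name(&self) -> &str;
    /// The SQL queries this suite runs, in order.
    fn queries(&self) -> Vec<String>;
    /// Expected row counts per query, used to validate results.
    fn expected_row_counts(&self) -> Vec<usize>;
}

struct TinySuite;

impl Benchmark for TinySuite {
    fn name(&self) -> &str {
        "tiny"
    }
    fn queries(&self) -> Vec<String> {
        vec!["SELECT 1".to_string(), "SELECT 2".to_string()]
    }
    fn expected_row_counts(&self) -> Vec<usize> {
        vec![1, 1]
    }
}

fn main() {
    let suite = TinySuite;
    // Every query should have a corresponding expected result.
    assert_eq!(suite.queries().len(), suite.expected_row_counts().len());
    println!("{} has {} queries", suite.name(), suite.queries().len());
    // prints: tiny has 2 queries
}
```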
Data Generation#
Before running SQL benchmarks, test data must be generated:
cargo run --release --bin data-gen -- <benchmark> --formats parquet,vortex
The data generator creates base Parquet data and converts it to each requested format. Scale
factors are configurable per suite (e.g. --opt scale-factor=10.0 for TPC-H SF=10).
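Putting the pieces together, a concrete invocation for TPC-H at SF=10 might look like the following (the suite name tpch is an assumption based on the option above; check the data-gen help output for the exact name):

```shell
# Generate TPC-H base data at scale factor 10, in Parquet and Vortex formats.
# `tpch` as the suite name is assumed, not confirmed by this page.
cargo run --release --bin data-gen -- tpch --formats parquet,vortex --opt scale-factor=10.0
```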
Running SQL Benchmarks#
SQL benchmarks can be run directly via their per-engine binaries:
cargo run --release --bin datafusion-bench -- <benchmark>
cargo run --release --bin duckdb-bench -- <benchmark>
Orchestrator#
The bench-orchestrator is a Python CLI tool (vx-bench) that coordinates running SQL
benchmarks across multiple engines, stores results, and provides comparison tooling.
See bench-orchestrator/README.md for installation,
commands, and example workflows.
CI Benchmarks#
Benchmarks run automatically on all commits to develop and can be run on-demand for PRs:
Post-commit – compression, random access, and SQL benchmarks run on every commit to develop, with results uploaded for historical tracking.
PR benchmarks – triggered by the action/benchmark label. Results are compared against the latest develop run and posted as a PR comment.
SQL benchmarks – triggered by the action/benchmark-sql label. Runs a parametric matrix of suites, engines, formats, and storage backends (NVMe, S3).
All CI benchmarks run on dedicated instances with the release_debug profile and
-C target-cpu=native to produce representative numbers.
Results can be viewed at bench.vortex.dev.