Crate Architecture
The Vortex workspace is organized as a Rust monorepo with four main groups: core crates, encodings, language bindings, and query engine integrations.
The vortex Crate
The vortex crate is the main entry point for all external consumers. It re-exports core
functionality and bundles the standard set of encodings. All integrations and third-party
encodings should depend only on this crate, not directly on internal crates such as vortex-array or
vortex-file.
This single-dependency design ensures:

- A stable API surface for external consumers.
- Freedom to refactor internal crate boundaries without breaking downstream code.
- Consistent versioning across the ecosystem.
Third-party encodings implement their vtables against types re-exported from vortex, and
query engine integrations build on the file reading and scan APIs exposed through it.
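The facade arrangement described above can be illustrated in a single file. This is a minimal sketch, not Vortex code: the modules `internal_array` and `internal_file`, the type `PrimitiveArray`, and the function `read_all` are hypothetical stand-ins for separate internal crates, and the `vortex` module plays the role of the umbrella crate that re-exports them.

```rust
// Hypothetical internal "crates", modelled as modules so the sketch fits in one file.
mod internal_array {
    /// Stand-in for an array type owned by an internal crate.
    #[derive(Debug, Clone, PartialEq)]
    pub struct PrimitiveArray {
        pub values: Vec<i64>,
    }

    impl PrimitiveArray {
        pub fn new(values: Vec<i64>) -> Self {
            Self { values }
        }

        pub fn len(&self) -> usize {
            self.values.len()
        }
    }
}

mod internal_file {
    use crate::internal_array::PrimitiveArray;

    /// Stand-in for a reader owned by a different internal crate.
    pub fn read_all() -> PrimitiveArray {
        PrimitiveArray::new(vec![1, 2, 3])
    }
}

/// The umbrella module: the only path downstream code is meant to name.
pub mod vortex {
    pub use crate::internal_array::PrimitiveArray;

    pub mod file {
        pub use crate::internal_file::read_all;
    }
}

// Downstream code imports exclusively through the umbrella module.
use vortex::file::read_all;
use vortex::PrimitiveArray;

fn main() {
    let array: PrimitiveArray = read_all();
    println!("read {} values", array.len());
}
```

Because downstream code only names paths under `vortex`, the internal modules could be split, merged, or renamed without touching that code, as long as the re-exports keep the same public paths.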
Vortex Core
The core crates provide the foundation for the Vortex type system, array representation, file format, and I/O.

| Crate | Role |
|---|---|
| | |
| | Zero-copy aligned |
| | |
| | Single-value representations of each dtype |
| | Bitmask operations for validity and selection |
| | Session object holding registries for encodings, layouts, and extension types |
| | |
| | Async I/O abstraction (local filesystem, object store, HTTP) |
| | Layout traits and built-in layouts (Flat, Struct, Chunked) |
| | IPC format for inter-process communication |
| | |
| | Table scan with filter and projection pushdown |
| | Expression representation and optimization |
| | FlatBuffer schema definitions |
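To make "table scan with filter and projection pushdown" and "bitmask operations for selection" more concrete, the sketch below evaluates a predicate against a single column to build a row-selection mask, then materializes only the projected columns for the selected rows. It is a toy over in-memory vectors and assumes nothing about the real scan API: `Table`, `filter_mask`, `scan`, and the `Vec<bool>` mask are hypothetical, and a real implementation would presumably use a packed bitmask and read columns lazily from storage.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for a file: named `i64` columns of equal length.
struct Table {
    columns: BTreeMap<String, Vec<i64>>,
}

/// Row-selection mask, one `bool` per row (a packed bitmap in practice).
type Mask = Vec<bool>;

impl Table {
    /// Filter pushdown: evaluate the predicate against one column only.
    fn filter_mask(&self, column: &str, predicate: impl Fn(i64) -> bool) -> Mask {
        self.columns[column].iter().map(|&v| predicate(v)).collect()
    }

    /// Projection pushdown: materialize only the requested columns,
    /// keeping only the rows selected by `mask`.
    fn scan(&self, projection: &[&str], mask: &Mask) -> BTreeMap<String, Vec<i64>> {
        projection
            .iter()
            .map(|&name| {
                let selected: Vec<i64> = self.columns[name]
                    .iter()
                    .zip(mask)
                    .filter(|&(_, &keep)| keep)
                    .map(|(&v, _)| v)
                    .collect();
                (name.to_string(), selected)
            })
            .collect()
    }
}

fn main() {
    let mut columns = BTreeMap::new();
    columns.insert("id".to_string(), vec![1, 2, 3, 4]);
    columns.insert("price".to_string(), vec![10, 25, 5, 40]);
    columns.insert("qty".to_string(), vec![7, 8, 9, 10]);
    let table = Table { columns };

    // Only `price` is touched to build the mask; only `id` is materialized.
    let mask = table.filter_mask("price", |p| p > 20);
    let result = table.scan(&["id"], &mask);
    println!("{:?}", result); // prints {"id": [2, 4]}
}
```

The `qty` column is never read, which is the point of pushing the projection down into the scan rather than filtering after a full read.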
Encodings
Encodings live in separate crates under /encodings/. Each encoding implements the array vtable
and registers itself with the session. The standard encodings are bundled into the vortex crate.
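A rough sketch of the "implements a vtable and registers itself with the session" pattern: each encoding is a trait object with a stable identifier, and the session maps identifiers to implementations so that a reader can resolve the right decoder at runtime. The trait `EncodingVTable`, its methods, the `Session` type, and the two toy encodings are hypothetical simplifications, not the actual Vortex vtable or session API.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified stand-in for an encoding vtable.
trait EncodingVTable {
    /// Stable identifier stored alongside encoded data.
    fn id(&self) -> &'static str;
    /// Turn encoded bytes back into plain bytes.
    fn decode(&self, encoded: &[u8]) -> Vec<u8>;
}

/// Toy encoding that stores bytes unchanged.
struct Identity;

impl EncodingVTable for Identity {
    fn id(&self) -> &'static str {
        "example.identity"
    }
    fn decode(&self, encoded: &[u8]) -> Vec<u8> {
        encoded.to_vec()
    }
}

/// Toy encoding that stores the bitwise complement of each byte.
struct Invert;

impl EncodingVTable for Invert {
    fn id(&self) -> &'static str {
        "example.invert"
    }
    fn decode(&self, encoded: &[u8]) -> Vec<u8> {
        encoded.iter().map(|b| !b).collect()
    }
}

/// Hypothetical session: a registry from encoding ID to implementation.
#[derive(Default)]
struct Session {
    encodings: HashMap<&'static str, Box<dyn EncodingVTable>>,
}

impl Session {
    /// Each encoding registers itself under its own ID.
    fn register(&mut self, encoding: Box<dyn EncodingVTable>) {
        self.encodings.insert(encoding.id(), encoding);
    }

    /// Readers resolve the ID found in the data to a decoder.
    fn lookup(&self, id: &str) -> Option<&dyn EncodingVTable> {
        self.encodings.get(id).map(|encoding| encoding.as_ref())
    }
}

fn main() {
    let mut session = Session::default();
    session.register(Box::new(Identity));
    session.register(Box::new(Invert));

    let decoder = session.lookup("example.invert").expect("encoding is registered");
    println!("{:?}", decoder.decode(&[0x00, 0xF0])); // prints [255, 15]
}
```

Keying the registry by an ID rather than by a Rust type is what lets data written by one process name an encoding that another process resolves later, including third-party encodings registered at startup.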

| Crate | Technique |
|---|---|
| | Adaptive Lossless floating-Point compression |
| | FastLanes bit-packing, delta, and frame-of-reference |
| | Fast Static Symbol Table compression for strings |
| | Run-end encoding for repetitive data |
| | Sparse array encoding |
| | ZigZag encoding for signed integers |
| | Roaring bitmap encoding |
| | Dictionary encoding |
| | Byte-per-boolean encoding |
| | DateTime field decomposition |
| | Decimal byte decomposition |
| | Arithmetic sequence encoding |
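As an illustration of one technique from the table, the sketch below implements run-end encoding over `i32` values: consecutive equal values collapse into one entry in `values`, and `ends` records where each run stops, so random access becomes a binary search over `ends` instead of a full decode. The `RunEnd` type and its methods are hypothetical, and storing exclusive run ends is an assumption of this sketch rather than a description of the Vortex crate's physical layout.

```rust
/// Run-end encoded `i32` array: run `i` covers rows `ends[i - 1]..ends[i]`
/// (with an implicit start of 0) and holds `values[i]` throughout.
#[derive(Debug, PartialEq)]
struct RunEnd {
    ends: Vec<usize>,
    values: Vec<i32>,
}

/// Collapse consecutive equal values into runs.
fn encode(data: &[i32]) -> RunEnd {
    let mut ends: Vec<usize> = Vec::new();
    let mut values: Vec<i32> = Vec::new();
    for (i, &v) in data.iter().enumerate() {
        if values.last() == Some(&v) {
            // Same value as the previous row: extend the current run.
            *ends.last_mut().unwrap() = i + 1;
        } else {
            // New value: start a new run that currently ends after row `i`.
            values.push(v);
            ends.push(i + 1);
        }
    }
    RunEnd { ends, values }
}

impl RunEnd {
    /// Logical length is the end of the last run.
    fn len(&self) -> usize {
        self.ends.last().copied().unwrap_or(0)
    }

    /// Random access without decompressing: binary-search for the first run
    /// whose exclusive end is greater than `index`.
    fn get(&self, index: usize) -> Option<i32> {
        if index >= self.len() {
            return None;
        }
        let run = self.ends.partition_point(|&end| end <= index);
        Some(self.values[run])
    }

    /// Expand every run back into individual rows.
    fn decode(&self) -> Vec<i32> {
        let mut out: Vec<i32> = Vec::with_capacity(self.len());
        let mut start = 0;
        for (&end, &value) in self.ends.iter().zip(&self.values) {
            out.extend(std::iter::repeat(value).take(end - start));
            start = end;
        }
        out
    }
}

fn main() {
    let data = [7, 7, 7, 2, 2, 9];
    let encoded = encode(&data);
    println!("{:?}", encoded); // prints RunEnd { ends: [3, 5, 6], values: [7, 2, 9] }
}
```

Six rows shrink to three runs here; the saving grows with run length, which is why the table describes this encoding as suited to repetitive data.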
Language Bindings
Language bindings expose Vortex to non-Rust environments.

| Directory | Role |
|---|---|
| | Python bindings via PyO3 and Maturin |
| | Java JNI bindings |
| | C FFI bindings (generates |
| | C++ wrapper around the C FFI |
Integrations
Query engine integrations allow Vortex files to be queried through existing analytics engines.

| Crate / Directory | Engine | Notes |
|---|---|---|
| | DataFusion | |
| | DuckDB | Table function integration |
| | Spark | DataSource V2 connector via JNI |
| | Trino | Trino connector (in development) |
Other Crates

| Crate | Role |
|---|---|
| | GPU-accelerated decompression and compute (Linux only) |
| | Terminal UI for inspecting Vortex files |
| | Benchmark harness and data generators |