Concepts

Vortex is a modular ecosystem for working with compressed columnar data: in-memory, on-disk, over-the-wire, and integrated with query engines.

Core Concepts

DTypes are Vortex’s logical type system. Types like UTF8 describe what data means without dictating physical layout, allowing the same logical data to use different encodings.
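The split between logical type and physical layout can be illustrated with a minimal sketch. It does not use the Vortex API: `PlainUtf8`, `DictUtf8`, and `dict_encode` are hypothetical names invented here to show two encodings of the same logical UTF8 column.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical classes for illustration only; not part of Vortex.


@dataclass
class PlainUtf8:
    """Stores every string as-is, one entry per row."""
    values: List[str]

    def decode(self) -> List[str]:
        return list(self.values)


@dataclass
class DictUtf8:
    """Stores each distinct string once, plus one integer code per row."""
    dictionary: List[str]
    codes: List[int]

    def decode(self) -> List[str]:
        return [self.dictionary[c] for c in self.codes]


def dict_encode(values: List[str]) -> DictUtf8:
    """Build a dictionary encoding, assigning codes in first-seen order."""
    dictionary: List[str] = []
    index: Dict[str, int] = {}
    codes: List[int] = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return DictUtf8(dictionary, codes)


data = ["red", "green", "red", "red", "green"]
plain = PlainUtf8(data)
encoded = dict_encode(data)

# Both encodings describe the same logical column.
assert plain.decode() == encoded.decode()
```

Both objects carry the same logical values; only the physical representation differs, which is the freedom a logical type system leaves open.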

Arrays are the in-memory representation. Unlike Arrow arrays, Vortex arrays can be compressed: an integer array might be bit-packed rather than stored as a flat buffer. Arrays share the same representation on disk and over the wire, enabling zero-copy I/O.
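To make the bit-packing idea concrete, here is a minimal sketch of packing small non-negative integers into the fewest bits that fit the largest value. The functions `bit_width`, `pack`, and `unpack` are hypothetical helpers written for this example, and the bit order is an arbitrary choice; it is not a description of how Vortex lays out its buffers.

```python
from typing import List


def bit_width(values: List[int]) -> int:
    """Smallest number of bits that can hold every value (assumes non-empty, non-negative input)."""
    return max(1, max(values).bit_length())


def pack(values: List[int], width: int) -> bytes:
    """Pack each value into `width` bits, first value in the lowest bits."""
    acc = 0
    for i, v in enumerate(values):
        acc |= v << (i * width)
    nbytes = (len(values) * width + 7) // 8
    return acc.to_bytes(nbytes, "little")


def unpack(buf: bytes, width: int, count: int) -> List[int]:
    """Inverse of `pack`: extract `count` values of `width` bits each."""
    acc = int.from_bytes(buf, "little")
    mask = (1 << width) - 1
    return [(acc >> (i * width)) & mask for i in range(count)]


values = [3, 7, 1, 0, 5, 2, 6, 4]
width = bit_width(values)        # largest value is 7, so 3 bits suffice
packed = pack(values, width)     # 8 values * 3 bits = 24 bits

assert unpack(packed, width, len(values)) == values
```

Stored as a flat buffer of 32-bit integers, these eight values would take 32 bytes; packed at 3 bits each they take 3.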

Compute functions operate directly on compressed arrays where possible, dispatching to encoding-specific kernels or falling back to canonical implementations.
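The dispatch pattern described above might look roughly like the following sketch. Every name here (`RunLengthArray`, `DeltaArray`, `SUM_KERNELS`, `array_sum`) is hypothetical and chosen for illustration; the real kernel registry in Vortex is not shown in this document.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RunLengthArray:
    """Hypothetical run-length encoding: values[i] repeated lengths[i] times."""
    values: List[int]
    lengths: List[int]

    def to_canonical(self) -> List[int]:
        out: List[int] = []
        for v, n in zip(self.values, self.lengths):
            out.extend([v] * n)
        return out


@dataclass
class DeltaArray:
    """Hypothetical delta encoding: a first value followed by differences."""
    first: int
    deltas: List[int]

    def to_canonical(self) -> List[int]:
        out = [self.first]
        for d in self.deltas:
            out.append(out[-1] + d)
        return out


def _sum_run_length(arr: RunLengthArray) -> int:
    # Encoding-specific kernel: one multiply per run, no decompression.
    return sum(v * n for v, n in zip(arr.values, arr.lengths))


# Registry of encoding-specific kernels, keyed by array type.
SUM_KERNELS: Dict[type, Callable] = {RunLengthArray: _sum_run_length}


def array_sum(arr) -> int:
    """Use a specialized kernel if one is registered, else decode and sum."""
    kernel = SUM_KERNELS.get(type(arr))
    if kernel is not None:
        return kernel(arr)
    return sum(arr.to_canonical())  # canonical fallback


rle = RunLengthArray(values=[5, 2], lengths=[3, 4])
delta = DeltaArray(first=10, deltas=[1, 1, 1])

assert array_sum(rle) == sum(rle.to_canonical())    # specialized path
assert array_sum(delta) == sum(delta.to_canonical())  # fallback path
```

The run-length sum never materializes the decoded values, while the delta array, which has no registered kernel in this sketch, is decoded first and then summed.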

Storage & I/O

Layouts organize arrays into larger-than-memory datasets (e.g., chunked row groups) and can read from any block storage: local disk, object stores, caches, etc.
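As a rough illustration of a chunked layout over pluggable storage, the sketch below reads a row range while fetching only the chunks that overlap it. `ChunkedLayout` and `read_segment` are hypothetical names, and the in-memory dictionary stands in for whatever block storage backs the data; none of this reflects the actual Vortex layout API.

```python
from typing import Callable, Dict, List


class ChunkedLayout:
    """Hypothetical layout: row counts per chunk plus a callback that fetches a chunk."""

    def __init__(self, chunk_row_counts: List[int],
                 read_segment: Callable[[int], List[int]]) -> None:
        self.read_segment = read_segment
        # offsets[i] is the first row of chunk i; offsets[-1] is the total row count.
        self.offsets = [0]
        for n in chunk_row_counts:
            self.offsets.append(self.offsets[-1] + n)

    def read_rows(self, start: int, stop: int) -> List[int]:
        """Return rows in [start, stop), fetching only overlapping chunks."""
        out: List[int] = []
        for i in range(len(self.offsets) - 1):
            lo, hi = self.offsets[i], self.offsets[i + 1]
            if hi <= start or lo >= stop:
                continue  # chunk lies entirely outside the requested range
            chunk = self.read_segment(i)
            out.extend(chunk[max(start, lo) - lo:min(stop, hi) - lo])
        return out


# Stand-in for block storage (local disk, object store, cache, ...).
storage: Dict[int, List[int]] = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7], 2: [8, 9]}
fetched: List[int] = []


def read_segment(chunk_id: int) -> List[int]:
    fetched.append(chunk_id)  # record which chunks were actually read
    return storage[chunk_id]


layout = ChunkedLayout([4, 4, 2], read_segment)
rows = layout.read_rows(3, 6)

assert rows == [3, 4, 5]
assert fetched == [0, 1]  # the third chunk was never touched
```

Because the layout only knows chunk boundaries and a fetch callback, swapping the dictionary for a file or object-store reader would not change `read_rows`.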

File Format (.vortex files) serializes layouts to disk with efficient segment retrieval, FlatBuffer metadata for O(1) schema access, and support for memory mapping.

IPC Format provides streaming transfer of compressed arrays.

Integrations

Language bindings: Rust, Python, Java, C, C++

Query engines: DataFusion, DuckDB, Spark, Polars, Ray