Vortex Python

Warning

The Python API surface is not yet complete and is subject to change. Many operations available in the Rust API are not yet exposed. See the Python API for the full reference.

Installation

Install with pip:

pip install vortex-data

Or add the package to a uv-managed project:

uv add vortex-data

Creating Arrays

array() constructs a Vortex array from Python values:

>>> import vortex as vx
>>> arr = vx.array([1, 2, 3, 4])
>>> arr.dtype
int(64, nullable=False)
>>> len(arr)
4

Python’s None represents a missing value and makes the dtype nullable:

>>> arr = vx.array([1, 2, None, 4])
>>> arr.dtype
int(64, nullable=True)

A list of dicts produces a struct array. Missing values may appear at any level, from a single field to an entire row:

>>> arr = vx.array([
...   {'name': 'Joseph', 'age': 25},
...   {'name': None, 'age': 31},
...   None,
... ])
>>> arr.dtype
struct({"age": int(64, nullable=True), "name": utf8(nullable=True)}, nullable=True)

array() also accepts pyarrow.Array, pyarrow.Table, pandas.DataFrame, and range objects.

DTypes

DType factory functions are available at the top level of the vortex module:

>>> vx.int_(32)
int(32, nullable=False)
>>> vx.utf8(nullable=True)
utf8(nullable=True)
>>> vx.list_(vx.float_(64))
list(float(64, nullable=False), nullable=False)
>>> vx.struct({'x': vx.int_(32), 'y': vx.int_(32)})
struct({"x": int(32, nullable=False), "y": int(32, nullable=False)}, nullable=False)

Available types: null(), bool_(), int_(), uint(), float_(), decimal(), utf8(), binary(), struct(), list_(), fixed_size_list(), date(), time(), timestamp().

Array Operations

Element Access

>>> arr = vx.array([10, 20, 30, 40, 50])
>>> arr.scalar_at(0).as_py()
10
>>> arr.to_arrow_array().to_pylist()
[10, 20, 30, 40, 50]

Slicing and Selection

>>> arr.slice(1, 3).to_arrow_array().to_pylist()
[20, 30]
>>> indices = vx.array([0, 2, 4])
>>> arr.take(indices).to_arrow_array().to_pylist()
[10, 30, 50]

Filtering

>>> mask = vx.array([True, False, True, False, True])
>>> arr.filter(mask).to_arrow_array().to_pylist()
[10, 30, 50]

Comparisons

>>> other = vx.array([10, 25, 25, 45, 50])
>>> (arr > other).to_arrow_array().to_pylist()
[False, False, True, False, False]

Expressions

The vortex.expr module provides expressions for filtering and projecting. These are primarily used with VortexFile.scan() and VortexFile.to_arrow() but can also be evaluated directly:

>>> import vortex.expr as ve
>>> arr = vx.array([
...     {'name': 'Alice', 'age': 30},
...     {'name': 'Bob', 'age': 25},
...     {'name': 'Carol', 'age': 35},
... ])
>>> expr = ve.column('age') > 28
>>> expr.evaluate(arr).to_arrow_array().to_pylist()
[True, False, True]

VortexFile

open() lazily opens a Vortex file for reading. The example first writes a Vortex file from a Parquet table, then opens it:

>>> import pyarrow.parquet as pq
>>> vx.io.write(pq.read_table("_static/example.parquet"), 'example.vortex')
>>>
>>> f = vx.open('example.vortex')
>>> len(f)
1000

Use VortexFile.scan() to read data with optional projection, filtering, and limit:

>>> result = f.scan(['tip_amount'], limit=3).read_all()
>>> result.to_arrow_array()
<pyarrow.lib.StructArray object at ...>
-- is_valid: all not null
-- child 0 type: double
  [
    0,
    5.1,
    16.54
  ]

ArrayIterator

ArrayIterator streams batches of arrays from a scan or other source. It supports iteration, collecting into a single array, and conversion to Arrow.

ArrayIterator.read_all() collects all batches into a single in-memory Array:

>>> arr = f.scan(['tip_amount'], limit=5).read_all()
>>> len(arr)
5

ArrayIterator.to_arrow() converts to a pyarrow.RecordBatchReader for use with Arrow-based tools:

>>> reader = f.scan(['tip_amount']).to_arrow()
>>> reader.schema
tip_amount: double
>>> table = reader.read_all()
>>> len(table)
1000

Conversion

Arrays convert to other formats: