Python Integrations

Getting Started

First, install the vortex-array package with either pip or uv if you haven’t already:

pip install vortex-array
uv add vortex-array

Construct a Vortex array from lists of simple Python values:

>>> import vortex as vx
>>> arr = vx.array([1, 2, 3, 4])
>>> arr.dtype
int(64, nullable=False)

Python’s None represents a missing or null value and changes the dtype of the array from non-nullable 64-bit integers to nullable 64-bit integers:

>>> arr = vx.array([1, 2, None, 4])
>>> arr.dtype
int(64, nullable=True)
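
The nullability rule above can be mimicked in a few lines of plain Python. This is only a toy sketch to make the rule concrete: the helper name infer_int_dtype is hypothetical, it is not part of the Vortex API, and it does not reflect how Vortex infers dtypes internally.

```python
# Toy illustration only: infer_int_dtype is a hypothetical helper,
# not a Vortex function and not Vortex's actual inference logic.
def infer_int_dtype(values):
    """Return a dtype string shaped like the reprs shown above."""
    # A single None anywhere in the input makes the whole dtype nullable.
    nullable = any(v is None for v in values)
    non_null = [v for v in values if v is not None]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not all(isinstance(v, int) and not isinstance(v, bool) for v in non_null):
        raise TypeError("this sketch only handles integers and None")
    return f"int(64, nullable={nullable})"


print(infer_int_dtype([1, 2, 3, 4]))
print(infer_int_dtype([1, 2, None, 4]))
```

The sketch scans the input once for None, which is why a single missing value is enough to flip the dtype for the whole array.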

A list of dicts is converted to an array of structures. Missing values may appear at any level:

>>> arr = vx.array([
...   {'name': 'Joseph', 'age': 25},
...   {'name': None, 'age': 31},
...   {'name': 'Angela', 'age': None},
...   {'name': 'Mikhail', 'age': 57},
...   {'name': None, 'age': None},
...   None,
... ])
>>> arr.dtype
struct({"age": int(64, nullable=True), "name": utf8(nullable=True)}, nullable=True)

Array.to_pylist() converts a Vortex array into a list of Python values:

>>> arr.to_pylist()
[{'age': 25, 'name': 'Joseph'}, {'age': 31, 'name': None}, {'age': None, 'name': 'Angela'}, {'age': 57, 'name': 'Mikhail'}, {'age': None, 'name': None}, {'age': None, 'name': None}]
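
To see why both fields and the struct itself come out nullable in the example above, here is a rough plain-Python sketch of the bookkeeping involved. The helper infer_struct_nullability is hypothetical and is not how Vortex is implemented. Sorting the field names and treating an absent key like None are assumptions made for this sketch.

```python
# Toy illustration only: infer_struct_nullability is a hypothetical helper,
# not part of Vortex.
def infer_struct_nullability(rows):
    """Return ({field: nullable}, struct_nullable) for a list of dicts or None."""
    present = [row for row in rows if row is not None]
    # Assumption: field names are sorted, matching the order in the output above.
    names = sorted({key for row in present for key in row})
    # Assumption: a key that is absent from a row is treated like None.
    field_nullable = {
        name: any(row.get(name) is None for row in present) for name in names
    }
    # A None in place of a whole row makes the struct itself nullable.
    struct_nullable = len(present) != len(rows)
    return field_nullable, struct_nullable


rows = [
    {'name': 'Joseph', 'age': 25},
    {'name': None, 'age': 31},
    {'name': 'Angela', 'age': None},
    {'name': 'Mikhail', 'age': 57},
    {'name': None, 'age': None},
    None,
]
print(infer_struct_nullability(rows))
```

Nullability is tracked separately per level: a None inside a row affects only that field, while a None in place of a row affects the struct.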

Arrow

The array() function constructs a Vortex array from an Arrow array without copying any data:

>>> import pyarrow as pa
>>> arrow = pa.array([1, 2, None, 3])
>>> arrow.type
DataType(int64)
>>> arr = vx.array(arrow)
>>> arr.dtype
int(64, nullable=True)

Array.to_arrow_array() converts back to an Arrow array:

>>> arr.to_arrow_array()
<pyarrow.lib.Int64Array object at ...>
[
  1,
  2,
  null,
  3
]

If you have a struct array, use Array.to_arrow_table() to construct an Arrow table:

>>> struct_arr = vx.array([
...   {'name': 'Joseph', 'age': 25},
...   {'name': 'Narendra', 'age': 31},
...   {'name': 'Angela', 'age': 33},
...   {'name': 'Mikhail', 'age': 57},
... ])
>>> struct_arr.to_arrow_table()
pyarrow.Table
age: int64
name: string_view
----
age: [[25,31,33,57]]
name: [["Joseph","Narendra","Angela","Mikhail"]]

Pandas

Array.to_pandas_df() converts a Vortex array into a Pandas DataFrame:

>>> df = struct_arr.to_pandas_df()
>>> df
   age      name
0   25    Joseph
1   31  Narendra
2   33    Angela
3   57   Mikhail

array() converts from a Pandas DataFrame into a Vortex array:

>>> vx.array(df).to_arrow_table()
pyarrow.Table
age: int64
name: string_view
----
age: [[25,31,33,57]]
name: [["Joseph","Narendra","Angela","Mikhail"]]

Query Engines

VortexDataset implements the pyarrow.dataset.Dataset API, which lets many Python-based query engines push down row filters and column projections to Vortex files. All the query engine examples use the same Vortex file:

>>> import vortex as vx
>>> import pyarrow.parquet as pq
>>> arr = vx.array(pq.read_table("_static/example.parquet"))
>>> vx.io.write_path(arr, 'example.vortex')
>>> ds = vx.dataset.from_path(
...     'example.vortex'
... )
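
For readers unfamiliar with the term, "pushing down" a filter or projection means the reader applies it while scanning, so unneeded rows and columns never reach the query engine. The sketch below illustrates the idea in plain Python. The scan function, the in-memory table, and the extra column are all made up for illustration and are unrelated to how VortexDataset works internally.

```python
# Toy illustration only: scan() is a hypothetical stand-in for a file reader.
def scan(rows, columns=None, predicate=None):
    """Yield rows, applying the filter and projection during the scan."""
    for row in rows:
        # Row filter: skip non-matching rows before they leave the reader.
        if predicate is not None and not predicate(row):
            continue
        # Column projection: keep only the requested columns.
        yield row if columns is None else {c: row[c] for c in columns}


# Made-up rows; the "extra" column is purely hypothetical.
table = [
    {"tip_amount": 0.0, "fare_amount": 61.8, "extra": 1.0},
    {"tip_amount": 5.1, "fare_amount": 20.5, "extra": 0.5},
    {"tip_amount": 16.54, "fare_amount": 70.0, "extra": 0.0},
]

result = list(
    scan(
        table,
        columns=["tip_amount", "fare_amount"],
        predicate=lambda row: row["tip_amount"] > 1.0,
    )
)
print(result)
```

Without pushdown, the engine would receive all three rows with all three columns and discard the unneeded data itself.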

Polars

>>> import polars as pl
>>> lf = pl.scan_pyarrow_dataset(ds)
>>> lf = lf.select('tip_amount', 'fare_amount')
>>> lf = lf.head(3)
>>> lf.collect()
shape: (3, 2)
┌────────────┬─────────────┐
│ tip_amount ┆ fare_amount │
│ ---        ┆ ---         │
│ f64        ┆ f64         │
╞════════════╪═════════════╡
│ 0.0        ┆ 61.8        │
│ 5.1        ┆ 20.5        │
│ 16.54      ┆ 70.0        │
└────────────┴─────────────┘

DuckDB

>>> import duckdb
>>> duckdb.sql('select ds.tip_amount, ds.fare_amount from ds limit 3').show()
┌────────────┬─────────────┐
│ tip_amount │ fare_amount │
│   double   │   double    │
├────────────┼─────────────┤
│        0.0 │        61.8 │
│        5.1 │        20.5 │
│      16.54 │        70.0 │
└────────────┴─────────────┘