# Python Integrations

## Getting Started

First, install Vortex if you haven't already:

````{tab} pip
```bash
pip install vortex-array
```
````

````{tab} uv
```bash
uv add vortex-array
```
````

Construct a Vortex array from lists of simple Python values:

```{doctest} pycon
>>> import vortex as vx
>>> arr = vx.array([1, 2, 3, 4])
>>> arr.dtype
int(64, nullable=False)
```

Python's {obj}`None` represents a missing or null value and changes the dtype of the array from
non-nullable 64-bit integers to nullable 64-bit integers:

```{doctest} pycon
>>> arr = vx.array([1, 2, None, 4])
>>> arr.dtype
int(64, nullable=True)
```

A list of {class}`dict` is converted to an array of structures. Missing values may appear at any
level:

```{doctest} pycon
>>> arr = vx.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': None, 'age': 31},
...     {'name': 'Angela', 'age': None},
...     {'name': 'Mikhail', 'age': 57},
...     {'name': None, 'age': None},
...     None,
... ])
>>> arr.dtype
struct({"age": int(64, nullable=True), "name": utf8(nullable=True)}, nullable=True)
```

{meth}`.Array.to_pylist` converts a Vortex array into a list of Python values.

```{doctest} pycon
>>> arr.to_pylist()
[{'age': 25, 'name': 'Joseph'}, {'age': 31, 'name': None}, {'age': None, 'name': 'Angela'}, {'age': 57, 'name': 'Mikhail'}, {'age': None, 'name': None}, {'age': None, 'name': None}]
```

## Arrow

The {func}`~vortex.array` function constructs a Vortex array from an Arrow one without any copies:

```{doctest} pycon
>>> import pyarrow as pa
>>> arrow = pa.array([1, 2, None, 3])
>>> arrow.type
DataType(int64)
>>> arr = vx.array(arrow)
>>> arr.dtype
int(64, nullable=True)
```

{meth}`.Array.to_arrow_array` converts back to an Arrow array:

```{doctest} pycon
>>> arr.to_arrow_array()
[
  1,
  2,
  null,
  3
]
```

If you have a struct array, use {meth}`.Array.to_arrow_table` to construct an Arrow table:

```{doctest} pycon
>>> struct_arr = vx.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
...     {'name': 'Mikhail', 'age': 57},
... ])
>>> struct_arr.to_arrow_table()
pyarrow.Table
age: int64
name: string_view
----
age: [[25,31,33,57]]
name: [["Joseph","Narendra","Angela","Mikhail"]]
```

## Pandas

{meth}`.Array.to_pandas_df` converts a Vortex array into a Pandas DataFrame:

```{doctest} pycon
>>> df = struct_arr.to_pandas_df()
>>> df
   age      name
0   25    Joseph
1   31  Narendra
2   33    Angela
3   57   Mikhail
```

{func}`~vortex.array` converts a Pandas DataFrame into a Vortex array:

```pycon
>>> vx.array(df).to_arrow_table()
pyarrow.Table
age: int64
name: string_view
----
age: [[25,31,33,57]]
name: [["Joseph","Narendra","Angela","Mikhail"]]
```

## Query Engines

{class}`~vortex.dataset.VortexDataset` implements the {class}`pyarrow.dataset.Dataset` API, which
enables many Python-based query engines to push down row filters and column projections on Vortex
files.

All the query engine examples use the same Vortex file:

```pycon
>>> import vortex as vx
>>> import pyarrow.parquet as pq
>>> arr = vx.array(pq.read_table("_static/example.parquet"))
>>> vx.io.write_path(arr, 'example.vortex')
>>> ds = vx.dataset.from_path(
...     'example.vortex'
... )
```

### Polars

```pycon
>>> import polars as pl
>>> lf = pl.scan_pyarrow_dataset(ds)
>>> lf = lf.select('tip_amount', 'fare_amount')
>>> lf = lf.head(3)
>>> lf.collect()
shape: (3, 2)
┌────────────┬─────────────┐
│ tip_amount ┆ fare_amount │
│ ---        ┆ ---         │
│ f64        ┆ f64         │
╞════════════╪═════════════╡
│ 0.0        ┆ 61.8        │
│ 5.1        ┆ 20.5        │
│ 16.54      ┆ 70.0        │
└────────────┴─────────────┘
```

### DuckDB

```pycon
>>> import duckdb
>>> duckdb.sql('select ds.tip_amount, ds.fare_amount from ds limit 3').show()
┌────────────┬─────────────┐
│ tip_amount │ fare_amount │
│   double   │   double    │
├────────────┼─────────────┤
│        0.0 │        61.8 │
│        5.1 │        20.5 │
│      16.54 │        70.0 │
└────────────┴─────────────┘
```