Pandas

Vortex in-memory arrays can be converted to and from Pandas DataFrames.

Reading a Vortex File into Pandas

To read a Vortex file into a Pandas DataFrame, open the file, scan its contents into memory, and convert the result:

>>> import vortex as vx
>>> import pyarrow.parquet as pq
>>> # Create a Vortex file from the example Parquet file
>>> vx.io.write(pq.read_table("_static/example.parquet"), 'example.vortex')
>>>
>>> f = vx.open('example.vortex')
>>> df = f.scan().read_all().to_pandas()
>>> df[['tip_amount', 'fare_amount']].head(3)
   tip_amount  fare_amount
0         0.0         61.8
1         5.1         20.5
2        16.54         70.0

VortexFile.scan() returns an ArrayIterator that streams batches from disk. ArrayIterator.read_all() collects all batches into a single in-memory Array, and Array.to_pandas() converts it to a DataFrame.
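
The difference between streaming batches and collecting them can be modelled in plain Python. The sketch below is an illustrative model, not Vortex's implementation: the hypothetical `scan` generator stands in for the ArrayIterator by handing out one column-oriented batch at a time, and the hypothetical `read_all` stands in for ArrayIterator.read_all() by appending every batch's columns into one in-memory result. The split into two batches is invented for illustration.

```python
from typing import Dict, Iterable, Iterator, List

# A batch is modelled as a mapping from column name to that column's values.
Batch = Dict[str, List[float]]


def scan(stored_batches: Iterable[Batch]) -> Iterator[Batch]:
    """Hypothetical stand-in for a scan: yield one batch at a time, lazily."""
    for batch in stored_batches:
        yield batch


def read_all(batches: Iterator[Batch]) -> Batch:
    """Hypothetical stand-in for read_all(): append every batch's columns in order."""
    result: Batch = {}
    for batch in batches:
        for name, values in batch.items():
            result.setdefault(name, []).extend(values)
    return result


# Two illustrative batches holding the three rows shown in the example output.
stored = [
    {"tip_amount": [0.0, 5.1], "fare_amount": [61.8, 20.5]},
    {"tip_amount": [16.54], "fare_amount": [70.0]},
]
collected = read_all(scan(stored))
print(collected["tip_amount"])  # [0.0, 5.1, 16.54]
```

Here the source batches already live in memory, so the model only shows the control flow: consuming `scan` on its own touches one batch at a time, while `read_all` accumulates everything into a single result.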

Converting In-Memory Arrays

Array.to_pandas() converts any struct-typed Vortex array into a Pandas DataFrame:

>>> struct_arr = vx.array([
... {'name': 'Joseph', 'age': 25},
... {'name': 'Narendra', 'age': 31},
... {'name': 'Angela', 'age': 33},
... {'name': 'Mikhail', 'age': 57},
... ])
>>> struct_arr.to_pandas()
   age      name
0   25    Joseph
1   31  Narendra
2   33    Angela
3   57   Mikhail
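
In the resulting DataFrame, each struct field becomes a column and each element of the array becomes a row. The plain-Python sketch below models that field-to-column mapping with a hypothetical `struct_to_columns` helper; it is an illustration only, not how Vortex lays out data internally. It assumes every element carries the same fields, as a struct type requires, and simply rejects rows whose fields differ rather than filling gaps with nulls.

```python
from typing import Any, Dict, List


def struct_to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Hypothetical helper: gather each struct field into its own column list."""
    if not rows:
        return {}
    field_names = list(rows[0])
    for row in rows:
        # A struct type has one fixed set of fields shared by every element.
        if set(row) != set(field_names):
            raise ValueError(f"row fields {sorted(row)} differ from {sorted(field_names)}")
    return {name: [row[name] for row in rows] for name in field_names}


columns = struct_to_columns([
    {"name": "Joseph", "age": 25},
    {"name": "Narendra", "age": 31},
    {"name": "Angela", "age": 33},
    {"name": "Mikhail", "age": 57},
])
print(columns["age"])  # [25, 31, 33, 57]
```

Passing `columns` to `pandas.DataFrame` would give a frame with the same rows as the output above; the sketch makes no attempt to reproduce the column order Vortex chose there.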

In the other direction, vortex.array() converts a Pandas DataFrame into a Vortex array:

>>> import pandas as pd
>>> df = pd.DataFrame({'age': [25, 31, 33, 57], 'name': ['Joseph', 'Narendra', 'Angela', 'Mikhail']})
>>> vx.array(df).to_arrow_table()
pyarrow.Table
age: int64
name: string_view
----
age: [[25,31,33,57]]
name: [["Joseph","Narendra","Angela","Mikhail"]]