Input and Output

Vortex arrays can be read from and written to local and remote file systems, including plain HTTP, S3, Google Cloud Storage, and Azure Blob Storage.

  • open

  • VortexFile

  • read_url – Read a vortex struct array from a URL.

  • write – Write a vortex struct array to the local filesystem.


vortex.open(path)

class vortex.VortexFile

dtype

The dtype of the file.

scan(projection=None, *, expr=None, indices=None, batch_size=None)

Scan the Vortex file returning a vortex.ArrayIterator.

Parameters:
  • projection (vortex.Expr | None) – The projection expression to read. If None, all columns are read.

  • expr (vortex.Expr | None) – The predicate used to filter rows. The filter columns do not need to be in the projection.

  • indices (vortex.Array | None) – The indices of the rows to read. Must be sorted and non-null.

  • batch_size (int | None) – The number of rows to read per chunk.

Examples

Scan a file with a structured column and nulls at multiple levels and in multiple columns.

>>> import vortex as vx
>>> import vortex.expr as ve
>>> a = vx.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': None, 'age': 31},
...     {'name': 'Angela', 'age': None},
...     {'name': 'Mikhail', 'age': 57},
...     {'name': None, 'age': None},
... ])
>>> vx.io.write(a, "a.vortex")
>>> vxf = vx.open("a.vortex")
>>> vxf.scan().read_all().to_arrow_array()
<pyarrow.lib.StructArray object at ...>
-- is_valid: all not null
-- child 0 type: int64
  [
    25,
    31,
    null,
    57,
    null
  ]
-- child 1 type: string_view
  [
    "Joseph",
    null,
    "Angela",
    "Mikhail",
    null
  ]

Read just the age column:

>>> vxf.scan(['age']).read_all().to_arrow_array()
<pyarrow.lib.StructArray object at ...>
-- is_valid: all not null
-- child 0 type: int64
  [
    25,
    31,
    null,
    57,
    null
  ]

Keep only the rows with an age above 35. When the file format allows, this reads O(N_KEPT) rows.

>>> vxf.scan(expr=ve.column("age") > 35).read_all().to_arrow_array()
<pyarrow.lib.StructArray object at ...>
-- is_valid: all not null
-- child 0 type: int64
  [
    57
  ]
-- child 1 type: string_view
  [
    "Mikhail"
  ]
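
The indices and batch_size parameters are not exercised in the examples above. The following is a minimal sketch, assuming that an integer array built with vx.array is accepted for indices. normalize_row_indices and take_rows are hypothetical helpers, not part of Vortex, and dropping duplicates is a conservative choice rather than a documented requirement.

```python
def normalize_row_indices(rows):
    """Return the row numbers sorted ascending, with None and duplicates dropped."""
    cleaned = {int(r) for r in rows if r is not None}
    if any(r < 0 for r in cleaned):
        raise ValueError("row indices must be non-negative")
    return sorted(cleaned)


def take_rows(vxf, rows, batch_size=None):
    """Scan only the requested rows of an open vortex.VortexFile."""
    # Imported lazily so normalize_row_indices works without Vortex installed.
    import vortex as vx

    # Assumption: scan() accepts an integer vortex.Array built this way.
    indices = vx.array(normalize_row_indices(rows))
    return vxf.scan(indices=indices, batch_size=batch_size).read_all()
```

With the file from the examples above, take_rows(vxf, [3, None, 0]) would request rows 0 and 3 only.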

to_arrow(projection=None, *, expr=None, batch_size=None)

Scan the Vortex file as a pyarrow.RecordBatchReader.
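
Because the result is a stream, batches can be consumed one at a time without materializing the whole file. A minimal sketch: count_rows is a hypothetical helper that relies only on standard pyarrow.RecordBatchReader iteration.

```python
def count_rows(reader):
    """Consume a pyarrow.RecordBatchReader and return (batch count, row count)."""
    n_batches = 0
    n_rows = 0
    # Iterating a RecordBatchReader yields one pyarrow.RecordBatch at a time.
    for batch in reader:
        n_batches += 1
        n_rows += batch.num_rows
    return n_batches, n_rows
```

For example, count_rows(vx.open("a.vortex").to_arrow(batch_size=2)) would stream a previously written file in small batches. How many batches come back depends on how the scan splits the file.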

to_dataset()

Scan the Vortex file using the pyarrow.dataset.Dataset API.

to_polars()

Read the Vortex file as a pl.LazyFrame, supporting column pruning and predicate pushdown.

vortex.io.read_url(url, *, projection=None, row_filter=None, indices=None)

Read a vortex struct array from a URL.

Parameters:
  • url (str) – The URL to read from.

  • projection (list[str | int]) – The columns to read, identified by index or by name.

  • row_filter (Expr) – Keep only the rows for which this expression evaluates to true.

Examples

Read an array from an HTTPS URL:

>>> import vortex as vx
>>> a = vx.io.read_url("https://example.com/dataset.vortex")  

Read an array from an S3 URL:

>>> a = vx.io.read_url("s3://bucket/path/to/dataset.vortex")  

Read an array from an Azure Blob File System URL:

>>> a = vx.io.read_url("abfss://my_file_system@my_account.dfs.core.windows.net/path/to/dataset.vortex")  

Read an array from an Azure Blob Storage URL:

>>> a = vx.io.read_url("https://my_account.blob.core.windows.net/my_container/path/to/dataset.vortex")  

Read an array from a Google Cloud Storage URL:

>>> a = vx.io.read_url("gs://bucket/path/to/dataset.vortex")  

Read an array from a local file URL:

>>> a = vx.io.read_url("file:/path/to/dataset.vortex")  
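
The projection and row_filter parameters can be combined with any of the URL forms above. A minimal sketch, assuming read_url accepts the absolute file:// URL that pathlib produces and that the file has name and age columns. to_file_url and read_adult_names are hypothetical helpers, not part of Vortex.

```python
from pathlib import Path


def to_file_url(path):
    """Turn a local path into an absolute file:// URL."""
    return Path(path).resolve().as_uri()


def read_adult_names(path, min_age=35):
    """Read only the name column of rows with age > min_age."""
    # Imported lazily so to_file_url works without Vortex installed.
    import vortex as vx
    import vortex.expr as ve

    return vx.io.read_url(
        to_file_url(path),
        projection=["name"],
        row_filter=ve.column("age") > min_age,
    )
```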

vortex.io.write(iter, path)

Write a vortex struct array to the local filesystem.

Parameters:
  • iter (Array) – The array to write. Must be an array of structures.

  • path (str) – The file path.

Examples

Write the array a to the local file a.vortex.

>>> import vortex as vx
>>> a = vx.array([
...     {'x': 1},
...     {'x': 2},
...     {'x': 10},
...     {'x': 11},
...     {'x': None},
... ])
>>> vx.io.write(a, "a.vortex")