Input and Output
Vortex arrays can be read from and written to local and remote file systems, including plain HTTP, S3, Google Cloud Storage, and Azure Blob Storage.
- vortex.io.read_url: Read a vortex struct array from a URL.
- vortex.io.write: Write a vortex struct array to the local filesystem.
- vortex.open(path)
- class vortex.VortexFile
- dtype
The dtype of the file.
- scan(projection=None, *, expr=None, indices=None, batch_size=None)
Scan the Vortex file, returning a vortex.ArrayIterator.
- Parameters:
  - projection (vortex.Expr | None) – The projection expression to read, or else read all columns.
  - expr (vortex.Expr | None) – The predicate used to filter rows. The filter columns do not need to be in the projection.
  - indices (vortex.Array | None) – The indices of the rows to read. Must be sorted and non-null.
  - batch_size (int | None) – The number of rows to read per chunk.
Examples
Scan a file with a structured column and nulls at multiple levels and in multiple columns.
>>> import vortex as vx
>>> import vortex.expr as ve
>>> a = vx.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': None, 'age': 31},
...     {'name': 'Angela', 'age': None},
...     {'name': 'Mikhail', 'age': 57},
...     {'name': None, 'age': None},
... ])
>>> vx.io.write(a, "a.vortex")
>>> vxf = vx.open("a.vortex")
>>> vxf.scan().read_all().to_arrow_array()
<pyarrow.lib.StructArray object at ...>
-- is_valid: all not null
-- child 0 type: int64
  [
    25,
    31,
    null,
    57,
    null
  ]
-- child 1 type: string_view
  [
    "Joseph",
    null,
    "Angela",
    "Mikhail",
    null
  ]
Read just the age column:
>>> vxf.scan(['age']).read_all().to_arrow_array()
<pyarrow.lib.StructArray object at ...>
-- is_valid: all not null
-- child 0 type: int64
  [
    25,
    31,
    null,
    57,
    null
  ]
Keep rows with an age above 35. When the file format allows, this reads only O(N_KEPT) rows.
>>> vxf.scan(expr=ve.column("age") > 35).read_all().to_arrow_array()
<pyarrow.lib.StructArray object at ...>
-- is_valid: all not null
-- child 0 type: int64
  [
    57
  ]
-- child 1 type: string_view
  [
    "Mikhail"
  ]
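The examples above do not exercise the indices and batch_size parameters. The following is a minimal sketch of how they might be combined, not a documented recipe. The helpers prepare_indices and scan_rows are hypothetical names introduced here. The sketch assumes that vx.array built from a list of Python integers is accepted as the indices argument, and the de-duplication step is an extra precaution of this sketch rather than a documented requirement.

```python
from typing import Iterable, List


def prepare_indices(rows: Iterable[int]) -> List[int]:
    """Return row indices sorted and without nulls, as scan() requires.

    De-duplication is an assumption of this sketch, not a documented rule.
    """
    rows = list(rows)
    if any(r is None for r in rows):
        raise ValueError("row indices must be non-null")
    return sorted(set(rows))


def scan_rows(path: str, rows: Iterable[int], batch_size: int = 1024):
    """Read only the requested rows from a Vortex file (hypothetical helper)."""
    # Imported lazily so prepare_indices can be used without vortex installed.
    import vortex as vx

    vxf = vx.open(path)
    # Assumption: an integer array built with vx.array is a valid index array.
    indices = vx.array(prepare_indices(rows))
    # batch_size sets the number of rows per chunk produced by the iterator.
    return vxf.scan(indices=indices, batch_size=batch_size).read_all()
```

With the a.vortex file written above, a call such as scan_rows("a.vortex", [3, 0]) would be expected to return the rows at positions 0 and 3, provided the assumptions stated here hold.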
- to_arrow(projection=None, *, expr=None, batch_size=None)
Scan the Vortex file as a pyarrow.RecordBatchReader.
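Because to_arrow returns a pyarrow.RecordBatchReader, the file can be consumed one batch at a time instead of being loaded whole. The sketch below illustrates that pattern by counting rows. The function names count_rows and count_rows_in_file are hypothetical, and the default batch size is an arbitrary choice made for illustration.

```python
from typing import Iterable


def count_rows(batches: Iterable) -> int:
    """Sum num_rows over a stream of record batches, one batch at a time."""
    total = 0
    for batch in batches:
        # pyarrow.RecordBatch exposes its length as the num_rows attribute.
        total += batch.num_rows
    return total


def count_rows_in_file(path: str, batch_size: int = 4096) -> int:
    """Count the rows of a Vortex file by streaming it (hypothetical helper)."""
    # Imported lazily so count_rows can be used without vortex installed.
    import vortex as vx

    reader = vx.open(path).to_arrow(batch_size=batch_size)
    return count_rows(reader)
```

Only one batch needs to be held in memory at a time in this loop, which is the main reason to prefer a reader over read_all() for large files.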
- to_dataset()
Scan the Vortex file using the pyarrow.dataset.Dataset API.
- to_polars()
Read the Vortex file as a pl.LazyFrame, supporting column pruning and predicate pushdown.
- vortex.io.read_url(url, *, projection=None, row_filter=None, indices=None)
Read a vortex struct array from a URL.
- Parameters:
Examples
Read an array from an HTTPS URL:
>>> import vortex as vx
>>> a = vx.io.read_url("https://example.com/dataset.vortex")
Read an array from an S3 URL:
>>> a = vx.io.read_url("s3://bucket/path/to/dataset.vortex")
Read an array from an Azure Blob File System URL:
>>> a = vx.io.read_url("abfss://my_file_system@my_account.dfs.core.windows.net/path/to/dataset.vortex")
Read an array from an Azure Blob Storage URL:
>>> a = vx.io.read_url("https://my_account.blob.core.windows.net/my_container/path/to/dataset.vortex")
Read an array from a Google Cloud Storage URL:
>>> a = vx.io.read_url("gs://bucket/path/to/dataset.vortex")
Read an array from a local file URL:
>>> a = vx.io.read_url("file:/path/to/dataset.vortex")
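The URL forms listed above differ mainly in their scheme and, for Azure Blob Storage, in their host name. As an illustration only, the sketch below classifies a URL into the storage kinds named in these examples using the standard library. The function storage_backend is hypothetical and is not part of vortex. It says nothing about how read_url itself dispatches URLs.

```python
from urllib.parse import urlparse

# Scheme-to-storage mapping taken from the example URLs above.
_SCHEME_TO_BACKEND = {
    "s3": "S3",
    "gs": "Google Cloud Storage",
    "abfss": "Azure Blob File System",
    "file": "local file system",
}


def storage_backend(url: str) -> str:
    """Name the storage kind a URL points at (hypothetical helper)."""
    parts = urlparse(url)
    scheme = parts.scheme.lower()
    if scheme in _SCHEME_TO_BACKEND:
        return _SCHEME_TO_BACKEND[scheme]
    if scheme in ("http", "https"):
        host = parts.hostname or ""
        # Azure Blob Storage is addressed over HTTPS with a dedicated host suffix.
        if host.endswith(".blob.core.windows.net"):
            return "Azure Blob Storage"
        return "HTTP"
    raise ValueError(f"unrecognized URL scheme: {scheme!r}")
```

A check like this can be useful for validating user-supplied URLs before passing them to read_url, for example to reject schemes that a deployment does not have credentials for.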
- vortex.io.write(iter, path)
Write a vortex struct array to the local filesystem.
Examples
Write the array a to the local file a.vortex.
>>> import vortex as vx
>>> a = vx.array([
...     {'x': 1},
...     {'x': 2},
...     {'x': 10},
...     {'x': 11},
...     {'x': None},
... ])
>>> vx.io.write(a, "a.vortex")