Arrays¶
The base class for all Vortex arrays is vortex.Array.
This class holds the tree of array definitions and buffers that make up the array and can be passed into compute
functions, serialized, and otherwise manipulated as a generic array.
There are two ways of “downcasting” an array for more specific access patterns:
- Into an encoding-specific array, like vortex.FastLanesBitPackedArray.
- Into a type-specific array, like vortex.BoolTypeArray.
- Note the difference between the two: vortex.BoolArray represents an array that stores its physical data as a bit-buffer of booleans, whereas vortex.BoolTypeArray represents any array that has a logical type of boolean.
Factory Functions¶
- vortex.array(obj: Array | list | Any) → Array¶
The main entry point for creating Vortex arrays from other Python objects.
This function is also available as vortex.array.
- Parameters:
obj (pyarrow.Array, list, pandas.DataFrame) – The elements of this array or list become the elements of the Vortex array.
- Return type:
Array
Examples
A Vortex array containing the first three integers:
>>> import vortex
>>> vortex.array([1, 2, 3]).to_arrow_array()
<pyarrow.lib.Int64Array object at ...>
[
  1,
  2,
  3
]
The same Vortex array with a null value in the third position:
>>> vortex.array([1, 2, None, 3]).to_arrow_array()
<pyarrow.lib.Int64Array object at ...>
[
  1,
  2,
  null,
  3
]
Initialize a Vortex array from an Arrow array:
>>> import pyarrow
>>> arrow = pyarrow.array(['Hello', 'it', 'is', 'me'], type=pyarrow.string_view())
>>> vortex.array(arrow).to_arrow_array()
<pyarrow.lib.StringViewArray object at ...>
[
  "Hello",
  "it",
  "is",
  "me"
]
Initialize a Vortex array from a Pandas dataframe:
>>> import pandas as pd
>>> df = pd.DataFrame({
...     "Name": ["Braund", "Allen", "Bonnell"],
...     "Age": [22, 35, 58],
... })
>>> vortex.array(df).to_arrow_array()
<pyarrow.lib.ChunkedArray object at ...>
[
  -- is_valid: all not null
  -- child 0 type: string_view
    [
      "Braund",
      "Allen",
      "Bonnell"
    ]
  -- child 1 type: int64
    [
      22,
      35,
      58
    ]
]
Base Class¶
- class vortex.Array¶
An array of zero or more rows each with the same set of columns.
Examples
Arrays support all the standard comparison operations:
>>> import vortex as vx
>>> a = vx.array(['dog', None, 'cat', 'mouse', 'fish'])
>>> b = vx.array(['doug', 'jennifer', 'casper', 'mouse', 'faust'])
>>> (a < b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
  true,
  null,
  false,
  false,
  false
]
>>> (a <= b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
  true,
  null,
  false,
  true,
  false
]
>>> (a == b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
  false,
  null,
  false,
  true,
  false
]
>>> (a != b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
  true,
  null,
  true,
  false,
  true
]
>>> (a >= b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
  false,
  null,
  true,
  true,
  true
]
>>> (a > b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
  false,
  null,
  true,
  false,
  true
]
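The null handling in the comparison example can be modelled in plain Python. This is only an illustrative sketch, not Vortex code: compare_with_nulls is a hypothetical helper, and it assumes that a null on either side produces a null result, which is what the outputs above show.

```python
import operator
from typing import Any, Callable, List, Optional, Sequence


def compare_with_nulls(
    left: Sequence[Optional[Any]],
    right: Sequence[Optional[Any]],
    op: Callable[[Any, Any], bool],
) -> List[Optional[bool]]:
    """Element-wise comparison where None on either side yields None.

    Hypothetical helper for illustration; it is not part of the vortex API.
    """
    if len(left) != len(right):
        raise ValueError("both inputs must have the same length")
    return [
        None if a is None or b is None else op(a, b)
        for a, b in zip(left, right)
    ]


a = ['dog', None, 'cat', 'mouse', 'fish']
b = ['doug', 'jennifer', 'casper', 'mouse', 'faust']
less_than = compare_with_nulls(a, b, operator.lt)
greater_equal = compare_with_nulls(a, b, operator.ge)
```

Passing a different function from the operator module (operator.le, operator.eq, and so on) models the other comparison operators in the same way.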
- __len__()¶
Return len(self).
- dtype¶
Returns the data type of this array.
- Return type:
Examples
By default, vortex.array() uses the largest available bit-width:
>>> import vortex as vx
>>> vx.array([1, 2, 3]).dtype
int(64, nullable=False)
Including a None forces a nullable type:
>>> vx.array([1, None, 2, 3]).dtype
int(64, nullable=True)
A UTF-8 string array:
>>> vx.array(['hello, ', 'is', 'it', 'me?']).dtype
utf8(nullable=False)
- fill_forward()¶
Fill forward non-null values over runs of nulls.
Leading nulls are replaced with the “zero” for that type. For integral and floating-point types, this is zero. For the Boolean type, this is False.
Fill forward sensor values over intermediate missing values. Note that leading nulls are replaced with 0.0:
>>> import vortex as vx
>>> a = vx.array([
...     None, None, 30.29, 30.30, 30.30, None, None, 30.27, 30.25,
...     30.22, None, None, None, None, 30.12, 30.11, 30.11, 30.11,
...     30.10, 30.08, None, 30.21, 30.03, 30.03, 30.05, 30.07, 30.07,
... ])
>>> a.fill_forward().to_arrow_array()
<pyarrow.lib.DoubleArray object at ...>
[
  0,
  0,
  30.29,
  30.3,
  30.3,
  30.3,
  30.3,
  30.27,
  30.25,
  30.22,
  ...
  30.11,
  30.1,
  30.08,
  30.08,
  30.21,
  30.03,
  30.03,
  30.05,
  30.07,
  30.07
]
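The fill-forward rule described above can be written out as a short pure-Python reference. This is a sketch of the semantics only, assuming the caller supplies the “zero” of the element type; fill_forward_reference is a hypothetical name and does not reflect how Vortex implements the operation.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def fill_forward_reference(values: Sequence[Optional[T]], zero: T) -> List[T]:
    """Replace each None with the most recent non-null value.

    Leading Nones, which have no earlier value to copy, become `zero`.
    Hypothetical helper for illustration; not part of the vortex API.
    """
    filled: List[T] = []
    last = zero  # value used until the first non-null element is seen
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


readings = [None, None, 30.29, 30.30, None, None, 30.27]
filled_readings = fill_forward_reference(readings, zero=0.0)
flags = fill_forward_reference([None, True, None, False, None], zero=False)
```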
- filter(mask)¶
Filter an Array by another Boolean array.
- Parameters:
mask (Array) – Keep all the rows in self for which the correspondingly indexed row in mask is True.
- Return type:
Examples
Keep only the single-digit non-negative integers:
>>> import vortex as vx
>>> a = vx.array([0, 42, 1_000, -23, 10, 9, 5])
>>> filter = vx.array([True, False, False, False, False, True, True])
>>> a.filter(filter).to_arrow_array()
<pyarrow.lib.Int64Array object at ...>
[
  0,
  9,
  5
]
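As a rough model of what the filter example does, the following pure-Python sketch builds the same Boolean mask from a predicate and applies it. filter_rows is a hypothetical helper, and the sketch assumes the mask contains no nulls, since the text above does not say how null mask entries are treated.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def filter_rows(values: Sequence[T], mask: Sequence[bool]) -> List[T]:
    """Keep values[i] wherever mask[i] is True.

    Hypothetical helper for illustration; not part of the vortex API.
    """
    if len(values) != len(mask):
        raise ValueError("mask must have the same length as values")
    return [value for value, keep in zip(values, mask) if keep]


values = [0, 42, 1_000, -23, 10, 9, 5]
# Single-digit, non-negative integers: 0 through 9 inclusive.
mask = [0 <= value < 10 for value in values]
kept = filter_rows(values, mask)
```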
- static from_arrow(obj)¶
Convert a PyArrow object into a Vortex array.
- Parameters:
obj – One of pyarrow.Array, pyarrow.ChunkedArray, or pyarrow.Table.
- Return type:
- id¶
Returns the encoding ID of this array.
- nbytes¶
Returns the number of bytes used by this array.
- scalar_at(index)¶
Retrieve a row by its index.
- Parameters:
index (int) – The index of interest. Must be greater than or equal to zero and less than the length of this array.
- Return type:
Examples
Retrieve the last element from an array of integers:
>>> import vortex as vx
>>> vx.array([10, 42, 999, 1992]).scalar_at(3).as_py()
1992
Retrieve the third element from an array of strings:
>>> array = vx.array(["hello", "goodbye", "it", "is"])
>>> array.scalar_at(2).as_py()
'it'
Retrieve an element from an array of structures:
>>> array = vx.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
...     None,
...     {'name': 'Mikhail', 'age': 57},
... ])
>>> array.scalar_at(2).as_py()
{'age': 33, 'name': 'Angela'}
Retrieve a missing element from an array of structures:
>>> array.scalar_at(3).as_py() is None
True
Out of bounds accesses are prohibited:
>>> vx.array([10, 42, 999, 1992]).scalar_at(10)
Traceback (most recent call last):
...
ValueError: index 10 out of bounds from 0 to 4
...
Unlike Python, negative indices are not supported:
>>> vx.array([10, 42, 999, 1992]).scalar_at(-2)
Traceback (most recent call last):
...
OverflowError: can't convert negative int to unsigned
- slice(start, end)¶
Slice this array.
- Parameters:
- Return type:
Examples
Keep only the second through third elements:
>>> import vortex as vx
>>> a = vx.array(['a', 'b', 'c', 'd'])
>>> a.slice(1, 3).to_arrow_array()
<pyarrow.lib.StringArray object at ...>
[
  "b",
  "c"
]
Keep none of the elements:
>>> a = vx.array(['a', 'b', 'c', 'd'])
>>> a.slice(3, 3).to_arrow_array()
<pyarrow.lib.StringViewArray object at ...>
[]
Unlike Python, it is an error to slice outside the bounds of the array:
>>> a = vx.array(['a', 'b', 'c', 'd'])
>>> a.slice(2, 10).to_arrow_array()
Traceback (most recent call last):
...
ValueError: index 10 out of bounds from 0 to 4
Or to slice with a negative value:
>>> a = vx.array(['a', 'b', 'c', 'd'])
>>> a.slice(-2, -1).to_arrow_array()
Traceback (most recent call last):
...
OverflowError: can't convert negative int to unsigned
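The contrast with Python's forgiving slices can be made concrete with a small sketch. strict_slice below is a hypothetical pure-Python function that mimics the errors shown in the examples above; rejecting start > end is an assumption of this sketch, because the text does not cover that case.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def strict_slice(values: Sequence[T], start: int, end: int) -> List[T]:
    """Return values[start:end], rejecting out-of-range bounds.

    Hypothetical helper for illustration; not part of the vortex API.
    """
    if start < 0 or end < 0:
        # Mirrors the unsigned-conversion error in the examples above.
        raise OverflowError("can't convert negative int to unsigned")
    if end > len(values):
        raise ValueError(f"index {end} out of bounds from 0 to {len(values)}")
    if start > end:
        # Assumption: the documented examples do not show this case.
        raise ValueError(f"start {start} is greater than end {end}")
    return list(values[start:end])


letters = ['a', 'b', 'c', 'd']
middle = strict_slice(letters, 1, 3)
empty = strict_slice(letters, 3, 3)
python_style = letters[2:10]  # plain Python silently clamps the upper bound
```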
- take(indices)¶
Filter, permute, and/or repeat elements by their index.
Examples
Keep only the first and third elements:
>>> a = vx.array(['a', 'b', 'c', 'd'])
>>> indices = vx.array([0, 2])
>>> a.take(indices).to_arrow_array()
<pyarrow.lib.StringArray object at ...>
[
  "a",
  "c"
]
Permute and repeat the first and second elements:
>>> a = vx.array(['a', 'b', 'c', 'd'])
>>> indices = vx.array([0, 1, 1, 0])
>>> a.take(indices).to_arrow_array()
<pyarrow.lib.StringArray object at ...>
[
  "a",
  "b",
  "b",
  "a"
]
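The phrase “filter, permute, and/or repeat” follows from a single rule: the output at position i is the input element at indices[i]. A minimal pure-Python sketch of that rule is shown below. take_reference is a hypothetical name, and rejecting negative or out-of-range indices is an assumption made by analogy with scalar_at.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def take_reference(values: Sequence[T], indices: Sequence[int]) -> List[T]:
    """Gather values[i] for each i in indices, in the order given.

    Hypothetical helper for illustration; not part of the vortex API.
    """
    for i in indices:
        # Assumption: indices must lie in [0, len(values)), as for scalar_at.
        if i < 0 or i >= len(values):
            raise IndexError(f"index {i} out of bounds from 0 to {len(values)}")
    return [values[i] for i in indices]


letters = ['a', 'b', 'c', 'd']
filtered = take_reference(letters, [0, 2])        # drops 'b' and 'd'
permuted = take_reference(letters, [3, 2, 1, 0])  # reverses the order
repeated = take_reference(letters, [0, 1, 1, 0])  # repeats elements
```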
- to_arrow_array()¶
Convert this array to a PyArrow array.
See also
- Return type:
Examples
Convert a Vortex array built from a Python list into an Arrow array:
>>> import vortex as vx
>>> vx.array([1, 2, 3]).to_arrow_array()
<pyarrow.lib.Int64Array object at ...>
[
  1,
  2,
  3
]
- to_arrow_table() → Table¶
Construct an Arrow table from this Vortex array.
See also
Warning
Only struct-typed arrays can be converted to Arrow tables.
- Return type:
Examples
>>> array = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
...     {'name': 'Mikhail', 'age': 57},
... ])
>>> array.to_arrow_table()
pyarrow.Table
age: int64
name: string_view
----
age: [[25,31,33,57]]
name: [["Joseph","Narendra","Angela","Mikhail"]]
- to_numpy(*, zero_copy_only: bool = True) → numpy.ndarray¶
Construct a NumPy array from this Vortex array.
This is an alias for self.to_arrow_array().to_numpy(zero_copy_only).
- Parameters:
zero_copy_only (bool) – When True, this method will raise an error unless a NumPy array can be created without copying the data. This is only possible when the array is a primitive array without nulls.
- Return type:
numpy.ndarray
Examples
Construct an immutable ndarray from a Vortex array:
>>> array = vortex.array([1, 0, 0, 1])
>>> array.to_numpy()
array([1, 0, 0, 1])
- to_pandas_df() → DataFrame¶
Construct a Pandas dataframe from this Vortex array.
Warning
Only struct-typed arrays can be converted to Pandas dataframes.
- Return type:
Examples
Construct a dataframe from a Vortex array:
>>> array = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
...     {'name': 'Mikhail', 'age': 57},
... ])
>>> array.to_pandas_df()
   age      name
0   25    Joseph
1   31  Narendra
2   33    Angela
3   57   Mikhail
- to_polars_dataframe()¶
Construct a Polars dataframe from this Vortex array.
See also
Warning
Only struct-typed arrays can be converted to Polars dataframes.
- Returns:
A Polars DataFrame.
Examples
>>> array = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
...     {'name': 'Mikhail', 'age': 57},
... ])
>>> array.to_polars_dataframe()
shape: (4, 2)
┌─────┬──────────┐
│ age ┆ name     │
│ --- ┆ ---      │
│ i64 ┆ str      │
╞═════╪══════════╡
│ 25  ┆ Joseph   │
│ 31  ┆ Narendra │
│ 33  ┆ Angela   │
│ 57  ┆ Mikhail  │
└─────┴──────────┘
- to_polars_series()¶
Construct a Polars series from this Vortex array.
See also
- Returns:
A Polars Series.
Examples
Convert a numeric array with nulls to a Polars Series:
>>> vortex.array([1, None, 2, 3]).to_polars_series()
shape: (4,)
Series: '' [i64]
[
    1
    null
    2
    3
]
Convert a UTF-8 string array to a Polars Series:
>>> vortex.array(['hello, ', 'is', 'it', 'me?']).to_polars_series()
shape: (4,)
Series: '' [str]
[
    "hello, "
    "is"
    "it"
    "me?"
]
Convert a struct array to a Polars Series:
>>> array = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
...     {'name': 'Mikhail', 'age': 57},
... ])
>>> array.to_polars_series()
shape: (4,)
Series: '' [struct[2]]
[
    {25,"Joseph"}
    {31,"Narendra"}
    {33,"Angela"}
    {57,"Mikhail"}
]
- to_pylist() → list[Any]¶
Deeply copy an Array into a Python list.
- Return type:
Examples
>>> array = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
... ])
>>> array.to_pylist()
[{'age': 25, 'name': 'Joseph'}, {'age': 31, 'name': 'Narendra'}, {'age': 33, 'name': 'Angela'}]
- tree_display()¶
Internal technical details about the encoding of this Array.
Warning
The format of the returned string may change without notice.
- Return type:
Examples
Uncompressed arrays have straightforward encodings:
>>> import vortex as vx
>>> arr = vx.array([1, 2, None, 3])
>>> print(arr.tree_display())
root: vortex.primitive(i64?, len=4) nbytes=33 B (100.00%)
  metadata: EmptyMetadata
  buffer (align=8): 32 B
  validity: vortex.bool(bool, len=4) nbytes=1 B (3.03%)
    metadata: BoolMetadata { offset: 0 }
    buffer (align=1): 1 B
Compressed arrays often have more complex, deeply nested encoding trees.
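The byte counts in the uncompressed example can be reproduced by hand. The sketch below assumes 8 bytes per i64 value and one validity bit per row, packed into whole bytes. These assumptions are consistent with the output shown, but they are not a statement of how Vortex computes nbytes, and primitive_nbytes is a hypothetical helper.

```python
import math
from typing import Tuple


def primitive_nbytes(length: int, bytes_per_value: int) -> Tuple[int, int]:
    """Return (values_bytes, validity_bytes) for a nullable primitive array.

    Hypothetical helper for illustration; not part of the vortex API.
    Assumes one validity bit per row, rounded up to whole bytes.
    """
    values_bytes = length * bytes_per_value
    validity_bytes = math.ceil(length / 8)
    return values_bytes, validity_bytes


# Four nullable i64 values, as in vx.array([1, 2, None, 3]).
values_bytes, validity_bytes = primitive_nbytes(length=4, bytes_per_value=8)
total_bytes = values_bytes + validity_bytes
validity_share = 100 * validity_bytes / total_bytes
print(f"{total_bytes} B total, validity {validity_share:.2f}%")
```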
Typed Arrays¶
By default, the array subclass returned from PyVortex will be specific to the DType of the array.
These subclasses expose type-specific functionality that is more useful for the average use-case than encoding-specific functionality.
- class vortex.PrimitiveTypeArray¶
Concrete class for arrays of any primitive type PrimitiveDType.
- class vortex.UIntTypeArray¶
Concrete class for arrays of any primitive unsigned integer type PrimitiveDType.
- class vortex.UInt8TypeArray¶
Concrete class for arrays of u8 PrimitiveDType.
- class vortex.UInt16TypeArray¶
Concrete class for arrays of u16 PrimitiveDType.
- class vortex.UInt32TypeArray¶
Concrete class for arrays of u32 PrimitiveDType.
- class vortex.UInt64TypeArray¶
Concrete class for arrays of u64 PrimitiveDType.
- class vortex.IntTypeArray¶
Concrete class for arrays of any primitive signed integer type PrimitiveDType.
- class vortex.Int8TypeArray¶
Concrete class for arrays of i8 PrimitiveDType.
- class vortex.Int16TypeArray¶
Concrete class for arrays of i16 PrimitiveDType.
- class vortex.Int32TypeArray¶
Concrete class for arrays of i32 PrimitiveDType.
- class vortex.Int64TypeArray¶
Concrete class for arrays of i64 PrimitiveDType.
- class vortex.FloatTypeArray¶
Concrete class for arrays of any primitive floating point type PrimitiveDType.
- class vortex.Float16TypeArray¶
Concrete class for arrays of f16 PrimitiveDType.
- class vortex.Float32TypeArray¶
Concrete class for arrays of f32 PrimitiveDType.
- class vortex.Float64TypeArray¶
Concrete class for arrays of f64 PrimitiveDType.
- class vortex.BinaryTypeArray¶
Concrete class for arrays of BinaryDType.
- class vortex.StructTypeArray¶
Concrete class for arrays of StructDType.
- class vortex.ExtensionTypeArray¶
Concrete class for arrays of ExtensionDType.
Canonical Encodings¶
Each DType has a corresponding canonical encoding. These encodings represent the uncompressed version of the array, and are also zero-copy to Apache Arrow.
- class vortex.NullArray¶
Concrete class for arrays with vortex.null encoding.
- class vortex.BoolArray¶
Concrete class for arrays with vortex.bool encoding.
- class vortex.PrimitiveArray¶
Concrete class for arrays with vortex.primitive encoding.
- class vortex.VarBinArray¶
Concrete class for arrays with vortex.varbin encoding.
- class vortex.VarBinViewArray¶
Concrete class for arrays with vortex.varbinview encoding.
- class vortex.StructArray¶
Concrete class for arrays with vortex.struct encoding.
- field(name)¶
Returns the given field of the struct array.
- class vortex.ListArray¶
Concrete class for arrays with vortex.list encoding.
- class vortex.ExtensionArray¶
Concrete class for arrays with vortex.ext encoding.
Utility Encodings¶
- class vortex.ChunkedArray¶
Concrete class for arrays with vortex.chunked encoding.
- class vortex.ConstantArray¶
Concrete class for arrays with vortex.constant encoding.
- scalar()¶
Return the scalar value of the constant array.
- class vortex.SparseArray¶
Concrete class for arrays with vortex.sparse encoding.
Compressed Encodings¶
- class vortex.AlpArray¶
Concrete class for arrays with vortex.alp encoding.
- class vortex.AlpRdArray¶
Concrete class for arrays with vortex.alprd encoding.
- class vortex.DateTimePartsArray¶
Concrete class for arrays with vortex.datetimeparts encoding.
- class vortex.DictArray¶
Concrete class for arrays with vortex.dict encoding.
- class vortex.FsstArray¶
Concrete class for arrays with vortex.fsst encoding.
- class vortex.RunEndArray¶
Concrete class for arrays with vortex.runend encoding.
- class vortex.ZigZagArray¶
Concrete class for arrays with vortex.zigzag encoding.
- class vortex.FastLanesBitPackedArray¶
Concrete class for arrays with fastlanes.bitpacked encoding.
- bit_width¶
Returns the bit width of the packed values.
- class vortex.FastLanesDeltaArray¶
Concrete class for arrays with fastlanes.delta encoding.
- class vortex.FastLanesFoRArray¶
Concrete class for arrays with fastlanes.for encoding.
Pluggable Encodings¶
Subclasses of PyArray can be used to implement custom Vortex encodings in Python. These encodings can be registered with the registry so they are available to use when reading Vortex files.
- class vortex.PyArray¶
Abstract base class for Python-based Vortex arrays.
- abstract classmethod decode(parts: ArrayParts, ctx: ArrayContext, dtype: DType, len: int) → Array¶
Decode an array from its component parts.
ArrayParts contains the metadata, buffers and child ArrayParts that represent the current array. Implementations of this function should validate this information, and then construct a new array.
Registry and Serde¶
- vortex.registry = <vortex.Registry object>¶
The default registry for Vortex.
- class vortex.Registry¶
A register of known array and layout encodings.
- array_ctx(encodings)¶
Create an ArrayContext containing the given encodings.
- register(cls)¶
Register an array encoding implemented by subclassing PyArray.
It’s not currently possible to register a layout encoding from Python.
- class vortex.ArrayContext¶
An ArrayContext captures an ordered set of encodings.
In a serialized array, encodings are identified by a positional index into such an ArrayContext.
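The idea of identifying encodings by position can be illustrated with a small stand-in. EncodingContext below is a hypothetical pure-Python class, not the vortex.ArrayContext API. It only shows how an ordered, duplicate-free list of encoding IDs lets a serialized array store a small integer in place of each encoding's full ID.

```python
from typing import Iterable, List


class EncodingContext:
    """An ordered set of encoding IDs addressed by position.

    Hypothetical stand-in for illustration; not the vortex.ArrayContext API.
    """

    def __init__(self, encoding_ids: Iterable[str]) -> None:
        # dict.fromkeys removes duplicates while preserving first-seen order.
        self._ids: List[str] = list(dict.fromkeys(encoding_ids))

    def index_of(self, encoding_id: str) -> int:
        """Position a writer would store in place of the encoding ID."""
        return self._ids.index(encoding_id)

    def lookup(self, position: int) -> str:
        """Encoding ID a reader recovers from a stored position."""
        return self._ids[position]


ctx = EncodingContext(
    ["vortex.primitive", "vortex.bool", "fastlanes.bitpacked", "vortex.bool"]
)
stored_position = ctx.index_of("fastlanes.bitpacked")
recovered_id = ctx.lookup(stored_position)
```

Because the positions are only meaningful relative to one particular ordering, the reader in this sketch must use the same context as the writer.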
- class vortex.ArrayParts¶
ArrayParts is a parsed representation of a serialized array.
It can be decoded into a full array using the decode method.
- buffers¶
Return the buffers of the array, currently as pyarrow.Buffer.
- children¶
Return the child ArrayParts of the array.
- decode(ctx, dtype, len)¶
Decode the array parts into a full array.
- Returns:
The decoded array.
- metadata¶
Fetch the serialized metadata of the array.
- nbuffers¶
The number of buffers the array has.
- nchildren¶
The number of child arrays the array has.
- static parse(data)¶
Parse a serialized array into its parts.