Arrays

The base class for all Vortex arrays is vortex.Array. This class holds the tree of array definitions and buffers that make up the array and can be passed into compute functions, serialized, and otherwise manipulated as a generic array.

There are two ways of “downcasting” an array for more specific access patterns:

  1. Into an encoding-specific array, like vortex.FastLanesBitPackedArray.

  2. Into a type-specific array, like vortex.BoolTypeArray.

Note that vortex.BoolArray represents an array that stores its physical data as a bit-buffer of booleans, whereas vortex.BoolTypeArray represents any array whose logical type is boolean, regardless of how it is encoded.
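The difference between a logical type and a physical encoding can be pictured with a toy model in plain Python. This sketch does not use the Vortex API: ToyArray, is_bool_type, and is_bool_encoding are hypothetical names, and the only details taken from this page are the encoding IDs vortex.bool and vortex.constant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyArray:
    """Hypothetical stand-in for an array: a logical dtype plus an encoding ID."""
    dtype: str        # logical type, e.g. "bool"
    encoding_id: str  # physical encoding, e.g. "vortex.bool"


def is_bool_type(arr: ToyArray) -> bool:
    # Analogue of the type-specific view: only the logical type matters.
    return arr.dtype == "bool"


def is_bool_encoding(arr: ToyArray) -> bool:
    # Analogue of the encoding-specific view: the data must be a bit-buffer.
    return arr.encoding_id == "vortex.bool"


bit_buffer = ToyArray(dtype="bool", encoding_id="vortex.bool")
constant = ToyArray(dtype="bool", encoding_id="vortex.constant")

print(is_bool_type(bit_buffer), is_bool_encoding(bit_buffer))
print(is_bool_type(constant), is_bool_encoding(constant))
```

Both toy arrays are logically boolean, but only the first is stored in the boolean bit-buffer encoding.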

Factory Functions

vortex.array(obj: Array | list | Any) -> Array

The main entry point for creating Vortex arrays from other Python objects.

This function is also available as vortex.array.

Parameters:

obj (pyarrow.Array, list, pandas.DataFrame) – The elements of this array or list become the elements of the Vortex array.

Return type:

vortex.Array

Examples

A Vortex array containing the first three integers:

>>> vortex.array([1, 2, 3]).to_arrow_array()
<pyarrow.lib.Int64Array object at ...>
[
  1,
  2,
  3
]

The same Vortex array with a null value in the third position:

>>> vortex.array([1, 2, None, 3]).to_arrow_array()
<pyarrow.lib.Int64Array object at ...>
[
  1,
  2,
  null,
  3
]

Initialize a Vortex array from an Arrow array:

>>> arrow = pyarrow.array(['Hello', 'it', 'is', 'me'], type=pyarrow.string_view())
>>> vortex.array(arrow).to_arrow_array()
<pyarrow.lib.StringViewArray object at ...>
[
  "Hello",
  "it",
  "is",
  "me"
]

Initialize a Vortex array from a Pandas dataframe:

>>> import pandas as pd
>>> df = pd.DataFrame({
...     "Name": ["Braund", "Allen", "Bonnell"],
...     "Age": [22, 35, 58],
... })
>>> vortex.array(df).to_arrow_array()
<pyarrow.lib.ChunkedArray object at ...>
[
  -- is_valid: all not null
  -- child 0 type: string_view
    [
      "Braund",
      "Allen",
      "Bonnell"
    ]
  -- child 1 type: int64
    [
      22,
      35,
      58
    ]
]

Base Class

class vortex.Array

An array of zero or more rows each with the same set of columns.

Examples

Arrays support all the standard comparison operations:

>>> import vortex as vx
>>> a = vx.array(['dog', None, 'cat', 'mouse', 'fish'])
>>> b = vx.array(['doug', 'jennifer', 'casper', 'mouse', 'faust'])
>>> (a < b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
   true,
   null,
   false,
   false,
   false
]
>>> (a <= b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
   true,
   null,
   false,
   true,
   false
]
>>> (a == b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
   false,
   null,
   false,
   true,
   false
]
>>> (a != b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
   true,
   null,
   true,
   false,
   true
]
>>> (a >= b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
   false,
   null,
   true,
   true,
   true
]
>>> (a > b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
   false,
   null,
   true,
   false,
   true
]
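The outputs above show that a null on either side of a comparison yields a null result. A minimal pure-Python sketch of that rule follows, using None for null. The compare helper is hypothetical, and rejecting inputs of different lengths is an assumption rather than documented Vortex behaviour.

```python
import operator
from typing import Any, Callable, Optional


def compare(op: Callable[[Any, Any], bool],
            left: list, right: list) -> list[Optional[bool]]:
    """Element-wise comparison where a null (None) on either side yields null."""
    if len(left) != len(right):
        # Assumption: operands must have equal length.
        raise ValueError("arrays must have the same length")
    return [
        None if l is None or r is None else op(l, r)
        for l, r in zip(left, right)
    ]


a = ['dog', None, 'cat', 'mouse', 'fish']
b = ['doug', 'jennifer', 'casper', 'mouse', 'faust']

print(compare(operator.lt, a, b))
print(compare(operator.eq, a, b))
```

For these ASCII strings, Python's ordering gives the same results as the `<` and `==` examples above.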
__len__()

Return len(self).

dtype

Returns the data type of this array.

Return type:

vortex.DType

Examples

By default, vortex.array() uses the largest available bit-width:

>>> import vortex as vx
>>> vx.array([1, 2, 3]).dtype
int(64, nullable=False)

Including a None forces a nullable type:

>>> vx.array([1, None, 2, 3]).dtype
int(64, nullable=True)

A UTF-8 string array:

>>> vx.array(['hello, ', 'is', 'it', 'me?']).dtype
utf8(nullable=False)

fill_forward()

Fill forward non-null values over runs of nulls.

Leading nulls are replaced with the “zero” for that type. For integral and floating-point types, this is zero. For the Boolean type, this is False.

Fill forward sensor values over intermediate missing values. Note that leading nulls are replaced with 0.0:

>>> import vortex as vx
>>> a = vx.array([
...      None,  None, 30.29, 30.30, 30.30,  None,  None, 30.27, 30.25,
...     30.22,  None,  None,  None,  None, 30.12, 30.11, 30.11, 30.11,
...     30.10, 30.08,  None, 30.21, 30.03, 30.03, 30.05, 30.07, 30.07,
... ])
>>> a.fill_forward().to_arrow_array()
<pyarrow.lib.DoubleArray object at ...>
[
  0,
  0,
  30.29,
  30.3,
  30.3,
  30.3,
  30.3,
  30.27,
  30.25,
  30.22,
  ...
  30.11,
  30.1,
  30.08,
  30.08,
  30.21,
  30.03,
  30.03,
  30.05,
  30.07,
  30.07
]
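The fill-forward rule can also be written out in a few lines of plain Python. This is a sketch of the semantics described above, not the Vortex implementation. The function name and its zero parameter are illustrative.

```python
from typing import Optional


def fill_forward(values: list[Optional[float]], zero: float = 0.0) -> list[float]:
    """Replace each null with the most recent non-null value.

    Leading nulls have no predecessor, so they become `zero`.
    """
    filled = []
    last = zero
    for v in values:
        if v is not None:
            last = v  # remember the latest non-null value
        filled.append(last)
    return filled


readings = [None, None, 30.29, 30.30, None, None, 30.27]
print(fill_forward(readings))
```

The two leading nulls become 0.0, and the run of nulls after 30.30 repeats that value until the next reading arrives.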
filter(mask)

Filter an Array by another Boolean array.

Parameters:

mask (Array) – Keep all the rows in self for which the correspondingly indexed row in mask is True.

Return type:

Array

Examples

Keep only the single-digit non-negative integers:

>>> import vortex as vx
>>> a = vx.array([0, 42, 1_000, -23, 10, 9, 5])
>>> mask = vx.array([True, False, False, False, False, True, True])
>>> a.filter(mask).to_arrow_array()
<pyarrow.lib.Int64Array object at ...>
[
  0,
  9,
  5
]
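As a plain-Python sketch of the same semantics, the mask can be derived from a predicate and applied row by row. The filter_rows helper is hypothetical, and requiring equal lengths is an assumption.

```python
def filter_rows(values: list, mask: list[bool]) -> list:
    """Keep values[i] wherever mask[i] is True."""
    if len(values) != len(mask):
        # Assumption: the mask must have one entry per row.
        raise ValueError("mask and values must have the same length")
    return [v for v, keep in zip(values, mask) if keep is True]


values = [0, 42, 1_000, -23, 10, 9, 5]
# Build the mask from the predicate "single-digit and non-negative".
mask = [0 <= v < 10 for v in values]

print(mask)
print(filter_rows(values, mask))
```

The derived mask matches the one written by hand in the example above, and the order of the surviving rows is preserved.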
static from_arrow(obj)

Convert a PyArrow object into a Vortex array.

Parameters:

obj – One of pyarrow.Array, pyarrow.ChunkedArray, or pyarrow.Table.

Return type:

Array

id

Returns the encoding ID of this array.

nbytes

Returns the number of bytes used by this array.

scalar_at(index)

Retrieve a row by its index.

Parameters:

index (int) – The index of interest. Must be greater than or equal to zero and less than the length of this array.

Return type:

vortex.Scalar

Examples

Retrieve the last element from an array of integers:

>>> import vortex as vx
>>> vx.array([10, 42, 999, 1992]).scalar_at(3).as_py()
1992

Retrieve the third element from an array of strings:

>>> array = vx.array(["hello", "goodbye", "it", "is"])
>>> array.scalar_at(2).as_py()
'it'

Retrieve an element from an array of structures:

>>> array = vx.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
...     None,
...     {'name': 'Mikhail', 'age': 57},
... ])
>>> array.scalar_at(2).as_py()
{'age': 33, 'name': 'Angela'}

Retrieve a missing element from an array of structures:

>>> array.scalar_at(3).as_py() is None
True

Out of bounds accesses are prohibited:

>>> vx.array([10, 42, 999, 1992]).scalar_at(10)
Traceback (most recent call last):
...
ValueError: index 10 out of bounds from 0 to 4
...

Unlike Python, negative indices are not supported:

>>> vx.array([10, 42, 999, 1992]).scalar_at(-2)
Traceback (most recent call last):
...
OverflowError: can't convert negative int to unsigned

slice(start, end)

Slice this array.

Parameters:
  • start (int) – The start index of the range to keep, inclusive.

  • end (int) – The end index, exclusive.

Return type:

Array

Examples

Keep only the second through third elements:

>>> import vortex as vx
>>> a = vx.array(['a', 'b', 'c', 'd'])
>>> a.slice(1, 3).to_arrow_array()
<pyarrow.lib.StringArray object at ...>
[
  "b",
  "c"
]

Keep none of the elements:

>>> a = vx.array(['a', 'b', 'c', 'd'])
>>> a.slice(3, 3).to_arrow_array()
<pyarrow.lib.StringViewArray object at ...>
[]

Unlike Python, it is an error to slice outside the bounds of the array:

>>> a = vx.array(['a', 'b', 'c', 'd'])
>>> a.slice(2, 10).to_arrow_array()
Traceback (most recent call last):
...
ValueError: index 10 out of bounds from 0 to 4

Or to slice with a negative value:

>>> a = vx.array(['a', 'b', 'c', 'd'])
>>> a.slice(-2, -1).to_arrow_array()
Traceback (most recent call last):
...
OverflowError: can't convert negative int to unsigned
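The contrast with Python's forgiving list slicing can be sketched in plain Python. The strict_slice helper below is hypothetical. It mirrors the exception types shown in the examples above, but the messages are illustrative, and rejecting start > end is an assumption.

```python
def strict_slice(values: list, start: int, end: int) -> list:
    """Return values[start:end], rejecting negative or out-of-bounds indices."""
    if start < 0 or end < 0:
        raise OverflowError("negative indices are not supported")
    if end > len(values):
        raise ValueError(f"index {end} out of bounds from 0 to {len(values)}")
    if start > end:
        # Assumption: an inverted range is treated as an error.
        raise ValueError(f"start {start} is greater than end {end}")
    return values[start:end]


letters = ['a', 'b', 'c', 'd']
print(strict_slice(letters, 1, 3))
print(strict_slice(letters, 3, 3))

# A built-in list slice silently clamps the end index instead of failing.
print(letters[2:10])
try:
    strict_slice(letters, 2, 10)
except ValueError as err:
    print("rejected:", err)
```

An empty range such as (3, 3) is valid and returns an empty result, while any index past the end is rejected rather than clamped.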
take(indices)

Filter, permute, and/or repeat elements by their index.

Parameters:

indices (Array) – An array of indices to keep.

Return type:

Array

Examples

Keep only the first and third elements:

>>> import vortex as vx
>>> a = vx.array(['a', 'b', 'c', 'd'])
>>> indices = vx.array([0, 2])
>>> a.take(indices).to_arrow_array()
<pyarrow.lib.StringArray object at ...>
[
  "a",
  "c"
]

Permute and repeat the first and second elements:

>>> a = vx.array(['a', 'b', 'c', 'd'])
>>> indices = vx.array([0, 1, 1, 0])
>>> a.take(indices).to_arrow_array()
<pyarrow.lib.StringArray object at ...>
[
  "a",
  "b",
  "b",
  "a"
]
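A plain-Python sketch shows why one operation can filter, permute, and repeat: the output is built by looking up each index in turn. The take helper is hypothetical, and raising IndexError for out-of-range or negative indices is an assumption, not documented behaviour.

```python
def take(values: list, indices: list[int]) -> list:
    """Build a new list by looking up each index in order."""
    out = []
    for i in indices:
        if not 0 <= i < len(values):
            # Assumption: out-of-range and negative indices are rejected.
            raise IndexError(f"index {i} out of bounds from 0 to {len(values)}")
        out.append(values[i])
    return out


letters = ['a', 'b', 'c', 'd']
print(take(letters, [0, 2]))        # filter: drop 'b' and 'd'
print(take(letters, [3, 2, 1, 0]))  # permute: reverse the order
print(take(letters, [0, 1, 1, 0]))  # repeat: indices may occur more than once
```

The length of the result always equals the number of indices, not the length of the source.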
to_arrow_array()

Convert this array to a PyArrow array.

See also

to_arrow_table()

Return type:

pyarrow.Array

Examples

Convert a Vortex array built from a Python list into an Arrow array:

>>> import vortex as vx
>>> vx.array([1, 2, 3]).to_arrow_array()
<pyarrow.lib.Int64Array object at ...>
[
  1,
  2,
  3
]

to_arrow_table() -> Table

Construct an Arrow table from this Vortex array.

See also

to_arrow_array()

Warning

Only struct-typed arrays can be converted to Arrow tables.

Return type:

pyarrow.Table

Examples

>>> array = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
...     {'name': 'Mikhail', 'age': 57},
... ])
>>> array.to_arrow_table()
pyarrow.Table
age: int64
name: string_view
----
age: [[25,31,33,57]]
name: [["Joseph","Narendra","Angela","Mikhail"]]

to_numpy(*, zero_copy_only: bool = True) -> numpy.ndarray

Construct a NumPy array from this Vortex array.

This is an alias for self.to_arrow_array().to_numpy(zero_copy_only).

Parameters:

zero_copy_only (bool) – When True, this method will raise an error unless a NumPy array can be created without copying the data. This is only possible when the array is a primitive array without nulls.

Return type:

numpy.ndarray

Examples

Construct an immutable ndarray from a Vortex array:

>>> array = vortex.array([1, 0, 0, 1])
>>> array.to_numpy()
array([1, 0, 0, 1])

to_pandas_df() -> DataFrame

Construct a Pandas dataframe from this Vortex array.

Warning

Only struct-typed arrays can be converted to Pandas dataframes.

Return type:

pandas.DataFrame

Examples

Construct a dataframe from a Vortex array:

>>> array = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
...     {'name': 'Mikhail', 'age': 57},
... ])
>>> array.to_pandas_df()
   age      name
0   25    Joseph
1   31  Narendra
2   33    Angela
3   57   Mikhail

to_polars_dataframe()

Construct a Polars dataframe from this Vortex array.

Warning

Only struct-typed arrays can be converted to Polars dataframes.

Return type:

polars.DataFrame

Examples

>>> array = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
...     {'name': 'Mikhail', 'age': 57},
... ])
>>> array.to_polars_dataframe()
shape: (4, 2)
┌─────┬──────────┐
│ age ┆ name     │
│ --- ┆ ---      │
│ i64 ┆ str      │
╞═════╪══════════╡
│ 25  ┆ Joseph   │
│ 31  ┆ Narendra │
│ 33  ┆ Angela   │
│ 57  ┆ Mikhail  │
└─────┴──────────┘

to_polars_series()

Construct a Polars series from this Vortex array.

Return type:

polars.Series

Examples

Convert a numeric array with nulls to a Polars Series:

>>> vortex.array([1, None, 2, 3]).to_polars_series()
shape: (4,)
Series: '' [i64]
[
    1
    null
    2
    3
]

Convert a UTF-8 string array to a Polars Series:

>>> vortex.array(['hello, ', 'is', 'it', 'me?']).to_polars_series()
shape: (4,)
Series: '' [str]
[
    "hello, "
    "is"
    "it"
    "me?"
]

Convert a struct array to a Polars Series:

>>> array = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
...     {'name': 'Mikhail', 'age': 57},
... ])
>>> array.to_polars_series()
shape: (4,)
Series: '' [struct[2]]
[
    {25,"Joseph"}
    {31,"Narendra"}
    {33,"Angela"}
    {57,"Mikhail"}
]

to_pylist() -> list[Any]

Deeply copy an Array into a Python list.

Return type:

list

Examples

>>> array = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
... ])
>>> array.to_pylist()
[{'age': 25, 'name': 'Joseph'}, {'age': 31, 'name': 'Narendra'}, {'age': 33, 'name': 'Angela'}]

tree_display()

Return a string describing internal technical details of this array's encoding tree.

Warning

The format of the returned string may change without notice.

Return type:

str

Examples

Uncompressed arrays have straightforward encodings:

>>> import vortex as vx
>>> arr = vx.array([1, 2, None, 3])
>>> print(arr.tree_display())
root: vortex.primitive(i64?, len=4) nbytes=33 B (100.00%)
  metadata: EmptyMetadata
  buffer (align=8): 32 B
  validity: vortex.bool(bool, len=4) nbytes=1 B (3.03%)
    metadata: BoolMetadata { offset: 0 }
    buffer (align=1): 1 B

Compressed arrays often have more complex, deeply nested encoding trees.
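The sizes in the output above are consistent with each node reporting its own buffers plus those of its children (32 B + 1 B = 33 B), with percentages taken relative to the root. The toy model below reproduces that arithmetic in plain Python. ToyNode and render are hypothetical names, and the summing rule is inferred from the example rather than taken from the Vortex source.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical node in an encoding tree (not a Vortex class)."""
    name: str
    buffer_sizes: list[int] = field(default_factory=list)  # bytes per buffer
    children: list["ToyNode"] = field(default_factory=list)

    @property
    def nbytes(self) -> int:
        # Assumption: a node's size is its own buffers plus all of its children.
        return sum(self.buffer_sizes) + sum(c.nbytes for c in self.children)


def render(node: ToyNode, total: int, depth: int = 0) -> list[str]:
    """Return one line per node with its size and its share of the root."""
    share = 100.0 * node.nbytes / total
    lines = [f"{'  ' * depth}{node.name}: nbytes={node.nbytes} B ({share:.2f}%)"]
    for child in node.children:
        lines.extend(render(child, total, depth + 1))
    return lines


validity = ToyNode("validity", buffer_sizes=[1])
root = ToyNode("root", buffer_sizes=[32], children=[validity])
print("\n".join(render(root, root.nbytes)))
```

A deeper tree, as produced by compression, would add more levels of children, each contributing to the totals of its ancestors.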

Typed Arrays

By default, the array subclass returned from PyVortex will be specific to the DType of the array. These subclasses expose type-specific functionality that is more useful for the average use-case than encoding-specific functionality.

class vortex.NullTypeArray

Concrete class for arrays of NullDType.

class vortex.BoolTypeArray

Concrete class for arrays of BoolDType.

class vortex.PrimitiveTypeArray

Concrete class for arrays of any primitive type PrimitiveDType.

class vortex.UIntTypeArray

Concrete class for arrays of any primitive unsigned integer type PrimitiveDType.

class vortex.UInt8TypeArray

Concrete class for arrays of u8 PrimitiveDType.

class vortex.UInt16TypeArray

Concrete class for arrays of u16 PrimitiveDType.

class vortex.UInt32TypeArray

Concrete class for arrays of u32 PrimitiveDType.

class vortex.UInt64TypeArray

Concrete class for arrays of u64 PrimitiveDType.

class vortex.IntTypeArray

Concrete class for arrays of any primitive signed integer type PrimitiveDType.

class vortex.Int8TypeArray

Concrete class for arrays of i8 PrimitiveDType.

class vortex.Int16TypeArray

Concrete class for arrays of i16 PrimitiveDType.

class vortex.Int32TypeArray

Concrete class for arrays of i32 PrimitiveDType.

class vortex.Int64TypeArray

Concrete class for arrays of i64 PrimitiveDType.

class vortex.FloatTypeArray

Concrete class for arrays of any primitive floating point type PrimitiveDType.

class vortex.Float16TypeArray

Concrete class for arrays of f16 PrimitiveDType.

class vortex.Float32TypeArray

Concrete class for arrays of f32 PrimitiveDType.

class vortex.Float64TypeArray

Concrete class for arrays of f64 PrimitiveDType.

class vortex.Utf8TypeArray

Concrete class for arrays of Utf8DType.

class vortex.BinaryTypeArray

Concrete class for arrays of BinaryDType.

class vortex.StructTypeArray

Concrete class for arrays of StructDType.

class vortex.ListTypeArray

Concrete class for arrays of ListDType.

class vortex.ExtensionTypeArray

Concrete class for arrays of ExtensionDType.

Canonical Encodings

Each DType has a corresponding canonical encoding. These encodings represent the uncompressed version of the array and can be converted to Apache Arrow without copying.

class vortex.NullArray

Concrete class for arrays with vortex.null encoding.

class vortex.BoolArray

Concrete class for arrays with vortex.bool encoding.

class vortex.PrimitiveArray

Concrete class for arrays with vortex.primitive encoding.

class vortex.VarBinArray

Concrete class for arrays with vortex.varbin encoding.

class vortex.VarBinViewArray

Concrete class for arrays with vortex.varbinview encoding.

class vortex.StructArray

Concrete class for arrays with vortex.struct encoding.

field(name)

Returns the given field of the struct array.

class vortex.ListArray

Concrete class for arrays with vortex.list encoding.

class vortex.ExtensionArray

Concrete class for arrays with vortex.ext encoding.

Utility Encodings

class vortex.ChunkedArray

Concrete class for arrays with vortex.chunked encoding.

class vortex.ConstantArray

Concrete class for arrays with vortex.constant encoding.

scalar()

Return the scalar value of the constant array.

class vortex.SparseArray

Concrete class for arrays with vortex.sparse encoding.

Compressed Encodings

class vortex.AlpArray

Concrete class for arrays with vortex.alp encoding.

class vortex.AlpRdArray

Concrete class for arrays with vortex.alprd encoding.

class vortex.DateTimePartsArray

Concrete class for arrays with vortex.datetimeparts encoding.

class vortex.DictArray

Concrete class for arrays with vortex.dict encoding.

class vortex.FsstArray

Concrete class for arrays with vortex.fsst encoding.

class vortex.RunEndArray

Concrete class for arrays with vortex.runend encoding.

class vortex.ZigZagArray

Concrete class for arrays with vortex.zigzag encoding.

class vortex.FastLanesBitPackedArray

Concrete class for arrays with fastlanes.bitpacked encoding.

bit_width

Returns the bit width of the packed values.

class vortex.FastLanesDeltaArray

Concrete class for arrays with fastlanes.delta encoding.

class vortex.FastLanesFoRArray

Concrete class for arrays with fastlanes.for encoding.

Pluggable Encodings

Subclasses of PyArray can be used to implement custom Vortex encodings in Python. These encodings can be registered with the registry so they are available to use when reading Vortex files.

class vortex.PyArray

Abstract base class for Python-based Vortex arrays.

abstract classmethod decode(parts: ArrayParts, ctx: ArrayContext, dtype: DType, len: int) -> Array

Decode an array from its component parts.

ArrayParts contains the metadata, buffers and child ArrayParts that represent the current array. Implementations of this function should validate this information, and then construct a new array.

Registry and Serde

vortex.registry = <vortex.Registry object>

The default registry for Vortex.

class vortex.Registry

A registry of known array and layout encodings.

array_ctx(encodings)

Create an ArrayContext containing the given encodings.

register(cls)

Register an array encoding implemented by subclassing PyArray.

It’s not currently possible to register a layout encoding from Python.

class vortex.ArrayContext

An ArrayContext captures an ordered set of encodings.

In a serialized array, encodings are identified by a positional index into such an ArrayContext.
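Positional identification can be illustrated with a small pure-Python model. ToyContext and its methods are hypothetical and are not the Vortex ArrayContext API. The encoding IDs are the ones listed on this page.

```python
class ToyContext:
    """Hypothetical ordered set of encoding IDs (not the Vortex ArrayContext)."""

    def __init__(self, encoding_ids):
        # dict.fromkeys removes duplicates while preserving first-seen order.
        self._ids = list(dict.fromkeys(encoding_ids))

    def index_of(self, encoding_id: str) -> int:
        """Position written into the serialized form in place of the ID."""
        return self._ids.index(encoding_id)

    def encoding_at(self, index: int) -> str:
        """Resolve a position read from the serialized form back to an ID."""
        return self._ids[index]


writer_ctx = ToyContext(["vortex.primitive", "vortex.bool", "fastlanes.bitpacked"])
reader_ctx = ToyContext(["vortex.bool", "vortex.primitive", "fastlanes.bitpacked"])

position = writer_ctx.index_of("vortex.bool")
print(position)
print(writer_ctx.encoding_at(position))
print(reader_ctx.encoding_at(position))
```

Because the index is positional, the same number resolves to a different encoding under a differently ordered context, so a reader needs a context whose order matches the one used when the array was serialized.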

class vortex.ArrayParts

ArrayParts is a parsed representation of a serialized array.

It can be decoded into a full array using the decode method.

buffers

Return the buffers of the array, currently as pyarrow.Buffer.

children

Return the child ArrayParts of the array.

decode(ctx, dtype, len)

Decode the array parts into a full array.

Returns:

The decoded array.

metadata

Fetch the serialized metadata of the array.

nbuffers

The number of buffers the array has.

nchildren

The number of child arrays the array has.

static parse(data)

Parse a serialized array into its parts.