Arrays

The base class for all Vortex arrays is vortex.Array. This class holds the tree of array definitions and buffers that make up the array and can be passed into compute functions, serialized, and otherwise manipulated as a generic array.

Factory Functions

vortex.array(obj: Array | list | Any) -> Array

The main entry point for creating Vortex arrays from other Python objects.

This function is also available as vortex.array.

Parameters:

obj (pyarrow.Array, list, pandas.DataFrame) – The elements of this array or list become the elements of the Vortex array.

Return type:

vortex.Array

Examples

A Vortex array containing the first three integers:

>>> vortex.array([1, 2, 3]).to_arrow_array()
<pyarrow.lib.Int64Array object at ...>
[
  1,
  2,
  3
]

The same Vortex array with a null value in the third position:

>>> vortex.array([1, 2, None, 3]).to_arrow_array()
<pyarrow.lib.Int64Array object at ...>
[
  1,
  2,
  null,
  3
]

Initialize a Vortex array from an Arrow array:

>>> arrow = pyarrow.array(['Hello', 'it', 'is', 'me'], type=pyarrow.string_view())
>>> vortex.array(arrow).to_arrow_array()
<pyarrow.lib.StringViewArray object at ...>
[
  "Hello",
  "it",
  "is",
  "me"
]

Initialize a Vortex array from a Pandas dataframe:

>>> import pandas as pd
>>> df = pd.DataFrame({
...     "Name": ["Braund", "Allen", "Bonnell"],
...     "Age": [22, 35, 58],
... })
>>> vortex.array(df).to_arrow_array()
<pyarrow.lib.ChunkedArray object at ...>
[
  -- is_valid: all not null
  -- child 0 type: string_view
    [
      "Braund",
      "Allen",
      "Bonnell"
    ]
  -- child 1 type: int64
    [
      22,
      35,
      58
    ]
]

Base Class

class vortex.Array(*args, **kwargs)

An array of zero or more rows each with the same set of columns.

Examples

Arrays support all the standard comparison operations:

>>> import vortex as vx
>>> a = vx.array(['dog', None, 'cat', 'mouse', 'fish'])
>>> b = vx.array(['doug', 'jennifer', 'casper', 'mouse', 'faust'])
>>> (a < b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
   true,
   null,
   false,
   false,
   false
]
>>> (a <= b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
   true,
   null,
   false,
   true,
   false
]
>>> (a == b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
   false,
   null,
   false,
   true,
   false
]
>>> (a != b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
   true,
   null,
   true,
   false,
   true
]
>>> (a >= b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
   false,
   null,
   true,
   true,
   true
]
>>> (a > b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
   false,
   null,
   true,
   false,
   true
]
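
The null in the second position of every result above comes from null propagation: comparing a missing value with anything yields a missing value rather than true or false. The following is a minimal pure-Python sketch of that rule, using None for null. The helper name compare_nullable is hypothetical, and the code models the behaviour shown in the examples rather than calling Vortex.

```python
import operator
from typing import Any, Callable, Optional


def compare_nullable(
    op: Callable[[Any, Any], bool],
    left: list,
    right: list,
) -> list[Optional[bool]]:
    """Element-wise comparison where None on either side yields None."""
    if len(left) != len(right):
        raise ValueError("both inputs must have the same length")
    return [
        None if x is None or y is None else op(x, y)
        for x, y in zip(left, right)
    ]


a = ['dog', None, 'cat', 'mouse', 'fish']
b = ['doug', 'jennifer', 'casper', 'mouse', 'faust']

less_than = compare_nullable(operator.lt, a, b)
equal = compare_nullable(operator.eq, a, b)
print(less_than)  # [True, None, False, False, False]
print(equal)      # [False, None, False, True, False]
```

Strings compare lexicographically here, which is why 'dog' sorts before 'doug' but 'cat' does not sort before 'casper'.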

__len__()

Return len(self).

dtype

Returns the data type of this array.

Return type:

vortex.DType

Examples

By default, vortex.array() uses the largest available bit-width:

>>> import vortex as vx
>>> vx.array([1, 2, 3]).dtype
int(64, nullable=False)

Including a None forces a nullable type:

>>> vx.array([1, None, 2, 3]).dtype
int(64, nullable=True)

A UTF-8 string array:

>>> vx.array(['hello, ', 'is', 'it', 'me?']).dtype
utf8(nullable=False)

fill_forward()

Fill forward non-null values over runs of nulls.

Leading nulls are replaced with the “zero” for that type. For integral and floating-point types, this is zero. For the Boolean type, this is False.

Fill forward sensor values over intermediate missing values. Note that leading nulls are replaced with 0.0:

>>> import vortex as vx
>>> a = vx.array([
...      None,  None, 30.29, 30.30, 30.30,  None,  None, 30.27, 30.25,
...     30.22,  None,  None,  None,  None, 30.12, 30.11, 30.11, 30.11,
...     30.10, 30.08,  None, 30.21, 30.03, 30.03, 30.05, 30.07, 30.07,
... ])
>>> a.fill_forward().to_arrow_array()
<pyarrow.lib.DoubleArray object at ...>
[
  0,
  0,
  30.29,
  30.3,
  30.3,
  30.3,
  30.3,
  30.27,
  30.25,
  30.22,
  ...
  30.11,
  30.1,
  30.08,
  30.08,
  30.21,
  30.03,
  30.03,
  30.05,
  30.07,
  30.07
]
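
To make the fill-forward rule concrete, here is a minimal pure-Python sketch of the semantics described above, with None standing in for null. The function and its zero parameter are illustrative assumptions and not part of the Vortex API; the real method works on encoded arrays rather than Python lists.

```python
from typing import Optional


def fill_forward(values: list[Optional[float]], zero: float = 0.0) -> list[float]:
    """Replace each None with the most recent non-null value.

    Leading nulls have no previous value, so they become `zero`.
    """
    filled = []
    last = zero  # used until the first non-null element is seen
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


readings = [None, None, 30.29, 30.30, None, None, 30.27]
result = fill_forward(readings)
print(result)  # [0.0, 0.0, 30.29, 30.3, 30.3, 30.3, 30.27]
```

Passing zero=False gives the Boolean behaviour: fill_forward([None, True, None], zero=False) yields [False, True, True].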

filter(mask)

Filter an Array by another Boolean array.

Parameters:

mask (Array) – Keep all the rows in self for which the correspondingly indexed row in mask is True.

Return type:

Array

Examples

Keep only the single-digit non-negative integers:

>>> import vortex as vx
>>> a = vx.array([0, 42, 1_000, -23, 10, 9, 5])
>>> filter = vx.array([True, False, False, False, False, True, True])
>>> a.filter(filter).to_arrow_array()
<pyarrow.lib.Int64Array object at ...>
[
  0,
  9,
  5
]
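
The mask in the example above is written out by hand. In practice it is usually derived from a predicate. The sketch below models both steps in plain Python; filter_by_mask is a hypothetical helper, and its length check is an assumption of this sketch rather than documented Vortex behaviour.

```python
from itertools import compress


def filter_by_mask(values: list, mask: list[bool]) -> list:
    """Keep values[i] wherever mask[i] is True."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    return list(compress(values, mask))


values = [0, 42, 1_000, -23, 10, 9, 5]
# Build the mask from the predicate "single-digit and non-negative".
mask = [0 <= v < 10 for v in values]
kept = filter_by_mask(values, mask)
print(mask)  # [True, False, False, False, False, True, True]
print(kept)  # [0, 9, 5]
```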

static from_arrow(obj)

Convert a PyArrow object into a Vortex array.

Parameters:

obj (pyarrow.Array, pyarrow.ChunkedArray, or pyarrow.Table) – The PyArrow object to convert.

Return type:

Array

id

Returns the encoding ID of this array.

nbytes

Returns the number of bytes used by this array.

scalar_at(index)

Retrieve a row by its index.

Parameters:

index (int) – The index of interest. Must be greater than or equal to zero and less than the length of this array.

Return type:

vortex.Scalar

Examples

Retrieve the last element from an array of integers:

>>> import vortex as vx
>>> vx.array([10, 42, 999, 1992]).scalar_at(3).as_py()
1992

Retrieve the third element from an array of strings:

>>> array = vx.array(["hello", "goodbye", "it", "is"])
>>> array.scalar_at(2).as_py()
'it'

Retrieve an element from an array of structures:

>>> array = vx.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
...     None,
...     {'name': 'Mikhail', 'age': 57},
... ])
>>> array.scalar_at(2).as_py()
{'age': 33, 'name': 'Angela'}

Retrieve a missing element from an array of structures:

>>> array.scalar_at(3).as_py() is None
True

Out of bounds accesses are prohibited:

>>> vx.array([10, 42, 999, 1992]).scalar_at(10)
Traceback (most recent call last):
...
ValueError: index 10 out of bounds from 0 to 4
...

Unlike Python, negative indices are not supported:

>>> vx.array([10, 42, 999, 1992]).scalar_at(-2)
Traceback (most recent call last):
...
OverflowError: can't convert negative int to unsigned
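
Code that relies on Python-style negative indices can translate them before calling scalar_at. The helper below is a hypothetical sketch, not part of Vortex; it only performs the index arithmetic and raises IndexError for anything that would still be out of bounds.

```python
def normalize_index(index: int, length: int) -> int:
    """Translate a Python-style index into a non-negative one."""
    if not -length <= index < length:
        raise IndexError(f"index {index} out of bounds for length {length}")
    return index + length if index < 0 else index


length = 4  # for example, len(vx.array([10, 42, 999, 1992]))
print(normalize_index(-1, length))  # 3
print(normalize_index(-2, length))  # 2
print(normalize_index(1, length))   # 1
```

With such a helper, array.scalar_at(normalize_index(-1, len(array))) would retrieve the last element.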

slice(start, end)

Slice this array.

Parameters:
  • start (int) – The start index of the range to keep, inclusive.

  • end (int) – The end index, exclusive.

Return type:

Array

Examples

Keep only the second through third elements:

>>> import vortex as vx
>>> a = vx.array(['a', 'b', 'c', 'd'])
>>> a.slice(1, 3).to_arrow_array()
<pyarrow.lib.StringArray object at ...>
[
  "b",
  "c"
]

Keep none of the elements:

>>> a = vx.array(['a', 'b', 'c', 'd'])
>>> a.slice(3, 3).to_arrow_array()
<pyarrow.lib.StringViewArray object at ...>
[]

Unlike Python, it is an error to slice outside the bounds of the array:

>>> a = vx.array(['a', 'b', 'c', 'd'])
>>> a.slice(2, 10).to_arrow_array()
Traceback (most recent call last):
...
ValueError: index 10 out of bounds from 0 to 4

Or to slice with a negative value:

>>> a = vx.array(['a', 'b', 'c', 'd'])
>>> a.slice(-2, -1).to_arrow_array()
Traceback (most recent call last):
...
OverflowError: can't convert negative int to unsigned
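
Python list slicing silently clamps out-of-range bounds and accepts negative indices, so code ported from lists may need to convert its bounds first. The sketch below uses the built-in slice.indices to do that conversion; python_style_bounds is a hypothetical helper name, and the result is a pair of in-range, non-negative bounds with start no greater than end.

```python
def python_style_bounds(start: int, end: int, length: int) -> tuple[int, int]:
    """Clamp Python-style slice bounds into the range 0..length."""
    # slice.indices resolves negative values and clamps to the length.
    lo, hi, _step = slice(start, end).indices(length)
    # An inverted range selects nothing, so collapse it to an empty one.
    return lo, max(lo, hi)


length = 4  # for example, len(vx.array(['a', 'b', 'c', 'd']))
print(python_style_bounds(2, 10, length))   # (2, 4)
print(python_style_bounds(-2, -1, length))  # (2, 3)
print(python_style_bounds(3, 1, length))    # (3, 3)
```

The returned pair can then be passed as the start and end arguments, for example a.slice(*python_style_bounds(2, 10, len(a))).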

take(indices)

Filter, permute, and/or repeat elements by their index.

Parameters:

indices (Array) – An array of indices to keep.

Return type:

Array

Examples

Keep only the first and third elements:

>>> a = vx.array(['a', 'b', 'c', 'd'])
>>> indices = vx.array([0, 2])
>>> a.take(indices).to_arrow_array()
<pyarrow.lib.StringArray object at ...>
[
  "a",
  "c"
]

Permute and repeat the first and second elements:

>>> a = vx.array(['a', 'b', 'c', 'd'])
>>> indices = vx.array([0, 1, 1, 0])
>>> a.take(indices).to_arrow_array()
<pyarrow.lib.StringArray object at ...>
[
  "a",
  "b",
  "b",
  "a"
]
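
Because indices may appear in any order and any number of times, take covers selection, permutation, and repetition with one operation. A minimal pure-Python model of that behaviour follows; the function is illustrative only, and its bounds check is an assumption of this sketch rather than documented Vortex behaviour.

```python
def take(values: list, indices: list[int]) -> list:
    """Return values[i] for each i in indices, in the order given."""
    result = []
    for i in indices:
        if not 0 <= i < len(values):
            raise IndexError(f"index {i} out of bounds for length {len(values)}")
        result.append(values[i])
    return result


letters = ['a', 'b', 'c', 'd']
print(take(letters, [0, 2]))        # ['a', 'c']
print(take(letters, [0, 1, 1, 0]))  # ['a', 'b', 'b', 'a']
print(take(letters, [3, 2, 1, 0]))  # ['d', 'c', 'b', 'a']
```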

to_arrow_array()

Convert this array to a PyArrow array.

See also

to_arrow_table()

Return type:

pyarrow.Array

Examples

Convert a Vortex array built from a Python list into an Arrow array:

>>> import vortex as vx
>>> vx.array([1, 2, 3]).to_arrow_array()
<pyarrow.lib.Int64Array object at ...>
[
  1,
  2,
  3
]

to_arrow_table() -> Table

Construct an Arrow table from this Vortex array.

See also

to_arrow_array()

Warning

Only struct-typed arrays can be converted to Arrow tables.

Return type:

pyarrow.Table

Examples

>>> array = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
...     {'name': 'Mikhail', 'age': 57},
... ])
>>> array.to_arrow_table()
pyarrow.Table
age: int64
name: string_view
----
age: [[25,31,33,57]]
name: [["Joseph","Narendra","Angela","Mikhail"]]

to_numpy(*, zero_copy_only: bool = True) -> numpy.ndarray

Construct a NumPy array from this Vortex array.

This is an alias for self.to_arrow_array().to_numpy(zero_copy_only).

Parameters:

zero_copy_only (bool) – When True, this method will raise an error unless a NumPy array can be created without copying the data. This is only possible when the array is a primitive array without nulls.

Return type:

numpy.ndarray

Examples

Construct an immutable ndarray from a Vortex array:

>>> array = vortex.array([1, 0, 0, 1])
>>> array.to_numpy()
array([1, 0, 0, 1])

to_pandas_df() -> DataFrame

Construct a Pandas dataframe from this Vortex array.

Warning

Only struct-typed arrays can be converted to Pandas dataframes.

Return type:

pandas.DataFrame

Examples

Construct a dataframe from a Vortex array:

>>> array = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
...     {'name': 'Mikhail', 'age': 57},
... ])
>>> array.to_pandas_df()
   age      name
0   25    Joseph
1   31  Narendra
2   33    Angela
3   57   Mikhail

to_polars_dataframe()

Construct a Polars dataframe from this Vortex array.

Warning

Only struct-typed arrays can be converted to Polars dataframes.

Return type:

polars.DataFrame

Examples

>>> array = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
...     {'name': 'Mikhail', 'age': 57},
... ])
>>> array.to_polars_dataframe()
shape: (4, 2)
┌─────┬──────────┐
│ age ┆ name     │
│ --- ┆ ---      │
│ i64 ┆ str      │
╞═════╪══════════╡
│ 25  ┆ Joseph   │
│ 31  ┆ Narendra │
│ 33  ┆ Angela   │
│ 57  ┆ Mikhail  │
└─────┴──────────┘

to_polars_series()

Construct a Polars series from this Vortex array.

Return type:

polars.Series

Examples

Convert a numeric array with nulls to a Polars Series:

>>> vortex.array([1, None, 2, 3]).to_polars_series()  
shape: (4,)
Series: '' [i64]
[
    1
    null
    2
    3
]

Convert a UTF-8 string array to a Polars Series:

>>> vortex.array(['hello, ', 'is', 'it', 'me?']).to_polars_series()  
shape: (4,)
Series: '' [str]
[
    "hello, "
    "is"
    "it"
    "me?"
]

Convert a struct array to a Polars Series:

>>> array = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
...     {'name': 'Mikhail', 'age': 57},
... ])
>>> array.to_polars_series()  
shape: (4,)
Series: '' [struct[2]]
[
    {25,"Joseph"}
    {31,"Narendra"}
    {33,"Angela"}
    {57,"Mikhail"}
]

to_pylist() -> list[Any]

Deeply copy an Array into a Python list.

Return type:

list

Examples

>>> array = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
... ])
>>> array.to_pylist()
[{'age': 25, 'name': 'Joseph'}, {'age': 31, 'name': 'Narendra'}, {'age': 33, 'name': 'Angela'}]

tree_display()

Internal technical details about the encoding of this Array.

Warning

The format of the returned string may change without notice.

Return type:

str

Examples

Uncompressed arrays have straightforward encodings:

>>> import vortex as vx
>>> arr = vx.array([1, 2, None, 3])
>>> print(arr.tree_display())
root: vortex.primitive(i64?, len=4) nbytes=33 B (100.00%)
  metadata: EmptyMetadata
  buffer (align=8): 32 B
  validity: vortex.bool(bool, len=4) nbytes=1 B (3.03%)
    metadata: BoolMetadata { offset: 0 }
    buffer (align=1): 1 B

Compressed arrays often have more complex, deeply nested encoding trees.
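
The sizes in the uncompressed example above can be checked by hand. The sketch below assumes that each percentage is a node's share of the root's nbytes and that the validity of four rows fits in a single byte; both are readings of the sample output, not documented guarantees. The helper name share_of_root is hypothetical.

```python
def share_of_root(node_nbytes: int, root_nbytes: int) -> str:
    """Format a node's size as a percentage of the root's size."""
    return f"{100 * node_nbytes / root_nbytes:.2f}%"


values_buffer = 4 * 8  # four 64-bit integers
validity_buffer = 1    # assumed: four validity bits rounded up to one byte
root_nbytes = values_buffer + validity_buffer

print(root_nbytes)                                  # 33
print(share_of_root(root_nbytes, root_nbytes))      # 100.00%
print(share_of_root(validity_buffer, root_nbytes))  # 3.03%
```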

Canonical Encodings

Each DType has a corresponding canonical encoding. These encodings represent the uncompressed version of the array, and are also zero-copy to Apache Arrow.

class vortex.NullArray(*args, **kwargs)

Concrete class for arrays with vortex.null encoding.

class vortex.BoolArray(*args, **kwargs)

Concrete class for arrays with vortex.bool encoding.

class vortex.PrimitiveArray(*args, **kwargs)

Concrete class for arrays with vortex.primitive encoding.

class vortex.VarBinArray(*args, **kwargs)

Concrete class for arrays with vortex.varbin encoding.

class vortex.VarBinViewArray(*args, **kwargs)

Concrete class for arrays with vortex.varbinview encoding.

class vortex.StructArray(*args, **kwargs)

Concrete class for arrays with vortex.struct encoding.

field(name)

Returns the given field of the struct array.

class vortex.ListArray(*args, **kwargs)

Concrete class for arrays with vortex.list encoding.

class vortex.ExtensionArray(*args, **kwargs)

Concrete class for arrays with vortex.ext encoding.

Utility Encodings

class vortex.ChunkedArray(*args, **kwargs)

Concrete class for arrays with vortex.chunked encoding.

class vortex.ConstantArray(*args, **kwargs)

Concrete class for arrays with vortex.constant encoding.

scalar()

Return the scalar value of the constant array.

class vortex.ByteBoolArray(*args, **kwargs)

Concrete class for arrays with vortex.bytebool encoding.

class vortex.SparseArray(*args, **kwargs)

Concrete class for arrays with vortex.sparse encoding.

Compressed Encodings

class vortex.AlpArray(*args, **kwargs)

Concrete class for arrays with vortex.alp encoding.

class vortex.AlpRdArray(*args, **kwargs)

Concrete class for arrays with vortex.alprd encoding.

class vortex.DateTimePartsArray(*args, **kwargs)

Concrete class for arrays with vortex.datetimeparts encoding.

class vortex.DictArray(*args, **kwargs)

Concrete class for arrays with vortex.dict encoding.

class vortex.FsstArray(*args, **kwargs)

Concrete class for arrays with vortex.fsst encoding.

class vortex.RunEndArray(*args, **kwargs)

Concrete class for arrays with vortex.runend encoding.

class vortex.ZigZagArray(*args, **kwargs)

Concrete class for arrays with vortex.zigzag encoding.

class vortex.FastLanesBitPackedArray(*args, **kwargs)

Concrete class for arrays with fastlanes.bitpacked encoding.

bit_width

Returns the bit width of the packed values.

class vortex.FastLanesDeltaArray(*args, **kwargs)

Concrete class for arrays with fastlanes.delta encoding.

class vortex.FastLanesFoRArray(*args, **kwargs)

Concrete class for arrays with fastlanes.for encoding.

Pluggable Encodings

Subclasses of PyArray can be used to implement custom Vortex encodings in Python. These encodings can be registered with the registry so they are available to use when reading Vortex files.

class vortex.PyArray(*args, **kwargs)

Abstract base class for Python-based Vortex arrays.

abstract classmethod decode(parts: ArrayParts, ctx: ArrayContext, dtype: DType, len: int) -> Array

Decode an array from its component parts.

ArrayParts contains the metadata, buffers and child ArrayParts that represent the current array. Implementations of this function should validate this information, and then construct a new array.

abstract property dtype: DType

The data type of the array.

Registry and Serde

vortex.registry = <vortex.Registry object>

The default registry for Vortex.

class vortex.Registry

A register of known array and layout encodings.

array_ctx(encodings)

Create an ArrayContext containing the given encodings.

register(cls)

Register an array encoding implemented by subclassing PyArray.

It’s not currently possible to register a layout encoding from Python.

class vortex.ArrayContext

An ArrayContext captures an ordered set of encodings.

In a serialized array, encodings are identified by a positional index into such an ArrayContext.
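
The idea of identifying encodings by position can be sketched in plain Python. EncodingTable below is a hypothetical stand-in, not the ArrayContext class, and appending an ID on first use is an assumption made for illustration; this page does not describe how Vortex populates a context. The encoding IDs are the ones that appear in the tree_display example.

```python
class EncodingTable:
    """Hypothetical model of an ordered set of encoding IDs."""

    def __init__(self) -> None:
        self._ids: list[str] = []
        self._positions: dict[str, int] = {}

    def index_of(self, encoding_id: str) -> int:
        """Return the position of encoding_id, appending it on first use."""
        if encoding_id not in self._positions:
            self._positions[encoding_id] = len(self._ids)
            self._ids.append(encoding_id)
        return self._positions[encoding_id]

    def lookup(self, index: int) -> str:
        """Resolve a positional index back to its encoding ID."""
        return self._ids[index]


table = EncodingTable()
# A writer stores small integers instead of repeating the ID strings.
written = [table.index_of(e) for e in
           ["vortex.primitive", "vortex.bool", "vortex.primitive"]]
# A reader holding the same ordered table resolves them again.
decoded = [table.lookup(i) for i in written]
print(written)  # [0, 1, 0]
print(decoded)  # ['vortex.primitive', 'vortex.bool', 'vortex.primitive']
```

The order matters: a reader that holds the same IDs in a different order would resolve the indices to the wrong encodings.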

class vortex.ArrayParts

ArrayParts is a parsed representation of a serialized array.

It can be decoded into a full array using the decode method.

buffers

Return the buffers of the array, currently as pyarrow.Buffer.

children

Return the child ArrayParts of the array.

decode(ctx, dtype, len)

Decode the array parts into a full array.

Returns:

The decoded array.

metadata

Fetch the serialized metadata of the array.

nbuffers

The number of buffers the array has.

nchildren

The number of child arrays the array has.

static parse(data)

Parse a serialized array into its parts.

Streams and Iterators

class vortex.ArrayIterator

dtype

Return the vortex.DType for all chunks of this iterator.

static from_iter(dtype, iter)

Create a vortex.ArrayIterator from an iterator of vortex.Array.

read_all()

Read all chunks into a single vortex.Array. If there are multiple chunks, this will be a vortex.ChunkedArray, otherwise it will be a single array.

to_arrow()

Convert the vortex.ArrayIterator into a pyarrow.RecordBatchReader.