Class VortexArrowColumnVector
- All Implemented Interfaces: AutoCloseable
This class provides a bridge between Vortex's Arrow-based data representation and Spark's ColumnVector interface. It supports all major Arrow data types including primitives, strings, binary data, decimals, dates, timestamps, arrays, maps, and structs.
The implementation uses type-specific accessors to efficiently retrieve values from the underlying Arrow vectors while maintaining Spark's expected API contract.
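The accessor idea described above can be sketched without Spark or Arrow on the classpath. The following is only an illustration under stated assumptions: `Accessor`, `IntAccessor`, and `Column` are hypothetical stand-ins, not the actual Vortex classes, and the real implementation may be organized differently. It shows one way a wrapper might pick an accessor once at construction time and have every getter that does not match the column type fail with UnsupportedOperationException.

```java
public class AccessorSketch {
    /** Hypothetical base accessor: each getter is unsupported unless a subclass overrides it. */
    abstract static class Accessor {
        abstract boolean isNullAt(int rowId);

        int getInt(int rowId) {
            throw new UnsupportedOperationException("not an int column");
        }

        double getDouble(int rowId) {
            throw new UnsupportedOperationException("not a double column");
        }
    }

    /** Hypothetical accessor for int data; a null element stands in for a null row. */
    static final class IntAccessor extends Accessor {
        private final Integer[] values;

        IntAccessor(Integer[] values) {
            this.values = values;
        }

        @Override
        boolean isNullAt(int rowId) {
            return values[rowId] == null;
        }

        @Override
        int getInt(int rowId) {
            // Callers are expected to check isNullAt first.
            return values[rowId];
        }
    }

    /** Hypothetical wrapper: chooses an accessor once, then delegates every call to it. */
    static final class Column {
        private final Accessor accessor;

        Column(Object data) {
            if (data instanceof Integer[]) {
                accessor = new IntAccessor((Integer[]) data);
            } else {
                throw new UnsupportedOperationException("unsupported data: " + data);
            }
        }

        boolean isNullAt(int rowId) {
            return accessor.isNullAt(rowId);
        }

        int getInt(int rowId) {
            return accessor.getInt(rowId);
        }

        double getDouble(int rowId) {
            return accessor.getDouble(rowId);
        }
    }

    public static void main(String[] args) {
        Column col = new Column(new Integer[] {7, null});
        System.out.println(col.getInt(0));
        System.out.println(col.isNullAt(1));
        try {
            col.getDouble(0);
        } catch (UnsupportedOperationException e) {
            System.out.println("getDouble rejected: " + e.getMessage());
        }
    }
}
```

Resolving the accessor once keeps type dispatch out of the per-row path, which is the usual motivation for this pattern.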
- See Also: ColumnVector, ValueVector
Field Summary
Fields inherited from class org.apache.spark.sql.vectorized.ColumnVector
type
-
Constructor Summary
Constructors
- VortexArrowColumnVector(dev.vortex.relocated.org.apache.arrow.vector.ValueVector vector): Creates a new VortexArrowColumnVector wrapping the specified Arrow ValueVector.
Method Summary
Each entry lists the modifier and type, the method, and its description.
- void close(): Closes this column vector and releases any associated resources.
- org.apache.spark.sql.vectorized.ColumnarArray getArray(int rowId): Returns the array value at the specified row.
- byte[] getBinary(int rowId): Returns the binary data (byte array) at the specified row.
- boolean getBoolean(int rowId): Returns the boolean value at the specified row.
- byte getByte(int rowId): Returns the byte value at the specified row.
- VortexArrowColumnVector getChild(int ordinal): Returns the child column at the specified ordinal.
- org.apache.spark.sql.types.Decimal getDecimal(int rowId, int precision, int scale): Returns the decimal value at the specified row with the given precision and scale.
- double getDouble(int rowId): Returns the double value at the specified row.
- float getFloat(int rowId): Returns the float value at the specified row.
- int getInt(int rowId): Returns the int value at the specified row.
- long getLong(int rowId): Returns the long value at the specified row.
- org.apache.spark.sql.vectorized.ColumnarMap getMap(int rowId): Returns the map value at the specified row.
- short getShort(int rowId): Returns the short value at the specified row.
- org.apache.spark.unsafe.types.UTF8String getUTF8String(int rowId): Returns the UTF8String value at the specified row.
- dev.vortex.relocated.org.apache.arrow.vector.ValueVector getValueVector(): Returns the underlying Apache Arrow ValueVector wrapped by this column vector.
- boolean hasNull(): Returns whether this column contains any null values.
- boolean isNullAt(int rowId): Returns whether the value at the specified row is null.
- int numNulls(): Returns the total number of null values in this column.
Methods inherited from class org.apache.spark.sql.vectorized.ColumnVector
closeIfFreeable, dataType, getBooleans, getBytes, getDoubles, getFloats, getInterval, getInts, getLongs, getShorts, getStruct
-
Constructor Details
-
VortexArrowColumnVector
public VortexArrowColumnVector(dev.vortex.relocated.org.apache.arrow.vector.ValueVector vector)
Creates a new VortexArrowColumnVector wrapping the specified Arrow ValueVector.
This constructor automatically determines the appropriate Spark DataType from the Arrow field and initializes the type-specific accessor.
- Parameters: vector - the Arrow ValueVector to wrap
- Throws: UnsupportedOperationException - if the vector type is not supported
-
-
Method Details
-
getValueVector
public dev.vortex.relocated.org.apache.arrow.vector.ValueVector getValueVector()
Returns the underlying Apache Arrow ValueVector wrapped by this column vector.
- Returns: the Arrow ValueVector containing the actual data
-
hasNull
public boolean hasNull()
Returns whether this column contains any null values.
- Specified by: hasNull in class org.apache.spark.sql.vectorized.ColumnVector
- Returns: true if the column contains at least one null value, false otherwise
-
numNulls
public int numNulls()
Returns the total number of null values in this column.
- Specified by: numNulls in class org.apache.spark.sql.vectorized.ColumnVector
- Returns: the count of null values
-
close
public void close()
Closes this column vector and releases any associated resources.
This method recursively closes any child columns (for complex types like structs) and then closes the underlying Arrow vector accessor.
- Specified by: close in interface AutoCloseable
- Specified by: close in class org.apache.spark.sql.vectorized.ColumnVector
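The close order described for this method (children first, then the column's own resources) can be illustrated with a small stand-alone sketch. `Column` here is a hypothetical stand-in that records the order in which close() runs; it does not hold Arrow buffers, and the real class may release resources differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class RecursiveCloseSketch {
    /** Hypothetical column that logs its name when its own resources are released. */
    static final class Column implements AutoCloseable {
        private final String name;
        private final List<Column> children;
        private final List<String> closeLog;

        Column(String name, List<Column> children, List<String> closeLog) {
            this.name = name;
            this.children = children;
            this.closeLog = closeLog;
        }

        @Override
        public void close() {
            // Close every child column first (for example, the fields of a struct).
            for (Column child : children) {
                child.close();
            }
            // Then release this column's own resources; here we only log the event.
            closeLog.add(name);
        }
    }

    /** Builds a two-field struct column, closes it, and returns the recorded close order. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        Column a = new Column("struct.a", Collections.<Column>emptyList(), log);
        Column b = new Column("struct.b", Collections.<Column>emptyList(), log);
        try (Column struct = new Column("struct", Arrays.asList(a, b), log)) {
            // Rows would be read here; try-with-resources calls close() on exit.
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [struct.a, struct.b, struct]
    }
}
```

Because the class implements AutoCloseable, try-with-resources is a natural way to make sure close() runs even when reading fails partway through.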
-
isNullAt
public boolean isNullAt(int rowId)
Returns whether the value at the specified row is null.
- Specified by: isNullAt in class org.apache.spark.sql.vectorized.ColumnVector
- Parameters: rowId - the row index to check
- Returns: true if the value at rowId is null, false otherwise
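A common way to combine hasNull(), numNulls(), and isNullAt(int) when scanning a column is to use the column-level check as a fast path and fall back to per-row checks only when nulls are present. The sketch below assumes a hypothetical `IntColumn` interface and `ArrayIntColumn` class that mirror just these method names; they are stand-ins, not the Spark or Vortex types.

```java
public class NullScanSketch {
    /** Hypothetical slice of a column API: only the methods this example needs. */
    interface IntColumn {
        boolean hasNull();
        int numNulls();
        boolean isNullAt(int rowId);
        int getInt(int rowId);
    }

    /** Hypothetical in-memory column; a null element stands in for a null row. */
    static final class ArrayIntColumn implements IntColumn {
        private final Integer[] values;
        private final int nullCount;

        ArrayIntColumn(Integer[] values) {
            this.values = values;
            int count = 0;
            for (Integer v : values) {
                if (v == null) {
                    count++;
                }
            }
            this.nullCount = count;
        }

        @Override public boolean hasNull() { return nullCount > 0; }
        @Override public int numNulls() { return nullCount; }
        @Override public boolean isNullAt(int rowId) { return values[rowId] == null; }
        @Override public int getInt(int rowId) { return values[rowId]; }
    }

    /** Sums all non-null rows, skipping the per-row null check when the column has no nulls. */
    static long sumNonNull(IntColumn col, int numRows) {
        long sum = 0;
        if (!col.hasNull()) {
            for (int i = 0; i < numRows; i++) {
                sum += col.getInt(i);
            }
            return sum;
        }
        for (int i = 0; i < numRows; i++) {
            if (!col.isNullAt(i)) {
                sum += col.getInt(i);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        IntColumn col = new ArrayIntColumn(new Integer[] {1, null, 3});
        System.out.println(col.numNulls());      // prints 1
        System.out.println(sumNonNull(col, 3));  // prints 4
    }
}
```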
-
getBoolean
public boolean getBoolean(int rowId)
Returns the boolean value at the specified row.
- Specified by: getBoolean in class org.apache.spark.sql.vectorized.ColumnVector
- Parameters: rowId - the row index
- Returns: the boolean value at rowId
- Throws: UnsupportedOperationException - if this column is not of boolean type
-
getByte
public byte getByte(int rowId)
Returns the byte value at the specified row.
- Specified by: getByte in class org.apache.spark.sql.vectorized.ColumnVector
- Parameters: rowId - the row index
- Returns: the byte value at rowId
- Throws: UnsupportedOperationException - if this column is not of byte type
-
getShort
public short getShort(int rowId)
Returns the short value at the specified row.
- Specified by: getShort in class org.apache.spark.sql.vectorized.ColumnVector
- Parameters: rowId - the row index
- Returns: the short value at rowId
- Throws: UnsupportedOperationException - if this column is not of short type
-
getInt
public int getInt(int rowId)
Returns the int value at the specified row.
- Specified by: getInt in class org.apache.spark.sql.vectorized.ColumnVector
- Parameters: rowId - the row index
- Returns: the int value at rowId
- Throws: UnsupportedOperationException - if this column is not of int type
-
getLong
public long getLong(int rowId)
Returns the long value at the specified row.
- Specified by: getLong in class org.apache.spark.sql.vectorized.ColumnVector
- Parameters: rowId - the row index
- Returns: the long value at rowId
- Throws: UnsupportedOperationException - if this column is not of long type
-
getFloat
public float getFloat(int rowId)
Returns the float value at the specified row.
- Specified by: getFloat in class org.apache.spark.sql.vectorized.ColumnVector
- Parameters: rowId - the row index
- Returns: the float value at rowId
- Throws: UnsupportedOperationException - if this column is not of float type
-
getDouble
public double getDouble(int rowId)
Returns the double value at the specified row.
- Specified by: getDouble in class org.apache.spark.sql.vectorized.ColumnVector
- Parameters: rowId - the row index
- Returns: the double value at rowId
- Throws: UnsupportedOperationException - if this column is not of double type
-
getDecimal
public org.apache.spark.sql.types.Decimal getDecimal(int rowId, int precision, int scale)
Returns the decimal value at the specified row with the given precision and scale.
- Specified by: getDecimal in class org.apache.spark.sql.vectorized.ColumnVector
- Parameters:
rowId - the row index
precision - the precision of the decimal
scale - the scale of the decimal
- Returns: the Decimal value at rowId, or null if the value is null
- Throws: UnsupportedOperationException - if this column is not of decimal type
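To make the precision and scale parameters concrete: a fixed-point decimal is commonly represented as an unscaled integer plus a scale, where the scale is the number of digits after the decimal point and the precision is the total number of digits. The sketch below uses only java.math and a hypothetical helper named `fromUnscaled`. It does not call Spark's Decimal class, and the precision check is illustrative only, not a statement about what getDecimal validates.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class DecimalSketch {
    /**
     * Hypothetical helper: interprets an unscaled integer with the given scale and
     * rejects values that need more digits than the declared precision.
     */
    static BigDecimal fromUnscaled(long unscaled, int precision, int scale) {
        BigDecimal value = new BigDecimal(BigInteger.valueOf(unscaled), scale);
        if (value.precision() > precision) {
            throw new ArithmeticException(
                value + " needs precision " + value.precision() + " but only " + precision + " is allowed");
        }
        return value;
    }

    public static void main(String[] args) {
        // Unscaled 12345 with scale 2 is 123.45, which has 5 significant digits.
        System.out.println(fromUnscaled(12345L, 5, 2)); // prints 123.45
    }
}
```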
-
getUTF8String
public org.apache.spark.unsafe.types.UTF8String getUTF8String(int rowId)
Returns the UTF8String value at the specified row.
- Specified by: getUTF8String in class org.apache.spark.sql.vectorized.ColumnVector
- Parameters: rowId - the row index
- Returns: the UTF8String value at rowId, or null if the value is null
- Throws: UnsupportedOperationException - if this column is not of string type
-
getBinary
public byte[] getBinary(int rowId)
Returns the binary data (byte array) at the specified row.
- Specified by: getBinary in class org.apache.spark.sql.vectorized.ColumnVector
- Parameters: rowId - the row index
- Returns: the byte array at rowId, or null if the value is null
- Throws: UnsupportedOperationException - if this column is not of binary type
-
getArray
public org.apache.spark.sql.vectorized.ColumnarArray getArray(int rowId)
Returns the array value at the specified row.
- Specified by: getArray in class org.apache.spark.sql.vectorized.ColumnVector
- Parameters: rowId - the row index
- Returns: the ColumnarArray at rowId, or null if the value is null
- Throws: UnsupportedOperationException - if this column is not of array type
-
getMap
public org.apache.spark.sql.vectorized.ColumnarMap getMap(int rowId)
Returns the map value at the specified row.
- Specified by: getMap in class org.apache.spark.sql.vectorized.ColumnVector
- Parameters: rowId - the row index
- Returns: the ColumnarMap at rowId, or null if the value is null
- Throws: UnsupportedOperationException - if this column is not of map type
-
getChild
public VortexArrowColumnVector getChild(int ordinal)
Returns the child column at the specified ordinal.
This is used for complex types like structs where each field is represented as a child column.
- Specified by: getChild in class org.apache.spark.sql.vectorized.ColumnVector
- Parameters: ordinal - the index of the child column
- Returns: the child VortexArrowColumnVector at the specified ordinal
- Throws: ArrayIndexOutOfBoundsException - if ordinal is out of bounds
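The child-column layout can be pictured as one column per struct field, all sharing the same row index. The sketch below uses a hypothetical `Column` stand-in (with made-up `leaf`, `struct`, and `get` helpers) to show how a struct row might be assembled by reading each child at the same rowId. It assumes children are held in a plain array, so an out-of-range ordinal raises ArrayIndexOutOfBoundsException.

```java
public class StructChildSketch {
    /** Hypothetical column: either a leaf holding values or a struct holding child columns. */
    static final class Column {
        private final Object[] values;
        private final Column[] children;

        private Column(Object[] values, Column[] children) {
            this.values = values;
            this.children = children;
        }

        static Column leaf(Object... values) {
            return new Column(values, new Column[0]);
        }

        static Column struct(Column... children) {
            return new Column(new Object[0], children);
        }

        /** Plain array indexing, so a bad ordinal throws ArrayIndexOutOfBoundsException. */
        Column getChild(int ordinal) {
            return children[ordinal];
        }

        Object get(int rowId) {
            return values[rowId];
        }
    }

    /** Reads one struct row by visiting every child column at the same row index. */
    static String describeRow(Column struct, int fieldCount, int rowId) {
        StringBuilder sb = new StringBuilder();
        for (int ordinal = 0; ordinal < fieldCount; ordinal++) {
            if (ordinal > 0) {
                sb.append(", ");
            }
            sb.append(struct.getChild(ordinal).get(rowId));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Column people = Column.struct(Column.leaf("ann", "bob"), Column.leaf(30, 41));
        System.out.println(describeRow(people, 2, 1)); // prints bob, 41
    }
}
```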
-