Class VortexArrowColumnVector
- All Implemented Interfaces:
AutoCloseable
This class provides a bridge between Vortex's Arrow-based data representation and Spark's ColumnVector interface. It supports all major Arrow data types including primitives, strings, binary data, decimals, dates, timestamps, arrays, maps, and structs.
The implementation uses type-specific accessors to efficiently retrieve values from the underlying Arrow vectors while maintaining Spark's expected API contract.
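The accessor idea described above can be sketched with plain JDK types. This is an illustrative sketch only, not the class's actual source: `AccessorSketch`, `Accessor`, `IntAccessor`, and `Column` are hypothetical stand-ins, and the int array with a `BitSet` null mask stands in for an Arrow vector. It shows one way a column can pick a single accessor up front and reject getters that do not match its type.

```java
import java.util.BitSet;

// Hypothetical stand-ins; none of these names come from the Vortex or Spark APIs.
final class AccessorSketch {

    /** Base accessor: every typed getter is unsupported unless a subclass overrides it. */
    abstract static class Accessor {
        abstract boolean isNullAt(int rowId);

        int getInt(int rowId) {
            throw new UnsupportedOperationException("column is not of int type");
        }

        String getString(int rowId) {
            throw new UnsupportedOperationException("column is not of string type");
        }
    }

    /** Accessor for an int column backed by a plain array plus a null mask. */
    static final class IntAccessor extends Accessor {
        private final int[] values;
        private final BitSet nulls;

        IntAccessor(int[] values, BitSet nulls) {
            this.values = values;
            this.nulls = nulls;
        }

        @Override
        boolean isNullAt(int rowId) {
            return nulls.get(rowId);
        }

        @Override
        int getInt(int rowId) {
            return values[rowId];
        }
    }

    /** Column facade: holds the one accessor chosen at construction and delegates to it. */
    static final class Column {
        private final Accessor accessor;

        Column(Accessor accessor) {
            this.accessor = accessor;
        }

        boolean isNullAt(int rowId) {
            return accessor.isNullAt(rowId);
        }

        int getInt(int rowId) {
            return accessor.getInt(rowId);
        }

        String getString(int rowId) {
            return accessor.getString(rowId);
        }
    }
}
```

In this sketch, calling `getString` on a `Column` built over an `IntAccessor` throws `UnsupportedOperationException`, mirroring the behavior documented for the typed getters in Method Details.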
- See Also:
-
ColumnVector, ValueVector
-
Field Summary
Fields inherited from class org.apache.spark.sql.vectorized.ColumnVector
type
-
Constructor Summary
Constructors
Constructor | Description
VortexArrowColumnVector(dev.vortex.relocated.org.apache.arrow.vector.ValueVector vector) | Creates a new VortexArrowColumnVector wrapping the specified Arrow ValueVector.
-
Method Summary
Modifier and Type | Method | Description
void | close() | Closes this column vector and releases any associated resources.
org.apache.spark.sql.vectorized.ColumnarArray | getArray(int rowId) | Returns the array value at the specified row.
byte[] | getBinary(int rowId) | Returns the binary data (byte array) at the specified row.
boolean | getBoolean(int rowId) | Returns the boolean value at the specified row.
byte | getByte(int rowId) | Returns the byte value at the specified row.
VortexArrowColumnVector | getChild(int ordinal) | Returns the child column at the specified ordinal.
org.apache.spark.sql.types.Decimal | getDecimal(int rowId, int precision, int scale) | Returns the decimal value at the specified row with the given precision and scale.
double | getDouble(int rowId) | Returns the double value at the specified row.
float | getFloat(int rowId) | Returns the float value at the specified row.
int | getInt(int rowId) | Returns the int value at the specified row.
long | getLong(int rowId) | Returns the long value at the specified row.
org.apache.spark.sql.vectorized.ColumnarMap | getMap(int rowId) | Returns the map value at the specified row.
short | getShort(int rowId) | Returns the short value at the specified row.
org.apache.spark.unsafe.types.UTF8String | getUTF8String(int rowId) | Returns the UTF8String value at the specified row.
dev.vortex.relocated.org.apache.arrow.vector.ValueVector | getValueVector() | Returns the underlying Apache Arrow ValueVector wrapped by this column vector.
boolean | hasNull() | Returns whether this column contains any null values.
boolean | isNullAt(int rowId) | Returns whether the value at the specified row is null.
int | numNulls() | Returns the total number of null values in this column.
Methods inherited from class org.apache.spark.sql.vectorized.ColumnVector
closeIfFreeable, dataType, getBooleans, getBytes, getDoubles, getFloats, getInterval, getInts, getLongs, getShorts, getStruct
-
Constructor Details
-
VortexArrowColumnVector
public VortexArrowColumnVector(dev.vortex.relocated.org.apache.arrow.vector.ValueVector vector)
Creates a new VortexArrowColumnVector wrapping the specified Arrow ValueVector.
This constructor automatically determines the appropriate Spark DataType from the Arrow field and initializes the type-specific accessor.
- Parameters:
vector - the Arrow ValueVector to wrap
- Throws:
UnsupportedOperationException - if the vector type is not supported
-
-
Method Details
-
getValueVector
public dev.vortex.relocated.org.apache.arrow.vector.ValueVector getValueVector()
Returns the underlying Apache Arrow ValueVector wrapped by this column vector.
- Returns:
- the Arrow ValueVector containing the actual data
-
hasNull
public boolean hasNull()
Returns whether this column contains any null values.
- Specified by:
hasNull in class org.apache.spark.sql.vectorized.ColumnVector
- Returns:
- true if the column contains at least one null value, false otherwise
-
numNulls
public int numNulls()
Returns the total number of null values in this column.
- Specified by:
numNulls in class org.apache.spark.sql.vectorized.ColumnVector
- Returns:
- the count of null values
-
close
public void close()
Closes this column vector and releases any associated resources.
This method recursively closes any child columns (for complex types like structs) and then closes the underlying Arrow vector accessor.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in class org.apache.spark.sql.vectorized.ColumnVector
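The close order described for this method (children first, then the column's own resource) can be illustrated with a small JDK-only sketch. `CloseSketch` and `Node` are hypothetical names, and the shared log list merely records the order in which nodes are closed; the real class releases Arrow resources instead.

```java
import java.util.List;

// Hypothetical stand-in for a column with child columns; not the Vortex implementation.
final class CloseSketch {

    static final class Node implements AutoCloseable {
        private final String name;
        private final List<Node> children;
        private final List<String> closeLog;

        Node(String name, List<Node> children, List<String> closeLog) {
            this.name = name;
            this.children = children;
            this.closeLog = closeLog;
        }

        @Override
        public void close() {
            // Close every child column before releasing this node's own resource.
            for (Node child : children) {
                child.close();
            }
            // Stand-in for closing the underlying accessor: record that this node closed.
            closeLog.add(name);
        }
    }
}
```

Because the sketch implements `AutoCloseable`, a root node can be managed with try-with-resources, and closing it also closes its children.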
-
isNullAt
public boolean isNullAt(int rowId)
Returns whether the value at the specified row is null.
- Specified by:
isNullAt in class org.apache.spark.sql.vectorized.ColumnVector
- Parameters:
rowId - the row index to check
- Returns:
- true if the value at rowId is null, false otherwise
-
getBoolean
public boolean getBoolean(int rowId)
Returns the boolean value at the specified row.
- Specified by:
getBoolean in class org.apache.spark.sql.vectorized.ColumnVector
- Parameters:
rowId - the row index
- Returns:
- the boolean value at rowId
- Throws:
UnsupportedOperationException - if this column is not of boolean type
-
getByte
public byte getByte(int rowId)
Returns the byte value at the specified row.
- Specified by:
getByte in class org.apache.spark.sql.vectorized.ColumnVector
- Parameters:
rowId - the row index
- Returns:
- the byte value at rowId
- Throws:
UnsupportedOperationException - if this column is not of byte type
-
getShort
public short getShort(int rowId)
Returns the short value at the specified row.
- Specified by:
getShort in class org.apache.spark.sql.vectorized.ColumnVector
- Parameters:
rowId - the row index
- Returns:
- the short value at rowId
- Throws:
UnsupportedOperationException - if this column is not of short type
-
getInt
public int getInt(int rowId)
Returns the int value at the specified row.
- Specified by:
getInt in class org.apache.spark.sql.vectorized.ColumnVector
- Parameters:
rowId - the row index
- Returns:
- the int value at rowId
- Throws:
UnsupportedOperationException - if this column is not of int type
-
getLong
public long getLong(int rowId)
Returns the long value at the specified row.
- Specified by:
getLong in class org.apache.spark.sql.vectorized.ColumnVector
- Parameters:
rowId - the row index
- Returns:
- the long value at rowId
- Throws:
UnsupportedOperationException - if this column is not of long type
-
getFloat
public float getFloat(int rowId)
Returns the float value at the specified row.
- Specified by:
getFloat in class org.apache.spark.sql.vectorized.ColumnVector
- Parameters:
rowId - the row index
- Returns:
- the float value at rowId
- Throws:
UnsupportedOperationException - if this column is not of float type
-
getDouble
public double getDouble(int rowId)
Returns the double value at the specified row.
- Specified by:
getDouble in class org.apache.spark.sql.vectorized.ColumnVector
- Parameters:
rowId - the row index
- Returns:
- the double value at rowId
- Throws:
UnsupportedOperationException - if this column is not of double type
-
getDecimal
public org.apache.spark.sql.types.Decimal getDecimal(int rowId, int precision, int scale)
Returns the decimal value at the specified row with the given precision and scale.
- Specified by:
getDecimal in class org.apache.spark.sql.vectorized.ColumnVector
- Parameters:
rowId - the row index
precision - the precision of the decimal
scale - the scale of the decimal
- Returns:
- the Decimal value at rowId, or null if the value is null
- Throws:
UnsupportedOperationException - if this column is not of decimal type
-
getUTF8String
public org.apache.spark.unsafe.types.UTF8String getUTF8String(int rowId)
Returns the UTF8String value at the specified row.
- Specified by:
getUTF8String in class org.apache.spark.sql.vectorized.ColumnVector
- Parameters:
rowId - the row index
- Returns:
- the UTF8String value at rowId, or null if the value is null
- Throws:
UnsupportedOperationException - if this column is not of string type
-
getBinary
public byte[] getBinary(int rowId)
Returns the binary data (byte array) at the specified row.
- Specified by:
getBinary in class org.apache.spark.sql.vectorized.ColumnVector
- Parameters:
rowId - the row index
- Returns:
- the byte array at rowId, or null if the value is null
- Throws:
UnsupportedOperationException - if this column is not of binary type
-
getArray
public org.apache.spark.sql.vectorized.ColumnarArray getArray(int rowId)
Returns the array value at the specified row.
- Specified by:
getArray in class org.apache.spark.sql.vectorized.ColumnVector
- Parameters:
rowId - the row index
- Returns:
- the ColumnarArray at rowId, or null if the value is null
- Throws:
UnsupportedOperationException - if this column is not of array type
-
getMap
public org.apache.spark.sql.vectorized.ColumnarMap getMap(int rowId)
Returns the map value at the specified row.
- Specified by:
getMap in class org.apache.spark.sql.vectorized.ColumnVector
- Parameters:
rowId - the row index
- Returns:
- the ColumnarMap at rowId, or null if the value is null
- Throws:
UnsupportedOperationException - if this column is not of map type
-
getChild
public VortexArrowColumnVector getChild(int ordinal)
Returns the child column at the specified ordinal.
This is used for complex types like structs, where each field is represented as a child column.
- Specified by:
getChild in class org.apache.spark.sql.vectorized.ColumnVector
- Parameters:
ordinal - the index of the child column
- Returns:
- the child VortexArrowColumnVector at the specified ordinal
- Throws:
ArrayIndexOutOfBoundsException - if ordinal is out of bounds
-