org.apache.spark.sql.vectorized.ColumnVector

dev.vortex.spark.read.VortexArrowColumnVector

All Implemented Interfaces: AutoCloseable

public class VortexArrowColumnVector extends org.apache.spark.sql.vectorized.ColumnVector

Spark ColumnVector implementation that wraps Apache Arrow vectors from Vortex data.

This class provides a bridge between Vortex's Arrow-based data representation and Spark's ColumnVector interface. It supports all major Arrow data types including primitives, strings, binary data, decimals, dates, timestamps, arrays, maps, and structs.

The implementation uses type-specific accessors to efficiently retrieve values from the underlying Arrow vectors while maintaining Spark's expected API contract.
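
The accessor pattern described above can be sketched in plain Java. This is a minimal illustration, not the library's code: `AccessorSketch`, `IntAccessorSketch`, and `ColumnSketch` are hypothetical stand-ins with no Spark or Arrow dependency. The assumption is that a base accessor rejects every typed getter and each subclass overrides only the getter that matches its own type.

```java
// Hypothetical base accessor: every typed getter is rejected by default.
abstract class AccessorSketch {
    int getInt(int rowId) {
        throw new UnsupportedOperationException("not an int column");
    }

    long getLong(int rowId) {
        throw new UnsupportedOperationException("not a long column");
    }
}

// Hypothetical accessor for an int column: overrides only getInt.
final class IntAccessorSketch extends AccessorSketch {
    private final int[] values;

    IntAccessorSketch(int[] values) {
        this.values = values;
    }

    @Override
    int getInt(int rowId) {
        return values[rowId];
    }
}

// Hypothetical column wrapper: forwards each getter to the accessor chosen at construction.
final class ColumnSketch {
    private final AccessorSketch accessor;

    ColumnSketch(AccessorSketch accessor) {
        this.accessor = accessor;
    }

    int getInt(int rowId) {
        return accessor.getInt(rowId);
    }

    long getLong(int rowId) {
        return accessor.getLong(rowId);
    }
}
```

With this shape, calling `getInt` on a column built from `IntAccessorSketch` reads the stored value, while `getLong` on the same column throws UnsupportedOperationException, which matches the Throws clauses documented for the typed getters.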

See Also:

ColumnVector
ValueVector

Field Summary

Fields inherited from class org.apache.spark.sql.vectorized.ColumnVector

type

Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| VortexArrowColumnVector(dev.vortex.relocated.org.apache.arrow.vector.ValueVector vector) | Creates a new VortexArrowColumnVector wrapping the specified Arrow ValueVector. |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| void | close() | Closes this column vector and releases any associated resources. |
| org.apache.spark.sql.vectorized.ColumnarArray | getArray(int rowId) | Returns the array value at the specified row. |
| byte[] | getBinary(int rowId) | Returns the binary data (byte array) at the specified row. |
| boolean | getBoolean(int rowId) | Returns the boolean value at the specified row. |
| byte | getByte(int rowId) | Returns the byte value at the specified row. |
| VortexArrowColumnVector | getChild(int ordinal) | Returns the child column at the specified ordinal. |
| org.apache.spark.sql.types.Decimal | getDecimal(int rowId, int precision, int scale) | Returns the decimal value at the specified row with the given precision and scale. |
| double | getDouble(int rowId) | Returns the double value at the specified row. |
| float | getFloat(int rowId) | Returns the float value at the specified row. |
| int | getInt(int rowId) | Returns the int value at the specified row. |
| long | getLong(int rowId) | Returns the long value at the specified row. |
| org.apache.spark.sql.vectorized.ColumnarMap | getMap(int rowId) | Returns the map value at the specified row. |
| short | getShort(int rowId) | Returns the short value at the specified row. |
| org.apache.spark.unsafe.types.UTF8String | getUTF8String(int rowId) | Returns the UTF8String value at the specified row. |
| dev.vortex.relocated.org.apache.arrow.vector.ValueVector | getValueVector() | Returns the underlying Apache Arrow ValueVector wrapped by this column vector. |
| boolean | hasNull() | Returns whether this column contains any null values. |
| boolean | isNullAt(int rowId) | Returns whether the value at the specified row is null. |
| int | numNulls() | Returns the total number of null values in this column. |

Methods inherited from class org.apache.spark.sql.vectorized.ColumnVector
closeIfFreeable, dataType, getBooleans, getBytes, getDoubles, getFloats, getInterval, getInts, getLongs, getShorts, getStruct

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- VortexArrowColumnVector
  
  public VortexArrowColumnVector(dev.vortex.relocated.org.apache.arrow.vector.ValueVector vector)
  
  Creates a new VortexArrowColumnVector wrapping the specified Arrow ValueVector.
  This constructor automatically determines the appropriate Spark DataType from the Arrow field and initializes the type-specific accessor.
  
  Parameters:
  
  vector - the Arrow ValueVector to wrap
  
  Throws:
  
  UnsupportedOperationException - if the vector type is not supported
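
The type resolution performed by the constructor can be pictured as a lookup that fails fast on anything it does not recognize. The sketch below is only an illustration under stated assumptions: the `ArrowKind` enum and `TypeMappingSketch` class are hypothetical, the Spark type names are returned as plain strings to keep the example dependency-free, and `OTHER` stands in for whichever vector types the real constructor rejects (the source does not list them).

```java
final class TypeMappingSketch {
    // Hypothetical, simplified set of Arrow type kinds.
    enum ArrowKind { BOOL, INT32, INT64, FLOAT64, UTF8, OTHER }

    // Returns the name of the Spark DataType assumed to correspond to the Arrow kind.
    static String toSparkTypeName(ArrowKind kind) {
        switch (kind) {
            case BOOL:
                return "BooleanType";
            case INT32:
                return "IntegerType";
            case INT64:
                return "LongType";
            case FLOAT64:
                return "DoubleType";
            case UTF8:
                return "StringType";
            default:
                // Mirrors the documented failure mode for unsupported vector types.
                throw new UnsupportedOperationException("Unsupported Arrow type: " + kind);
        }
    }
}
```

Failing at construction time, rather than on the first read, means an unsupported column is reported before any rows are processed.
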
Method Details
- getValueVector
  
  public dev.vortex.relocated.org.apache.arrow.vector.ValueVector getValueVector()
  
  Returns the underlying Apache Arrow ValueVector wrapped by this column vector.
  
  Returns:
  
  the Arrow ValueVector containing the actual data
- hasNull
  
  public boolean hasNull()
  
  Returns whether this column contains any null values.
  
  Specified by:
  
  hasNull in class org.apache.spark.sql.vectorized.ColumnVector
  
  Returns:
  
  true if the column contains at least one null value, false otherwise
- numNulls
  
  public int numNulls()
  
  Returns the total number of null values in this column.
  
  Specified by:
  
  numNulls in class org.apache.spark.sql.vectorized.ColumnVector
  
  Returns:
  
  the count of null values
- close
  
  public void close()
  
  Closes this column vector and releases any associated resources.
  This method recursively closes any child columns (for complex types like structs) and then closes the underlying Arrow vector accessor.
  
  Specified by:
  
  close in interface AutoCloseable
  
  Specified by:
  
  close in class org.apache.spark.sql.vectorized.ColumnVector
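
The close order described above (children first, then the column's own accessor) can be sketched as follows. `ClosableColumnSketch` is a hypothetical stand-in that records close calls in a list; it does not touch Arrow buffers and only illustrates the recursion.

```java
// Hypothetical stand-in for a column that owns child columns, as a struct column would.
final class ClosableColumnSketch implements AutoCloseable {
    private final String name;
    private final java.util.List<String> closeLog;
    private final ClosableColumnSketch[] children;

    ClosableColumnSketch(String name, java.util.List<String> closeLog,
                         ClosableColumnSketch... children) {
        this.name = name;
        this.closeLog = closeLog;
        this.children = children;
    }

    @Override
    public void close() {
        // Close children first so nested resources are released before the parent.
        for (ClosableColumnSketch child : children) {
            child.close();
        }
        // Stands in for closing this column's own accessor.
        closeLog.add(name);
    }
}
```

Because the class implements AutoCloseable, a caller that owns the vector can place it in a try-with-resources statement, and one `close()` call on the top-level column releases the whole tree.
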
- isNullAt
  
  public boolean isNullAt(int rowId)
  
  Returns whether the value at the specified row is null.
  
  Specified by:
  
  isNullAt in class org.apache.spark.sql.vectorized.ColumnVector
  
  Parameters:
  
  rowId - the row index to check
  
  Returns:
  
  true if the value at rowId is null, false otherwise
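
The three null-related methods (hasNull, numNulls, isNullAt) are related in a simple way, shown here against an Arrow-style validity bitmap in which a set bit marks a valid (non-null) row and bits are numbered from the least significant bit of each byte. `ValidityBitmapSketch` is a hypothetical helper; whether VortexArrowColumnVector reads the bitmap directly or delegates to the Arrow vector's own null count is not stated in the source.

```java
// Hypothetical helper over an Arrow-style validity bitmap (set bit = value present).
final class ValidityBitmapSketch {
    private final byte[] bitmap;
    private final int rowCount;

    ValidityBitmapSketch(byte[] bitmap, int rowCount) {
        this.bitmap = bitmap;
        this.rowCount = rowCount;
    }

    // A row is null when its validity bit is cleared.
    boolean isNullAt(int rowId) {
        return (bitmap[rowId >> 3] & (1 << (rowId & 7))) == 0;
    }

    // Counts cleared validity bits among the first rowCount rows.
    int numNulls() {
        int nulls = 0;
        for (int row = 0; row < rowCount; row++) {
            if (isNullAt(row)) {
                nulls++;
            }
        }
        return nulls;
    }

    boolean hasNull() {
        return numNulls() > 0;
    }
}
```

A caller would typically check `isNullAt(rowId)` before invoking a primitive getter such as getInt, since a primitive return type has no way to signal null.
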
- getBoolean
  
  public boolean getBoolean(int rowId)
  
  Returns the boolean value at the specified row.
  
  Specified by:
  
  getBoolean in class org.apache.spark.sql.vectorized.ColumnVector
  
  Parameters:
  
  rowId - the row index
  
  Returns:
  
  the boolean value at rowId
  
  Throws:
  
  UnsupportedOperationException - if this column is not of boolean type
- getByte
  
  public byte getByte(int rowId)
  
  Returns the byte value at the specified row.
  
  Specified by:
  
  getByte in class org.apache.spark.sql.vectorized.ColumnVector
  
  Parameters:
  
  rowId - the row index
  
  Returns:
  
  the byte value at rowId
  
  Throws:
  
  UnsupportedOperationException - if this column is not of byte type
- getShort
  
  public short getShort(int rowId)
  
  Returns the short value at the specified row.
  
  Specified by:
  
  getShort in class org.apache.spark.sql.vectorized.ColumnVector
  
  Parameters:
  
  rowId - the row index
  
  Returns:
  
  the short value at rowId
  
  Throws:
  
  UnsupportedOperationException - if this column is not of short type
- getInt
  
  public int getInt(int rowId)
  
  Returns the int value at the specified row.
  
  Specified by:
  
  getInt in class org.apache.spark.sql.vectorized.ColumnVector
  
  Parameters:
  
  rowId - the row index
  
  Returns:
  
  the int value at rowId
  
  Throws:
  
  UnsupportedOperationException - if this column is not of int type
- getLong
  
  public long getLong(int rowId)
  
  Returns the long value at the specified row.
  
  Specified by:
  
  getLong in class org.apache.spark.sql.vectorized.ColumnVector
  
  Parameters:
  
  rowId - the row index
  
  Returns:
  
  the long value at rowId
  
  Throws:
  
  UnsupportedOperationException - if this column is not of long type
- getFloat
  
  public float getFloat(int rowId)
  
  Returns the float value at the specified row.
  
  Specified by:
  
  getFloat in class org.apache.spark.sql.vectorized.ColumnVector
  
  Parameters:
  
  rowId - the row index
  
  Returns:
  
  the float value at rowId
  
  Throws:
  
  UnsupportedOperationException - if this column is not of float type
- getDouble
  
  public double getDouble(int rowId)
  
  Returns the double value at the specified row.
  
  Specified by:
  
  getDouble in class org.apache.spark.sql.vectorized.ColumnVector
  
  Parameters:
  
  rowId - the row index
  
  Returns:
  
  the double value at rowId
  
  Throws:
  
  UnsupportedOperationException - if this column is not of double type
- getDecimal
  
  public org.apache.spark.sql.types.Decimal getDecimal(int rowId, int precision, int scale)
  
  Returns the decimal value at the specified row with the given precision and scale.
  
  Specified by:
  
  getDecimal in class org.apache.spark.sql.vectorized.ColumnVector
  
  Parameters:
  
  rowId - the row index
  
  precision - the precision of the decimal
  
  scale - the scale of the decimal
  
  Returns:
  
  the Decimal value at rowId, or null if the value is null
  
  Throws:
  
  UnsupportedOperationException - if this column is not of decimal type
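
The role of the precision and scale arguments can be illustrated with `java.math.BigDecimal`, used here as a stand-in for Spark's Decimal. The sketch assumes the column stores unscaled integer values and that a value wider than the requested precision is an error; both are assumptions made for illustration, and `DecimalSketch` is a hypothetical name.

```java
final class DecimalSketch {
    // Rebuilds a decimal from an unscaled integer; a null slot yields null, as documented above.
    static java.math.BigDecimal getDecimal(Long[] unscaledValues, int rowId,
                                           int precision, int scale) {
        Long raw = unscaledValues[rowId];
        if (raw == null) {
            return null;
        }
        java.math.BigDecimal value =
            new java.math.BigDecimal(java.math.BigInteger.valueOf(raw), scale);
        // Assumed check: reject values with more digits than the declared precision.
        if (value.precision() > precision) {
            throw new ArithmeticException("value exceeds precision " + precision);
        }
        return value;
    }
}
```

For example, an unscaled value of 12345 read with scale 2 represents 123.45, which has five significant digits and therefore fits any precision of 5 or more.
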
- getUTF8String
  
  public org.apache.spark.unsafe.types.UTF8String getUTF8String(int rowId)
  
  Returns the UTF8String value at the specified row.
  
  Specified by:
  
  getUTF8String in class org.apache.spark.sql.vectorized.ColumnVector
  
  Parameters:
  
  rowId - the row index
  
  Returns:
  
  the UTF8String value at rowId, or null if the value is null
  
  Throws:
  
  UnsupportedOperationException - if this column is not of string type
- getBinary
  
  public byte[] getBinary(int rowId)
  
  Returns the binary data (byte array) at the specified row.
  
  Specified by:
  
  getBinary in class org.apache.spark.sql.vectorized.ColumnVector
  
  Parameters:
  
  rowId - the row index
  
  Returns:
  
  the byte array at rowId, or null if the value is null
  
  Throws:
  
  UnsupportedOperationException - if this column is not of binary type
- getArray
  
  public org.apache.spark.sql.vectorized.ColumnarArray getArray(int rowId)
  
  Returns the array value at the specified row.
  
  Specified by:
  
  getArray in class org.apache.spark.sql.vectorized.ColumnVector
  
  Parameters:
  
  rowId - the row index
  
  Returns:
  
  the ColumnarArray at rowId, or null if the value is null
  
  Throws:
  
  UnsupportedOperationException - if this column is not of array type
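
Arrow list vectors store all elements in one child vector and use an offsets buffer to mark where each row's elements begin and end. The sketch below shows how a single row is located under that layout. `ListOffsetsSketch` is a hypothetical helper, and it copies the slice for simplicity, whereas Spark's ColumnarArray is a view defined by an offset and a length over the child column.

```java
final class ListOffsetsSketch {
    // Row i spans elements[offsets[i]] (inclusive) to elements[offsets[i + 1]] (exclusive).
    static int[] getArray(int[] offsets, int[] elements, int rowId) {
        int start = offsets[rowId];
        int end = offsets[rowId + 1];
        return java.util.Arrays.copyOfRange(elements, start, end);
    }
}
```

An empty array and a null array both have equal start and end offsets, so callers still need `isNullAt(rowId)` to tell them apart.
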
- getMap
  
  public org.apache.spark.sql.vectorized.ColumnarMap getMap(int rowId)
  
  Returns the map value at the specified row.
  
  Specified by:
  
  getMap in class org.apache.spark.sql.vectorized.ColumnVector
  
  Parameters:
  
  rowId - the row index
  
  Returns:
  
  the ColumnarMap at rowId, or null if the value is null
  
  Throws:
  
  UnsupportedOperationException - if this column is not of map type
- getChild
  
  public VortexArrowColumnVector getChild(int ordinal)
  
  Returns the child column at the specified ordinal.
  This is used for complex types like structs where each field is represented as a child column.
  
  Specified by:
  
  getChild in class org.apache.spark.sql.vectorized.ColumnVector
  
  Parameters:
  
  ordinal - the index of the child column
  
  Returns:
  
  the child VortexArrowColumnVector at the specified ordinal
  
  Throws:
  
  ArrayIndexOutOfBoundsException - if ordinal is out of bounds
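
The relationship between a struct column and its children can be sketched as follows: each field is its own column, and reading row r of the struct means reading row r from every child. `StructColumnSketch` is a hypothetical stand-in that uses plain object arrays instead of Arrow vectors.

```java
// Hypothetical struct column: one child array per field, all of equal length.
final class StructColumnSketch {
    private final Object[][] children;

    StructColumnSketch(Object[]... children) {
        this.children = children;
    }

    // An out-of-range ordinal surfaces as ArrayIndexOutOfBoundsException from the array access.
    Object[] getChild(int ordinal) {
        return children[ordinal];
    }

    // Assembles one struct row by reading the same row index from every child.
    Object[] getStruct(int rowId) {
        Object[] row = new Object[children.length];
        for (int ordinal = 0; ordinal < children.length; ordinal++) {
            row[ordinal] = getChild(ordinal)[rowId];
        }
        return row;
    }
}
```

This mirrors how the inherited getStruct method of ColumnVector can serve struct rows on top of getChild without the subclass overriding it.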
