Class VortexArrowColumnVector

java.lang.Object
org.apache.spark.sql.vectorized.ColumnVector
dev.vortex.spark.read.VortexArrowColumnVector
All Implemented Interfaces:
AutoCloseable

public class VortexArrowColumnVector extends org.apache.spark.sql.vectorized.ColumnVector
Spark ColumnVector implementation that wraps Apache Arrow vectors from Vortex data.

This class bridges Vortex's Arrow-based data representation and Spark's ColumnVector interface. It supports the major Arrow data types, including primitives, strings, binary data, decimals, dates, timestamps, arrays, maps, and structs.

The implementation uses type-specific accessors to efficiently retrieve values from the underlying Arrow vectors while maintaining Spark's expected API contract.
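The accessor pattern described above can be illustrated with a small standalone sketch. It does not use Spark or Arrow, and every name in it (`SketchAccessor`, `IntSketchAccessor`, `SketchColumn`) is hypothetical, not part of this class. It only shows the general idea: a base accessor rejects every typed getter, each typed accessor overrides the getters it supports, and the column delegates to whichever accessor it holds.

```java
// Hypothetical sketch: none of these types belong to Vortex, Spark, or Arrow.
abstract class SketchAccessor {
    // By default every typed getter is rejected; subclasses override what they support.
    boolean isNullAt(int rowId) {
        return false;
    }

    int getInt(int rowId) {
        throw new UnsupportedOperationException("not an int column");
    }

    String getString(int rowId) {
        throw new UnsupportedOperationException("not a string column");
    }
}

final class IntSketchAccessor extends SketchAccessor {
    private final Integer[] values; // a null entry models a null row

    IntSketchAccessor(Integer[] values) {
        this.values = values;
    }

    @Override
    boolean isNullAt(int rowId) {
        return values[rowId] == null;
    }

    @Override
    int getInt(int rowId) {
        // Callers are expected to check isNullAt first, as with Spark column vectors.
        return values[rowId];
    }
}

final class SketchColumn {
    private final SketchAccessor accessor;

    SketchColumn(SketchAccessor accessor) {
        this.accessor = accessor;
    }

    boolean isNullAt(int rowId) {
        return accessor.isNullAt(rowId);
    }

    int getInt(int rowId) {
        return accessor.getInt(rowId);
    }

    String getString(int rowId) {
        return accessor.getString(rowId);
    }
}

final class AccessorSketchDemo {
    public static void main(String[] args) {
        SketchColumn ints = new SketchColumn(new IntSketchAccessor(new Integer[] {7, null, 9}));
        System.out.println(ints.getInt(0));
        System.out.println(ints.isNullAt(1));
        try {
            ints.getString(0); // wrong getter for an int column
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under this design the column itself never branches on the data type per call; the choice is made once, when the accessor is selected.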

See Also:
  • ColumnVector
  • ValueVector
  • Field Summary

    Fields inherited from class org.apache.spark.sql.vectorized.ColumnVector

    type
  • Constructor Summary

    Constructors
    Constructor
    Description
    VortexArrowColumnVector(dev.vortex.relocated.org.apache.arrow.vector.ValueVector vector)
    Creates a new VortexArrowColumnVector wrapping the specified Arrow ValueVector.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    close()
    Closes this column vector and releases any associated resources.
    org.apache.spark.sql.vectorized.ColumnarArray
    getArray(int rowId)
    Returns the array value at the specified row.
    byte[]
    getBinary(int rowId)
    Returns the binary data (byte array) at the specified row.
    boolean
    getBoolean(int rowId)
    Returns the boolean value at the specified row.
    byte
    getByte(int rowId)
    Returns the byte value at the specified row.
    VortexArrowColumnVector
    getChild(int ordinal)
    Returns the child column at the specified ordinal.
    org.apache.spark.sql.types.Decimal
    getDecimal(int rowId, int precision, int scale)
    Returns the decimal value at the specified row with the given precision and scale.
    double
    getDouble(int rowId)
    Returns the double value at the specified row.
    float
    getFloat(int rowId)
    Returns the float value at the specified row.
    int
    getInt(int rowId)
    Returns the int value at the specified row.
    long
    getLong(int rowId)
    Returns the long value at the specified row.
    org.apache.spark.sql.vectorized.ColumnarMap
    getMap(int rowId)
    Returns the map value at the specified row.
    short
    getShort(int rowId)
    Returns the short value at the specified row.
    org.apache.spark.unsafe.types.UTF8String
    getUTF8String(int rowId)
    Returns the UTF8String value at the specified row.
    dev.vortex.relocated.org.apache.arrow.vector.ValueVector
    getValueVector()
    Returns the underlying Apache Arrow ValueVector wrapped by this column vector.
    boolean
    hasNull()
    Returns whether this column contains any null values.
    boolean
    isNullAt(int rowId)
    Returns whether the value at the specified row is null.
    int
    numNulls()
    Returns the total number of null values in this column.

    Methods inherited from class org.apache.spark.sql.vectorized.ColumnVector

    closeIfFreeable, dataType, getBooleans, getBytes, getDoubles, getFloats, getInterval, getInts, getLongs, getShorts, getStruct

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • VortexArrowColumnVector

      public VortexArrowColumnVector(dev.vortex.relocated.org.apache.arrow.vector.ValueVector vector)
      Creates a new VortexArrowColumnVector wrapping the specified Arrow ValueVector.

      This constructor automatically determines the appropriate Spark DataType from the Arrow field and initializes the type-specific accessor.

      Parameters:
      vector - the Arrow ValueVector to wrap
      Throws:
      UnsupportedOperationException - if the vector type is not supported
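The type resolution step the constructor performs can be pictured with the standalone sketch below. It is an assumption-laden illustration, not this class's code: the class `TypeMappingSketch`, the method `sparkTypeFor`, and the string labels used for Arrow types are all hypothetical. The pairs shown follow the conventional correspondence between Arrow types and Spark SQL types; the set of types this class actually accepts may differ.

```java
import java.util.Map;

// Hypothetical sketch: plain strings stand in for Arrow types and Spark DataTypes.
final class TypeMappingSketch {
    // Informal Arrow type label -> Spark SQL type name.
    private static final Map<String, String> ARROW_TO_SPARK = Map.ofEntries(
            Map.entry("Bool", "BooleanType"),
            Map.entry("Int(8)", "ByteType"),
            Map.entry("Int(16)", "ShortType"),
            Map.entry("Int(32)", "IntegerType"),
            Map.entry("Int(64)", "LongType"),
            Map.entry("FloatingPoint(SINGLE)", "FloatType"),
            Map.entry("FloatingPoint(DOUBLE)", "DoubleType"),
            Map.entry("Utf8", "StringType"),
            Map.entry("Binary", "BinaryType"),
            Map.entry("Date(DAY)", "DateType"),
            Map.entry("Timestamp", "TimestampType"));

    private TypeMappingSketch() {}

    // Resolves the Spark type once, failing fast for anything unmapped.
    static String sparkTypeFor(String arrowType) {
        String sparkType = ARROW_TO_SPARK.get(arrowType);
        if (sparkType == null) {
            throw new UnsupportedOperationException("Unsupported Arrow type: " + arrowType);
        }
        return sparkType;
    }

    public static void main(String[] args) {
        System.out.println(sparkTypeFor("Int(32)"));
        System.out.println(sparkTypeFor("Utf8"));
    }
}
```

Failing at construction time, as the documented UnsupportedOperationException suggests, means an unsupported column surfaces before any rows are read.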
  • Method Details

    • getValueVector

      public dev.vortex.relocated.org.apache.arrow.vector.ValueVector getValueVector()
      Returns the underlying Apache Arrow ValueVector wrapped by this column vector.
      Returns:
      the Arrow ValueVector containing the actual data
    • hasNull

      public boolean hasNull()
      Returns whether this column contains any null values.
      Specified by:
      hasNull in class org.apache.spark.sql.vectorized.ColumnVector
      Returns:
      true if the column contains at least one null value, false otherwise
    • numNulls

      public int numNulls()
      Returns the total number of null values in this column.
      Specified by:
      numNulls in class org.apache.spark.sql.vectorized.ColumnVector
      Returns:
      the count of null values
    • close

      public void close()
      Closes this column vector and releases any associated resources.

      This method recursively closes any child columns (for complex types such as structs) and then closes the underlying Arrow vector accessor.

      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in class org.apache.spark.sql.vectorized.ColumnVector
    • isNullAt

      public boolean isNullAt(int rowId)
      Returns whether the value at the specified row is null.
      Specified by:
      isNullAt in class org.apache.spark.sql.vectorized.ColumnVector
      Parameters:
      rowId - the row index to check
      Returns:
      true if the value at rowId is null, false otherwise
    • getBoolean

      public boolean getBoolean(int rowId)
      Returns the boolean value at the specified row.
      Specified by:
      getBoolean in class org.apache.spark.sql.vectorized.ColumnVector
      Parameters:
      rowId - the row index
      Returns:
      the boolean value at rowId
      Throws:
      UnsupportedOperationException - if this column is not of boolean type
    • getByte

      public byte getByte(int rowId)
      Returns the byte value at the specified row.
      Specified by:
      getByte in class org.apache.spark.sql.vectorized.ColumnVector
      Parameters:
      rowId - the row index
      Returns:
      the byte value at rowId
      Throws:
      UnsupportedOperationException - if this column is not of byte type
    • getShort

      public short getShort(int rowId)
      Returns the short value at the specified row.
      Specified by:
      getShort in class org.apache.spark.sql.vectorized.ColumnVector
      Parameters:
      rowId - the row index
      Returns:
      the short value at rowId
      Throws:
      UnsupportedOperationException - if this column is not of short type
    • getInt

      public int getInt(int rowId)
      Returns the int value at the specified row.
      Specified by:
      getInt in class org.apache.spark.sql.vectorized.ColumnVector
      Parameters:
      rowId - the row index
      Returns:
      the int value at rowId
      Throws:
      UnsupportedOperationException - if this column is not of int type
    • getLong

      public long getLong(int rowId)
      Returns the long value at the specified row.
      Specified by:
      getLong in class org.apache.spark.sql.vectorized.ColumnVector
      Parameters:
      rowId - the row index
      Returns:
      the long value at rowId
      Throws:
      UnsupportedOperationException - if this column is not of long type
    • getFloat

      public float getFloat(int rowId)
      Returns the float value at the specified row.
      Specified by:
      getFloat in class org.apache.spark.sql.vectorized.ColumnVector
      Parameters:
      rowId - the row index
      Returns:
      the float value at rowId
      Throws:
      UnsupportedOperationException - if this column is not of float type
    • getDouble

      public double getDouble(int rowId)
      Returns the double value at the specified row.
      Specified by:
      getDouble in class org.apache.spark.sql.vectorized.ColumnVector
      Parameters:
      rowId - the row index
      Returns:
      the double value at rowId
      Throws:
      UnsupportedOperationException - if this column is not of double type
    • getDecimal

      public org.apache.spark.sql.types.Decimal getDecimal(int rowId, int precision, int scale)
      Returns the decimal value at the specified row with the given precision and scale.
      Specified by:
      getDecimal in class org.apache.spark.sql.vectorized.ColumnVector
      Parameters:
      rowId - the row index
      precision - the precision of the decimal
      scale - the scale of the decimal
      Returns:
      the Decimal value at rowId, or null if the value is null
      Throws:
      UnsupportedOperationException - if this column is not of decimal type
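To make the roles of the precision and scale parameters concrete, here is a minimal sketch using only `java.math`. It assumes the stored value is an unscaled integer, which is how Arrow decimal vectors represent values. The class `DecimalSketch`, the method `fromUnscaled`, and the precision check are assumptions of this sketch; the source does not say whether `getDecimal` validates precision.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical sketch: BigDecimal stands in for Spark's Decimal type.
final class DecimalSketch {
    private DecimalSketch() {}

    // Interprets an unscaled integer as a decimal with the given scale,
    // then checks that the result fits in the requested precision.
    static BigDecimal fromUnscaled(long unscaled, int precision, int scale) {
        BigDecimal value = new BigDecimal(BigInteger.valueOf(unscaled), scale);
        if (value.precision() > precision) {
            throw new ArithmeticException(
                    "value " + value + " does not fit in precision " + precision);
        }
        return value;
    }

    public static void main(String[] args) {
        // Unscaled 12345 with scale 2 is the decimal 123.45.
        System.out.println(fromUnscaled(12345L, 10, 2));
    }
}
```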
    • getUTF8String

      public org.apache.spark.unsafe.types.UTF8String getUTF8String(int rowId)
      Returns the UTF8String value at the specified row.
      Specified by:
      getUTF8String in class org.apache.spark.sql.vectorized.ColumnVector
      Parameters:
      rowId - the row index
      Returns:
      the UTF8String value at rowId, or null if the value is null
      Throws:
      UnsupportedOperationException - if this column is not of string type
    • getBinary

      public byte[] getBinary(int rowId)
      Returns the binary data (byte array) at the specified row.
      Specified by:
      getBinary in class org.apache.spark.sql.vectorized.ColumnVector
      Parameters:
      rowId - the row index
      Returns:
      the byte array at rowId, or null if the value is null
      Throws:
      UnsupportedOperationException - if this column is not of binary type
    • getArray

      public org.apache.spark.sql.vectorized.ColumnarArray getArray(int rowId)
      Returns the array value at the specified row.
      Specified by:
      getArray in class org.apache.spark.sql.vectorized.ColumnVector
      Parameters:
      rowId - the row index
      Returns:
      the ColumnarArray at rowId, or null if the value is null
      Throws:
      UnsupportedOperationException - if this column is not of array type
    • getMap

      public org.apache.spark.sql.vectorized.ColumnarMap getMap(int rowId)
      Returns the map value at the specified row.
      Specified by:
      getMap in class org.apache.spark.sql.vectorized.ColumnVector
      Parameters:
      rowId - the row index
      Returns:
      the ColumnarMap at rowId, or null if the value is null
      Throws:
      UnsupportedOperationException - if this column is not of map type
    • getChild

      public VortexArrowColumnVector getChild(int ordinal)
      Returns the child column at the specified ordinal.

      This is used for complex types such as structs, where each field is represented as a child column.

      Specified by:
      getChild in class org.apache.spark.sql.vectorized.ColumnVector
      Parameters:
      ordinal - the index of the child column
      Returns:
      the child VortexArrowColumnVector at the specified ordinal
      Throws:
      ArrayIndexOutOfBoundsException - if ordinal is out of bounds
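The relationship between child columns and recursive closing, as documented for getChild and close, can be sketched without Spark or Arrow. The class `SketchStructColumn` and its close log are hypothetical and exist only for illustration. A struct-like column holds one child per field, `getChild` indexes into that array, and `close` closes the children before releasing the parent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: models a column with child columns, not the real class.
final class SketchStructColumn implements AutoCloseable {
    private final String name;
    private final List<String> closeLog; // records the order in which columns close
    private final SketchStructColumn[] children;

    SketchStructColumn(String name, List<String> closeLog, SketchStructColumn... children) {
        this.name = name;
        this.closeLog = closeLog;
        this.children = children;
    }

    // Plain array indexing: an invalid ordinal raises ArrayIndexOutOfBoundsException.
    SketchStructColumn getChild(int ordinal) {
        return children[ordinal];
    }

    String name() {
        return name;
    }

    @Override
    public void close() {
        // Children first, then this column's own resources.
        for (SketchStructColumn child : children) {
            child.close();
        }
        closeLog.add(name);
    }
}

final class StructSketchDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        SketchStructColumn person = new SketchStructColumn(
                "person", log,
                new SketchStructColumn("name", log),
                new SketchStructColumn("age", log));
        System.out.println(person.getChild(1).name());
        person.close();
        System.out.println(log);
    }
}
```

Closing children before the parent keeps each column responsible only for what it directly owns.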