Class VortexColumnarBatchIterator

java.lang.Object
dev.vortex.spark.read.VortexColumnarBatchIterator
All Implemented Interfaces:
AutoCloseable, Iterator<org.apache.spark.sql.vectorized.ColumnarBatch>

public final class VortexColumnarBatchIterator extends Object implements Iterator<org.apache.spark.sql.vectorized.ColumnarBatch>, AutoCloseable
Iterator that converts Vortex Arrays into Spark ColumnarBatch objects.

This iterator wraps a Vortex ArrayIterator and converts each Array into a Spark ColumnarBatch by exporting the data to Arrow format and wrapping it with VortexArrowColumnVector instances. The iterator prefetches arrays from the backing iterator, buffering them up to MAX_BUFFER_BYTES, which bounds memory usage and reduces the overhead of converting small arrays individually.

The iterator maintains a reusable VectorSchemaRoot to minimize allocation overhead when converting between Vortex and Arrow formats.
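
Because the class implements both Iterator and AutoCloseable, a natural way to consume it is a try-with-resources loop. The following is a minimal sketch of that pattern, not code taken from the library: BatchDraining and ListBackedBatches are hypothetical names, and ListBackedBatches is only a String-based stand-in so the example runs without Spark or Vortex on the classpath.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in with the same shape as VortexColumnarBatchIterator
// (Iterator + AutoCloseable); it yields Strings instead of ColumnarBatch.
final class ListBackedBatches implements Iterator<String>, AutoCloseable {
    private final Iterator<String> inner;
    boolean closed = false;

    ListBackedBatches(List<String> items) {
        this.inner = items.iterator();
    }

    @Override public boolean hasNext() { return inner.hasNext(); }
    @Override public String next() { return inner.next(); }
    @Override public void close() { closed = true; }
}

// Hypothetical helper: consumes every element, then closes the iterator.
final class BatchDraining {
    static <T, I extends Iterator<T> & AutoCloseable> int drain(I batches, Consumer<? super T> sink) {
        int count = 0;
        // try-with-resources guarantees close() runs even if the sink throws.
        try (I it = batches) {
            while (it.hasNext()) {
                sink.accept(it.next());
                count++;
            }
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            // AutoCloseable.close() is declared to throw Exception in general.
            throw new IllegalStateException("closing the iterator failed", e);
        }
        return count;
    }
}
```

With the real class, the same helper would presumably be called as `BatchDraining.drain(new VortexColumnarBatchIterator(backing), batch -> ...)`, where `backing` is an existing `dev.vortex.api.ArrayIterator`.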

  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final long
    MAX_BUFFER_BYTES
    Maximum buffer size in bytes for prefetching arrays.
  • Constructor Summary

    Constructors
    Constructor
    Description
    VortexColumnarBatchIterator(dev.vortex.api.ArrayIterator backing)
    Creates a new VortexColumnarBatchIterator that wraps the given ArrayIterator.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    close()
    Closes this iterator and releases all associated resources.
    boolean
    hasNext()
    Returns whether there are more columnar batches available.
    org.apache.spark.sql.vectorized.ColumnarBatch
    next()
    Returns the next columnar batch from the iterator.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface java.util.Iterator

    forEachRemaining, remove
  • Field Details

    • MAX_BUFFER_BYTES

      public static final long MAX_BUFFER_BYTES
      Maximum buffer size in bytes for prefetching arrays.

      The iterator will prefetch and batch arrays until this size limit is reached, which helps optimize memory usage and reduces the overhead of converting small arrays individually.
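
The byte-budgeted prefetching described above can be sketched as follows. This illustrates the general technique only and is an assumption about the approach, not the library's actual implementation: BudgetedPrefetcher, its sizeOf function, and bufferedCount() are hypothetical names introduced for the example.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.ToLongFunction;

// Hypothetical sketch: buffers items from a source iterator until the
// buffered size reaches a byte budget, then hands them out in order.
final class BudgetedPrefetcher<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final ToLongFunction<T> sizeOf;   // reports an item's size in bytes
    private final long maxBufferBytes;        // plays the role of MAX_BUFFER_BYTES
    private final ArrayDeque<T> buffer = new ArrayDeque<>();
    private long bufferedBytes = 0;

    BudgetedPrefetcher(Iterator<T> source, ToLongFunction<T> sizeOf, long maxBufferBytes) {
        this.source = source;
        this.sizeOf = sizeOf;
        this.maxBufferBytes = maxBufferBytes;
    }

    // Pull from the source while the buffer is still under budget.
    private void fill() {
        while (bufferedBytes < maxBufferBytes && source.hasNext()) {
            T item = source.next();
            buffer.addLast(item);
            bufferedBytes += sizeOf.applyAsLong(item);
        }
    }

    @Override
    public boolean hasNext() {
        fill();
        return !buffer.isEmpty();
    }

    @Override
    public T next() {
        fill();
        T item = buffer.pollFirst();
        if (item == null) {
            throw new NoSuchElementException();
        }
        bufferedBytes -= sizeOf.applyAsLong(item);
        return item;
    }

    // Exposed only so the buffering behavior can be observed in a test.
    int bufferedCount() {
        return buffer.size();
    }
}
```

In this sketch the budget is a soft limit: the item that crosses the threshold is still buffered, so the buffer can exceed the budget by at most one item.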

  • Constructor Details

    • VortexColumnarBatchIterator

      public VortexColumnarBatchIterator(dev.vortex.api.ArrayIterator backing)
      Creates a new VortexColumnarBatchIterator that wraps the given ArrayIterator.

      The iterator will use prefetching to batch arrays up to MAX_BUFFER_BYTES to optimize memory usage and conversion performance.

      Parameters:
      backing - the underlying ArrayIterator to wrap
  • Method Details

    • hasNext

      public boolean hasNext()
      Returns whether there are more columnar batches available.
      Specified by:
      hasNext in interface Iterator<org.apache.spark.sql.vectorized.ColumnarBatch>
      Returns:
      true if there are more batches to iterate over, false otherwise
    • next

      public org.apache.spark.sql.vectorized.ColumnarBatch next()
      Returns the next columnar batch from the iterator.

      This method retrieves the next Array from the prefetching iterator, exports it to Arrow format using a reusable VectorSchemaRoot, and wraps each field vector in a VortexArrowColumnVector to create a VortexColumnarBatch.

      Specified by:
      next in interface Iterator<org.apache.spark.sql.vectorized.ColumnarBatch>
      Returns:
      the next ColumnarBatch containing the data from the next Array
      Throws:
      NoSuchElementException - if there are no more elements
    • close

      public void close()
      Closes this iterator and releases all associated resources.

      This method closes the prefetching iterator, the backing ArrayIterator, and the reusable VectorSchemaRoot if it exists.

      Specified by:
      close in interface AutoCloseable
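
Closing several resources in sequence, one of which may never have been created, can be sketched as below. This is an illustrative pattern under stated assumptions, not the library's code: ResourceCloser and closeAll are hypothetical names, and the choice to keep closing after a failure is an assumption that the documentation above does not confirm.

```java
// Hypothetical helper: closes resources in the given order, skipping nulls
// (for example, a lazily created root that was never allocated).
final class ResourceCloser {
    static void closeAll(AutoCloseable... resources) {
        RuntimeException failure = null;
        for (AutoCloseable resource : resources) {
            if (resource == null) {
                continue; // nothing to release
            }
            try {
                resource.close();
            } catch (Exception e) {
                // Remember the first failure but keep closing the rest,
                // so one bad close() does not leak the other resources.
                if (failure == null) {
                    failure = new IllegalStateException("failed to close resource", e);
                } else {
                    failure.addSuppressed(e);
                }
            }
        }
        if (failure != null) {
            throw failure;
        }
    }
}
```

A close() method built this way might call `ResourceCloser.closeAll(prefetcher, backing, root)`, where all three argument names are placeholders for the resources listed above.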