Class VortexColumnarBatchIterator
- All Implemented Interfaces:
AutoCloseable, Iterator<org.apache.spark.sql.vectorized.ColumnarBatch>
This iterator wraps a Vortex ArrayIterator and converts each Array into a Spark ColumnarBatch by exporting the data to Arrow format and wrapping the resulting vectors in VortexArrowColumnVector instances. To optimize memory usage and performance, it prefetches arrays from the backing iterator, batching them up to a maximum buffer size (MAX_BUFFER_BYTES).
The iterator maintains a reusable VectorSchemaRoot to minimize allocation overhead when converting between Vortex and Arrow formats.
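The class implements both Iterator and AutoCloseable, so callers can consume it inside try-with-resources. That way the iterator's resources are released even if processing a batch fails. The helper below is a hypothetical sketch: BatchDrain and drainAndClose are illustrative names, not part of Vortex or Spark. It accepts any iterator that is also AutoCloseable, so it runs without Vortex on the classpath.

```java
import java.util.Iterator;
import java.util.function.Consumer;

// Hypothetical helper, not part of the Vortex or Spark API. It drains any iterator
// that is also AutoCloseable (as VortexColumnarBatchIterator is) and always closes it.
final class BatchDrain {
    private BatchDrain() {}

    /** Passes every remaining element to sink, closes the iterator, and returns the element count. */
    static <T, I extends Iterator<T> & AutoCloseable> long drainAndClose(I it, Consumer<? super T> sink)
            throws Exception {
        long count = 0;
        // try-with-resources calls close() even if sink.accept throws.
        try (I resource = it) {
            while (resource.hasNext()) {
                sink.accept(resource.next());
                count++;
            }
        }
        return count;
    }
}
```

With Vortex available, a call might look like BatchDrain.drainAndClose(new VortexColumnarBatchIterator(backing), batch -> handle(batch)). Here backing is an ArrayIterator obtained elsewhere, and handle is a placeholder for your own processing.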
- See Also:
ArrayIterator
ColumnarBatch
VortexArrowColumnVector
Field Summary
Fields
Modifier and Type | Field | Description
static final long | MAX_BUFFER_BYTES | Maximum buffer size in bytes for prefetching arrays.
Constructor Summary
Constructors
Constructor | Description
VortexColumnarBatchIterator(dev.vortex.api.ArrayIterator backing) | Creates a new VortexColumnarBatchIterator that wraps the given ArrayIterator.
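Method Summary
Modifier and Type | Method | Description
void | close() | Closes this iterator and releases all associated resources.
boolean | hasNext() | Returns whether there are more columnar batches available.
org.apache.spark.sql.vectorized.ColumnarBatch | next() | Returns the next columnar batch from the iterator.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface java.util.Iterator
forEachRemaining, remove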
Field Details
MAX_BUFFER_BYTES
public static final long MAX_BUFFER_BYTES
Maximum buffer size in bytes for prefetching arrays.
The iterator will prefetch and batch arrays until this size limit is reached. This helps optimize memory usage and reduces the overhead of converting small arrays individually.
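One way to implement size-bounded prefetching is a read-ahead buffer. It pulls elements from the source until their estimated total size reaches the limit, then serves them in their original order. The sketch below is a hypothetical, generic illustration, not the Vortex implementation. SizeBoundedPrefetcher and the caller-supplied sizeOf estimator are assumptions, and the actual value of MAX_BUFFER_BYTES is not given here.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.ToLongFunction;

// Hypothetical sketch of size-bounded prefetching; not the Vortex implementation.
// Elements are read ahead from the source until their estimated total size reaches
// maxBufferBytes, then handed out one at a time in their original order.
final class SizeBoundedPrefetcher<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final ToLongFunction<? super T> sizeOf; // assumed per-element size estimate in bytes
    private final long maxBufferBytes;
    private final ArrayDeque<T> buffer = new ArrayDeque<>(); // note: ArrayDeque rejects null elements

    SizeBoundedPrefetcher(Iterator<T> source, ToLongFunction<? super T> sizeOf, long maxBufferBytes) {
        this.source = source;
        this.sizeOf = sizeOf;
        this.maxBufferBytes = maxBufferBytes;
    }

    /** Refills the empty buffer; always takes at least one element if the source has any. */
    private void fill() {
        long buffered = 0;
        while (source.hasNext() && (buffer.isEmpty() || buffered < maxBufferBytes)) {
            T item = source.next();
            buffered += sizeOf.applyAsLong(item); // the last element may push the total past the limit
            buffer.add(item);
        }
    }

    @Override
    public boolean hasNext() {
        return !buffer.isEmpty() || source.hasNext();
    }

    @Override
    public T next() {
        if (buffer.isEmpty()) {
            fill();
        }
        if (buffer.isEmpty()) {
            throw new NoSuchElementException("no more elements");
        }
        return buffer.poll();
    }
}
```

The refill loop always takes at least one element, so an element larger than the limit still flows through. The last element pulled may push the buffered total past the limit.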
- See Also:
Constructor Details
VortexColumnarBatchIterator
public VortexColumnarBatchIterator(dev.vortex.api.ArrayIterator backing)
Creates a new VortexColumnarBatchIterator that wraps the given ArrayIterator.
The iterator will use prefetching to batch arrays up to MAX_BUFFER_BYTES to optimize memory usage and conversion performance.
- Parameters:
backing
- the underlying ArrayIterator to wrap
Method Details
hasNext
public boolean hasNext()
Returns whether there are more columnar batches available.
next
public org.apache.spark.sql.vectorized.ColumnarBatch next()
Returns the next columnar batch from the iterator.
This method retrieves the next Array from the prefetching iterator, exports it to Arrow format using a reusable VectorSchemaRoot, and wraps each field vector in a VortexArrowColumnVector to create a VortexColumnarBatch.
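The convert-with-a-reused-target step can be sketched generically. This hypothetical example is not the Vortex or Arrow API. The target of type R stands in for the reusable VectorSchemaRoot. It is allocated on the first call to next() and overwritten for every source element. It is then passed to a wrap function that builds the returned value, loosely mirroring how each field vector is wrapped in a VortexArrowColumnVector. All class and parameter names are illustrative.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch of converting each element through a single reused target.
// Not a Vortex or Arrow API; R stands in for the reusable VectorSchemaRoot.
final class ReusingConvertingIterator<S, R, B> implements Iterator<B> {
    private final Iterator<S> source;
    private final Supplier<R> allocate;        // creates the reusable target
    private final BiConsumer<S, R> exportInto; // overwrites the target with one source element
    private final Function<R, B> wrap;         // builds the returned value from the target
    private R target;                          // null until the first call to next()
    private int allocations;                   // how many times allocate was called

    ReusingConvertingIterator(Iterator<S> source, Supplier<R> allocate,
                              BiConsumer<S, R> exportInto, Function<R, B> wrap) {
        this.source = source;
        this.allocate = allocate;
        this.exportInto = exportInto;
        this.wrap = wrap;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public B next() {
        if (!source.hasNext()) {
            throw new NoSuchElementException("no more elements"); // Iterator contract
        }
        S item = source.next();
        if (target == null) {
            target = allocate.get(); // allocated once, then reused for every element
            allocations++;
        }
        exportInto.accept(item, target); // previous contents are overwritten
        return wrap.apply(target);
    }

    int allocations() {
        return allocations;
    }
}
```

Because the target is shared, a value returned by wrap that still references it reflects only the most recent element. In this sketch, wrap should copy out whatever the caller needs to keep.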
- Specified by:
next
in interface Iterator<org.apache.spark.sql.vectorized.ColumnarBatch>
- Returns:
- the next ColumnarBatch containing the data from the next Array
- Throws:
NoSuchElementException
- if there are no more elements
close
public void close()
Closes this iterator and releases all associated resources.
This method closes the prefetching iterator, the backing ArrayIterator, and the reusable VectorSchemaRoot if it exists.
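The close sequence described above can be sketched with a small helper. It closes resources in order and skips a resource that was never created (null). If one close fails, it still closes the rest and attaches later failures as suppressed exceptions. This is a hypothetical illustration. CloseAll is not part of Vortex, and the sketch assumes each resource is AutoCloseable. This documentation does not say whether the real close() continues after a failure.

```java
// Hypothetical helper, not part of Vortex: closes resources in the given order,
// skips null (never-created) ones, and attempts every close even if one fails.
final class CloseAll {
    private CloseAll() {}

    static void closeAll(AutoCloseable... resources) throws Exception {
        Exception first = null;
        for (AutoCloseable r : resources) {
            if (r == null) {
                continue; // e.g. a reusable root that was never allocated
            }
            try {
                r.close();
            } catch (Exception e) {
                if (first == null) {
                    first = e;              // the first failure is rethrown at the end
                } else {
                    first.addSuppressed(e); // later failures are attached, not lost
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

A close() built on it could call CloseAll.closeAll(prefetching, backing, root). Those names are placeholders for the prefetching iterator, the backing ArrayIterator, and the reusable VectorSchemaRoot, in the order listed above.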
- Specified by:
close
in interface AutoCloseable