dev.vortex.spark.read.VortexColumnarBatchIterator

All Implemented Interfaces: AutoCloseable, Iterator<org.apache.spark.sql.vectorized.ColumnarBatch>

public final class VortexColumnarBatchIterator extends Object implements Iterator<org.apache.spark.sql.vectorized.ColumnarBatch>, AutoCloseable

Iterator that converts Vortex Arrays into Spark ColumnarBatch objects.

This iterator wraps a Vortex ArrayIterator and converts each Array into a Spark ColumnarBatch by exporting the data to Arrow format and wrapping each exported vector in a VortexArrowColumnVector. It prefetches arrays from the backing iterator and batches them up to a maximum buffer size (MAX_BUFFER_BYTES), which helps optimize memory usage and reduces the overhead of converting small arrays individually.

The iterator maintains a reusable VectorSchemaRoot to minimize allocation overhead when converting between Vortex and Arrow formats.
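The reuse pattern described above can be illustrated with a minimal, self-contained sketch. It does not use the Vortex or Arrow classes: ReusingIterator and its int[] buffer are hypothetical stand-ins for the iterator and its VectorSchemaRoot. The sketch also shows a likely consequence of reuse, namely that an object returned by one next() call may be overwritten by the following call. Whether that holds for VortexColumnarBatchIterator is an assumption, not something this page states.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical stand-in: copies each input into one reusable buffer. */
final class ReusingIterator implements Iterator<int[]> {
    private final Iterator<int[]> source;
    private int[] reusable; // allocated lazily, then reused across next() calls

    ReusingIterator(Iterator<int[]> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public int[] next() {
        if (!source.hasNext()) {
            throw new NoSuchElementException();
        }
        int[] in = source.next();
        // Reallocate only when the shape changes; otherwise reuse the buffer.
        if (reusable == null || reusable.length != in.length) {
            reusable = new int[in.length];
        }
        System.arraycopy(in, 0, reusable, 0, in.length);
        return reusable;
    }
}
```

Under this pattern, a caller that needs data from an earlier result should finish with it, or copy it, before asking for the next one.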

See Also:

ArrayIterator
ColumnarBatch
VortexArrowColumnVector

Field Summary

Fields

Modifier and Type

Field

Description

static final long

MAX_BUFFER_BYTES

Maximum buffer size in bytes for prefetching arrays.

Constructor Summary

Constructors

Constructor

Description

VortexColumnarBatchIterator(dev.vortex.api.ArrayIterator backing)

Creates a new VortexColumnarBatchIterator that wraps the given ArrayIterator.

Method Summary

Modifier and Type

Method

Description

void

close()

Closes this iterator and releases all associated resources.

boolean

hasNext()

Returns whether there are more columnar batches available.

org.apache.spark.sql.vectorized.ColumnarBatch

next()

Returns the next columnar batch from the iterator.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface java.util.Iterator
forEachRemaining, remove

Field Details
- MAX_BUFFER_BYTES
  
  public static final long MAX_BUFFER_BYTES
  
  Maximum buffer size in bytes for prefetching arrays.
  The iterator will prefetch and batch arrays until this size limit is reached, which helps optimize memory usage and reduces the overhead of converting small arrays individually.
  See Also:
  
  Constant Field Values
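
Batching up to a byte limit can be sketched as follows. This is an illustration of the general technique, not the library's implementation: BudgetedBatcher, its byte[] chunks, and the rule that a chunk which would exceed the budget is carried over to the next batch are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in: groups chunks until a byte budget would be exceeded. */
final class BudgetedBatcher implements Iterator<List<byte[]>> {
    private final Iterator<byte[]> source;
    private final long maxBytes;
    private byte[] carry; // chunk already read from source but not yet emitted

    BudgetedBatcher(Iterator<byte[]> source, long maxBytes) {
        this.source = source;
        this.maxBytes = maxBytes;
    }

    @Override
    public boolean hasNext() {
        return carry != null || source.hasNext();
    }

    @Override
    public List<byte[]> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        List<byte[]> batch = new ArrayList<>();
        long used = 0;
        if (carry != null) {
            batch.add(carry);
            used += carry.length;
            carry = null;
        }
        while (source.hasNext()) {
            byte[] chunk = source.next();
            // Always accept the first chunk so an oversized chunk still makes progress.
            if (!batch.isEmpty() && used + chunk.length > maxBytes) {
                carry = chunk;
                break;
            }
            batch.add(chunk);
            used += chunk.length;
        }
        return batch;
    }
}
```

With a budget of 8 bytes and chunks of 4, 4, 4, 10, and 2 bytes, this sketch yields four batches: the first two chunks together, then each remaining chunk on its own.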
Constructor Details
- VortexColumnarBatchIterator
  
  public VortexColumnarBatchIterator(dev.vortex.api.ArrayIterator backing)
  
  Creates a new VortexColumnarBatchIterator that wraps the given ArrayIterator.
  The iterator will use prefetching to batch arrays up to MAX_BUFFER_BYTES to optimize memory usage and conversion performance.
  
  Parameters:
  
  backing - the underlying ArrayIterator to wrap
Method Details
- hasNext
  
  public boolean hasNext()
  
  Returns whether there are more columnar batches available.
  
  Specified by:
  
  hasNext in interface Iterator<org.apache.spark.sql.vectorized.ColumnarBatch>
  
  Returns:
  
  true if there are more batches to iterate over, false otherwise
- next
  
  public org.apache.spark.sql.vectorized.ColumnarBatch next()
  
  Returns the next columnar batch from the iterator.
  This method retrieves the next Array from the prefetching iterator, exports it to Arrow format using a reusable VectorSchemaRoot, and wraps each field vector in a VortexArrowColumnVector to create a VortexColumnarBatch.
  
  Specified by:
  
  next in interface Iterator<org.apache.spark.sql.vectorized.ColumnarBatch>
  
  Returns:
  
  the next ColumnarBatch containing the data from the next Array
  
  Throws:
  
  NoSuchElementException - if there are no more elements
- close
  
  public void close()
  
  Closes this iterator and releases all associated resources.
  This method closes the prefetching iterator, the backing ArrayIterator, and the reusable VectorSchemaRoot if it exists.
  
  Specified by:
  
  close in interface AutoCloseable
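
Because the class implements AutoCloseable, it can be managed with try-with-resources. The cleanup described above can be sketched with stand-in resources. CloseOrderSketch and Named are hypothetical names, and the close order shown simply follows the order in which the description lists the resources; the actual order inside VortexColumnarBatchIterator is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in: owns two eager resources and one lazily created one. */
final class CloseOrderSketch implements AutoCloseable {
    /** A resource that records its name in a shared log when closed. */
    static final class Named implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Named(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add(name);
        }
    }

    private final List<String> log;
    private final Named prefetch;
    private final Named backing;
    private Named root; // created on first use, so it may still be null at close()

    CloseOrderSketch(List<String> log) {
        this.log = log;
        this.prefetch = new Named("prefetch", log);
        this.backing = new Named("backing", log);
    }

    /** Simulates the first conversion, which creates the reusable resource. */
    void touchRoot() {
        if (root == null) {
            root = new Named("root", log);
        }
    }

    @Override
    public void close() {
        prefetch.close();
        backing.close();
        if (root != null) { // release the reusable resource only if it exists
            root.close();
            root = null;
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        try (CloseOrderSketch s = new CloseOrderSketch(log)) {
            s.touchRoot();
        }
        System.out.println(log);
    }
}
```

The null check mirrors the "if it exists" wording: an iterator that is closed before producing any batch has no reusable root to release.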
