Scanning#
The scan API provides a builder pattern for reading data from a Vortex file with optional
filter, projection, row range, and limit pushdowns. The resulting stream exposes the
Arrow C Data Interface (ArrowArrayStream).
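Because the result is a plain ArrowArrayStream, it can be consumed with nothing but the callbacks defined by the Arrow C Stream Interface. The sketch below shows one way to drain such a stream and count its rows. The struct definitions are copied from the Arrow specification; `CountRows`, `MakeMockStream`, and the other `Mock*` names are hypothetical helpers written for this illustration and are not part of the Vortex API.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>

// Struct layouts from the Arrow C Data / C Stream Interface specification.
// The guard macros are the ones the specification recommends.
#ifndef ARROW_C_DATA_INTERFACE
#define ARROW_C_DATA_INTERFACE
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};
#endif
#ifndef ARROW_C_STREAM_INTERFACE
#define ARROW_C_STREAM_INTERFACE
struct ArrowArrayStream {
  int (*get_schema)(struct ArrowArrayStream*, struct ArrowSchema* out);
  int (*get_next)(struct ArrowArrayStream*, struct ArrowArray* out);
  const char* (*get_last_error)(struct ArrowArrayStream*);
  void (*release)(struct ArrowArrayStream*);
  void* private_data;
};
#endif

// Drains the stream and returns the total row count, or -1 on error.
// The stream is always released before returning.
int64_t CountRows(ArrowArrayStream* stream) {
  assert(stream != nullptr && stream->release != nullptr);
  int64_t total = 0;
  for (;;) {
    ArrowArray batch{};
    if (stream->get_next(stream, &batch) != 0) { total = -1; break; }
    if (batch.release == nullptr) break;  // a released array marks end of stream
    total += batch.length;
    batch.release(&batch);  // the consumer owns each batch it receives
  }
  stream->release(stream);
  return total;
}

// Hypothetical in-memory producer, used only to exercise CountRows.
struct MockState { int remaining; int64_t rows_per_batch; };

void MockArrayRelease(ArrowArray* array) { array->release = nullptr; }

int MockGetNext(ArrowArrayStream* stream, ArrowArray* out) {
  auto* state = static_cast<MockState*>(stream->private_data);
  *out = ArrowArray{};  // release stays null once the batches run out
  if (state->remaining > 0) {
    --state->remaining;
    out->length = state->rows_per_batch;
    out->release = MockArrayRelease;
  }
  return 0;
}

int MockGetSchema(ArrowArrayStream*, ArrowSchema*) { return ENOSYS; }
const char* MockGetLastError(ArrowArrayStream*) { return nullptr; }

void MockStreamRelease(ArrowArrayStream* stream) {
  delete static_cast<MockState*>(stream->private_data);
  stream->release = nullptr;
}

ArrowArrayStream MakeMockStream(int n_batches, int64_t rows_per_batch) {
  ArrowArrayStream stream{};
  stream.get_schema = MockGetSchema;
  stream.get_next = MockGetNext;
  stream.get_last_error = MockGetLastError;
  stream.release = MockStreamRelease;
  stream.private_data = new MockState{n_batches, rows_per_batch};
  return stream;
}
```

In real use the mock producer would be replaced by the stream returned from a scan. The consumer loop itself relies only on the Arrow specification.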
ScanBuilder#
-
class ScanBuilder#
Public Functions
-
ScanBuilder &WithFilter(expr::Expr &&expr) &#
Only include rows that match the filter expressions.
-
ScanBuilder &WithProjection(expr::Expr &&expr) &#
Only include columns that match the projection expressions.
-
ScanBuilder &WithRowRange(uint64_t row_range_start, uint64_t row_range_end) &#
Only include rows in the range [row_range_start, row_range_end).
-
ScanBuilder &WithIncludeByIndex(const uint64_t *indices, std::size_t size) &#
Only include rows with the given indices.
-
ScanBuilder &WithLimit(uint64_t limit) &#
Limit the number of rows the scan emits.
-
ScanBuilder &WithOutputSchema(ArrowSchema &output_schema) &#
Set the output schema on the scan builder. TODO: currently, if this option is passed, the schema must be the schema that results after the projection is applied.
-
ArrowArrayStream IntoStream() &&#
Take ownership of the scan builder and consume it, producing a stream of record batches.
-
StreamDriver IntoStreamDriver() &&#
Take ownership of the scan builder and consume it, producing a stream driver. Under the hood, this function calls
ScanBuilder::into_record_batch_reader and holds a WorkStealingArrayIterator in the StreamDriver.
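The trailing `&` and `&&` in the signatures above are ref-qualifiers: the `With...` setters can only be called on a named (lvalue) builder, while `IntoStream` and `IntoStreamDriver` can only be called on an rvalue, typically via `std::move`. The sketch below illustrates that calling pattern with a hypothetical `MockScanBuilder` that "scans" the row ids `0..n_rows-1`. It is not the Vortex implementation. Applying the limit after the half-open row range is an assumption made for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in that mirrors the ref-qualifiers of ScanBuilder.
class MockScanBuilder {
 public:
  explicit MockScanBuilder(uint64_t n_rows) : end_(n_rows) {}

  // Lvalue-qualified setter: keeps rows in the half-open range [start, end).
  MockScanBuilder& WithRowRange(uint64_t start, uint64_t end) & {
    assert(start <= end);
    start_ = std::max(start_, start);
    end_ = std::min(end_, end);
    return *this;
  }

  // Lvalue-qualified setter: caps the number of rows emitted.
  MockScanBuilder& WithLimit(uint64_t limit) & {
    limit_ = limit;
    has_limit_ = true;
    return *this;
  }

  // Rvalue-qualified consumer: only callable on an expiring builder.
  std::vector<uint64_t> IntoRows() && {
    std::vector<uint64_t> rows;
    for (uint64_t r = start_; r < end_; ++r) {
      if (has_limit_ && rows.size() >= limit_) break;
      rows.push_back(r);
    }
    return rows;
  }

 private:
  uint64_t start_ = 0;
  uint64_t end_;
  uint64_t limit_ = 0;
  bool has_limit_ = false;
};

// Hypothetical helper showing the calling pattern end to end.
std::vector<uint64_t> ScanRows(uint64_t n_rows, uint64_t start, uint64_t end,
                               uint64_t limit) {
  MockScanBuilder builder(n_rows);  // setters need a named lvalue
  builder.WithRowRange(start, end).WithLimit(limit);
  // MockScanBuilder(n_rows).WithLimit(limit) would not compile: the setter
  // is `&`-qualified and cannot bind to a temporary.
  return std::move(builder).IntoRows();  // `&&` requires an rvalue
}
```

With this mock, `ScanRows(10, 2, 6, 3)` yields rows 2, 3, and 4: row 6 is excluded by the half-open range, and the limit stops the scan after three rows.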
StreamDriver#
-
class StreamDriver#
The StreamDriver internally holds a
RecordBatchIteratorAdapter from the Rust side, which is thread-safe and cloneable. The RecordBatchIteratorAdapter internally holds a WorkStealingArrayIterator.
Public Functions
-
ArrowArrayStream CreateArrayStream() const#
Create a stream of record batches.
This function is thread-safe. Multiple threads can each call it to create their own stream, and all of those streams make progress concurrently on the same StreamDriver, which is built from a ScanBuilder.
Within each thread, the record batches will be emitted in the original order they are within the scan. Between threads, the order is not guaranteed.
Example: If the scan contains batches [b0, b1, b2, b3, b4, b5] and two threads each call this function and make progress on their own stream, Thread 1 might receive [b0, b2, b4] and Thread 2 might receive [b1, b3, b5]. Each thread maintains order within its subset, but overall ordering between threads is not guaranteed (e.g., Thread 2 could emit b1 before Thread 1 emits b0).
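The ordering guarantee described above can be reproduced with a small simulation. In the sketch below, a hypothetical `MockDriver` hands out batch indices from a shared atomic counter. This is a deliberately simpler mechanism than real work stealing, swapped in because it has the same observable property: every worker sees its own batches in increasing order, while the split between workers varies from run to run. `MockDriver` and `RunWorkers` are illustrative names, not part of the Vortex API.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical shared source of batch indices 0..n-1.
class MockDriver {
 public:
  explicit MockDriver(std::size_t n_batches) : n_batches_(n_batches) {}

  // Claims the next unclaimed batch index; returns false when none remain.
  bool Next(std::size_t* out) {
    std::size_t i = next_.fetch_add(1, std::memory_order_relaxed);
    if (i >= n_batches_) return false;
    *out = i;
    return true;
  }

 private:
  std::size_t n_batches_;
  std::atomic<std::size_t> next_{0};
};

// Runs n_threads workers against one driver and returns, per worker,
// the batch indices in the order that worker received them.
std::vector<std::vector<std::size_t>> RunWorkers(std::size_t n_batches,
                                                 std::size_t n_threads) {
  assert(n_threads > 0);
  MockDriver driver(n_batches);
  std::vector<std::vector<std::size_t>> seen(n_threads);
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < n_threads; ++t) {
    // Each worker writes only to its own slot, so no extra locking is needed.
    workers.emplace_back([&driver, &seen, t] {
      std::size_t batch;
      while (driver.Next(&batch)) seen[t].push_back(batch);
    });
  }
  for (auto& worker : workers) worker.join();
  return seen;
}
```

Calling `RunWorkers(6, 2)` returns two index lists that are each sorted and that together cover all six batches exactly once. Which batches land in which list is not deterministic.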