Package dev.vortex.spark.read
Class VortexScanBuilder
java.lang.Object
dev.vortex.spark.read.VortexScanBuilder
- All Implemented Interfaces:
org.apache.spark.sql.connector.read.ScanBuilder, org.apache.spark.sql.connector.read.SupportsPushDownRequiredColumns, org.apache.spark.sql.connector.read.SupportsPushDownV2Filters
public final class VortexScanBuilder
extends Object
implements org.apache.spark.sql.connector.read.ScanBuilder, org.apache.spark.sql.connector.read.SupportsPushDownRequiredColumns, org.apache.spark.sql.connector.read.SupportsPushDownV2Filters
Spark V2 ScanBuilder for table scans over Vortex files.
-
Constructor Summary
Constructors
Constructor, Description
VortexScanBuilder(Map<String, String> formatOptions)
    Creates a new VortexScanBuilder with empty paths and columns.
VortexScanBuilder(Map<String, String> formatOptions, org.apache.spark.sql.connector.expressions.Transform[] partitionTransforms)
    Creates a new VortexScanBuilder with empty paths and columns and the supplied partition transforms.
-
Method Summary
Modifier and Type, Method, Description
VortexScanBuilder addAllColumns(Iterable<org.apache.spark.sql.connector.catalog.Column> columns)
    Adds multiple columns to read.
VortexScanBuilder addAllPaths(Iterable<String> paths)
    Adds multiple file paths to scan.
VortexScanBuilder addColumn(org.apache.spark.sql.connector.catalog.Column column)
    Adds a column to read.
VortexScanBuilder addPath(String path)
    Adds a file path to scan.
org.apache.spark.sql.connector.read.Scan build()
    Builds a VortexScan with the configured paths and columns.
void pruneColumns(org.apache.spark.sql.types.StructType requiredSchema)
    Prunes the columns to only include those specified in the required schema.
org.apache.spark.sql.connector.expressions.filter.Predicate[] pushedPredicates()
    Returns the predicates this scan promises to apply.
org.apache.spark.sql.connector.expressions.filter.Predicate[] pushPredicates(org.apache.spark.sql.connector.expressions.filter.Predicate[] predicates)
    Splits the supplied predicates into pushed and not-pushed sets.
-
Constructor Details
-
VortexScanBuilder
public VortexScanBuilder(Map<String, String> formatOptions)
Creates a new VortexScanBuilder with empty paths and columns.
-
VortexScanBuilder
public VortexScanBuilder(Map<String, String> formatOptions, org.apache.spark.sql.connector.expressions.Transform[] partitionTransforms)
Creates a new VortexScanBuilder with empty paths and columns and the supplied partition transforms. Filters that reference partition columns are not pushed down, because partition columns are not stored inside the Vortex files.
-
-
Method Details
-
addPath
public VortexScanBuilder addPath(String path)
Adds a file path to scan.
Parameters:
path - the file path to add
Returns:
this builder for method chaining
-
addColumn
public VortexScanBuilder addColumn(org.apache.spark.sql.connector.catalog.Column column)
Adds a column to read.
Parameters:
column - the column to add
Returns:
this builder for method chaining
-
addAllPaths
public VortexScanBuilder addAllPaths(Iterable<String> paths)
Adds multiple file paths to scan.
Parameters:
paths - the iterable of file paths to add
Returns:
this builder for method chaining
-
addAllColumns
public VortexScanBuilder addAllColumns(Iterable<org.apache.spark.sql.connector.catalog.Column> columns)
Adds multiple columns to read.
Parameters:
columns - the iterable of columns to add
Returns:
this builder for method chaining
-
build
public org.apache.spark.sql.connector.read.Scan build()
Builds a VortexScan with the configured paths and columns.
Specified by:
build in interface org.apache.spark.sql.connector.read.ScanBuilder
Returns:
a new VortexScan instance
Throws:
IllegalStateException - if no paths or columns have been added
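The chaining and build() contract described above can be sketched with a small stand-in class. This is not the VortexScanBuilder implementation: ScanBuilderSketch is a hypothetical name, paths and columns are modeled as plain strings, and the sketch assumes build() fails when either list is empty, which is one possible reading of the Throws clause.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for VortexScanBuilder. It models only the fluent
// add* methods and the build() precondition, using plain strings.
final class ScanBuilderSketch {
    private final List<String> paths = new ArrayList<>();
    private final List<String> columns = new ArrayList<>();

    ScanBuilderSketch addPath(String path) {
        paths.add(path);
        return this; // returning the builder is what enables chaining
    }

    ScanBuilderSketch addAllPaths(Iterable<String> morePaths) {
        morePaths.forEach(paths::add);
        return this;
    }

    ScanBuilderSketch addColumn(String column) {
        columns.add(column);
        return this;
    }

    ScanBuilderSketch addAllColumns(Iterable<String> moreColumns) {
        moreColumns.forEach(columns::add);
        return this;
    }

    // Assumption: the check rejects the builder when either list is empty.
    String build() {
        if (paths.isEmpty() || columns.isEmpty()) {
            throw new IllegalStateException("no paths or columns have been added");
        }
        return "scan(paths=" + paths + ", columns=" + columns + ")";
    }

    public static void main(String[] args) {
        String scan = new ScanBuilderSketch()
                .addPath("/data/part-0.vortex")
                .addAllColumns(List.of("id", "name"))
                .build();
        System.out.println(scan);
    }
}
```

In this sketch, calling build() before any column has been added throws IllegalStateException, so a caller would normally add at least one path and one column first.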
-
pruneColumns
public void pruneColumns(org.apache.spark.sql.types.StructType requiredSchema)
Prunes the columns to only include those specified in the required schema.
This method clears the current column list and replaces it with columns derived from the required schema. Only top-level schema pruning is currently supported; pruning of nested fields is not yet implemented.
Specified by:
pruneColumns in interface org.apache.spark.sql.connector.read.SupportsPushDownRequiredColumns
Parameters:
requiredSchema - the schema specifying which columns are required
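A minimal sketch of the clear-and-replace behavior described above, using stand-in types rather than Spark's StructType. PruneColumnsSketch and Field are hypothetical names, and the sketch assumes that "top-level" means one column per top-level field of the required schema, with nested children never inspected.

```java
import java.util.ArrayList;
import java.util.List;

final class PruneColumnsSketch {
    // Hypothetical stand-in for a schema field: a name plus nested children.
    static final class Field {
        final String name;
        final List<Field> children;

        Field(String name, Field... children) {
            this.name = name;
            this.children = List.of(children);
        }
    }

    private final List<String> columns = new ArrayList<>();

    PruneColumnsSketch(List<String> initialColumns) {
        columns.addAll(initialColumns);
    }

    // Clears the current column list, then keeps one column per top-level
    // field of the required schema. Nested children are not descended into.
    void pruneColumns(List<Field> requiredSchema) {
        columns.clear();
        for (Field field : requiredSchema) {
            columns.add(field.name);
        }
    }

    List<String> columns() {
        return List.copyOf(columns);
    }

    public static void main(String[] args) {
        PruneColumnsSketch builder =
                new PruneColumnsSketch(List.of("id", "name", "address"));
        // "address" is requested with a single nested child, but the sketch
        // still selects the whole top-level column.
        builder.pruneColumns(List.of(
                new Field("id"),
                new Field("address", new Field("city"))));
        System.out.println(builder.columns());
    }
}
```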
-
pushPredicates
public org.apache.spark.sql.connector.expressions.filter.Predicate[] pushPredicates(org.apache.spark.sql.connector.expressions.filter.Predicate[] predicates)
Splits the supplied predicates into pushed and not-pushed sets.
A predicate is pushed when it references only data columns (not partition columns) and uses operators and literal types that SparkPredicateToVortexExpression can map to Vortex expressions. Predicates that reference partition columns or use unsupported features are returned to Spark for post-scan evaluation.
Specified by:
pushPredicates in interface org.apache.spark.sql.connector.read.SupportsPushDownV2Filters
Returns:
the predicates that Spark must still evaluate
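The split described above can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: PushPredicatesSketch and Pred are hypothetical stand-ins for the builder and for Spark's Predicate, the operator list is invented for the example (the real set is whatever SparkPredicateToVortexExpression supports), and literal-type checks are omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PushPredicatesSketch {
    // Hypothetical stand-in for a Spark V2 Predicate: an operator name plus
    // the set of columns it references.
    static final class Pred {
        final String op;
        final Set<String> columns;

        Pred(String op, String... columns) {
            this.op = op;
            this.columns = new HashSet<>(Arrays.asList(columns));
        }
    }

    // Assumed operator list, for illustration only.
    private static final Set<String> SUPPORTED_OPS =
            Set.of("=", "<", "<=", ">", ">=");

    private final Set<String> partitionColumns;
    private final List<Pred> pushed = new ArrayList<>();

    PushPredicatesSketch(Set<String> partitionColumns) {
        this.partitionColumns = partitionColumns;
    }

    // Records the predicates that can be pushed and returns the rest, which
    // the engine must still evaluate after the scan.
    List<Pred> pushPredicates(List<Pred> predicates) {
        pushed.clear();
        List<Pred> residual = new ArrayList<>();
        for (Pred p : predicates) {
            boolean touchesPartition =
                    p.columns.stream().anyMatch(partitionColumns::contains);
            if (!touchesPartition && SUPPORTED_OPS.contains(p.op)) {
                pushed.add(p);
            } else {
                residual.add(p);
            }
        }
        return residual;
    }

    // Mirrors pushedPredicates(): the predicates accepted by the last call.
    List<Pred> pushedPredicates() {
        return List.copyOf(pushed);
    }

    public static void main(String[] args) {
        PushPredicatesSketch builder = new PushPredicatesSketch(Set.of("date"));
        List<Pred> residual = builder.pushPredicates(List.of(
                new Pred("=", "id"),       // data column, supported operator
                new Pred("=", "date"),     // partition column
                new Pred("LIKE", "name")   // operator outside the assumed list
        ));
        System.out.println("pushed=" + builder.pushedPredicates().size()
                + " residual=" + residual.size());
    }
}
```

Returning the residual predicates while remembering the pushed ones matches the shape of the SupportsPushDownV2Filters contract, where pushPredicates reports what Spark must still evaluate and pushedPredicates reports what the scan has taken on.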
-
pushedPredicates
public org.apache.spark.sql.connector.expressions.filter.Predicate[] pushedPredicates()
Returns the predicates this scan promises to apply.
Specified by:
pushedPredicates in interface org.apache.spark.sql.connector.read.SupportsPushDownV2Filters
-