Class VortexScanBuilder

java.lang.Object
dev.vortex.spark.read.VortexScanBuilder
All Implemented Interfaces:
org.apache.spark.sql.connector.read.ScanBuilder, org.apache.spark.sql.connector.read.SupportsPushDownRequiredColumns, org.apache.spark.sql.connector.read.SupportsPushDownV2Filters

public final class VortexScanBuilder extends Object implements org.apache.spark.sql.connector.read.ScanBuilder, org.apache.spark.sql.connector.read.SupportsPushDownRequiredColumns, org.apache.spark.sql.connector.read.SupportsPushDownV2Filters
Spark V2 ScanBuilder for table scans over Vortex files.
  • Constructor Summary

    Constructors
    Constructor
    Description
    VortexScanBuilder(Map<String,String> formatOptions)
    Creates a new VortexScanBuilder with empty paths and columns.
    VortexScanBuilder(Map<String,String> formatOptions, org.apache.spark.sql.connector.expressions.Transform[] partitionTransforms)
    Creates a new VortexScanBuilder with empty paths and columns and the supplied partition transforms.
  • Method Summary

    Modifier and Type
    Method
    Description
    VortexScanBuilder
    addAllColumns(Iterable<org.apache.spark.sql.connector.catalog.Column> columns)
    Adds multiple columns to read.
    VortexScanBuilder
    addAllPaths(Iterable<String> paths)
    Adds multiple file paths to scan.
    VortexScanBuilder
    addColumn(org.apache.spark.sql.connector.catalog.Column column)
    Adds a column to read.
    VortexScanBuilder
    addPath(String path)
    Adds a file path to scan.
    org.apache.spark.sql.connector.read.Scan
    build()
    Builds a VortexScan with the configured paths and columns.
    void
    pruneColumns(org.apache.spark.sql.types.StructType requiredSchema)
    Prunes the columns to only include those specified in the required schema.
    org.apache.spark.sql.connector.expressions.filter.Predicate[]
    pushedPredicates()
    Returns the predicates this scan promises to apply.
    org.apache.spark.sql.connector.expressions.filter.Predicate[]
    pushPredicates(org.apache.spark.sql.connector.expressions.filter.Predicate[] predicates)
    Splits the supplied predicates into pushed and not-pushed sets.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • VortexScanBuilder

      public VortexScanBuilder(Map<String,String> formatOptions)
      Creates a new VortexScanBuilder with empty paths and columns.
    • VortexScanBuilder

      public VortexScanBuilder(Map<String,String> formatOptions, org.apache.spark.sql.connector.expressions.Transform[] partitionTransforms)
      Creates a new VortexScanBuilder with empty paths and columns and the supplied partition transforms. Filters that reference partition columns are not pushed down, since the partition columns are not stored inside the Vortex files.
  • Method Details

    • addPath

      public VortexScanBuilder addPath(String path)
      Adds a file path to scan.
      Parameters:
      path - the file path to add
      Returns:
      this builder for method chaining
    • addColumn

      public VortexScanBuilder addColumn(org.apache.spark.sql.connector.catalog.Column column)
      Adds a column to read.
      Parameters:
      column - the column to add
      Returns:
      this builder for method chaining
    • addAllPaths

      public VortexScanBuilder addAllPaths(Iterable<String> paths)
      Adds multiple file paths to scan.
      Parameters:
      paths - the iterable of file paths to add
      Returns:
      this builder for method chaining
    • addAllColumns

      public VortexScanBuilder addAllColumns(Iterable<org.apache.spark.sql.connector.catalog.Column> columns)
      Adds multiple columns to read.
      Parameters:
      columns - the iterable of columns to add
      Returns:
      this builder for method chaining
    • build

      public org.apache.spark.sql.connector.read.Scan build()
      Builds a VortexScan with the configured paths and columns.
      Specified by:
      build in interface org.apache.spark.sql.connector.read.ScanBuilder
      Returns:
      a new VortexScan instance
      Throws:
      IllegalStateException - if no paths or columns have been added
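
The chaining contract and the build-time check can be illustrated with a small stand-in. This is only a sketch: FluentScanBuilderSketch is a hypothetical class, columns are plain strings rather than Spark Column objects, and it assumes build() rejects the builder when either the path list or the column list is empty (the description above does not say which combination triggers the exception).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; this is NOT the Vortex class.
class FluentScanBuilderSketch {
    private final List<String> paths = new ArrayList<>();
    private final List<String> columns = new ArrayList<>();

    // Each adder returns this, which is what makes chained calls possible.
    FluentScanBuilderSketch addPath(String path) {
        paths.add(path);
        return this;
    }

    FluentScanBuilderSketch addAllPaths(Iterable<String> morePaths) {
        for (String p : morePaths) {
            paths.add(p);
        }
        return this;
    }

    FluentScanBuilderSketch addColumn(String column) {
        columns.add(column);
        return this;
    }

    // Assumption: the build fails when either list is empty.
    String build() {
        if (paths.isEmpty() || columns.isEmpty()) {
            throw new IllegalStateException("at least one path and one column are required");
        }
        return "scan(paths=" + paths + ", columns=" + columns + ")";
    }

    public static void main(String[] args) {
        String scan = new FluentScanBuilderSketch()
                .addPath("/data/a.vortex")
                .addColumn("id")
                .build();
        System.out.println(scan);
    }
}
```

In the real class the adders take a String path or a Spark Column, and build() returns a Scan rather than a string.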
    • pruneColumns

      public void pruneColumns(org.apache.spark.sql.types.StructType requiredSchema)
      Prunes the columns to only include those specified in the required schema.

      This method clears the current column list and replaces it with columns derived from the required schema. Only top-level schema pruning is currently supported; pruning of nested fields is not yet implemented.

      Specified by:
      pruneColumns in interface org.apache.spark.sql.connector.read.SupportsPushDownRequiredColumns
      Parameters:
      requiredSchema - the schema specifying which columns are required
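
One way to picture top-level pruning is the stand-in below. It is a sketch under assumptions: TopLevelPruneSketch and Field are hypothetical types (a Field plays the role of a StructField), and it assumes that a required nested field causes the whole top-level column to be kept, since nested pruning is not implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of top-level column pruning; not Spark or Vortex code.
class TopLevelPruneSketch {

    /** Stand-in for a schema field: a name plus optional nested children. */
    static final class Field {
        final String name;
        final List<Field> children;

        Field(String name, Field... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    private final List<String> columns = new ArrayList<>();

    TopLevelPruneSketch(String... initialColumns) {
        columns.addAll(Arrays.asList(initialColumns));
    }

    /** Clears the current columns and keeps only the required top-level names. */
    void pruneColumns(List<Field> requiredSchema) {
        columns.clear();
        for (Field field : requiredSchema) {
            // Children are ignored here: the whole top-level column is selected.
            columns.add(field.name);
        }
    }

    List<String> columns() {
        return Collections.unmodifiableList(columns);
    }

    public static void main(String[] args) {
        TopLevelPruneSketch sketch = new TopLevelPruneSketch("id", "name", "address");
        sketch.pruneColumns(Arrays.asList(
                new Field("id"),
                new Field("address", new Field("city"))));
        System.out.println(sketch.columns());
    }
}
```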
    • pushPredicates

      public org.apache.spark.sql.connector.expressions.filter.Predicate[] pushPredicates(org.apache.spark.sql.connector.expressions.filter.Predicate[] predicates)
      Splits the supplied predicates into pushed and not-pushed sets.

      A predicate is pushed when it references only data columns (not partition columns) and uses operators and literal types that SparkPredicateToVortexExpression can map to Vortex expressions. Predicates that reference partition columns or use unsupported features are returned to Spark for post-scan evaluation.

      Specified by:
      pushPredicates in interface org.apache.spark.sql.connector.read.SupportsPushDownV2Filters
      Returns:
      the predicates that Spark must still evaluate
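
The split rule described above can be sketched with plain Java stand-ins. Everything here is hypothetical: SimplePredicate replaces Spark's Predicate, and the SUPPORTED_OPERATORS set is an invented placeholder for whatever SparkPredicateToVortexExpression can actually convert. The sketch only shows the shape of the decision, not the connector's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the pushed / not-pushed split; not Spark or Vortex code.
class PredicateSplitSketch {

    /** Stand-in for a predicate: an operator name plus the columns it references. */
    static final class SimplePredicate {
        final String operator;
        final Set<String> referencedColumns;

        SimplePredicate(String operator, String... referencedColumns) {
            this.operator = operator;
            this.referencedColumns = new HashSet<>(Arrays.asList(referencedColumns));
        }
    }

    /** Assumed operator set for illustration; the real converter's coverage may differ. */
    static final Set<String> SUPPORTED_OPERATORS = new HashSet<>(Arrays.asList("=", "<", ">"));

    private final Set<String> partitionColumns;
    private final List<SimplePredicate> pushed = new ArrayList<>();

    PredicateSplitSketch(String... partitionColumns) {
        this.partitionColumns = new HashSet<>(Arrays.asList(partitionColumns));
    }

    /** Records the pushable predicates and returns those the engine must still evaluate. */
    List<SimplePredicate> pushPredicates(List<SimplePredicate> predicates) {
        pushed.clear();
        List<SimplePredicate> remaining = new ArrayList<>();
        for (SimplePredicate p : predicates) {
            boolean touchesPartition = !Collections.disjoint(p.referencedColumns, partitionColumns);
            boolean convertible = SUPPORTED_OPERATORS.contains(p.operator);
            if (!touchesPartition && convertible) {
                pushed.add(p);     // handled inside the scan
            } else {
                remaining.add(p);  // handed back for post-scan evaluation
            }
        }
        return remaining;
    }

    /** Returns a copy of the predicates accepted by the last pushPredicates call. */
    List<SimplePredicate> pushedPredicates() {
        return new ArrayList<>(pushed);
    }

    public static void main(String[] args) {
        PredicateSplitSketch sketch = new PredicateSplitSketch("dt");
        List<SimplePredicate> remaining = sketch.pushPredicates(Arrays.asList(
                new SimplePredicate("=", "id"),      // data column, supported operator
                new SimplePredicate("=", "dt"),      // partition column
                new SimplePredicate("LIKE", "name")  // operator outside the assumed set
        ));
        System.out.println("pushed: " + sketch.pushedPredicates().size()
                + ", remaining: " + remaining.size());
    }
}
```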
    • pushedPredicates

      public org.apache.spark.sql.connector.expressions.filter.Predicate[] pushedPredicates()
      Returns the predicates this scan promises to apply.
      Specified by:
      pushedPredicates in interface org.apache.spark.sql.connector.read.SupportsPushDownV2Filters