Class VortexScanBuilder

java.lang.Object
dev.vortex.spark.read.VortexScanBuilder
All Implemented Interfaces:
org.apache.spark.sql.connector.read.ScanBuilder, org.apache.spark.sql.connector.read.SupportsPushDownRequiredColumns

public final class VortexScanBuilder extends Object implements org.apache.spark.sql.connector.read.ScanBuilder, org.apache.spark.sql.connector.read.SupportsPushDownRequiredColumns
Spark DataSource V2 ScanBuilder for table scans over Vortex files.
  • Constructor Details

    • VortexScanBuilder

      public VortexScanBuilder(Map<String,String> formatOptions)
      Creates a new VortexScanBuilder with empty path and column lists.
      Parameters:
      formatOptions - the format options for the scan
  • Method Details

    • addPath

      public VortexScanBuilder addPath(String path)
      Adds a file path to scan.
      Parameters:
      path - the file path to add
      Returns:
      this builder for method chaining
    • addColumn

      public VortexScanBuilder addColumn(org.apache.spark.sql.connector.catalog.Column column)
      Adds a column to read.
      Parameters:
      column - the column to add
      Returns:
      this builder for method chaining
    • addAllPaths

      public VortexScanBuilder addAllPaths(Iterable<String> paths)
      Adds multiple file paths to scan.
      Parameters:
      paths - the iterable of file paths to add
      Returns:
      this builder for method chaining
    • addAllColumns

      public VortexScanBuilder addAllColumns(Iterable<org.apache.spark.sql.connector.catalog.Column> columns)
      Adds multiple columns to read.
      Parameters:
      columns - the iterable of columns to add
      Returns:
      this builder for method chaining
    • build

      public org.apache.spark.sql.connector.read.Scan build()
      Builds a VortexScan with the configured paths and columns.
      Specified by:
      build in interface org.apache.spark.sql.connector.read.ScanBuilder
      Returns:
      a new VortexScan instance
      Throws:
      IllegalStateException - if no paths or columns have been added
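      Taken together, the methods above support a fluent build flow. A minimal sketch, assuming Spark and the Vortex connector are on the classpath; the option key, file paths, and column names are hypothetical:

      ```java
      import java.util.List;
      import java.util.Map;

      import org.apache.spark.sql.connector.catalog.Column;
      import org.apache.spark.sql.connector.read.Scan;
      import org.apache.spark.sql.types.DataTypes;

      import dev.vortex.spark.read.VortexScanBuilder;

      public class VortexScanExample {
          public static void main(String[] args) {
              // "batch_size" is a hypothetical option key, for illustration only.
              Scan scan = new VortexScanBuilder(Map.of("batch_size", "4096"))
                      .addPath("/data/events/part-0001.vortex")
                      .addAllPaths(List.of(
                              "/data/events/part-0002.vortex",
                              "/data/events/part-0003.vortex"))
                      .addColumn(Column.create("user_id", DataTypes.LongType))
                      .addColumn(Column.create("ts", DataTypes.TimestampType))
                      .build(); // throws IllegalStateException if no paths or columns were added
          }
      }
      ```

      Because every mutator returns this builder, paths and columns can be accumulated in any order before the single terminal build() call.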
    • pruneColumns

      public void pruneColumns(org.apache.spark.sql.types.StructType requiredSchema)
      Prunes the columns to only include those specified in the required schema.

      This method clears the current column list and replaces it with columns derived from the required schema. Only top-level schema pruning is currently supported; pruning of deeply nested fields is not yet implemented.

      Specified by:
      pruneColumns in interface org.apache.spark.sql.connector.read.SupportsPushDownRequiredColumns
      Parameters:
      requiredSchema - the schema specifying which columns are required
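      Spark's optimizer calls pruneColumns during planning when a query reads only a subset of columns; it can also be invoked directly. A sketch of the effect, with hypothetical column names:

      ```java
      import org.apache.spark.sql.types.DataTypes;
      import org.apache.spark.sql.types.StructType;

      import dev.vortex.spark.read.VortexScanBuilder;

      public class PruneColumnsExample {
          // Assume `builder` was configured with paths and a wider column set.
          static void pruneExample(VortexScanBuilder builder) {
              StructType required = new StructType()
                      .add("user_id", DataTypes.LongType)
                      .add("ts", DataTypes.TimestampType);

              // Replaces any previously added columns with only these two.
              builder.pruneColumns(required);

              // Note: pruning applies at the top level only. Requiring a single
              // nested field (e.g. address.city) still reads the whole top-level
              // struct column.
          }
      }
      ```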