Class VortexDataSourceV2

java.lang.Object
dev.vortex.spark.VortexDataSourceV2
All Implemented Interfaces:
org.apache.spark.sql.connector.catalog.TableProvider, org.apache.spark.sql.sources.DataSourceRegister

public final class VortexDataSourceV2 extends Object implements org.apache.spark.sql.connector.catalog.TableProvider, org.apache.spark.sql.sources.DataSourceRegister
Spark V2 data source for reading and writing Vortex files.

This class is registered automatically so that the Spark runtime can discover it. To read, use SparkSession.read() and specify "vortex" as the format; to write, use Dataset.write() and specify "vortex" as the format.
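For example, a minimal read/write round trip in Java might look like the sketch below; the application name and file paths are placeholders, not part of this API.

  import org.apache.spark.sql.Dataset;
  import org.apache.spark.sql.Row;
  import org.apache.spark.sql.SparkSession;

  public final class VortexRoundTrip {
    public static void main(String[] args) {
      SparkSession spark = SparkSession.builder().appName("vortex-example").getOrCreate();

      // Read Vortex files by naming the registered format.
      Dataset<Row> df = spark.read().format("vortex").load("/path/to/input.vortex");

      // Write the DataFrame back out in the same format.
      df.write().format("vortex").save("/path/to/output");

      spark.stop();
    }
  }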

  • Constructor Summary

    Constructors
    Constructor
    Description
    VortexDataSourceV2()
    Creates a new instance of the Vortex data source.
  • Method Summary

    Modifier and Type
    Method
    Description
    org.apache.spark.sql.connector.catalog.Table
    getTable(org.apache.spark.sql.types.StructType schema, org.apache.spark.sql.connector.expressions.Transform[] _partitioning, Map<String,String> properties)
    Creates a Vortex table instance with the given schema and properties.
    org.apache.spark.sql.types.StructType
    inferSchema(org.apache.spark.sql.util.CaseInsensitiveStringMap options)
    Infers the schema of the Vortex files specified in the options.
    String
    shortName()
    Returns the short name identifier for this data source.
    boolean
    supportsExternalMetadata()
    Indicates whether this data source supports external metadata (schemas).

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface org.apache.spark.sql.connector.catalog.TableProvider

    inferPartitioning
  • Constructor Details

    • VortexDataSourceV2

      public VortexDataSourceV2()
      Creates a new instance of the Vortex data source.

      This no-argument constructor is required for Spark to instantiate the data source through reflection.
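
      As background for this requirement: Spark resolves short format names by loading every DataSourceRegister on the classpath through java.util.ServiceLoader, which instantiates each provider via its public no-argument constructor. A minimal sketch of that discovery mechanism:

      import java.util.ServiceLoader;
      import org.apache.spark.sql.sources.DataSourceRegister;

      public final class ListProviders {
        public static void main(String[] args) {
          // ServiceLoader creates each registered provider through its public
          // no-arg constructor; shortName() is then matched against the
          // format string the user supplied.
          for (DataSourceRegister provider : ServiceLoader.load(DataSourceRegister.class)) {
            System.out.println(provider.shortName());
          }
        }
      }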

  • Method Details

    • inferSchema

      public org.apache.spark.sql.types.StructType inferSchema(org.apache.spark.sql.util.CaseInsensitiveStringMap options)
      Infers the schema of the Vortex files specified in the options.

      This method examines the last file in the provided paths to determine the schema. Schema evolution and merging across multiple files are not currently supported.

      Specified by:
      inferSchema in interface org.apache.spark.sql.connector.catalog.TableProvider
      Parameters:
      options - the data source options containing file paths
      Returns:
      the inferred Spark SQL schema
      Throws:
      RuntimeException - if required path options are missing
      RuntimeException - if an error occurs while reading the file or converting the schema
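
      A sketch of invoking this method directly, assuming the conventional Spark "path" option key (the exact keys this source accepts are not documented here):

      import java.util.HashMap;
      import java.util.Map;
      import dev.vortex.spark.VortexDataSourceV2;
      import org.apache.spark.sql.types.StructType;
      import org.apache.spark.sql.util.CaseInsensitiveStringMap;

      public final class InferSchemaSketch {
        public static void main(String[] args) {
          Map<String, String> options = new HashMap<>();
          options.put("path", "/path/to/input.vortex"); // assumed key, placeholder path

          VortexDataSourceV2 source = new VortexDataSourceV2();
          StructType schema = source.inferSchema(new CaseInsensitiveStringMap(options));
          System.out.println(schema.treeString());
        }
      }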
    • getTable

      public org.apache.spark.sql.connector.catalog.Table getTable(org.apache.spark.sql.types.StructType schema, org.apache.spark.sql.connector.expressions.Transform[] _partitioning, Map<String,String> properties)
      Creates a Vortex table instance with the given schema and properties.

      This method creates a VortexWritableTable that can be used to both read from and write to Vortex files. The partitioning parameter is currently ignored.

      Specified by:
      getTable in interface org.apache.spark.sql.connector.catalog.TableProvider
      Parameters:
      schema - the table schema
      _partitioning - table partitioning transforms (currently ignored)
      properties - the table properties containing file paths and other options
      Returns:
      a VortexWritableTable instance for reading and writing data
      Throws:
      RuntimeException - if required path properties are missing
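
      A sketch chaining inferSchema into getTable; the empty Transform array reflects that partitioning is ignored, and the "path" key is again an assumption:

      import java.util.HashMap;
      import java.util.Map;
      import dev.vortex.spark.VortexDataSourceV2;
      import org.apache.spark.sql.connector.catalog.Table;
      import org.apache.spark.sql.connector.expressions.Transform;
      import org.apache.spark.sql.types.StructType;
      import org.apache.spark.sql.util.CaseInsensitiveStringMap;

      public final class GetTableSketch {
        public static void main(String[] args) {
          Map<String, String> properties = new HashMap<>();
          properties.put("path", "/path/to/input.vortex"); // assumed key, placeholder path

          VortexDataSourceV2 source = new VortexDataSourceV2();
          StructType schema = source.inferSchema(new CaseInsensitiveStringMap(properties));

          // Partitioning transforms are ignored, so an empty array suffices.
          Table table = source.getTable(schema, new Transform[0], properties);
          System.out.println(table.name());
        }
      }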
    • supportsExternalMetadata

      public boolean supportsExternalMetadata()
      Indicates whether this data source supports external metadata (schemas).

      Returns true to indicate that this data source accepts external schemas, which is necessary for write operations where the DataFrame provides the schema.

      Specified by:
      supportsExternalMetadata in interface org.apache.spark.sql.connector.catalog.TableProvider
      Returns:
      true to accept external schemas
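
      Concretely, this is what allows a plain DataFrame write to supply the schema; a sketch with placeholder data and output path:

      import java.util.Arrays;
      import org.apache.spark.sql.Dataset;
      import org.apache.spark.sql.Row;
      import org.apache.spark.sql.RowFactory;
      import org.apache.spark.sql.SparkSession;
      import org.apache.spark.sql.types.DataTypes;
      import org.apache.spark.sql.types.StructType;

      public final class ExternalSchemaWrite {
        public static void main(String[] args) {
          SparkSession spark = SparkSession.builder().appName("external-schema").getOrCreate();

          // The schema is supplied by the DataFrame itself; because the
          // provider accepts external metadata, Spark passes it to
          // getTable(...) on write instead of calling inferSchema(...).
          StructType schema = new StructType()
              .add("id", DataTypes.LongType)
              .add("name", DataTypes.StringType);
          Dataset<Row> df = spark.createDataFrame(
              Arrays.asList(RowFactory.create(1L, "a"), RowFactory.create(2L, "b")), schema);

          df.write().format("vortex").save("/path/to/output"); // placeholder path
          spark.stop();
        }
      }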
    • shortName

      public String shortName()
      Returns the short name identifier for this data source.

      This name is used by Spark when registering the data source and can be used in SQL queries and DataFrame read operations to specify this format.

      Specified by:
      shortName in interface org.apache.spark.sql.sources.DataSourceRegister
      Returns:
      the short name "vortex"
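
      For instance, the short name can serve as the USING clause in Spark SQL; the table name and path below are placeholders:

      import org.apache.spark.sql.SparkSession;

      public final class ShortNameSketch {
        public static void main(String[] args) {
          SparkSession spark = SparkSession.builder().appName("short-name").getOrCreate();

          // "vortex" identifies this format in SQL DDL as well as in the
          // DataFrame reader/writer API.
          spark.sql("CREATE TABLE events USING vortex OPTIONS (path '/path/to/input.vortex')");
          spark.sql("SELECT COUNT(*) FROM events").show();

          spark.stop();
        }
      }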