Class VortexDataSourceV2
- All Implemented Interfaces:
org.apache.spark.sql.connector.catalog.TableProvider, org.apache.spark.sql.sources.DataSourceRegister
This class is registered automatically so that the Spark runtime can discover it.
For reading: call SparkSession.read() and specify the format as "vortex".
For writing: call Dataset.write() and specify the format as "vortex".
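A minimal usage sketch (the application name, paths, and column contents are hypothetical, not part of this API):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class VortexExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("vortex-demo").getOrCreate();

        // Read Vortex files by naming the format registered via DataSourceRegister.
        Dataset<Row> df = spark.read().format("vortex").load("/data/events.vortex");

        // Write a Dataset back out in the Vortex format.
        df.write().format("vortex").save("/data/events-copy.vortex");

        spark.stop();
    }
}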
-
Constructor Summary
Constructors
VortexDataSourceV2()
Creates a new instance of the Vortex data source.
-
Method Summary
Modifier and Type / Method / Description:

org.apache.spark.sql.connector.catalog.Table
getTable(org.apache.spark.sql.types.StructType schema, org.apache.spark.sql.connector.expressions.Transform[] _partitioning, Map<String, String> properties)
Creates a Vortex table instance with the given schema and properties.

org.apache.spark.sql.types.StructType
inferSchema(org.apache.spark.sql.util.CaseInsensitiveStringMap options)
Infers the schema of the Vortex files specified in the options.

String
shortName()
Returns the short name identifier for this data source.

boolean
supportsExternalMetadata()
Indicates whether this data source supports external metadata (schemas).

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.sql.connector.catalog.TableProvider
inferPartitioning
-
Constructor Details
-
VortexDataSourceV2
public VortexDataSourceV2()
Creates a new instance of the Vortex data source.
This no-argument constructor is required so that Spark can instantiate the data source through reflection.
-
-
Method Details
-
inferSchema
public org.apache.spark.sql.types.StructType inferSchema(org.apache.spark.sql.util.CaseInsensitiveStringMap options)
Infers the schema of the Vortex files specified in the options.
This method examines the last file in the provided paths to determine the schema. Schema evolution and merging across multiple files are not currently supported.
- Specified by:
inferSchema in interface org.apache.spark.sql.connector.catalog.TableProvider
- Parameters:
options - the data source options containing file paths
- Returns:
the inferred Spark SQL schema
- Throws:
RuntimeException - if required path options are missing
RuntimeException - if there is an error reading the file or converting the schema
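A minimal sketch of calling inferSchema directly, assuming Spark's conventional "path" option key (the key name and file location are assumptions, not confirmed by this API):

import java.util.HashMap;
import java.util.Map;
import org.apache.spark.sql.types.StructType;
import org.apache.spark.sql.util.CaseInsensitiveStringMap;

Map<String, String> opts = new HashMap<>();
opts.put("path", "/data/part-1.vortex");  // hypothetical option key and path

VortexDataSourceV2 source = new VortexDataSourceV2();
// With several paths, the schema comes from the last file only; no merging is performed.
StructType schema = source.inferSchema(new CaseInsensitiveStringMap(opts));
System.out.println(schema.treeString());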
-
getTable
public org.apache.spark.sql.connector.catalog.Table getTable(org.apache.spark.sql.types.StructType schema, org.apache.spark.sql.connector.expressions.Transform[] _partitioning, Map<String, String> properties)
Creates a Vortex table instance with the given schema and properties.
This method creates a VortexWritableTable that can be used to both read from and write to Vortex files. The partitioning parameter is currently ignored.
- Specified by:
getTable in interface org.apache.spark.sql.connector.catalog.TableProvider
- Parameters:
schema
- the table schema_partitioning
- table partitioning transforms (currently ignored)properties
- the table properties containing file paths and other options- Returns:
- a VortexTable instance for reading and writing data
- Throws:
RuntimeException
- if required path properties are missing
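A sketch of the sequence Spark's DataSource V2 machinery roughly follows, here invoked by hand; the property key and path are hypothetical:

import java.util.Map;
import org.apache.spark.sql.connector.catalog.Table;
import org.apache.spark.sql.connector.expressions.Transform;
import org.apache.spark.sql.types.StructType;
import org.apache.spark.sql.util.CaseInsensitiveStringMap;

Map<String, String> props = Map.of("path", "/data/events.vortex");  // hypothetical
VortexDataSourceV2 source = new VortexDataSourceV2();
StructType schema = source.inferSchema(new CaseInsensitiveStringMap(props));
// Partitioning is ignored by this implementation, so an empty transform array suffices.
Table table = source.getTable(schema, new Transform[0], props);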
-
supportsExternalMetadata
public boolean supportsExternalMetadata()
Indicates whether this data source supports external metadata (schemas).
Returns true to indicate that this data source accepts external schemas, which is necessary for write operations where the DataFrame provides the schema.
- Specified by:
supportsExternalMetadata in interface org.apache.spark.sql.connector.catalog.TableProvider
- Returns:
- true to accept external schemas
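Because external schemas are accepted, a caller can also supply one explicitly instead of relying on inference, for example (column names and path are hypothetical):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

StructType schema = new StructType()
        .add("id", DataTypes.LongType)
        .add("name", DataTypes.StringType);

Dataset<Row> df = spark.read()
        .format("vortex")
        .schema(schema)   // the user-provided schema is passed to getTable(...)
        .load("/data/events.vortex");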
-
shortName
public String shortName()
Returns the short name identifier for this data source.
This name is used by Spark when registering the data source and can be used in SQL queries and DataFrame read operations to specify this format.
- Specified by:
shortName in interface org.apache.spark.sql.sources.DataSourceRegister
- Returns:
- the short name "vortex"
-
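For example, the short name can serve as the provider in Spark SQL (table name and path are hypothetical):

// Register a table backed by Vortex files, then query it.
spark.sql("CREATE TABLE events USING vortex OPTIONS (path '/data/events.vortex')");
Dataset<Row> events = spark.sql("SELECT * FROM events");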