Package dev.vortex.spark.write
Class VortexBatchWrite
java.lang.Object
dev.vortex.spark.write.VortexBatchWrite
- All Implemented Interfaces:
Serializable, org.apache.spark.sql.connector.write.BatchWrite, org.apache.spark.sql.connector.write.Write
public final class VortexBatchWrite
extends Object
implements org.apache.spark.sql.connector.write.Write, org.apache.spark.sql.connector.write.BatchWrite, Serializable
Manages the batch write operation for creating Vortex files.
This class coordinates the distributed write operation across Spark executors, handling the creation of data writers and managing commits/aborts.
-
Constructor Summary
Constructors
Constructor | Description
VortexBatchWrite(String outputPath, org.apache.spark.sql.types.StructType schema, Map<String, String> options, boolean overwrite) | Creates a new VortexBatchWrite.
-
Method Summary
Modifier and Type | Method | Description
void | abort(org.apache.spark.sql.connector.write.WriterCommitMessage[] messages) | Aborts the write job due to failures.
void | commit(org.apache.spark.sql.connector.write.WriterCommitMessage[] messages) | Commits the entire write job after all tasks complete successfully.
org.apache.spark.sql.connector.write.DataWriterFactory | createBatchWriterFactory(org.apache.spark.sql.connector.write.PhysicalWriteInfo info) | Creates a DataWriterFactory for producing data writers on executors.
void | onDataWriterCommit(org.apache.spark.sql.connector.write.WriterCommitMessage message) | Called when a single data writer task completes successfully.
org.apache.spark.sql.connector.write.BatchWrite | toBatch() | Returns this object as a BatchWrite.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.sql.connector.write.BatchWrite
useCommitCoordinator
Methods inherited from interface org.apache.spark.sql.connector.write.Write
description, supportedCustomMetrics, toStreaming
-
Constructor Details
-
VortexBatchWrite
public VortexBatchWrite(String outputPath, org.apache.spark.sql.types.StructType schema, Map<String, String> options, boolean overwrite)
Creates a new VortexBatchWrite.
- Parameters:
outputPath - the base path where Vortex files will be written
schema - the schema of the data to write
options - additional write options
overwrite - whether to overwrite existing files
-
-
Method Details
-
toBatch
public org.apache.spark.sql.connector.write.BatchWrite toBatch()
Returns this object as a BatchWrite.
This method is required by the Write interface to support batch writes.
- Specified by:
toBatch in interface org.apache.spark.sql.connector.write.Write
- Returns:
this object
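Because the class implements both Write and BatchWrite, toBatch() can hand back the same instance. The following is a minimal sketch of that pattern only. SketchWrite, SketchBatchWrite and SelfBatchWrite are hypothetical stand-ins, not Spark's interfaces or the actual VortexBatchWrite source, and they exist so the snippet compiles without Spark on the classpath.

```java
// Hypothetical stand-in for the batch side of the write API.
interface SketchBatchWrite {
    void commit(String[] messages);
}

// Hypothetical stand-in for the logical write that exposes a batch view.
interface SketchWrite {
    SketchBatchWrite toBatch();
}

// One object plays both roles, so toBatch() simply returns itself.
final class SelfBatchWrite implements SketchWrite, SketchBatchWrite {
    int committedMessages = 0;

    @Override
    public SketchBatchWrite toBatch() {
        return this;
    }

    @Override
    public void commit(String[] messages) {
        // Record how many task messages were handed to the job-level commit.
        committedMessages = messages.length;
    }
}
```

Keeping both roles in one class avoids a second object that would need the same path, schema and options copied into it.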
-
createBatchWriterFactory
public org.apache.spark.sql.connector.write.DataWriterFactory createBatchWriterFactory(org.apache.spark.sql.connector.write.PhysicalWriteInfo info)
Creates a DataWriterFactory for producing data writers on executors.
This method is called once at the start of the write operation, making it the right place to handle overwrite cleanup.
- Specified by:
createBatchWriterFactory in interface org.apache.spark.sql.connector.write.BatchWrite
- Returns:
a new VortexDataWriterFactory
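The overwrite cleanup mentioned above could look roughly like the sketch below. It is an illustration under assumptions, not the library's implementation: OverwriteCleanupSketch and prepareOutput are hypothetical names, and the code targets a local filesystem through java.nio.file, whereas a real Spark connector may go through a different filesystem abstraction.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class OverwriteCleanupSketch {
    /**
     * Deletes everything below outputDir when overwrite is true and returns
     * the number of entries removed. The directory itself is kept.
     * Hypothetical helper: assumes a local filesystem.
     */
    static int prepareOutput(Path outputDir, boolean overwrite) throws IOException {
        if (!overwrite || !Files.isDirectory(outputDir)) {
            return 0; // nothing to clean up
        }
        List<Path> toDelete;
        try (Stream<Path> walk = Files.walk(outputDir)) {
            toDelete = walk
                .filter(p -> !p.equals(outputDir))
                // Reverse order puts children before their parent directories.
                .sorted(Comparator.reverseOrder())
                .collect(Collectors.toList());
        }
        for (Path p : toDelete) {
            Files.delete(p);
        }
        return toDelete.size();
    }
}
```

Running the cleanup once on the driver, before any factory is handed to executors, avoids a race in which one task deletes a file another task has just written.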
-
onDataWriterCommit
public void onDataWriterCommit(org.apache.spark.sql.connector.write.WriterCommitMessage message)
Called when a single data writer task completes successfully.
This is called for each successful task, but individual file commits are handled in the data writer itself.
- Specified by:
onDataWriterCommit in interface org.apache.spark.sql.connector.write.BatchWrite
- Parameters:
message - commit message from a successful data writer task
-
commit
public void commit(org.apache.spark.sql.connector.write.WriterCommitMessage[] messages)
Commits the entire write job after all tasks complete successfully.
This finalizes the write operation and ensures all Vortex files are properly written.
- Specified by:
commit in interface org.apache.spark.sql.connector.write.BatchWrite
- Parameters:
messages - commit messages from all successful write tasks
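To make the ordering of the driver-side callbacks concrete, here is a simplified simulation. It is a sketch under assumptions: WriteLifecycleSketch, Callbacks, Recording and runJob are hypothetical names, commit messages are modelled as plain strings, and a failed task is modelled as a null entry. Spark's real scheduler is more involved than this loop.

```java
import java.util.ArrayList;
import java.util.List;

final class WriteLifecycleSketch {
    /** Hypothetical stand-in for the driver-side callbacks of a batch write. */
    interface Callbacks {
        void onDataWriterCommit(String message);
        void commit(String[] messages);
        void abort(String[] messages);
    }

    /** Records the order in which the callbacks are invoked. */
    static final class Recording implements Callbacks {
        final List<String> events = new ArrayList<>();

        @Override
        public void onDataWriterCommit(String message) {
            events.add("task:" + message);
        }

        @Override
        public void commit(String[] messages) {
            events.add("commit:" + messages.length);
        }

        @Override
        public void abort(String[] messages) {
            events.add("abort:" + messages.length);
        }
    }

    /**
     * Simulates one job. taskResults[i] is the commit message of task i,
     * or null if that task failed (an assumption of this sketch).
     */
    static void runJob(Callbacks write, String[] taskResults) {
        boolean failed = false;
        for (String result : taskResults) {
            if (result == null) {
                failed = true;
            } else {
                write.onDataWriterCommit(result); // per-task notification
            }
        }
        if (failed) {
            write.abort(taskResults);  // job-level failure path
        } else {
            write.commit(taskResults); // job-level success path
        }
    }
}
```

With two successful tasks the recorded events are two per-task notifications followed by one job-level commit. If any entry is null, the job ends with a single abort instead.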
-
abort
public void abort(org.apache.spark.sql.connector.write.WriterCommitMessage[] messages)
Aborts the write job due to failures.
This method cleans up any partially written files.
- Specified by:
abort in interface org.apache.spark.sql.connector.write.BatchWrite
- Parameters:
messages - commit messages from write tasks (may include failures)
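One way such cleanup might work is sketched below. It assumes that each task's commit message carries the path of the file it wrote and that failed tasks appear as null entries. Neither assumption is confirmed by this page, and AbortCleanupSketch and deleteWrittenFiles are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class AbortCleanupSketch {
    /**
     * Deletes every file named in writtenFiles and returns how many were
     * actually removed. Null entries (tasks that produced no message) and
     * files that no longer exist are skipped.
     */
    static int deleteWrittenFiles(Path[] writtenFiles) {
        int deleted = 0;
        for (Path file : writtenFiles) {
            if (file == null) {
                continue; // no message for this task, nothing known to remove
            }
            try {
                if (Files.deleteIfExists(file)) {
                    deleted++;
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return deleted;
    }
}
```

Using deleteIfExists keeps the cleanup idempotent, so calling it again after a partial cleanup does not fail on files that are already gone.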
-