Package dev.vortex.spark.write
Class VortexBatchWrite
java.lang.Object
dev.vortex.spark.write.VortexBatchWrite
- All Implemented Interfaces:
Serializable,org.apache.spark.sql.connector.write.BatchWrite,org.apache.spark.sql.connector.write.Write
public final class VortexBatchWrite
extends Object
implements org.apache.spark.sql.connector.write.Write, org.apache.spark.sql.connector.write.BatchWrite, Serializable
Manages the batch write operation for creating Vortex files.
This class coordinates the distributed write operation across Spark executors, handling the creation of data writers and managing commits/aborts.
- See Also:
-
Constructor Summary
ConstructorsConstructorDescriptionVortexBatchWrite(String outputPath, org.apache.spark.sql.types.StructType schema, Map<String, String> options, boolean overwrite) Creates a new VortexBatchWrite. -
Method Summary
Modifier and TypeMethodDescriptionvoidabort(org.apache.spark.sql.connector.write.WriterCommitMessage[] messages) Aborts the write job due to failures.voidcommit(org.apache.spark.sql.connector.write.WriterCommitMessage[] messages) Commits the entire write job after all tasks complete successfully.org.apache.spark.sql.connector.write.DataWriterFactorycreateBatchWriterFactory(org.apache.spark.sql.connector.write.PhysicalWriteInfo info) Creates a DataWriterFactory for producing data writers on executors.voidonDataWriterCommit(org.apache.spark.sql.connector.write.WriterCommitMessage message) Called when a single data writer task completes successfully.org.apache.spark.sql.connector.write.BatchWritetoBatch()Returns this object as a BatchWrite.Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, waitMethods inherited from interface org.apache.spark.sql.connector.write.BatchWrite
useCommitCoordinatorMethods inherited from interface org.apache.spark.sql.connector.write.Write
description, supportedCustomMetrics, toStreaming
-
Constructor Details
-
VortexBatchWrite
public VortexBatchWrite(String outputPath, org.apache.spark.sql.types.StructType schema, Map<String, String> options, boolean overwrite) Creates a new VortexBatchWrite.- Parameters:
outputPath- the base path where Vortex files will be writtenschema- the schema of the data to writeoptions- additional write optionsoverwrite- whether to overwrite existing files
-
-
Method Details
-
toBatch
public org.apache.spark.sql.connector.write.BatchWrite toBatch()Returns this object as a BatchWrite.This method is required by the Write interface to support batch writes.
- Specified by:
toBatchin interfaceorg.apache.spark.sql.connector.write.Write- Returns:
- this object
-
createBatchWriterFactory
public org.apache.spark.sql.connector.write.DataWriterFactory createBatchWriterFactory(org.apache.spark.sql.connector.write.PhysicalWriteInfo info) Creates a DataWriterFactory for producing data writers on executors.This method is called once at the start of the write operation, making it the right place to handle overwrite cleanup.
- Specified by:
createBatchWriterFactoryin interfaceorg.apache.spark.sql.connector.write.BatchWrite- Returns:
- a new VortexDataWriterFactory
-
onDataWriterCommit
public void onDataWriterCommit(org.apache.spark.sql.connector.write.WriterCommitMessage message) Called when a single data writer task completes successfully.This is called for each successful task but individual file commits are handled in the data writer itself.
- Specified by:
onDataWriterCommitin interfaceorg.apache.spark.sql.connector.write.BatchWrite- Parameters:
message- commit message from a successful data writer task
-
commit
public void commit(org.apache.spark.sql.connector.write.WriterCommitMessage[] messages) Commits the entire write job after all tasks complete successfully.This finalizes the write operation and ensures all Vortex files are properly written.
- Specified by:
commitin interfaceorg.apache.spark.sql.connector.write.BatchWrite- Parameters:
messages- commit messages from all successful write tasks
-
abort
public void abort(org.apache.spark.sql.connector.write.WriterCommitMessage[] messages) Aborts the write job due to failures.This method cleans up any partially written files.
- Specified by:
abortin interfaceorg.apache.spark.sql.connector.write.BatchWrite- Parameters:
messages- commit messages from write tasks (may include failures)
-