Class VortexBatchWrite

java.lang.Object
dev.vortex.spark.write.VortexBatchWrite
All Implemented Interfaces:
Serializable, org.apache.spark.sql.connector.write.BatchWrite, org.apache.spark.sql.connector.write.Write

public final class VortexBatchWrite extends Object implements org.apache.spark.sql.connector.write.Write, org.apache.spark.sql.connector.write.BatchWrite, Serializable
Manages the batch write operation for creating Vortex files.

This class coordinates the distributed write operation across Spark executors, handling the creation of data writers and managing commits/aborts.
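The call order that this class takes part in can be sketched without Spark on the classpath. The block below is a simplified simulation under stated assumptions, not the Vortex or Spark implementation: `SketchCommitMessage`, `SketchBatchWrite`, `runJob`, and the `part-N.vortex` file names are hypothetical stand-ins that only mirror the shape of the BatchWrite methods listed on this page (factory creation, per-task commit notification, then a job-level commit or abort).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for WriterCommitMessage; here it just carries a file name.
final class SketchCommitMessage {
    final String filePath;

    SketchCommitMessage(String filePath) {
        this.filePath = filePath;
    }
}

// Hypothetical stand-in that records the order in which the driver calls it.
final class SketchBatchWrite {
    final List<String> events = new ArrayList<>();

    void createBatchWriterFactory() {
        events.add("factory");
    }

    void onDataWriterCommit(SketchCommitMessage message) {
        events.add("task-commit:" + message.filePath);
    }

    void commit(SketchCommitMessage[] messages) {
        events.add("job-commit:" + messages.length);
    }

    void abort(SketchCommitMessage[] messages) {
        events.add("job-abort");
    }
}

final class WriteLifecycleSketch {
    // Simulates the driver. taskSucceeds[i] says whether task i produced a commit message.
    // Simplification: every task is attempted before the job-level decision is made.
    static List<String> runJob(SketchBatchWrite write, boolean[] taskSucceeds) {
        write.createBatchWriterFactory();
        SketchCommitMessage[] messages = new SketchCommitMessage[taskSucceeds.length];
        boolean allSucceeded = true;
        for (int i = 0; i < taskSucceeds.length; i++) {
            if (taskSucceeds[i]) {
                messages[i] = new SketchCommitMessage("part-" + i + ".vortex");
                write.onDataWriterCommit(messages[i]);
            } else {
                allSucceeded = false; // the slot for a failed task stays null
            }
        }
        if (allSucceeded) {
            write.commit(messages);
        } else {
            write.abort(messages);
        }
        return write.events;
    }

    public static void main(String[] args) {
        System.out.println(runJob(new SketchBatchWrite(), new boolean[] {true, true}));
        System.out.println(runJob(new SketchBatchWrite(), new boolean[] {true, false}));
    }
}
```

In the sketch, exactly one of `commit` or `abort` runs per job, and it always runs after the per-task notifications.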

  • Constructor Summary

    Constructors
    Constructor
    Description
    VortexBatchWrite(String outputPath, org.apache.spark.sql.types.StructType schema, Map<String,String> options, boolean overwrite)
    Creates a new VortexBatchWrite.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    abort(org.apache.spark.sql.connector.write.WriterCommitMessage[] messages)
    Aborts the write job due to failures.
    void
    commit(org.apache.spark.sql.connector.write.WriterCommitMessage[] messages)
    Commits the entire write job after all tasks complete successfully.
    org.apache.spark.sql.connector.write.DataWriterFactory
    createBatchWriterFactory(org.apache.spark.sql.connector.write.PhysicalWriteInfo info)
    Creates a DataWriterFactory for producing data writers on executors.
    void
    onDataWriterCommit(org.apache.spark.sql.connector.write.WriterCommitMessage message)
    Called when a single data writer task completes successfully.
    org.apache.spark.sql.connector.write.BatchWrite
    toBatch()
    Returns this object as a BatchWrite.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface org.apache.spark.sql.connector.write.BatchWrite

    useCommitCoordinator

    Methods inherited from interface org.apache.spark.sql.connector.write.Write

    description, supportedCustomMetrics, toStreaming
  • Constructor Details

    • VortexBatchWrite

      public VortexBatchWrite(String outputPath, org.apache.spark.sql.types.StructType schema, Map<String,String> options, boolean overwrite)
      Creates a new VortexBatchWrite.
      Parameters:
      outputPath - the base path where Vortex files will be written
      schema - the schema of the data to write
      options - additional write options
      overwrite - whether to overwrite existing files
  • Method Details

    • toBatch

      public org.apache.spark.sql.connector.write.BatchWrite toBatch()
      Returns this object as a BatchWrite.

      This method is required by the Write interface to support batch writes.

      Specified by:
      toBatch in interface org.apache.spark.sql.connector.write.Write
      Returns:
      this object
    • createBatchWriterFactory

      public org.apache.spark.sql.connector.write.DataWriterFactory createBatchWriterFactory(org.apache.spark.sql.connector.write.PhysicalWriteInfo info)
      Creates a DataWriterFactory for producing data writers on executors.

      Spark calls this method once, on the driver, at the start of the write operation, which makes it the right place to handle overwrite cleanup.

      Specified by:
      createBatchWriterFactory in interface org.apache.spark.sql.connector.write.BatchWrite
      Returns:
      a new VortexDataWriterFactory
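
The page does not show how overwrite cleanup is performed. As a rough illustration only, the sketch below clears an output directory once before any writer is created. It assumes a local filesystem reached through `java.nio.file`; the class `OverwriteCleanupSketch` and the method `prepareOutput` are hypothetical names, and the choice to leave existing files alone when `overwrite` is false is an assumption of the sketch, not documented behavior of VortexBatchWrite.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class OverwriteCleanupSketch {
    // Hypothetical helper: intended to run once, before any data writer exists.
    static void prepareOutput(Path outputPath, boolean overwrite) {
        try {
            if (overwrite && Files.exists(outputPath)) {
                List<Path> toDelete;
                try (Stream<Path> walk = Files.walk(outputPath)) {
                    // Reverse order puts children before parents, so each
                    // directory is already empty when it is deleted.
                    toDelete = walk.sorted(Comparator.reverseOrder())
                                   .collect(Collectors.toList());
                }
                for (Path p : toDelete) {
                    Files.delete(p);
                }
            }
            // Recreate (or create) the base directory for the new files.
            Files.createDirectories(outputPath);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("vortex-sketch");
        Files.write(dir.resolve("stale.vortex"), new byte[] {1});
        prepareOutput(dir, true);
        try (Stream<Path> remaining = Files.list(dir)) {
            System.out.println("files left after cleanup: " + remaining.count());
        }
    }
}
```

Doing the cleanup in one place before tasks start avoids a race in which one task deletes files that another task has just written.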
    • onDataWriterCommit

      public void onDataWriterCommit(org.apache.spark.sql.connector.write.WriterCommitMessage message)
      Called when a single data writer task completes successfully.

      Spark calls this method once for each successful task, but individual file commits are handled in the data writer itself.

      Specified by:
      onDataWriterCommit in interface org.apache.spark.sql.connector.write.BatchWrite
      Parameters:
      message - commit message from a successful data writer task
    • commit

      public void commit(org.apache.spark.sql.connector.write.WriterCommitMessage[] messages)
      Commits the entire write job after all tasks complete successfully.

      This finalizes the write operation and ensures all Vortex files are properly written.

      Specified by:
      commit in interface org.apache.spark.sql.connector.write.BatchWrite
      Parameters:
      messages - commit messages from all successful write tasks
    • abort

      public void abort(org.apache.spark.sql.connector.write.WriterCommitMessage[] messages)
      Aborts the write job due to failures.

      This method cleans up any partially written files.

      Specified by:
      abort in interface org.apache.spark.sql.connector.write.BatchWrite
      Parameters:
      messages - commit messages from write tasks (may contain null entries for tasks that did not commit)
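
How the cleanup locates partially written files is not stated on this page. One plausible shape, shown purely as a sketch, is for each commit message to carry the path of the file its task wrote, so that abort can delete those files and skip the null slots that Spark leaves for tasks that never committed. `PathCommitMessage` and `AbortCleanupSketch` are hypothetical names, and the sketch assumes a local filesystem.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical commit message that records the file a task produced.
final class PathCommitMessage {
    final Path file;

    PathCommitMessage(Path file) {
        this.file = file;
    }
}

final class AbortCleanupSketch {
    // Deletes every file named by a non-null message and returns how many were removed.
    static int abort(PathCommitMessage[] messages) {
        int deleted = 0;
        for (PathCommitMessage message : messages) {
            if (message == null) {
                continue; // this task never committed, so it reported no file
            }
            try {
                // deleteIfExists keeps the cleanup safe to run more than once.
                if (Files.deleteIfExists(message.file)) {
                    deleted++;
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        Path written = Files.createTempFile("part-0", ".vortex");
        PathCommitMessage[] messages = {new PathCommitMessage(written), null};
        System.out.println("deleted: " + abort(messages));
    }
}
```

A cleanup driven only by commit messages cannot see files from tasks whose messages never arrived, so a real implementation may also need to scan the output location.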