# Spark
Vortex provides a Spark DataSource V2 connector for reading and writing Vortex files. The connector is published to Maven Central as `dev.vortex:vortex-spark`.
## Installation
Add the dependency to your build. The connector is built against Spark 4.x with Scala 2.13.
Gradle (Kotlin DSL):

```kotlin
implementation("dev.vortex:vortex-spark:<version>")
```

Maven:

```xml
<dependency>
  <groupId>dev.vortex</groupId>
  <artifactId>vortex-spark</artifactId>
  <version>${vortex.version}</version>
</dependency>
```
The connector ships as a shadow JAR that relocates its Arrow, Guava, and Protobuf dependencies to avoid classpath conflicts with Spark.
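The connector can also be pulled in at job submission time via its Maven coordinates; a sketch of a typical invocation (the main class and application JAR names here are placeholders, not from this page):

```shell
# Fetch the connector from Maven Central at launch time.
spark-submit \
  --packages dev.vortex:vortex-spark:<version> \
  --class com.example.MyVortexJob \
  my-vortex-job.jar
```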
## Reading Vortex Files

Use the `vortex` format to read a single file or a directory of Vortex files:

```java
Dataset<Row> df = spark.read()
    .format("vortex")
    .option("path", "/path/to/data.vortex")
    .load();
```
When pointed at a directory, the connector discovers all `.vortex` files and creates one read partition per file.
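The planning step can be modeled as plain name filtering; a minimal sketch (not the connector's actual code) of how each discovered `.vortex` file becomes one read partition:

```java
import java.util.List;
import java.util.stream.Collectors;

public class VortexPartitionPlanner {
    // Models the connector's planning step: every .vortex file discovered
    // under the input path becomes one read partition; other files are ignored.
    static List<String> planPartitions(List<String> discoveredFiles) {
        return discoveredFiles.stream()
                .filter(name -> name.endsWith(".vortex"))
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> listing = List.of("a.vortex", "b.vortex", "_SUCCESS");
        System.out.println(planPartitions(listing).size()); // 2
    }
}
```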
Column pruning is pushed down: only the columns referenced by the query are read from the file.
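What pushdown means for the read schema can be sketched without Spark (a simplified model; `readSchema` is a hypothetical helper, not a connector API):

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ColumnPruning {
    // Models column pruning: of all columns stored in the file, only those
    // referenced by the query make it into the read schema, preserving the
    // file's column order. Unreferenced columns are never decoded.
    static List<String> readSchema(List<String> fileColumns, Set<String> referenced) {
        return fileColumns.stream()
                .filter(referenced::contains)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> file = List.of("id", "name", "payload", "ts");
        System.out.println(readSchema(file, Set.of("ts", "id"))); // [id, ts]
    }
}
```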
## Writing Vortex Files
```java
df.write()
    .format("vortex")
    .option("path", "/path/to/output")
    .mode(SaveMode.Overwrite)
    .save();
```
Each Spark partition produces one output file named `part-{partitionId}-{taskId}.vortex`.
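The naming scheme is plain string formatting; a small sketch (hypothetical helper, not part of the connector):

```java
public class VortexFileName {
    // Builds the output file name for a given Spark partition and task,
    // following the part-{partitionId}-{taskId}.vortex pattern above.
    static String fileName(int partitionId, long taskId) {
        return String.format("part-%d-%d.vortex", partitionId, taskId);
    }

    public static void main(String[] args) {
        System.out.println(fileName(0, 42)); // part-0-42.vortex
    }
}
```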
Write Options#
Option |
Default |
Description |
|---|---|---|
|
2048 |
Number of rows per batch (1–65536) |
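The row-batch option controls how many rows go into each batch on write; the resulting batch count can be sketched as (a model, not connector code):

```java
public class BatchSizing {
    // Number of batches produced when writing rowCount rows at
    // batchSize rows per batch; the final batch may be smaller.
    static long numBatches(long rowCount, int batchSize) {
        return (rowCount + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        System.out.println(numBatches(10_000, 2048)); // 5
    }
}
```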
### Save Modes

The connector supports all standard Spark save modes: `Overwrite`, `Append`, `Ignore`, and `ErrorIfExists`.
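The four modes differ only in how they treat a target path that already contains output; a simplified model (not the connector's implementation):

```java
import java.util.ArrayList;
import java.util.List;

public class SaveModeSketch {
    enum SaveMode { OVERWRITE, APPEND, IGNORE, ERROR_IF_EXISTS }

    // Models each save mode's behavior as the resulting set of files
    // at the target path after the write.
    static List<String> resolve(SaveMode mode, List<String> existing, List<String> incoming) {
        if (existing.isEmpty()) return incoming; // fresh path: every mode just writes
        switch (mode) {
            case OVERWRITE:
                return incoming;                 // replace existing output
            case APPEND: {
                List<String> out = new ArrayList<>(existing);
                out.addAll(incoming);            // keep both old and new files
                return out;
            }
            case IGNORE:
                return existing;                 // silently skip the write
            default:
                throw new IllegalStateException("path already exists");
        }
    }

    public static void main(String[] args) {
        List<String> old = List.of("part-0-1.vortex");
        List<String> neu = List.of("part-0-2.vortex");
        System.out.println(resolve(SaveMode.APPEND, old, neu).size()); // 2
    }
}
```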
## Supported Types

The Spark type names below are reconstructed from the standard Spark SQL type system; the Vortex column is as documented.

| Spark Type | Vortex Type |
|---|---|
| BooleanType | Bool |
| ByteType | Int8 / UInt8 |
| ShortType | Int16 / UInt16 |
| IntegerType | Int32 / UInt32 |
| LongType | Int64 / UInt64 |
| FloatType | Float32 |
| DoubleType | Float64 |
| StringType | Utf8 |
| BinaryType | Binary |
| DecimalType | Decimal |
| DateType | Date (days) |
| TimestampType | Timestamp (microseconds, UTC) |
| TimestampNTZType | Timestamp (microseconds, no timezone) |
| ArrayType | List |
| StructType | Struct |
## S3 Support

The connector supports reading from and writing to S3 paths:

```java
Dataset<Row> df = spark.read()
    .format("vortex")
    .option("path", "s3://bucket/path/to/data")
    .load();
```