DataFusion

The vortex-datafusion crate integrates Vortex as a native DataFusion FileFormat, supporting both reads and writes with filter, projection, and limit pushdown.

Setup

Add the dependency:

[dependencies]
vortex-datafusion = "<version>"

Register the Vortex format with a SessionContext:

use std::sync::Arc;

use datafusion::datasource::provider::DefaultTableFactory;
use datafusion::execution::SessionStateBuilder;
use datafusion::prelude::SessionContext;
use datafusion_common::GetExt;
use object_store::memory::InMemory;

// Outside the vortex-datafusion crate itself, import from the crate by name.
use vortex_datafusion::VortexFormatFactory;

let factory = Arc::new(VortexFormatFactory::new());
let state = SessionStateBuilder::new()
    .with_default_features()
    .with_table_factory(
        factory.get_ext().to_uppercase(),
        Arc::new(DefaultTableFactory::new()),
    )
    .with_file_formats(vec![factory])
    .build();
let ctx = SessionContext::new_with_state(state).enable_url_table();

Reading Vortex Files

SQL

Create an external table with STORED AS vortex; once created, it can be queried like any other table:

ctx.sql(
    "CREATE EXTERNAL TABLE my_table \
        (name VARCHAR NOT NULL, age INT NOT NULL) \
    STORED AS vortex \
    LOCATION '/demo/'",
)
.await?;

Rust API

You can also register a ListingTable directly:

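// `session` and `filepath` are not defined in this snippet; they are assumed
// to be provided by the surrounding code.
let ctx = SessionContext::new();
let format = Arc::new(VortexFormat::new(session));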
let table_url = ListingTableUrl::parse(
    filepath
        .to_str()
        .ok_or_else(|| vortex_err!("Path is not valid UTF-8"))?,
)?;
let config = ListingTableConfig::new(table_url)
    .with_listing_options(
        ListingOptions::new(format).with_session_config_options(ctx.state().config()),
    )
    .infer_schema(&ctx.state())
    .await?;

let listing_table = Arc::new(ListingTable::try_new(config)?);
ctx.register_table("vortex_tbl", listing_table as _)?;

Writing Vortex Files

Write rows to a Vortex-backed table using INSERT INTO. The insert runs when the returned DataFrame is collected:

ctx.sql(
    "INSERT INTO my_table VALUES \
        ('Alice', 30), ('Bob', 25), ('Charlie', 35), ('Diana', 28)",
)
.await?
.collect()
.await?;

Partitioned writes are supported: DataFusion automatically creates a subdirectory for each partition value.
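
To make the resulting directory layout concrete, here is a minimal std-only sketch. It assumes DataFusion's usual Hive-style naming (`column=value` path segments). `partition_dir` is a hypothetical helper written for illustration and is not part of vortex-datafusion or DataFusion; the `country` column in the usage note below is likewise made up.

```rust
use std::path::{Path, PathBuf};

/// Builds the directory that rows with the given partition values would land
/// in under a Hive-style layout: `<base>/<col1>=<val1>/<col2>=<val2>/...`.
/// Hypothetical helper, for illustration only.
fn partition_dir(base: &Path, partition_values: &[(&str, &str)]) -> PathBuf {
    let mut dir = base.to_path_buf();
    for (column, value) in partition_values {
        // One path segment per partition column, in declaration order.
        dir.push(format!("{column}={value}"));
    }
    dir
}
```

Under that assumption, `partition_dir(Path::new("/demo"), &[("country", "US")])` yields `/demo/country=US`, which is where files for that partition value would be written.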

Querying

Filters and projections are pushed down into the Vortex scan:

let result = ctx
    .sql("SELECT name, age FROM my_table WHERE age > 28 ORDER BY age")
    .await?
    .collect()
    .await?;

Pushdown Support

The integration pushes the following operations into the Vortex scan:

  • Projections — only referenced columns are read and decompressed.

  • Filters — comparison (=, <, >), logical (AND, OR, NOT), IN, LIKE, IS NULL, and cast expressions are evaluated during the scan. Unsupported filters fall back to DataFusion post-scan evaluation.

  • Limits — applied at the scan level when no filter is present.

  • File pruning — files are eliminated without being opened based on partition values and file-level column statistics (min/max).
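
The file-pruning rule in the last bullet can be sketched with plain Rust. This is a simplified illustration of min/max pruning on a single integer column, not the crate's actual implementation; `ColumnStats`, `Predicate`, `can_prune`, and `files_to_scan` are hypothetical names introduced here.

```rust
/// File-level min/max statistics for one integer column (hypothetical type).
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    min: i64,
    max: i64,
}

/// A simple comparison predicate on that column (hypothetical type).
#[derive(Debug, Clone, Copy)]
enum Predicate {
    Eq(i64),
    Lt(i64),
    Gt(i64),
}

/// Returns true when the statistics prove that no row in the file can match,
/// so the file can be skipped without being opened.
fn can_prune(stats: ColumnStats, predicate: Predicate) -> bool {
    match predicate {
        // The value lies outside the file's [min, max] range.
        Predicate::Eq(v) => v < stats.min || v > stats.max,
        // Every value in the file is >= v, so none is < v.
        Predicate::Lt(v) => stats.min >= v,
        // Every value in the file is <= v, so none is > v.
        Predicate::Gt(v) => stats.max <= v,
    }
}

/// Keeps only the files whose statistics cannot rule the predicate out.
fn files_to_scan<'a>(
    files: &[(&'a str, ColumnStats)],
    predicate: Predicate,
) -> Vec<&'a str> {
    files
        .iter()
        .filter(|(_, stats)| !can_prune(*stats, predicate))
        .map(|(name, _)| *name)
        .collect()
}
```

For a query such as `WHERE age > 28`, a file whose `age` statistics are min 20 and max 28 is skipped, while files with a max above 28 are still scanned. Pruning is conservative: a file that survives may still contain no matching rows, which is why the filter is also evaluated during the scan.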