DataFusion
The vortex-datafusion crate integrates Vortex as a native DataFusion FileFormat, supporting
both reads and writes with filter, projection, and limit pushdown.
Setup
Add the dependency:
[dependencies]
vortex-datafusion = "<version>"
Register the Vortex format with a SessionContext:
use std::sync::Arc;

use datafusion::datasource::provider::DefaultTableFactory;
use datafusion::execution::SessionStateBuilder;
use datafusion::prelude::SessionContext;
use datafusion_common::GetExt;
use object_store::memory::InMemory;
use vortex_datafusion::VortexFormatFactory;

let factory = Arc::new(VortexFormatFactory::new());
let state = SessionStateBuilder::new()
    .with_default_features()
    // Register a table factory under the format's upper-cased extension
    // so that `STORED AS vortex` can be resolved.
    .with_table_factory(
        factory.get_ext().to_uppercase(),
        Arc::new(DefaultTableFactory::new()),
    )
    .with_file_formats(vec![factory])
    .build();
let ctx = SessionContext::new_with_state(state).enable_url_table();
Reading Vortex Files
SQL
Create an external table and query it:
ctx.sql(
    "CREATE EXTERNAL TABLE my_table \
        (name VARCHAR NOT NULL, age INT NOT NULL) \
    STORED AS vortex \
    LOCATION '/demo/'",
)
.await?;
Rust API
You can also register a ListingTable directly:
// `session` (passed to `VortexFormat::new`) and `filepath` (the path to the
// Vortex data) are assumed to be defined by the caller.
let ctx = SessionContext::new();
let format = Arc::new(VortexFormat::new(session));
let table_url = ListingTableUrl::parse(
    filepath
        .to_str()
        .ok_or_else(|| vortex_err!("Path is not valid UTF-8"))?,
)?;
let config = ListingTableConfig::new(table_url)
    .with_listing_options(
        ListingOptions::new(format).with_session_config_options(ctx.state().config()),
    )
    .infer_schema(&ctx.state())
    .await?;

let listing_table = Arc::new(ListingTable::try_new(config)?);
ctx.register_table("vortex_tbl", listing_table as _)?;
Writing Vortex Files
Write query results to Vortex using INSERT INTO:
ctx.sql(
    "INSERT INTO my_table VALUES \
        ('Alice', 30), ('Bob', 25), ('Charlie', 35), ('Diana', 28)",
)
.await?
.collect()
.await?;
Partitioned writes are supported: DataFusion automatically creates a Hive-style subdirectory (column=value) for each partition value.
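To make the resulting directory layout concrete, the sketch below builds the kind of Hive-style path that a partitioned write produces. The `partition_dir` helper and the `country` and `year` columns are hypothetical and only illustrate the naming convention; DataFusion generates these paths itself during the write.

```rust
/// Builds a Hive-style partition directory such as `/demo/country=US/`.
/// Hypothetical helper for illustration only; not part of any crate.
fn partition_dir(base: &str, partition_values: &[(&str, &str)]) -> String {
    // Normalise the base so exactly one `/` separates each component.
    let mut path = base.trim_end_matches('/').to_string();
    for (column, value) in partition_values {
        path.push('/');
        path.push_str(column);
        path.push('=');
        path.push_str(value);
    }
    path.push('/');
    path
}
```

Under these assumptions, a table at `/demo/` partitioned by `country` would place rows with `country = 'US'` under `/demo/country=US/`.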
Querying
Filters and projections are pushed down into the Vortex scan:
let result = ctx
    .sql("SELECT name, age FROM my_table WHERE age > 28 ORDER BY age")
    .await?
    .collect()
    .await?;
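As a worked example of what this query returns, the sketch below applies the same filter and ordering to the four rows inserted earlier, using plain Rust rather than DataFusion. It assumes `my_table` holds only those four rows; the `query_rows` function is hypothetical and says nothing about how the Vortex scan is implemented.

```rust
/// Mirrors `SELECT name, age FROM my_table WHERE age > 28 ORDER BY age`
/// over an in-memory slice of (name, age) rows.
fn query_rows(rows: &[(&'static str, i32)]) -> Vec<(&'static str, i32)> {
    let mut out: Vec<(&'static str, i32)> = rows
        .iter()
        .copied()
        .filter(|(_, age)| *age > 28) // WHERE age > 28
        .collect();
    out.sort_by_key(|(_, age)| *age); // ORDER BY age
    out
}
```

Given the rows `('Alice', 30), ('Bob', 25), ('Charlie', 35), ('Diana', 28)`, this yields `("Alice", 30)` followed by `("Charlie", 35)`; Bob and Diana fail the `age > 28` predicate.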
Pushdown Support
The integration pushes the following operations into the Vortex scan:
Projections: only referenced columns are read and decompressed.
Filters: comparison (=, <, >), logical (AND, OR, NOT), IN, LIKE, IS NULL, and cast expressions are evaluated during the scan. Unsupported filters fall back to DataFusion post-scan evaluation.
Limits: applied at the scan level when no filter is present.
File pruning: files are eliminated without being opened based on partition values and file-level column statistics (min/max).
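The file-pruning rule can be illustrated with a small sketch: a file whose maximum `age` is at most 28 cannot contain a row satisfying `age > 28`, so it can be skipped without being opened. The `FileStats` struct, the file names, and both functions are hypothetical stand-ins for file-level min/max statistics; this is not the crate's actual implementation.

```rust
/// Hypothetical per-file statistics for a single integer column.
#[derive(Debug, Clone, PartialEq)]
struct FileStats {
    path: &'static str,
    min: i64,
    max: i64,
}

/// Returns the files that may contain rows with `column > threshold`.
/// A file is skipped when its maximum value rules out any match.
fn prune_greater_than(files: &[FileStats], threshold: i64) -> Vec<&FileStats> {
    files.iter().filter(|f| f.max > threshold).collect()
}

/// Same idea for `column < threshold`, using the minimum instead.
fn prune_less_than(files: &[FileStats], threshold: i64) -> Vec<&FileStats> {
    files.iter().filter(|f| f.min < threshold).collect()
}
```

Pruning of this kind is conservative: a file that survives may still contain no matching rows, so the filter is still evaluated on whatever is read.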