# Vortex Layouts

Layouts share many similarities with [Vortex Arrays](/concepts/arrays). They are hierarchical, they have an associated
vtable, and they have some number of buffers. The main difference is that the buffers of a layout are lazily fetched
and remotely stored.

This allows layouts to perform pruning of unused chunks and columns, without tying the logic to a specific file-based
storage format, and without prescribing the column and row partitioning that a Vortex file can use.

In fact, layouts provide a mechanism to perform efficient scanning of columnar data over any storage medium.
The buffers might live in memory, in a single file on disk, split across many files, in a remote Redis, in Postgres
block storage, or anywhere else that you can implement key/value blob storage.
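
To make the key/value idea concrete, here is a minimal sketch of what such a storage abstraction could look like. The names `BufferStore` and `InMemoryStore` are hypothetical and are not part of the Vortex API, and the `BufferId` here is a stand-in type. The sketch is also synchronous for brevity, whereas a real source of buffers would likely be async.

```rust
use std::cell::Cell;
use std::collections::HashMap;

/// Hypothetical stand-in for a buffer identifier (not the real Vortex type).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct BufferId(pub u32);

/// Hypothetical storage abstraction: anything that can map an id to bytes.
pub trait BufferStore {
    fn get(&self, id: BufferId) -> Option<Vec<u8>>;
}

/// An in-memory store that also counts how many buffers were requested,
/// which makes it easy to observe that unused buffers are never fetched.
#[derive(Default)]
pub struct InMemoryStore {
    blobs: HashMap<BufferId, Vec<u8>>,
    fetches: Cell<usize>,
}

impl InMemoryStore {
    pub fn put(&mut self, id: BufferId, bytes: Vec<u8>) {
        self.blobs.insert(id, bytes);
    }

    pub fn fetch_count(&self) -> usize {
        self.fetches.get()
    }
}

impl BufferStore for InMemoryStore {
    fn get(&self, id: BufferId) -> Option<Vec<u8>> {
        // Every lookup counts as a fetch, whether or not the id exists.
        self.fetches.set(self.fetches.get() + 1);
        self.blobs.get(&id).cloned()
    }
}
```

Any backend that can implement a lookup like `get` (a file with byte ranges, an object store, a database table) could play the same role.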

In pseudo-code, a layout might look like this (note that unlike arrays, layouts use `u64` lengths to support
larger-than-memory data):

```rust
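struct Layout {
    vtable: LayoutVTable,  // behavior for this layout type (see VTable below)
    metadata: [u8],        // opaque, layout-specific metadata bytes
    dtype: DType,          // logical type of the data
    length: u64,           // row count; u64 so it can exceed memory
    children: [Layout],    // nested layouts
    buffers: [BufferId],   // identifiers of lazily fetched buffers
}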
```

**Owned vs Viewed**

As with other possibly large recursive data structures in Vortex, layouts can be either _owned_ or _viewed_.
Owned layouts are heap-allocated, while viewed layouts are lazily unwrapped from an underlying FlatBuffer
representation. This allows Vortex to efficiently load and work with very wide schemas without needing to deserialize
the full layout.

## VTable

The vtable of a layout is much smaller than that of an array. It looks something like this:

* `id`: returns the unique identifier for the layout type.
* `metadata`
    * `validate`: validates the layout's metadata buffer.
    * `display`: returns a human-readable representation of the layout metadata.
* `accept`: a function for accepting a `LayoutVisitor` and walking the layout's children.
* `reader`: constructs a `LayoutReader` given an async source of buffers.

## Built-in Layouts

Vortex provides a few built-in layout types, and will continue to add new layouts as compression strategies improve.

### Flat Layout

A `FlatLayout` simply holds a serialized Vortex array. This can be considered the leaf node of a layout tree.

### Struct Layout

A `StructLayout` holds a collection of named child layouts, corresponding to an associated `StructDType`. This layout
assists with pruning by partitioning the evaluation expression into sub-expressions that can be evaluated over each
of the referenced fields.
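
As a rough illustration of this partitioning, the sketch below groups the conjuncts of an AND-expression by the field each one references, so that each group could be handed to the matching child layout. The `Predicate` type and `partition_by_field` function are hypothetical simplifications, not the Vortex expression API, and they assume every conjunct touches exactly one field.

```rust
use std::collections::BTreeMap;

/// Hypothetical single-field predicate, e.g. `age > 30`.
#[derive(Debug, Clone, PartialEq)]
pub struct Predicate {
    pub field: String,
    pub op: &'static str,
    pub literal: i64,
}

/// Groups the conjuncts of an AND-expression by the field they reference.
/// Fields that no predicate mentions do not appear in the result, so their
/// child layouts never need to be read for filtering.
pub fn partition_by_field(conjuncts: &[Predicate]) -> BTreeMap<String, Vec<Predicate>> {
    let mut by_field: BTreeMap<String, Vec<Predicate>> = BTreeMap::new();
    for p in conjuncts {
        by_field.entry(p.field.clone()).or_default().push(p.clone());
    }
    by_field
}
```

For a filter such as `age > 30 AND score < 10 AND age < 65`, this yields two groups: one with both `age` predicates and one with the `score` predicate.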

### Chunked Layout

A `ChunkedLayout` holds a collection of row-wise partitioned child layouts. This layout assists with pruning by
computing statistics for each child chunk and only fetching chunks that are relevant to the expression being
evaluated.

* `chunks: [Layout]`: the first `n` children of a `ChunkedLayout` are the chunks themselves.
* `statistics: Layout`: the last child is a statistics table, typically a `FlatLayout` (although different
  layouts may be useful if some statistics grow very large, e.g. bloom filters). Each row corresponds to a chunk, and
  the columns hold statistics, such as `min`, `max`, and `null_count`, that are useful for pruning.
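
The pruning step can be sketched as a range-overlap test against the statistics table. This is a simplified illustration under stated assumptions: `ChunkStats` and `chunks_to_fetch` are hypothetical names, the column is assumed to hold `i64` values, and the filter is assumed to be a closed range `lo..=hi`.

```rust
/// Hypothetical row of the statistics table: one entry per chunk.
#[derive(Debug, Clone, Copy)]
pub struct ChunkStats {
    pub min: i64,
    pub max: i64,
    pub null_count: u64,
}

/// Returns the indices of chunks whose [min, max] range overlaps `lo..=hi`.
/// Chunks outside the range cannot contain a match, so they are never fetched.
pub fn chunks_to_fetch(stats: &[ChunkStats], lo: i64, hi: i64) -> Vec<usize> {
    stats
        .iter()
        .enumerate()
        .filter(|(_, s)| s.max >= lo && s.min <= hi)
        .map(|(i, _)| i)
        .collect()
}
```

With three chunks covering `0..=99`, `100..=199`, and `200..=299`, a filter on `150..=250` keeps only the second and third chunks.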

### Future Layouts

There are some additional layouts that we plan to add in the future:

* `DictionaryLayout`: a layout that holds a dictionary of values in one child layout, and a codes array
  (likely chunked) in another child layout.
* `ListLayout`: a layout that separates the offsets and values of a list array into two child layouts, allowing
  for efficient pruning of the values array based on the relevant offsets.
* `MergeLayout`: a struct layout that can split fields of a struct across separate layouts, combining the result back
  into a single struct. This can be useful to isolate outsized columns and use a different chunking strategy, without
  impacting the compression or read performance of the other columns.

## Custom Layouts

As with most parts of Vortex, users can define their own layout types. Reach out on the Vortex GitHub Discussions
page if you need help defining a custom layout.

## Layout Writer

A `LayoutWriter` defines a way to serialize a stream of array chunks into a layout tree. The writer is given a
buffer writer that takes a `ByteBuffer` and returns a `BufferId`. These identifiers are used to construct the layout
tree.

The Rust trait looks like this:

:::{literalinclude} ../../vortex-layout/src/writer.rs
:start-after: [layout writer]
:end-before: [layout writer]
:::
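
To illustrate the buffer writer side of this contract, here is a small sketch of a writer that appends each buffer to a single byte sink and hands back an identifier. `AppendingWriter` and this `BufferId` are hypothetical stand-ins, not the types used by the trait above, and the sketch takes a plain byte slice rather than a `ByteBuffer`.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a buffer identifier.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct BufferId(pub u32);

/// Hypothetical buffer writer: appends bytes to one sink and returns an id
/// for each buffer written.
#[derive(Default)]
pub struct AppendingWriter {
    sink: Vec<u8>,
    /// Byte range of each written buffer, indexed by `BufferId`.
    ranges: Vec<Range<usize>>,
}

impl AppendingWriter {
    /// Writes a buffer and returns the identifier a layout node would store.
    pub fn write(&mut self, bytes: &[u8]) -> BufferId {
        let start = self.sink.len();
        self.sink.extend_from_slice(bytes);
        self.ranges.push(start..self.sink.len());
        BufferId((self.ranges.len() - 1) as u32)
    }

    /// Resolves an identifier back to its bytes, if it exists.
    pub fn read(&self, id: BufferId) -> Option<&[u8]> {
        self.ranges.get(id.0 as usize).map(|r| &self.sink[r.clone()])
    }
}
```

The layout tree only records the returned identifiers, which is what lets the same tree be read back from any storage that can resolve them.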

### File-level Compression

While chunk-level compression can be handed off to a compression strategy, i.e. `fn(Array) -> Array`, there
are some compression techniques that benefit from file-level awareness, such as sharing a dictionary across
all chunks of a column.

To support this with larger-than-memory data, these techniques can be implemented inside a `LayoutStrategy`.

For example, a `DictionaryLayoutStrategy` may accumulate a values dictionary in-memory, while flushing chunks of
codes arrays to disk.
If the dictionary grows too large, the strategy can flush the values dictionary, start a new dictionary, and then
wrap both of these `DictionaryLayout` nodes in a new `ChunkedLayout` node.
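
The roll-over behavior described above can be sketched as follows. This is an illustrative toy, not the Vortex implementation: `DictWriter` and `DictSegment` are hypothetical names, values are plain strings, the size budget is a count of distinct values, and "flushing" just moves data into an in-memory list.

```rust
use std::collections::HashMap;

/// One sealed dictionary together with the code chunks that reference it.
#[derive(Debug, Default, PartialEq)]
pub struct DictSegment {
    pub values: Vec<String>,
    pub code_chunks: Vec<Vec<u32>>,
}

/// Hypothetical writer that shares one dictionary across chunks until it
/// grows past `max_values`, then starts a new one.
pub struct DictWriter {
    max_values: usize,
    index: HashMap<String, u32>,
    current: DictSegment,
    finished: Vec<DictSegment>,
}

impl DictWriter {
    pub fn new(max_values: usize) -> Self {
        Self {
            max_values,
            index: HashMap::new(),
            current: DictSegment::default(),
            finished: Vec::new(),
        }
    }

    /// Encodes one chunk into codes against the current dictionary.
    pub fn push_chunk(&mut self, chunk: &[&str]) {
        let mut codes = Vec::with_capacity(chunk.len());
        for &v in chunk {
            let code = match self.index.get(v) {
                Some(&c) => c,
                None => {
                    let c = self.current.values.len() as u32;
                    self.current.values.push(v.to_string());
                    self.index.insert(v.to_string(), c);
                    c
                }
            };
            codes.push(code);
        }
        self.current.code_chunks.push(codes);
        // Once the dictionary exceeds the budget, seal it and start a fresh one.
        if self.current.values.len() > self.max_values {
            self.roll_over();
        }
    }

    fn roll_over(&mut self) {
        self.index.clear();
        self.finished.push(std::mem::take(&mut self.current));
    }

    /// Returns all segments; a real strategy would wrap them in a chunked node.
    pub fn finish(mut self) -> Vec<DictSegment> {
        if !self.current.code_chunks.is_empty() {
            self.roll_over();
        }
        self.finished
    }
}
```

Each returned segment corresponds to one dictionary node, and the list as a whole plays the role of the enclosing chunked node.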

## Example: Parquet Row Groups

As an example, suppose we want to replicate the behavior of Parquet row groups in Vortex. We would define a layout
strategy that constructs something like the following tree:

* `ChunkedLayout(ChunkBy::RowCount(100_000))` - at the top level, we define row groups of at most 100k rows.
    * `StructLayout` - Parquet then splits the row group into individual columns known as column chunks.
        * `ChunkedLayout(ChunkBy::CompressedSize(64k))` - finally, each column chunk is split into pages by compressed
          size.
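
The tree above can be written down as a small data structure to make the nesting explicit. Everything here is a hypothetical description type rather than a Vortex API: `Strategy`, `parquet_like`, `depth`, and `row_group_count` are illustrative names, the `Flat` leaf under each page is an assumption, and `64k` is interpreted as 64 KiB.

```rust
/// Hypothetical chunking policies mirroring the tree above.
#[derive(Debug, Clone, PartialEq)]
pub enum ChunkBy {
    RowCount(u64),
    CompressedSize(u64),
}

/// Hypothetical description of a layout strategy tree (not a Vortex type).
#[derive(Debug, Clone, PartialEq)]
pub enum Strategy {
    Chunked { by: ChunkBy, child: Box<Strategy> },
    /// `column` is the strategy applied to every field of the struct.
    Struct { column: Box<Strategy> },
    Flat,
}

/// Row groups -> column chunks -> pages -> (assumed) flat leaves.
pub fn parquet_like() -> Strategy {
    Strategy::Chunked {
        by: ChunkBy::RowCount(100_000),
        child: Box::new(Strategy::Struct {
            column: Box::new(Strategy::Chunked {
                // Assumption: "64k" means 64 KiB of compressed bytes.
                by: ChunkBy::CompressedSize(64 * 1024),
                child: Box::new(Strategy::Flat),
            }),
        }),
    }
}

/// Number of levels in the strategy tree, counting the leaf.
pub fn depth(s: &Strategy) -> usize {
    match s {
        Strategy::Chunked { child, .. } => 1 + depth(child),
        Strategy::Struct { column } => 1 + depth(column),
        Strategy::Flat => 1,
    }
}

/// How many row groups a row-count policy produces for `total_rows`.
pub fn row_group_count(total_rows: u64, max_rows: u64) -> u64 {
    (total_rows + max_rows - 1) / max_rows
}
```

Under the row-count policy, a 250,000-row input would be split into three row groups: two full groups of 100,000 rows and one of 50,000.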