# File Format :::{seealso} The majority of the complexity of the Vortex file format is encapsulated in [Vortex Layouts](/concepts/layouts). Unless you are interested in the specific byte layout of the file, you are probably looking for that documentation! ::: Recall that [Vortex Layouts](/concepts/layouts) provide a mechanism to efficiently query large serialized Vortex arrays. The _Vortex File Format_ is designed to provide a container for these serialized arrays, as well as footer definition that allows efficiently querying the layout. Other considerations for the Vortex file format include: * Backwards compatibility, and (uniquely) forwards compatibility. * Fine-grained encryption. * Efficient access for both local disk and cloud storage. * Minimal overhead reading few columns or rows from wide or long arrays. ## File Specification The Vortex file format has a very small definition, with much of the complexity encapsulated in [Vortex Layouts](/concepts/layouts). ``` <4 bytes> magic number 'VTXF' ... segments of binary data, optionally with inter-segment padding ... postfix data <2 bytes> u16 version tag <2 bytes> u16 postfix length <4 bytes> magic number 'VTXF' ``` The file format begins and ends with the 4-byte magic number `VTXF`. Immediately prior to the trailing magic number are two 16-bit integers: the version tag and the length of the postfix. ### Postfix The postfix contains the locations of the file's root `DType` segment, as well as a `FileLayout` segment containing the root `Layout`, a _segment map_, and other shared configuration such as compression and encryption schemes. :::{literalinclude} ../../vortex-flatbuffers/flatbuffers/vortex-file/footer.fbs :start-after: [postscript] :end-before: [postscript] ::: ### Data Type Both viewed arrays and viewed layouts require an external `DType` to instantiate them. This helps us to avoid redundancy in the serialized format since it is very common for a child array or layout to inherit or infer its data type from the parent type. The root `DType` segment is a flat buffer serialized `DType` object. See [DType Format](/specs/dtype-format) for more information. :::{note} Unlike many columnar formats, the `DType` of a Vortex file is not required to be a `StructDType`. It is perfectly valid to store a `Float64` array, a `Boolean` array, or any other root data type. ::: ### Footer The footer is a flat buffer serialized `Footer` object. This object contains all the information required to load the root `Layout` object into a usable `LayoutReader`. For example, it contains the locations, compression schemes, encryption schemes, and required alignment of all segments in the file. :::{literalinclude} ../../vortex-flatbuffers/flatbuffers/vortex-file/footer.fbs :start-after: [footer] :end-before: [footer] ::: The footer is separated from the Data Type such that large schemas can be omitted from the file if they can be shared or fetched from an external source. ## Backward Compatibility Backward compatability guarantees that any **old** Vortex file can be read by **newer** versions of the Vortex library. The Vortex File Format is currently considered unstable. We are aiming for an 0.x release in Q1 2025 that guarantees no breaking changes within each minor version of Vortex, and a 1.0 release in H2 2025 that guarantees no breaking changes within a major version of Vortex. Please upvote or comment on the [GitHub issue](https://github.com/spiraldb/vortex/issues/2077) if you would like to see a stable release sooner. (forward-compatibility)= ## Forward Compatibility :::{note} Forward compatibility is planned to ship prior to the 1.0 release. ::: Forward compatibility guarantees that any **new** Vortex file can be read by **older** versions of the Vortex library. This rare feature allows us to continue to evolve the Vortex File Format, avoiding calcification and remaining up to date with new compression codecs and layout optimizations - all without breaking existing readers or requiring them to be updated. At write-time, a minimum supported reader version is declared. Any new encodings or layouts are then embedded into the file with WebAssembly decompression logic. Old readers are able to decompress new data (slower than native code, but still with SIMD acceleration) and read the file. New readers are able to make the best use of these encodings with native decompression logic and additional push-down compute functions.