F3: A Wasm-Powered Columnar File Format Built to Outlast Parquet
F3 (Future-proof File Format) is a research prototype columnar storage format from researchers at Carnegie Mellon and the University of Wisconsin, working with Wes McKinney, the creator of pandas and a Parquet veteran. The project targets the structural limits of decade-old formats like Parquet and ORC, which were designed for hardware and workloads that no longer reflect modern analytics. F3 reworks the on-disk storage layout for efficiency while keeping the cross-platform interoperability that made the older formats ubiquitous.
Its defining trick is extensibility through embedded WebAssembly decoders. Each F3 file is self-describing, bundling data, metadata, and the Wasm binaries needed to decode it. Because the decoder ships inside the file, a reader on any platform can interpret the data even when a native decoder is missing, and developers can introduce new encoding schemes through a general-purpose API without forcing a format rewrite or waiting for ecosystem-wide adoption. The embedded decoders add only kilobytes of overhead, per the accompanying ACM SIGMOD paper.
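To make the fallback idea concrete, here is a minimal sketch of a reader that prefers a native decoder and otherwise uses the decoder bundled in the file. Everything in it is an assumption for illustration: the toy layout (data, decoder blob, JSON footer, footer length, magic bytes), the names write_toy_file, read_toy_file and fake_runtime, and the run-length encoding are all hypothetical and are not F3's actual layout, which is defined by a FlatBuffers schema. The "embedded decoder" is a placeholder blob, and a stand-in function takes the place of a real Wasm runtime.

```python
import json
import struct

MAGIC = b"TOYF"  # hypothetical trailer bytes, not F3's real magic


def write_toy_file(column: bytes, encoding: str, decoder_blob: bytes) -> bytes:
    """Pack encoded data, a decoder binary, and a footer into one buffer."""
    footer = json.dumps({
        "encoding": encoding,
        "data": [0, len(column)],                   # [offset, length]
        "decoder": [len(column), len(decoder_blob)],
    }).encode()
    # Layout: data | decoder | footer | footer length (u32 LE) | magic
    return column + decoder_blob + footer + struct.pack("<I", len(footer)) + MAGIC


def read_toy_file(buf: bytes, native_decoders: dict, run_embedded):
    """Decode the column, falling back to the bundled decoder if needed."""
    if buf[-4:] != MAGIC:
        raise ValueError("not a toy file")
    (footer_len,) = struct.unpack("<I", buf[-8:-4])
    footer = json.loads(buf[-8 - footer_len:-8])
    off, size = footer["data"]
    data = buf[off:off + size]

    # Prefer a native decoder when this reader knows the encoding.
    native = native_decoders.get(footer["encoding"])
    if native is not None:
        return native(data)

    # Otherwise hand the bundled decoder binary to a sandboxed runtime.
    off, size = footer["decoder"]
    return run_embedded(buf[off:off + size], data)


def rle_decode(data: bytes) -> list:
    """Decode (count, value) byte pairs into a flat list of values."""
    out = []
    for i in range(0, len(data), 2):
        out.extend([data[i + 1]] * data[i])
    return out


fallback_calls = []


def fake_runtime(blob: bytes, data: bytes) -> list:
    """Stand-in for a Wasm runtime: records the call, then decodes in Python."""
    fallback_calls.append(len(blob))
    return rle_decode(data)


# Placeholder blob: starts with the Wasm magic bytes but is not a valid module.
blob = b"\x00asm" + b"placeholder"
toy = write_toy_file(bytes([3, 7, 2, 9]), "rle", blob)

with_native = read_toy_file(toy, {"rle": rle_decode}, fake_runtime)
without_native = read_toy_file(toy, {}, fake_runtime)
```

Both calls return the same values. Only the second one, where the reader has no native decoder registered for the encoding, goes through the bundled blob. In a real reader that path would execute the Wasm module in a sandbox instead of calling a Python function.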
The code is MIT-licensed and organized around a FlatBuffers schema, a Rust proof-of-concept implementation, and a benchmark suite reproducing the paper’s results. The authors are explicit that this is a prototype, tested only on a single Intel machine running Debian, and that it is not production-ready. The more durable contribution here is architectural: a file format that can evolve its compression and encoding without breaking compatibility, rather than spawning yet another incompatible standard each time data-processing trends shift.