Resources & links
DataPress stands on the shoulders of two excellent columnar engines and the research communities around them. This page collects the external links worth bookmarking — project homepages, benchmarks, comparisons, and the people whose work powers it all.
Engines & projects
The two backends DataPress wraps, plus the broader ecosystem they live in.
- Apache DataFusion — the embeddable Rust query engine behind the datapress-datafusion backend.
- Apache Arrow — the columnar memory format and IPC protocol DataPress serves over the wire.
- DuckDB — the in-process analytical database behind the datapress-duckdb backend. See Why DuckDB for its design goals.
- DuckDB-WASM — the browser build that powers the DataPress in-page SQL terminal.
- sqlparser-rs — the SQL parser DataPress uses to validate the raw-SQL endpoint.
- Apache Spark — the distributed comparison point for large-scale workloads.
- pandas — the single-node DataFrame baseline.
- Polars — a fast Rust/Arrow DataFrame library, a common point of comparison with DuckDB and DataFusion.
- ClickHouse — column-oriented OLAP database and author of the ClickBench benchmark.
Benchmarks
How the analytical-database world measures itself.
- ClickBench — the de facto analytical DBMS benchmark; DuckDB, DataFusion, ClickHouse, Spark and dozens more on one leaderboard. (methodology & sources)
- TPC-H — the classic ad-hoc decision-support benchmark (22 queries over a normalized eight-table schema).
- TPC-DS — the larger, more complex decision-support successor to TPC-H (99 queries).
- DataFusion vs DuckDB benchmark — Andrew Lamb's reproducible head-to-head harness.
Comparisons
DataPress ships the same HTTP API on top of both engines, so the most direct comparison lives right here in the docs:
- DuckDB vs Arrow + DataFusion — the DataPress side-by-side: startup time, RAM footprint, indexing, point-lookup latency.
For the wider DataFusion / DuckDB / Spark / pandas / Polars landscape, the ClickBench leaderboard and the benchmark harnesses above are the most current, reproducible references.
People
The maintainers and creators whose work DataPress builds on.
- Andrew Lamb — Apache DataFusion / Arrow / Parquet PMC; drives much of DataFusion's day-to-day. Blog · GitHub
- Andy Grove — original author of Apache DataFusion and author of the How Query Engines Work book. Site · GitHub · How Query Engines Work
- Hannes Mühleisen — co-creator of DuckDB, co-founder & CEO of DuckDB Labs, researcher at CWI Amsterdam. Site · GitHub
- Mark Raasveldt — co-creator of DuckDB and co-founder of DuckDB Labs. Site · GitHub