Resources & links

DataPress stands on the shoulders of two excellent columnar engines and the research communities around them. This page collects the external links worth bookmarking — project homepages, benchmarks, comparisons, and the people whose work powers it all.

Engines & projects

The two backends DataPress wraps, plus the broader ecosystem they live in.

Apache DataFusion — the embeddable Rust query engine behind the datapress-datafusion backend.
Apache Arrow — the columnar memory format and IPC protocol DataPress serves over the wire.
DuckDB — the in-process analytical database behind the datapress-duckdb backend. See Why DuckDB for its design goals.
DuckDB-WASM — the browser build that powers the DataPress in-page SQL terminal.
sqlparser-rs — the SQL parser DataPress uses to validate the raw-SQL endpoint.
Apache Spark — the distributed comparison point for large-scale workloads.
pandas — the single-node DataFrame baseline.
Polars — a fast Rust/Arrow DataFrame library, a common point of comparison with DuckDB and DataFusion.
ClickHouse: the column-oriented OLAP database whose team created and maintains the ClickBench benchmark.

Benchmarks

How the analytical-database world measures itself.

ClickBench: a widely cited analytical DBMS benchmark that puts DuckDB, DataFusion, ClickHouse, Spark and dozens of other systems on one leaderboard. (methodology & sources)
TPC-H: the classic ad-hoc decision-support benchmark (22 queries over an eight-table schema).
TPC-DS — the larger, more complex decision-support successor to TPC-H (99 queries).
DataFusion vs DuckDB benchmark — Andrew Lamb's reproducible head-to-head harness.

Comparisons

DataPress ships the same HTTP API on top of both engines, so the most direct comparison lives right here in the docs:

DuckDB vs Arrow + DataFusion — the DataPress side-by-side: startup time, RAM footprint, indexing, point-lookup latency.

For the wider DataFusion / DuckDB / Spark / pandas / Polars landscape, the ClickBench leaderboard and the benchmark harnesses above are the most current, reproducible references.

People

The maintainers and creators whose work DataPress builds on.

Andrew Lamb: Apache DataFusion / Arrow / Parquet PMC member; drives much of DataFusion's day-to-day development. Blog · GitHub
Andy Grove — original author of Apache DataFusion and author of the How Query Engines Work book. Site · GitHub · How Query Engines Work
Hannes Mühleisen — co-creator of DuckDB, co-founder & CEO of DuckDB Labs, researcher at CWI Amsterdam. Site · GitHub
Mark Raasveldt — co-creator of DuckDB and co-founder of DuckDB Labs. Site · GitHub