
Resources & links

DataPress stands on the shoulders of two excellent columnar engines and the research communities around them. This page collects the external links worth bookmarking: project homepages, benchmarks, comparisons, and the people whose work powers it all.

Engines & projects

The two backends DataPress wraps, plus the broader ecosystem they live in.

  • Apache DataFusion — the embeddable Rust query engine behind the datapress-datafusion backend.
  • Apache Arrow — the columnar memory format and IPC protocol DataPress serves over the wire.
  • DuckDB — the in-process analytical database behind the datapress-duckdb backend. See Why DuckDB for its design goals.
  • DuckDB-WASM — the browser build that powers the DataPress in-page SQL terminal.
  • sqlparser-rs — the SQL parser DataPress uses to validate the raw-SQL endpoint.
  • Apache Spark — the distributed comparison point for large-scale workloads.
  • pandas — the single-node DataFrame baseline.
  • Polars — a fast Rust/Arrow DataFrame library, a common point of comparison with DuckDB and DataFusion.
  • ClickHouse — column-oriented OLAP database whose team created and maintains the ClickBench benchmark.

Benchmarks

How the analytical-database world measures itself.

  • ClickBench — the de-facto analytical DBMS benchmark; DuckDB, DataFusion, ClickHouse, Spark and dozens more on one leaderboard. (methodology & sources)
  • TPC-H — the classic ad-hoc decision-support benchmark (22 queries over an eight-table, normalized schema).
  • TPC-DS — the larger, more complex decision-support successor to TPC-H (99 queries).
  • DataFusion vs DuckDB benchmark — Andrew Lamb's reproducible head-to-head harness.

Comparisons

DataPress ships the same HTTP API on top of both engines, so the most direct comparison lives right here in the docs:

For the wider DataFusion / DuckDB / Spark / pandas / Polars landscape, the ClickBench leaderboard and the benchmark harnesses above are the most current, reproducible references.

People

The maintainers and creators whose work DataPress builds on.

  • Andrew Lamb — member of the Apache DataFusion, Arrow, and Parquet PMCs; drives much of DataFusion's day-to-day development. Blog · GitHub
  • Andy Grove — original author of Apache DataFusion and author of the book How Query Engines Work. Site · GitHub · How Query Engines Work
  • Hannes Mühleisen — co-creator of DuckDB, co-founder & CEO of DuckDB Labs, researcher at CWI Amsterdam. Site · GitHub
  • Mark Raasveldt — co-creator of DuckDB and co-founder of DuckDB Labs. Site · GitHub