Side-by-side comparison

| Aspect | DuckDB | Arrow + DataFusion |
| --- | --- | --- |
| Binary | datapress-duckdb | datapress-datafusion |
| Storage | Reads parquet on demand | Materialises into RecordBatches in RAM |
| Equality index ([dataset.index]) | Ignored | auto / none / list |
| lazy = true | Allowed, redundant | Registers a ListingTable |
| Startup time | Sub-second on huge datasets | Proportional to dataset size + index width |
| RAM footprint | Just DuckDB | Resident Arrow + index maps |
| Point-lookup latency | DuckDB SQL (parallel hash) | O(1) via equality index on eq / in |
| Range / LIKE / is_null | SQL path | SQL path (DataFusion) |
| Sort / group_by / distinct | SQL path | SQL path |
| Arrow IPC responses | ✅ native query_arrow | ✅ from resident chunks, zero-copy |
| Hot reload (/reload) | ✅ DuckDB transactional replace | ArcSwap double-buffer swap |
| Delta support | ✅ via delta extension | ✅ via deltalake crate |
| S3 support | ✅ via httpfs | ✅ via object_store |
| Best for | Huge or growing datasets, full SQL | Hot, RAM-resident data with indexed point queries |
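
To make the "O(1) via equality index on eq / in" entry concrete, here is a minimal conceptual sketch of an equality index: a hash map from each distinct column value to the row positions that hold it. It is an illustration only, not the DataFusion backend's actual code; the function names and the sample column are hypothetical.

```python
from collections import defaultdict


def build_equality_index(values):
    """Map each distinct value to the list of row positions holding it."""
    index = defaultdict(list)
    for row, value in enumerate(values):
        index[value].append(row)
    return dict(index)


def lookup_eq(index, value):
    """Rows where column == value: a single hash probe, no column scan."""
    return index.get(value, [])


def lookup_in(index, candidates):
    """Rows where column is in candidates: one hash probe per distinct candidate."""
    rows = []
    for value in set(candidates):
        rows.extend(index.get(value, []))
    return sorted(rows)


# Hypothetical sample column, standing in for one indexed column of a dataset.
country = ["DE", "FR", "DE", "US", "FR", "DE"]
country_index = build_equality_index(country)
```

Building the map requires one pass over the column, and the map itself lives in memory. That is consistent with the startup-time and RAM-footprint rows above, which both mention the index.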

The HTTP request and response shapes are identical across the two backends, so you can A/B test them with the same client.
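
Because both backends accept the same requests, a small harness can replay one set of requests against each and report any differences. The following is a sketch under stated assumptions: the base URLs, ports, and query paths in the usage comment are hypothetical placeholders, not documented endpoints, and responses are assumed to be JSON.

```python
import json
import urllib.request


def fetch_json(base_url, path):
    """GET base_url + path and decode the body as JSON."""
    with urllib.request.urlopen(base_url + path) as resp:
        return json.load(resp)


def compare_backends(fetch_a, fetch_b, paths):
    """Return the paths whose responses differ between the two backends.

    fetch_a and fetch_b are callables that take a path and return the
    decoded response; injecting them keeps the comparison testable offline.
    """
    mismatches = []
    for path in paths:
        if fetch_a(path) != fetch_b(path):
            mismatches.append(path)
    return mismatches


# Usage (hypothetical hosts and paths; substitute your own deployment):
#   duck = lambda p: fetch_json("http://localhost:8080", p)
#   fusion = lambda p: fetch_json("http://localhost:8081", p)
#   print(compare_backends(duck, fusion, ["/your/query/path"]))
```

Queries without an explicit sort may return rows in a different order on each backend, so it is safer to compare sorted queries or to normalise row order before comparing.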