Server settings¶
The [server] block is fully optional — every field has a sensible
default.
[server]
backend = "datafusion" # "datafusion" (default) | "duckdb"
listen = "127.0.0.1" # bind address; "0.0.0.0" to expose
port = 8080
# workers = 8 # omit for one worker per CPU
# prefix = "/datapress" # mount every route under this path
# compress = true
# max_body_bytes = 1048576 # 413 above this
# max_page_size = 100000 # clamp query page_size above this
# request_timeout_ms = 30000 # 504 above this; 0 disables
# shutdown_timeout_secs = 30 # SIGTERM grace period
[server.quack] # DuckDB backend only; experimental
enabled = false
uri = "quack:localhost" # default port 9494; use literal localhost
# token = "change-me" # optional; generated and logged if omitted
allow_other_hostname = false # true for quack:0.0.0.0:9494 behind TLS proxy
read_only = true # allow reads plus Quack attach handshake
Reference¶
| Field | Default | Notes |
|---|---|---|
| backend | datafusion | Informational hint logged at startup. Each binary always runs as its own backend regardless. |
| listen | 127.0.0.1 | Loopback by default; the service is not network-exposed unless you opt in. |
| port | 8080 | TCP port. |
| workers | (unset) | Actix worker threads. Unset = one per CPU. |
| prefix | "" | URL prefix in front of every app route. Must start with / and not end with /. |
| compress | true | Negotiate gzip / brotli / zstd via Accept-Encoding. |
| max_body_bytes | 1048576 | Max accepted JSON request body. Larger → 413 Payload Too Large. |
| max_page_size | 100000 | Max rows returned by one /query page. Larger page_size values are clamped. |
| request_timeout_ms | 30000 | Per-request handler timeout (ms). Long handlers are cancelled and the client gets 504. 0 disables. |
| shutdown_timeout_secs | 30 | Grace period for in-flight requests after SIGTERM / SIGINT. |
DuckDB Quack server¶
DuckDB builds can optionally start DuckDB's experimental Quack remote protocol server after datasets are registered:
[server]
backend = "duckdb"
[server.quack]
enabled = true
uri = "quack:localhost" # default port 9494
token = "analytics-token" # optional, but recommended
read_only = true # default
Quack exposes the DuckDB SQL surface of the in-process database. DataPress
therefore keeps it disabled by default, binds to localhost by default, and
installs a read-only authorization hook by default. If token is omitted,
Quack generates one at startup and DataPress logs it once.
With read_only = true, DataPress allows read/inspection statements and
the Quack client attach handshake, but rejects write-oriented and DDL
statements such as CREATE, INSERT, UPDATE, DELETE, COPY,
DROP, ALTER, LOAD, and INSTALL.
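The effect of the read-only rule can be pictured with a small sketch. It is an illustration only: the real authorization hook is not shown on this page, and the function name `is_rejected` and the first-keyword check are assumptions made here. Only the keyword list comes from the paragraph above.

```python
# Keywords copied from the list above. The first-keyword check is an
# assumption for illustration; it is not DataPress's actual hook and
# should not be treated as a security mechanism.
REJECTED_KEYWORDS = frozenset(
    {"CREATE", "INSERT", "UPDATE", "DELETE", "COPY", "DROP", "ALTER", "LOAD", "INSTALL"}
)


def is_rejected(statement: str) -> bool:
    """Return True if the statement starts with a write-oriented or DDL keyword."""
    words = statement.strip().split(None, 1)
    if not words:
        return False  # empty input: nothing to reject in this sketch
    return words[0].upper().rstrip(";") in REJECTED_KEYWORDS


if __name__ == "__main__":
    for sql in ("FROM datapress.accidents LIMIT 10", "insert into t values (1)"):
        print(sql, "->", "rejected" if is_rejected(sql) else "allowed")
```

A production hook would more plausibly inspect the parsed statement type than the leading token, since a leading-keyword check misses statements wrapped in other constructs.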
DuckDB's Quack extension currently treats only the literal hostname
localhost as local. Use uri = "quack:localhost"; quack:127.0.0.1
is rejected unless allow_other_hostname = true.
To listen on a non-local address, set both a non-local URI and
allow_other_hostname = true, then put a TLS-terminating reverse proxy in
front of it:
[server.quack]
enabled = true
uri = "quack:0.0.0.0:9494"
allow_other_hostname = true
token = "analytics-token"
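The hostname rule described above can be sketched as a small check. The URI shape (`quack:host` or `quack:host:port`) is inferred from the examples on this page, and the helper names `quack_host` and `uri_accepted` are hypothetical; the actual check lives inside DuckDB's Quack extension and may differ, for instance in how it treats IPv6 literals.

```python
def quack_host(uri: str) -> str:
    """Extract the host from a 'quack:host' or 'quack:host:port' URI (assumed shape)."""
    scheme, sep, rest = uri.partition(":")
    if scheme != "quack" or not sep or not rest:
        raise ValueError(f"not a quack URI: {uri!r}")
    return rest.split(":", 1)[0]


def uri_accepted(uri: str, allow_other_hostname: bool = False) -> bool:
    """Only the literal hostname 'localhost' counts as local."""
    return quack_host(uri) == "localhost" or allow_other_hostname


if __name__ == "__main__":
    print(uri_accepted("quack:localhost"))           # literal localhost
    print(uri_accepted("quack:127.0.0.1"))           # loopback IP, not the literal name
    print(uri_accepted("quack:0.0.0.0:9494", True))  # explicit opt-in
```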
DuckDB CLI clients can connect with a Quack secret:
CREATE SECRET (
TYPE quack,
TOKEN 'analytics-token',
SCOPE 'quack:localhost'
);
ATTACH 'quack:localhost' AS datapress (TYPE quack);
FROM datapress.accidents LIMIT 10;
Behind a reverse proxy¶
When nginx / Traefik / Caddy forwards a path prefix verbatim, set
prefix to the same path (for example prefix = "/datapress" under
[server]) so app routes match.
The unprefixed probes — /healthz, /readyz, /version — stay at the
bare host root regardless. That way orchestrators don't need to know how
the service is exposed.
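As a rough sketch of how the prefix affects paths, the helper below applies the validation rule from the reference table and leaves the probes at the root. The function names `validate_prefix` and `external_path` are hypothetical, and the dataset name `accidents` is borrowed from the Quack example on this page.

```python
PROBES = ("/healthz", "/readyz", "/version")


def validate_prefix(prefix: str) -> str:
    """Accept an empty prefix, or one that starts with '/' and has no trailing '/'."""
    if prefix == "":
        return prefix
    if not prefix.startswith("/") or prefix.endswith("/"):
        raise ValueError(f"invalid prefix: {prefix!r}")
    return prefix


def external_path(route: str, prefix: str) -> str:
    """Path a client would request for an app route or probe."""
    if route in PROBES:
        return route  # probes stay at the bare host root
    return validate_prefix(prefix) + route


if __name__ == "__main__":
    print(external_path("/api/v1/datasets/accidents/query", "/datapress"))
    print(external_path("/healthz", "/datapress"))
```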
Compression¶
On by default and negotiated per request via Accept-Encoding. Clients
that want raw JSON send Accept-Encoding: identity or omit the header.
Disable when sitting behind a proxy that already compresses, or to save
CPU on a trusted LAN.
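From the client side, opting out of compression is a matter of one header. The sketch below builds, but does not send, a query request with Python's standard library. The URL assumes the default listen address and port plus a dataset named `accidents`, and the body fields are the `page` and `page_size` parameters described on this page; `build_query_request` is a hypothetical helper.

```python
import json
import urllib.request

# Assumed URL: default listen address and port, dataset name from the
# Quack example on this page.
URL = "http://127.0.0.1:8080/api/v1/datasets/accidents/query"


def build_query_request(body: dict, raw: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST query request."""
    headers = {"Content-Type": "application/json"}
    # "identity" asks for an uncompressed response; otherwise advertise the
    # encodings DataPress can negotiate.
    headers["Accept-Encoding"] = "identity" if raw else "gzip, br, zstd"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(URL, data=data, headers=headers, method="POST")


if __name__ == "__main__":
    req = build_query_request({"page": 1, "page_size": 100}, raw=True)
    print(req.get_method(), req.full_url, req.get_header("Accept-encoding"))
```

Note that `urllib` does not decompress responses on its own, so a client that advertises gzip, brotli, or zstd this way has to decode the body itself.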
Request size limit¶
max_body_bytes is an incoming request-body limit. It applies to
the bytes the client sends to DataPress: for example the JSON body of a
POST /api/v1/datasets/{name}/query request. It is wired into both
Actix's JSON extractor and raw payload extractor (web::JsonConfig and
web::PayloadConfig). Oversized requests are rejected with
413 Payload Too Large before the query handler runs.
It is not a response-size limit. DataPress does not truncate JSON or
Arrow IPC responses at max_body_bytes, and it does not drop rows to
make a response fit that value. Response size is determined by the query
result: selected columns, row count, Arrow/JSON encoding overhead, and
optional HTTP compression.
For query requests the order is:
- The HTTP request body must fit within max_body_bytes.
- The JSON body is parsed into the query request.
- page is normalized to at least 1; page_size is clamped to [1, max_page_size].
- The backend applies page, page_size, and optional top-level limit to choose rows.
- The chosen rows are encoded as JSON or Arrow IPC.
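The first and third steps above can be sketched in a few lines. This is a minimal model using the documented defaults, not DataPress's own code; the names `check_body`, `normalize`, and `Paging` are hypothetical.

```python
from dataclasses import dataclass

MAX_BODY_BYTES = 1_048_576  # default max_body_bytes
MAX_PAGE_SIZE = 100_000     # default max_page_size


@dataclass
class Paging:
    page: int
    page_size: int


def check_body(raw_body: bytes, max_body_bytes: int = MAX_BODY_BYTES) -> int:
    """Return the HTTP status the size check would produce (413 or 200)."""
    return 413 if len(raw_body) > max_body_bytes else 200


def normalize(page: int, page_size: int, max_page_size: int = MAX_PAGE_SIZE) -> Paging:
    """page is raised to at least 1; page_size is clamped to [1, max_page_size]."""
    return Paging(page=max(page, 1), page_size=min(max(page_size, 1), max_page_size))


if __name__ == "__main__":
    print(check_body(b'{"page": 0, "page_size": 500000}'))
    print(normalize(0, 500_000))
```

The size check looks only at the bytes the client sent, which is why it says nothing about how large the resulting page will be.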
That means a small query body can legitimately produce a much larger
response. Suppose max_body_bytes = 10_485_760 and an Arrow IPC query with
page_size = 1000 returns about 10 MiB: the match between the two numbers
is a coincidence. DataPress applies max_body_bytes only on the request
side; a cap on response size exists only if the client, proxy, or load
balancer enforces its own separate limit.
To control response size, reduce page_size, project fewer columns,
add more selective predicates, or page through the result set. See
Arrow IPC vs JSON for the Arrow-specific details.
Query page-size limit¶
max_page_size controls the largest row page a /query request can
ask for. The default is 100_000. If a client sends a larger
page_size, DataPress clamps it to max_page_size; the response reports
the effective value in the JSON body or Arrow IPC X-Page-Size header.
This is separate from max_body_bytes: max_page_size limits rows in
the response page, while max_body_bytes limits bytes in the incoming
request body.
Request timeout¶
A handler that doesn't produce a response within request_timeout_ms
is cancelled at the next .await point and the client sees
504 Gateway Timeout with body {"error":"request timed out"}.
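The behaviour can be modelled with Python's asyncio as an analogy. DataPress itself runs on Actix and its actual mechanism is not shown here, so `with_timeout` and `slow_handler` are hypothetical names; only the status code, the response body, and the rule that 0 disables the limit come from this page.

```python
import asyncio
import json


async def with_timeout(handler, request_timeout_ms: int):
    """Run handler(); return (status, body). A timeout of 0 disables the limit."""
    if request_timeout_ms == 0:
        return 200, await handler()
    try:
        body = await asyncio.wait_for(handler(), timeout=request_timeout_ms / 1000)
    except asyncio.TimeoutError:
        return 504, json.dumps({"error": "request timed out"}, separators=(",", ":"))
    return 200, body


async def slow_handler():
    await asyncio.sleep(1.0)  # cancellation lands here, at the await point
    return "{}"


if __name__ == "__main__":
    print(asyncio.run(with_timeout(slow_handler, 50)))
```

As in the real server, cancellation can only take effect where the handler yields; a handler stuck in a long synchronous computation would not be interrupted by this kind of timeout.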
Graceful shutdown¶
On SIGTERM or SIGINT:
- The listening socket is closed; no new connections are accepted.
- In-flight requests get up to shutdown_timeout_secs to drain.
- Workers are stopped.
Set this lower when fast restarts matter more than slow handlers; set it higher for long-running aggregations or large Parquet exports. The server log records which signal triggered shutdown:
INFO Received SIGTERM, shutting down gracefully (up to 30s for in-flight requests)...
INFO Shutdown complete.
See Operations › Graceful shutdown for the orchestrator-side tuning.
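The drain step can be pictured with a short asyncio model: wait for in-flight work up to the grace period, then cancel whatever is left. This is an analogy in Python rather than the Actix implementation, and `drain` and `demo` are hypothetical names.

```python
import asyncio


async def drain(in_flight: set, shutdown_timeout_secs: float) -> tuple:
    """Wait up to the grace period, then cancel whatever is still running.

    Returns (finished, cancelled) counts.
    """
    if not in_flight:
        return 0, 0
    done, pending = await asyncio.wait(in_flight, timeout=shutdown_timeout_secs)
    for task in pending:
        task.cancel()
    if pending:
        # Let cancelled tasks unwind before the workers stop.
        await asyncio.gather(*pending, return_exceptions=True)
    return len(done), len(pending)


async def demo() -> tuple:
    tasks = {
        asyncio.create_task(asyncio.sleep(0.01)),  # finishes within the grace period
        asyncio.create_task(asyncio.sleep(5)),     # still running when it ends
    }
    return await drain(tasks, shutdown_timeout_secs=0.2)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The trade-off mentioned above shows up directly: a shorter grace period cancels more slow requests, while a longer one delays the restart.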