Quick tour
Every endpoint, end-to-end. Replace accidents in the URLs with your dataset name.
Discovery
# List configured datasets.
curl -s http://localhost:8080/api/v1/datasets | jq
# Schema + sample row.
curl -s http://localhost:8080/api/v1/datasets/accidents/schema | jq
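The discovery calls can also be made from Python. Below is a minimal stdlib-only sketch. BASE_URL, dataset_url, and get_json are illustrative names, not part of the service. This page does not spell out the response shapes, so the sketch returns the decoded JSON without assuming any keys.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the local default used by the curl examples on this page.
BASE_URL = "http://localhost:8080"


def dataset_url(base_url, name=None, action=None):
    """Build /api/v1/datasets[/{name}[/{action}]], percent-encoding the dataset name."""
    url = base_url.rstrip("/") + "/api/v1/datasets"
    if name is not None:
        url += "/" + urllib.parse.quote(name, safe="")
        if action is not None:
            url += "/" + action
    return url


def get_json(url, timeout=10.0):
    """GET a URL and decode the JSON body; 4xx/5xx raise urllib.error.HTTPError."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Usage (needs a running server):
# datasets = get_json(dataset_url(BASE_URL))
# schema = get_json(dataset_url(BASE_URL, "accidents", "schema"))
```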
Querying
curl -s -X POST http://localhost:8080/api/v1/datasets/accidents/query \
  -H 'Content-Type: application/json' \
  -d '{
    "columns": ["ID", "State", "Severity", "Start_Time"],
    "predicates": [
      { "col": "State", "op": "eq", "val": "TX" },
      { "col": "Severity", "op": "gte", "val": 3 }
    ],
    "order_by": [{ "col": "Severity", "dir": "desc" }],
    "page": 1,
    "page_size": 50
  }' | jq
Full DSL reference: Querying › Request body.
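You can assemble the same query body in Python. This is a minimal sketch with hypothetical helpers, build_query and post_json. Predicates are (col, op, val) tuples and order_by is a list of (col, dir) pairs. Leaving out predicates and order_by when they are empty is a choice made here, since other examples on this page also omit fields they don't use. The page and page_size defaults only mirror the example above; they are not documented server defaults. Only the operators shown on this page (eq, gte, in) are known here, so the sketch does not validate op.

```python
import json
import urllib.request


def build_query(columns, predicates=(), order_by=(), page=1, page_size=50):
    """Assemble a /query body from (col, op, val) predicates and (col, dir) sort keys."""
    body = {"columns": list(columns)}
    if predicates:
        body["predicates"] = [{"col": c, "op": op, "val": v} for c, op, v in predicates]
    if order_by:
        body["order_by"] = [{"col": c, "dir": d} for c, d in order_by]
    body["page"] = page
    body["page_size"] = page_size
    return body


def post_json(url, body, timeout=30.0):
    """POST a JSON body with Content-Type: application/json and decode the JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Usage (needs a running server):
# post_json("http://localhost:8080/api/v1/datasets/accidents/query",
#           build_query(["ID", "State"], [("State", "eq", "TX")]))
```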
Counting
curl -s -X POST http://localhost:8080/api/v1/datasets/accidents/count \
  -H 'Content-Type: application/json' \
  -d '{
    "predicates": [
      { "col": "State", "op": "in", "val": ["TX", "CA"] }
    ]
  }' | jq
# → { "count": 2159851 }
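/count takes predicates in the same form as /query, so you can use it to work out how many pages a paginated pull will need before you start. The arithmetic is a ceiling division. Here is a short Python sketch; pages_needed is a hypothetical helper.

```python
def pages_needed(count: int, page_size: int) -> int:
    """Number of pages of page_size rows needed to cover count rows."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    if count < 0:
        raise ValueError("count must be non-negative")
    # -(-a // b) equals ceil(a / b) for integers and avoids float rounding.
    return -(-count // page_size)
```

Take the TX/CA count above (2,159,851 rows) with "page_size": 10000. That comes to 216 requests, which is where a single /query/stream request starts to look attractive.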
Arrow IPC for bulk pulls
To fetch a single page as Arrow IPC instead of JSON, keep using /query and add ?format=arrow:
curl -X POST 'http://localhost:8080/api/v1/datasets/accidents/query?format=arrow' \
  -H 'Content-Type: application/json' \
  --output page.arrow \
  -d '{ "columns": ["ID","State"], "page_size": 10000 }'
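If you prefer many bounded requests over one long stream, you can walk the pages yourself. This is a hedged Python sketch in which iter_pages and fetch_page are hypothetical. fetch_page(n) would POST the body above with "page": n to /query?format=arrow and return ipc.open_stream(...).read_all(). A pyarrow Table works here because len(table) is its row count. The loop assumes that a page shorter than page_size marks the end of the results, which this page does not state. max_pages is an arbitrary safety cap.

```python
from typing import Callable, Iterator, Sized, TypeVar

Page = TypeVar("Page", bound=Sized)


def iter_pages(
    fetch_page: Callable[[int], Page],
    page_size: int,
    max_pages: int = 100_000,
) -> Iterator[Page]:
    """Yield non-empty pages 1, 2, 3, ... until one comes back shorter than page_size."""
    for page_no in range(1, max_pages + 1):
        page = fetch_page(page_no)
        if len(page) > 0:
            yield page
        # Assumption: a short (or empty) page is the last one.
        if len(page) < page_size:
            return
    raise RuntimeError(f"stopped after max_pages={max_pages}; raise the cap or use /query/stream")
```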
To stream every matching row in a single request, use /query/stream,
and cap the export with limit when you don't need every row:
curl -X POST http://localhost:8080/api/v1/datasets/accidents/query/stream \
  -H 'Content-Type: application/json' \
  --output result.arrow \
  -d '{ "columns": ["ID","State"], "limit": 100000 }'
Load page.arrow in Python:
import polars as pl
import pyarrow.ipc as ipc

# Read the whole IPC stream into a pyarrow Table, then hand it to Polars.
with open("page.arrow", "rb") as fh:
    table = ipc.open_stream(fh).read_all()

df = pl.from_arrow(table)
See Querying › Arrow IPC vs JSON for the streaming modes and trade-offs.
Probes
curl http://localhost:8080/healthz # always 200
curl http://localhost:8080/readyz # 200 once a dataset is registered
curl http://localhost:8080/version # build/version metadata
Details in Operations › Probes.
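Scripts and CI jobs often need to block until the service can answer queries. The minimal sketch below polls /readyz until it returns 200. probe_status and wait_until_ready are hypothetical helpers, and the attempt count and delay are arbitrary. Any other status counts as not ready, as does a refused connection while the process is still starting.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def probe_status(url: str, timeout: float = 2.0) -> Optional[int]:
    """Return the HTTP status of a GET, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # non-2xx replies still carry a status code
        return err.code
    except OSError:  # URLError (refused, DNS) and socket timeouts subclass OSError
        return None


def wait_until_ready(
    probe: Callable[[], Optional[int]],
    attempts: int = 30,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() until it returns 200 or the attempts run out."""
    for attempt in range(attempts):
        if probe() == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Usage:
# ready = wait_until_ready(lambda: probe_status("http://localhost:8080/readyz"))
```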
Admin: hot reload
Set ADMIN_TOKEN in the environment to enable
POST /api/v1/datasets/{name}/reload:
export ADMIN_TOKEN=$(openssl rand -hex 32)   # export so the curl below can read it too
task run:duckdb &
# ...
curl -s -X POST \
  -H "X-Admin-Token: $ADMIN_TOKEN" \
  http://localhost:8080/api/v1/datasets/accidents/reload | jq
# → { "dataset": "accidents", "rows": 7728394, "elapsed_ms": 1842 }
If ADMIN_TOKEN is unset, the endpoint returns 403. See
Reference › Endpoints for the full table.
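Here is the reload call from Python, as a minimal sketch with the hypothetical helpers make_admin_token, reload_request, and reload_dataset. The client must send exactly the token the server was started with, for example one read from os.environ["ADMIN_TOKEN"]. The 120-second timeout is an arbitrary choice. As noted above, the endpoint returns 403 when ADMIN_TOKEN is unset on the server, and urllib raises that as urllib.error.HTTPError.

```python
import json
import secrets
import urllib.parse
import urllib.request


def make_admin_token() -> str:
    """32 random bytes as 64 lowercase hex characters, like `openssl rand -hex 32`."""
    return secrets.token_hex(32)


def reload_request(base_url: str, dataset: str, token: str) -> urllib.request.Request:
    """Build POST /api/v1/datasets/{name}/reload carrying the X-Admin-Token header."""
    name = urllib.parse.quote(dataset, safe="")
    url = f"{base_url.rstrip('/')}/api/v1/datasets/{name}/reload"
    return urllib.request.Request(url, method="POST", headers={"X-Admin-Token": token})


def reload_dataset(base_url: str, dataset: str, token: str, timeout: float = 120.0):
    """Send the reload and decode the JSON summary; a 403 raises urllib.error.HTTPError."""
    with urllib.request.urlopen(reload_request(base_url, dataset, token), timeout=timeout) as resp:
        return json.load(resp)


# Usage (needs a running server started with the same token):
# reload_dataset("http://localhost:8080", "accidents", os.environ["ADMIN_TOKEN"])
```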