
Quick tour

Every endpoint, end to end. The examples use a dataset named accidents; replace it with your own dataset name.

Discovery

# List configured datasets.
curl -s http://localhost:8080/api/v1/datasets | jq

# Schema + sample row.
curl -s http://localhost:8080/api/v1/datasets/accidents/schema | jq
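The same two discovery calls from Python, as a minimal stdlib-only sketch. `BASE_URL`, `dataset_url`, and `get_json` are hypothetical names, and the base URL assumes the local server used throughout this page. Percent-encoding the dataset name keeps names containing spaces or slashes from breaking the path.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the server runs locally on port 8080, as in the curl examples.
BASE_URL = "http://localhost:8080/api/v1"


def dataset_url(name: str, *suffix: str) -> str:
    """Build BASE_URL/datasets/{name}/... with the dataset name percent-encoded."""
    parts = ["datasets", urllib.parse.quote(name, safe=""), *suffix]
    return BASE_URL + "/" + "/".join(parts)


def get_json(url: str, timeout: float = 10.0):
    """GET a URL and decode its JSON body (raises urllib.error.HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `get_json(f"{BASE_URL}/datasets")` mirrors the first curl call and `get_json(dataset_url("accidents", "schema"))` the second.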

Querying

curl -s -X POST http://localhost:8080/api/v1/datasets/accidents/query \
  -H 'Content-Type: application/json' \
  -d '{
    "columns":   ["ID", "State", "Severity", "Start_Time"],
    "predicates": [
      { "col": "State",    "op": "eq",  "val": "TX" },
      { "col": "Severity", "op": "gte", "val": 3   }
    ],
    "order_by":  [{ "col": "Severity", "dir": "desc" }],
    "page":      1,
    "page_size": 50
  }' | jq

Full DSL reference: Querying › Request body.
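The request above can also be assembled from Python. A minimal stdlib sketch, assuming the local server used throughout this page; `pred` and `query_request` are hypothetical helper names. `predicates` and `order_by` are left out when empty, as the Arrow examples on this page do, and the `page`/`page_size` defaults simply mirror the curl example rather than any documented server default.

```python
import json
import urllib.parse
import urllib.request

# Assumption: local server, as in the curl examples.
BASE_URL = "http://localhost:8080/api/v1"


def pred(col, op, val):
    """One predicate object in the shape used by the curl example."""
    return {"col": col, "op": op, "val": val}


def query_request(dataset, columns, predicates=(), order_by=(), page=1, page_size=50):
    """Build (but do not send) a POST /query request with a JSON body."""
    body = {"columns": list(columns), "page": page, "page_size": page_size}
    if predicates:
        body["predicates"] = list(predicates)
    if order_by:
        # order_by is given as (column, direction) pairs, e.g. ("Severity", "desc").
        body["order_by"] = [{"col": c, "dir": d} for c, d in order_by]
    name = urllib.parse.quote(dataset, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/datasets/{name}/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The same query as the curl call above; send it with urllib.request.urlopen(req).
req = query_request(
    "accidents",
    ["ID", "State", "Severity", "Start_Time"],
    [pred("State", "eq", "TX"), pred("Severity", "gte", 3)],
    order_by=[("Severity", "desc")],
)
```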

Counting

curl -s -X POST http://localhost:8080/api/v1/datasets/accidents/count \
  -H 'Content-Type: application/json' \
  -d '{
    "predicates": [
      { "col": "State", "op": "in", "val": ["TX","CA"] }
    ]
  }' | jq
# → { "count": 2159851 }
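/count pairs naturally with paging: it tells you how many pages an export will take before you start it. A minimal sketch, assuming /query applies the same predicates and numbers pages from 1 as in the query example; `count_body` and `pages_needed` are hypothetical names.

```python
def count_body(predicates):
    """JSON body for POST /count: predicates only, no columns or paging."""
    return {"predicates": list(predicates)}


def pages_needed(count: int, page_size: int) -> int:
    """Ceiling division: the last page may be only partially filled."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return -(-count // page_size)


body = count_body([{"col": "State", "op": "in", "val": ["TX", "CA"]}])
print(pages_needed(2159851, 50))  # 43198
```

At 50 rows per page that is over 43,000 requests, which is why the Arrow endpoints in the next section are likely the better fit for bulk pulls.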

Arrow IPC for bulk pulls

To fetch a single page as Arrow IPC, keep using /query and opt into Arrow with ?format=arrow:

curl -X POST 'http://localhost:8080/api/v1/datasets/accidents/query?format=arrow' \
  -H 'Content-Type: application/json' \
  --output page.arrow \
  -d '{ "columns": ["ID","State"], "page_size": 10000 }'

To stream every matching row in one request, use /query/stream, and set limit when you want to cap the size of the export:

curl -X POST http://localhost:8080/api/v1/datasets/accidents/query/stream \
  -H 'Content-Type: application/json' \
  --output result.arrow \
  -d '{ "columns": ["ID","State"], "limit": 100000 }'
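A streamed export can be far larger than one page, so it is worth copying the response to disk in fixed-size chunks instead of reading it into memory. A minimal stdlib sketch, assuming the local server used above; `export_request` and `copy_in_chunks` are hypothetical names, and the body fields mirror the curl example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: local server, as in the curl examples.
STREAM_URL = "http://localhost:8080/api/v1/datasets/{}/query/stream"


def export_request(dataset, columns, limit=None):
    """Build (but do not send) a POST /query/stream request; `limit` is omitted when None."""
    body = {"columns": list(columns)}
    if limit is not None:
        body["limit"] = limit
    return urllib.request.Request(
        STREAM_URL.format(urllib.parse.quote(dataset, safe="")),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def copy_in_chunks(src, dst, chunk_size=1 << 20):
    """Copy a binary stream (e.g. an HTTP response) to `dst` chunk by chunk.

    Only one chunk (1 MiB by default) is held in memory at a time.
    Returns the number of bytes written.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# Usage, once the server is running:
# with urllib.request.urlopen(export_request("accidents", ["ID", "State"], 100_000)) as resp, \
#         open("result.arrow", "wb") as out:
#     copy_in_chunks(resp, out)
```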

Load it in Python:

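import pyarrow.ipc as ipc
import polars as pl

# Read the whole Arrow IPC stream written by the curl call above.
with open("page.arrow", "rb") as fh:
    table = ipc.open_stream(fh).read_all()

df = pl.from_arrow(table)  # pyarrow.Table -> polars DataFrame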

See Querying › Arrow IPC vs JSON for the streaming modes and trade-offs.

Probes

curl http://localhost:8080/healthz   # always 200
curl http://localhost:8080/readyz    # 200 once a dataset is registered
curl http://localhost:8080/version   # build/version metadata

Details in Operations › Probes.
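Scripts that start the server often need to wait for /readyz before sending queries. A minimal stdlib sketch; `readyz_status` and `wait_until_ready` are hypothetical names, the 30-second deadline and 0.5-second interval are arbitrary, and the only behaviour relied on is the one stated above: /readyz answers 200 once a dataset is registered. The probe, sleep, and clock are parameters so the polling loop can be exercised without a live server.

```python
import time
import urllib.error
import urllib.request


def readyz_status(url="http://localhost:8080/readyz", timeout=2.0):
    """Return the HTTP status of one readiness probe, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # non-2xx, e.g. before any dataset is registered
    except OSError:
        return None  # URLError (connection refused), timeouts: all OSError subclasses


def wait_until_ready(check=readyz_status, deadline_s=30.0, interval_s=0.5,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll `check` until it reports 200 (True) or the deadline passes (False)."""
    end = clock() + deadline_s
    while True:
        if check() == 200:
            return True
        if clock() >= end:
            return False
        sleep(interval_s)
```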

Admin: hot reload

Set ADMIN_TOKEN in the environment to enable POST /api/v1/datasets/{name}/reload:

# Export the token so both the server and the curl call below can read it.
export ADMIN_TOKEN=$(openssl rand -hex 32)
task run:duckdb &
# ...
curl -s -X POST \
  -H "X-Admin-Token: $ADMIN_TOKEN" \
  http://localhost:8080/api/v1/datasets/accidents/reload | jq
# → { "dataset": "accidents", "rows": 7728394, "elapsed_ms": 1842 }

If ADMIN_TOKEN is not set in the server's environment, the endpoint returns 403. See Reference › Endpoints for the full table.
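The reload call from Python, as a minimal stdlib sketch; `new_admin_token` and `reload_request` are hypothetical names, and the client reads the same ADMIN_TOKEN variable the server was started with. `secrets.token_hex(32)` produces 64 hex characters, the same shape as `openssl rand -hex 32`. urllib normalizes the header name to `X-admin-token`; HTTP header names are case-insensitive, so the server should still recognize it. A 403 surfaces as `urllib.error.HTTPError` when the request is sent.

```python
import os
import secrets
import urllib.parse
import urllib.request


def new_admin_token() -> str:
    """64 hex characters, the same shape as `openssl rand -hex 32`."""
    return secrets.token_hex(32)


def reload_request(dataset, token=None):
    """Build (but do not send) POST /datasets/{name}/reload with the admin token header."""
    if token is None:
        token = os.environ.get("ADMIN_TOKEN")
    if not token:
        raise RuntimeError("ADMIN_TOKEN is not set")
    name = urllib.parse.quote(dataset, safe="")
    return urllib.request.Request(
        f"http://localhost:8080/api/v1/datasets/{name}/reload",
        headers={"X-Admin-Token": token},
        method="POST",
    )


# Usage, with the server started as above:
# with urllib.request.urlopen(reload_request("accidents")) as resp:
#     print(resp.read().decode())
```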