Python configuration¶
DataPressConfig¶
from datap_rs.datapress import DataPressConfig
cfg = DataPressConfig(
backend="datafusion", # "datafusion" | "duckdb"
listen="0.0.0.0",
port=8000,
workers=8,
prefix="", # e.g. "/datapress" behind a proxy
compress=True,
max_body_bytes=1_048_576, # 413 above this
max_page_size=100_000, # clamp query page_size above this
request_timeout_ms=30_000, # 504 above this; 0 disables
shutdown_timeout_secs=30, # SIGTERM/SIGINT grace period
)
Every kwarg mirrors the TOML [server] block. See
Configuration › Server for full semantics.
DatasetConfig¶
from datap_rs.datapress import DatasetConfig
ds = DatasetConfig(
name="accidents",
source="data/accidents.parquet", # file, dir, glob, or s3://...
format="parquet", # "parquet" | "delta"
mode="auto", # eq-index policy
description="US accidents 2016-2023",
lazy=False,
# DataFusion eq-index policy:
# mode="list", index_columns=["State","Severity"],
# index_max_cardinality=100_000,
)
| kwarg | Meaning |
|---|---|
name |
URL slug + table name. Required. |
source |
Local path, glob, or s3://... URL. Required. |
format |
"parquet" (default) or "delta". |
mode |
DataFusion eq-index policy: "auto" (default), "none", "list". |
index_columns |
Required when mode="list". |
index_max_cardinality |
Auto-mode cardinality cap. Default 100_000. |
lazy |
DataFusion+parquet only. Stream from disk instead of materialising. |
description |
Free-form metadata; surfaced by /api/v1/datasets. |
s3 |
S3Config — only for s3:// sources. |
S3Config¶
from datap_rs.datapress import S3Config
s3 = S3Config(
region="us-east-1",
endpoint="http://localhost:9000", # MinIO / R2 / Wasabi / Backblaze
addressing_style="path", # or "virtual"
allow_http=True, # only for non-https endpoints
access_key_id=None,
secret_access_key=None,
session_token=None,
)
Credentials fall back to the standard AWS env vars
(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN,
AWS_REGION) when not set inline. See
Configuration › S3 for the full precedence
chain and per-dataset env var overrides.
AuthConfig¶
Available when the wheel is built with the auth Cargo feature
(maturin build --release --features auth). Mirrors the TOML [auth]
block — see Operations › Authentication for
field semantics and the full validation pipeline.
from datap_rs.datapress import AuthConfig
auth = AuthConfig(
enabled=True,
issuer="https://login.example.com/realms/datapress",
audience="datapress-api",
read_scopes=["datasets:read"],
reload_scopes=["datasets:reload"],
anonymous_read=False,
algorithms=["RS256"],
leeway_secs=60,
jwks_refresh_secs=3600,
tenant_claim="", # e.g. "/tenant"
allowed_tenants=[],
admin_token_fallback=True,
start_degraded=True,
)
| kwarg | Default | Meaning |
|---|---|---|
enabled |
False |
Master switch. When False all other fields are ignored. |
issuer |
"" |
OIDC issuer URL. Required when enabled=True. |
audience |
"" |
Expected aud claim. Empty = skip audience check. |
read_scopes |
[] |
Scopes required for GET endpoints. |
reload_scopes |
[] |
Scopes required for reload / admin endpoints. |
anonymous_read |
False |
Allow unauthenticated GET requests. |
algorithms |
["RS256"] |
Permitted JWT signing algorithms. |
leeway_secs |
60 |
Clock-skew tolerance for exp / nbf. |
jwks_refresh_secs |
3600 |
Background JWKS refresh interval. |
tenant_claim |
"" |
JSON Pointer (/tenant_id, /realm_access/roles/0, …). |
allowed_tenants |
[] |
Allow-list. Requires tenant_claim. |
admin_token_fallback |
True |
Honour the legacy X-Admin-Token header on reload endpoints. |
start_degraded |
True |
Boot even if JWKS fetch fails; all requests rejected until it recovers. |
Pass it to DataPress(...) via the auth= keyword:
from datap_rs.datapress import DataPress, DataPressConfig, DatasetConfig, AuthConfig
dp = DataPress(
DataPressConfig(backend="datafusion", listen="0.0.0.0", port=8000),
[DatasetConfig(name="accidents", source="data/accidents.parquet")],
auth=AuthConfig(
enabled=True,
issuer="http://localhost:8080/realms/datapress",
audience="datapress-api",
read_scopes=["datasets:read"],
reload_scopes=["datasets:reload"],
),
)
import asyncio; asyncio.run(dp.run())
Validation rules (raised as ValueError):
enabled=Truerequires a non-emptyissuer.allowed_tenantsrequirestenant_claimto be set.tenant_claimmust be a JSON Pointer starting with/.
To spin up a local OIDC provider for testing, see
examples/keycloak/
— one docker compose up and you have a pre-provisioned realm with
client datapress-api and the right scopes.
read_scopes and reload_scopes apply to the whole DataPress server
instance. To enforce strict per-dataset access from Python, run one
server instance per dataset or per access domain and use dataset-named
scopes such as datasets:accidents:read and
datasets:accidents:reload. See Python › Examples for a
complete pattern.