S3 / object storage
Whenever source.location starts with s3://, DataPress needs a
[dataset.s3] block (or environment-provided credentials) to talk to
the bucket. The same shape works for AWS, MinIO, Cloudflare R2,
Backblaze B2, Wasabi, and any other S3-compatible service.
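The s3:// prefix check is what pulls in everything in this section. The following minimal sketch shows how such a location splits into a bucket and a key prefix. The function name split_s3_location is hypothetical and is not part of DataPress.

```python
from typing import Optional, Tuple


def split_s3_location(location: str) -> Optional[Tuple[str, str]]:
    """Split an s3:// location into (bucket, key_prefix); return None for non-S3 locations."""
    scheme = "s3://"
    if not location.startswith(scheme):
        return None  # not object storage, so no [dataset.s3] settings are involved
    # Everything up to the first "/" after the scheme is the bucket; the rest is the key prefix.
    bucket, _, prefix = location[len(scheme):].partition("/")
    return bucket, prefix


print(split_s3_location("s3://my-bucket/logs/2024/"))  # ('my-bucket', 'logs/2024/')
```

Splitting on the first "/" rather than running a generic URL parser keeps characters such as "?" or "#" in object keys intact.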
Reference

| Field | Default | Notes |
|---|---|---|
| region | us-east-1 | Falls back to the AWS_REGION env var, then us-east-1. |
| endpoint | (unset) | Custom S3 endpoint (MinIO, R2, Wasabi, Backblaze, …). |
| addressing_style | virtual | virtual = https://bucket.host; path = https://host/bucket (needed for MinIO). |
| allow_http | false | Must be true if endpoint uses http://. |
| access_key_id, secret_access_key, session_token | (unset) | Inline credentials. Discouraged in production; use env vars instead. |
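The allow_http rule and the two addressing styles can be checked before any request is made. The following minimal validation sketch rests on two assumptions. First, S3Settings is a hypothetical dataclass that holds only the fields this check needs, with defaults copied from the table. Second, values of addressing_style other than virtual and path are rejected, which the table implies but does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class S3Settings:
    """Hypothetical subset of [dataset.s3]; defaults copied from the reference table."""
    endpoint: Optional[str] = None
    addressing_style: str = "virtual"
    allow_http: bool = False


def check_s3_settings(s: S3Settings) -> None:
    """Raise ValueError for combinations the table rules out (sketch, not DataPress code)."""
    if s.addressing_style not in ("virtual", "path"):
        raise ValueError(f"addressing_style must be 'virtual' or 'path', got {s.addressing_style!r}")
    # A plain-HTTP endpoint is only accepted when allow_http is explicitly enabled.
    if s.endpoint and s.endpoint.lower().startswith("http://") and not s.allow_http:
        raise ValueError(f"endpoint {s.endpoint} is plain HTTP; set allow_http = true")


# Passes: matches the MinIO settings shown later in this section.
check_s3_settings(S3Settings(endpoint="http://minio.local:9000", addressing_style="path", allow_http=True))
```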
AWS S3: default credential chain
Most production deployments use IAM roles or env vars. DataPress reads
AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, and
AWS_REGION automatically when not set inline.
```toml
[[dataset]]
name = "logs"

[dataset.source]
kind = "parquet"
location = "s3://my-bucket/logs/2024/"

[dataset.s3]
region = "eu-west-1"
```
MinIO / R2 / Wasabi / Backblaze
Non-AWS providers usually need a custom endpoint. MinIO additionally
needs addressing_style = "path". Plain-HTTP endpoints require
allow_http = true.
```toml
[[dataset]]
name = "warehouse"

[dataset.source]
kind = "parquet"
location = "s3://warehouse/exports/"

[dataset.s3]
region = "us-east-1"
endpoint = "http://minio.local:9000"
addressing_style = "path"
allow_http = true
```
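Comparing the URLs that the two addressing styles produce shows why MinIO needs path style. The sketch below uses a hypothetical object_url helper. It assumes the endpoint is a bare scheme://host[:port] and does not percent-encode the key. The object name part-0.parquet is also made up for illustration.

```python
from urllib.parse import urlsplit


def object_url(endpoint: str, bucket: str, key: str, addressing_style: str) -> str:
    """Build the URL of one object under the given addressing style (illustrative only)."""
    parts = urlsplit(endpoint)
    scheme, host = parts.scheme, parts.netloc  # netloc keeps any port, e.g. "minio.local:9000"
    if addressing_style == "virtual":
        # Bucket becomes part of the hostname: scheme://bucket.host/key
        return f"{scheme}://{bucket}.{host}/{key}"
    if addressing_style == "path":
        # Bucket becomes the first path segment: scheme://host/bucket/key
        return f"{scheme}://{host}/{bucket}/{key}"
    raise ValueError(f"unknown addressing_style: {addressing_style!r}")


print(object_url("http://minio.local:9000", "warehouse", "exports/part-0.parquet", "path"))
# http://minio.local:9000/warehouse/exports/part-0.parquet
```

With virtual style, the MinIO example would target the host warehouse.minio.local, and a local MinIO setup typically does not resolve that name. Path style leaves the endpoint host unchanged.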
Delta on S3
Same shape as Parquet on S3; only kind changes, to "delta".
```toml
[[dataset]]
name = "events_delta"

[dataset.source]
kind = "delta"
location = "s3://my-bucket/events_delta/"

[dataset.s3]
region = "us-east-1"
```
Inline credentials (discouraged)
```toml
[dataset.s3]
region = "us-east-1"
access_key_id = "AKIA..."
secret_access_key = "..."
# session_token = "..."  # optional, for STS creds
```
Avoid checking these into version control.
Per-dataset env vars
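For multi-tenant setups, you can scope credentials to a single dataset by prefixing
the standard AWS env-var names with ${DATASET_NAME_UPPERCASE}_.
Any non-alphanumeric character in the dataset name becomes _.
For example, a dataset named sales.eu-1 gets the prefix SALES_EU_1, so its access key is read from SALES_EU_1_AWS_ACCESS_KEY_ID.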
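The naming rule can be written as a small function. This sketch makes two assumptions. First, each non-alphanumeric character maps to exactly one _, so runs of such characters are not collapsed. Second, only ASCII letters and digits are kept. The names dataset_env_prefix and dataset_env_names are hypothetical.

```python
from typing import List

# Standard AWS env-var names that the per-dataset prefix is applied to.
AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN", "AWS_REGION")


def dataset_env_prefix(name: str) -> str:
    """Uppercase the dataset name and replace each non-alphanumeric character with '_'."""
    return "".join(c.upper() if c.isascii() and c.isalnum() else "_" for c in name)


def dataset_env_names(name: str) -> List[str]:
    """List the per-dataset env-var names for one dataset."""
    prefix = dataset_env_prefix(name)
    return [f"{prefix}_{var}" for var in AWS_VARS]


print(dataset_env_prefix("sales.eu-1"))  # SALES_EU_1
print(dataset_env_names("sales.eu-1")[1])  # SALES_EU_1_AWS_SECRET_ACCESS_KEY
```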
Credential precedence
Highest → lowest:
- Per-dataset env vars: ${PREFIX}_AWS_ACCESS_KEY_ID, ${PREFIX}_AWS_SECRET_ACCESS_KEY, ${PREFIX}_AWS_SESSION_TOKEN, ${PREFIX}_AWS_REGION.
- Inline [dataset.s3] keys.
- Plain AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, AWS_REGION.
- The backend's default credential chain (~/.aws/credentials, IMDS, …).
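The precedence list can be read as a first-match lookup. The sketch below makes two assumptions that the list does not spell out. First, a tier supplies credentials only if it has both an access key ID and a secret, and the session token then comes from that same tier, so halves of a key pair are never mixed across tiers. Second, region is resolved separately and falls back to the us-east-1 default from the reference table. The names resolve_credentials and resolve_region are hypothetical.

```python
from typing import Dict, Mapping, Optional

# ([dataset.s3] key, standard AWS env-var name)
CRED_FIELDS = (
    ("access_key_id", "AWS_ACCESS_KEY_ID"),
    ("secret_access_key", "AWS_SECRET_ACCESS_KEY"),
    ("session_token", "AWS_SESSION_TOKEN"),
)


def resolve_credentials(
    prefix: str, inline: Mapping[str, str], env: Mapping[str, str]
) -> Optional[Dict[str, Optional[str]]]:
    """Return credentials from the highest tier that has a full key pair, else None."""
    tiers = [
        {key: env.get(f"{prefix}_{var}") for key, var in CRED_FIELDS},  # 1. per-dataset env vars
        {key: inline.get(key) for key, _ in CRED_FIELDS},                 # 2. inline [dataset.s3] keys
        {key: env.get(var) for key, var in CRED_FIELDS},                  # 3. plain AWS_* env vars
    ]
    for tier in tiers:
        # Assumed: a tier counts only when it provides both halves of the key pair.
        if tier["access_key_id"] and tier["secret_access_key"]:
            return tier
    return None  # 4. defer to the backend's default chain (~/.aws/credentials, IMDS, ...)


def resolve_region(prefix: str, inline: Mapping[str, str], env: Mapping[str, str]) -> str:
    """Return the first region found in precedence order, falling back to us-east-1."""
    for candidate in (env.get(f"{prefix}_AWS_REGION"), inline.get("region"), env.get("AWS_REGION")):
        if candidate:
            return candidate
    return "us-east-1"


env = {
    "SALES_EU_1_AWS_ACCESS_KEY_ID": "AKIA-A",
    "SALES_EU_1_AWS_SECRET_ACCESS_KEY": "secret-a",
    "AWS_REGION": "eu-west-1",
}
print(resolve_credentials("SALES_EU_1", {}, env)["access_key_id"])  # AKIA-A
print(resolve_region("SALES_EU_1", {}, env))  # eu-west-1
```

In real use, env would be os.environ and prefix would be the dataset's derived prefix, such as SALES_EU_1 for sales.eu-1.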