S3 / object storage

Whenever source.location starts with s3://, DataPress needs a [dataset.s3] block (or environment-provided credentials) to talk to the bucket. The same shape works for AWS, MinIO, Cloudflare R2, Backblaze B2, Wasabi, and any other S3-compatible service.

Reference

| Field | Default | Notes |
| --- | --- | --- |
| region | us-east-1 | Falls back to AWS_REGION env, then us-east-1. |
| endpoint | (unset) | Custom S3 endpoint (MinIO, R2, Wasabi, Backblaze, …). |
| addressing_style | virtual | virtual = https://bucket.host, path = https://host/bucket (MinIO). |
| allow_http | false | Must be true if endpoint is http://.... |
| access_key_id, secret_access_key, session_token | (unset) | Inline creds. Discouraged for prod; use env vars instead. |
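
To make the interaction between endpoint, addressing_style, and allow_http concrete, here is a minimal Python sketch of how a bucket URL could be derived from those three fields. It illustrates the rules in the table only and is not DataPress's actual implementation; the function name bucket_url and the host s3.example.com are hypothetical.

```python
from urllib.parse import urlsplit


def bucket_url(endpoint: str, bucket: str,
               addressing_style: str = "virtual",
               allow_http: bool = False) -> str:
    """Build the base URL for a bucket from [dataset.s3]-style settings."""
    parts = urlsplit(endpoint)
    # Plain-HTTP endpoints are rejected unless allow_http is set.
    if parts.scheme == "http" and not allow_http:
        raise ValueError("endpoint uses http:// but allow_http is false")
    if addressing_style == "virtual":
        # virtual: the bucket becomes a subdomain of the endpoint host.
        return f"{parts.scheme}://{bucket}.{parts.netloc}"
    if addressing_style == "path":
        # path: the bucket is the first path segment (the style MinIO needs).
        return f"{parts.scheme}://{parts.netloc}/{bucket}"
    raise ValueError(f"unknown addressing_style: {addressing_style!r}")


print(bucket_url("https://s3.example.com", "warehouse"))
# https://warehouse.s3.example.com
print(bucket_url("http://minio.local:9000", "warehouse", "path", allow_http=True))
# http://minio.local:9000/warehouse
```

Calling it with an http:// endpoint and allow_http left at false raises ValueError, which mirrors the requirement in the allow_http row.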

AWS S3 — default credential chain

Most production deployments use IAM roles or env vars. DataPress reads AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, and AWS_REGION from the environment automatically when the corresponding keys are not set inline.

[[dataset]]
name = "logs"

[dataset.source]
kind     = "parquet"
location = "s3://my-bucket/logs/2024/"

[dataset.s3]
region = "eu-west-1"

MinIO / R2 / Wasabi / Backblaze

Non-AWS providers usually need a custom endpoint. MinIO additionally needs addressing_style = "path". Plain-HTTP endpoints require allow_http = true.

[[dataset]]
name = "warehouse"

[dataset.source]
kind     = "parquet"
location = "s3://warehouse/exports/"

[dataset.s3]
region           = "us-east-1"
endpoint         = "http://minio.local:9000"
addressing_style = "path"
allow_http       = true

Delta on S3

The configuration has the same shape as Parquet on S3; only kind changes to "delta".

[[dataset]]
name = "events_delta"

[dataset.source]
kind     = "delta"
location = "s3://my-bucket/events_delta/"

[dataset.s3]
region = "us-east-1"

Inline credentials (discouraged)

[dataset.s3]
region            = "us-east-1"
access_key_id     = "AKIA..."
secret_access_key = "..."
# session_token   = "..."   # optional, for STS creds

Avoid checking these into version control.

Per-dataset env vars

For multi-tenant setups, scope credentials to one dataset by prefixing the standard AWS env-var names with ${DATASET_NAME_UPPERCASE}_, that is, the dataset name in uppercase followed by an underscore. Non-alphanumeric characters in the name become _.

For a dataset named sales.eu-1 (prefix → SALES_EU_1):

export SALES_EU_1_AWS_ACCESS_KEY_ID=AKIA...
export SALES_EU_1_AWS_SECRET_ACCESS_KEY=...
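
The prefix rule can be sketched in a few lines of Python. This is an illustration of the rule as described, not DataPress source code; the helper name dataset_env_prefix is hypothetical, and it assumes each non-alphanumeric character maps to exactly one underscore (consecutive separators are not collapsed).

```python
import re


def dataset_env_prefix(dataset_name: str) -> str:
    """Uppercase the dataset name and replace non-alphanumerics with '_'."""
    # Assumption: one '_' per replaced character, ASCII letters and digits kept.
    return re.sub(r"[^A-Za-z0-9]", "_", dataset_name).upper()


prefix = dataset_env_prefix("sales.eu-1")
print(prefix)                          # SALES_EU_1
print(f"{prefix}_AWS_ACCESS_KEY_ID")   # SALES_EU_1_AWS_ACCESS_KEY_ID
```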

Credential precedence

Highest → lowest:

  1. Per-dataset env vars: ${PREFIX}_AWS_ACCESS_KEY_ID, ${PREFIX}_AWS_SECRET_ACCESS_KEY, ${PREFIX}_AWS_SESSION_TOKEN, ${PREFIX}_AWS_REGION.
  2. Inline [dataset.s3] keys.
  3. Plain AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, AWS_REGION.
  4. The backend's default credential chain (~/.aws/credentials, IMDS, …).
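
The precedence list above can be read as a simple lookup chain. The following Python sketch resolves a single setting under that order; it is a hedged illustration, not the actual DataPress resolver. The names resolve and env_prefix are hypothetical, and the sketch assumes precedence is applied per field rather than to the credential set as a whole, which the list does not specify.

```python
import re
from typing import Mapping, Optional, Tuple


def env_prefix(dataset_name: str) -> str:
    # Uppercase the name; non-alphanumeric characters become '_'.
    return re.sub(r"[^A-Za-z0-9]", "_", dataset_name).upper()


def resolve(field: str, dataset_name: str,
            inline: Mapping[str, str],
            env: Mapping[str, str]) -> Tuple[Optional[str], str]:
    """Return (value, source) for one [dataset.s3] field."""
    env_name = "AWS_" + field.upper()            # e.g. AWS_ACCESS_KEY_ID
    scoped = f"{env_prefix(dataset_name)}_{env_name}"
    if env.get(scoped):                          # 1. per-dataset env var
        return env[scoped], "per-dataset env"
    if inline.get(field):                        # 2. inline [dataset.s3] key
        return inline[field], "inline"
    if env.get(env_name):                        # 3. plain AWS_* env var
        return env[env_name], "plain env"
    return None, "default chain"                 # 4. defer to the backend


env = {"SALES_EU_1_AWS_ACCESS_KEY_ID": "scoped-key",
       "AWS_ACCESS_KEY_ID": "plain-key",
       "AWS_REGION": "eu-west-1"}
inline = {"access_key_id": "inline-key", "region": "us-east-1"}

print(resolve("access_key_id", "sales.eu-1", inline, env))
# ('scoped-key', 'per-dataset env')
print(resolve("region", "sales.eu-1", inline, env))
# ('us-east-1', 'inline')
```

In practice you would pass os.environ as env; a plain dict is used here so the example stays deterministic.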