PostgreSQL wire protocol (pgwire)¶
The DataFusion backend can expose a native PostgreSQL wire-protocol
endpoint alongside the HTTP API. Any PostgreSQL client — psql, JDBC/ODBC
drivers, or BI tools such as Power BI, Tableau, and Metabase — can then query
your datasets directly, without knowing anything about the DataPress HTTP API.
Experimental
The PostgreSQL wire-protocol endpoint is experimental. It relies on
emulated pg_catalog/information_schema metadata, so some clients and
introspection queries may not work yet, and its behaviour and configuration
can change between releases. It's ready for querying datasets from psql
and common BI tools, but not recommended for production-critical workloads.
Please report any client
that fails to connect or browse schemas.
Each registered dataset appears as a table in the default catalog/schema
(public to Postgres clients), so SELECT * FROM my_dataset just works.
- Read-only — DataFusion has no write path here, so the endpoint serves
SELECT/WITH … SELECTqueries only. - DataFusion only — the DuckDB backend has no pgwire path.
- Opt-in — available only when the binary/wheel was built with the
pgwireCargo feature, and disabled by default even then.
Requires the pgwire build feature
The endpoint is compiled in only when the server binary (or the datap-rs
wheel) is built with the pgwire Cargo feature and run on the
datafusion backend. A pgwire.enabled = true config on a build without
the feature logs a warning at startup and is otherwise a no-op.
Enable the endpoint¶
Turn it on with the [server.pgwire]
block (or the pgwire_* kwargs when launching from Python):
Authentication is cleartext password, so DataPress enforces these rules at startup (the process refuses to start otherwise):
- A loopback bind (
127.0.0.1/::1) may omit the password — handy for local development. - Any non-loopback bind (for example
0.0.0.0) requires both a password and TLS, so credentials never cross the network in the clear.
TLS for network-exposed listeners
To accept connections from other hosts, bind to a non-loopback address and provide a certificate and key. Without TLS a non-loopback listener is refused at startup.
Connect with psql¶
psql "host=127.0.0.1 port=5432 user=datapress password=change-me dbname=datapress" \
-c "SELECT count(*) FROM accidents"
Or interactively:
psql "host=127.0.0.1 port=5432 user=datapress password=change-me dbname=datapress"
datapress=> SELECT severity, count(*) FROM accidents GROUP BY severity ORDER BY 2 DESC;
The dbname is not meaningful to DataFusion (there is a single catalog); any
value is accepted. When TLS is enabled, add sslmode=require:
psql "host=db.example.com port=5432 user=datapress password=change-me dbname=datapress sslmode=require"
Connection strings¶
Most drivers accept a standard PostgreSQL URI:
With TLS:
BI tools¶
BI connectors usually require TLS
Power BI's PostgreSQL connector (Npgsql) defaults to Encrypt
connection = on, and many other BI drivers require SSL by default. A
DataPress listener without TLS will refuse those connections, so configure
tls_cert / tls_key (or disable encryption in the client for local,
loopback testing).
- Power BI — Get Data → PostgreSQL database. Enter the server as
host:port(e.g.db.example.com:5432) and the database asdatapress. Keep Encrypt connection on and use a TLS-enabled listener. - Tableau / Metabase / Superset — choose the PostgreSQL connector and
supply host, port
5432, databasedatapress, and the configured username/password.
DBeaver / DataGrip¶
Create a standard PostgreSQL connection (not a DataPress-specific driver):
- Host
127.0.0.1, Port5432, Databasedatapress. - Username
datapress, Password as configured. - For a TLS listener, set SSL mode to
requirein the driver's SSL settings.
Tables (your datasets) appear under the public schema. Queries are read-only
SELECTs; DDL/DML is not supported.
Notes and limits¶
- Datasets are exposed as tables in the default catalog/schema; column names and types come from each dataset's Arrow schema.
- Only read queries are supported — there is no
INSERT/UPDATE/DELETEor DDL. - The endpoint reflects the datasets currently registered with the server; use the HTTP reload / register operations to change what is visible.
- Connection pooling is supported. Pooling drivers (notably Npgsql, used by
Power BI) reset a pooled connection by issuing session-maintenance statements
such as
DISCARD ALL. DataFusion doesn't model these, so DataPress transparently acknowledgesDISCARD,DEALLOCATE,RESET, andUNLISTENas no-ops (transactions —BEGIN/COMMIT/ROLLBACK— are likewise accepted but not real transactions, since the endpoint is read-only). You should not need to disable pooling.