Predicates¶

Each predicate is an object:

{ "col": "<column-name>", "op": "<operator>", "val": <json-value> }

col is matched case-insensitively against the dataset schema. The set of operators is closed: any op not listed in the operator reference returns 400 Unknown operator: <op>.

Multiple predicates are ANDed together. There is no OR operator: to OR several values of the same column, use in; to OR conditions on different columns, issue separate queries and merge the results client-side.
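
The client-side approach to OR can be sketched as follows. This is a minimal illustration, not part of the API: `query_any` and the `run_query` callable are hypothetical names, and the sketch assumes every row carries a unique `id` field that can be used to de-duplicate.

```python
from typing import Callable, Iterable

Row = dict
Predicate = dict


def query_any(
    run_query: Callable[[list[Predicate]], Iterable[Row]],
    alternatives: list[list[Predicate]],
    key: str = "id",
) -> list[Row]:
    """Run one query per OR-branch and union the rows, de-duplicated by `key`.

    `run_query` is a hypothetical callable that sends one predicate list
    (ANDed by the server) and returns the matching rows.
    """
    seen = set()
    merged = []
    for predicates in alternatives:
        for row in run_query(predicates):
            if row[key] not in seen:  # a row may satisfy several branches
                seen.add(row[key])
                merged.append(row)
    return merged
```

Each inner list is one AND group, so `[[A, B], [C]]` reads as `(A AND B) OR C`. Rows come back in branch order, not in the dataset's order.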

Operator reference¶

Operator	Meaning	`val` shape
`eq`	`col = val`	scalar (string, number, bool)
`neq`	`col != val`	scalar
`gt`	`col > val`	scalar
`gte`	`col >= val`	scalar
`lt`	`col < val`	scalar
`lte`	`col <= val`	scalar
`like`	`col LIKE val`	string (use `%` / `_` wildcards)
`ilike`	`col ILIKE val`	string (case-insensitive LIKE)
`in`	`col IN (v1, v2, …)`	non-empty array of scalars
`is_null`	`col IS NULL`	omit `val` (or set to `null`)
`is_not_null`	`col IS NOT NULL`	omit `val` (or set to `null`)
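
A client can check predicates against this table before sending them. The sketch below is one possible client-side check, assuming only the shapes listed above; `check_predicate` is a hypothetical helper, and the server's own validation may differ in detail.

```python
SCALAR_OPS = {"eq", "neq", "gt", "gte", "lt", "lte"}
STRING_OPS = {"like", "ilike"}
NULL_OPS = {"is_null", "is_not_null"}


def _is_scalar(value) -> bool:
    # bool is a subclass of int in Python, so it is covered by int here.
    return isinstance(value, (str, int, float))


def check_predicate(pred: dict) -> None:
    """Raise ValueError if `pred` does not match the operator table."""
    op = pred.get("op")
    val = pred.get("val")
    if not isinstance(pred.get("col"), str):
        raise ValueError("col must be a string")
    if op in SCALAR_OPS:
        if not _is_scalar(val):
            raise ValueError(f"{op} needs a scalar val")
    elif op in STRING_OPS:
        if not isinstance(val, str):
            raise ValueError(f"{op} needs a string val")
    elif op == "in":
        if not isinstance(val, list) or not val or not all(map(_is_scalar, val)):
            raise ValueError("in needs a non-empty array of scalars")
    elif op in NULL_OPS:
        pass  # val is ignored for these operators, so nothing to check
    else:
        raise ValueError(f"Unknown operator: {op}")
```

Failing early on the client avoids a round trip that would end in a 400.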

Equality and inequality¶

{
  "predicates": [
    { "col": "state",    "op": "eq",  "val": "CA" },
    { "col": "severity", "op": "neq", "val": 1    }
  ]
}

Boolean values work too:

{ "predicates": [ { "col": "amenity", "op": "eq", "val": true } ] }

Ranges¶

Numeric:

{
  "predicates": [
    { "col": "severity",    "op": "gte", "val": 3   },
    { "col": "distance_mi", "op": "lt",  "val": 5.0 }
  ]
}

Lexicographic on strings:

{ "predicates": [ { "col": "state", "op": "gte", "val": "M" } ] }

Temporal columns are kept in their native type in RAM, but on the wire their values are ISO-8601 strings, so range bounds are written as ISO-8601 text:

{
  "predicates": [
    { "col": "start_time", "op": "gte", "val": "2023-01-01" },
    { "col": "start_time", "op": "lt",  "val": "2023-07-01" }
  ]
}
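
The gte / lt pair above forms a half-open interval, which avoids double-counting rows at the boundary when consecutive ranges are queried. A small helper along these lines can build the pair; `date_range` is a hypothetical name, and the sketch assumes date-only bounds as in the example.

```python
from datetime import date


def date_range(col: str, start: date, end: date) -> list[dict]:
    """Predicates for start <= col < end, with ISO-8601 date strings as bounds."""
    if start >= end:
        raise ValueError("start must be before end")
    return [
        {"col": col, "op": "gte", "val": start.isoformat()},
        {"col": col, "op": "lt", "val": end.isoformat()},
    ]
```

Calling `date_range("start_time", date(2023, 1, 1), date(2023, 7, 1))` reproduces the two predicates shown above.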

`IN` — multi-value membership¶

{
  "predicates": [
    { "col": "state", "op": "in", "val": ["CA", "TX", "FL", "NY"] }
  ]
}

val must be a non-empty array. [] returns 400.

A single in against an indexed column takes the equality-index fast path: the SQL engine is not involved, and the matching rows come from a sorted merge of the per-value row-id lists.
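
To illustrate the idea of merging per-value row-id lists, here is a toy model in Python. It is not the backend's implementation: the index is assumed to be a plain dict mapping each column value to an ascending list of row ids, and `merge_row_ids` is a hypothetical name.

```python
import heapq


def merge_row_ids(index: dict, values: list) -> list[int]:
    """Merge the sorted row-id lists of the requested values into one sorted list."""
    lists = [index.get(v, []) for v in values]  # unknown values match no rows
    merged = []
    for row_id in heapq.merge(*lists):
        if not merged or merged[-1] != row_id:  # skip duplicates if a value is repeated
            merged.append(row_id)
    return merged
```

Because every input list is already sorted, the merge visits each matching row id once, so the cost scales with the number of matches and not with the size of the dataset.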

`LIKE` and `ILIKE`¶

SQL wildcards:

% — zero or more characters
_ — exactly one character

{
  "predicates": [
    { "col": "city", "op": "like",  "val": "San %"     },
    { "col": "city", "op": "ilike", "val": "%falls%"   }
  ]
}

like is case-sensitive; ilike is case-insensitive. Both go through the SQL fallback path (they cannot use the equality index).
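
For readers less familiar with SQL wildcards, the matching rules can be modelled with a regular expression. The sketch below only illustrates the semantics of % and _ and the like / ilike difference; `like_to_regex` is a hypothetical helper, it does not handle escape characters, and it is not how the backend evaluates these operators.

```python
import re


def like_to_regex(pattern: str, case_insensitive: bool = False) -> re.Pattern:
    """Compile a LIKE pattern: % -> any run of characters, _ -> one character."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    flags = re.DOTALL | (re.IGNORECASE if case_insensitive else 0)
    return re.compile("".join(parts), flags)
```

Use `fullmatch` on the compiled pattern, since LIKE matches the whole value: `like_to_regex("San %").fullmatch("San Diego")` succeeds, while the same pattern does not match "san diego" unless `case_insensitive=True`.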

Null / not-null¶

val is ignored for both operators: omit it or set it to null. The example below shows both spellings, the first predicate omitting val and the second setting it to null:

{ "predicates": [
  { "col": "end_lat", "op": "is_null"                    },
  { "col": "end_lng", "op": "is_not_null", "val": null   }
] }

How predicates are executed¶

For materialised (non-lazy) datasets the backend picks the cheapest applicable path:

Empty predicates → direct Arrow slice over the resident chunks. O(page_size), no engine overhead.
All predicates are eq / in on indexed columns → equality-index path. Each per-value row-id list is merged in sorted order; the page is materialised with a single arrow::compute::take. O(predicate_matches).
Anything else (ranges, LIKE, ILIKE, is_null, mixed) → DataFusion SQL. Multi-threaded vectorised scan over the dataset, then pagination.
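
The selection rules above can be summarised as a small decision function. This is a sketch of the documented rules only, not the backend code: `choose_path` and the returned labels are hypothetical, and `indexed_columns` is assumed to hold lower-cased column names.

```python
def choose_path(predicates: list[dict], indexed_columns: set[str]) -> str:
    """Return which of the three documented paths a materialised query would take."""
    if not predicates:
        return "arrow_slice"
    if all(
        # col is matched case-insensitively, so compare in lower case
        p["op"] in ("eq", "in") and p["col"].lower() in indexed_columns
        for p in predicates
    ):
        return "equality_index"
    return "datafusion_sql"
```

A single predicate outside eq / in, or a single unindexed column, is enough to send the whole query to the SQL path.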

For lazy datasets every query goes through DataFusion directly against the ListingTable registered on the parquet files — column projection and predicate pushdown are handled by DataFusion's parquet reader.

Combining everything¶

{
  "columns": ["id", "state", "city", "severity", "start_time"],
  "predicates": [
    { "col": "state",             "op": "in",          "val": ["CA", "TX"] },
    { "col": "severity",          "op": "gte",         "val": 3            },
    { "col": "weather_condition", "op": "ilike",       "val": "%rain%"     },
    { "col": "start_time",        "op": "gte",         "val": "2022-01-01" },
    { "col": "end_lat",           "op": "is_not_null"                       }
  ],
  "page": 1,
  "page_size": 250
}