-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Description
Problem
I'm using Cube.js with a ClickHouse data source and need to support dynamic filtering on an array column (Project IDs of type Array(Int32)) via the GraphQL API. Users should be able to provide an arbitrary list of values (e.g., [123, 456]) to filter rows where the array contains any of those values (OR semantics).
Cube.js doesn't natively support array types or operators like contains for arrays, so standard dimension filters append = ? to the WHERE clause, which breaks array functions (e.g., hasAny). I've tried several workarounds, but each has trade-offs (detailed below). I'm looking for:
Better approaches or best practices for this use case.
Confirmation if the "formatted string" method is acceptable (it works but feels hacky).
Any plans for native array support (e.g., custom operators like arrayContains)?
Attempts and Issues
-
Unnest with arrayJoin:
javascriptproject_id_element: {
sql:arrayJoin(${CUBE}."Project IDs"),
type:number
}
Issue: Filtering by multiple values duplicates measures (e.g., count) when multiple values match in one row. -
Direct Array Function:
javascripthas_project_id: {
sql:arrayExists(x -> x = {filterValue}, ${CUBE}."Project IDs"),
type:boolean
}
Issue: Cube appends = ? to filters, breaking arrayExists or hasAny. -
Using FILTER_PARAMS in Segment:
javascriptsegments: {
project_filter: { sql:hasAny(${CUBE}."Project IDs", ${FILTER_PARAMS.project_ids})}
}
Issue: Still gets wrapped in = ? or is hard to maintain with dynamic values. -
Format Array as String (Current Solution):
javascriptproject_id_filter: {
type:string,
sql:arrayStringConcat(arrayMap(x -> concat('$', toString(x), '$'), ${CUBE}."Project IDs"), ',')
}
Works: Filters with contains on ["123ドル$", "456ドル$"] produce WHERE ... LIKE '%123ドル$%' OR ... LIKE '%456ドル$%'.
Concerns: Brittle (client must format$x$ ), potential performance issues with string ops, edge cases (empty arrays).
Related Cube.js schema
cube(`violations`, { sql_table: `violations`, data_source: `default`, measures: { // Assuming some measures like count count: { type: `count`, }, }, dimensions: { id: { sql: `${CUBE}."ID"`, type: `string`, primary_key: true }, // Note: "Project IDs" is the array column (Array(Int32)) project_id: { type: `number`, sql: `${CUBE}."Project IDs"`, // This doesn't work for filtering as-is }, has_project_id: { sql: `arrayExists(x -> x = {filterValue}, ${CUBE}."Project IDs")`, type: `boolean`, }, project_id_element: { sql: `arrayJoin(${CUBE}."Project IDs")`, type: `number`, // Or string }, project_id_filter: { type: `string`, sql: `arrayStringConcat(arrayMap(x -> concat('$', toString(x), '$'), ${CUBE}."Project IDs"), ',')`, }, } });``` **Related Cube.js generated SQL** ```sql SELECT someOtherDim, COUNT(*) AS count FROM violations WHERE hasAny("Project IDs", [123, 456]) -- OR equivalent with arrayExists GROUP BY someOtherDim;