Once you've included primary keys and a join relationship, Lightdash will add CTEs to the compiled SQL query that ensure metrics are not inflated.
</Info>

#### How Lightdash solves SQL fanouts

Lightdash uses a pattern of Common Table Expressions (CTEs) to solve the fanout problem. Here's how it works:

1. **cte_keys**: Selects the distinct combinations of primary keys and the dimensions that define the grain of your final results (for example, the grouping dimension `orders.status` and the primary key `orders.order_id`). Any field you want to `GROUP BY` in your final output should be included here.
2. **cte_metrics**: Joins the distinct keys from `cte_keys` back to the source table on its primary key, so each row is aggregated only once at the grain established by `cte_keys`. This prevents double-counting when aggregating across related tables.
3. **cte_unaffected**: Calculates all metrics that are not affected by fanouts. This includes metrics whose results don't change when rows are duplicated (e.g. `MIN()`, `MAX()` and `COUNT(DISTINCT)`), as well as metrics calculated on the table on the `many` side of a `one-to-many` or `many-to-one` join relationship. For example, if you have joined `accounts` to `deals` using a `one-to-many` join relationship, `SUM(deals.amount)` is calculated in `cte_unaffected` because the join does not duplicate `deals` rows.
4. **final**: Joins the metrics CTEs together on the grouping dimensions to create the complete result set.

<Note>
Lightdash creates a separate `cte_keys` and `cte_metrics` for each table that contains metrics affected by fanouts. This is why you'll see names like `cte_keys_orders` and `cte_metrics_orders` in the example below: they're specific to the `orders` table.
</Note>
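The split in step 3 between metrics that are and aren't affected by fanouts can be checked on a small dataset. The sketch below uses Python's standard `sqlite3` module with a hypothetical, in-memory `orders`/`payments` dataset (the tables and values are made up for illustration, not taken from Lightdash). Joining `payments` to `orders` repeats order 1 once per payment, which inflates `SUM(orders.amount)` but leaves `MAX()`, `COUNT(DISTINCT)` and the many-side `SUM(payments.amount)` correct.

```python
import sqlite3

# Hypothetical toy data: order 1 has two payments, order 2 has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT, amount INTEGER);
    CREATE TABLE payments (payment_id INTEGER PRIMARY KEY, order_id INTEGER, amount INTEGER);
    INSERT INTO orders VALUES (1, 'completed', 100), (2, 'completed', 50);
    INSERT INTO payments VALUES (10, 1, 60), (11, 1, 40), (12, 2, 50);
""")

# Aggregate over the joined rows: each order row appears once per matching payment.
row = conn.execute("""
    SELECT
        SUM(orders.amount),               -- one-side SUM: inflated by the fanout
        MAX(orders.amount),               -- MAX: duplicates don't change it
        COUNT(DISTINCT orders.order_id),  -- COUNT(DISTINCT): duplicates don't change it
        SUM(payments.amount)              -- many-side SUM: payment rows aren't duplicated
    FROM payments
    LEFT OUTER JOIN orders ON orders.order_id = payments.order_id
""").fetchone()

# Ground truth for the one-side metric, computed on the orders table alone.
true_order_total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

print(row)               # (250, 100, 2, 150)
print(true_order_total)  # 150
```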
522
+
523
+
##### Examples
524
+
525
+
```sql
526
+
# Step 1: Create cte_keys that determine the final grain of your results i.e. whatever we will group by.
527
+
# Exclude fields that we use in aggregations to calculate metrics e.g. `"orders".amount`
528
+
WITH cte_keys_orders AS (
529
+
SELECT DISTINCT
530
+
"orders".status AS "orders_status", -- grouping dimension
531
+
"orders".order_id AS "pk_order_id" -- primary key
532
+
FROM "postgres"."jaffle"."payments" AS "payments"
533
+
LEFT OUTER JOIN "postgres"."jaffle"."orders" AS "orders"
534
+
ON ("orders".order_id) = ("payments".order_id)
535
+
),
536
+
# Step 2: Calculate metrics that are affected by fanouts
537
+
cte_metrics_orders AS (
538
+
SELECT
539
+
cte_keys_orders."orders_status",
540
+
SUM("orders".amount) AS "orders_total_order_amount" -- order metric (affected by fanout)
541
+
FROM cte_keys_orders
542
+
LEFT JOIN "postgres"."jaffle"."orders" AS "orders" ON cte_keys_orders."pk_order_id" = "orders".order_id -- join with primary keys
543
+
GROUP BY 1 -- Note orders_status are grouping dimensions
544
+
),
545
+
# Step 3: Calculate metrics that are not affected by fanouts
546
+
cte_unaffected AS (
547
+
SELECT
548
+
"orders".status AS "orders_status",
549
+
COUNT(DISTINCT "payments".payment_id) AS "payments_unique_payment_count" -- payment metric (NOT affected by fanout)
550
+
FROM "postgres"."jaffle"."payments" AS "payments"
551
+
LEFT OUTER JOIN "postgres"."jaffle"."orders" AS "orders"
552
+
ON ("orders".order_id) = ("payments".order_id)
553
+
GROUP BY 1 -- Note orders_status are grouping dimensions
554
+
)
555
+
# Step 4: Join the metrics CTEs together to create the final result with properly calculated metrics
556
+
SELECT
557
+
cte_unaffected.*,
558
+
cte_metrics_orders."orders_total_order_amount" AS "orders_total_order_amount"
559
+
FROM cte_unaffected
560
+
INNER JOIN cte_metrics_orders ON (
561
+
cte_unaffected."orders_status" = cte_metrics_orders."orders_status" OR ( cte_unaffected."orders_status" IS NULL AND cte_metrics_orders."orders_status" IS NULL )
562
+
)
563
+
ORDER BY "orders_total_order_amount" DESC
564
+
LIMIT 500
565
+
```
566
+
510
567
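To check that these four steps return un-inflated totals, the sketch below runs a simplified copy of the example query against an in-memory SQLite database using Python's standard `sqlite3` module, next to a naive single-join query. The `orders`/`payments` rows are hypothetical toy data, and the `"postgres"."jaffle"` prefixes are dropped because SQLite doesn't use them; it illustrates the pattern rather than reproducing Lightdash's compiled output.

```python
import sqlite3

# Hypothetical toy data standing in for the jaffle tables: order 1 has two payments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT, amount INTEGER);
    CREATE TABLE payments (payment_id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO orders VALUES (1, 'completed', 100), (2, 'completed', 50), (3, 'returned', 30);
    INSERT INTO payments VALUES (10, 1), (11, 1), (12, 2), (13, 3);
""")

# Naive query: SUM(orders.amount) is computed over the fanned-out join.
NAIVE_SQL = """
SELECT
  "orders".status AS "orders_status",
  COUNT(DISTINCT "payments".payment_id) AS "payments_unique_payment_count",
  SUM("orders".amount) AS "orders_total_order_amount"
FROM payments AS "payments"
LEFT OUTER JOIN orders AS "orders" ON "orders".order_id = "payments".order_id
GROUP BY 1
ORDER BY "orders_total_order_amount" DESC
"""

# The four-step pattern from the example, without the schema prefixes.
CTE_SQL = """
WITH cte_keys_orders AS (
  SELECT DISTINCT "orders".status AS "orders_status", "orders".order_id AS "pk_order_id"
  FROM payments AS "payments"
  LEFT OUTER JOIN orders AS "orders" ON "orders".order_id = "payments".order_id
),
cte_metrics_orders AS (
  SELECT cte_keys_orders."orders_status", SUM("orders".amount) AS "orders_total_order_amount"
  FROM cte_keys_orders
  LEFT JOIN orders AS "orders" ON cte_keys_orders."pk_order_id" = "orders".order_id
  GROUP BY 1
),
cte_unaffected AS (
  SELECT "orders".status AS "orders_status",
         COUNT(DISTINCT "payments".payment_id) AS "payments_unique_payment_count"
  FROM payments AS "payments"
  LEFT OUTER JOIN orders AS "orders" ON "orders".order_id = "payments".order_id
  GROUP BY 1
)
SELECT cte_unaffected.*,
       cte_metrics_orders."orders_total_order_amount" AS "orders_total_order_amount"
FROM cte_unaffected
INNER JOIN cte_metrics_orders ON (
  cte_unaffected."orders_status" = cte_metrics_orders."orders_status"
  OR (cte_unaffected."orders_status" IS NULL AND cte_metrics_orders."orders_status" IS NULL)
)
ORDER BY "orders_total_order_amount" DESC
LIMIT 500
"""

naive = conn.execute(NAIVE_SQL).fetchall()
fixed = conn.execute(CTE_SQL).fetchall()
print(naive)  # [('completed', 3, 250), ('returned', 1, 30)]
print(fixed)  # [('completed', 3, 150), ('returned', 1, 30)]
```

Only the `completed` total differs: the naive join counts order 1's amount once per payment, while `cte_keys_orders` reduces the rows to one per `order_id` before `cte_metrics_orders` sums them.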
### Known limitations

There are a few situations where Lightdash doesn't currently handle inflated metrics:
**How it works:** When fanout handling is enabled for your Lightdash organization, Lightdash warns you about joins that are likely to create fanouts. To remove these warnings and enable fanout protection, specify the join relationship and primary keys in your model.yaml file (see [Handling fanouts](#handling-fanouts)).
This keeps your setup clean and prevents inflated metrics.