Commit 7d45e3e

Merge pull request #119 from lightdash/docs/how-lightdash-solves-sql-fanouts
docs: how lightdash solves sql fanouts
2 parents 7d68254 + 5eff875 commit 7d45e3e

1 file changed: +57 -1 lines changed

‎references/joins.mdx

Lines changed: 57 additions & 1 deletion
@@ -507,6 +507,63 @@ Specifying a `primary_key` and the join `relationship` allows Lightdash to:
Once you've included primary keys and a join relationship, Lightdash will add CTEs to the compiled SQL query that ensure metrics are not inflated.
</Info>

#### How Lightdash solves SQL fanouts

Lightdash uses a pattern of Common Table Expressions (CTEs) to solve the fanout problem. Here's how it works:

1. **cte_keys**: Contains the dimensions (such as `payment_method` or `order_id`) that define the grain of your final results, plus the primary keys. Any field you want to `GROUP BY` in your final output should be included here.
2. **cte_metrics**: Performs calculations on metrics while maintaining the correct grain established by the keys CTE. This prevents double-counting when aggregating across related tables.
3. **cte_unaffected**: Calculates all metrics that are not affected by fanouts. This includes metrics that ignore duplicates by definition (`MIN()`, `MAX()`, and `COUNT(DISTINCT)`) as well as metrics calculated on the table on the `many` side of a `one-to-many` or `many-to-one` join relationship. For example, if you have joined `accounts` to `deals` using a `one-to-many` join relationship, `SUM(deals.amount)` would be calculated in `cte_unaffected` because the `deals` data is not susceptible to fanouts.
4. **final**: Joins the metrics CTEs together to create the complete result set.

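To make the distinction between affected and unaffected metrics concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module. The `orders` and `payments` tables and their rows are made up for illustration, and this is not Lightdash's own code. It runs a naive join and shows which aggregates get inflated.

```python
import sqlite3

# Hypothetical toy data: order 1 has two payments, so it fans out after the join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT, amount INTEGER);
    CREATE TABLE payments (payment_id INTEGER PRIMARY KEY, order_id INTEGER, amount INTEGER);
    INSERT INTO orders VALUES (1, 'completed', 100), (2, 'completed', 50);
    INSERT INTO payments VALUES (10, 1, 60), (11, 1, 40), (12, 2, 50);
""")

# Naive query: aggregate directly over the joined rows.
naive = conn.execute("""
    SELECT
        SUM(orders.amount)              AS total_order_amount,   -- inflated: order 1 counted twice
        MAX(orders.amount)              AS max_order_amount,     -- unaffected by duplicates
        COUNT(DISTINCT orders.order_id) AS unique_order_count,   -- unaffected by duplicates
        SUM(payments.amount)            AS total_payment_amount  -- 'many' side: unaffected
    FROM payments
    LEFT JOIN orders ON orders.order_id = payments.order_id
""").fetchone()

# Ground truth for the order total, computed without any join.
true_total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

print("naive:", naive)            # (250, 100, 2, 150)
print("true order total:", true_total)  # 150
```

Only `SUM(orders.amount)` is wrong here (250 instead of 150), which is why it would need the keys and metrics CTEs, while the other three aggregates could be computed directly on the joined rows.
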
<Note>
Lightdash creates a separate `cte_keys` and `cte_metrics` for each table that contains metrics with fanouts. This is why you'll see names like `cte_keys_orders` and `cte_metrics_orders` in the example below, indicating they're specific to the `orders` table.
</Note>

##### Examples

```sql
-- Step 1: Create cte_keys, which determines the final grain of your results, i.e. whatever we will group by.
-- Exclude fields that are used in aggregations to calculate metrics, e.g. "orders".amount
WITH cte_keys_orders AS (
    SELECT DISTINCT
        "orders".status AS "orders_status", -- grouping dimension
        "orders".order_id AS "pk_order_id" -- primary key
    FROM "postgres"."jaffle"."payments" AS "payments"
    LEFT OUTER JOIN "postgres"."jaffle"."orders" AS "orders"
        ON ("orders".order_id) = ("payments".order_id)
),
-- Step 2: Calculate metrics that are affected by fanouts
cte_metrics_orders AS (
    SELECT
        cte_keys_orders."orders_status",
        SUM("orders".amount) AS "orders_total_order_amount" -- order metric (affected by fanout)
    FROM cte_keys_orders
    LEFT JOIN "postgres"."jaffle"."orders" AS "orders" ON cte_keys_orders."pk_order_id" = "orders".order_id -- join with primary keys
    GROUP BY 1 -- orders_status is the grouping dimension
),
-- Step 3: Calculate metrics that are not affected by fanouts
cte_unaffected AS (
    SELECT
        "orders".status AS "orders_status",
        COUNT(DISTINCT "payments".payment_id) AS "payments_unique_payment_count" -- payment metric (NOT affected by fanout)
    FROM "postgres"."jaffle"."payments" AS "payments"
    LEFT OUTER JOIN "postgres"."jaffle"."orders" AS "orders"
        ON ("orders".order_id) = ("payments".order_id)
    GROUP BY 1 -- orders_status is the grouping dimension
)
-- Step 4: Join the metrics CTEs together to create the final result with properly calculated metrics
SELECT
    cte_unaffected.*,
    cte_metrics_orders."orders_total_order_amount" AS "orders_total_order_amount"
FROM cte_unaffected
INNER JOIN cte_metrics_orders ON (
    -- NULL-safe match so rows with a NULL status still join
    cte_unaffected."orders_status" = cte_metrics_orders."orders_status" OR ( cte_unaffected."orders_status" IS NULL AND cte_metrics_orders."orders_status" IS NULL )
)
ORDER BY "orders_total_order_amount" DESC
LIMIT 500
```

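The four steps can be tried end to end on toy data. The following is a minimal sketch, assuming Python's standard-library `sqlite3` module and a hypothetical set of `orders` and `payments` rows. The query is adapted to SQLite (no database or schema prefixes) and is not the SQL that Lightdash compiles. One order has a `NULL` status so that the NULL-safe join condition in step 4 is exercised.

```python
import sqlite3

# Hypothetical toy data: orders 1 and 3 each have two payments (fanout),
# and order 3 has a NULL status.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT, amount INTEGER);
    CREATE TABLE payments (payment_id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO orders VALUES (1, 'completed', 100), (2, 'completed', 50), (3, NULL, 30);
    INSERT INTO payments VALUES (10, 1), (11, 1), (12, 2), (13, 3), (14, 3);
""")

QUERY = """
WITH cte_keys_orders AS (            -- Step 1: one row per (status, order primary key)
    SELECT DISTINCT
        orders.status   AS orders_status,
        orders.order_id AS pk_order_id
    FROM payments
    LEFT JOIN orders ON orders.order_id = payments.order_id
),
cte_metrics_orders AS (              -- Step 2: fanout-affected metric, computed per key
    SELECT
        cte_keys_orders.orders_status,
        SUM(orders.amount) AS orders_total_order_amount
    FROM cte_keys_orders
    LEFT JOIN orders ON cte_keys_orders.pk_order_id = orders.order_id
    GROUP BY 1
),
cte_unaffected AS (                  -- Step 3: metric that duplicates cannot inflate
    SELECT
        orders.status AS orders_status,
        COUNT(DISTINCT payments.payment_id) AS payments_unique_payment_count
    FROM payments
    LEFT JOIN orders ON orders.order_id = payments.order_id
    GROUP BY 1
)
SELECT                               -- Step 4: NULL-safe join of the metric CTEs
    cte_unaffected.orders_status,
    cte_unaffected.payments_unique_payment_count,
    cte_metrics_orders.orders_total_order_amount
FROM cte_unaffected
INNER JOIN cte_metrics_orders ON (
    cte_unaffected.orders_status = cte_metrics_orders.orders_status
    OR (cte_unaffected.orders_status IS NULL AND cte_metrics_orders.orders_status IS NULL)
)
ORDER BY orders_total_order_amount DESC
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With this data, the `completed` group reports an order total of 150 and the `NULL` group reports 30, each order counted once even though orders 1 and 3 appear twice in the joined rows. A plain `=` comparison in step 4 would drop the `NULL` group, because `NULL = NULL` does not evaluate to true in SQL.
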
### Known limitations

There are a few situations where Lightdash doesn't currently handle inflated metrics:
@@ -580,4 +637,3 @@ In this case, each row represents a real charge, and the total billing (TechCorp

**How it works:** When fanout handling is enabled for your Lightdash organization, Lightdash warns you about joins that will likely create fanouts. To remove these warnings and enable fanout protection, specify the join relationship and primary keys in your model.yaml file (details [here](#handling-fanouts)).
This keeps your setup clean and prevents inflated metrics.