Commit ae82798

Merge pull request #137 from lightdash/improve-join-relationship-docs
added detailed instructions on join relationships
2 parents 284b3a6 + 0eaad13 commit ae82798

1 file changed

+121
-6
lines changed

‎references/joins.mdx

Lines changed: 121 additions & 6 deletions
Original file line number | Diff line number | Diff line change
@@ -3,6 +3,9 @@ title: "Joins reference"
33
description: "Joins let you connect different models to each other so that you can explore more than one model at the same time in Lightdash and see how different parts of your data relate to each other."
44
sidebarTitle: "Joins"
55
---
6+
<Info>
7+
**Performance Best Practice:** For optimal query performance, we recommend using wide tables wherever possible and minimising joins in the BI layer. While we offer advanced features like fanout protection to help with complex relationships, handling data transformations and complex logic directly in your SQL models will generally yield better performance than relying heavily on joins at query time. Consider pre-joining related data during your data modeling process rather than joining tables on-the-fly in dashboards and reports.
8+
</Info>
69

710
## Adding joins in your models
811

@@ -155,7 +158,7 @@ A full join returns all rows when there is a match in either the left or right t
155158

156159
You can define the relationship between tables in your joins to help Lightdash show warnings and generate the appropriate SQL. This is especially useful for preventing the SQL fanout issues described in the [SQL fanouts](#sql-fanouts) section.
157160

158-
To define a relationship, add the `relationship` field to your join configuration:
161+
To define a relationship, add the `relationship` field to your join configuration.
159162

160163
```yaml
161164
models:
@@ -167,13 +170,125 @@ models:
167170
sql_on: ${users.user_id} = ${orders.user_id}
168171
relationship: one-to-many
169172
```
173+
<Warning>
174+
Make sure that you consider the direction of the join when defining the relationship. If you incorrectly define the join relationship, your results will be affected by fanouts.
175+
</Warning>
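To make the warning concrete, here is a minimal sketch of a fanout using Python's built-in `sqlite3` module. The `users` and `orders` tables mirror the join above, but the `credit` and `amount` columns and all sample rows are assumptions made up for illustration; this shows the underlying SQL behaviour, not how Lightdash generates its queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables and rows, for illustration only.
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, credit INTEGER);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER);
    INSERT INTO users VALUES (1, 100), (2, 50);
    INSERT INTO orders VALUES (10, 1, 20), (11, 1, 30), (12, 1, 40), (13, 2, 5);
""")

# Summing a user-level column on its own gives the true total.
true_total = conn.execute("SELECT SUM(credit) FROM users").fetchone()[0]

# After a one-to-many join, each user row repeats once per matching order,
# so the same sum counts user 1 three times: a fanout.
fanned_total = conn.execute("""
    SELECT SUM(users.credit)
    FROM users
    LEFT JOIN orders ON users.user_id = orders.user_id
""").fetchone()[0]

print(true_total, fanned_total)  # 150 350
```

If this join were declared as `one-to-one`, nothing would signal that `users` metrics need protecting, which is why the direction and cardinality of the relationship matter.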
176+
177+
##### The following join relationships are supported:
178+
179+
- `one-to-many` - Starting table has 1 record, joined table has many matches
180+
- `many-to-one` - Starting table has many records, joined table has 1 match
181+
- `one-to-one` - Starting table has 1 record, joined table has 1 match
182+
- `many-to-many` - Multiple records in the starting table match multiple records in the joined table
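
The four definitions above can be checked against sample data. The sketch below is a hypothetical helper (`classify_relationship` is not part of Lightdash) that names the relationship from the join-key values found in each table. It can only report what the sample shows, so a result of `one-to-one` on a small sample does not guarantee the full tables behave the same way.

```python
from collections import Counter


def classify_relationship(start_keys, joined_keys):
    """Name the relationship from a starting table to a joined table.

    Each argument is the list of join-key values in that table, one per row.
    """
    start_counts = Counter(start_keys)
    joined_counts = Counter(joined_keys)
    shared = set(start_counts) & set(joined_counts)
    # A side is "many" when some matching key appears on more than one row there.
    start_many = any(start_counts[key] > 1 for key in shared)
    joined_many = any(joined_counts[key] > 1 for key in shared)
    left = "many" if start_many else "one"
    right = "many" if joined_many else "one"
    return f"{left}-to-{right}"


# Hypothetical sample: account_id values in accounts, then in users.
print(classify_relationship([1, 2], [1, 1, 2]))  # one-to-many
print(classify_relationship([1, 1, 2], [1, 2]))  # many-to-one
```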
183+
184+
<Accordion title="Helpful Steps for Determining Join Relationships">
185+
#### Step 1: Identify your starting table
186+
Which table are you joining FROM? Direction matters: `Accounts` joining to `Users` (one-to-many) is completely different from `Users` joining to `Accounts` (many-to-one), even though it's the same data.
187+
188+
#### Step 2: Count the expected matches and name the join relationship
189+
For any record in your starting table, ask: "How many matching records will I find in the table I'm joining to, and vice versa?" Then pick the matching name from the supported join relationships listed above.
190+
191+
The examples below detail some more complex join relationships:
192+
193+
##### Chained Join Example
194+
Don't try to figure out `Accounts` → `Users` → `Tracks` all at once. Analyze each join separately:
195+
196+
- First: `Accounts` → `Users` (one-to-many)
197+
- Then: `Users` → `Tracks` (one-to-many)
198+
- Overall result: `Accounts` → `Tracks` (one-to-many)
199+
200+
The `accounts.yml` file will look like this:
201+
```yaml
version: 2
models:
  - name: accounts
    meta:
      primary_key: account_id
      description: List of all customer and prospective customer Accounts pulled from our CRM
      joins:
        - join: users
          relationship: one-to-many
          sql_on: ${accounts.account_id} = ${users.account_id}
          type: left
        - join: tracks
          relationship: one-to-many
          sql_on: ${users.user_id} = ${tracks.user_id}
          type: left
```
218+
With this setup, Lightdash treats both `Accounts` and `Users` as susceptible to fanouts and handles them accordingly. When you chain two one-to-many relationships, you get a one-to-many relationship from your starting table to your final table (`Accounts` can have many `Tracks`).
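
The chaining rule can be written as a small function. This is a sketch under the assumption that each side of the overall relationship becomes "many" as soon as either hop is "many" on that side; `compose` is a hypothetical helper, and only the one-to-many case is stated in the text above.

```python
def compose(first, second):
    """Combine two consecutive join relationships into the overall one."""
    first_left, first_right = first.split("-to-")
    second_left, second_right = second.split("-to-")
    # A side is "many" overall if either hop is "many" on that side.
    left = "many" if "many" in (first_left, second_left) else "one"
    right = "many" if "many" in (first_right, second_right) else "one"
    return f"{left}-to-{right}"


# Accounts -> Users -> Tracks, as in the chained example.
print(compose("one-to-many", "one-to-many"))  # one-to-many
# The same chain walked from Tracks back to Accounts.
print(compose("many-to-one", "many-to-one"))  # many-to-one
```

The result is the widest relationship the chain allows; the actual data may turn out narrower.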
219+
220+
Note that if you wanted to join `Users` and `Accounts` onto `Tracks`, with `Tracks` as the starting model, the direction of each relationship would be reversed:
221+
222+
The `tracks.yml` model would look like this:
223+
```yaml
version: 2
models:
  - name: tracks
    meta:
      primary_key: account_id
      description: List of all customer and prospective customer Accounts pulled from our CRM
      joins:
        - join: users
          relationship: many-to-one
          sql_on: ${users.user_id} = ${tracks.user_id}
          type: right
        - join: accounts
          relationship: many-to-one
          sql_on: ${users.account_id} = ${accounts.account_id}
          type: right
```
240+
##### Complex Join Example
241+
We want to see all Accounts and all Deals, but we only want to see Users (and their associated event tracks) for accounts that have at least one Deal in the 'Won' stage.
242+
243+
This requires a complex join that involves 4 different tables.
244+
245+
- First: `Accounts` → `Deals` (one-to-many)
246+
- Next: `Accounts` and `Deals` → `Users` (many-to-many): each `Account` + `Deal` combination can be associated with many `Users`, and each `User` can be associated with multiple `Deals`.
247+
- Then: `Users` → `Tracks` (one-to-many)
248+
249+
A normal SQL join that does not account for fanouts would look like this:
250+
```sql
select
  *
from
  accounts
  left join deals on
    accounts.account_id = deals.account_id
  left join users on
    accounts.account_id = users.account_id and deals.stage = 'Won'
  left join tracks on
    users.user_id = tracks.user_id
```
262+
263+
And the `accounts.yml` would look like this:
264+
```yaml
models:
  - name: accounts
    meta:
      primary_key: account_id
      description: List of all customer and prospective customer Accounts pulled from our CRM
      joins:
        - join: deals
          relationship: one-to-many
          sql_on: ${accounts.account_id} = ${deals.account_id}
          type: left
        - join: users
          relationship: many-to-many
          sql_on: ${accounts.account_id} = ${users.account_id} and ${deals.stage} = 'Won'
          type: left
        - join: tracks
          relationship: one-to-many
          sql_on: ${users.user_id} = ${tracks.user_id}
          type: left
```
284+
In this case, the fanout protection logic will consider metrics from all models to be susceptible to fanouts.
170285

171-
Supported values:
286+
#### Step 3: Check for conditional joins
287+
Look for any `AND` conditions in your join logic (like `and ${deals.stage} = 'Won'`). These can change the relationship from what you'd expect: a typical one-to-many might become many-to-many when you add conditions.
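
As a rough illustration of why the extra condition matters, the sketch below runs the first two joins from the Complex Join Example in an in-memory SQLite database. The table layouts and sample rows are assumptions made up for this example; only the join conditions come from the text above.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical rows: one account with two 'Won' deals, one 'Lost' deal, two users.
    CREATE TABLE accounts (account_id INTEGER PRIMARY KEY);
    CREATE TABLE deals (deal_id INTEGER PRIMARY KEY, account_id INTEGER, stage TEXT);
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, account_id INTEGER);
    INSERT INTO accounts VALUES (1);
    INSERT INTO deals VALUES (100, 1, 'Won'), (101, 1, 'Won'), (102, 1, 'Lost');
    INSERT INTO users VALUES (7, 1), (8, 1);
""")

rows = conn.execute("""
    SELECT deals.deal_id, users.user_id
    FROM accounts
    LEFT JOIN deals ON accounts.account_id = deals.account_id
    LEFT JOIN users ON accounts.account_id = users.account_id
        AND deals.stage = 'Won'
    ORDER BY deals.deal_id, users.user_id
""").fetchall()

# Count matches in both directions, ignoring the unmatched 'Lost' deal.
users_per_deal = Counter(deal for deal, user in rows if user is not None)
deals_per_user = Counter(user for deal, user in rows if user is not None)

print(rows)
print(dict(users_per_deal), dict(deals_per_user))
```

Each 'Won' deal row matches both users, and each user appears once per 'Won' deal, so both sides are "many". The 'Lost' deal still appears once with no user attached.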
172288

173-
- `one-to-many`
174-
- `many-to-one`
175-
- `one-to-one`
176-
- `many-to-many`
289+
#### Step 4: Validate with sample data
290+
Pick one record from your starting table and manually trace through the joins. Count how many final records you get - this helps catch relationship mistakes before they cause problems.
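
One way to do this tracing is to count rows for a single starting record after each join. The sketch below uses an in-memory SQLite database with made-up `accounts`, `users`, and `tracks` rows; `rows_after_each_join` is a hypothetical helper, and in practice you would run the equivalent counts against your own warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample rows, for illustration only.
    CREATE TABLE accounts (account_id INTEGER PRIMARY KEY);
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, account_id INTEGER);
    CREATE TABLE tracks (track_id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO accounts VALUES (1), (2);
    INSERT INTO users VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO tracks VALUES (100, 10), (101, 10), (102, 11), (103, 12);
""")


def rows_after_each_join(account_id):
    """Return row counts for one account: alone, joined to users, then to tracks."""
    alone = conn.execute(
        "SELECT COUNT(*) FROM accounts WHERE account_id = ?", (account_id,)
    ).fetchone()[0]
    with_users = conn.execute(
        """
        SELECT COUNT(*)
        FROM accounts
        LEFT JOIN users ON accounts.account_id = users.account_id
        WHERE accounts.account_id = ?
        """,
        (account_id,),
    ).fetchone()[0]
    with_tracks = conn.execute(
        """
        SELECT COUNT(*)
        FROM accounts
        LEFT JOIN users ON accounts.account_id = users.account_id
        LEFT JOIN tracks ON users.user_id = tracks.user_id
        WHERE accounts.account_id = ?
        """,
        (account_id,),
    ).fetchone()[0]
    return alone, with_users, with_tracks


print(rows_after_each_join(1))  # (1, 2, 3)
```

A count that grows after a join means the starting record fans out at that step, so the relationship for that join should have "many" on the joined side.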
291+
</Accordion>
177292

178293
## Always join a table
179294
