added detailed instructions on join relationships #137

Merged
ZeRego merged 1 commit into main from improve-join-relationship-docs
Aug 15, 2025
127 changes: 121 additions & 6 deletions references/joins.mdx
@@ -3,6 +3,9 @@ title: "Joins reference"
description: "Joins let you connect different models to each other so that you can explore more than one model at the same time in Lightdash and see how different parts of your data relate to each other."
sidebarTitle: "Joins"
---
<Info>
**Performance Best Practice:** For optimal query performance, we recommend using wide tables wherever possible and minimising joins in the BI layer. While we offer advanced features like fanout protection to help with complex relationships, handling data transformations and complex logic directly in your SQL models will generally yield better performance than relying heavily on joins at query time. Consider pre-joining related data during your data modeling process rather than joining tables on-the-fly in dashboards and reports.
</Info>

## Adding joins in your models

@@ -155,7 +158,7 @@ A full join returns all rows when there is a match in either the left or right t

You can define the relationship between tables in your joins to help Lightdash show warnings and generate the appropriate SQL. This is especially useful for preventing the SQL fanout issues described in the [SQL fanouts](#sql-fanouts) section.

To define a relationship, add the `relationship` field to your join configuration:
To define a relationship, add the `relationship` field to your join configuration.

```yaml
models:
@@ -167,13 +170,125 @@ models:
sql_on: ${users.user_id} = ${orders.user_id}
relationship: one-to-many
```
<Warning>
Make sure that you consider the direction of the join when defining the relationship. If you incorrectly define the join relationship, your metrics may be affected by fanouts.
</Warning>

##### The following join relationships are supported:

- `one-to-many` - Starting table has 1 record, joined table has many matches
- `many-to-one` - Starting table has many records, joined table has 1 match
- `one-to-one` - Starting table has 1 record, joined table has 1 match
- `many-to-many` - Multiple records in the starting table match multiple records in the joined table

<Accordion title="Helpful Steps for Determining Join Relationships">
#### Step 1: Identify your starting table
Which table are you joining FROM? Direction matters: `Accounts` joining to `Users` (one-to-many) is completely different from `Users` joining to `Accounts` (many-to-one), even though it's the same data.

#### Step 2: Count the expected matches and name the join relationship
For any record in your starting table, ask: "How many matching records will I find in the table I'm joining to and vice versa?" Refer to the supported join relationships listed above.
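One way to answer the question above is to check whether the join key repeats in each table. The query below is a minimal sketch, assuming the `users` table and `account_id` column from the examples in this section. It compares the total row count with the number of distinct key values.

```sql
-- Sketch: is account_id unique in users?
-- Table and column names are taken from the examples in this section;
-- swap in your own. Note that count(distinct ...) ignores NULL keys.
select
    count(*) as total_rows,
    count(distinct account_id) as distinct_account_ids
from users;
```

If `total_rows` is larger than `distinct_account_ids`, the key repeats, so `users` is a "many" side for that key. If the two numbers are equal (and the key has no NULLs), it is a "one" side. Running the same check against the other table in the join gives you the second half of the relationship.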

The examples below detail some more complex join relationships:

##### Chained Join Example
Don't try to figure out `Accounts` → `Users` → `Tracks` all at once. Analyze each join separately:

- First: `Accounts` → `Users` (one-to-many)
- Then: `Users` → `Tracks` (one-to-many)
- Overall result: `Accounts` → `Tracks` (one-to-many)

The `accounts.yml` file will look like this:
``` yaml
version: 2
models:
  - name: accounts
    meta:
      primary_key: account_id
      description: List of all customer and prospective customer Accounts pulled from our CRM
      joins:
        - join: users
          relationship: one-to-many
          sql_on: ${accounts.account_id} = ${users.account_id}
          type: left
        - join: tracks
          relationship: one-to-many
          sql_on: ${users.user_id} = ${tracks.user_id}
          type: left
```
With this setup, Lightdash treats both `Accounts` and `Users` as susceptible to fanouts and handles them accordingly. When you chain two one-to-many relationships, you get a one-to-many relationship from your starting table to your final table (`Accounts` can have many `Tracks`).

Note that if you instead join `Users` and `Accounts` onto `Tracks`, with `Tracks` as the starting model, the direction of each relationship is reversed.

The `tracks.yml` model would look like this:
```yaml
version: 2
models:
  - name: tracks
    meta:
      primary_key: account_id
      description: List of all customer and prospective customer Accounts pulled from our CRM
      joins:
        - join: users
          relationship: many-to-one
          sql_on: ${users.user_id} = ${tracks.user_id}
          type: right
        - join: accounts
          relationship: many-to-one
          sql_on: ${users.account_id} = ${accounts.account_id}
          type: right
```
##### Complex Join Example
We want to see all Accounts and all Deals, but we only want to see Users (and their associated event tracks) for accounts that have at least one Deal in the 'Won' stage.

This requires a complex join that involves 4 different tables.

- First: `Accounts` → `Deals` (one-to-many)
- Next: `Accounts` and `Deals` → `Users` (many-to-many): each `Account` + `Deal` combination can be associated with many `Users`, and each `User` can be associated with multiple `Deals`.
- Then: `Users` → `Tracks` (one-to-many)

A normal SQL join that does not account for fanouts would look like this:
``` sql
select
    *
from
    accounts
left join deals on
    accounts.account_id = deals.account_id
left join users on
    accounts.account_id = users.account_id and deals.stage = 'Won'
left join tracks on
    users.user_id = tracks.user_id
```

And the `accounts.yml` would look like this:
``` yaml
models:
  - name: accounts
    meta:
      primary_key: account_id
      description: List of all customer and prospective customer Accounts pulled from our CRM
      joins:
        - join: deals
          relationship: one-to-many
          sql_on: ${accounts.account_id} = ${deals.account_id}
          type: left
        - join: users
          relationship: many-to-many
          sql_on: ${accounts.account_id} = ${users.account_id} and ${deals.stage} = 'Won'
          type: left
        - join: tracks
          relationship: one-to-many
          sql_on: ${users.user_id} = ${tracks.user_id}
          type: left
```
In this case, the fanout protection logic will consider metrics from all models to be susceptible to fanouts.

Supported values:
#### Step 3: Check for conditional joins
Look for any AND conditions in your join logic (like `and ${deals.stage} = 'Won'`). These can change your relationship from what you'd expect - a typical one-to-many might become many-to-many when you add conditions.
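To see whether a condition like this actually repeats rows, you can count how often each record from the joined table appears. The query below is a rough sketch that reuses the tables and join conditions from the complex join example above. It assumes `user_id` is unique in `users`, which this page does not state.

```sql
-- Sketch: list users that appear in more than one joined row.
-- Assumes users.user_id is unique in the users table.
select
    users.user_id,
    count(*) as rows_per_user
from accounts
left join deals
    on accounts.account_id = deals.account_id
left join users
    on accounts.account_id = users.account_id
    and deals.stage = 'Won'
where users.user_id is not null
group by users.user_id
having count(*) > 1
order by rows_per_user desc;
```

Any rows returned are users repeated once per 'Won' deal on their account, which suggests the join behaves as many-to-many rather than one-to-many.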

- `one-to-many`
- `many-to-one`
- `one-to-one`
- `many-to-many`
#### Step 4: Validate with sample data
Pick one record from your starting table and manually trace through the joins. Count how many final records you get - this helps catch relationship mistakes before they cause problems.
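The manual trace can also be approximated with a query. The example below is a sketch based on the chained `Accounts` → `Users` → `Tracks` example. The `account_id` value is a placeholder, so replace it with a value (and type) that exists in your data.

```sql
-- Sketch: trace a single account through the chained joins.
-- The literal 1 is a placeholder account_id, not a value from this page.
select
    accounts.account_id,
    count(distinct users.user_id) as distinct_users,
    count(*) as joined_rows
from accounts
left join users
    on accounts.account_id = users.account_id
left join tracks
    on users.user_id = tracks.user_id
where accounts.account_id = 1
group by accounts.account_id;
```

If `joined_rows` is greater than 1, the single account row is repeated after the joins, which is consistent with a one-to-many relationship from `Accounts`. If `joined_rows` is also greater than `distinct_users`, the user rows are repeated as well, as expected for `Users` → `Tracks`.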
</Accordion>

## Always join a table
