I'm designing a schema with a many-to-many relationship:
a lead can have several stages, and each stage can have several leads. I currently have three tables, one of which is a junction table.
stages
----------
id | name | workflow_id | created_at | updated_at
leads
----------
id | name | created_at | updated_at
stage_leads
--------------
id | stage_id | lead_id | created_at | updated_at
(stages belong to a workflow)
I've got indices set up.
When a lead moves from one stage to another, a new entry is created in the stage_leads table. My use cases seem simple:
a) find/count all leads on a particular stage, and
b) find/count all stages a particular lead has gone through
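Both use cases are single-index lookups on the junction table. As a minimal, runnable sketch (using SQLite in place of Postgres; the table and column names mirror the schema above, and the sample rows are made up for illustration):

```python
import sqlite3

# Hypothetical minimal version of the three tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stages (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE leads  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE stage_leads (
    id INTEGER PRIMARY KEY,
    stage_id INTEGER REFERENCES stages(id),
    lead_id  INTEGER REFERENCES leads(id),
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- The indexes that make both lookups cheap:
CREATE INDEX idx_sl_stage ON stage_leads (stage_id);
CREATE INDEX idx_sl_lead  ON stage_leads (lead_id);

INSERT INTO stages (id, name) VALUES (1, 'new'), (2, 'qualified');
INSERT INTO leads  (id, name) VALUES (10, 'Acme'), (11, 'Globex');
INSERT INTO stage_leads (stage_id, lead_id) VALUES (1, 10), (2, 10), (1, 11);
""")

# a) count all leads recorded on a particular stage
n_on_stage_1 = conn.execute(
    "SELECT COUNT(DISTINCT lead_id) FROM stage_leads WHERE stage_id = 1"
).fetchone()[0]

# b) find all stages a particular lead has gone through, in order
stages_for_lead = [row[0] for row in conn.execute(
    """SELECT s.name
       FROM stage_leads sl
       JOIN stages s ON s.id = sl.stage_id
       WHERE sl.lead_id = 10
       ORDER BY sl.id"""
)]

print(n_on_stage_1)     # 2
print(stages_for_lead)  # ['new', 'qualified']
```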
My concern is that since every move creates a new entry in the junction table -- and a lead can potentially go through hundreds of stages -- even a small number of leads could produce hundreds of thousands of rows, which might hurt lookup times when I'm trying to fetch a specific set.
My question is this: would it be better to use Postgres' JSON functionality to store my data something like this:
stages
----------
id | name | workflow_id | created_at | updated_at
leads
----------
id | name | current_stage_id | stages | created_at | updated_at
... where the stages column on "leads" is a JSON object that logs the stage_ids that the lead has gone through, the timestamps, and a couple of other fields?
This seems to satisfy my use case of finding all leads on a particular stage (I can query against the current_stage_id of all leads), and it probably makes retrieving the stages for a particular lead easier. I'm not very confident about it, though. I realize I'd be giving up things like foreign-key constraints on the stage history.
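For comparison, here is a sketch of how the JSON variant would be queried (again using SQLite to stand in for Postgres, assuming its bundled JSON functions are available; `json_each` plays the role of Postgres's `jsonb_array_elements`, and the column layout follows the second schema above). Finding leads *currently* on a stage stays cheap, but finding leads that have *ever* been on a stage means unpacking the JSON array for every row:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE leads (id INTEGER PRIMARY KEY, name TEXT,"
    " current_stage_id INTEGER, stages TEXT)"
)
# Hypothetical stage history logged as a JSON array on the lead row.
history = json.dumps([
    {"stage_id": 1, "entered_at": "2014年11月01日"},
    {"stage_id": 2, "entered_at": "2014年11月03日"},
])
conn.execute("INSERT INTO leads VALUES (10, 'Acme', 2, ?)", (history,))

# Easy: leads currently on stage 2 (plain b-tree-indexable column).
current = conn.execute(
    "SELECT COUNT(*) FROM leads WHERE current_stage_id = 2"
).fetchone()[0]

# Harder: leads that have EVER been on stage 1 -- every row's JSON
# array must be expanded and inspected.
ever = conn.execute(
    """SELECT COUNT(DISTINCT l.id)
       FROM leads l, json_each(l.stages) j
       WHERE json_extract(j.value, '$.stage_id') = 1"""
).fetchone()[0]

print(current, ever)  # 1 1
```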
Are there any downsides to an approach like this?
1 Answer
> would it be better to use Postgres' JSON functionality to store my data something like this:
No.
> Are there any downsides to an approach like this?
GIN / GiST indexes on jsonb are bigger and slower than b-tree indexes. Additionally, whenever you update a field the whole jsonb value must be rewritten; you can't just insert an extra element into the JSON document in place.
> My concern is that since every move is creating a new entry into the junction table -- and there are potentially hundreds of stages a lead can go through -- it may lead to hundreds of thousands of rows for very few leads and that could probably impact the lookup time when I'm trying to fetch a specific set.
This is what RDBMSes are good at.
Use the conventional relational modelling for this unless you have a really good reason not to.
Thank you so much! I was thinking along these lines; I just wasn't very sure what would have been the right approach. :) – swastik, Nov 6, 2014 at 10:24