We have a table with the columns
id|school_id|parent_ids
where parent_ids is an array of ids.
If we don't have the school_id and only have a parent_id to search for, the query has to look inside the parent_ids array of every row in the table. There might be thousands of rows, and the parent_id might actually be in just a few of them.
Could using IN in a query on the array column be a performance bottleneck in this case?
EDIT
Here is the dump of table structure:
-- ----------------------------
-- Table structure for schools_messages
-- ----------------------------
DROP TABLE IF EXISTS "public"."schools_messages";
CREATE TABLE "public"."schools_messages" (
"id" int4 NOT NULL DEFAULT nextval('schools_messages_id_seq'::regclass),
"message" jsonb NOT NULL DEFAULT '[]'::jsonb,
"details" jsonb NOT NULL DEFAULT '[]'::jsonb,
"school_id" int4 NOT NULL,
"created_at" timestamp(0),
"updated_at" timestamp(0),
"parents_ids" int4[] DEFAULT ARRAY[]::integer[]
)
;
ALTER TABLE "public"."schools_messages" OWNER TO "prod_schools";
-- ----------------------------
-- Primary Key structure for table schools_messages
-- ----------------------------
ALTER TABLE "public"."schools_messages" ADD CONSTRAINT "schools_messages_pkey" PRIMARY KEY ("id");
-- ----------------------------
-- Foreign Keys structure for table schools_messages
-- ----------------------------
ALTER TABLE "public"."schools_messages" ADD CONSTRAINT "schools_messages_school_id_foreign" FOREIGN KEY ("school_id") REFERENCES "public"."trk_schools" ("id") ON DELETE CASCADE ON UPDATE NO ACTION;
1 Answer
I agree with Jack that your schema needs help, but you can still do this. Here we do it with a single index, using two contrib extensions that ship with PostgreSQL, intarray and btree_gist:
CREATE EXTENSION intarray;
CREATE EXTENSION btree_gist;
CREATE INDEX ON public.schools_messages
USING gist(school_id, parents_ids gist__int_ops);
VACUUM ANALYZE public.schools_messages;
SELECT *
FROM public.schools_messages
WHERE school_id = 42
OR parents_ids @> ARRAY[42];
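For the case in the question where only the parent id is known, a GIN index on the array column alone may be another option. What follows is a minimal sketch under stated assumptions, not a tested recommendation. The index name is made up for illustration, and the default operator class shown here assumes the intarray extension is not installed. If intarray is installed, the @> operator on int4[] resolves to intarray's version, and the index would need to be declared as gin (parents_ids gin__int_ops) to be usable. An index of this kind serves array operators such as @> and &&, but not a predicate written as 42 = ANY(parents_ids).

```sql
-- Hypothetical index name; uses the default GIN operator class for arrays.
-- With intarray installed, declare it as: USING gin (parents_ids gin__int_ops)
CREATE INDEX schools_messages_parents_ids_gin
    ON public.schools_messages
    USING gin (parents_ids);

-- Find every message sent to parent 42, regardless of school.
SELECT *
FROM public.schools_messages
WHERE parents_ids @> ARRAY[42];

-- Check whether the index is actually used and how many buffers are read.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM public.schools_messages
WHERE parents_ids @> ARRAY[42];
```

Whether this performs better than the GiST index depends on the data and the write load, so it is worth comparing the two plans on a copy of the real table.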
-
Would this be efficient with thousands of rows, where each row's array holds thousands of integers? It seems like searching in a 2D array. – simo, Jan 25, 2018 at 8:52
-
Depends on what you mean by "efficient". It's better than anything else except changing the schema. – Evan Carroll, Jan 25, 2018 at 9:02
-
Absolutely, it's better than anything else, but changing the schema is an option too, and I need a recommendation. I thought about adding an array to the parent table that holds foreign ids pointing to the schools_messages table. That way, with the parent_id, I can get all the messages a parent is involved in using one query. The downside is that when adding a new message, I will have to add its id to every parent it was sent to. What do you think? – simo, Jan 25, 2018 at 9:08
-
@simo I think you need to ask another question and tag it with database-design =) – Evan Carroll, Jan 25, 2018 at 9:09
-
here: dba.stackexchange.com/questions/196207/… – simo, Jan 25, 2018 at 10:09
create table statement for the table in question (including all indexes), the query you are using and the execution plan generated using explain (analyze, buffers). Formatted text please, no screen shots. Edit your question. Do not post code in comments