Searching in Array performance?

Question 1

We have a table of

id|school_id|parent_ids

where parent_ids is an array of ids.

If we don't have the school_id and have only a parent_id to search for, the query has to search the parent_ids array in every row of the table. There might be thousands of rows, and the parent_id might appear in just a few of them.

Could using IN in a query on the array column be a performance bottleneck in this case?

EDIT

Here is the dump of table structure:

-- ----------------------------
-- Table structure for schools_messages
-- ----------------------------
DROP TABLE IF EXISTS "public"."schools_messages";
CREATE TABLE "public"."schools_messages" (
 "id" int4 NOT NULL DEFAULT nextval('schools_messages_id_seq'::regclass),
 "message" jsonb NOT NULL DEFAULT '[]'::jsonb,
 "details" jsonb NOT NULL DEFAULT '[]'::jsonb,
 "school_id" int4 NOT NULL,
 "created_at" timestamp(0),
 "updated_at" timestamp(0),
 "parents_ids" int4[] DEFAULT ARRAY[]::integer[]
)
;
ALTER TABLE "public"."schools_messages" OWNER TO "prod_schools";
-- ----------------------------
-- Primary Key structure for table schools_messages
-- ----------------------------
ALTER TABLE "public"."schools_messages" ADD CONSTRAINT "schools_messages_pkey" PRIMARY KEY ("id");
-- ----------------------------
-- Foreign Keys structure for table schools_messages
-- ----------------------------
ALTER TABLE "public"."schools_messages" ADD CONSTRAINT "schools_messages_school_id_foreign" FOREIGN KEY ("school_id") REFERENCES "public"."trk_schools" ("id") ON DELETE CASCADE ON UPDATE NO ACTION;

Question 2

Please edit your question and add the CREATE TABLE statement for the table in question (including all indexes), the query you are using, and the execution plan generated using EXPLAIN (ANALYZE, BUFFERS). Formatted text please, no screenshots. Do not post code in comments.

Question 3

Arrays are not sets; searching for specific array elements can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale better for a large number of elements.
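
A minimal sketch of what such a separate table might look like for the schools_messages structure above. The table name schools_messages_parents is hypothetical, and because the question does not show the parents table, parent_id is left without a foreign key here:

```sql
-- Hypothetical junction table: one row per (message, parent) pair.
CREATE TABLE public.schools_messages_parents (
    message_id int4 NOT NULL
        REFERENCES public.schools_messages (id) ON DELETE CASCADE,
    parent_id  int4 NOT NULL,  -- assumption: would reference the parents table
    PRIMARY KEY (message_id, parent_id)
);

-- The primary key leads with message_id, so add a separate index
-- to support lookups by parent alone.
CREATE INDEX ON public.schools_messages_parents (parent_id);

-- All messages sent to parent 42.
SELECT m.*
FROM public.schools_messages AS m
JOIN public.schools_messages_parents AS mp ON mp.message_id = m.id
WHERE mp.parent_id = 42;
```

With this layout, a search by parent_id becomes an ordinary B-tree index lookup followed by a join on the primary key, rather than a search inside every row's array.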

Question 4

Kindly check the structure above

Question 5

Index? There does not seem to be an index on the field. Not a postgres guy, but on SQL Server a foreign key constraint does NOT automatically create an index.
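
PostgreSQL behaves the same way here: a foreign key constraint does not create an index on the referencing column. If lookups by school_id matter, a plain B-tree index could be added along these lines (the index name is an assumption):

```sql
-- Hypothetical index name; indexes the referencing column of the foreign key.
CREATE INDEX schools_messages_school_id_idx
    ON public.schools_messages (school_id);
```

This index only helps queries that filter on school_id. It does nothing for searches inside parents_ids.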

Question 6

@JackDouglas will you please describe more? I am not getting your point

Evan Carroll · Accepted Answer · 2018-01-24 16:50:55Z


I agree with Jack that your schema needs help, but you can still do this. Here we do it with one index lookup, using two core extensions, intarray and btree_gist:

CREATE EXTENSION intarray;
CREATE EXTENSION btree_gist;
CREATE INDEX ON public.schools_messages
 USING gist(school_id, parents_ids gist__int_ops);
VACUUM ANALYZE public.schools_messages;
SELECT *
FROM public.schools_messages
WHERE school_id = 42
 OR parents_ids @> ARRAY[42];
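
To check whether the planner actually uses the index for the parent-only search described in the question, one option is to inspect the plan, as sketched below. The value 42 is a placeholder, and the plan you get will depend on your data and PostgreSQL version:

```sql
-- Note: EXPLAIN ANALYZE executes the query, so run it against representative data.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM public.schools_messages
WHERE parents_ids @> ARRAY[42];
```

A sequential scan in the output would suggest the index is not being used for this predicate. An index or bitmap index scan would suggest that it is.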

Would this be efficient with thousands of rows, where each row's array holds thousands of integers? It seems like searching in a 2D array.
– simo, Jan 25, 2018 at 8:52

Depends on what you mean by "efficient"; it's better than anything else except changing the schema.
– Evan Carroll, Jan 25, 2018 at 9:02

Absolutely, it's better than anything else, but changing the schema is an option too, and I need a recommendation. I thought about adding an array to the parent table that holds foreign ids into the schools_messages table. That way, with the parent_id, I can get all the messages a parent is involved in using one query. The downside is that when adding a new message, I will have to add its id to every parent it was sent to. What do you think?
– simo, Jan 25, 2018 at 9:08

@simo I think you need to ask another question and tag it with database-design =)
– Evan Carroll, Jan 25, 2018 at 9:09

here: dba.stackexchange.com/questions/196207/…
– simo, Jan 25, 2018 at 10:09
