
I have the following strings in the database:

  • A/B/C/
  • A/B/D/
  • E/F/C/

It's the uri column.

The table is the following:

create table meta_info (
  id bigserial primary key,
  uri varchar,
  created timestamp with time zone,
  version varchar,
  binary_length bigserial
);

In most cases I look up a directory and then want to find its children.

E.g.,

SELECT tt.stem[2+1] as current, tt.stem[2+2] as child
FROM (SELECT regexp_split_to_array(uri, '/') as stem
      FROM meta_info
      WHERE uri LIKE 'A/B%') as tt
GROUP BY tt.stem[2+1], tt.stem[2+2]

... where "2" is the length of the path prefix being searched (A/B% has length 2, A/B/C% has length 3, A/B/D/E% has length 4).
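
For illustration only (assuming just the three sample rows above), the size-2 query splits each matching uri into an array and reports its third and fourth elements:

-- regexp_split_to_array('A/B/C/', '/') yields {A,B,C,''} because of the trailing
-- slash, so stem[2+1] = 'C' and stem[2+2] = ''.
SELECT regexp_split_to_array('A/B/C/', '/') AS stem;
-- For the sample rows matching 'A/B%', the query therefore returns (C, '') and (D, '').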

E.g.,

SELECT tt.stem[4+1] as current, tt.stem[4+2] as child
FROM (SELECT regexp_split_to_array(uri, '/') as stem
      FROM meta_info
      WHERE uri LIKE 'A/B/C/D%') as tt
GROUP BY tt.stem[4+1], tt.stem[4+2]

At the moment I have 400,000 rows, and the query takes too long to run. Can I create an index to improve the performance?

The following index didn't help:

CREATE INDEX idx ON meta_info (regexp_split_to_array(uri, '/'));

The query is still quite slow. The following SQL is especially slow:

EXPLAIN ANALYZE
 SELECT DISTINCT((regexp_split_to_array(uri, '/'))[1]) as branch
 FROM meta_info

I need to run this SQL whenever I need the unique first-level folders.

Update: as for text_pattern_ops, yep, I know it and am already using it. But thanks for split_part(uri, '/', 1), it really is faster. I have replaced all regexp_split_to_array calls with split_part. Thanks a lot!

asked Sep 18, 2018 at 15:03
  • Unrelated, but: distinct is not a function. distinct (a) is the same as distinct a; it always applies to all expressions in the select list (distinct a, b, c is also the same as distinct (a), b, c or distinct a, b, (c)). Commented Sep 18, 2018 at 17:32
  • There is no index that could speed this up, because you are essentially normalizing your data "on the fly" and then putting it back together again. You might want to consider a database model known as the "adjacency model" and then use a recursive CTE to traverse the paths; that is probably a lot faster (see the sketch below). Commented Sep 18, 2018 at 17:34
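
A minimal sketch of what that second comment suggests, assuming a hypothetical node table in the adjacency model (the table and column names are illustrative, not from the post):

create table node (
  id     bigserial primary key,
  parent bigint references node(id),  -- NULL for top-level folders
  name   varchar not null
);
create index on node (parent);

-- Children of a folder become a single indexed lookup:
-- SELECT name FROM node WHERE parent = :folder_id;

-- A recursive CTE rebuilds full paths whenever the flat uri form is needed:
WITH RECURSIVE paths AS (
  SELECT id, name::text AS full_path
  FROM node
  WHERE parent IS NULL
  UNION ALL
  SELECT n.id, p.full_path || '/' || n.name
  FROM node n
  JOIN paths p ON n.parent = p.id
)
SELECT full_path FROM paths;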

1 Answer

If you will be constantly parsing the data out on the fly, you should probably just store it parsed in the first place.
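
A minimal sketch of that idea, assuming a hypothetical path_parts column (the column name and backfill statement are illustrative, not part of the original answer):

ALTER TABLE meta_info ADD COLUMN path_parts text[];

-- Backfill from the existing uri values; trim the trailing '/' so the array
-- does not end with an empty element.
UPDATE meta_info
SET path_parts = string_to_array(trim(trailing '/' from uri), '/');

-- Children of A/B then become a plain array lookup with no per-row regexp work:
SELECT DISTINCT path_parts[3] AS current, path_parts[4] AS child
FROM meta_info
WHERE path_parts[1] = 'A' AND path_parts[2] = 'B';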

If you do want to store it as flat strings, there are some things you can do to speed it up.

where uri like 'A/B/C/D%'

This part can be sped up by creating an index:

create index on meta_info (uri text_pattern_ops);

How much of a speed-up this gives depends on how selective 'A/B/C/D%' is.
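
As a quick sanity check (a sketch, not part of the original answer), EXPLAIN should show the planner using the new index for a selective prefix:

EXPLAIN ANALYZE
SELECT uri
FROM meta_info
WHERE uri LIKE 'A/B/C/D%';
-- With the text_pattern_ops index and a selective prefix, the plan should show an
-- Index Scan or Bitmap Index Scan on the index instead of a Seq Scan over all rows.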

EXPLAIN ANALYZE SELECT DISTINCT((regexp_split_to_array(uri, '/'))[1]) as branch FROM meta_info

This can be sped up by a factor of 2 to 3 if you avoid the unnecessary regexp and array machinery:

EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT split_part(uri, '/',1) as branch
FROM meta_info;
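
Along the same lines (a sketch, not part of the original answer, prompted by the asker's note that they replaced regexp_split_to_array everywhere), the child-listing query can be rewritten with split_part as well:

SELECT split_part(uri, '/', 2+1) AS current,
       split_part(uri, '/', 2+2) AS child
FROM meta_info
WHERE uri LIKE 'A/B%'
GROUP BY 1, 2;
-- split_part returns '' for a missing field, which matches the empty trailing
-- element that regexp_split_to_array produces for uri values ending in '/'.
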
answered Sep 19, 2018 at 1:57
  • What about text_pattern_ops? Yep, I know it, and I am already using it. But thanks for split_part(uri, '/', 1), it's really faster. Commented Sep 19, 2018 at 8:00
