
I have the following strings in the database:

  • A/B/C/
  • A/B/D/
  • E/F/C/

It's the uri column.

The table is the following:

create table meta_info (
  id bigserial primary key,
  uri varchar,
  created timestamp with time zone,
  version varchar,
  binary_length bigserial
);

In most cases I look up a directory and then want to find its children.

E.g.,

SELECT tt.stem[2+1] as current, tt.stem[2+2] as child
FROM (SELECT regexp_split_to_array(uri, '/') as stem
      FROM meta_info
      WHERE uri LIKE 'A/B%') as tt
GROUP BY tt.stem[2+1], tt.stem[2+2]

... where "2" is the length of the path prefix being searched (A/B% has length 2, A/B/C% has length 3, A/B/D/E% has length 4).
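
For illustration only (assuming just the three sample rows above), the size-2 query splits each matching uri into an array and reports its third and fourth elements:

-- regexp_split_to_array('A/B/C/', '/') yields {A,B,C,''} because of the trailing
-- slash, so stem[2+1] = 'C' and stem[2+2] = ''.
SELECT regexp_split_to_array('A/B/C/', '/') AS stem;
-- For the sample rows matching 'A/B%', the query therefore returns (C, '') and (D, '').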

E.g.,

SELECT tt.stem[4+1] as current, tt.stem[4+2] as child
FROM (SELECT regexp_split_to_array(uri, '/') as stem
      FROM meta_info
      WHERE uri LIKE 'A/B/C/D%') as tt
GROUP BY tt.stem[4+1], tt.stem[4+2]

At the moment I have 400,000 rows, and the query takes too long to run. Can I create an index to improve the performance?

The following index didn't help:

CREATE INDEX idx ON meta_info (regexp_split_to_array(uri, '/'));

The query is still quite slow. The following SQL is especially slow:

EXPLAIN ANALYZE
 SELECT DISTINCT((regexp_split_to_array(uri, '/'))[1]) as branch
 FROM meta_info

I need to run this SQL whenever I need the unique first-level folders.

Update: as for text_pattern_ops, yep, I know it and am already using it. But thanks for split_part(uri, '/', 1), it really is faster. I have replaced all regexp_split_to_array calls with split_part. Thanks a lot!

asked Sep 18, 2018 at 15:03
  • Unrelated, but: distinct is not a function. distinct (a) is the same as distinct a; it always applies to all expressions in the select list (distinct a, b, c is also the same as distinct (a), b, c or distinct a, b, (c)). Commented Sep 18, 2018 at 17:32
  • There is no index that could speed this up, because you are essentially normalizing your data "on the fly" and then putting it back together again. You might want to consider a database model known as the "adjacency model" and then use a recursive CTE to traverse the paths; that is probably a lot faster (see the sketch below). Commented Sep 18, 2018 at 17:34
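
A minimal sketch of what that second comment suggests, assuming a hypothetical node table in the adjacency model (the table and column names are illustrative, not from the post):

create table node (
  id     bigserial primary key,
  parent bigint references node(id),  -- NULL for top-level folders
  name   varchar not null
);
create index on node (parent);

-- Children of a folder become a single indexed lookup:
-- SELECT name FROM node WHERE parent = :folder_id;

-- A recursive CTE rebuilds full paths whenever the flat uri form is needed:
WITH RECURSIVE paths AS (
  SELECT id, name::text AS full_path
  FROM node
  WHERE parent IS NULL
  UNION ALL
  SELECT n.id, p.full_path || '/' || n.name
  FROM node n
  JOIN paths p ON n.parent = p.id
)
SELECT full_path FROM paths;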

1 Answer

If you will be constantly parsing the data out on the fly, you should probably just store it parsed in the first place.
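
A minimal sketch of that idea, assuming a hypothetical path_parts column (the column name and backfill statement are illustrative, not part of the original answer):

ALTER TABLE meta_info ADD COLUMN path_parts text[];

-- Backfill from the existing uri values; trim the trailing '/' so the array
-- does not end with an empty element.
UPDATE meta_info
SET path_parts = string_to_array(trim(trailing '/' from uri), '/');

-- Children of A/B then become a plain array lookup with no per-row regexp work:
SELECT DISTINCT path_parts[3] AS current, path_parts[4] AS child
FROM meta_info
WHERE path_parts[1] = 'A' AND path_parts[2] = 'B';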

If you do want to store it as flat strings, there are some things you can do to speed it up.

where uri like 'A/B/C/D%'

This part can be sped up by creating an index:

create index on meta_info (uri text_pattern_ops);

How much of a speed-up this gives depends on how selective 'A/B/C/D%' is.
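
As a quick sanity check (a sketch, not part of the original answer), EXPLAIN should show the planner using the new index for a selective prefix:

EXPLAIN ANALYZE
SELECT uri
FROM meta_info
WHERE uri LIKE 'A/B/C/D%';
-- With the text_pattern_ops index and a selective prefix, the plan should show an
-- Index Scan or Bitmap Index Scan on the index instead of a Seq Scan over all rows.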

EXPLAIN ANALYZE SELECT DISTINCT((regexp_split_to_array(uri, '/'))[1]) as branch FROM meta_info

This can be sped up by a factor of 2 to 3 if you avoid the unnecessary regexp and array machinery:

EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT split_part(uri, '/',1) as branch
FROM meta_info;
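
Along the same lines (a sketch, not part of the original answer, prompted by the asker's note that they replaced regexp_split_to_array everywhere), the child-listing query can be rewritten with split_part as well:

SELECT split_part(uri, '/', 2+1) AS current,
       split_part(uri, '/', 2+2) AS child
FROM meta_info
WHERE uri LIKE 'A/B%'
GROUP BY 1, 2;
-- split_part returns '' for a missing field, which matches the empty trailing
-- element that regexp_split_to_array produces for uri values ending in '/'.
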
answered Sep 19, 2018 at 1:57
  • What about text_pattern_ops? Yep, I know it, and I am already using it. But thanks for split_part(uri, '/', 1), it's really faster. Commented Sep 19, 2018 at 8:00
