PostgreSQL simple query with repeated function calls

Question 1

I have PostgreSQL 9.1.1 running in a shared hosting environment and I'm running this query:

SELECT a.id,
CASE WHEN a.idpai IS NULL THEN a.nome
ELSE concat(a.nome, ' (', b.nome, ')')
END AS nome, a.idpai
FROM localidade a
LEFT OUTER JOIN localidade b ON a.idpai = b.id
WHERE a.idcidade = :idcidade AND normalizar(
 CASE WHEN a.idpai IS NULL THEN a.nome
 ELSE concat(a.nome, ' (', b.nome, ')')
 END
) LIKE normalizar(:nome)
ORDER BY CASE WHEN normalizar(
 CASE WHEN a.idpai IS NULL THEN a.nome
 ELSE concat(a.nome, ' (', b.nome, ')')
 END
) LIKE substring(normalizar(:nome) from 2) THEN 1
ELSE 2
END, a.nome, b.nome
LIMIT 15

The performance is pretty good, the query returns in less than 50ms. Though, I'm repeating this part 3 times inside the query:

CASE WHEN a.idpai IS NULL THEN a.nome
ELSE concat(a.nome, ' (', b.nome, ')')
END

I believe this reduces the maintainability of the query and is not DRY at all. What is the recommended way to handle that? Should I create a function in my schema solely for this query? Or just store it in a variable and concatenate into the query in my back end before sending it over to the db server? Or is there any better way around?

I believe what this query does isn't the main point here, but just to give some background:

The localidade table may or may not have a relation with itself through the idpai which references a "parent" register. Assuming this table format:

id | nome | idpai
 1 | foo |
 2 | bar | 1

So if the query placeholder :nome contains %foo%, it will return:

id | nome | idpai
 1 | foo |
 2 | bar (foo) | 1

The ORDER BY clause is just to display results which the query parameter matches the beginning of the returned row's name before those who don't - SUBSTRING('%foo%' FROM 2) returns foo% -, and then the matching and not matching groups are ordered by a.nome and b.nome ASC as the SQL shows.

normalizar is a STABLE function that takes a string as parameter and calls LOWER() and removes accentuation returning the new string, so I can perform case-insensitive and accent-insensitive string comparison. Similar to unaccent which I couldn't properly apply on my use case due to some encoding problems (page is UTF8 and DB is latin1, if I convert the UTF8 to latin1 solely on the back end I'll have broken UTF8 pages; well this has been solved with the function above and is off-topic).

idcidade is just a foreign key which I use to reduce the result set to less than 0.1% of the table. Hence performance isn't really an issue in this specific case.

My real question is, does calling CASE multiple times passing the exact same parameters have an impact on performance? I assume the result from CASE should be STABLE and the query will be optimized automatically then. If that's the case, I can store part of the query into a variable on my back end and build the query string using that. Otherwise, if calling CASE multiple times in the same query with the same arguments affects performance, should I create a STABLE function for it then or what's the correct approach to this? Any pointers are appreciated.

Question 2

A few additional simplifications:

SELECT a.id
 ,concat(a.nome, ' (' || b.nome || ')') AS nome
 ,a.idpai
FROM localidade a
LEFT JOIN localidade b ON a.idpai = b.id
WHERE a.idcidade = :idcidade
AND normalizar(concat(a.nome, ' (' || b.nome || ')'))
 LIKE ('%' || normalizar(:nome) || '%')
ORDER BY (normalizar(concat(a.nome, ' (' || b.nome || ')'))
 LIKE (normalizar(:nome)) || '%') DESC NULLS LAST
 ,2
LIMIT 15;

right(x, -1) would be a little simpler and faster than substring(x from 2). But if you follow my advice below you don't need either.
If you use this query in a prepared statement of function, and you know the search term is never anchored (always % around the term - %foo%), then it will be faster to use explicit placeholder. Let the planner know, the pattern always starts with %, so the query plan can be optimized.
You can just ORDER BY the boolean result of an expression. No need for a case statement. FALSE (0) sorts before TRUE (1) sorts before NULL.
I added NULLS LAST to get 100% identical behaviour like your original with descending order. Consider this SQLfiddle demonstrating the effect.
I also ORDER BY the positional parameter 2 to simplify the syntax. Won't make the query faster, just shorter.
One thing to consider: If a.idpai IS NOT NULL that doesn't necessarily mean b.nome IS NOT NULL, so the original and the simplified concat() version are not identical if b.nome can be NULL.
I would make the function normalizar IMMUTABLE, since the result is always the same for the same input. Probably won't make a difference for this query, though. Both STABLE and IMMUTABLE function can be optimized by the query planner.
Some other minor syntax simplifications just to demonstrate possibilities. Like .. ~~ is a Postgres operator for the SQL standard LIKE. (Niether is faster, ~~ is just shorter but non-standard. Note the slight difference in operator precedence. You may have to change parenthesis since ~~ (or ~~*, !~~, !~~*) binds stronger than LIKE (or ILIKE, NOT LIKE, NOT ILIKE).

If your tables are big and you really want fast - like several orders of magnitude faster than what you have - then create a materialized view (read here and here) with a trigram GIN or GiST index to support non-anchored LIKE searches. Read here and here.

Question 3

Thanks for your throughout answer. Many of your answers in SO helped me in the past as well and I'm grateful for that. On topic now, there is no advantage in using ~~ instead of LIKE besides the reduced characters, right? In that case the more verbose form LIKE is technically more maintainable when members of your team do not have as much PGSQL experience. I forgot to mention that b.nome cannot be null, my bad. Some things I'm not 100% sure: does NULLS LAST make a difference? It seems to be the default. Also does positional parameter 2 correspond to the 2nd column in the SELECT?

Question 4

And 2 more serious issues: running ~~ in the WHERE results in an "ERROR: argument of WHERE must be type boolean, not type text", which doesn't happen using LIKE (simplified sqlfiddle), and one last thing, your concat(a.nome, ' (', b.nome, ')') will display a.nome's value () when b.nome is null - not showing empty brackets when b.nome is null was the main point of the question. Thanks for the time to provide your answer and insights though.

Question 5

Sorry for the number of questions, but I can only find documentation on positional parameters when it is inside of a SQL function. Weird thing is, the documentation says that double tildes are equivalent to LIKE (link) but it fails in the fiddle's WHERE. Weird.

Question 6

Oh the issue with ~~ was probably due to operator precedence, ~~ ('%' || normalizar(:nome) || '%') works. At least this shows that these are not 100% equivalent apparently.

Question 7

Apart from the issue with empty brackets when b.nome is null, your answer does not account for ordering by matches starting with the queried string first. sqlfiddle.com/#!1/3872a/1

Question 8

I think this part:

CASE WHEN a.idpai IS NULL THEN a.nome
ELSE concat(a.nome, ' (', b.nome, ')')
END AS nome

Can be replaced with:

CONCAT(a.nome, ' (' || b.nome || ')') AS nome

If b.nome is NULL, then "' (' || b.nome || ')'" will also be NULL (concatenating strings using '||' with NULL values will result in NULL. CONCAT() ignore null values, so will return only a.none in this case

In other words: if b.nome is present, it will output "a.nome (b.nome)", otherwise "a.nome"

You can test it yourself, here's my sqlfiddle:

http://sqlfiddle.com/#!1/d41d8/744/0

Also, you can try to use a subselect to prevent having to repeat this CONCAT;

http://sqlfiddle.com/#!1/a9571/7/0

I haven checked if this improves performance, but it reduces the amount of repeated code and may be better maintainable

Question 9

Awesome! Thanks. I've read about the side effect of ||ing with NULL before but never thought of taking advantage of it. Brilliant solution.

Question 10

@FabrícioMatté Glad I could help!

Erwin Brandstetter Erwin Brandstetter 4234 silver badges12 bronze badges · Accepted Answer · 2013-03-09 18:12:24Z

A few additional simplifications:

SELECT a.id
 ,concat(a.nome, ' (' || b.nome || ')') AS nome
 ,a.idpai
FROM localidade a
LEFT JOIN localidade b ON a.idpai = b.id
WHERE a.idcidade = :idcidade
AND normalizar(concat(a.nome, ' (' || b.nome || ')'))
 LIKE ('%' || normalizar(:nome) || '%')
ORDER BY (normalizar(concat(a.nome, ' (' || b.nome || ')'))
 LIKE (normalizar(:nome)) || '%') DESC NULLS LAST
 ,2
LIMIT 15;

right(x, -1) would be a little simpler and faster than substring(x from 2). But if you follow my advice below you don't need either.
If you use this query in a prepared statement of function, and you know the search term is never anchored (always % around the term - %foo%), then it will be faster to use explicit placeholder. Let the planner know, the pattern always starts with %, so the query plan can be optimized.
You can just ORDER BY the boolean result of an expression. No need for a case statement. FALSE (0) sorts before TRUE (1) sorts before NULL.
I added NULLS LAST to get 100% identical behaviour like your original with descending order. Consider this SQLfiddle demonstrating the effect.
I also ORDER BY the positional parameter 2 to simplify the syntax. Won't make the query faster, just shorter.
One thing to consider: If a.idpai IS NOT NULL that doesn't necessarily mean b.nome IS NOT NULL, so the original and the simplified concat() version are not identical if b.nome can be NULL.
I would make the function normalizar IMMUTABLE, since the result is always the same for the same input. Probably won't make a difference for this query, though. Both STABLE and IMMUTABLE function can be optimized by the query planner.
Some other minor syntax simplifications just to demonstrate possibilities. Like .. ~~ is a Postgres operator for the SQL standard LIKE. (Niether is faster, ~~ is just shorter but non-standard. Note the slight difference in operator precedence. You may have to change parenthesis since ~~ (or ~~*, !~~, !~~*) binds stronger than LIKE (or ILIKE, NOT LIKE, NOT ILIKE).

If your tables are big and you really want fast - like several orders of magnitude faster than what you have - then create a materialized view (read here and here) with a trigram GIN or GiST index to support non-anchored LIKE searches. Read here and here.

Thanks for your throughout answer. Many of your answers in SO helped me in the past as well and I'm grateful for that. On topic now, there is no advantage in using ~~ instead of LIKE besides the reduced characters, right? In that case the more verbose form LIKE is technically more maintainable when members of your team do not have as much PGSQL experience. I forgot to mention that b.nome cannot be null, my bad. Some things I'm not 100% sure: does NULLS LAST make a difference? It seems to be the default. Also does positional parameter 2 correspond to the 2nd column in the SELECT?
And 2 more serious issues: running ~~ in the WHERE results in an "ERROR: argument of WHERE must be type boolean, not type text", which doesn't happen using LIKE (simplified sqlfiddle), and one last thing, your concat(a.nome, ' (', b.nome, ')') will display a.nome's value () when b.nome is null - not showing empty brackets when b.nome is null was the main point of the question. Thanks for the time to provide your answer and insights though.
Sorry for the number of questions, but I can only find documentation on positional parameters when it is inside of a SQL function. Weird thing is, the documentation says that double tildes are equivalent to LIKE (link) but it fails in the fiddle's WHERE. Weird.
Oh the issue with ~~ was probably due to operator precedence, ~~ ('%' || normalizar(:nome) || '%') works. At least this shows that these are not 100% equivalent apparently.
Apart from the issue with empty brackets when b.nome is null, your answer does not account for ordering by matches starting with the queried string first. sqlfiddle.com/#!1/3872a/1

Stack Exchange Network

PostgreSQL simple query with repeated function calls

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

PostgreSQL simple query with repeated function calls

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions