I have PostgreSQL 9.1.1 running in a shared hosting environment and I'm running this query:
SELECT a.id,
CASE WHEN a.idpai IS NULL THEN a.nome
ELSE concat(a.nome, ' (', b.nome, ')')
END AS nome, a.idpai
FROM localidade a
LEFT OUTER JOIN localidade b ON a.idpai = b.id
WHERE a.idcidade = :idcidade AND normalizar(
CASE WHEN a.idpai IS NULL THEN a.nome
ELSE concat(a.nome, ' (', b.nome, ')')
END
) LIKE normalizar(:nome)
ORDER BY CASE WHEN normalizar(
CASE WHEN a.idpai IS NULL THEN a.nome
ELSE concat(a.nome, ' (', b.nome, ')')
END
) LIKE substring(normalizar(:nome) from 2) THEN 1
ELSE 2
END, a.nome, b.nome
LIMIT 15
The performance is pretty good, the query returns in less than 50ms. Though, I'm repeating this part 3 times inside the query:
CASE WHEN a.idpai IS NULL THEN a.nome
ELSE concat(a.nome, ' (', b.nome, ')')
END
I believe this reduces the maintainability of the query and is not DRY at all. What is the recommended way to handle that? Should I create a function in my schema solely for this query? Or just store it in a variable and concatenate into the query in my back end before sending it over to the db server? Or is there any better way around?
I believe what this query does isn't the main point here, but just to give some background:
The localidade
table may or may not have a relation with itself through the idpai
which references a "parent" register. Assuming this table format:
id | nome | idpai
1 | foo |
2 | bar | 1
So if the query placeholder :nome
contains %foo%
, it will return:
id | nome | idpai
1 | foo |
2 | bar (foo) | 1
The ORDER BY
clause is just to display results which the query parameter matches the beginning of the returned row's name
before those who don't - SUBSTRING('%foo%' FROM 2)
returns foo%
-, and then the matching and not matching groups are ordered by a.nome
and b.nome
ASC as the SQL shows.
normalizar
is a STABLE
function that takes a string as parameter and calls LOWER()
and removes accentuation returning the new string, so I can perform case-insensitive and accent-insensitive string comparison. Similar to unaccent
which I couldn't properly apply on my use case due to some encoding problems (page is UTF8 and DB is latin1, if I convert the UTF8 to latin1 solely on the back end I'll have broken UTF8 pages; well this has been solved with the function above and is off-topic).
idcidade
is just a foreign key which I use to reduce the result set to less than 0.1% of the table. Hence performance isn't really an issue in this specific case.
My real question is, does calling CASE
multiple times passing the exact same parameters have an impact on performance? I assume the result from CASE
should be STABLE
and the query will be optimized automatically then. If that's the case, I can store part of the query into a variable on my back end and build the query string using that. Otherwise, if calling CASE
multiple times in the same query with the same arguments affects performance, should I create a STABLE
function for it then or what's the correct approach to this? Any pointers are appreciated.
2 Answers 2
A few additional simplifications:
SELECT a.id
,concat(a.nome, ' (' || b.nome || ')') AS nome
,a.idpai
FROM localidade a
LEFT JOIN localidade b ON a.idpai = b.id
WHERE a.idcidade = :idcidade
AND normalizar(concat(a.nome, ' (' || b.nome || ')'))
LIKE ('%' || normalizar(:nome) || '%')
ORDER BY (normalizar(concat(a.nome, ' (' || b.nome || ')'))
LIKE (normalizar(:nome)) || '%') DESC NULLS LAST
,2
LIMIT 15;
right(x, -1)
would be a little simpler and faster thansubstring(x from 2)
. But if you follow my advice below you don't need either.If you use this query in a prepared statement of function, and you know the search term is never anchored (always
%
around the term -%foo%
), then it will be faster to use explicit placeholder. Let the planner know, the pattern always starts with%
, so the query plan can be optimized.You can just
ORDER BY
the boolean result of an expression. No need for a case statement.FALSE
(0) sorts beforeTRUE
(1) sorts beforeNULL
.
I addedNULLS LAST
to get 100% identical behaviour like your original with descending order. Consider this SQLfiddle demonstrating the effect.I also
ORDER BY
the positional parameter2
to simplify the syntax. Won't make the query faster, just shorter.One thing to consider: If
a.idpai IS NOT NULL
that doesn't necessarily meanb.nome IS NOT NULL
, so the original and the simplifiedconcat()
version are not identical ifb.nome
can beNULL
.I would make the function
normalizar
IMMUTABLE
, since the result is always the same for the same input. Probably won't make a difference for this query, though. BothSTABLE
andIMMUTABLE
function can be optimized by the query planner.Some other minor syntax simplifications just to demonstrate possibilities. Like ..
~~
is a Postgres operator for the SQL standardLIKE
. (Niether is faster,~~
is just shorter but non-standard. Note the slight difference in operator precedence. You may have to change parenthesis since~~
(or~~*
,!~~
,!~~*
) binds stronger thanLIKE
(orILIKE
,NOT LIKE
,NOT ILIKE
).
If your tables are big and you really want fast - like several orders of magnitude faster than what you have - then create a materialized view (read here and here) with a trigram GIN or GiST index to support non-anchored LIKE
searches. Read here and here.
-
\$\begingroup\$ Thanks for your throughout answer. Many of your answers in SO helped me in the past as well and I'm grateful for that. On topic now, there is no advantage in using
~~
instead ofLIKE
besides the reduced characters, right? In that case the more verbose formLIKE
is technically more maintainable when members of your team do not have as much PGSQL experience. I forgot to mention thatb.nome
cannot benull
, my bad. Some things I'm not 100% sure: doesNULLS LAST
make a difference? It seems to be the default. Also does positional parameter 2 correspond to the 2nd column in the SELECT? \$\endgroup\$Fabrício Matté– Fabrício Matté2013年03月10日 22:23:30 +00:00Commented Mar 10, 2013 at 22:23 -
\$\begingroup\$ And 2 more serious issues: running
~~
in theWHERE
results in an "ERROR: argument of WHERE must be type boolean, not type text", which doesn't happen using LIKE (simplified sqlfiddle), and one last thing, yourconcat(a.nome, ' (', b.nome, ')')
will displaya.nome's value ()
whenb.nome
isnull
- not showing empty brackets whenb.nome
isnull
was the main point of the question. Thanks for the time to provide your answer and insights though. \$\endgroup\$Fabrício Matté– Fabrício Matté2013年03月10日 22:26:53 +00:00Commented Mar 10, 2013 at 22:26 -
\$\begingroup\$ Sorry for the number of questions, but I can only find documentation on positional parameters when it is inside of a SQL function. Weird thing is, the documentation says that double tildes are equivalent to LIKE (link) but it fails in the fiddle's
WHERE
. Weird. \$\endgroup\$Fabrício Matté– Fabrício Matté2013年03月10日 22:34:31 +00:00Commented Mar 10, 2013 at 22:34 -
\$\begingroup\$ Oh the issue with ~~ was probably due to operator precedence,
~~ ('%' || normalizar(:nome) || '%')
works. At least this shows that these are not 100% equivalent apparently. \$\endgroup\$Fabrício Matté– Fabrício Matté2013年03月10日 22:38:56 +00:00Commented Mar 10, 2013 at 22:38 -
\$\begingroup\$ Apart from the issue with empty brackets when
b.nome
isnull
, your answer does not account for ordering by matches starting with the queried string first. sqlfiddle.com/#!1/3872a/1 \$\endgroup\$Fabrício Matté– Fabrício Matté2013年03月10日 22:43:28 +00:00Commented Mar 10, 2013 at 22:43
I think this part:
CASE WHEN a.idpai IS NULL THEN a.nome
ELSE concat(a.nome, ' (', b.nome, ')')
END AS nome
Can be replaced with:
CONCAT(a.nome, ' (' || b.nome || ')') AS nome
If b.nome is NULL, then "' (' || b.nome || ')'" will also be NULL (concatenating strings using '||' with NULL values will result in NULL. CONCAT() ignore null values, so will return only a.none in this case
In other words: if b.nome is present, it will output "a.nome (b.nome)", otherwise "a.nome"
You can test it yourself, here's my sqlfiddle:
http://sqlfiddle.com/#!1/d41d8/744/0
Also, you can try to use a subselect to prevent having to repeat this CONCAT;
http://sqlfiddle.com/#!1/a9571/7/0
I haven checked if this improves performance, but it reduces the amount of repeated code and may be better maintainable
-
\$\begingroup\$ Awesome! Thanks. I've read about the side effect of
||
ing withNULL
before but never thought of taking advantage of it. Brilliant solution. \$\endgroup\$Fabrício Matté– Fabrício Matté2013年02月01日 00:11:30 +00:00Commented Feb 1, 2013 at 0:11 -
\$\begingroup\$ @FabrícioMatté Glad I could help! \$\endgroup\$thaJeztah– thaJeztah2013年02月01日 00:13:48 +00:00Commented Feb 1, 2013 at 0:13