Have a look at the following XML data:
<data>
<test color="red">Red text</test>
<test color="green">green</test>
</data>
Let's say I have several XML documents with this structure in my database:
CREATE TABLE xmldata (
    id        bigserial NOT NULL,
    documents xml
);
Now I want to select all possible colors:
SELECT id, xpath('//test', documents) FROM xmldata;
But this returns a table with the id of each document and an array (xml[]) of the test nodes. Furthermore, documents without any "test" node show up in the result as well, with an empty array {}.
What I really want is a table like this:
| id | node |
|----|------|
| 1  | <test color="red">Red text</test> |
| 1  | <test color="green">green</test> |
What is the syntax I have to use?
I heard that xpath_table might be the function to use, but this function is marked as deprecated...
(The returned table has to have one row for each occurrence of the node I searched for. The node itself may be an XML snippet, text or something else; that isn't really important.)
There is nothing wrong with using xpath_table().
Even though some of the functions in the xml2 extension have been replaced with in-core functions, xpath_table is not one of them (I think this will happen for Postgres 10).
Until then, xpath_table() seems to be your only option:
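SELECT *
FROM xpath_table('id',          -- key column
                 'documents',   -- column holding the XML
                 'xmldata',     -- table to read from
                 '//test',      -- XPath expression
                 'true')        -- WHERE condition ('true' = all rows)
     AS t(doc_id bigint, data text);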
Returns:
doc_id | data
-------+---------
1 | Red text
1 | green
In order for this to work you first need to create the extension (as a superuser):
create extension xml2;
Otherwise the function isn't available.
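As for the expected Postgres 10 replacement: PostgreSQL 10 did add the SQL-standard XMLTABLE function to core, which produces one row per node matched by the row expression without needing the xml2 extension. A minimal sketch, assuming PostgreSQL 10 or later; the CTE named xmldata only stands in for the question's table with sample data, and the output column names color and node_text are illustrative choices.

```sql
-- Sample data standing in for the real xmldata table (illustration only)
WITH xmldata(id, documents) AS (
    VALUES (1, '<data><test color="red">Red text</test><test color="green">green</test></data>'::xml),
           (2, '<data/>'::xml)  -- no <test> node: produces no output rows
)
-- One output row per node matched by '//test' (PostgreSQL 10+)
SELECT x.id, t.color, t.node_text
FROM xmldata AS x,
     XMLTABLE('//test' PASSING x.documents
              COLUMNS color     text PATH '@color',  -- attribute of the matched node
                      node_text text PATH '.')       -- string value of the node
     AS t;
```

Documents without a matching node simply contribute no rows, so the empty-array rows from the original query disappear.<test color="green">green</test></data>'::xml),
             (2, '<data/>'::xml)
    )
    SELECT string_agg(x.id || ':' || t.color || ':' || t.node_text, ',' ORDER BY t.color DESC)
    FROM xmldata AS x,
         XMLTABLE('//test' PASSING x.documents
                  COLUMNS color text PATH '@color',
                          node_text text PATH '.') AS t
  ) = '1:red:Red text,1:green:green';
END $$;
</test>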
You have yet another alternative, using regexp_matches:
WITH s0 AS
(
    -- Your original query
    SELECT
        id, xpath('//test', documents) AS x
    FROM
        xmldata
)
, s1 AS
(
    -- We unnest the array (convert it to rows)
    SELECT
        id, unnest(x) AS xml_node
    FROM
        s0
)
SELECT
    id,
    xml_node,
    (regexp_matches(xml_node::text, '<test[^>]*>(.*)<\/test>'))[1] AS data
FROM
    s1;
id | xml_node                          | data
-: | :-------------------------------- | :-------
 1 | <test color="red">Red text</test> | Red text
 1 | <test color="green">green</test>  | green
... and you can have everything in just one SELECT:
-- Compacted version
SELECT
    id, (regexp_matches(unnest(xpath('//test', documents))::text, '<test[^>]*>(.*)<\/test>'))[1] AS xml_node
FROM
    xmldata;
id | xml_node
-: | :-------
 1 | Red text
 1 | green
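The regular expression is somewhat fragile: a self-closing node such as <test color="blue"/> does not match, so regexp_matches returns no row for it and it silently drops out. A sketch of a variant that stays within XPath: unnest the xml[] in a LATERAL join and run a second xpath() call on each node to pull out the attribute and the text. This assumes PostgreSQL 9.3 or later (for LATERAL); the CTE is only sample data standing in for the real table, and the column names color and data are illustrative.

```sql
-- Sample data standing in for the real xmldata table (illustration only)
WITH xmldata(id, documents) AS (
    VALUES (1, '<data><test color="red">Red text</test><test color="green">green</test></data>'::xml),
           (2, '<data/>'::xml)  -- empty xpath() result: no rows after the join
)
SELECT x.id,
       n.node,                                         -- the <test> node itself
       (xpath('/test/@color',  n.node))[1]::text AS color,
       (xpath('/test/text()',  n.node))[1]::text AS data
FROM xmldata AS x
-- unnest() turns the xml[] into one row per node; an empty array yields none
CROSS JOIN LATERAL unnest(xpath('//test', x.documents)) AS n(node);
```

Each unnested node is itself a well-formed document rooted at <test color="green">green</test></data>'::xml),
             (2, '<data/>'::xml)
    )
    SELECT string_agg(id || ':' || color || ':' || data, ',' ORDER BY color DESC)
    FROM (
      SELECT x.id,
             (xpath('/test/@color', n.node))[1]::text AS color,
             (xpath('/test/text()', n.node))[1]::text AS data
      FROM xmldata AS x
      CROSS JOIN LATERAL unnest(xpath('//test', x.documents)) AS n(node)
    ) AS s
  ) = '1:red:Red text,1:green:green';
END $$;
</test>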