How to extract indeterminate attributes from xml in a postgres db?

Question 1

I've inherited a legacy DB in which excel data is stored in a text column of a table in a postgres DB. A value from that column might look like:

<Sheets>
 <Sheet1>
 <Addresses E54="3" G23="1.1" N87="0"/>
 </Sheet1>
 <Sheet2>
 <Addresses W32="thing"/>
 </Sheet2>
</Sheets>

I know I can pick out values of specific adresses with

select xpath( '//Addresses/@E54', cast(ssd.data as xml)) from spreadsheetdata ssd

but I have no idea how many distinct addresses exist.

What I'm hoping to do is produce a table looking like:

sheet address value
Sheet1 E54 "3"
Sheet1 G23 "1.1"
Sheet1 N87 "0"
Sheet2 W32 "thing"
...

How do I do that?

Question 2

Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer.

Question 3

My solution was to first convert the xml to json; xml_to_json

Then I defined:

-- converts xml from data in spreadsheet templates table into a table via json
CREATE OR REPLACE FUNCTION public.xml_to_table(xml)
 RETURNS TABLE(sheetname text, attributename text, attributevalue text) 
 LANGUAGE 'sql'
 COST 100
 VOLATILE PARALLEL UNSAFE
 ROWS 1000
AS $BODY$
 -- from the records returned by the subquery below this returns records with columns SheetName, 
 -- attribute name (address) and attribute value e.g.:
 -- Sheet1 E54 3
 -- Sheet1 G23 1.1
 -- Sheet1 N87 0
 -- Sheet2 W32 thing
 -- ... 
 
 select 
 e.sheetname, 
 jsonb_object_keys(e.attr) as attributename, 
 e.attr ->> jsonb_object_keys(e.attr) as attributevalue
 from 
 (
 -- removes the rows with null for the list of attributes from the results from the subquery under this, and 
 -- separates each attribute to its own row e.g.:
 -- Sheet1 {"E54": "3"}
 -- Sheet1 {"G23": "1.1"}
 -- Sheet1 {"N87": "0"}
 -- Sheet2 {"W32": "thing"}
 -- ... 
 select 
 d.sheetname, 
 jsonb_array_elements(d.exceldata) as attr
 from 
 (
 -- separates each line from the subquery under this into records containging columns for sheetname, and 
 -- this can handle xml having more than one element at the addresses level 
 -- (e.g. it can handle a NamedeCells element alongside Addresses )
 -- Sheet1 [{"E54": "3"}, ...
 -- Sheet2 [{"W32": "thing"}]
 -- ...
 select 
 b.sheetname, 
 b.records -> jsonb_object_keys(b.records) -> 'attr' as exceldata
 from 
 (
 -- separates each line from the subquery under this into records with columns for sheetname and 
 -- a row for the JSON for each of Addresses e.g.:
 -- Sheet1 {Addresses: {attr: ...
 -- Sheet2 {Addresses: {attr: ...
 -- ...
 select 
 jsonb_object_keys(a.sheetjson) as sheetname, 
 jsonb_array_elements((a.sheetjson->jsonb_object_keys(a.sheetjson) -> 'childs')) as records
 from 
 (
 -- separates the supplied xml into json records for each sheet e.g.:
 -- {Sheet1: {attr: ...
 -- {Sheet2: {attr: ...
 -- ...
 select 
 jsonb_array_elements(xml_to_json(1ドル)->'Sheets'->'childs') as sheetjson
 ) as a
 ) as b
 ) as d
 ) as e;
$BODY$;

this can be called with

select * from xml_to_table('<Sheets>
 <Sheet1>
 <Addresses E54="3" G23="1.1" N87="0"/>
 </Sheet1>
 <Sheet2>
 <Addresses W32="thing"/>
 </Sheet2>
</Sheets>')

to produce

sheetname	attributename	attributevalue
Sheet1	E54	3
Sheet1	G23	1.1
Sheet1	N87	0
Sheet2	W32	thing

This function is not a general as I'd like but it will suffice for my data clean up needs

Any comments welcome.

Gareth Vaughan Gareth Vaughan 315 bronze badges · Answer 1 · 2024-08-07 23:11:16Z

My solution was to first convert the xml to json; xml_to_json

Then I defined:

-- converts xml from data in spreadsheet templates table into a table via json
CREATE OR REPLACE FUNCTION public.xml_to_table(xml)
 RETURNS TABLE(sheetname text, attributename text, attributevalue text) 
 LANGUAGE 'sql'
 COST 100
 VOLATILE PARALLEL UNSAFE
 ROWS 1000
AS $BODY$
 -- from the records returned by the subquery below this returns records with columns SheetName, 
 -- attribute name (address) and attribute value e.g.:
 -- Sheet1 E54 3
 -- Sheet1 G23 1.1
 -- Sheet1 N87 0
 -- Sheet2 W32 thing
 -- ... 
 
 select 
 e.sheetname, 
 jsonb_object_keys(e.attr) as attributename, 
 e.attr ->> jsonb_object_keys(e.attr) as attributevalue
 from 
 (
 -- removes the rows with null for the list of attributes from the results from the subquery under this, and 
 -- separates each attribute to its own row e.g.:
 -- Sheet1 {"E54": "3"}
 -- Sheet1 {"G23": "1.1"}
 -- Sheet1 {"N87": "0"}
 -- Sheet2 {"W32": "thing"}
 -- ... 
 select 
 d.sheetname, 
 jsonb_array_elements(d.exceldata) as attr
 from 
 (
 -- separates each line from the subquery under this into records containging columns for sheetname, and 
 -- this can handle xml having more than one element at the addresses level 
 -- (e.g. it can handle a NamedeCells element alongside Addresses )
 -- Sheet1 [{"E54": "3"}, ...
 -- Sheet2 [{"W32": "thing"}]
 -- ...
 select 
 b.sheetname, 
 b.records -> jsonb_object_keys(b.records) -> 'attr' as exceldata
 from 
 (
 -- separates each line from the subquery under this into records with columns for sheetname and 
 -- a row for the JSON for each of Addresses e.g.:
 -- Sheet1 {Addresses: {attr: ...
 -- Sheet2 {Addresses: {attr: ...
 -- ...
 select 
 jsonb_object_keys(a.sheetjson) as sheetname, 
 jsonb_array_elements((a.sheetjson->jsonb_object_keys(a.sheetjson) -> 'childs')) as records
 from 
 (
 -- separates the supplied xml into json records for each sheet e.g.:
 -- {Sheet1: {attr: ...
 -- {Sheet2: {attr: ...
 -- ...
 select 
 jsonb_array_elements(xml_to_json(1ドル)->'Sheets'->'childs') as sheetjson
 ) as a
 ) as b
 ) as d
 ) as e;
$BODY$;

this can be called with

select * from xml_to_table('<Sheets>
 <Sheet1>
 <Addresses E54="3" G23="1.1" N87="0"/>
 </Sheet1>
 <Sheet2>
 <Addresses W32="thing"/>
 </Sheet2>
</Sheets>')

to produce

sheetname	attributename	attributevalue
Sheet1	E54	3
Sheet1	G23	1.1
Sheet1	N87	0
Sheet2	W32	thing

This function is not a general as I'd like but it will suffice for my data clean up needs

Any comments welcome.

Stack Exchange Network

How to extract indeterminate attributes from xml in a postgres db?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

How to extract indeterminate attributes from xml in a postgres db?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions