I've inherited a legacy database in which Excel data is stored in a text column of a PostgreSQL table. A value from that column might look like:
<Sheets>
<Sheet1>
<Addresses E54="3" G23="1.1" N87="0"/>
</Sheet1>
<Sheet2>
<Addresses W32="thing"/>
</Sheet2>
</Sheets>
I know I can pick out the value of a specific address with
select xpath( '//Addresses/@E54', cast(ssd.data as xml)) from spreadsheetdata ssd
but I don't know in advance which addresses exist, or how many.
What I'm hoping to do is produce a table that looks like:
sheet address value
Sheet1 E54 "3"
Sheet1 G23 "1.1"
Sheet1 N87 "0"
Sheet2 W32 "thing"
...
How do I do that?
1 Answer
My solution was to first convert the XML to JSON using an xml_to_json function (this is not a PostgreSQL built-in, so it has to be defined separately).
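For orientation, here is a rough sketch of the JSON shape that the function below appears to expect from xml_to_json. The literal is hypothetical: it is inferred from the accessors used in the function ('Sheets', 'childs', 'attr'), not taken from the xml_to_json source, so the real output may differ in detail.

```sql
-- Hypothetical literal standing in for xml_to_json(<the sample xml>).
-- The key names 'childs' and 'attr' are the ones the function below reads.
select jsonb_array_elements(
    '{"Sheets": {"childs": [
        {"Sheet1": {"childs": [{"Addresses": {"attr": [{"E54": "3"}, {"G23": "1.1"}, {"N87": "0"}]}}]}},
        {"Sheet2": {"childs": [{"Addresses": {"attr": [{"W32": "thing"}]}}]}}
     ]}}'::jsonb -> 'Sheets' -> 'childs'
) as sheetjson;
```

This mirrors the innermost subquery of the function: one jsonb row per sheet, each keyed by the sheet's element name.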
Then I defined:
-- converts xml from data in spreadsheet templates table into a table via json
CREATE OR REPLACE FUNCTION public.xml_to_table(xml)
RETURNS TABLE(sheetname text, attributename text, attributevalue text)
LANGUAGE 'sql'
COST 100
VOLATILE PARALLEL UNSAFE
ROWS 1000
AS $BODY$
-- from the records returned by the subquery below this returns records with columns SheetName,
-- attribute name (address) and attribute value e.g.:
-- Sheet1 E54 3
-- Sheet1 G23 1.1
-- Sheet1 N87 0
-- Sheet2 W32 thing
-- ...
select
e.sheetname,
jsonb_object_keys(e.attr) as attributename,
e.attr ->> jsonb_object_keys(e.attr) as attributevalue
from
(
-- removes the rows with null for the list of attributes from the results from the subquery under this, and
-- separates each attribute to its own row e.g.:
-- Sheet1 {"E54": "3"}
-- Sheet1 {"G23": "1.1"}
-- Sheet1 {"N87": "0"}
-- Sheet2 {"W32": "thing"}
-- ...
select
d.sheetname,
jsonb_array_elements(d.exceldata) as attr
from
(
-- separates each line from the subquery under this into records containing columns for sheetname and
-- the JSON array of attributes; this can handle xml having more than one element at the addresses level
-- (e.g. it can handle a NamedeCells element alongside Addresses )
-- Sheet1 [{"E54": "3"}, ...
-- Sheet2 [{"W32": "thing"}]
-- ...
select
b.sheetname,
b.records -> jsonb_object_keys(b.records) -> 'attr' as exceldata
from
(
-- separates each line from the subquery under this into records with columns for sheetname and
-- a row for the JSON for each of Addresses e.g.:
-- Sheet1 {Addresses: {attr: ...
-- Sheet2 {Addresses: {attr: ...
-- ...
select
jsonb_object_keys(a.sheetjson) as sheetname,
jsonb_array_elements((a.sheetjson->jsonb_object_keys(a.sheetjson) -> 'childs')) as records
from
(
-- separates the supplied xml into json records for each sheet e.g.:
-- {Sheet1: {attr: ...
-- {Sheet2: {attr: ...
-- ...
select
jsonb_array_elements(xml_to_json($1)->'Sheets'->'childs') as sheetjson
) as a
) as b
) as d
) as e;
$BODY$;
This can be called with
select * from xml_to_table('<Sheets>
<Sheet1>
<Addresses E54="3" G23="1.1" N87="0"/>
</Sheet1>
<Sheet2>
<Addresses W32="thing"/>
</Sheet2>
</Sheets>')
to produce
| sheetname | attributename | attributevalue |
|---|---|---|
| Sheet1 | E54 | 3 |
| Sheet1 | G23 | 1.1 |
| Sheet1 | N87 | 0 |
| Sheet2 | W32 | thing |
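To run the function over the stored column rather than a literal, a lateral join along these lines should work. This is a sketch that assumes the table and column names from the question (spreadsheetdata, data) and that every stored value is well-formed XML:

```sql
-- Apply xml_to_table to each stored spreadsheet value.
-- spreadsheetdata / data are the names used in the question.
select t.sheetname, t.attributename, t.attributevalue
from spreadsheetdata ssd
cross join lateral xml_to_table(cast(ssd.data as xml)) as t;
```

In practice you would probably also select whatever key column identifies each row of spreadsheetdata, so the addresses can be traced back to their source spreadsheet. That column is not shown in the question, so it is left out here.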
This function is not as general as I'd like, but it will suffice for my data clean-up needs.
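As a possible alternative that avoids the JSON round trip, the same table can probably be produced with xpath() alone. The sketch below is not the method used above. It assumes the table and column names from the question, and it relies on xpath() returning scalar results (name(), string(), count()) as single-element arrays, which PostgreSQL has done since 9.2.

```sql
-- Sketch: one row per attribute, found by position because XPath 1.0
-- cannot return (name, value) pairs in a single expression.
select
    (xpath('name(/*)', s.sheet))[1]::text                    as sheet,
    (xpath('name(/*/@*[' || i.n || '])', e.elem))[1]::text   as address,
    (xpath('string(/*/@*[' || i.n || '])', e.elem))[1]::text as value
from spreadsheetdata ssd
-- one row per <SheetN> element
cross join lateral unnest(xpath('/Sheets/*', cast(ssd.data as xml))) as s(sheet)
-- one row per child element of the sheet (Addresses, and any siblings)
cross join lateral unnest(xpath('/*/*', s.sheet)) as e(elem)
-- one row per attribute position on that element
cross join lateral generate_series(
    1, (xpath('count(/*/@*)', e.elem))[1]::text::int
) as i(n);
```

Elements without attributes yield an empty series and so drop out of the result. Values containing characters such as & may come back XML-escaped, so check that against your data.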
Any comments welcome.