1

I've inherited a legacy DB in which excel data is stored in a text column of a table in a postgres DB. A value from that column might look like:

<Sheets>
 <Sheet1>
 <Addresses E54="3" G23="1.1" N87="0"/>
 </Sheet1>
 <Sheet2>
 <Addresses W32="thing"/>
 </Sheet2>
</Sheets>

I know I can pick out values of specific adresses with

select xpath( '//Addresses/@E54', cast(ssd.data as xml)) from spreadsheetdata ssd 

but I have no idea how many distinct addresses exist.

What I'm hoping to do is produce a table looking like:

sheet address value
Sheet1 E54 "3"
Sheet1 G23 "1.1"
Sheet1 N87 "0"
Sheet2 W32 "thing"
...

How do I do that?

asked Aug 5, 2024 at 10:29
1
  • Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. Commented Aug 6, 2024 at 8:14

1 Answer 1

2

My solution was to first convert the xml to json; xml_to_json

Then I defined:

-- converts xml from data in spreadsheet templates table into a table via json
CREATE OR REPLACE FUNCTION public.xml_to_table(xml)
 RETURNS TABLE(sheetname text, attributename text, attributevalue text) 
 LANGUAGE 'sql'
 COST 100
 VOLATILE PARALLEL UNSAFE
 ROWS 1000
AS $BODY$
 -- from the records returned by the subquery below this returns records with columns SheetName, 
 -- attribute name (address) and attribute value e.g.:
 -- Sheet1 E54 3
 -- Sheet1 G23 1.1
 -- Sheet1 N87 0
 -- Sheet2 W32 thing
 -- ... 
 
 select 
 e.sheetname, 
 jsonb_object_keys(e.attr) as attributename, 
 e.attr ->> jsonb_object_keys(e.attr) as attributevalue
 from 
 (
 -- removes the rows with null for the list of attributes from the results from the subquery under this, and 
 -- separates each attribute to its own row e.g.:
 -- Sheet1 {"E54": "3"}
 -- Sheet1 {"G23": "1.1"}
 -- Sheet1 {"N87": "0"}
 -- Sheet2 {"W32": "thing"}
 -- ... 
 select 
 d.sheetname, 
 jsonb_array_elements(d.exceldata) as attr
 from 
 (
 -- separates each line from the subquery under this into records containging columns for sheetname, and 
 -- this can handle xml having more than one element at the addresses level 
 -- (e.g. it can handle a NamedeCells element alongside Addresses )
 -- Sheet1 [{"E54": "3"}, ...
 -- Sheet2 [{"W32": "thing"}]
 -- ...
 select 
 b.sheetname, 
 b.records -> jsonb_object_keys(b.records) -> 'attr' as exceldata
 from 
 (
 -- separates each line from the subquery under this into records with columns for sheetname and 
 -- a row for the JSON for each of Addresses e.g.:
 -- Sheet1 {Addresses: {attr: ...
 -- Sheet2 {Addresses: {attr: ...
 -- ...
 select 
 jsonb_object_keys(a.sheetjson) as sheetname, 
 jsonb_array_elements((a.sheetjson->jsonb_object_keys(a.sheetjson) -> 'childs')) as records
 from 
 (
 -- separates the supplied xml into json records for each sheet e.g.:
 -- {Sheet1: {attr: ...
 -- {Sheet2: {attr: ...
 -- ...
 select 
 jsonb_array_elements(xml_to_json(1ドル)->'Sheets'->'childs') as sheetjson
 ) as a
 ) as b
 ) as d
 ) as e;
$BODY$;

this can be called with

select * from xml_to_table('<Sheets>
 <Sheet1>
 <Addresses E54="3" G23="1.1" N87="0"/>
 </Sheet1>
 <Sheet2>
 <Addresses W32="thing"/>
 </Sheet2>
</Sheets>')

to produce

sheetname attributename attributevalue
Sheet1 E54 3
Sheet1 G23 1.1
Sheet1 N87 0
Sheet2 W32 thing

This function is not a general as I'd like but it will suffice for my data clean up needs

Any comments welcome.

answered Aug 7, 2024 at 23:11

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.