Say I have the following XML:

<Operations>
 <Info Id="2265" cId="2" aId="5" />
 <Info Id="2266" cId="2" aId="5" />
 <Info Id="2266" cId="2" aId="6" />
 <Info Id="2267" cId="2" aId="5" />
 <Info Id="2267" cId="2" aId="6" />
</Operations>

I want to count the number of nodes with distinct values for the attributes cId and aId, without inserting the values into a table and allocating extra space for it.

Currently I do this by inserting the data into a table, but the data set is very large and takes a lot of temp space during execution.

Is there a way to do this?

The expected output for the XML above would be 3.

asked May 22, 2023 at 15:19
  • Load the data into a table, then do a simple SQL SELECT with COUNT and GROUP BY. Commented May 22, 2023 at 16:03
  • Can't load into a table, space concerns Commented May 22, 2023 at 16:07
  • "Space concerns"?? Is it gigabytes? Commented May 22, 2023 at 18:33
  • 300MBish kind, loading into temp table would just clutter my RAM. Commented May 23, 2023 at 11:55

1 Answer

Don't force it into RAM; let it use disk as needed. Do you have it in a table? Or only in XML?

There is probably a single statement (in MySQL that would be LOAD XML, the XML counterpart of LOAD DATA) to convert that from XML to a table defined thus:

CREATE TABLE Operations (
 id INT NOT NULL AUTO_INCREMENT,
 cid TINYINT UNSIGNED NOT NULL,
 aid TINYINT UNSIGNED NOT NULL,
 PRIMARY KEY(id)
) ENGINE=InnoDB;

The table will probably be smaller than your 300MB. Let it spill to disk if it needs to. (You have very little control anyway.) How big can the ids be? I picked the smallest datatype, but that assumes the numbers are between 0 and 255. Pick a larger datatype if needed.

Then do

 SELECT COUNT(DISTINCT cid, aid)
 FROM Operations;

That seems to match "number of nodes with distinct value on attributes cId and aId", but for your sample the answer will be 2, namely [2,5] and [2,6], not the 3 you expect. If the Id attribute of Info is relevant to the counting, please explain how. (That may necessitate including it in the LOAD DATA.)
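
If only the count is needed, a table may not be required at all. As a rough sketch outside SQL (Python's standard library is my assumption here, not something from your setup), the file can be streamed one Info node at a time while only the distinct combinations stay in memory. The names count_distinct and SAMPLE are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# The sample document from the question.
SAMPLE = b"""<Operations>
 <Info Id="2265" cId="2" aId="5" />
 <Info Id="2266" cId="2" aId="5" />
 <Info Id="2266" cId="2" aId="6" />
 <Info Id="2267" cId="2" aId="5" />
 <Info Id="2267" cId="2" aId="6" />
</Operations>"""


def count_distinct(source, attrs, tag="Info"):
    """Count distinct combinations of `attrs` over all <tag> elements.

    `source` is a file name or a binary file object. The document is
    parsed incrementally, so only the set of distinct tuples grows.
    """
    seen = set()
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            seen.add(tuple(elem.get(a) for a in attrs))
            root.clear()  # drop nodes that were already processed
    return len(seen)


print(count_distinct(io.BytesIO(SAMPLE), ("cId", "aId")))  # 2
print(count_distinct(io.BytesIO(SAMPLE), ("Id",)))         # 3
```

For the sample, the (cId, aId) pairs give 2, while the distinct Id values give 3, which may be the number you are after.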

If you are exporting the data from a spreadsheet, I see XML as an awful way to do it. A CSV file is an RDBMS-friendly format that is easier and faster to load. After that, three SQL statements achieve the goal: CREATE TABLE, LOAD DATA, SELECT.
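
If the data only exists as XML, one possible way to get to CSV is a small streaming converter. This is a sketch under the assumption that Python is available. The function name xml_to_csv and the column order Id, cId, aId are my choices, not anything from the question.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv(source, out, attrs=("Id", "cId", "aId"), tag="Info"):
    """Write one CSV row per <tag> element and return the row count.

    `source` is a file name or binary file object; `out` is a text file
    object. Nodes are processed incrementally and then discarded.
    """
    writer = csv.writer(out, lineterminator="\n")
    rows = 0
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            # A missing attribute becomes an empty field.
            writer.writerow([elem.get(a, "") for a in attrs])
            rows += 1
            root.clear()  # drop nodes that were already written
    return rows


xml = b'<Operations><Info Id="2265" cId="2" aId="5" /><Info Id="2266" cId="2" aId="6" /></Operations>'
buf = io.StringIO()
print(xml_to_csv(io.BytesIO(xml), buf))  # 2
print(buf.getvalue(), end="")
```

The resulting file has no header row, so the column order in the LOAD DATA statement would have to match the order passed in attrs.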

answered May 23, 2023 at 15:41
  • Thanks, but I was looking at the OpenXML docs. Is there any way to read the data using OpenXML and count on demand? Commented May 23, 2023 at 15:56
  • Sorry, I don't know OpenXML. And my dislike for XML will probably keep me from ever learning it. Commented May 23, 2023 at 16:11
