Say I have the following XML:
<Operations>
<Info Id="2265" cId="2" aId="5" />
<Info Id="2266" cId="2" aId="5" />
<Info Id="2266" cId="2" aId="6" />
<Info Id="2267" cId="2" aId="5" />
<Info Id="2267" cId="2" aId="6" />
</Operations>
Without inserting the values into a table (and allocating extra space), I want to count the number of nodes with distinct values of the attributes cId and aId.
Currently I am doing it by inserting the data into a table, but the data is very large and takes a lot of temp space during execution.
Is there any way?
The expected output for the above would be 3.
- Load the data into a table, then do a simple SQL SELECT with COUNT and GROUP BY. – Rick James, May 22, 2023 at 16:03
- Can't load into a table, space concerns. – Himanshuman, May 22, 2023 at 16:07
- "Space concerns"?? Is it gigabytes? – Rick James, May 22, 2023 at 18:33
- 300MBish kind, loading into temp table would just clutter my RAM. – Himanshuman, May 23, 2023 at 11:55
1 Answer
Don't force it into RAM; let it use disk as needed. Do you have it in a table? Or only in XML?
There is probably a single LOAD DATA statement to convert that from XML to a table defined thus:
CREATE TABLE Operations (
id INT NOT NULL AUTO_INCREMENT,
cid TINYINT UNSIGNED NOT NULL,
aid TINYINT UNSIGNED NOT NULL,
PRIMARY KEY(id)
) ENGINE=InnoDB;
The table will probably be smaller than your 300MB. Let it spill to disk if it needs to. (You have very little control anyway.) How big can the ids be? I picked the smallest datatype, but that assumes the numbers are between 0 and 255. Pick a larger datatype if needed.
Then do
SELECT COUNT(DISTINCT cid, aid)
FROM Operations;
That seems to be the "number of nodes with distinct value on attributes cId and aId", but it looks like the answer will be 2, namely [2,5] and [2,6]. If Info_id is relevant to the counting, please explain how. (That may necessitate including it in the LOAD DATA.)
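To make the 2-versus-3 discrepancy concrete, here is a minimal sketch in Python (not SQL, and not OpenXML) that streams the sample from the question and counts both things without any table. The function name count_distinct and the choice of Python are my own assumptions for illustration. It only shows that the distinct (cId, aId) pairs number 2, while the distinct Id values number 3.

```python
import io
import xml.etree.ElementTree as ET

def count_distinct(source, tag="Info"):
    """Stream the XML; return (distinct (cId, aId) pairs, distinct Id values).

    Hypothetical helper for illustration only.
    """
    pairs, ids = set(), set()
    # iterparse yields each element when its end tag is seen, so the
    # document is processed incrementally rather than loaded as a whole tree.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            pairs.add((elem.get("cId"), elem.get("aId")))
            ids.add(elem.get("Id"))
        elem.clear()  # discard the element's attributes and children after use
    return len(pairs), len(ids)

# The sample document from the question.
sample = b"""<Operations>
<Info Id="2265" cId="2" aId="5" />
<Info Id="2266" cId="2" aId="5" />
<Info Id="2266" cId="2" aId="6" />
<Info Id="2267" cId="2" aId="5" />
<Info Id="2267" cId="2" aId="6" />
</Operations>"""

pair_count, id_count = count_distinct(io.BytesIO(sample))
print(pair_count, id_count)  # prints "2 3"
```

Memory here grows with the number of distinct values kept in the two sets, not with the size of the file, so a 300MB input with few distinct pairs should stay small. For a real file, pass its path or an open binary file object instead of the BytesIO wrapper.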
If you are exporting the data from a spreadsheet, well, I see XML as an awful way to do it. A CSV file is easier and faster as an RDBMS-friendly format. After that, three SQL statements achieve the goal: CREATE TABLE, LOAD DATA, SELECT.
- Thanks, but I was looking at the OpenXML doc. Is there any way to read using OpenXML and count on demand? – Himanshuman, May 23, 2023 at 15:56
- Sorry, I don't know OpenXML. And my dislike for XML will probably keep me from ever learning it. – Rick James, May 23, 2023 at 16:11