
I have to read a big XML file with a lot of information. Afterwards I extract the needed information (~20 points (columns) / ~80 relevant rows of data, some of them with sub-datasets) and write it out to an Excel file.

My question is how to handle the extraction (i.e. the removal of unused data) part:

  • Should I copy the whole file, delete the unused parts, and then write it to Excel?
  • Or is it a good approach to create objects for each column?
  • Should I write the whole XML to Excel and then start deleting rows in Excel?

What would be a performant and acceptable solution?

asked Sep 21, 2012 at 8:02
  • This is more like a Stack Overflow question. You could easily just export the data to CSV and do whatever you want with it afterwards. Commented Sep 21, 2012 at 8:15
  • CSV-formatted text files are an easier option to consider. Commented Sep 21, 2012 at 11:43

1 Answer


I'd say do the filtering as part of the processing. Programming in Excel is significantly more painful and limited than any server-side technology you could possibly be using.

CSV as an output format is much easier to work with than Excel proper, and virtually every programming language can output CSV without requiring any libraries (even writing your own CSV writer should be doable in about an hour or so). As long as you're only interested in the plain data, with no formulae, multi-sheet workbooks, or layout, CSV should be fine.
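For instance, the quoting rules are the only subtle part: a field needs double quotes only if it contains a comma, a quote, or a newline, and embedded quotes are doubled. A minimal sketch in Java (the class and method names are my own):

    import java.io.PrintWriter;
    import java.util.List;
    import java.util.stream.Collectors;

    public class SimpleCsvWriter {

        // Quote a field only when it contains a delimiter, quote, or newline;
        // embedded quotes are escaped by doubling them (RFC 4180 style).
        static String escape(String field) {
            if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
                return "\"" + field.replace("\"", "\"\"") + "\"";
            }
            return field;
        }

        // Write one row: escape each field and join with commas.
        static void writeRow(PrintWriter out, List<String> row) {
            out.println(row.stream()
                    .map(SimpleCsvWriter::escape)
                    .collect(Collectors.joining(",")));
        }
    }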

Now, depending on the size of the XML input, I'd either:

a) Read the entire XML file into memory, parse it into a DOM tree, and use XPath to extract the information you want. If the transformation is nontrivial, consider using XSLT (you'll need a tiny bit of post-processing, because generating valid CSV with XSLT is unnecessarily complicated). Because the DOM tree needs to fit into RAM in its entirety, this is only doable for smaller XML files (say, up to the tens-of-megabytes range).
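A minimal sketch of (a) using the DOM and XPath support built into the JDK; the file name, element names, and XPath expression are all placeholders for whatever your document actually contains:

    import java.io.File;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.NodeList;

    public class DomExtract {
        public static void main(String[] args) throws Exception {
            // Parse the whole file into an in-memory DOM tree.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new File("input.xml"));

            // Select only the rows you care about; "record" is a made-up element name.
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList rows = (NodeList) xpath.evaluate(
                    "/report/record", doc, XPathConstants.NODESET);

            for (int i = 0; i < rows.getLength(); i++) {
                // Pull the interesting columns out of each record ...
                String name = xpath.evaluate("name", rows.item(i));
                String amount = xpath.evaluate("amount", rows.item(i));
                // ... and emit a CSV line (escape fields as in the sketch above).
                System.out.println(name + "," + amount);
            }
        }
    }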

For larger documents:

b) Use a SAX parser to walk the document, outputting the relevant nodes as you go. This is a bit harder to write, because input and output are interleaved, but it has the advantage that memory requirements grow with the depth of the tree rather than its total size (that is, the SAX parser only keeps the path from the document root to the current node in memory, not the entire DOM).
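A sketch of (b) with the JDK's SAX parser; again, the element names (record, name, amount) are invented for illustration. Only the fields of the current record are held in memory at any point:

    import java.io.File;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;

    public class SaxExtract extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private boolean inRecord = false;
        private String name, amount;

        @Override
        public void startElement(String uri, String local, String qName, Attributes attrs) {
            text.setLength(0);                  // reset the character buffer
            if (qName.equals("record")) inRecord = true;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);     // collect the current element's text
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if (!inRecord) return;
            switch (qName) {
                case "name":   name = text.toString(); break;
                case "amount": amount = text.toString(); break;
                case "record":                  // row complete: emit it and forget it
                    System.out.println(name + "," + amount);
                    inRecord = false;
                    break;
            }
        }

        public static void main(String[] args) throws Exception {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new File("input.xml"), new SaxExtract());
        }
    }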

answered Sep 21, 2012 at 8:35
