
I am reading a single XML file of size 2.6GB; the JVM heap size is 6GB.

However, I am still getting a heap space out-of-memory error.

What am I doing wrong here?

For reference, I printed the max memory and free memory properties of the JVM:

Max memory was reported as approximately 5.6GB, but free memory was shown as only 90MB. Why is only 90MB shown as free, especially when I have not even started any processing? The program has only just started.
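(A minimal sketch of the kind of check described above. The key point: Runtime.freeMemory() reports the free space within the heap the JVM has committed so far, not the headroom up to the -Xmx limit, which is why it can look tiny right after startup.)

    public class MemoryCheck {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            long mb = 1024 * 1024;
            // maxMemory(): the limit the heap may grow to (-Xmx)
            System.out.println("max:   " + rt.maxMemory() / mb + " MB");
            // totalMemory(): the heap currently committed by the JVM
            System.out.println("total: " + rt.totalMemory() / mb + " MB");
            // freeMemory(): free space within the committed heap only
            System.out.println("free:  " + rt.freeMemory() / mb + " MB");
        }
    }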

asked Dec 28, 2012 at 16:52
  • What operating system are you using? Some have limits on how much memory one process can consume; I believe 32-bit Windows is limited to 2GB per process. Commented Dec 28, 2012 at 16:54
  • 2.6GB XML? OMG! Use a database! Storing an XML file in memory will use much more space than the flat file on disk, because of all the node objects, child lists, attribute objects, etc. Commented Dec 28, 2012 at 16:54
  • @jlordo - reading an XML file with SAX or into a DOM can be a perfectly appropriate thing to do. Depending on the requirements, a database could actually be the worst possible solution. IMHO... Commented Dec 28, 2012 at 17:08
  • @paulsm4 I agree, it can be perfectly appropriate. But a 2.6GB XML file will never be suitable for a DOM representation, and is only a good fit for SAX in a few cases. At this amount of data a database is a good choice because of memory consumption, query and manipulation opportunities, and access speed. Commented Dec 28, 2012 at 17:13

4 Answers


In general, when converting structured text to the corresponding data structures in Java, you need a lot more space than the size of the input file. There is a lot of overhead associated with the various data structures that are used, apart from the space required for the strings themselves.

For example, each String instance has an additional overhead of about 32-40 bytes - not to mention that each character is stored in two bytes, which effectively doubles the space requirements for ASCII-encoded XML.

Then you have additional overhead when storing the String in a structure. For example, in order to store a String instance in a Map you will need about 16-32 bytes of additional overhead, depending on the implementation and how you measure the usage.

It is quite possible that 6GB is just not enough to store a parsed 2.6GB XML file at once: the character data alone would occupy roughly 5.2GB as two-byte chars, before counting any per-object or per-structure overhead.

Bottom line:

If you are loading such a large XML file in memory (e.g. using a DOM parser) you are probably doing something wrong. A stream-based parser such as SAX should have far more modest requirements.
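As an illustration, a minimal SAX-based reader might look like the sketch below; the element name "record" and the counting logic are placeholders for whatever the real document requires:

    import java.io.File;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;

    public class CountingHandler extends DefaultHandler {
        private long count;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            // React to each element as it streams past; nothing is retained,
            // so memory use stays flat regardless of file size.
            if ("record".equals(qName)) {  // placeholder element name
                count++;
            }
        }

        public static void main(String[] args) throws Exception {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            CountingHandler handler = new CountingHandler();
            parser.parse(new File(args[0]), handler);
            System.out.println("records: " + handler.count);
        }
    }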

Alternatively, consider transforming the XML file into a more usable format, such as an embedded database, or even an actual server-based database. That would allow you to process far larger documents without issues.
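A rough sketch of that conversion, assuming the H2 embedded database driver is on the classpath; the table layout and the one-row-per-element scheme are made up for illustration:

    import java.io.File;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    public class XmlToH2 {
        public static void main(String[] args) throws Exception {
            try (Connection conn =
                    DriverManager.getConnection("jdbc:h2:./xmldata")) {
                conn.createStatement().execute(
                    "CREATE TABLE IF NOT EXISTS record(name VARCHAR, body VARCHAR)");
                final PreparedStatement insert =
                    conn.prepareStatement("INSERT INTO record VALUES (?, ?)");

                SAXParserFactory.newInstance().newSAXParser().parse(
                    new File(args[0]),
                    new DefaultHandler() {
                        private final StringBuilder text = new StringBuilder();

                        @Override
                        public void characters(char[] ch, int start, int len) {
                            text.append(ch, start, len);
                        }

                        @Override
                        public void endElement(String uri, String local,
                                               String qName) throws SAXException {
                            try {
                                // One row per element; only the current
                                // element's text is ever held in memory.
                                insert.setString(1, qName);
                                insert.setString(2, text.toString());
                                insert.executeUpdate();
                                text.setLength(0);
                            } catch (Exception e) {
                                throw new SAXException(e);
                            }
                        }
                    });
            }
        }
    }

Once the rows are in the database, they can be queried and updated without ever holding the whole document in memory.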

answered Dec 28, 2012 at 16:55

5 Comments

+1. Each byte turns into a 16-bit char at a minimum, and each String (and every part of the file becomes a String) has about 32 bytes of overhead.
Here is an example where a 5MB XML file uses 60MB of memory when read in Java.
"It is quite possible that 6GB is just not enough to store a parsed 2.6GB XML file at once". True. But the point is 1) Make sure you're running a 64-bit JVM (one that can use more than 2GB!), 2) use a tool like VisualVM (nee JConsole) to analyze exactly how much memory is being used, and where it's going. IMHO...
@paulsm4: Considering that typical DOM parsers require four times the size of the input XML file, that where a 2.6GB file exists a 4GB file is not unlikely, and the memory typically available on low-end and mid-grade servers, I'd argue that this approach is broken by design and no amount of configuration, cajoling or prayer will save it.
Oh, and let's not forget that after you load the data in memory, you still need even more memory to get the actual work done...

You should avoid loading the entire XML document into memory at once, and instead use a specialized class that can deal with large amounts of XML; see the sketch below.
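For example, a pull parser from the StAX API (javax.xml.stream) reads the document as a stream of events; this is a minimal sketch, with the element name "item" as a placeholder:

    import java.io.FileInputStream;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;

    public class StaxExample {
        public static void main(String[] args) throws Exception {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            try (FileInputStream in = new FileInputStream(args[0])) {
                XMLStreamReader reader = factory.createXMLStreamReader(in);
                while (reader.hasNext()) {
                    // Pull one event at a time; the document is never
                    // fully in memory.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "item".equals(reader.getLocalName())) {
                        // handle one element here
                    }
                }
                reader.close();
            }
        }
    }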

answered Dec 28, 2012 at 16:57

1 Comment

Absolutely. Specifically, a SAX-based class that reads just the portion of the XML that's of immediate interest.

There are potentially several different issues here.

But for starters:

1) If you're on a 64-bit OS, make sure you're using a 64-bit JVM (a quick check is sketched after this list)

2) Make sure your code closes all resources you open as promptly as possible.

3) Explicitly set references to large objects you're done with to "null".

... AND ...

4) Familiarize yourself with JConsole or VisualVM, so you can see exactly how much memory is being used and where it's going.
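For point 1, a quick check might look like the following; note that sun.arch.data.model is a HotSpot-specific property and may be absent on other JVMs:

    public class JvmCheck {
        public static void main(String[] args) {
            // HotSpot-specific; "32" or "64", may be null on other JVMs
            System.out.println("data model: "
                    + System.getProperty("sun.arch.data.model"));
            System.out.println("os.arch:    " + System.getProperty("os.arch"));
            // The effective heap limit (-Xmx), in MB
            System.out.println("max heap:   "
                    + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB");
        }
    }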

answered Dec 28, 2012 at 16:58

1 Comment

Generally the JVM will refuse to start if too large a heap size is specified, e.g. if not enough physical memory is available or a 32-bit JVM is used and too much memory is requested. I assume that if the OP has managed to start a JVM with -Xmx6144m, then they are actually using a 64-bit OS and JVM...

You can't load a 2.6 GB XML file as a document with just 6 GB. As jlordo suggests, the ratio is more likely to be 12 to 1. This is because every byte turns into a 16-bit character, and every tag, attribute and value turns into a String with at least 32 bytes of overhead.

Instead, use SAX or another event-based parser to process the file progressively. That way you only keep as much data as you need to retain; if you can process everything in one pass, you won't need to retain anything.

answered Dec 28, 2012 at 17:04

