When trying to load a large text file into memory I get this:
Python(24297,0xa0d291a8) malloc: *** mach_vm_map(size=717418496) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
MemoryError
The code that's causing it is:
with open('list.txt') as f:
    total = sum(1 for _ in f)
Is there a native Python way to take care of this?
asked Aug 6, 2014 at 1:29
user3893623
1 Answer
Wild guess:
You are running the above code on a binary file that contains no (or very few) newlines. Thus, the attempt to read one line reads one very long line.
Try this instead:
with open('list.txt') as f:
    total = sum(block.count('\n')
                for block in iter(lambda: f.read(64*1024), ''))
answered Aug 6, 2014 at 2:19
Robφ
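Not part of the answer above, but the same chunked count also works in binary mode, which skips text decoding entirely; a minimal sketch, assuming the same list.txt:

with open('list.txt', 'rb') as f:
    # Read fixed 64 KiB blocks until EOF (the b'' sentinel) and count raw newline bytes
    total = sum(block.count(b'\n')
                for block in iter(lambda: f.read(64 * 1024), b''))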
4 Comments
Robφ: @Matt3o12 - In your comment to the original post, you say that you created lines 110,000 bytes long. That's relatively short. Try creating a 1GB file with no newlines in it. For example: dd if=/dev/zero of=/tmp/list.txt bs=1M count=1K.
Matt3o12: dd: count: illegal numeric value. Anyway, I now have a third test case that is almost 1GB and Python still doesn't need more than 20MB.
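A pure-Python way to build the same kind of newline-free test file, for anyone whose dd rejects the count=1K spelling; a sketch, assuming the /tmp/list.txt path from the comment above:

with open('/tmp/list.txt', 'wb') as f:
    chunk = b'\0' * (1024 * 1024)  # 1 MiB of NUL bytes, no newlines
    for _ in range(1024):          # 1024 x 1 MiB = 1 GiB in total
        f.write(chunk)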
Matt3o12: Ok, I now have a 2GB file with only zeros; Python only took 2GB of my memory but crashed eventually: SystemError: Negative size passed to PyString_FromStringAndSize. 2,489,971,200 is too big for an integer. Anyway, with 16GB RAM, there shouldn't be a memory issue. Reading in chunks works fine, though (and it is quicker than I thought).
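The quick arithmetic below suggests why that SystemError appears: a 2,489,971,200-byte "line" does not fit in a signed 32-bit size, which is consistent with a 32-bit Python 2 build (an inference from the error message, not something stated in the thread):

# 2,489,971,200 exceeds the largest signed 32-bit value, so the length
# wraps to a negative number, matching "Negative size passed".
print(2489971200 > 2**31 - 1)  # True; 2**31 - 1 == 2,147,483,647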
user3893623: This worked, but it printed the count of 51 million lines (not just one)
wc -l list.txt will solve it
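For completeness, the wc -l suggestion can also be driven from Python via the standard subprocess module; a sketch, assuming a Unix wc on the PATH:

import subprocess

# wc -l prints "<count> <filename>"; take the first field as an int
output = subprocess.check_output(['wc', '-l', 'list.txt'])
total = int(output.decode().split()[0])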