To test the claims that PyPy's JIT is significantly faster, I wrote a simple script that repeatedly adds two arrays of size 1200x1200. My code is as follows:
import numpy as np
import random
import timeit

a = np.zeros((1200, 1200), dtype=np.float32)
b = np.zeros((1200, 1200), dtype=np.float32)

# Start timer
start = timeit.default_timer()

# Initialize the arrays
for j in range(1200):
    for k in range(1200):
        a[j][k] = random.random()
        b[j][k] = random.random()

# Repeatedly add the arrays
for j in range(10):
    a = np.add(a, b)

# Stop timer and display the results
stop = timeit.default_timer()
print(stop - start)
With normal Python the execution time is about 1.2 to 1.5 seconds. However, with PyPy it is more than 15 seconds. Also, in the above case I have added the arrays only 10 times. If I increase this value to 1000, my computer stops responding. I found that this was because almost the entire RAM was consumed while using PyPy. Am I doing something wrong? Or is the issue something else?
2 Answers
The JIT cannot help in this case since very little time is spent in Python code. NumPy is written in C, so the JIT cannot look into that code and make it faster. In fact, PyPy suffers in this case: the impedance mismatch between PyPy, written in RPython, and NumPy, written in C, means that each time a NumPy function is called from PyPy, additional conversion code must run to prepare and call the C function.
CFFI was written specifically for this use case. The conversions necessary to call into C are already taken care of at object creation time, so the program can run both more seamlessly.
The memory problems are a separate issue and should be fixed; see the answer to the garbage collection question above.
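To see where the time actually goes, it can help to time the two phases of the question's script separately. The sketch below is only an illustration: the function name `time_phases` and its parameters are made up here, and the expectation that the element-by-element fill dominates under PyPy (because every `a[j, k] = ...` is its own call into NumPy) is an assumption to check on your own machine, not a measured result.

```python
import random
import timeit

import numpy as np


def time_phases(n=1200, repeats=10):
    """Return (fill_seconds, add_seconds) for the two phases of the script."""
    a = np.zeros((n, n), dtype=np.float32)
    b = np.zeros((n, n), dtype=np.float32)

    # Phase 1: fill element by element, as in the question.
    # Each assignment crosses from the interpreter into NumPy.
    t0 = timeit.default_timer()
    for j in range(n):
        for k in range(n):
            a[j, k] = random.random()
            b[j, k] = random.random()
    t1 = timeit.default_timer()

    # Phase 2: a handful of whole-array additions done inside NumPy.
    for _ in range(repeats):
        a = np.add(a, b)
    t2 = timeit.default_timer()

    return t1 - t0, t2 - t1
```

Calling `time_phases()` under both interpreters and comparing the two numbers shows whether the slowdown comes from the fill loop or from `np.add` itself.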
PyPy doesn't garbage-collect NumPy arrays in all circumstances, and that's likely the reason you're running out of memory, spilling to disk, and then locking up.
numpy.ndarray objects not garbage collected
Reducing numpy memory footprint in long-running application
There are two solutions. The easiest is to simply tell PyPy to delete the array by doing:
import gc
del my_array
gc.collect()
This will force PyPy to do a garbage collection. Note that gc.collect() shouldn't be put into tight loops unless it's really, really necessary.
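One way to follow that advice in the question's loop is to collect only every so often rather than on every pass. This is a rough sketch; `repeated_add` and the `collect_every` parameter are hypothetical names, and the interval of 100 is an arbitrary starting point, not a tuned value.

```python
import gc

import numpy as np


def repeated_add(a, b, iterations, collect_every=100):
    """Add b to a repeatedly, forcing a collection every collect_every passes."""
    for i in range(iterations):
        # As in the question, each pass allocates a fresh result array
        # and drops the reference to the previous one.
        a = np.add(a, b)
        # Collect occasionally instead of on every pass.
        if collect_every and (i + 1) % collect_every == 0:
            gc.collect()
    return a
```

A smaller `collect_every` keeps peak memory lower at the cost of more time spent in the collector.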
The second, more manual solution is to allocate the arrays yourself with CFFI and tell NumPy about them through the array interface: https://docs.scipy.org/doc/numpy/reference/arrays.interface.html
This way you can still manipulate the struct from numpy, but you have the control to delete/resize the array manually.
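A minimal sketch of that idea follows. Note the swap: instead of filling in `__array_interface__` by hand, it wraps the CFFI memory with `np.frombuffer`, which is a simpler route to the same zero-copy view. The helper name `make_float_array` is made up for illustration, and this assumes the `cffi` package is installed.

```python
import numpy as np
from cffi import FFI

ffi = FFI()


def make_float_array(rows, cols):
    """Allocate C floats with CFFI and return (owner, numpy_view)."""
    # Zero-initialized C memory, owned by the returned cdata object.
    buf = ffi.new("float[]", rows * cols)
    # View the same memory from NumPy without copying it.
    view = np.frombuffer(ffi.buffer(buf), dtype=np.float32).reshape(rows, cols)
    return buf, view


buf, a = make_float_array(3, 4)
a += 1.5          # in-place update writes straight into the C buffer
first = buf[0]    # read the same element back through CFFI
```

Keep `buf` referenced for as long as the NumPy view is in use. Once you are done, dropping both names (`del a`, then `del buf`) lets the memory be released when the last reference goes away.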
Have you tried replacing random.random() with a constant value, 1 or something? Lacking any experience with PyPy I'm just guessing, but maybe this function does not like to be JITed. You could also initialize the arrays with np.random.rand(1200, 1200).astype(np.float32).
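The np.random.rand suggestion can be sketched as follows: it fills each array with a single NumPy call instead of the nested Python loops. The function name `init_arrays` is hypothetical, and whether this removes most of the PyPy slowdown is an assumption to verify by timing it.

```python
import numpy as np


def init_arrays(n=1200):
    """Create two n x n float32 arrays of uniform random values."""
    # np.random.rand returns float64 samples in [0, 1); astype makes a float32 copy.
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    return a, b
```

This makes two calls into NumPy in total, compared with one call per element in the original loops.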