
I'm implementing a system where, when it comes to the heavy mathematical lifting, I want to do as little of it as possible.

I'm aware that there are issues with memoisation of numpy objects (arrays aren't hashable), so I implemented a lazy string-key cache to sidestep the whole "premature optimisation" argument:

def magic(self, numpyarg, intarg):
    key = str(numpyarg) + str(intarg)
    try:
        return self._cache[key]
    except KeyError:
        pass
    ... here be dragons ...
    self._cache[key] = value
    return value

but since string conversion takes quite a while...

t = timeit.Timer("str(a)", "import numpy; a = numpy.random.rand(10,10)")
t.timeit(number=100000)/100000   # = 0.00132 s/call

What do people suggest as being 'the better way' to do it?

asked Mar 22, 2011 at 4:09

3 Answers


Borrowed from this answer... so really I guess this is a duplicate:

>>> import hashlib
>>> import numpy
>>> a = numpy.random.rand(10, 100)
>>> b = a.view(numpy.uint8)
>>> hashlib.sha1(b).hexdigest()
'15c61fba5c969e5ed12cee619551881be908f11b'
>>> t = timeit.Timer("hashlib.sha1(a.view(numpy.uint8)).hexdigest()",
...                  "import hashlib; import numpy; a = numpy.random.rand(10,10)")
>>> t.timeit(number=10000)/10000
2.5790500640869139e-05
answered Mar 22, 2011 at 4:21
  • Nice! For multidimensional arrays this gives a different hash (for the "same" array) depending on whether it's fortran or c contiguous. If that's an issue, calling np.ascontiguousarray should solve it. Commented Jan 27, 2014 at 16:00
  • Not sure why a known-slow hash function like SHA-1 was chosen. SHA-1 is fine for minimising hash collisions but poor on speed; for speed you'd want something like murmurhash or xxhash (the latter claims to be even faster). Commented Aug 5, 2015 at 10:06
  • @CongMa, thanks for the extra info. There are lots of options! But as you'll notice, this is already two orders of magnitude faster. And speed is never the only concern. It's probably worth using a well-understood hash if the alternative is only a few millionths of a second faster. Commented Aug 8, 2015 at 11:24
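To tie the hash-key idea back to the question, here's one way it could back a memoising decorator. This is a minimal sketch only: np_memoize and the toy magic body are invented for illustration, and np.ascontiguousarray is used (per the comment above) so the key doesn't depend on memory layout; shape and dtype are folded into the key as well, since the raw bytes alone don't distinguish them.

```python
import hashlib
import numpy as np

def np_memoize(func):
    # Cache results keyed on a SHA-1 of the array's bytes plus the int arg.
    cache = {}
    def wrapper(arr, n):
        c = np.ascontiguousarray(arr)  # make hashing layout-independent
        key = (hashlib.sha1(c.view(np.uint8)).hexdigest(),
               c.shape, str(c.dtype), n)
        if key not in cache:
            cache[key] = func(arr, n)
        return cache[key]
    return wrapper

@np_memoize
def magic(arr, n):
    # stand-in for the expensive computation
    return float((arr ** n).sum())
```

A second call with the same array and argument hits the cache without re-running the body.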

There is a package for this called joblib, which I found from this question.

from joblib import Memory
location = './cachedir'
memory = Memory(location)
# Create caching version of magic
magic_cached = memory.cache(magic)
result = magic_cached(...)
# Or (for one-time use)
result = memory.eval(magic, ...)
Erik
answered Mar 22, 2011 at 6:55
  • It would be better to have a quote from those links copied over in your answer, in case these websites go offline. Commented Sep 16, 2018 at 18:30

For small one-dimensional numpy arrays, this might also be suitable:

tuple(map(float, a))

if a is the numpy array. (Note it only works for 1-D arrays: float() raises an error on multi-element sub-arrays.)

answered Aug 5, 2014 at 6:15
  • Oh yes, a tuple is hashable, unlike a list! Commented Oct 1, 2017 at 14:48
