I'm implementing a system where, when it comes to the heavy mathematical lifting, I want to do as little as possible.

I'm aware that there are issues with memoisation of numpy arrays (they aren't hashable), and as such implemented a lazy-key cache to avoid the whole "premature optimisation" argument:
def magic(self, numpyarg, intarg):
    # key the cache on the string form of the array plus the int
    key = str(numpyarg) + str(intarg)
    try:
        return self._cache[key]   # cache hit
    except KeyError:
        pass
    # ... here be dragons: the expensive computation producing `value` ...
    self._cache[key] = value
    return value
but since string conversion takes quite a while...
>>> t = timeit.Timer("str(a)", "import numpy; a = numpy.random.rand(10,10)")
>>> t.timeit(number=100000) / 100000
0.00132  # seconds per call
What do people suggest as being 'the better way' to do it?
3 Answers
Borrowed from this answer... so really I guess this is a duplicate:
>>> import hashlib
>>> import numpy
>>> a = numpy.random.rand(10, 100)
>>> b = a.view(numpy.uint8)
>>> hashlib.sha1(b).hexdigest()
'15c61fba5c969e5ed12cee619551881be908f11b'
>>> t = timeit.Timer("hashlib.sha1(a.view(numpy.uint8)).hexdigest()",
...                  "import hashlib; import numpy; a = numpy.random.rand(10,10)")
>>> t.timeit(number=10000)/10000
2.5790500640869139e-05
- Nice! For multidimensional arrays this gives a different hash (for the "same" array) depending on whether it's Fortran- or C-contiguous. If that's an issue, calling np.ascontiguousarray should solve it. – jorgeca, Jan 27, 2014
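jorgeca's point is easy to check directly; the `h` helper below is a hypothetical wrapper around the answer's hashing recipe with the suggested normalisation applied:

```python
import hashlib
import numpy as np

def h(x):
    # hash after normalising to C order, per jorgeca's suggestion
    return hashlib.sha1(np.ascontiguousarray(x).view(np.uint8)).hexdigest()

a = np.arange(6, dtype=np.float64).reshape(2, 3)
c = np.ascontiguousarray(a)   # C-contiguous copy
f = np.asfortranarray(a)      # same values, Fortran memory order

# The in-memory byte layouts differ...
assert c.tobytes(order='A') != f.tobytes(order='A')
# ...but after normalisation both orders hash identically.
assert h(c) == h(f)
```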
- Not sure why a known slow hash function, sha1, is chosen. SHA-1 is OK for minimising hash collisions but poor at speed. For speed you'll need something like murmurhash or xxhash (the latter claims to be even faster). – Cong Ma, Aug 5, 2015
- @CongMa, thanks for the extra info. There are lots of options! But as you'll notice, this is already two orders of magnitude faster. And speed is never the only concern. It's probably worth using a well-understood hash if the alternative is only a few millionths of a second faster. – senderle, Aug 8, 2015
There is a package for this called joblib. Found from this question.
from joblib import Memory
location = './cachedir'
memory = Memory(location)
# Create caching version of magic
magic_cached = memory.cache(magic)
result = magic_cached(...)
# Or (for one-time use)
result = memory.eval(magic, ...)
- It would be better to have a quote from those links copied over in your answer, in case these websites go offline. – Alex Fortin, Sep 16, 2018
For small numpy arrays, this might also be suitable:

tuple(map(float, a))

if a is the numpy array.
- Oh yes, a tuple is hashable, in contrast to a list! – Maksym Ganenko, Oct 1, 2017
str(a) only shows part of the array, as was later pointed out in this comment: stackoverflow.com/questions/16589791/…
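The truncation is easy to demonstrate: with numpy's default print threshold, large arrays are elided in the middle, so elements hidden behind the ellipsis never reach the string, and two different arrays can end up with the same cache key (the arrays below are illustrative):

```python
import numpy as np

big = np.arange(2000)
s = str(big)
# numpy elides the middle of large arrays by default,
# e.g. '[   0    1    2 ... 1997 1998 1999]'
assert '...' in s

small = np.arange(5)
# small arrays are printed in full
assert '...' not in str(small)
```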