Python lazy dictionary

Question 1

Here is a lazy dictionary with test cases.

"""
A dictionary whose values are only evaluated on access.
"""
from functools import partial
from UserDict import IterableUserDict


class LazyDict(IterableUserDict, object):
    """
    A lazy dictionary implementation which will try
    to evaluate all values on access and cache the
    result for later access.
    """
    def set_lazy(self, key, item, *args, **kwargs):
        """
        Allow the setting of a callable and arguments
        as value of dictionary.
        """
        if callable(item):
            item = partial(item, *args, **kwargs)
        super(LazyDict, self).__setitem__(key, item)

    def __getitem__(self, key):
        item = super(LazyDict, self).__getitem__(key)
        try:
            self[key] = item = item()
        except TypeError:
            pass
        return item


def test_lazy_dict():
    """
    Simple test cases for `LazyDict`.
    """
    lazy_dict = LazyDict({1: 1, 2: lambda: 2})
    assert lazy_dict[2] == 2
    lazy_dict[3] = 3
    assert lazy_dict[3] == 3

    def joiner(*args, **kwargs):
        sep = kwargs.pop('sep', ' ')
        kwargs = [
            '%s=%s' % (k, v)
            for k, v in sorted(kwargs.iteritems())]
        return sep.join(list(args) + kwargs)

    lazy_dict.set_lazy(
        4, joiner, 'foo', 'bar', name='test', other='muah', sep=' ')
    assert lazy_dict[4] == 'foo bar name=test other=muah'
    assert lazy_dict.get('5') is None

    # Test caching functionality.
    def call_at_max(count):
        counter = [0]

        def inner():
            counter[0] += 1
            if counter[0] > count:
                raise AssertionError('Called more than once')
            return 'happy'
        return inner

    call_once = call_at_max(1)
    lazy_dict[5] = call_once
    assert lazy_dict[5] == 'happy'
    assert lazy_dict[5] == 'happy'

    # Test for helper function.
    try:
        call_once()
    except AssertionError:
        assert True

Comment 1

What is IterableUserDict?

Comment 2

Hi @jonrsharpe, IterableUserDict is a dictionary class meant for subclassing that supports iteration; see docs.python.org/2/library/userdict.html for documentation.

Comment 3

Ah, I see; I assumed that was a custom class. I would generally implement this using collections.MutableMapping instead. Note that, per the docs, "The need for [UserDict] has been largely supplanted by the ability to subclass directly from dict (a feature that became available starting with Python version 2.2)."

Answer

Per the documentation for UserDict:

"The need for this class has been largely supplanted by the ability to subclass directly from dict (a feature that became available starting with Python version 2.2)."

Therefore, unless you have a really good reason to support 2.1 and earlier, rather than:

class LazyDict(IterableUserDict, object):

I would use:

class LazyDict(dict):

or base it on collections.MutableMapping. This also makes compatibility with 3.x (where the UserDict module no longer exists; the class moved to collections.UserDict) less complex.
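
As a rough illustration of the MutableMapping route, here is a minimal sketch. It assumes Python 3 (where the base class is collections.abc.MutableMapping), and it swaps the original try/except TypeError for a callable() check; the _data attribute and the calls list are names made up for this example, not part of the original code.

```python
from collections.abc import MutableMapping


class LazyDict(MutableMapping):
    """Sketch: callable values are called on first access, then cached."""

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)  # plain dict used as backing store

    def __getitem__(self, key):
        item = self._data[key]
        if callable(item):
            # Replace the stored callable with its result so it runs only once.
            item = self._data[key] = item()
        return item

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


calls = []  # records each evaluation, to make the caching visible
lazy_dict = LazyDict({1: 1, 2: lambda: calls.append(2) or 2})

# Mapping.get() is implemented in terms of __getitem__, so it is lazy too.
first = lazy_dict.get(2)
second = lazy_dict[2]
```

One thing to watch with the direct dict subclass instead: the built-in dict.get() does not call an overridden __getitem__, so get() would hand back the unevaluated callable unless it is overridden as well.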

An edge case you may not have thought of: what if the function stored in the dictionary returns a callable object? Then you will get different results depending on when you access the dictionary. If this is intentional, it should be documented. If not, one solution would be to create a LazyCallable class (effectively a custom partial) to store the function and its arguments, so you can check if isinstance(item, LazyCallable) to distinguish between items added via set_lazy and any other callable values.
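
A minimal sketch of that LazyCallable idea, assuming Python 3. The class body, make_greeter and the sample keys are illustrative guesses at one way to do it, not the original author's code.

```python
class LazyCallable:
    """Hypothetical wrapper marking a value as deferred (a minimal custom partial)."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __call__(self):
        return self.func(*self.args, **self.kwargs)


class LazyDict(dict):
    """Only values added via set_lazy are evaluated; other callables are left alone."""

    def set_lazy(self, key, func, *args, **kwargs):
        self[key] = LazyCallable(func, *args, **kwargs)

    def __getitem__(self, key):
        # Note: dict.get() does not go through this override.
        item = super().__getitem__(key)
        if isinstance(item, LazyCallable):
            item = item()
            self[key] = item  # cache the result in place of the wrapper
        return item


def make_greeter(name):
    # Returns a callable, which is the edge case discussed above.
    return lambda: "hello " + name


lazy_dict = LazyDict()
lazy_dict.set_lazy("greeter", make_greeter, "world")
lazy_dict["builtin"] = len  # an ordinary callable value, stored as-is

greeter = lazy_dict["greeter"]
```

With the isinstance check, the returned greeter stays a callable on every later access instead of being invoked on the second lookup.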

Overall, I'm not convinced I see the point to this. The function only gets called once, but I'm not sure the layer of additional complexity needed for:

lazy_dict.set_lazy(4, joiner, 'foo', 'bar', name='test', other='muah', sep=' ')

is better than:

vanilla_dict[4] = joiner('foo', 'bar', name='test', other='muah', sep=' ')

In both, the function only gets called once (at slightly different times, admittedly), and the latter doesn't require the reader to know about LazyDict. This also doesn't provide the functionality (that e.g. regular "memoization" does) to dynamically store results of calls to the function with different arguments, so it's only called once for each set of arguments.
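
For comparison, regular memoization could look something like the sketch below. It assumes Python 3.2+, where functools.lru_cache is available; the simplified joiner signature and the call_log list are made up for the illustration.

```python
from functools import lru_cache

call_log = []  # records every real execution of joiner


@lru_cache(maxsize=None)
def joiner(sep, *args):
    call_log.append(args)
    return sep.join(args)


a = joiner(" ", "foo", "bar")
b = joiner(" ", "foo", "bar")  # same arguments: served from the cache
c = joiner(" ", "foo", "baz")  # new arguments: the function runs again
```

Here the cache is keyed on the arguments, so each distinct set of arguments triggers exactly one call, which LazyDict does not offer.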

I suppose one advantage would be in cases where you aren't sure, when you add the function to the dictionary, whether or not you will ever need to call it. If the function is very computationally complex but not actually needed, you can optimise one call down to zero, but there are probably easier ways to do that. There's no reason the user couldn't put a partial into the dictionary themselves.
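
One possible shape for the "put a partial in yourself" approach, assuming Python 3. The resolve helper, expensive function and settings dictionary are hypothetical names used only for this sketch.

```python
from functools import partial

call_log = []  # records every real execution of expensive


def expensive(n):
    call_log.append(n)
    return sum(range(n))


def resolve(mapping, key):
    """Hypothetical helper: evaluate a stored partial on first use and cache it."""
    value = mapping[key]
    if isinstance(value, partial):
        value = mapping[key] = value()
    return value


settings = {
    "cheap": 1,
    "costly": partial(expensive, 1000),
    "unused": partial(expensive, 10 ** 6),  # never resolved, so never computed
}

total = resolve(settings, "costly")
again = resolve(settings, "costly")  # cached value, no second call
```

The entry under "unused" is the "one call down to zero" case: its function never runs because nothing asks for it.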

Perhaps off-topic, but:

# Test caching functionality.
def call_at_max(count):
    counter = [0]

    def inner():
        counter[0] += 1
        if counter[0] > count:
            raise AssertionError('Called more than once')
        return 'happy'
    return inner

You have hard-coded 'Called more than once', which won't make sense for count != 1. Also, using a list to make a "mutable integer" isn't very neat. I would either make it non-generic (i.e. hard-code the 1, too), or implement as something like:

MULTIPLES = {1: 'once', 2: 'twice'}  # add 3: 'thrice' if you like!

# Test caching functionality.
def call_at_most(times):
    def inner():
        inner.counter += 1
        if inner.counter > times:
            raise AssertionError(
                'Called more than {}'.format(
                    MULTIPLES.get(times, '{} times').format(times)
                )
            )
        return 'happy'
    inner.counter = 0
    return inner

In use:

>>> test = call_at_most(2)
>>> test()
'happy'
>>> test()
'happy'
>>> test()
Traceback (most recent call last):
  File "<pyshell#51>", line 1, in <module>
    test()
  File "<pyshell#47>", line 8, in inner
    MULTIPLES.get(times, '{} times').format(times)
AssertionError: Called more than twice
AssertionError: Called more than twice

I'd also replace assert True with pass.

Comment 4

For future readers also wondering what would be the point of deferring the function execution (rather than calling it immediately when building the dictionary): this is extremely useful if said function is slow or expensive (like a remote API call, or something incurring a lot of I/O or CPU), but won't be needed all the time. It is then a very good idea to use some kind of lazy evaluation.

jonrsharpe · Accepted Answer · 2015-04-07 12:48:57Z
