1
\$\begingroup\$

In a service I'm developing I'm using three cloud services (five if you count additional two streaming services I have to create that are used by those cloud services in turn) and to orchestrate bits and pieces incoming I need a cache for objects relating data from all those services. That cache has to be is safe to access from various Tornado handlers in an atomic manner.

Now, since Tornado is based on I/O loop, theoretically I do not need to use locks or some other synchronization primitives... but that's obviously relying on implementation detail (GIL) and it can reportedly cause problems under some circumstances anyway, so to be rather safe than sorry I try to develop the cache that is all:

  • Singleton
  • Thread-safe
  • Creates a default object available under a key

I'm approaching the subject with some trepidation as so far I have not done such stuff much. This is what I've come up with so far:

UPDATE

I've realized I had a stupid mistake in the previous implementation: the setdefault method called default_factory every time. Updated version works but it's kinda ugly:

class ThreadSafeSingletonCache(object):
 __me = None
 __cache = None
 __lock = threading.Lock()
 def __new__(cls, _):
 if cls.__me is None:
 cls.__cache = {}
 cls.__me = super(ThreadSafeSingletonCache, cls).__new__(cls)
 return cls.__me
 def __init__(self, default_factory):
 self.default_factory = default_factory
 def obj_setattr(self, cache_key, obj_attr, value):
 with self.__lock:
 item = self.__like_setdefault(cache_key)
 setattr(item, obj_attr, value)
 def __like_setdefault(self, cache_key):
 ''' Unfortunately I cannot use .setdefault as default_factory function gets called on each setdefault use'''
 item = self.__cache.get(cache_key)
 if not item:
 item = self.default_factory()
 self.__cache[cache_key] = item
 return item
 def get(self, cache_key):
 with self.__lock:
 return self.__like_setdefault(cache_key)
 def set(self, cache_key, obj):
 with self.__lock:
 self.__cache[cache_key] = obj

Please point out problems and potential improvements.

asked Feb 18, 2019 at 10:14
\$\endgroup\$
5
  • \$\begingroup\$ I don't get it... If the default factory could change at each instantiation, why design a singleton; and if it doesn't change, why use it as a parameter? \$\endgroup\$ Commented Feb 18, 2019 at 11:33
  • \$\begingroup\$ @MathiasEttinger: Single Responsibility Principle? Implementing the cache that is not tightly coupled to particular factory? \$\endgroup\$ Commented Feb 18, 2019 at 12:12
  • \$\begingroup\$ Well, I guess an example usage would help here as I have a hard time understanding why a singleton would fit if you want to create one for each kind of objects you own. \$\endgroup\$ Commented Feb 18, 2019 at 12:19
  • \$\begingroup\$ @MathiasEttinger: in this particular case there's only one cache and only one type of objects I put in cache as value, so if the code were limited to this program I'm developing now only, sure, there would be no reason. However, I could not use it if I put this class in a package and tried to use it elsewhere, no? The code would not be reusable (plus not open-close, that is open for extension but closed for modification). \$\endgroup\$ Commented Feb 18, 2019 at 12:26
  • \$\begingroup\$ Your last edit did not invalidate any answers, but please keep in mind that editing the code in a question (or providing a newer version) is frowned and often unacceptable the moment answers are posted. Please see what you may and may not do after receiving answers . Just a heads-up to keep in mind for your future activity on this site. \$\endgroup\$ Commented Feb 18, 2019 at 12:53

2 Answers 2

2
\$\begingroup\$

I would consider using get(cache_key) within obj_setattr, rather than re-implementing it.

Another thing to look at would be to use a decorator to activate the lock, instead of writing the with every time. This would scale better in case you want to do some preprocessing / postprocessing later in time.

Toby Speight
87.1k14 gold badges104 silver badges322 bronze badges
answered Feb 18, 2019 at 10:35
\$\endgroup\$
1
  • \$\begingroup\$ This looks like a good suggestion to improve the code; therefore it's correct to write an answer rather than a comment. Welcome to Code Review; please stay around and contribute some more! \$\endgroup\$ Commented Feb 18, 2019 at 15:56
1
\$\begingroup\$

The design may sound fine as long as you’re using a single default_factory throughout your application; but this design permit more than that and does not handle it properly.

You could have two parts of your application (say two tornado.web.RequestHandler or two threading.Thread) that uses your singleton like:

# part 1
cache = ThreadSafeSingletonCache(int)
cache.set(username, user_id)
count = cache.get('count')
cache.set('count', count + 1)
...
# part 2
cache = ThreadSafeSingletonCache(dict)
settings = cache.get(username)
...

This may not be how you use it, but it is something you can do. So, in this scenario, what could possibly go wrong? Well, both parts could start executing "at the same time", run the cache = ... line one after the other and then, either count can be a dict instead of an int or settings can be an int instead of a dict.

There are various ways to solve this, but having to specify the internal type held by your singleton at each instantiation is not one of them. Ideally you want either:

  1. to let the singleton aspect out of the cache and let the calling code decide on its own if it needs a single instance or not (using global objects, shared state, or whatever...);
  2. to specify a single time which kind of object is stored in the cache, ideally in a setup phase before spawning threads/IO loop;
  3. to specify the kind of queried objects at the get call, like a regular dict.get call.

Implementing the first solution is very easy, just drop the __new__ method and call it a day, it is not your problem anymore. The second one could use an init like:

def __init__(self, default_factory=None):
 factory = self.default_factory
 if factory is None:
 if default_factory is None:
 raise RuntimeError('must specify a factory at first instantiation')
 self.default_factory = default_factory
 elif default_factory is not None:
 raise RuntimeError('should not specify a default factory more than once')

Of course, you’ll need to define the default_factory = None argument at class level for this to work. This implementation is not wrapped into locks because this could allow a call with and without default_factory to compete and run into a RuntimeError anyway. So the first call must be done in a race-condition-free part of the code.

The last solution is by far my favorite as it is the most versatile and the least surprising: just mimic dict.get and ask for either a default value or a default constructor.


Also for this particular class, I would implement __getitem__ and __setitem__ for ease of use. And, for stylistic reasons, recommend the Borg pattern. e.g.:

class ThreadSafeCache:
 __shared_state = {
 'default_factory': None,
 '_lock': threading.Lock(),
 '_cache': {},
 }
 def __init__(self, default_factory=None):
 self.__dict__ = self.__class__.__shared_state
 factory = self.default_factory
 if factory is None:
 if default_factory is None:
 raise RuntimeError('must specify a factory at first instantiation')
 self.default_factory = default_factory
 elif default_factory is not None:
 raise RuntimeError('should not specify a default factory more than once')
answered Feb 18, 2019 at 13:27
\$\endgroup\$
2
  • \$\begingroup\$ Thank you for pointing out that problem with default_factory. Re Borg pattern: well, I started with it.. until I've ran into severe criticisms of Borg pattern as such, see e.g. stackoverflow.com/questions/34568923/borg-design-pattern At this point I'm not really sure what to think about it.. \$\endgroup\$ Commented Feb 18, 2019 at 16:55
  • \$\begingroup\$ @LetMeSOThat4U Well, if you want to keep the default_factory parameter, you won't be able to use a module globals to store your singleton. So you're left with overriding __new__ or using the borg pattern. At this point this is stylistic preferences to me, as the overhead of creating a new borg each time is way less than a blocking lock. I do agree, however, that if you go the dict.get route, a module globals is way simpler and more adapted. See for instance Django's settings or tornado's options. \$\endgroup\$ Commented Feb 18, 2019 at 22:25

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.