4

I have a superclass and a subclass that need to handle their initialisation differently, based on a regular expression. See below for a working example.

import os
import re
class Sample:
 RE = r'(?P<id>\d+)'
 STRICT_MATCHING = False
 def __init__(self, f):
 self.file = f
 self.basename = os.path.basename(os.path.splitext(self.file)[0])
 re_ = re.compile(self.RE)
 match = re_.fullmatch if self.STRICT_MATCHING else re_.match
 self.__dict__.update(match(self.basename).groupdict())
class DetailedSample(Sample):
 RE = r'(?P<id>\d+)_(?P<dir>[lr])_(?P<n>\d+)'
 STRICT_MATCHING = True
s1 = Sample("/asdf/2.jpg")
print(s1.id)
s2 = DetailedSample("/asdfadsf/2_l_2.jpg")
print(s2.id, s2.dir, s2.n)

This code works but it has two drawbacks:

  • The regular expression gets recompiled every time a new Sample is initialised.
  • The match function cannot be called from other class methods in Sample (for instance, I might want to be able to check whether a file has a valid name - relative to RE - before initialising a Sample from it).

To put it simply, I'd like to have something like this:

class Sample:
 RE = r'(?P<id>\d+)'
 STRICT_MATCHING = False
 re_ = re.compile(RE) #
 match = re_.fullmatch if STRICT_MATCHING else re_.match #
 def __init__(self, f):
 self.file = f
 self.basename = os.path.basename(os.path.splitext(self.file)[0])
 self.__dict__.update(self.match(self.basename).groupdict())
 @classmethod
 def valid(cls, f):
 basename, ext = os.path.splitext(os.path.basename(f))
 return cls.match(basename) and ext.lower() in ('.jpg', '.jpeg', '.png')
class DetailedSample(Sample):
 RE = r'(?P<id>\d+)_(?P<dir>[lr])_(?P<n>\d+)'
 STRICT_MATCHING = True

This, however, obviously won't work in the subclasses, because the two lines marked with # won't execute after the redefinition of RE and STRICT_MATCHING in the subclass.

Is there an approach that will:

  • keep the functionality of the first approach (i.e. regex-based intialisation);
  • only compile the regex and define the match method once per subclass;
  • allow the match method to be called from class methods;
  • only require the redefinition of the regex string and STRICT_MATCHING parameter in the subclasses?
asked Nov 22, 2019 at 13:59
7
  • You could always use metaclasses to do this, but I'm not sure if that is a solution or another problem... ;-) Commented Nov 22, 2019 at 14:30
  • Your approach will fail as the lines RE = ... up to match = are only evaluated once on module load. Therefore changing RE = ... and STRICT_MATCHING = ... has no effect. Commented Nov 22, 2019 at 15:33
  • You could simply provide RE = re.compile(r'...') yourself, rather than having the class call re.compile for you. Commented Nov 22, 2019 at 18:48
  • @thebjorn __init_subclass__ was introduced precisely to avoid the need for a full-blown metaclass here :) Commented Nov 22, 2019 at 19:08
  • The re module keeps cached the most recently compiled patterns, so there may be no penalty to repeatedly compiling the same few patterns over and over (it doesn't actually do extra work if they're still in the cache). Commented Nov 22, 2019 at 19:08

3 Answers 3

4

You can use __init_subclass__ to make sure each subclass does the appropriate work. This would be defined in a private base class that your public base class inherits from.

import os
import re
class _BaseSample:
 RE = r'(?P<id>\d+)'
 STRICT_MATCHING = False
 def __init_subclass__(cls, **kwargs):
 super().__init_subclass__(**kwargs)
 cls._re = re.compile(cls.RE)
 cls.match = cls._re.fullmatch if cls.STRICT_MATCHING else cls._re.match
class Sample(_BaseSample):
 def __init__(self, f):
 self.file = f
 self.basename = os.path.basename(os.path.splitext(self.file)[0]
 self.__dict__.update(self.match(self.basename).groupdict())
class DetailedSample(Sample):
 RE = r'(?P<id>\d+)_(?P<dir>[lr])_(?P<n>\d+)'
 STRICT_MATCHING = True
s1 = Sample("/asdf/2.jpg")
print(s1.id)
s2 = DetailedSample("/asdfadsf/2_l_2.jpg")
print(s2.id, s2.dir, s2.n)

Unless you need direct access to the compiled regular expression later, _re can be a local variable to _BaseSample.__init_subclass__ rather than a class attribute of each class.

Note that __init_subclass__ can also take additional keyword arguments, supplied as keyword arguments to the class statement itself. I don't think there is any particular benefit to doing that; it's just a matter of what interface you want to provide for setting RE and STRICT_MATCHING. See Customizing Class Creation for details.

answered Nov 22, 2019 at 19:02
Sign up to request clarification or add additional context in comments.

2 Comments

I always forget about __init_subclass__ - you're right, this is the probably best way to do it.
I like this solution best, it seems the most pythonic to me, since it's just overriding a method that was created for exactly this purpose. It should be noted that __init_subclass__ is only available in python>=3.6, so in older versions, you'd have to go with one of the class decorator approaches.
3

You can do this by decorating the classes.

This decorator inspects the STRICT_MATCHING attribute and sets the match attribute accordingly.

def set_match(cls):
 match = cls.RE.fullmatch if cls.STRICT_MATCHING else cls.RE.match
 setattr(cls, 'match', match)
 return cls
@set_match
class Sample:
 RE = re.compile(r'(?P<id>\d+)')
 STRICT_MATCHING = False
 def __init__(self, f):
 self.file = f
 self.basename = os.path.basename(os.path.splitext(self.file)[0])
 self.__dict__.update(self.match(self.basename).groupdict())
@set_match
class DetailedSample(Sample):
 RE = re.compile(r'(?P<id>\d+)_(?P<dir>[lr])_(?P<n>\d+)')
 STRICT_MATCHING = True

The same effect could be obtained using a metaclass:

class MetaMatchSetter(type):
 def __new__(cls, clsname, bases, clsdict):
 rgx = clsdict['RE']
 match = rgx.fullmatch if clsdict['STRICT_MATCHING'] else rgx.match
 clsdict['match'] = match
 return super().__new__(cls, clsname, bases, clsdict)
class Sample(metaclass=MetaMatchSetter):
 ...
class DetailedSample(Sample):
 ...

But using a class decorator (or __init_subclass__ as described in chepner's answer) is more readable and understandable, in my view.

answered Nov 22, 2019 at 18:44

1 Comment

Not only is it more readable, but since metaclasses are not, in general, composable, introducing a metaclass can make future inheritance more difficult. Someone trying to subclass both Sample and a class using a different metaclass could run into a conflict.
2

You could cache/memoize the compiled regular expressions as mentioned on wiki.python.org and you need to use the class attributes instead if the instance attributes:

import os
import re
import functools
def memoize(obj):
 cache = obj.cache = {}
 @functools.wraps(obj)
 def memoizer(*args, **kwargs):
 if args not in cache:
 cache[args] = obj(*args, **kwargs)
 return cache[args]
 return memoizer
@memoize
def myRegExpCompiler(*args):
 print("compiling")
 return re.compile(*args)
class Sample:
 RE = r'(?P<id>\d+)'
 STRICT_MATCHING = False
 def __init__(self, f):
 self.file = f
 self.basename = os.path.basename(os.path.splitext(self.file)[0])
 re_ = myRegExpCompiler(self.__class__.RE) # use cls method!
 match = re_.fullmatch if self.__class__.STRICT_MATCHING else re_.match # use cls method!
 self.__dict__.update(match(self.basename).groupdict())
class DetailedSample(Sample):
 RE = r'(?P<id>\d+)_(?P<dir>[lr])_(?P<n>\d+)'
 STRICT_MATCHING = True
s1 = Sample("/asdf/2.jpg")
print(s1.id)
s2 = DetailedSample("/asdfadsf/2_l_2.jpg")
print(s2.id, s2.dir, s2.n)
s3 = DetailedSample("/asdfadsf/2_l_2.jpg")
print(s3.id, s3.dir, s3.n)

Output:

compiling
2
compiling
2 l 2
2 l 2

... as you can see the reg. expressions are compiled just two times.

answered Nov 22, 2019 at 14:23

Comments

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.