Background
I've taught myself Python over the past year-and-a-bit, and would consider myself an intermediate Python user at this point, but never studied computing at school/university. As such, my knowledge of theory is a little weak. Python is the only language I know.
I'm trying to wrap my head around the Liskov Substitution Principle so that I can write better, more object-oriented code.
Question 1: The Python data model
As we know, in Python 3, all custom classes implicitly inherit from object. But the docstring of object includes this line:
When called, it accepts no arguments and returns a new featureless instance that has no instance attributes and cannot be given any.
With this in mind, how can any Python classes that accept a nonzero number of parameters for their constructors be said to comply with the LSP? If I have a class Foo, like so:
class Foo:
    def __init__(self, bar):
        self.bar = bar
then, surely this class definition (and all others like it) violates the contract specified by object? object and Foo cannot be used interchangeably, as object accepts exactly 0 parameters to its constructor, while Foo accepts exactly 1.
Question 2: collections.Counter
As Raymond Hettinger tells us, the Python dictionary is an excellent example of the open/closed principle. The dict class is "open for extension, closed for modification" -- if you wish to modify some of the prepackaged behaviours of a dict, you're advised to inherit from collections.abc.MutableMapping or collections.UserDict instead.
collections.Counter, however, is a direct subclass of dict. While it is mostly an extension of dict rather than a modification of behaviours already defined in dict, this isn't true for collections.Counter.fromkeys. With a standard dict, this classmethod is an alternative constructor for a dictionary, but the method is overridden in Counter so that it raises an exception. Here is the comment explaining why this is the case:
# There is no equivalent method for counters because the semantics
# would be ambiguous in cases such as Counter.fromkeys('aaabbc', v=2).
# Initializing counters to zero values isn't necessary because zero
# is already the default value for counter lookups. Initializing
# to one is easily accomplished with Counter(set(iterable)). For
# more exotic cases, create a dictionary first using a dictionary
# comprehension or dict.fromkeys().
The explanation makes sense -- but surely this violates the LSP? According to the Wikipedia article, the LSP states that "New exceptions cannot be thrown by the methods in the subtype, except if they are subtypes of exceptions thrown by the methods of the supertype." dict.fromkeys does not throw a NotImplementedError; Counter.fromkeys does.
Comments
I'm interested in whether these examples do, in fact, break the LSP. However, I'm also interested in why they break the LSP, if indeed they do. In what situations is enforcing the LSP necessary/advisable/useful? In what situations is worrying about the LSP more of a meaningless distraction? Etc.
- In your entire question, you only talk about classes, subclasses, and superclasses. However, the LSP only talks about types, subtypes, and supertypes. The first thing you need to understand is that a class is not a type and a type is not a class. Start from there. (Jörg W Mittag, Aug 31, 2021 at 15:11)
- LSP is not about inheritance in itself, but about subtyping, and it holds with respect to the type expected by client code. When you write code that uses an object of some type, then that code relies on the abstract behavior of that type, and if you provide a subtype (or a different implementation) instead, that client code shouldn't break. Code that only uses the interface of the 'object' class doesn't do much - you typically don't write such code, so you're not substituting for the 'object' type. (Filip Milovanović, Aug 31, 2021 at 15:14)
- Second, objects are often constructed elsewhere and then passed in as class dependencies or function parameters, so the constructor is generally not a part of the abstract type (which in Python may be implicit) that is to be substituted by a subtype. In Python, the type of the dependency might not be explicitly defined in code (e.g. it could just exist conceptually, defined in the documentation). In a statically typed language it would be the type of the parameter - which might differ from the actual, concrete type of the object passed in (polymorphism). (Filip Milovanović, Aug 31, 2021 at 15:14)
- Thanks @FilipMilovanović, that's extremely clarifying. (Alex Waygood, Aug 31, 2021 at 15:19)
3 Answers
I think it's good to go back to the actual definition of the Liskov substitution principle:
Subtype Requirement: Let φ(x) be a property provable about objects x of type T. Then φ(y) should be true for objects y of type S where S is a subtype of T.
Note, the principle only refers to properties of objects. And, implicitly, only public properties, because the Liskov substitution principle is interested in behavioral typing — typing according to observable properties.
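To make "observable properties" a bit more concrete, here is a minimal sketch. The helper total_of_values is a made-up name for illustration; it relies only on behaviour that any dict instance exposes, so a Counter instance can be passed in its place without the caller noticing.

```python
from collections import Counter


def total_of_values(mapping):
    """Hypothetical client code: uses only the public, observable
    behaviour of a dict instance (values() yields the stored values)."""
    return sum(mapping.values())


plain_total = total_of_values({"a": 2, "b": 1})
counter_total = total_of_values(Counter("aab"))  # counts: a=2, b=1

print(plain_total, counter_total)  # 3 3
```

Nothing in the helper ever constructs a dict or a Counter, which is the point the rest of this answer leans on.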
With that in mind...
Answer 1
With this in mind, how can any Python classes that accept a nonzero number of parameters for their constructors be said to comply with the LSP?
There are two parts to this. First, __init__ is not a constructor. __new__ is the constructor: the method that actually constructs a new class instance from whole cloth. __init__ is just a private method that is called on the new instance once __new__ has returned it (the call is made by type.__call__, not by __new__ itself). And since it's private, it's not part of your type and not subject to the Liskov substitution principle.
What about __new__ then? All the parameters of __init__ are by default implicitly parameters of __new__, so am I just kicking the can down the road? __new__ is a static method, so it's not a property of an instance of object; it's part of object itself. So it's not subject to the Liskov substitution principle for instances of object either.
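As a rough illustration (describe is a hypothetical function, not anything from the standard library): code written against instances of object only ever touches already-constructed objects, so the difference in constructor signatures never comes into play.

```python
class Foo:
    def __init__(self, bar):
        self.bar = bar


def describe(obj):
    """Hypothetical client code that relies only on behaviour every
    instance of object provides: a type name, hashing, and equality."""
    hashable = isinstance(hash(obj), int)
    return f"{type(obj).__name__}: hashable={hashable}, equals_itself={obj == obj}"


# The caller builds the instances; describe() never calls a constructor.
results = [describe(object()), describe(Foo(42))]
for line in results:
    print(line)
```

Both calls succeed, even though object() and Foo(42) were created with different argument lists.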
(Answer 1 Digression)
Here's where it gets kind of interesting (to me, at least). object is a class, but in Python classes are objects. They're instances of type, which is called their metaclass. So while __new__ is a static method, that means it's an instance method of object itself.[1] So, it is subject to the Liskov substitution principle for instances of type. And if we look at the definition of __new__ in type, we see:
__new__(*args, **kwargs) method of builtins.type instance
Create and return a new object. See help(type) for accurate signature.
So type's __new__ accepts any and all arguments. Since many classes' __init__ methods (and thus their __new__ methods) don't accept arbitrary arguments, those class objects are kind of in violation of the Liskov substitution principle as instances of type. But... as you pointed out later in your question,
New exceptions cannot be thrown by the methods in the subtype, except if they are subtypes of exceptions thrown by the methods of the supertype.
And that's exactly what __new__ does. If you call type.__new__ with different arguments than it expects, it throws a TypeError:
>>> type.__new__()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: type.__new__(): not enough arguments
Which means that all instances of type (i.e., all class objects) are free to throw their own TypeErrors in __new__, and callers are obligated to handle it. And that's exactly what object.__new__ does, but under different conditions:
>>> object.__new__(object, 'foo', (), {}) # This would be valid for type.__new__
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: object() takes no arguments
So, by weakening the base type's precondition (the argument list) and opting to instead validate "at runtime" by throwing an exception, the __new__ method is able to meet the Liskov substitution principle as a property of instances of type.[2]
This means we really can't just instantiate any arbitrary type in Python (either by calling __new__ or by just calling the type) without knowing the target type, unless we're prepared to catch and handle TypeErrors, and I think that tracks with most programmers' intuitions.
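A minimal sketch of what "prepared to catch and handle TypeError" could look like. try_instantiate is a hypothetical helper of my own, not an established idiom.

```python
def try_instantiate(cls, *args, **kwargs):
    """Hypothetical helper: call an arbitrary class object and return the
    new instance, or None if the class rejects the arguments."""
    try:
        return cls(*args, **kwargs)
    except TypeError:
        return None


class Foo:
    def __init__(self, bar):
        self.bar = bar


no_args_object = try_instantiate(object)     # object() takes no arguments: fine
one_arg_object = try_instantiate(object, 1)  # rejected with TypeError
no_args_foo = try_instantiate(Foo)           # missing 'bar': TypeError
one_arg_foo = try_instantiate(Foo, 1)        # fine

print(one_arg_object, no_args_foo, one_arg_foo.bar)  # None None 1
```

One caveat with this sketch: a TypeError raised for some unrelated reason inside __init__ is swallowed as well, so real code would probably want to be more selective.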
Answer 2
collections.Counter, however, is a direct subclass of dict. While it is mostly an extension of dict rather than a modification of behaviours already defined in dict, this isn't true for collections.Counter.fromkeys.
The answer here is similar to the previous answer. Since fromkeys is a class method, it's not really a property of instances of dict and not subject to the Liskov substitution principle.[3]
But then, what about if we look at dict as a class object? Do we run into the same complications we did with object.__new__? No, we don't, because Counter and dict don't have any sort of hierarchical relationship as class objects: they're both direct instances of type. We can't assume anything about their fromkeys methods because they didn't inherit them from type.
On the other hand, in Python a class does inherit all its parents' properties, which includes static and class methods like fromkeys. So Counter has to do something with fromkeys. It could attempt to hide the method, e.g., by replacing it with a descriptor that always throws an AttributeError, or even just by setting the property to None. The author of Counter chose to keep the method visible and to throw NotImplementedError instead, perhaps to signal that the method is intentionally unusable.[4]
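For what it's worth, the two hiding tricks mentioned above could be sketched roughly like this. The class names and the Hidden descriptor are made up for illustration, and this is not how Counter itself is written.

```python
class HiddenByNone(dict):
    # Shadow the inherited classmethod with None. Calling it now fails with
    # "TypeError: 'NoneType' object is not callable".
    fromkeys = None


class Hidden:
    """Hypothetical descriptor that makes an inherited attribute look absent."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        raise AttributeError(f"{owner.__name__!r} deliberately hides {self.name!r}")


class HiddenByDescriptor(dict):
    fromkeys = Hidden()


print(hasattr(dict, "fromkeys"))                # True
print(hasattr(HiddenByDescriptor, "fromkeys"))  # False
print(HiddenByNone.fromkeys)                    # None
```

Neither trick actually removes the method from dict; both only shadow it in the subclass.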
In the end, the Liskov substitution principle is just an attempt to formalize something very intuitive: don't surprise the users of your code. In that sense, it may be seen as a necessary condition for "good code" (whatever that is), but not a sufficient condition.
[1] This is a slight lie. __new__ is not an instance method of object because the receiver (cls, a.k.a. self) is not bound automatically; it has to be passed explicitly. It's a static method, so it's essentially just a function property of object, but it doesn't make a difference to the discussion.
[2] I put "at runtime" in scare quotes because technically everything in Python is at runtime, but hopefully the distinction is clear.
[3] It is possible to call static and class methods through instances, e.g.,
d1 = dict()
d2 = d1.fromkeys(itr)
This just dispatches the method to type(d1), which is dict. As far as I know, it's pretty well accepted that this is a quirk of Python, and we still think of those methods as properties of the type and not properties of the instance. But I suppose, in the strictest sense, that is a violation of the Liskov substitution principle.
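A small sketch of that strict-sense violation. blank_copy is a hypothetical helper that assumes every dict instance supports fromkeys, which holds for a plain dict but not for a Counter.

```python
from collections import Counter


def blank_copy(mapping, keys):
    """Hypothetical helper: calls the classmethod through the instance,
    so the call is dispatched to type(mapping).fromkeys."""
    return mapping.fromkeys(keys)


from_dict = blank_copy({}, "ab")

try:
    blank_copy(Counter(), "ab")
except NotImplementedError:
    counter_failed = True
else:
    counter_failed = False

print(from_dict)       # {'a': None, 'b': None}
print(counter_failed)  # True
```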
[4] According to the docs for NotImplementedError, this is exactly how that exception is not meant to be used.
Note: It should not be used to indicate that an operator or method is not meant to be supported at all -- in that case either leave the operator/method undefined or, if a subclass, set it to None.
But, I suppose the standard library is allowed to contradict itself.
- Out of interest, when you say "the author of Counter opted to keep fromkeys and have it throw an exception rather than just remove the method" -- what would be the way to "remove a method" from a subclass in Python? I'm not sure I've ever seen it done, except maybe through monkey-patching a class after it's been defined, or by using a metaclass. (Alex Waygood, Sep 1, 2021 at 11:08)
- When I wrote that, I was thinking you could, but now I'm not sure at all. There are tricks to hide a method inherited from a parent class, but not to remove. On reflection, I'm going to rework that part of my answer. (Chris Bouchard, Sep 2, 2021 at 1:33)
- Hopefully the new paragraph is more factual and helpful. (I'm realizing I had C++'s = delete feature in mind when I wrote that originally.) (Chris Bouchard, Sep 2, 2021 at 1:53)
- Not a comment on your answer -- but I find the documentation for NotImplementedError a little bizarre, frankly. I'm sure I've seen it used like that many times in the standard library, and I'd much prefer a helpful NotImplementedError message rather than the inexplicable "NoneType is not callable" error message you'd get if you replaced the method with None. It was also endorsed the other day by Raymond Hettinger on twitter, who's been a core dev for around 20 years now I believe! (I think he's also the author of Counter.) twitter.com/raymondh/status/1430565266136698882?s=21 (Alex Waygood, Sep 2, 2021 at 7:03)
- Yeah, I agree with you there, and I'm pretty sure I've seen it used that way in third-party libraries as well. I do see the utility in having something like the docs describe (a dedicated exception to indicate that you forgot something, kind of like Rust's todo!() macro), but then it would be handy to also have a NeverGonnaHappenError. (Chris Bouchard, Sep 3, 2021 at 1:34)
These don’t really violate LSP.
object and Foo cannot be used interchangeably, as object accepts exactly 0 parameters to its constructor, while Foo accepts exactly 1.
While that is true, they're never really in a position to be used interchangeably. Constructors are weird beasts. Few languages let you take a type argument and then construct it without knowing what it is, relying on LSP to ensure the behavior dispatches reliably. And while Python can kinda do that, it's far enough outside of normal usage that LSP guarantees shouldn't be expected on constructors.
The same thing applies to static methods in general, as languages rarely perform dynamic dispatch on them. There’s simply no opportunity for users to be surprised when a derived type breaks expected behavior.
- Arguably classmethods are not part of the interface. You aren't calling fromkeys on an instance of dict (be it a Counter or not). (Caleth, Aug 31, 2021 at 14:09)
- That's a good point. Python is one of my worst languages, so there may be nuance here I'm missing. (Telastyn, Aug 31, 2021 at 14:12)
- Really interesting, thanks for the response. Some follow-up questions. I've definitely seen it argued lots of times on Stack Overflow (and elsewhere) that it's bad form for a subclass's __init__ method to take a different number of parameters to the superclass. Would you argue that's generally correct, and incorrect in this case because object is a special-case at the bottom of all inheritance chains? Or would you argue that it's just much more complex than that, i.e., whether it's okay to modify the signature of __init__ in a subclass will vary from case to case? (Alex Waygood, Aug 31, 2021 at 14:21)
- @AlexWaygood that seems like a strange restriction. It should probably call super().__init__() with appropriate params, but it's fine to hardcode or calculate those as appropriate. (Caleth, Aug 31, 2021 at 14:35)
- You only need to care about __init__ matching up if you're doing some weird cooperative multiple inheritance thing (in which case, you don't actually know what class will receive the super().__init__() call, so you have to pass everything through transparently with **kwargs or possibly *args, and then the interfaces had darned well better be compatible with each other). (Kevin, Sep 1, 2021 at 1:31)
Thanks, all, for the helpful responses -- much appreciated. I thought I might post a summary of various things people pointed out that I found especially clarifying:
- While they're often used interchangeably in Python, an object's class and its type are not the same thing from a theoretical point of view. While an object's type refers to certain properties and behaviours an object has (its "interface"), an object's class refers to the actual implementation of the object in code.
- When the Liskov Substitution Principle refers to "properties of objects", this should be taken to refer to objects that have already been constructed. Constructors and initialisers are not generally seen as part of the object's interface, as they are not methods of the object instance itself -- they are either class methods or static methods.
- It is often good practice to see the Liskov Substitution Principle as "necessary, but not sufficient" for implementing well-structured, object-oriented code. For example, it does not break the LSP to override object constructors with a different signature in a subclass, and that's often the most logical and sensible thing to do. However, you should also be wary of doing so, since it may confound the expectations of users/readers of your code. As well as this, in some situations, such as cooperative multiple inheritance chains, it is essential to maintain identical signatures in the constructors of subclasses.
- Python is unusual as a language in two respects. Firstly, due to its dynamic nature, alternative constructors for a class usually take the form of classmethods. Secondly, classmethods are callable from instances of the class, as well as the class itself. Neither of these means that the LSP works differently with regards to Python -- alternative constructors for a class are still generally out of scope with regards to the recommendations laid out in the LSP, as they would be with any other language.
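For anyone curious about the cooperative multiple inheritance case mentioned in the third point, here is a rough sketch of my understanding. The class names are made up for illustration. Each __init__ takes the keyword it needs and forwards the rest with **kwargs, so the signatures are compatible rather than strictly identical.

```python
class Base:
    def __init__(self, **kwargs):
        # End of the cooperative chain: by this point kwargs should be
        # empty, because object.__init__ accepts no extra arguments.
        super().__init__(**kwargs)


class Named(Base):
    def __init__(self, *, name, **kwargs):
        # super() here may resolve to Sized rather than Base, depending
        # on the MRO of the concrete class being instantiated.
        super().__init__(**kwargs)
        self.name = name


class Sized(Base):
    def __init__(self, *, size, **kwargs):
        super().__init__(**kwargs)
        self.size = size


class NamedSized(Named, Sized):
    pass


item = NamedSized(name="box", size=3)
print([cls.__name__ for cls in NamedSized.__mro__])
# ['NamedSized', 'Named', 'Sized', 'Base', 'object']
print(item.name, item.size)  # box 3
```

If Named.__init__ did not forward **kwargs, Sized.__init__ would never receive size, which is why the signatures have to cooperate in this situation.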