The Liskov Substitution Principle, and Python

Question 1

Background

I've taught myself Python over the past year-and-a-bit, and would consider myself an intermediate Python user at this point, but never studied computing at school/university. As such, my knowledge of theory is a little weak. Python is the only language I know.

I'm trying to wrap my head around the Liskov Substitution Principle so that I can write better, more object-oriented code.

Question 1: The Python data model

As we know, in Python 3, all custom classes implicitly inherit from object. But the docstring of object includes this line:

When called, it accepts no arguments and returns a new featureless instance that has no instance attributes and cannot be given any.

With this in mind, how can any python classes that accept a nonzero number of parameters for their constructors be said to comply with the LSP? If I have a class Foo, like so:

class Foo:
 def __init__(self, bar):
 self.bar = bar

then, surely this class definition (and all others like it) violates the contract specified by object? object and Foo cannot be used interchangeably, as object accepts exactly 0 parameters to its constructor, while Foo accepts exactly 1.

Question 2: collections.Counter

As Raymond Hettinger tells us, the Python dictionary is an excellent example of the open/closed principle. The dict class is "open for extension, closed for modification" -- if you wish to modify some of the prepackaged behaviours of a dict, you're advised to inherit from collections.abc.MutableMapping or collections.UserDict instead.

collections.Counter, however, is a direct subclass of dict. While it is mostly an extension of dict rather than a modification of behaviours already defined in dict, this isn't true for collection.Counter.fromkeys. With a standard dict, this classmethod is an alternative constructor for a dictionary, but the method is overridden in Counter so that it raises an exception. Here is the comment explaining why this is the case:

# There is no equivalent method for counters because the semantics
# would be ambiguous in cases such as Counter.fromkeys('aaabbc', v=2).
# Initializing counters to zero values isn't necessary because zero
# is already the default value for counter lookups. Initializing
# to one is easily accomplished with Counter(set(iterable)). For
# more exotic cases, create a dictionary first using a dictionary
# comprehension or dict.fromkeys().

The explanation makes sense -- but surely this violates the LSP? According to the Wikipedia article, the LSP states that "New exceptions cannot be thrown by the methods in the subtype, except if they are subtypes of exceptions thrown by the methods of the supertype." dict.fromkeys does not throw a NotImplementedError; Counter.fromkeys does.

Comments

I'm interested in whether these examples do, in fact, break the LSP. However, I'm also interested in why they break the LSP, if indeed they do. In what situations is enforcing the LSP necessary/advisable/useful? In what situations is worrying about the LSP more of a meaningless distraction? Etc.

Question 2

In your entire question, you only talk about classes, subclasses, and superclasses. However, the LSP only talks about types, subtypes, and subtypes. The first thing you need to understand is that a class is not a type and a type is not a class. Start from there.

Question 3

LSP is not about inheritance in itself, but about subtyping, and it holds with respect to the type expected by client code. When you write code that uses an object of some type, then that code relies on the abstract behavior of that type, and if you provide a subtype (or a different implementation) instead, that client code shouldn't break. Code that only uses the interface of the 'object' class doesn't do much - you typically don't write such code, so you're not substituting for the 'object' type.

Question 4

Second, objects are often constructed elsewhere and then passed in as class dependencies or function parameters, so the constructor is generally not a part of the abstract type (which in python may be implicit) that is to be substituted by a subtype. In python, the type of the dependency might not be explicitly defined in code (e.g. it could just exist conceptually, defined in the documentation). In a statically typed language it would be the type of the parameter - which might differ from the actual, concrete type of the object passed in (polymorphism).

Question 5

Thanks @FilipMilovanović, that's extremely clarifying.

Question 6

I think it's good to go back to the actual definition of the Liskov substitution principle:

Subtype Requirement: Let φ(x) be a property provable about objects x of type T. Then φ(y) should be true for objects y of type S where S is a subtype of T.

— Liskov substitution principle

Note, the principle only refers to properties of objects. And, implicitly, only public properties, because the Liskov substitution principle is interested in behavioral typing — typing according to observable properties.

With that in mind...

Answer 1

With this in mind, how can any python classes that accept a nonzero number of parameters for their constructors be said to comply with the LSP?

There are two parts to this. First, __init__ is not a constuctor. __new__ is the constructor — the method that actually constructs a new class instance from whole cloth. __init__ is just a private method that is called by __new__ on the new instance. And since it's private, it's not part of your type and not subject to the Liskov substitution principle.

What about __new__ then? All the parameters of __init__ are by default implicitly parameters of __new__, so am I just kicking the can down the road? __new__ is a static method, so it's not a property of an instance of object — it's part of object itself. So it's not subject to the Liskov substitution principle for instance of object either.

(Answer 1 Digression)

Here's where it gets kind of interesting (to me, at least). object is a class, but in Python classes are objects. They're instances of type, which is called their metaclass. So while __new__ is a static method, that means it's an instance method of object itself.¹ So, it is subject to the Liskov substitution principle for instance of type. And if we look at the definition of __new__ in type, we see:

__new__(*args, **kwargs) method of builtins.type instance
 Create and return a new object. See help(type) for accurate signature.

So type's __new__ accepts any and all arguments. Since many classes __init__ methods — and thus their __new__ methods — don't accept arbitrary arguments, those class objects are kind of in violation of the Liskov substitution method as instances of type. But... as you pointed out later in your question,

New exceptions cannot be thrown by the methods in the subtype, except if they are subtypes of exceptions thrown by the methods of the supertype.

— Liskov substitution principle

And that's exactly what __new__ does. If you call type.__new__ with different arguments than it expects, it throws a TypeError:

>>> type.__new__()
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
TypeError: type.__new__(): not enough arguments

Which means that all subtypes of type (i.e., all class objects) are free to throw their own TypeErrors in __new__, and callers are obligated to handle it. And that's exactly what object.__new__ does, but under different conditions:

>>> object.__new__(object, 'foo', (), {}) # This would be valid for type.__new__
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
TypeError: object() takes no arguments

So, by weakening the base type's precondition (the argument list) and opting to instead validate "at runtime" by throwing an exception, the __new__ method is able to meet the Liskov substitution principle as a property of instances of type.²

This means we really can't just instantiate any arbitrary type in Python (either by calling __new__ or by just calling the type) without knowing the target type, unless we're prepared to catch and handle TypeErrors — and I think that tracks with most programmers' intuitions.

Answer 2

collections.Counter, however, is a direct subclass of dict. While it is mostly an extension of dict rather than a modification of behaviours already defined in dict, this isn't true for collection.Counter.fromkeys.

The answer here is similar to the previous answer. Since fromkeys is a class method, it's not really a property of instances of dict and not subject to the Liskov substitution principle.³

But then, what about if we look at dict as a class object? Do we run into the same complications we did with object.__new__? No, we don't, because Counter and dict don't have any sort of hierarchical relationship as class objects — they're both direct instances of type. We can't assume anything about their fromkeys methods because they didn't inherit them from type.

On the other hand, in Python a class does inherit all its parents' properties, which includes static and class methods like fromkeys. So Counter has to do something with fromkeys. It could attempt to hide the method, e.g., by replacing it with a descriptor that always throws an AttributeError, or even just by setting the property to None. The author of Counter chose to keep the method visible and to throw NotImplementedError instead, perhaps to signal that the method is intentionally unusable.⁴

In the end, the Liskov substitution principle is just an attempt to formalize something very intuitive: don't surprise the users of your code. In that sense, it may be seen a necessary condition for "good code" (whatever that is), but not a sufficient condition.

¹ This is a slight lie. __new__ is not an instance method of object because it doesn't take the receiver (cls, a.k.a. self) as a parameter. It's a static method, so it's essentially just a function property of object — but it doesn't make a difference to the discussion.

² I put "at runtime" in scare quotes because technically everything in Python is at runtime, but hopefully the distinction is clear.

³ It is possible to call static and class methods through instances, e.g.,

d1 = dict()
d2 = d1.fromkeys(itr)

This just dispatches the method to type(d1), which is dict. As far as I know, it's pretty well accepted that this is a quirk of Python, and we still think of those methods as properties of the type and not properties of the instance. But I suppose, in the strictest sense, that is a violation of the Liskov substitution principle.

⁴ According to the docs for NotImplementedError, this is exactly how that exception is not meant to be used.

Note: It should not be used to indicate that an operator or method is not meant to be supported at all — in that case either leave the operator/method undefined or, if a subclass, set it to None.

But, I suppose the standard library is allowed to contradict itself.

Question 7

Out of interest, when you say "the author of Counter opted to keep fromkeys and have it throw an exception rather than just remove the method" -- what would be the way to "remove a method" from a subclass in Python? I'm not sure I've ever seen it done, except maybe through monkey-patching a class after it's been defined, or by using a metaclass.

Question 8

When I wrote that, I was thinking you could, but now I'm not sure at all. There are tricks to hide a method inherited from a parent class, but not to remove. On reflection, I'm going to rework that part of my answer.

Question 9

Hopefully the new paragraph is more factual and helpful. (I'm realizing I had C++'s = delete feature in mind when I wrote that originally.)

Question 10

Not a comment on your answer — but I find the documentation for NotImplementedError a little bizarre, frankly. I'm sure I've seen it used like that many times in the standard library, and I'd much prefer a helpful NotImplementedError message rather than the inexplicable NoneType is not callable error message you'd get if you replaced the method with None. It was also endorsed the other day by Raymond Hettinger on twitter, who's been a core dev for around 20 years now I believe! (I think he's also the author of Counter.) twitter.com/raymondh/status/1430565266136698882?s=21

Question 11

Yeah, I agree with you there, and I'm pretty sure I've seen it used that way in third-party libraries as well. I do see the utility in having something like the docs describe — a dedicated exception to indicate that you forgot something, kind of like Rust's todo!() macro — but then it would be handy to also have a NeverGonnaHappenError.

Question 12

These don’t really violate LSP.

object and Foo cannot be used interchangeably, as object accepts exactly 0 parameters to its constructor, while Foo accepts exactly 1.

While that is true, they’re never really in a position to be used interchangeably. Constructors are weird beasts. Few languages let you take a type argument and then construct it without knowing what it is, trusting on LSP to ensure the behavior dispatches reliably. And while python can kinda do that, it’s far enough outside of normal usage that LSP guarantees shouldn’t be expected on constructors.

The same thing applies to static methods in general, as languages rarely perform dynamic dispatch on them. There’s simply no opportunity for users to be surprised when a derived type breaks expected behavior.

Question 13

Arguably classmethods are not part of the interface. You aren't calling fromKeys on an instance of dict (be it a Counter or not)

Question 14

That’s a good point. Python is one of my worst languages, so there may be nuance here I’m missing.

Question 15

Really interesting, thanks for the response. Some follow-up questions. I've definitely seen it argued lots of times on Stack Overflow (and elsewhere) that it's bad form for a subclass's __init__ method to take a different number of parameters to the superclass. Would you argue that's generally correct, and incorrect in this case because object is a special-case at the bottom of all inheritance chains? Or would you argue that it's just much more complex than that, i.e., whether it's okay to modify the signature of __init__ in a subclass will vary from case to case?

Question 16

@AlexWaygood that seems like a strange restriction. It should probably call super().__init__() with appropriate params, but it's fine to hardcode or calculate those as appropriate.

Question 17

You only need to care about __init__ matching up if you're doing some weird cooperative multiple inheritance thing (in which case, you don't actually know what class will receive the super().__init__() call, so you have to pass everything through transparently with **kwargs or possibly *args, and then the interfaces had darned well better be compatible with each other).

Question 18

Thanks, all, for the helpful responses -- much appreciated. I thought I might post a summary of various things people pointed out that I found especially clarifying:

While they're often used interchangeably in Python, an object's class and its type are not the same thing from a theoretical point of view. While an object's type refers to certain properties and behaviours an object has (its "interface"), an object's class refers to the actual implementation of of the object in code.
When the Liskov Substitution Principle refers to "properties of objects", this should be taken to refer to objects that have already been constructed. Constructors and initialisers are not generally seen as part of the object's interface, as they are not methods of the object instance itself -- they are either class methods or static methods.
It is often good practice to see the Liskov Substitution Principle as "necessary, but not sufficient" for implementing well-structured, object-oriented code. For example, it does not break the LSP to override object constructors with a different signature in a subclass, and that's often the most logical and sensible thing to do. However, you should also be wary of doing so, since it may confound the expectations of users/readers of your code. As well as this, in some situations, such as cooperative multiple inheritance chains, it is essential to maintain identical signatures in the constructors of subclasses.
Python is unusual as a language in two respects. Firstly, due to its dynamic nature, alternative constructors for a class usually take the form of classmethods. Secondly, classmethods are callable from instances of the class, as well as the class itself. Neither of these mean that the LSP works differently with regards to python — alternative constructors for a class are still generally out of scope with regards to the recommendations laid out in the LSP, as they would be with any other language.

Chris Bouchard Chris Bouchard 2251 silver badge6 bronze badges · Accepted Answer · 2021-09-01 02:34:25Z

I think it's good to go back to the actual definition of the Liskov substitution principle:

Subtype Requirement: Let φ(x) be a property provable about objects x of type T. Then φ(y) should be true for objects y of type S where S is a subtype of T.

— Liskov substitution principle

Note, the principle only refers to properties of objects. And, implicitly, only public properties, because the Liskov substitution principle is interested in behavioral typing — typing according to observable properties.

With that in mind...

Answer 1

With this in mind, how can any python classes that accept a nonzero number of parameters for their constructors be said to comply with the LSP?

There are two parts to this. First, __init__ is not a constuctor. __new__ is the constructor — the method that actually constructs a new class instance from whole cloth. __init__ is just a private method that is called by __new__ on the new instance. And since it's private, it's not part of your type and not subject to the Liskov substitution principle.

What about __new__ then? All the parameters of __init__ are by default implicitly parameters of __new__, so am I just kicking the can down the road? __new__ is a static method, so it's not a property of an instance of object — it's part of object itself. So it's not subject to the Liskov substitution principle for instance of object either.

(Answer 1 Digression)

Here's where it gets kind of interesting (to me, at least). object is a class, but in Python classes are objects. They're instances of type, which is called their metaclass. So while __new__ is a static method, that means it's an instance method of object itself.¹ So, it is subject to the Liskov substitution principle for instance of type. And if we look at the definition of __new__ in type, we see:

__new__(*args, **kwargs) method of builtins.type instance
 Create and return a new object. See help(type) for accurate signature.

So type's __new__ accepts any and all arguments. Since many classes __init__ methods — and thus their __new__ methods — don't accept arbitrary arguments, those class objects are kind of in violation of the Liskov substitution method as instances of type. But... as you pointed out later in your question,

New exceptions cannot be thrown by the methods in the subtype, except if they are subtypes of exceptions thrown by the methods of the supertype.

— Liskov substitution principle

And that's exactly what __new__ does. If you call type.__new__ with different arguments than it expects, it throws a TypeError:

>>> type.__new__()
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
TypeError: type.__new__(): not enough arguments

Which means that all subtypes of type (i.e., all class objects) are free to throw their own TypeErrors in __new__, and callers are obligated to handle it. And that's exactly what object.__new__ does, but under different conditions:

>>> object.__new__(object, 'foo', (), {}) # This would be valid for type.__new__
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
TypeError: object() takes no arguments

So, by weakening the base type's precondition (the argument list) and opting to instead validate "at runtime" by throwing an exception, the __new__ method is able to meet the Liskov substitution principle as a property of instances of type.²

This means we really can't just instantiate any arbitrary type in Python (either by calling __new__ or by just calling the type) without knowing the target type, unless we're prepared to catch and handle TypeErrors — and I think that tracks with most programmers' intuitions.

Answer 2

collections.Counter, however, is a direct subclass of dict. While it is mostly an extension of dict rather than a modification of behaviours already defined in dict, this isn't true for collection.Counter.fromkeys.

The answer here is similar to the previous answer. Since fromkeys is a class method, it's not really a property of instances of dict and not subject to the Liskov substitution principle.³

But then, what about if we look at dict as a class object? Do we run into the same complications we did with object.__new__? No, we don't, because Counter and dict don't have any sort of hierarchical relationship as class objects — they're both direct instances of type. We can't assume anything about their fromkeys methods because they didn't inherit them from type.

On the other hand, in Python a class does inherit all its parents' properties, which includes static and class methods like fromkeys. So Counter has to do something with fromkeys. It could attempt to hide the method, e.g., by replacing it with a descriptor that always throws an AttributeError, or even just by setting the property to None. The author of Counter chose to keep the method visible and to throw NotImplementedError instead, perhaps to signal that the method is intentionally unusable.⁴

In the end, the Liskov substitution principle is just an attempt to formalize something very intuitive: don't surprise the users of your code. In that sense, it may be seen a necessary condition for "good code" (whatever that is), but not a sufficient condition.

¹ This is a slight lie. __new__ is not an instance method of object because it doesn't take the receiver (cls, a.k.a. self) as a parameter. It's a static method, so it's essentially just a function property of object — but it doesn't make a difference to the discussion.

² I put "at runtime" in scare quotes because technically everything in Python is at runtime, but hopefully the distinction is clear.

³ It is possible to call static and class methods through instances, e.g.,

d1 = dict()
d2 = d1.fromkeys(itr)

This just dispatches the method to type(d1), which is dict. As far as I know, it's pretty well accepted that this is a quirk of Python, and we still think of those methods as properties of the type and not properties of the instance. But I suppose, in the strictest sense, that is a violation of the Liskov substitution principle.

⁴ According to the docs for NotImplementedError, this is exactly how that exception is not meant to be used.

Note: It should not be used to indicate that an operator or method is not meant to be supported at all — in that case either leave the operator/method undefined or, if a subclass, set it to None.

But, I suppose the standard library is allowed to contradict itself.

Out of interest, when you say "the author of Counter opted to keep fromkeys and have it throw an exception rather than just remove the method" -- what would be the way to "remove a method" from a subclass in Python? I'm not sure I've ever seen it done, except maybe through monkey-patching a class after it's been defined, or by using a metaclass.
When I wrote that, I was thinking you could, but now I'm not sure at all. There are tricks to hide a method inherited from a parent class, but not to remove. On reflection, I'm going to rework that part of my answer.
Hopefully the new paragraph is more factual and helpful. (I'm realizing I had C++'s = delete feature in mind when I wrote that originally.)
Not a comment on your answer — but I find the documentation for NotImplementedError a little bizarre, frankly. I'm sure I've seen it used like that many times in the standard library, and I'd much prefer a helpful NotImplementedError message rather than the inexplicable NoneType is not callable error message you'd get if you replaced the method with None. It was also endorsed the other day by Raymond Hettinger on twitter, who's been a core dev for around 20 years now I believe! (I think he's also the author of Counter.) twitter.com/raymondh/status/1430565266136698882?s=21
Yeah, I agree with you there, and I'm pretty sure I've seen it used that way in third-party libraries as well. I do see the utility in having something like the docs describe — a dedicated exception to indicate that you forgot something, kind of like Rust's todo!() macro — but then it would be handy to also have a NeverGonnaHappenError.

Stack Exchange Network

The Liskov Substitution Principle, and Python

3 Answers 3

Answer 1

(Answer 1 Digression)

Answer 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

The Liskov Substitution Principle, and Python

3 Answers 3

Answer 1

(Answer 1 Digression)

Answer 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions