
TL;DR

This question examines an over-engineered example of Python metaclasses and dataclasses used to create a LiteralEnum (for validating a stringly-typed keyword argument) such as LinkageMethod, and a KeywordArgumentsBaseClass for making wrappers around SciPy methods such as SciPyLinkage. The author would like to know how best to distinguish when something should be a property, staticmethod, classmethod, or instance method.

As to why someone would do this:

  • to override the default keyword arguments of SciPy methods
  • to expose keyword arguments that might be hidden under **kwargs (and get passed on to another method) for a better developer experience
  • to modify the default behavior of SciPy methods, e.g. add optional pre-/post-processing while still being able to distinguish which parameters belong to the method and which to the custom handling

Disclaimer

Given the above explanation, there is a lot of code and the M.W.E. is not so minimal (complexity is one of the key reasons to avoid metaclasses, especially in Python, which favors simplicity and readability).

Question(s)

Newbie question

I am new to using metaclasses. Are the LiteralEnum classes at least "pythonic"?

staticmethod vs classmethod vs property at the metaclass / class level?

The KeywordArgumentsMeta and KeywordArgumentsMixin classes set up some useful attributes for retrieving a dictionary of keyword arguments, with KeywordArgumentsBaseClass combining KeywordArgumentsMixin and ClassMethodSignatureMixin.

This is where I am conflicted:

@dataclass
class BaseExample(KeywordArgumentsBaseClass):
    _: KW_ONLY
    strvar: str = 'default'
    intvar: int = 2

@dataclass
class ChildExample(BaseExample):
    _: KW_ONLY
    thirdvar: str = 'three'
    fourth: int = 4

ChildExample.keywords
> ['thirdvar', 'fourth']
ChildExample.ikeywords
> ['strvar', 'intvar']
ChildExample.akeywords
> ['strvar', 'intvar', 'thirdvar', 'fourth']
ChildExample.defaults
> {'thirdvar': 'three', 'fourth': 4}
...
ChildExample().kwargs
> {'thirdvar': 'three', 'fourth': 4}
...
ChildExample().params(**{'thirdvar': 'new', 'banana': 3})
> {'thirdvar': 'new', 'fourth': 4}

I am conflicted because I want to make a wrapper for SciPy methods:


@dataclass
class SciPyMethod(KeywordArgumentsBaseClass):
    _: KW_ONLY

    @classmethod
    def get_method(cls):
        raise NotImplementedError

    @classmethod
    def call_scipy(cls, **kws):
        inst = cls()

        method = cls.get_method()
        params = inst.prepare_params(func=method, scope=locals(), **kws)
        result = method(**cls.kwargs)
        raise NotImplementedError

    def call_scipy(self, **kwargs):
        cls = type(self)
        method = type(self).get_method()
        params = self.prepare_params(func=method, scope=locals(), **kwargs)
        print(params)
        raise NotImplementedError
        result = method(**cls.kwargs)
        return result

    def __call__(self, x: NPArray, **kwargs) -> NPArray:
        method = self.get_method()
 

but I need both classmethods and instance methods for this to work.

Since there are classmethods for getting default params, instance methods for getting current params, and the prepare_params method for getting params for a function signature, how can I make call_scipy work as both a classmethod and an instance method?

How could this be simplified / made more pythonic?

Usefulness of ClassMethodSignatureMixin

While ClassMethodSignaturePriority seems useful at first glance, I am not actually sure it is useful at all. Consider:

class Example(ClassMethodSignatureMixin):
    _: KW_ONLY
    test_var: str = 'default'

    def foo(self, test_var: Optional[str] = None, **kwargs):
        params = self.prepare_params(func=self.foo, scope=locals(), **kwargs)
        print(params)
        return params

The prepare_params method, without knowing the function signature up front, can handle explicitly named keywords of the func, which might be defined as parameters or passed in via **kwargs.

However, test_var must either be defined in the class, passed in as an explicit keyword argument, or passed in via **kwargs. Python will naturally prevent Example().foo(test_var='fine', **{'test_var': 'causes error'}).

The prepare_params method, on the other hand, is useful because it filters keyword arguments down to the function signature using the local scope, which helps ensure that, in the case of the foo method, the value of test_var gets put into params.

Or, to restate more cleanly: given a function with an unknown number of keyword arguments (like test_var in foo), prepare_params uses locals() and **kwargs to build a single dictionary in which to look up the values of the keyword arguments.

Code

Imports

import os, inspect
import numpy as np, pandas as pd, scipy as sp
from dataclasses import dataclass, KW_ONLY
from enum import Enum, StrEnum, EnumMeta, auto
from typing import Optional, Callable, List, Tuple, Any, Dict, Union, Literal

# NOTE: assumed aliases for the NPArray / DataFrame annotations used below
# (they are not defined in the original excerpt)
NPArray = np.ndarray
DataFrame = pd.DataFrame

LiteralEnum

MetaClass

class LiteralEnumMeta(EnumMeta):
    '''LiteralEnumMeta

    See Also:
    --------
    - https://stackoverflow.com/questions/43730305/when-should-i-subclass-enummeta-instead-of-enum
    - https://peps.python.org/pep-3115/
    - https://blog.ionelmc.ro/2015/02/09/understanding-python-metaclasses/#class-attribute-lookup
    '''
    @classmethod
    def __prepare__(metacls, name, bases, **kwargs):
        enum_dict = super().__prepare__(name, bases, **kwargs)
        # print('PREPARE: <enum_dict> = \t', enum_dict)
        # NOTE: this will throw an error since we are using StrEnum
        # enum_dict['_default'] = None
        return enum_dict

    def __init__(cls, clsname, bases, clsdict, **kwargs):
        super().__init__(clsname, bases, clsdict, **kwargs)
        # print('INIT: <clsdict> = \t', clsname, clsdict)

    def __new__(
        metacls, cls, bases, clsdict, *,
        default: Optional[str] = None, elements: Optional[List[str]] = None
    ):
        # print('NEW: <clsdict> = \t', cls, clsdict)
        if elements is not None:
            for element in elements:
                clsdict[element.upper()] = auto()
        new_cls = super().__new__(metacls, cls, bases, clsdict)
        # NOTE: this will result in TypeError: cannot extend
        if default:
            setattr(new_cls, '_default', default)
        return new_cls

    @property
    def members(cls):
        # NOTE: could also use cls._member_names_
        return [member.name for member in cls]

    @property
    def values(cls):
        return [member.value for member in cls]

    @property
    def items(cls):
        return list(zip(cls.members, cls.values))

LiteralEnum

class LiteralEnum(StrEnum, metaclass=LiteralEnumMeta):
    @classmethod
    def _missing_(cls, value):
        for member in cls:
            if member.value.lower() == value.lower():
                return member
        default = getattr(cls, cls._default, None)
        return default

Decorators

def enum_default(default: str = ''):
    def wrapper(cls):
        cls._default = default
        return cls
    return wrapper

def enum_set_attr(name: str = 'attr', attr: str = 'data'):
    def wrapper(cls):
        setattr(cls, f'_{name}', attr)
        return cls
    return wrapper

def set_method(method):
    def decorator(cls):
        cls.method = method
        return cls
    return decorator

SciPy LiteralEnum Examples

Linkage

@enum_default('SINGLE')
class LinkageMethod(LiteralEnum):
    '''
    See Also
    --------
    scipy.cluster.hierarchy.linkage : Performs hierarchical/agglomerative clustering on the condensed distance matrix y.
    https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html
    '''
    SINGLE = auto()
    COMPLETE = auto()
    AVERAGE = auto()
    WEIGHTED = auto()
    CENTROID = auto()
    MEDIAN = auto()
    WARD = auto()
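
For illustration, the intended lookup behavior would be roughly the following (assuming Python ≥ 3.11, where StrEnum's auto() yields the lowercased member name):

LinkageMethod('ward')     # LinkageMethod.WARD (direct value match)
LinkageMethod('WARD')     # LinkageMethod.WARD (case-insensitive match via _missing_)
LinkageMethod('banana')   # LinkageMethod.SINGLE (falls back to the @enum_default member)
LinkageMethod.members     # ['SINGLE', 'COMPLETE', 'AVERAGE', 'WEIGHTED', 'CENTROID', 'MEDIAN', 'WARD']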

PDistMetric

@enum_default('EUCLIDEAN')
class PDistMetric(LiteralEnum):
    '''
    See Also
    --------
    scipy.spatial.distance.pdist : Compute the pairwise distances between observations in n-dimensional space.
    https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html#scipy.spatial.distance.pdist
    '''
    BRAYCURTIS = auto()
    CANBERRA = auto()
    CHEBYSHEV = auto()
    CITYBLOCK = auto()
    CORRELATION = auto()
    COSINE = auto()
    DICE = auto()
    EUCLIDEAN = auto()
    HAMMING = auto()
    JACCARD = auto()
    JENSENSHANNON = auto()
    KULCZYNSKI1 = auto()
    MAHALANOBIS = auto()
    MATCHING = auto()
    MINKOWSKI = auto()
    ROGERSTANIMOTO = auto()
    RUSSELLRAO = auto()
    SEUCLIDEAN = auto()
    SOKALMICHENER = auto()
    SOKALSNEATH = auto()
    SQEUCLIDEAN = auto()
    YULE = auto()

ScoreMethod

@enum_default('ZSCORE')
class ScoreMethod(LiteralEnum):
    '''
    See Also
    --------
    scipy.stats.zscore : Compute the z-score.
    scipy.stats.gzscore : Compute the geometric standard score.
    '''
    ZSCORE = auto()
    GZSCORE = auto()

ClassMethodSignaturePriority

@enum_default('OBJ')
@enum_set_attr('attr', 'data')
class ClassMethodSignaturePriority(LiteralEnum):
    OBJ = auto()
    ARG = auto()
    KWS = auto()

    def get(self, obj: object, attr: Optional[str] = None, arg: Optional[Any] = None, **kws) -> Union[NPArray, DataFrame, Any]:
        match self:
            # try and get `attr` from `obj`, defaulting back to `arg`
            case ClassMethodSignaturePriority.OBJ:
                val = getattr(obj, attr, arg)
                if val is None:
                    return ClassMethodSignaturePriority('ARG').get(obj, attr, arg, **kws)

            # use `arg` as is unless it is None, then try and get `attr` from `obj`
            case ClassMethodSignaturePriority.ARG:
                val = arg
                if val is None:
                    return ClassMethodSignaturePriority('KWS').get(obj, attr, arg, **kws)

            # use `kws`, assuming `attr` is in `kwargs`, falling back to `arg`, then try and get `attr` from `obj`
            case ClassMethodSignaturePriority.KWS:
                val = kws.get(attr, arg)

            case _:
                pass

        if val is None:
            val = getattr(obj, attr, arg)
        if isinstance(val, (list, np.ndarray, )):
            val = np.asanyarray(val)
        return val

    @classmethod
    def prioritize(cls, obj: object, attr: str, arg: Optional[Any] = None, priority: Literal['obj', 'arg', 'kws'] = 'obj', **kws) -> Union[NPArray, DataFrame, Any]:
        return cls(priority).get(obj, attr, arg, **kws)

    @classmethod
    def _pobj(cls, obj: object, attr: str, arg: Optional[Any] = None, **kws) -> Union[NPArray, DataFrame, Any]:
        return cls.prioritize(obj, attr, arg, 'obj', **kws)

    @classmethod
    def _pargs(cls, obj: object, attr: str, arg: Optional[Any] = None, **kws) -> Union[NPArray, DataFrame, Any]:
        return cls.prioritize(obj, attr, arg, 'args', **kws)

    @classmethod
    def _pkws(cls, obj: object, attr: str, arg: Optional[Any] = None, **kws) -> Union[NPArray, DataFrame, Any]:
        return cls.prioritize(obj, attr, arg, 'kws', **kws)

Mixin

@dataclass
class ClassMethodSignatureMixin:
    def get_val(self, attr: str, arg: Optional[Any] = None, prioritize: Union[Literal['obj', 'arg', 'kws'], ClassMethodSignaturePriority] = 'arg', **kws):
        # by default we prioritize `arg` over `self`, as `arg` might overwrite `self`'s attribute
        # arg --(falls back to)--> kws --(falls back to)--> self
        priority = ClassMethodSignaturePriority(prioritize)
        return priority.get(self, attr=attr, arg=arg, **kws)

    # NOTE: these presumably call get_val, not get_arg (get_arg takes func/scope, not prioritize)
    def _prioritize_kws(self, attr: str, arg: Optional[Any] = None, **kws):
        return self.get_val(attr, arg, prioritize='kws', **kws)

    def _prioritize_arg(self, attr: str, arg: Optional[Any] = None, **kws):
        return self.get_val(attr, arg, prioritize='arg', **kws)

    def _prioritize_obj(self, attr: str, arg: Optional[Any] = None, **kws):
        return self.get_val(attr, arg, prioritize='obj', **kws)

    def get_arg(self, attr: str, func: Callable, scope: Dict[str, Any]):
        args = inspect.getfullargspec(func).args
        if attr in args and attr in scope:
            return scope[attr]
        return None

    def get_tuple(self, attr: str, func: Callable, scope: Dict[str, Any], **kws) -> Tuple[Any, Any, Any]:
        obj = getattr(self, attr, None)
        arg = self.get_arg(attr, func, scope)
        kwa = kws.get(attr, None)
        return obj, arg, kwa

    def update_params(self, **kws):
        params = self.aparams()
        for k, v in self.kwargs.items():
            v = self.get_val(attr=k, prioritize='kws', **kws)
            params[k] = v
        return params

KeywordArguments

KeywordArgumentsMeta

class KeywordArgumentsMeta(type):
    @staticmethod
    def get_annots_kws(cls) -> list:
        '''Get annotated keyword-only argument names'''
        annots = list(cls.__annotations__.keys())
        if '_' not in annots:
            return []
        return annots[annots.index('_') + 1:]

    @staticmethod
    def get_cls_kws(cls) -> list:
        '''
        NOTES
        -----
        - if using inheritance this will get all keyword-only arguments
        '''
        return inspect.getfullargspec(cls.__init__).kwonlyargs

    @staticmethod
    def attr_dict(obj: object, attrs: list) -> dict:
        return dict((k, getattr(obj, k, None)) for k in attrs)

    @staticmethod
    def inst_dict(inst: object, attr: str = 'defaults'):
        attrs = getattr(type(inst), attr).items()
        return dict((k, getattr(inst, k, v)) for k, v in attrs)

    @property
    def keywords(cls) -> list:
        '''Get current keyword-only argument names'''
        return cls.get_annots_kws(cls)

    @property
    def ikeywords(cls) -> list:
        '''Get inherited keyword-only argument names'''
        ignore = cls.keywords
        result = list()
        is_new = lambda kw: kw not in result and kw not in ignore
        for c in inspect.getmro(cls):
            if c is not object:
                new_kws = cls.get_annots_kws(c)
                result.extend(list(filter(is_new, new_kws)))
        return result

    @property
    def akeywords(cls) -> list:
        '''Get all keyword-only argument names'''
        result = list()
        is_new = lambda kw: kw not in result
        for c in inspect.getmro(cls):
            if c is not object:
                new_kws = cls.get_annots_kws(c)
                result.extend(list(filter(is_new, new_kws)))
        return result

    @property
    def defaults(cls) -> dict:
        '''Get default keyword-only argument values'''
        instance = cls()
        return cls.attr_dict(instance, cls.keywords)

    @property
    def idefaults(cls) -> dict:
        '''Get inherited default keyword-only argument values'''
        instance = cls()
        return cls.attr_dict(instance, cls.ikeywords)

    @property
    def adefaults(cls) -> dict:
        '''Get all default keyword-only argument values'''
        instance = cls()
        return cls.attr_dict(instance, cls.akeywords)

KeywordArgumentsMixin

@dataclass
class KeywordArgumentsMixin(metaclass=KeywordArgumentsMeta):
    _: KW_ONLY

    @property
    def kwargs(self) -> dict:
        '''Get instance-specific default keyword-only argument values'''
        return type(self).inst_dict(self, attr='defaults')

    @property
    def ikwargs(self) -> dict:
        '''Get instance inherited default keyword-only argument values'''
        return type(self).inst_dict(self, attr='idefaults')

    @property
    def akwargs(self) -> dict:
        '''Get all instance default keyword-only argument values'''
        return type(self).inst_dict(self, attr='adefaults')

    def _merge_kws_to_dict(self, params: dict, **kwargs) -> dict:
        '''Only overwrite values in params with kwargs if the key is in params'''
        values = params.copy()
        values.update(dict((k, v) for k, v in kwargs.items() if k in values))
        return values

    def params(self, **kwargs) -> dict:
        '''Get instance default keyword-only argument values, updated with kwargs'''
        return self._merge_kws_to_dict(self.kwargs, **kwargs)

    def iparams(self, **kwargs) -> dict:
        '''Get instance inherited keyword-only argument values, updated with kwargs'''
        return self._merge_kws_to_dict(self.ikwargs, **kwargs)

    def aparams(self, **kwargs) -> dict:
        '''Get all instance default keyword-only argument values, updated with kwargs'''
        return self._merge_kws_to_dict(self.akwargs, **kwargs)
 

KeywordArgumentsBaseClass

@dataclass
class KeywordArgumentsBaseClass(KeywordArgumentsMixin, ClassMethodSignatureMixin):
    def prepare_params(self, func: Optional[Callable] = None, scope: Optional[Dict[str, Any]] = None, **kws) -> dict:
        params = self.aparams()
        for k, v in self.akwargs.items():
            arg = None
            if func and scope:
                arg = self.get_arg(attr=k, func=func, scope=scope)
            v = self.get_val(attr=k, arg=arg, prioritize='arg', **kws)
            params[k] = v
        return params

SciPyLinkage

#| export
@dataclass
class SciPyLinkage(KeywordArgumentsBaseClass):
    _: KW_ONLY
    method: LinkageMethod = LinkageMethod.SINGLE
    metric: PDistMetric = PDistMetric.CORRELATION
    optimal_ordering: bool = True

    def __post_init__(self):
        self.method = LinkageMethod(self.method)
        self.metric = PDistMetric(self.metric)

    def __call__(self, x: NPArray, **kwargs) -> NPArray:
        l_func = sp.cluster.hierarchy.linkage
        params = self.prepare_params(func=l_func, scope=locals(), **kwargs)
        print('LINKAGE', params)
        # linkage = l_func(x, **params)
        # return linkage
asked Aug 10, 2023 at 15:38

2 Answers

@classmethod and static

would like to know how to best distinguish when something should be a property, staticmethod, classmethod, or instance method.

Ok, sure, I'll bite.

These two are pretty simple:

  • @staticmethod def _helper1():
  • @classmethod def _helper2(cls):

We write a def foo(self): instance method when the method will need to refer to self.x and other object attributes. This is very convenient; we might see half a dozen input parameters (6 attributes) passed in by that one little self reference. The downside is it induces coupling; the Gentle Reader now has more code to grok when trying to understand how class invariants are preserved. These two @decorators let us help the Gentle Reader out by showing that there's nothing up our sleeve, no self.y references lurking within, just the gozintas mentioned in the signature and whatever gozouta is offered by a return. The rules for applying them are very simple.

If foo makes no self references, then convert to static.

If we can't quite do that because there's a reference to a class attribute, like self.MAX_LOOPS, then consider passing in cls instead of self, via the @classmethod decorator.
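A minimal sketch of those two rules (Counter, bump, _clamp, and _limit are made-up names, not from the question):

class Counter:
    MAX_LOOPS = 10  # class attribute

    def __init__(self, start: int = 0):
        self.count = start

    def bump(self, step: int = 1) -> None:
        # instance method: needs self.count
        self.count += step

    @staticmethod
    def _clamp(value: int, low: int, high: int) -> int:
        # no self or cls references: everything comes in through the signature
        return max(low, min(high, value))

    @classmethod
    def _limit(cls) -> int:
        # only needs the class attribute, so take cls instead of self
        return cls.MAX_LOOPS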


properties

Sometimes we export public attributes, such as a Point p that offers p.x and p.y. It's very simple; we don't need much ceremony.

Sometimes we reserve them as private attributes, such as a Vehicle's v._mass, if we want to ensure that the caller won't accidentally set it to a negative value. Then we'd offer read access through @property def mass(self):, and optionally write access through @mass.setter def mass(self, value):.

Common reasons for using such properties:

  • Defend a class invariant, such as doing a cache invalidate upon update or forcing a value to always be non-negative.
  • Log that a value was read or written.
  • Author changed their mind, and wants existing calling code's variable references to transparently become method calls.

Returning to the mass example, we might want to record its canonical kg value in just one place, while also offering computed mass_in_g and mass_in_lb properties. ( mass_in_slug ? ) Or perhaps we start recording a scale_measurement reading and a tare_weight, so when caller asks for v.mass it transparently turns into a subtraction of tare weight behind the scenes.
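For illustration, a sketch of that Vehicle example (all names assumed, not from the code under review):

class Vehicle:
    def __init__(self, mass: float):
        self.mass = mass  # goes through the setter below

    @property
    def mass(self) -> float:
        '''Canonical mass in kg.'''
        return self._mass

    @mass.setter
    def mass(self, value: float) -> None:
        # defend the class invariant: mass is never negative
        if value < 0:
            raise ValueError(f'mass must be non-negative, got {value}')
        self._mass = value

    @property
    def mass_in_g(self) -> float:
        # computed from the single canonical kg value
        return self._mass * 1000.0

    @property
    def mass_in_lb(self) -> float:
        return self._mass / 0.45359237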


inspect

expose keyword arguments that might be hidden under **kwargs ...

Friendly reminder: the inspect module is terrific for introspecting the call stack and viewing signatures of methods we might find on it.

You might find it helpful to do things beyond the very nice .ikeywords / .akeywords queries you're doing.

Notice that a keyword recognized by some method may potentially never have appeared in any method's signature, as Python code is free to parse the kwargs dict any way it wishes. It's just a convention that authors usually put them in signatures.
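
For illustration (foo and threshold are made-up names): inspect.signature() reveals the declared parameters and their defaults, but a keyword like threshold that is only ever pulled out of the kwargs dict never shows up:

import inspect

def foo(x, *, mode='fast', **kwargs):
    # 'threshold' is recognized here but never appears in any signature
    threshold = kwargs.get('threshold', 0.5)
    return x, mode, threshold

print(list(inspect.signature(foo).parameters))
# ['x', 'mode', 'kwargs']
print(inspect.signature(foo).parameters['mode'].default)
# 'fast'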


raise objects

 raise NotImplementedError

Python has let you raise a type for a very long time. But please follow the better practice of raising an object:

 raise NotImplementedError()

For one thing, it admits of decorating the error with a diagnostic message.
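For example, the bare raise in get_method() could become something like (the message text is just an illustration):

raise NotImplementedError('subclasses of SciPyMethod must override get_method()')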

but I need both classmethods and instance methods for this to work.

I confess I don't understand what motivated inst = cls(). I would think that just an instance method, with no classmethod, should suffice, but I'm sure you've done a bunch of debugging and there's some good reason. Recommend you write the reason down, as a # comment or """docstring""".

Recommend you prefer the conventional kwargs spelling over kws.

 params = self.prepare_params(...)
 print(params)
 raise NotImplementedError
 result = method(**cls.kwargs)
 return result

Recommend you delete some lines of source, leftover from a debugging session, which will clearly never execute.

Since there are classmethods for getting ...

OIC. Sorry, don't know. Recommend you express that in terms of unit tests, some of which fail to produce desired result, so we can better see the details.


isort

import numpy as np, pandas as pd, scipy as sp

Hmmm, interesting, I didn't even realize that was syntactically valid.

Please let isort sort that out for you.


use conventional names

 def __init__(cls, clsname, bases, clsdict, **kwargs):

I imagine you know what you're doing there, and decided not to name it self for a reason. It's worth putting an explanation into the code, as it isn't obvious to me.

Throwing Any into a Union[...] annotation probably has documentation value to highlight types such as NPArray. But consider adding more # comments to such annotations.


follow the contract

 def __call__(self, x: NPArray, **kwargs) -> NPArray:
     l_func = sp.cluster.hierarchy.linkage
     params = self.prepare_params(func=l_func, scope=locals(), **kwargs)
     print('LINKAGE', params)
     # linkage = l_func(x, **params)
     # return linkage

Ummm, you promised to return an array, but you just fell off the end with implicit return None ?!?
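
A minimal sketch that keeps the promise, assuming the commented-out lines are what was intended:

 def __call__(self, x: NPArray, **kwargs) -> NPArray:
     l_func = sp.cluster.hierarchy.linkage
     params = self.prepare_params(func=l_func, scope=locals(), **kwargs)
     return l_func(x, **params)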


logging

There's a bunch of leftover debug print statements scattered throughout, some of them commented.

Do yourself a favor. Use the logging module already! Rather than a comment, log at DEBUG or INFO, and reveal such messages when needed during a debug session.
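
A minimal sketch (compute_linkage is a made-up stand-in for the print call inside SciPyLinkage.__call__):

import logging

logger = logging.getLogger(__name__)

def compute_linkage(params: dict) -> None:
    # instead of print('LINKAGE', params)
    logger.debug('LINKAGE params: %s', params)

# during a debug session, reveal the messages:
logging.basicConfig(level=logging.DEBUG)
compute_linkage({'method': 'single', 'metric': 'correlation'})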


This is an interesting piece of work.

I would need better motivating example(s) in the documentation, and a test suite, to be able to maintain such code.

answered Aug 10, 2023 at 18:34
  • Thank you for taking the time to review and write this up. I am working on improving this example just for my own learning. As for the motivation, libraries like SciPy have a lot of stringly typed keyword arguments like method and metric; the only way to get all options, to my knowledge, is to read the docstring or import something like _METRICS. As for the kw class, that is to change the defaults / add some extra processing before calling a method. The updated defaults could be done with a simple dict, though. Commented Aug 11, 2023 at 12:50

I'm going to ignore basically everything that was written here, because - yes, you identified it - it's over-engineered. Scipy applications are often performance-sensitive, and even if they weren't, this level of overhead complexity is simply not worth paying.

There is a way to improve the developer experience, and that's stubs and typing. This is (giant red exclamation marks) non-runtime, and added for the following purposes:

  • Static analysis, usually via mypy
  • IDE gadgets like drop-downs

From this perspective, your current code is both way too much and not at all enough. A quite educational example is the single function pdist. The scipy documentation is lacking, and a proper treatment of this function requires that you read the source code (scipy/spatial/distance.py).

pyi stub content could look like:

import numpy as np
from typing import Protocol, Any, Literal, Optional, overload
from numpy._typing import ArrayLike, _NumberLike_co

class TwoArityFunction(Protocol):
    def __call__(self, u: _NumberLike_co, v: _NumberLike_co, **kwargs: Any) -> _NumberLike_co:
        ...

@overload
def pdist(
    X: ArrayLike,
    metric: TwoArityFunction,
    *,
    out: Optional[np.ndarray],
    **kwargs: Any,
) -> np.ndarray:
    ...

@overload
def pdist(
    X: ArrayLike,
    metric: Literal[
        'braycurtis', 'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice',
        'hamming', 'jaccard', 'jensenshannon', 'kulczynski1', 'matching', 'rogerstanimoto',
        'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule',
    ],
    *,
    out: Optional[np.ndarray],
    **kwargs: Any,
) -> np.ndarray:
    ...

@overload
def pdist(
    X: ArrayLike,
    metric: Literal['euclidean'],
    *,
    out: Optional[np.ndarray] = None,
    V: Optional[np.ndarray] = None,  # variance vector
) -> np.ndarray:
    ...

@overload
def pdist(
    X: ArrayLike,
    metric: Literal['minkowski'],
    *,
    out: Optional[np.ndarray] = None,
    p: _NumberLike_co = 2,  # p-norm (weighted and unweighted)
    w: Optional[np.ndarray] = None,  # weight vector
) -> np.ndarray:
    ...

def pdist(
    X: ArrayLike,
    metric: str | TwoArityFunction,
    *,
    out: Optional[np.ndarray] = None,
    **kwargs: Any,
) -> np.ndarray:
    ...

This is definitely not complete (there are other weighted functions) and probably not entirely accurate, either - but it does work with mypy; for example:

import numpy as np
from scipy.spatial.distance import pdist

v = np.array((3,))
pdist(v, 'euclidean', V=v)

passes, and

import numpy as np
from scipy.spatial.distance import pdist

v = np.array((3,))
pdist(v, 'euclidean', w=v)

fails with

286464.py:69: error: No overload variant of "pdist" matches argument types "ndarray[Any, dtype[Any]]", "str", "ndarray[Any, dtype[Any]]" [call-overload]
286464.py:69: note: Possible overload variants:
286464.py:69: note: def pdist(X: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], metric: TwoArityFunction, *, out: ndarray[Any, Any] | None, **kwargs: Any) -> ndarray[Any, Any]
286464.py:69: note: def pdist(X: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], metric: Literal['braycurtis', 'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice', 'hamming', 'jaccard', 'jensenshannon', 'kulczynski1', 'matching', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'], *, out: ndarray[Any, Any] | None, **kwargs: Any) -> ndarray[Any, Any]
286464.py:69: note: def pdist(X: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], metric: Literal['euclidean'], *, out: ndarray[Any, Any] | None = ..., V: ndarray[Any, Any] | None = ...) -> ndarray[Any, Any]
286464.py:69: note: def pdist(X: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], metric: Literal['minkowski'], *, out: ndarray[Any, Any] | None = ..., p: int | float | complex | number[Any] | bool_ = ..., w: ndarray[Any, Any] | None = ...) -> ndarray[Any, Any]
Found 3 errors in 1 file (checked 1 source file)

There are large stub libraries dedicated to this sort of hinting; see e.g. pandas-stubs. I didn't find any for scipy but I didn't look very hard.

answered Aug 11, 2023 at 3:24
  • I greatly appreciate your feedback. Where should one look into / learn more about stubs? To clarify, in your opinion even the LiteralEnum instances like PDistMetric are too much? As I don't know another way to reliably check all valid metrics / methods without checking docstrings. Also thanks for showing me the imports for pdist. I'm now learning about Protocols. Commented Aug 11, 2023 at 12:43
  • Very good; to start: stub files; typing; PEP 484. Commented Aug 11, 2023 at 13:29
