1
\$\begingroup\$

Goal: extract methods/functions defined in module. This excludes:

  1. Imports
  2. Lambda methods
  3. Magic methods
  4. Builtin methods
  5. Class methods
  6. Classes
  7. Non-original definitions (i.e. function assignments, alt_name = orig_fn)

My approach + test below. Any room for improvement, or false positives/negatives? (In particular I wonder if we can get away without typechecks, e.g. isinstance(method, types.LambdaType))


Code: live demo

import utils

def get_module_methods(module):
    def is_module_function(obj):
        return ('<function' in str(obj) and
                module.__name__ in getattr(obj, '__module__', ''))
    def not_lambda(obj):
        return not ('<lambda>' in getattr(obj, '__qualname__', ''))
    def not_magic(obj):
        s = getattr(obj, '__name__', '').split('__')
        return (len(s) < 2) or not (s[0] == s[-1] == '')
    def not_duplicate(name, obj):
        return name == getattr(obj, '__name__', '')
    def cond(name, obj):
        return (is_module_function(obj) and not_lambda(obj) and
                not_magic(obj) and not_duplicate(name, obj))
    objects = {name: getattr(module, name) for name in dir(module)}
    m_methods = {}
    for name, obj in objects.items():
        if cond(name, obj):
            m_methods[name] = obj
    return m_methods

mm = get_module_methods(utils)
_ = [print(k, '--', v) for k, v in mm.items()]

Test:

# utils.py
import random
import numpy as np
from inspect import getsource

def fn1(a, b=5):
    print("wrong animal", a, b)

class Dog():
    meow = fn1
    def __init__(self):
        pass
    def bark(self):
        print("WOOF")

d = Dog()
barker = d.bark
mewoer = d.meow
A = 5
arr = np.random.randn(100, 100)

def __getattr__(name):
    return getattr(random, name, None)

magic = __getattr__
duplicate = fn1
_builtin = str
lambd = lambda: 1

# main.py
import utils
# def get_module_methods(): ...
mm = get_module_methods(utils)
_ = [print(k, '--', v) for k, v in mm.items()]

fn1 -- <function fn1 at 0x7f00a7eb2820>

Edit: additional test case for a lambda surrogate:

import functools

def not_lambda():
    return 1

functools.wraps(lambda i: i)(not_lambda).__name__

The code in both the question and the answer flags not_lambda as a lambda, which is a false positive.
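To make the surrogate case concrete, here is a minimal sketch of what functools.wraps changes on not_lambda and what it leaves alone. The variable name surrogate is only for illustration. It relies on the fact that wraps copies __name__ and __qualname__ from the wrapped object but does not touch the wrapper's code object.

```python
import functools

def not_lambda():
    return 1

# wraps(wrapped)(wrapper) updates `wrapper` in place and returns it.
surrogate = functools.wraps(lambda i: i)(not_lambda)

same_object = surrogate is not_lambda          # the very same function object
visible_name = surrogate.__name__              # copied from the lambda
visible_qualname = surrogate.__qualname__      # copied from the lambda
code_name = surrogate.__code__.co_name         # untouched by wraps

print(same_object, visible_name, visible_qualname, code_name)
```

So any check based on __name__ or __qualname__ sees '<lambda>' here, while the code object still carries the name from the def statement.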

asked May 12, 2020 at 1:54
\$\endgroup\$

3 Answers 3

3
\$\begingroup\$

pyclbr

The standard library includes the pyclbr module, which provides functions for building a module browser. pyclbr.readmodule_ex() returns a tree of nested dicts with the functions (def statements) and classes (class statements) in a module. So you just need to check the top level dict for items of class pyclbr.Function.

Note that readmodule_ex takes the name of the module (a str), not the actual module.

Just tested this, and it prints out imported functions and dunder functions as well, so those need to be filtered out.

import pyclbr

module = 'tutils'
for k, v in pyclbr.readmodule_ex(module).items():
    if (isinstance(v, pyclbr.Function)
            and v.module == module
            and not v.name.startswith('__')):
        print(f"{v.name} -- {v.file} at {v.lineno}")
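For anyone who wants to try this without setting up the search path, here is a self-contained sketch that writes a small module to a temporary directory and passes that directory via the path argument of readmodule_ex. The module name pyclbr_demo_utils, its contents, and the helper top_level_functions are made up for the demo; the filter is the same as above.

```python
import os
import pyclbr
import tempfile
import textwrap

# Hypothetical stand-in for utils.py, trimmed to the interesting cases.
source = textwrap.dedent('''
    def fn1(a, b=5):
        return a, b

    class Dog:
        def bark(self):
            return "WOOF"

    def __getattr__(name):
        return None

    duplicate = fn1
    lambd = lambda: 1
''')

def top_level_functions(module_name, search_dir):
    """Names of non-dunder functions written as top-level def statements."""
    tree = pyclbr.readmodule_ex(module_name, path=[search_dir])
    return sorted(
        name for name, desc in tree.items()
        if isinstance(desc, pyclbr.Function)
        and desc.module == module_name
        and not name.startswith('__')
    )

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, 'pyclbr_demo_utils.py'), 'w') as fh:
        fh.write(source)
    found = top_level_functions('pyclbr_demo_utils', tmp)

print(found)
```

Since pyclbr only looks at def and class statements in the source, the assignment duplicate = fn1 and the lambda never show up in the top-level dict, and bark sits under the Dog entry rather than at the top level.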
answered May 12, 2020 at 15:56
\$\endgroup\$
3
  • \$\begingroup\$ Looks plausible, but the code given as-is fails the test case in the question (nothing is printed). \$\endgroup\$ Commented May 12, 2020 at 15:58
  • \$\begingroup\$ @OverLordGoldDragon, it printed out too much when I ran it, so I had to add tests for imported functions and dunder functions. Note, the source file for the module has to be on the search path so pyclbr can find it. \$\endgroup\$ Commented May 12, 2020 at 19:04
  • \$\begingroup\$ This does work, but it's more of an 'alternative' than 'code review' - also worth mentioning this docs' note: "information is extracted from the Python source code rather than by importing the module, so this module is safe to use with untrusted code" - which is an advantage over existing answers. \$\endgroup\$ Commented May 13, 2020 at 0:18
2
\$\begingroup\$
  • Not a fan of all these redundant functions.
  • Use equality for equality and in for in.

    • '<function' in str(obj) -> str(obj).startswith('<function')
    • module.__name__ in ... -> module.__name__ == ...
    • not ('<lambda>' in ...) -> '<lambda>' not in ... -> '<lambda>' != ...


    Let's play a game of what if:

    • What if foo.fn is copied to foo.bar.fn.
    • What if someone built a function from a lambda so they changed the name to fn_from_<lambda>.


    Yeah, let's stick to using == for equals.

  • Why is there a dictionary comprehension when the for after it consumes it?

    This is just a waste of memory.

  • As with the dictionary comprehension, building this list only to call print is a waste of memory. Additionally, it's normally easier to work with a plain for loop, as you can use assignments.

    _ = [print(k, '--', v) for k, v in mm.items()]
    
import utils

def get_module_methods(module):
    output = {}
    for name in dir(module):
        obj = getattr(module, name)
        obj_name = getattr(obj, '__name__', '')
        if (str(obj).startswith('<function')  # Is function
            and '<lambda>' != obj_name  # Not a lambda
            and module.__name__ == getattr(obj, '__module__', '')  # Same module
            and name == obj_name
            and not (  # Dunder
                obj_name.startswith('__')
                and obj_name.endswith('__')
                and len(obj_name) >= 5
            )
        ):
            output[name] = obj
    return output

mm = get_module_methods(utils)
for k, v in mm.items():
    print(k, '--', v)
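To make the earlier in versus == point concrete, here is a small sketch with made-up names. It assumes one reading of the foo / foo.bar case: a function whose __module__ is 'foo.bar' ends up in the namespace of module foo.

```python
# Hypothetical names, chosen only to illustrate the pitfall.
module_name = "foo"
other_module = "foo.bar"   # __module__ of a function defined in foo.bar

loose_module = module_name in other_module     # substring test
strict_module = module_name == other_module    # equality test

renamed = "fn_from_<lambda>"   # an ordinary function given a lambda-like name
loose_lambda = "<lambda>" in renamed
strict_lambda = "<lambda>" == renamed

print(loose_module, strict_module, loose_lambda, strict_lambda)
# prints: True False True False
```

The substring tests accept both cases, so the function from foo.bar would count as belonging to foo and the renamed function would count as a lambda. The equality tests reject both.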

Let's play another game of what if:

  • What if I have a class whose __str__ returns '<function...'?
  • What if someone built a function from a lambda so changed the __name__?

    import functools
    functools.wraps(lambda i: i)(...).__name__
    

In particular I wonder if we can get away without typechecks

No.
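A minimal sketch of the first what-if, to show why the string test alone is not enough. FakeFunction is a made-up class; the only library fact relied on is that types.LambdaType is the same object as types.FunctionType, so a typecheck separates functions from impostors but not lambdas from def functions.

```python
import types

class FakeFunction:
    """Hypothetical impostor: its string form imitates a function repr."""
    def __str__(self):
        return "<function fake at 0x0>"

def real():
    pass

fake = FakeFunction()

passes_string_test = str(fake).startswith("<function")    # fooled
passes_type_test = isinstance(fake, types.FunctionType)   # not fooled
real_is_function = isinstance(real, types.FunctionType)
lambda_is_function = isinstance(lambda: 1, types.FunctionType)
same_type = types.LambdaType is types.FunctionType

print(passes_string_test, passes_type_test,
      real_is_function, lambda_is_function, same_type)
```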

answered May 12, 2020 at 4:01
\$\endgroup\$
9
  • 1
    \$\begingroup\$ Thanks for the analysis. Valid loopholes in last two bullets, but the answer doesn't address them; I suppose a typecheck addresses the first, but how would one reliably identify a lambda? \$\endgroup\$ Commented May 12, 2020 at 14:27
  • 1
    \$\begingroup\$ To address some points: (1) I don't see the problem with "foo.fn is copied to foo.bar.fn"; (2) '<lambda>' != ... can't tell how this is better than '<lambda>' not in; (3) "waste of memory" - it's for readability, and their memory use is negligible (and garbage-collected); (4) # Same module not necessarily; if in PYTHONPATH, a package level can be bypassed, e.g. from a import b -> import b, but this has its own problems, and I agree with your preference for == here; still worth a mention. \$\endgroup\$ Commented May 12, 2020 at 14:33
  • 2
    \$\begingroup\$ No, you haven't - and your answer stands as incomplete by your own assessment. Maybe I'll still 'accept' it as a net-positive, but the shortcomings should be supposedly easy to correct. \$\endgroup\$ Commented May 12, 2020 at 16:30
  • 1
    \$\begingroup\$ To clarify, I see how '<lambda> != works - my question concerns not the running point "(2)", but "how to not filter a trojan horse" (e.g. functools's). Dedicated a question here. \$\endgroup\$ Commented May 12, 2020 at 17:23
  • 2
    \$\begingroup\$ I've done nothing to harass you. \$\endgroup\$ Commented May 12, 2020 at 18:13
0
\$\begingroup\$

Thanks to @Peilonrayz for the original answer, but I figure it's worth including a more complete version of the working function:

from types import LambdaType

def get_module_methods(module):
    output = {}
    for name in dir(module):
        obj = getattr(module, name)
        obj_name = getattr(obj, '__name__', '')
        if ((str(obj).startswith('<function')
             and isinstance(obj, LambdaType))  # is a function
            and module.__name__ == getattr(obj, '__module__', '')  # same module
            and name in str(getattr(obj, '__code__', ''))  # not a duplicate
            and "__%s__" % obj_name.strip('__') != obj_name  # not a magic method
            and '<lambda>' not in str(getattr(obj, '__code__'))  # not a lambda
        ):
            output[name] = obj
    return output

Lambda detection taken from this Q&A. Note that we can omit the isinstance(obj, LambdaType) typecheck, but with caveats:

  • str(obj).startswith('<function') is insufficient to tell whether obj is a function, since we can define a class whose __str__ returns '<function' (as pointed out by Peilonrayz). However, __str__ only takes effect on a class instance, which fails the # not a duplicate check, so we can "luck out". This is unlikely in practice, but here's where a typecheck may be necessary:
class Dog():
    def __init__(self):
        self.__name__ = 'd'

d = Dog()

Edit: changed # not a duplicate check, which would falsely exclude not_lambda.
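For comparison, here is a sketch of a variant that drops the string checks entirely. It is not the function above: it assumes inspect.isfunction for the typecheck, vars(module) instead of dir(module), and an exact comparison against __code__.co_name to reject both aliases and lambdas. The names get_module_functions and demo_utils are made up, and the test module is built in memory rather than imported.

```python
import inspect
import types

def get_module_functions(module):
    """Hypothetical variant: typecheck plus code-object name comparison."""
    output = {}
    for name, obj in vars(module).items():
        if (inspect.isfunction(obj)                     # real Python function
                and obj.__module__ == module.__name__   # defined in this module
                and obj.__code__.co_name == name        # not an alias or lambda
                and not (name.startswith('__') and name.endswith('__'))):
            output[name] = obj
    return output

# Stand-in for utils.py, including the functools.wraps surrogate.
source = '''
import functools
def fn1(a, b=5):
    return a, b
class Dog:
    meow = fn1
    def bark(self):
        return "WOOF"
def __getattr__(name):
    return None
duplicate = fn1
lambd = lambda: 1
def not_lambda():
    return 1
functools.wraps(lambda i: i)(not_lambda)
'''
demo = types.ModuleType('demo_utils')
exec(source, demo.__dict__)

result = sorted(get_module_functions(demo))
print(result)
```

This keeps fn1 and not_lambda and drops the rest. It is still not tamper-proof, since __code__ can be reassigned just like __name__, so treat it as a stricter heuristic rather than a guarantee.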

answered May 12, 2020 at 18:13
\$\endgroup\$
