5
\$\begingroup\$

I'm working on a sort of distributed build system. The system allows execution of snippets of scripts as build steps. I need to be able to hash these snippets of code in a way that comments and doc strings don't affect the hash. I'm getting part way there by using the ast module to parse the code, then doing an ast.dump and hashing the resulting string. The natural next step for me is to clear the first Expr(str()) node in the body of every FunctionDef node.

This is the best solution I've found so far.

Is there a better way to solve this problem?

import ast
import hashlib
import inspect
def _remove_docstring(node):
 '''
 Removes all the doc strings in a FunctionDef or ClassDef as node.
 Arguments:
 node (ast.FunctionDef or ast.ClassDef): The node whose docstrings to
 remove.
 '''
 if not (isinstance(node, ast.FunctionDef) or
 isinstance(node, ast.ClassDef)):
 return
 if len(node.body) != 0:
 docstr = node.body[0]
 if isinstance(docstr, ast.Expr) and isinstance(docstr.value, ast.Str):
 node.body.pop(0)
#-------------------------------------------------------------------------------
def hash_function(func):
 '''
 Produces a hash for the code in the given function.
 Arguments:
 func (types.FunctionObject): The function to produce a hash for
 '''
 func_str = inspect.getsource(func)
 module = ast.parse(func_str)
 assert len(module.body) == 1 and isinstance(module.body[0], ast.FunctionDef)
 # Clear function name so it doesn't affect the hash
 func_node = module.body[0]
 func_node.name = "" 
 # Clear all the doc strings
 for node in ast.walk(module):
 _remove_docstring(node)
 # Convert the ast to a string for hashing
 ast_str = ast.dump(module, annotate_fields=False)
 # Produce the hash
 fhash = hashlib.sha256(ast_str)
 result = fhash.hexdigest()
 return result
#-------------------------------------------------------------------------------
# Function 1
def test(blah):
 'This is a test'
 class Test(object):
 '''
 My test class
 '''
 print blah
 def sub_function(foo):
 '''arg'''
print hash_function(test)
#-------------------------------------------------------------------------------
# Function 2
def test2(blah):
 'This is a test'
 class Test(object):
 '''
 My test class
 '''
 print blah
 def sub_function(foo):
 '''arg meh'''
print hash_function(test2)
200_success
146k22 gold badges190 silver badges478 bronze badges
asked Apr 24, 2018 at 16:34
\$\endgroup\$
10
  • 1
    \$\begingroup\$ What about a function with closure? \$\endgroup\$ Commented Apr 24, 2018 at 16:43
  • \$\begingroup\$ ---I think you should use RegEx for this.--- Look into building a parse tree that is valid for all Python statements. You'll find a good many other uses for it besides this, I wager. \$\endgroup\$ Commented Apr 24, 2018 at 17:31
  • \$\begingroup\$ In my opinion, RegExes are only good for quick find-replace. It's better to avoid them in programs. \$\endgroup\$ Commented Apr 24, 2018 at 18:00
  • \$\begingroup\$ @dhu are you asking whether this will work for closures? I've tested it with functions within functions and classes within functions and it seems to work fine. \$\endgroup\$ Commented Apr 25, 2018 at 7:08
  • \$\begingroup\$ @Hosch250 I don't think it's a good idea to build another parser when python comes with a module already that already parses python perfectly. I would see this as over complicating the solution. \$\endgroup\$ Commented Apr 25, 2018 at 7:09

1 Answer 1

1
\$\begingroup\$

Looking at the source code for ast.get_docstring your code more or less does what it does. Here is their code for reference:

def get_docstring(node, clean=True):
 """
 Return the docstring for the given node or None if no docstring can
 be found. If the node provided does not have docstrings a TypeError
 will be raised.
 """
 if not isinstance(node, (AsyncFunctionDef, FunctionDef, ClassDef, Module)):
 raise TypeError("%r can't have docstrings" % node.__class__.__name__)
 if not(node.body and isinstance(node.body[0], Expr)):
 return
 node = node.body[0].value
 if isinstance(node, Str):
 text = node.s
 elif isinstance(node, Constant) and isinstance(node.value, str):
 text = node.value
 else:
 return
 if clean:
 import inspect
 text = inspect.cleandoc(text)
return text

A couple of the things they mention are only a concern for Python 3+ like AsyncFunctionDef. Also, it seems Constant is a grammar rule introduced in Python 3+. (However, you may want to account for this in the future.)

You seem to be also removing docstrings from ClassDef, I am not one hundred percent sure, but you may also want to add Module as well.

There is also a ast.NodeTransformer. However, I find it rather weird stylistically compared to your current method of looping. So... I'm not confident in making any comments about whether you should be using it or not. Nevertheless, I'm putting it out there so you are aware in case you weren't.

answered Apr 25, 2018 at 6:19
\$\endgroup\$
1
  • \$\begingroup\$ thanks for this. Yes I wrote it for python 2.7. Any suggestions for making it work with both? I included ClassDef because it's possible to define a class inside of a function, but I don't think its possible to declare a new module inside of a function which is why I left it out. I did try with ast.NodeTransformer and ast.NodeVisitor, but both seemed to be a few more lines of code and more difficult to read. \$\endgroup\$ Commented Apr 25, 2018 at 7:24

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.