
Our testing suite is configured using a hierarchical config file format similar to .ini or .toml files. Each 'block' is a test that is run on our cluster. I'm working on adding a unique id to each test to help map results to tests over time. A small example of a config file would be:

[Tests]
  [test1]
    type = CSVDIFF
    input = test1.i
    output = 'test1_chkfile.csv'
    max_tol = 1.0e-10
    unique_id = fa3acd397ae0d633194702ba6982ee93da09b835945845771256f19f44816f31
  []
  [test2]
    type = CSVDIFF
    input = test2.i
    output = 'test2_chkfile.csv'
  []
[]

The idea is to check all of these files and, for any test that doesn't have an id, create one and write it to the file. I would like a quick code review of my style, conventions, and logic. Here is the script that will be run as part of the CI precheck:

#!/usr/bin/env python3
import hashlib
import sys
from glob import glob
from textwrap import dedent
from time import time
from collections import UserDict
from typing import List, AnyStr, Any

import pyhit  # type: ignore


class StrictDict(UserDict):
    """Custom dictionary class that raises ValueError on duplicate keys.

    This class inherits from collections.UserDict, which is the proper
    way to create a subclass inheriting from `dict`. This dictionary
    will raise an error if it is given a `unique_id` key that it has
    previously indexed. Otherwise, it provides all methods and features
    of a standard dictionary.
    """

    def __setitem__(self, key: Any, value: Any) -> None:
        try:
            current_vals = self.__getitem__(key)
            raise ValueError(
                dedent(
                    f"""\
                    Duplicate key '{key}' found!
                    First id found in {current_vals[0]}, line {current_vals[1]}.
                    Duplicate id found in {value[0]}, line {value[1]}.\
                    """
                )
            )
        except KeyError:
            self.data[key] = value


def hashnode(node: pyhit.Node) -> str:
    """Return a sha256 hash of a spec block to be used as a unique id."""
    # time() returns the number of seconds since Jan. 1st, 1970 in UTC.
    hash_str = node.fullpath + str(time()) + node.render()
    sha_signature = hashlib.sha256(hash_str.encode()).hexdigest()
    return sha_signature


def fetchnodes(root: pyhit.Node) -> List[pyhit.Node]:
    """Return a list of children nodes that will either have or need ids."""
    nodes = []
    for node in root.children[0].descendants:
        # Ensure we only grab blocks that contain specification vars.
        if node.get("type") is None:
            continue
        nodes.append(node)
    return nodes


def indexnodes(file_paths: List[AnyStr]) -> StrictDict:
    """Return a dictionary containing a list of nodes for every file."""
    node_dict = StrictDict()
    for file_path in file_paths:
        root = pyhit.load(file_path)
        node_dict[(file_path, root)] = fetchnodes(root)
    return node_dict


def indexids(node_dict: StrictDict) -> StrictDict:
    """Return a dictionary of ids containing file and line info."""
    id_dict = StrictDict()
    for (file_path, _), nodes in node_dict.items():
        for node in nodes:
            unique_id = node.get("unique_id")
            if unique_id is None:
                continue
            id_dict[unique_id] = (file_path, node.line("unique_id"))
    return id_dict


def writeids(node_dict: StrictDict, id_dict: StrictDict) -> int:
    """Return the number of files written that needed a hashed id."""
    num = 0
    for (file_path, root), nodes in node_dict.items():
        # Assume we won't need to write any new files.
        write_p = False
        for node in nodes:
            if node.get('unique_id') is None:
                hash_str = hashnode(node)
                node['unique_id'] = hash_str
                id_dict[hash_str] = (file_path, node.line("unique_id"))
                write_p = True
        if write_p:
            pyhit.write(file_path, root)
            num += 1
    return num


def main():
    """Driving function for the script."""
    # Make sure to run this script in the root of BISON.
    assessment_specs = glob("./assessment/**/*/assessment", recursive=True)
    spec_dict = indexnodes(assessment_specs)
    id_dict = indexids(spec_dict)
    num_files_written = writeids(spec_dict, id_dict)
    if num_files_written > 0:
        print("Your code requires assessment file changes.")
        print("You can run ./scripts/unique_assessment_id.py in the top level of your repository.")
        print("Then commit the changes and resubmit.")
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
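For reference, here is a minimal standalone sketch of the duplicate-key behaviour I'm after, trimmed down to just the check (without the file/line message formatting), so the intent is clear even without pyhit installed:

```python
from collections import UserDict


class StrictDict(UserDict):
    """Trimmed-down sketch: reject any key that already exists."""

    def __setitem__(self, key, value):
        if key in self.data:
            raise ValueError(f"Duplicate key '{key}' found!")
        self.data[key] = value


ids = StrictDict()
ids["abc123"] = ("tests/a/assessment", 12)  # first insert is fine
try:
    ids["abc123"] = ("tests/b/assessment", 34)  # second insert raises
except ValueError as exc:
    print(exc)
```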
Reinderien
asked Jun 23, 2020 at 15:47

1 Answer


Generators

In this:

nodes = []
for node in root.children[0].descendants:
    # Ensure we only grab blocks that contain specification vars.
    if node.get("type") is None:
        continue
    nodes.append(node)
return nodes

you do not need to construct a list like this. Instead, perhaps

return (
    node for node in root.children[0].descendants
    if node.get('type') is not None
)

This remains a generator until it is materialized into a list, tuple, etc., which you might not need to do at all if you only iterate over the results once.
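For example, a small sketch (unrelated to pyhit, just to show the lazy, one-shot behaviour of generator expressions):

```python
nums = [1, 2, 3, 4, 5]

# Generator expression: nothing is evaluated at this point.
evens = (n for n in nums if n % 2 == 0)

first = list(evens)   # materializing runs the filter once
second = list(evens)  # the generator is now exhausted, so this is empty
print(first, second)
```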

answered Jun 23, 2020 at 16:56
  • Suppose I'm trying to modify those nodes placed in the generator. If I loop through the generator and modify those nodes, will the changes be kept in memory when I later write the modifications to the file? Commented Jun 23, 2020 at 19:56
  • I'm not completely clear on what you're asking, but you can either cast to a list, modify it, and then write to a file; or you can iterate into a second generator with your modifications and then write to a file. The first method requires O(n) memory and the second O(1). Commented Jun 23, 2020 at 19:58
  • Great, I followed your advice and was able to refactor so I only iterate once. I totally forgot that generators get "used up" after looping through them. Thanks. Commented Jun 24, 2020 at 16:14
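The one-pass refactor discussed in the comments above can be sketched roughly as follows; `Node` here is a hypothetical stand-in for a pyhit node, and the id value is a placeholder rather than a real hash:

```python
class Node:
    """Hypothetical stand-in for a pyhit node; only the attribute we need."""

    def __init__(self, name, unique_id=None):
        self.name = name
        self.unique_id = unique_id


nodes = [Node("test1", "abc"), Node("test2"), Node("test3")]

# Filter lazily, then mutate each matching node in a single pass.
# The mutations persist because the generator yields the very same
# objects that live in `nodes`; no copies are made.
for node in (n for n in nodes if n.unique_id is None):
    node.unique_id = "freshly-hashed-id"  # placeholder for hashnode(node)
```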
