After watching this presentation, I decided that it would be a good project for me to undergo to practice my python
skills. A summary of the presentation:
Jack Diederich, the presenter, talks about the overuse of classes in programming. A phrase he used a lot was "you shouldn't be writing a class if it has two methods, one of which is init"
Video description: "Classes are great but they are also overused. This talk will describe examples of class overuse taken from real world code and refactor the unnecessary classes, exceptions, and modules out of them."
Having his phrase as my model, I wrote a python
program that read a .py
file, and if it has only one method that isn't __init__
, the program refactors the code into one method and appends the new code to the end of the file. A more detailed description is available in the docstring
at the top of the source code of class_cruncher.py
.
THIS PROGRAM ASSUMES A LOT
- The entire file is a class. If there are any other methods that are not related to the class, but are contained in the file, the program will pick up on those methods and it won't work correctly. It is currently only written for code that it completely wrapped in a class.
I would like feedback on code efficiency and any other nitpicks you can find in my code. It took a couple hours to write, and I'm sure there are areas that can be improved.
class_cruncher.py
"""
Objective:
Take in python file, and determine if class inside can be
reduced. This program assumes all methods are inside the
class, and are all related to the class. It won't work if there are
methods outside the scope of the class; it will count those too.
Criteria for reduction:
- Two methods, one of which is __init__
- One method
- Empty class (has only pass)
If reduced:
The class should be restrained into one method, who's name will be
the name of the only method (other than __init__) in the class. This
method will be appended to the file in a docstring.
"""
import sys
def get_file_lines():
"""
Method written explicitly for not having to loop through
and re-open the file each time
"""
global file
lines = []
with open(file) as f:
for line in f:
lines.append(line)
return lines
def get_function_code(method_name):
"""
Gets all the code in the method name passed
"""
global function_params
code = []
lines = get_file_lines()
for i in range(len(lines)):
if method_name in lines[i]:
for j in range(len(lines)):
if j > i:
if lines[j] != '\n':
if "self" in lines[j]:
function_params += (", " + add_to_params(lines[j]))
code.append(remove_self(lines[j]))
else:
code.append(lines[j])
else:
break
return code
def get_function_paramaters(method_name):
"""
Gets the parameters for the method name specified
"""
for line in get_file_lines():
if method_name in line:
length = (len(method_name) * 2) + 6
header = line.strip()[length:]
parameters = ""
for char in header:
if char != ")":
parameters += char
else:
break
return parameters
def get_function_name():
"""
Searches through the file and finds the name of the function
that will be the base name for the reducing
"""
for line in get_file_lines():
if is_function(line):
func_line = line.strip()[4:]
name = ""
for char in func_line:
if char != "(":
name += char
else:
break
return name
def is_function(line):
"""
Checks if line contains "def", determining if it is a function
Also checks if it contains __init__, for which it returns False
"""
if "def" in line:
if "__init__" in line:
return False
return True
return False
def function_count():
"""
Counts how many functions there are in the class
"""
count = 0
for line in get_file_lines():
if is_function(line):
count += 1
return count
def add_to_params(line):
"""
Adds a variable to the params of the new function if it's included
in the code and has `self` before it.
"""
for i in range(len(line)):
j = i + 4
if line[i:j] == "self":
cut_line = line[j + 1:]
new_param = ""
for char in cut_line:
if char.isalpha():
new_param += char
else:
break
return new_param
def remove_self(line):
"""
Removes `self.` from the line, and returns the new formatted line
"""
for i in range(len(line)):
j = i + 4
if line[i:j] == "self":
new_line = line[:i] + line[j + 1:]
return new_line
def combine_all(name, params, code):
"""
Combines all the collected information into one new method.
The `line[1:]` removes the first '\t' for better formatting
"""
method = "def " + name + "(" + params + "):\n"
for line in code:
method += line[1:]
return method
def appendToFile(content):
"""
Appends the new method to the provided file
"""
global file
with open(file, "a") as f:
f.write("\n\n########################")
f.write("\n# REDUCED CLASS BELOW #")
f.write("\n########################\n")
f.write(content)
"""
The below code is all in the main guard and not in a seperate function
because I need access to `function_params` in order to add more params
should I need to. This was the only solution I could think of.
"""
if __name__ == '__main__':
file = input("Enter file path: ")
#code/python/class_cruncher/yes_reduce.py # Used as shortcut for testing
if function_count() > 1:
print("CLASS DOES NOT NEED TO BE REDUCED")
sys.exit(0)
function_name = get_function_name()
function_params = get_function_paramaters(function_name)
function_code = get_function_code(function_name)
new_function = combine_all(function_name, function_params, function_code)
appendToFile(new_function)
print("Class successfully reduced!")
Below are two test cases I used. yes_reduce.py
is a class that should be reduced by this program, and no_reduce.py
is a class that should not be reduced by this program. The code in the two test cases are irrelevant; I wrote random code so that something would be in the methods.
yes_reduce.py
"""
This is a class that should be reduced
"""
class Greeting():
def __init__(self, message="Hello"):
self.message = message
def greet(self, person):
print(f"{self.message}, {person}!")
print("Test1")
print("test2")
no_reduce.py
"""
This is a class that should not be reduced
"""
class API():
def __init__(self, param1, param2, param3):
self.param1 = param1
self.param2 = param2
self.param3 = param3
def method1(self):
return self.param1 + self.param2
def method2(self):
return True if self.param1 == self.pram2 else False
def xyz(self, name):
return f"xyz: {self.param3}"
Below is what yes_reduce.py
looks like after being reduced.
yes_reduce.py after compression
"""
This is a class that should be reduced
"""
class Greeting():
def __init__(self, message="Hello"):
self.message = message
def greet(self, person):
print(f"{self.message}, {person}!")
print("Test1")
print("test2")
########################
# REDUCED CLASS BELOW #
########################
def greet(person, message):
print(f"{message}, {person}!")
print("Test1")
print("test2")
2 Answers 2
That's a difficult problem to attack. Without going into those difficulties, here are some comments on your code.
Global variables are generally frowned upon. They are basically a hidden function parameter, which can make it hard to debug and maintain the software. So, for example, it would be better to explicitly pass file
as a parameter of get_file_lines()
rather than use a global variable:
def get_file_lines(file):
"""
Method written explicitly for not having to loop through
and re-open the file each time
"""
lines = []
with open(file) as f:
for line in f:
lines.append(line)
return lines
The doc string for get_file_lines()
saya it is so the file doesn't have to be reopened and reread. However, this function is called four times by other functions in the program. Each time the file is being opened and read. To correct this, call get_file_lines()
once when initializing the program and saving the lines
that gets returned. Then pass lines
into whatever other functions need to look at the source code like so:
lines = get_file_lines(file)
function_name = get_function_name(lines)
function_params = get_function_paramaters(lines, function_name)
....etc....
Then instead of for line in get_file_lines():
in the other functions use:
for line in lines:
....
Functions can return more than one value, so instead of making function_params a global variable, it can be passed as an argument and returned. Also, for-loops can have a starting value. So, get_function_code()
can be written something like this:
def get_function_code(lines, method_name, function_params):
"""
Gets all the code in the method name passed
"""
code = []
for i in range(len(lines)):
if method_name in lines[i]:
for j in range(i+1, len(lines)): #<-- changed start
if lines[j] == '\n':
break
if "self" in lines[j]:
function_params += (", " + add_to_params(lines[j]))
code.append(remove_self(lines[j]))
else:
code.append(lines[j])
return code, function_params
There are other things that could be cleaned up, but that's all I've time for right now.
Also, have a look at the pyclbr module in the standard library. It provides functions to read a python file and return a dictionary of all the classes and functions. For classes, it can be used to find the line number where the class is defined, a list of all its methods, and the lines where the methods are defined.
Loop like a native
This:
for i in range(len(lines)):
if method_name in lines[i]:
is an anti-pattern. If you need to iterate over lines, simply
for line in lines:
However, since you also do an index comparison, you might need
for i, line in enumerate(lines):
Also, that nested i/j loop does not need to be nested. As soon as you find where the method name is, you should break out of the first loop.
No magic numbers
What does the 4
in line.strip()[4:]
do? This should be put into a variable for easier understanding.
The bigger problem
You've written a Python parser. This is deeply unnecessary. Have a read through https://docs.python.org/3.7/library/ast.html#module-ast
Explore related questions
See similar questions with these tags.
message
? \$\endgroup\$inspect
library. It would allow you to get the method names, count and attributes of the class as from python objects, not by reading the file as it's a string. \$\endgroup\$