Since I am fairly new to Python I was wondering whether anyone can help me by making the code more efficient. I know the output stinks; I will be using Pandas to make this a little nicer.
from xlrd import *
def main():
'''This Proram reads input (A:clone name, B:sequence, C:elisa) from an
Excel file and makes a cross comparison of each sequence pair'''
book = open_workbook("mtask.xlsx")
Input = book.sheet_by_index(0)
# naming of input data
a = (Input.col_values(0,0))
b = (Input.col_values(1,0))
c = (Input.col_values(2,0))
# make dictionary: keys are seq numbers; values are residues
y = {}
for i in range(Input.nrows):
x = []
for j in b[i]:
x.append(j)
y[a[i]] = x
# comparison of sequences and extraction of mutations for each sequence pair
List = []
for shit in range(Input.nrows):
for seq in range(Input.nrows):
seq12 = []
z = 0
for i in y[a[seq]]:
try:
for j in y[a[shit]][z]:
if i == j:
seq12.append(i.lower()+j.lower())
else:
seq12.append(i+j)
z = z+1
except IndexError:
print("oops")
lib = [a[seq],a[shit],c[seq],c[shit]]
for position, item in enumerate(seq12):
if item.isupper():
x = (str(item[0])+str(position+1)+str(item[1]))
lib.append(x)
List.append(lib)
# comparison of sequences and extraction of mutations for each sequence pair
dic = {}
for i in range(Input.nrows*Input.nrows):
x = []
for j in List[i]:
x.append(j)
dic[i] = x
# sort
a = []
for i in dic.values():
a.append(i)
# collect number of mutations in data files
import csv
null = []
one = []
two = []
three = []
four = []
five = []
six = []
seven = []
eight = []
nine = []
ten = []
for i in range(Input.nrows*Input.nrows):
if len(a[i]) <= 4:
null.append(a[i])
with open("no_mut.csv", "w", newline="") as f:
writer = csv.writer(f)
writer.writerows(null)
elif len(a[i]) == 5:
one.append(a[i])
with open("one.csv", "w", newline="") as f:
writer = csv.writer(f)
writer.writerows(one)
elif len(a[i]) == 6:
two.append(a[i])
with open("two.csv", "w", newline="") as f:
writer = csv.writer(f)
writer.writerows(two)
elif len(a[i]) == 7:
three.append(a[i])
with open("three.csv", "w", newline="") as f:
writer = csv.writer(f)
writer.writerows(three)
elif len(a[i]) == 8:
four.append(a[i])
with open("four.csv", "w", newline="") as f:
writer = csv.writer(f)
writer.writerows(four)
elif len(a[i]) == 9:
five.append(a[i])
with open("five.csv", "w", newline="") as f:
writer = csv.writer(f)
writer.writerows(five)
elif len(a[i]) == 10:
six.append(a[i])
with open("six.csv", "w", newline="") as f:
writer = csv.writer(f)
writer.writerows(six)
elif len(a[i]) == 11:
seven.append(a[i])
with open("seven.csv", "w", newline="") as f:
writer = csv.writer(f)
writer.writerows(seven)
elif len(a[i]) == 12:
eight.append(a[i])
with open("eight.csv", "w", newline="") as f:
writer = csv.writer(f)
writer.writerows(eight)
elif len(a[i]) == 13:
nine.append(a[i])
with open("nine.csv", "w", newline="") as f:
writer = csv.writer(f)
writer.writerows(nine)
elif len(a[i]) == 14:
ten.append(a[i])
with open("ten.csv", "w", newline="") as f:
writer = csv.writer(f)
writer.writerows(ten)
main()
1 Answer 1
Follow PEP8
PEP8 is the official coding style guide of Python. You have several notable violations:
Avoid wildcard imports like
from xlrd import *
. It makes it hard to tell which names in the code come fromxlrd
.frim xlrd import open_workbook
would be much better and easier to understand.Use
snake_case
for variable names. The variablesInput
andList
violate this, but I admit these are a bit tricky becauseinput
andlist
could shadow builtin names, so you need to find something better instead. Instead ofInput
,sheet
seems a clear choice, and instead ofList
, something that describes what kind of list it is would be good, for examplesequence_pairs
These are just examples, there are many more PEP8 violations.
Use the pep8
command line utility (you can install with pip install --user pep8
)
to detect all the PEP8 violations in your source files.
Although it's not a PEP8 violation, but for better readability I recommend extra spaces around operators, and to remove unnecessary parentheses. So for example:
- Instead of:
x = (str(item[0])+str(position+1)+str(item[1]))
- Write like this:
x = (str(item[0]) + str(position+1) + str(item[1]))
Use list comprehensions
List comprehensions are a powerful and elegant sexy feature of Python. For example instead of this:
x = [] for j in b[i]: x.append(j) y[a[i]] = x
You can write simply:
y[a[i]] = [j for j in b[i]]
But actually, if you just want to clone a list, this is the best:
y[a[i]] = b[i][:]
Cleaning up the csv writing
This of course is a complete mess:
null = [] one = [] two = [] # ... for i in range(Input.nrows*Input.nrows): if len(a[i]) <= 4: null.append(a[i]) with open("no_mut.csv", "w", newline="") as f: writer = csv.writer(f) writer.writerows(null) elif len(a[i]) == 5: one.append(a[i]) with open("one.csv", "w", newline="") as f: writer = csv.writer(f) writer.writerows(one) elif len(a[i]) == 6: # ...
The code duplication might actually be the smaller of two big problems.
The biggest problem is that each file is potentially rewritten multiple times.
I don't know what kind of data you have,
but for example if you have 10 rows with len(a[i]) == 5
,
then the file one.csv
will be first written with 1 line, then rewritten with 2 lines, 3 lines, ..., in the end 10 lines.
This is crazy.
You should rework this to write each file only once.
Here's an untested approach that I think should work, and solves both the multiple writing and the code duplication:
files = (
(lambda length: length <= 4, "no_mut.csv", []),
(lambda length: length == 5, "one.csv", []),
(lambda length: length == 6, "two.csv", []),
(lambda length: length == 7, "three.csv", []),
# ... and so on
)
for i in range(sheet.nrows * sheet.nrows):
for matcher, _, rows in files:
if matcher(a[i]):
rows.append(a[i])
break
for _, filename, rows in files:
with open(filename, 'w', newline='') as fh:
writer = csv.writer(fh)
writer.writerows(rows)
-
\$\begingroup\$ This is a common misconception, but PEP 8 only requires spaces around assignment and comparison operators, for others it only requires equal amounts of whitespace on both sides. It does recommend using space to point out operators with different priorities, see here, but that doesn't really apply to the example you chose. \$\endgroup\$Jaime– Jaime2015年02月22日 14:47:04 +00:00Commented Feb 22, 2015 at 14:47
-
\$\begingroup\$ If you meant
str(position+1)
instead ofstr(position + 1)
, I corrected it now, thanks! If you meant something else please let me know! \$\endgroup\$janos– janos2015年02月22日 16:51:39 +00:00Commented Feb 22, 2015 at 16:51 -
\$\begingroup\$ Even the spaces between the
str
calls are not mandatory. I think they do increase readability, but PEP 8 leaves it to you to decide. \$\endgroup\$Jaime– Jaime2015年02月22日 18:29:25 +00:00Commented Feb 22, 2015 at 18:29 -
\$\begingroup\$ On closer look, you're absolutely right! I rephrased that in my answer, and thanks a lot for pointing out! \$\endgroup\$janos– janos2015年02月22日 18:34:52 +00:00Commented Feb 22, 2015 at 18:34
Explore related questions
See similar questions with these tags.