The below Python 3 program scans 4 files a.txt
, b.txt
, c.txt
, d.txt
and then outputs the read data into a file output.txt
in a formatted way. The first line of each file is guaranteed to have a header and the second line of each file will be blank. I'm required to scan those four files.
Program:
def main():
with open('a.txt', 'r') as file_a, open('b.txt', 'r') as file_b, open('c.txt', 'r') as file_c, open('d.txt', 'r') as file_d:
lines1, lines2, lines3, lines4 = file_a.readlines(), file_b.readlines(), file_c.readlines(), file_d.readlines()
lines = [lines1, lines2, lines3, lines4]
number_of_spaces = 5
assert len(lines1) == len(lines2) == len(lines3) == len(lines4), "Error. Different number of lines found in the files"
row, column = 0, 1
with open('output.txt', 'w') as output:
while row < len(lines):
output.write(lines[row][0].strip() + ' ' * number_of_spaces)
row += 1
output.write('\n')
row = 0
while column < len(lines1):
while row < len(lines):
output.write(lines[row][column].strip() + ' ' * (number_of_spaces + len(lines[row][0].strip()) - len(lines[row][column].strip())))
row += 1
output.write('\n')
column += 1
row = 0
if __name__ == "__main__":
main()
When executed gives output.txt
:
Sl.No Elements Abbreviation Mass
1 Hydrogen H 1
2 Helium He 4
3 Lithium Li 7
4 Beryllium Be 9
...
98 Californium Cf 251
99 Einsteinium Es 252
100 Fermium Fm 257
Is there any room for improvement?
Additional information (if necessary):
Note that ...
in the files mean that there are a lot of similar data and not that the file contains those dots.
a.txt
:
Sl.No
1
2
3
...
99
100
b.txt
:
Elements
Hydrogen
Helium
Lithium
Beryllium
...
Californium
Einsteinium
Fermium
c.txt
:
Abbreviation
H
He
Li
Be
...
Cf
Es
Fm
d.txt
:
Mass
1
4
7
9
...
251
252
257
2 Answers 2
Is there any benefit in having all that inside a function, if you only have one function and all you do is call it once?
Having variables numbered, such as
line1, line2, line3, line4
and all the readlines() calls is repetitive; use a list to hold the list, and then loop over it.Calling
.strip()
on the same value in several places in a single line is hard to read, and risks that you'll forget it. Generally, sanitize inputs in one place, so you can control it carefully.I saw someone else say "
assert
is for catching bugs, not detecting errors", so I'll shamelessly pinch it, and apply it to your code. You could check lengths as you load the files, and say which one is failing (but I'm leaving it out because if the script crashes it's only going to leave an unfinished output.txt, which isn't so bad).You use the variable row to count through the columns, and column to count through the rows. That's confusing.
number_of_spaces
looks adjustable; put it right at the top of the file, where it's easy to see and get to. (And maybe give it a more descriptive name?).Counting through things with
row=0; while ...: row+=1
is not idiomatic Python; iterating directly over a sequence is neater, or iterating over arange()
orenumerate()
if you need a counter or index is also good.with-ass-cover(Using
with open()...
is probably a good habit to be in, so I don't dare recommend against it): but this is a half a page script with minimal error checking, "open file descriptors hanging around too long" isn't a massive concern, so I'm quietly removing it as clutter in this case.Using a fixed number_of_spaces means that if any of the lines go over the column width it will just push the layout out. It might be more resilient to set the width by the widest value, rather than the header.
These comments lead me to a slightly different version, all named in terms of rows and columns. :
files = ['a.txt', 'b.txt', 'c.txt', 'd.txt']
number_of_spaces = 5
columns = []
for f in files:
rows = open(f, 'r').readlines()
if columns and (len(rows) != len(columns[0])):
print "File {} has a mismatched number of lines".format(f)
else:
columns.append([row.strip() for row in rows])
output = open('output.txt', 'w')
headers = [column[0] for column in columns]
header_row = (' ' * number_of_spaces).join(headers)
output.write(header_row + '\n')
for row_number in range(1, len(columns[0])):
for column in columns:
output.write(column[row_number] + ' ' * (number_of_spaces + len(column[0]) - len(column[row_number])))
output.write('\n')
But this still feels like a heavy, dense lot of code to read four files into four fixed-width columns. Use string.ljust()
to pad the values to the right width.
@Caridorc's suggestion of zip()
is good; here I'm calling it with *args which effectively does the column-to-row conversion.
buffer_width = 5
columns = []
files = ['a.txt', 'b.txt', 'c.txt', 'd.txt']
for f in files:
lines = open(f, 'r').readlines()
header_width = len(lines[0].strip()) + buffer_width
columns.append( [L.strip().ljust(header_width) for L in lines] )
output = open('output.txt', 'w')
for row in zip(*columns):
output.write(''.join(row) + '\n')
And after writing and rewriting this far more times than you'd guess, and going back and forth on naming things lines or rows or values or columns, vacillating about how much to do with list comprehensions vs. how much with for loops, and taking entirely too long dismissing zip
, then coming back to it and realising I should have understood what Caridorc was saying before dismissing it ... after that, I quite like this version.
-
\$\begingroup\$ Nice answer. I use python3, not python2 as seen in the tags of the question. \$\endgroup\$Spikatrix– Spikatrix2015年06月27日 09:33:54 +00:00Commented Jun 27, 2015 at 9:33
-
\$\begingroup\$ Oops; I should pay more attention to Python3. \$\endgroup\$TessellatingHeckler– TessellatingHeckler2015年06月27日 15:49:26 +00:00Commented Jun 27, 2015 at 15:49
I think that zip
can save you some work:
with open('a.txt', 'r') as file_a, open('b.txt', 'r') as file_b, \
open('c.txt', 'r') as file_c, open('d.txt', 'r') as file_d:
for content in zip(file_a.readlines(), file_b.readlines(),
file_c.readlines(), file_d.readlines()):
print(content)
The above is just a basic example, you will want to:
- Add error checking (
assert
ing for example) - Add an output file
- Add necessary spaces