Program that reads 4 files and outputs contents of it into another file after formatting it

Question 1

The below Python 3 program scans 4 files a.txt, b.txt, c.txt, d.txt and then outputs the read data into a file output.txt in a formatted way. The first line of each file is guaranteed to have a header and the second line of each file will be blank. I'm required to scan those four files.

Program:

def main():
 with open('a.txt', 'r') as file_a, open('b.txt', 'r') as file_b, open('c.txt', 'r') as file_c, open('d.txt', 'r') as file_d:
 lines1, lines2, lines3, lines4 = file_a.readlines(), file_b.readlines(), file_c.readlines(), file_d.readlines()
 lines = [lines1, lines2, lines3, lines4]
 number_of_spaces = 5
 assert len(lines1) == len(lines2) == len(lines3) == len(lines4), "Error. Different number of lines found in the files"
 row, column = 0, 1
 with open('output.txt', 'w') as output:
 while row < len(lines):
 output.write(lines[row][0].strip() + ' ' * number_of_spaces)
 row += 1
 output.write('\n')
 row = 0
 while column < len(lines1):
 while row < len(lines):
 output.write(lines[row][column].strip() + ' ' * (number_of_spaces + len(lines[row][0].strip()) - len(lines[row][column].strip())))
 row += 1
 output.write('\n') 
 column += 1
 row = 0 
if __name__ == "__main__":
 main()

When executed gives output.txt:

Sl.No Elements Abbreviation Mass 
1 Hydrogen H 1 
2 Helium He 4 
3 Lithium Li 7 
4 Beryllium Be 9 
...
98 Californium Cf 251 
99 Einsteinium Es 252 
100 Fermium Fm 257

Is there any room for improvement?

Additional information (if necessary):

Note that ... in the files mean that there are a lot of similar data and not that the file contains those dots.

a.txt:

Sl.No
1 
2 
3 
...
99
100

b.txt:

Elements
Hydrogen 
Helium
Lithium
Beryllium
...
Californium
Einsteinium
Fermium

c.txt:

Abbreviation
H
He
Li
Be
...
Cf
Es
Fm

d.txt:

Mass
1
4
7
9
...
251
252
257

Question 2

Is there any benefit in having all that inside a function, if you only have one function and all you do is call it once?
Having variables numbered, such as line1, line2, line3, line4 and all the readlines() calls is repetitive; use a list to hold the list, and then loop over it.
Calling .strip() on the same value in several places in a single line is hard to read, and risks that you'll forget it. Generally, sanitize inputs in one place, so you can control it carefully.
I saw someone else say "assert is for catching bugs, not detecting errors", so I'll shamelessly pinch it, and apply it to your code. You could check lengths as you load the files, and say which one is failing (but I'm leaving it out because if the script crashes it's only going to leave an unfinished output.txt, which isn't so bad).
You use the variable row to count through the columns, and column to count through the rows. That's confusing.
number_of_spaces looks adjustable; put it right at the top of the file, where it's easy to see and get to. (And maybe give it a more descriptive name?).
Counting through things with row=0; while ...: row+=1 is not idiomatic Python; iterating directly over a sequence is neater, or iterating over a range() or enumerate() if you need a counter or index is also good.
with-ass-cover(Using with open()... is probably a good habit to be in, so I don't dare recommend against it): but this is a half a page script with minimal error checking, "open file descriptors hanging around too long" isn't a massive concern, so I'm quietly removing it as clutter in this case.
Using a fixed number_of_spaces means that if any of the lines go over the column width it will just push the layout out. It might be more resilient to set the width by the widest value, rather than the header.

These comments lead me to a slightly different version, all named in terms of rows and columns. :

files = ['a.txt', 'b.txt', 'c.txt', 'd.txt']
number_of_spaces = 5
columns = []
for f in files:
 rows = open(f, 'r').readlines()
 if columns and (len(rows) != len(columns[0])):
 print "File {} has a mismatched number of lines".format(f)
 else:
 columns.append([row.strip() for row in rows])
output = open('output.txt', 'w')
headers = [column[0] for column in columns]
header_row = (' ' * number_of_spaces).join(headers)
output.write(header_row + '\n')
for row_number in range(1, len(columns[0])):
 for column in columns:
 output.write(column[row_number] + ' ' * (number_of_spaces + len(column[0]) - len(column[row_number])))
 output.write('\n')

But this still feels like a heavy, dense lot of code to read four files into four fixed-width columns. Use string.ljust() to pad the values to the right width.

@Caridorc's suggestion of zip() is good; here I'm calling it with *args which effectively does the column-to-row conversion.

buffer_width = 5 
columns = []
files = ['a.txt', 'b.txt', 'c.txt', 'd.txt']
for f in files:
 lines = open(f, 'r').readlines()
 header_width = len(lines[0].strip()) + buffer_width 
 columns.append( [L.strip().ljust(header_width) for L in lines] )
output = open('output.txt', 'w')
for row in zip(*columns):
 output.write(''.join(row) + '\n')

And after writing and rewriting this far more times than you'd guess, and going back and forth on naming things lines or rows or values or columns, vacillating about how much to do with list comprehensions vs. how much with for loops, and taking entirely too long dismissing zip, then coming back to it and realising I should have understood what Caridorc was saying before dismissing it ... after that, I quite like this version.

Question 3

Nice answer. I use python3, not python2 as seen in the tags of the question.

Question 4

Oops; I should pay more attention to Python3.

Question 5

I think that zip can save you some work:

with open('a.txt', 'r') as file_a, open('b.txt', 'r') as file_b, \
 open('c.txt', 'r') as file_c, open('d.txt', 'r') as file_d:
 for content in zip(file_a.readlines(), file_b.readlines(),
 file_c.readlines(), file_d.readlines()):
 print(content)

The above is just a basic example, you will want to:

Add error checking (asserting for example)
Add an output file
Add necessary spaces

TessellatingHeckler TessellatingHeckler 6714 silver badges9 bronze badges · Accepted Answer · 2015-06-27 00:37:09Z

Is there any benefit in having all that inside a function, if you only have one function and all you do is call it once?
Having variables numbered, such as line1, line2, line3, line4 and all the readlines() calls is repetitive; use a list to hold the list, and then loop over it.
Calling .strip() on the same value in several places in a single line is hard to read, and risks that you'll forget it. Generally, sanitize inputs in one place, so you can control it carefully.
I saw someone else say "assert is for catching bugs, not detecting errors", so I'll shamelessly pinch it, and apply it to your code. You could check lengths as you load the files, and say which one is failing (but I'm leaving it out because if the script crashes it's only going to leave an unfinished output.txt, which isn't so bad).
You use the variable row to count through the columns, and column to count through the rows. That's confusing.
number_of_spaces looks adjustable; put it right at the top of the file, where it's easy to see and get to. (And maybe give it a more descriptive name?).
Counting through things with row=0; while ...: row+=1 is not idiomatic Python; iterating directly over a sequence is neater, or iterating over a range() or enumerate() if you need a counter or index is also good.
with-ass-cover(Using with open()... is probably a good habit to be in, so I don't dare recommend against it): but this is a half a page script with minimal error checking, "open file descriptors hanging around too long" isn't a massive concern, so I'm quietly removing it as clutter in this case.
Using a fixed number_of_spaces means that if any of the lines go over the column width it will just push the layout out. It might be more resilient to set the width by the widest value, rather than the header.

These comments lead me to a slightly different version, all named in terms of rows and columns. :

files = ['a.txt', 'b.txt', 'c.txt', 'd.txt']
number_of_spaces = 5
columns = []
for f in files:
 rows = open(f, 'r').readlines()
 if columns and (len(rows) != len(columns[0])):
 print "File {} has a mismatched number of lines".format(f)
 else:
 columns.append([row.strip() for row in rows])
output = open('output.txt', 'w')
headers = [column[0] for column in columns]
header_row = (' ' * number_of_spaces).join(headers)
output.write(header_row + '\n')
for row_number in range(1, len(columns[0])):
 for column in columns:
 output.write(column[row_number] + ' ' * (number_of_spaces + len(column[0]) - len(column[row_number])))
 output.write('\n')

But this still feels like a heavy, dense lot of code to read four files into four fixed-width columns. Use string.ljust() to pad the values to the right width.

@Caridorc's suggestion of zip() is good; here I'm calling it with *args which effectively does the column-to-row conversion.

buffer_width = 5 
columns = []
files = ['a.txt', 'b.txt', 'c.txt', 'd.txt']
for f in files:
 lines = open(f, 'r').readlines()
 header_width = len(lines[0].strip()) + buffer_width 
 columns.append( [L.strip().ljust(header_width) for L in lines] )
output = open('output.txt', 'w')
for row in zip(*columns):
 output.write(''.join(row) + '\n')

And after writing and rewriting this far more times than you'd guess, and going back and forth on naming things lines or rows or values or columns, vacillating about how much to do with list comprehensions vs. how much with for loops, and taking entirely too long dismissing zip, then coming back to it and realising I should have understood what Caridorc was saying before dismissing it ... after that, I quite like this version.

Nice answer. I use python3, not python2 as seen in the tags of the question.

Stack Exchange Network

Program that reads 4 files and outputs contents of it into another file after formatting it

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Program that reads 4 files and outputs contents of it into another file after formatting it

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions