I am writing code that compares file1 (a single column of entries) with file2 (three columns of entries) and fetches the matching records from file2 based on the first column. The problem is that the inner loop over file2 is only evaluated once, for the first entry of file1.
File1:
ABC
DEF
JKL
File2:
IJK,123,SDF
ABC,456,HJK
QWE,876,GFT
JKL,098,HGF
.....
My code:
for entry in fh_file1:
    mir = entry.strip('\n')
    print(mir)
    for row in fh_file2:
        row_splt = row.split(',')
        print(row_splt[0])
        if mir in row_splt[0]:
            print(row.strip('\n'))
        else:
            pass
The result from that code is just the match for the first entry of file1:
ABC 456 HJK
Please help me with this.
2 Answers
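Files are streams of data. When you loop over a file object, you read it one line at a time, and each line you read advances the file's position. By the end of the inner loop, that file has reached its end. It will not start again at the beginning for the next iteration of the outer loop, because that's not how files work: on every later pass, the inner loop finds nothing left to read and its body never runs.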
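To see this in isolation, here is a small sketch that uses io.StringIO as a stand-in for an open file (an assumption made only so the example runs without files on disk; a file object returned by open() is exhausted in the same way). The second loop over the same object gets nothing.

```python
import io

# io.StringIO behaves like an open text file; it stands in for fh_file2 here.
fh_file2 = io.StringIO("IJK,123,SDF\nABC,456,HJK\nQWE,876,GFT\nJKL,098,HGF\n")

# The first pass reads every line and leaves the position at end-of-file.
first_pass = [line.strip('\n') for line in fh_file2]

# The second pass starts at end-of-file, so it yields no lines at all.
second_pass = [line.strip('\n') for line in fh_file2]

print(len(first_pass))   # 4
print(len(second_pass))  # 0
```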
You should usually read the file into memory first: list(fh_file1) will give you a list of lines that you can loop over as many times as you like.
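A minimal sketch of that idea, applied to fh_file2 because that is the file the inner loop re-reads. io.StringIO again stands in for the two open files (an assumption), and the comparison uses == for an exact first-column match instead of the substring test that in performs.

```python
import io

# Stand-ins for the question's open files (assumption: io.StringIO instead of open()).
fh_file1 = io.StringIO("ABC\nDEF\nJKL\n")
fh_file2 = io.StringIO("IJK,123,SDF\nABC,456,HJK\nQWE,876,GFT\nJKL,098,HGF\n")

# Materialize the inner file once; a list can be iterated any number of times.
file2_rows = list(fh_file2)

matches = []
for entry in fh_file1:
    mir = entry.strip('\n')
    for row in file2_rows:
        row_splt = row.strip('\n').split(',')
        if mir == row_splt[0]:  # exact match on the first column
            matches.append(row.strip('\n'))

for m in matches:
    print(m)
```

With the sample data from the question, this prints ABC,456,HJK and JKL,098,HGF. Memory use grows with the size of file2, which is the trade-off for not re-reading it.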
1 Comment
fh_file2 is the problem, not fh_file1, and file.readlines() is usually more efficient and clearer than list(file). Also, instead of reading everything into memory, you can rewind or reopen the file.

You need to add fh_file2.seek(0) before the second (inner) for loop so that it starts over at the beginning of the file on each iteration of the outer loop.
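Here is a sketch of the question's loop with only that one-line change added (the debugging prints are dropped and io.StringIO stands in for the open files, both assumptions for a self-contained example). The substring comparison from the question is kept unchanged.

```python
import io

# Stand-ins for the question's open files (assumption: io.StringIO instead of open()).
fh_file1 = io.StringIO("ABC\nDEF\nJKL\n")
fh_file2 = io.StringIO("IJK,123,SDF\nABC,456,HJK\nQWE,876,GFT\nJKL,098,HGF\n")

matches = []
for entry in fh_file1:
    mir = entry.strip('\n')
    fh_file2.seek(0)  # rewind so every outer iteration scans file2 from the start
    for row in fh_file2:
        row_splt = row.split(',')
        if mir in row_splt[0]:
            matches.append(row.strip('\n'))

print(matches)  # ['ABC,456,HJK', 'JKL,098,HGF']
```

This keeps memory use low, but file2 is still read once per line of file1.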
You'd be better served, however, by reading it into memory once:
file2_lines = fh_file2.readlines()
then iterating over file2_lines. Re-reading file2 from disk once for every line of file1 is going to be very slow.
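Going one step beyond that answer (this is a swapped-in technique, not what the answer proposes), you can read file2 once with readlines() and index it in a dict keyed by the first column, so each file1 entry is a single lookup instead of a scan over every file2 line. Note that dict membership is an exact match, not a substring test. io.StringIO stands in for the open files, and "last duplicate key wins" is an assumption.

```python
import io

# Stand-ins for the question's open files (assumption: io.StringIO instead of open()).
fh_file1 = io.StringIO("ABC\nDEF\nJKL\n")
fh_file2 = io.StringIO("IJK,123,SDF\nABC,456,HJK\nQWE,876,GFT\nJKL,098,HGF\n")

# Read file2 into memory once and index each row by its first column.
# If a key appears more than once, the last row wins (an assumption;
# collect rows in a list per key instead if duplicates matter).
rows_by_key = {}
for row in fh_file2.readlines():
    row = row.strip('\n')
    rows_by_key[row.split(',')[0]] = row

matches = []
for entry in fh_file1:
    mir = entry.strip('\n')
    if mir in rows_by_key:  # average O(1) lookup instead of scanning file2
        matches.append(rows_by_key[mir])

print(matches)  # ['ABC,456,HJK', 'JKL,098,HGF']
```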
1 Comment
Why if mir in row_splt[0]: and not if mir == row_splt[0]:? Using in here performs a substring test. Also, is string.split() set to split on commas?
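A short illustration of the difference the comment asks about; the value "AB" is a hypothetical file1 entry chosen to show a false positive. On two strings, in checks whether one occurs inside the other, while == requires the whole string to match. Separately, str.split(',') splits on commas, whereas str.split() with no argument splits on runs of whitespace.

```python
mir = "AB"            # hypothetical file1 entry
first_column = "ABC"  # first field of a file2 row

is_substring = mir in first_column  # True: "AB" occurs inside "ABC"
is_equal = mir == first_column      # False: the strings differ

by_comma = "ABC,456,HJK".split(',')    # splits on commas
by_whitespace = "ABC,456,HJK".split()  # no whitespace, so one element

print(is_substring, is_equal)  # True False
print(by_comma)                # ['ABC', '456', 'HJK']
print(by_whitespace)           # ['ABC,456,HJK']
```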