I am writing code that compares file 1 (a single column of entries) with file 2 (three columns of entries) and fetches the matching records from file 2 based on the first column. The problem is that the inner loop runs only once.

File1:

ABC
DEF
JKL

File2:

IJK,123,SDF
ABC,456,HJK
QWE,876,GFT
JKL,098,HGF

.....

My code:

for entry in fh_file1:
    mir = entry.strip('\n')
    print(mir)
    for row in fh_file2:
        row_splt = row.split(',')
        print(row_splt[0])
        if mir in row_splt[0]:
            print(row.strip('\n'))
        else:
            pass

The result from that code is just the match for the first entry of file 1:

ABC 456 HJK

Please help me on this.

asked Mar 22, 2012 at 19:30
  • Why if mir in row_splt[0]: not if mir == row_splt[0]:? Commented Mar 22, 2012 at 19:34
  • This sort of question is asked very often, but I am having trouble finding a good duplicate. Commented Mar 22, 2012 at 19:35
  • @tauran That's not even nearly the same. One checks for exact equality, the other is a substring/element (depending on the types, didn't look at the code too closely) search. Commented Mar 22, 2012 at 19:36
  • @delnan: I know. But I don't understand why he uses in here. Commented Mar 22, 2012 at 19:38
  • I don't see any commas in your data. Why is your string.split() set to split on commas? Commented Mar 22, 2012 at 21:00

2 Answers

4

Files are streams of data. When you loop over one, you read it a line at a time. By the end of the first pass through the inner loop, that file has reached its end. It will not start again at the beginning on the next iteration of the outer loop, because a file object keeps its read position until you move it or reopen the file.

You should usually read the file into memory first: list(fh_file1) will give you a list of lines that you can loop over as many times as you like.
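
As a rough illustration of that behaviour, here is a minimal sketch that uses io.StringIO as a stand-in for an open file. The sample rows are taken from the question; the names data, stream, first_pass, second_pass, rows and loops are made up for this example.

```python
import io

# Sample rows from the question; io.StringIO behaves like an open text file.
data = "IJK,123,SDF\nABC,456,HJK\nQWE,876,GFT\nJKL,098,HGF\n"

stream = io.StringIO(data)
first_pass = [row for row in stream]   # reads every line, leaving the stream at its end
second_pass = [row for row in stream]  # nothing left to read

print(len(first_pass))   # 4
print(len(second_pass))  # 0

# A list of lines, by contrast, can be looped over as many times as needed.
rows = list(io.StringIO(data))
loops = [len([r for r in rows]) for _ in range(3)]
print(loops)  # [4, 4, 4]
```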

answered Mar 22, 2012 at 19:34

1 Comment

fh_file2 is the problem, not fh_file1, and file.readlines() is usually more efficient and clearer than list(file). Also, instead of reading everything into memory you can rewind or reopen the file.
3

You need to add fh_file2.seek(0) before the second for loop to start over at the beginning of the file.
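
A minimal sketch of where the seek(0) call would go, keeping the loop from the question otherwise unchanged. The two files are simulated here with io.StringIO so the snippet runs on its own, and the matches list is only added for this example so the result can be inspected.

```python
import io

# Stand-ins for the two open files, using the sample data from the question.
fh_file1 = io.StringIO("ABC\nDEF\nJKL\n")
fh_file2 = io.StringIO("IJK,123,SDF\nABC,456,HJK\nQWE,876,GFT\nJKL,098,HGF\n")

matches = []  # collected only so the result can be checked in this sketch
for entry in fh_file1:
    mir = entry.strip('\n')
    fh_file2.seek(0)  # rewind file 2 before each pass of the inner loop
    for row in fh_file2:
        row_splt = row.split(',')
        if mir in row_splt[0]:
            matches.append(row.strip('\n'))

print(matches)  # ['ABC,456,HJK', 'JKL,098,HGF']
```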

You'd be better served, however, by reading it into memory once:

file2_lines = fh_file2.readlines()

then iterating over file2_lines. Reading the file from disk for each line in another file is going to be very slow.
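
One way this could look, again with io.StringIO standing in for the real files. Putting the file 1 entries in a set (the name wanted is made up here) and comparing the first column for exact equality rather than with in are both assumptions added for this sketch; they are not part of the original suggestion.

```python
import io

# Stand-ins for the two open files, using the sample data from the question.
fh_file1 = io.StringIO("ABC\nDEF\nJKL\n")
fh_file2 = io.StringIO("IJK,123,SDF\nABC,456,HJK\nQWE,876,GFT\nJKL,098,HGF\n")

# Read file 2 from disk once and keep its lines in memory.
file2_lines = fh_file2.readlines()

# Assumption: exact matching on the first column, with the keys held in a set.
wanted = {entry.strip('\n') for entry in fh_file1}

matches = [row.strip('\n') for row in file2_lines
           if row.split(',')[0] in wanted]

print(matches)  # ['ABC,456,HJK', 'JKL,098,HGF']
```

With this layout each file is read exactly once, and the results come out in file 2 order rather than file 1 order.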

answered Mar 22, 2012 at 19:43

1 Comment

Thanks. The 'seek' suggestion did the job perfectly. I tried to find a solution online but couldn't find it anywhere. Many Thanks.
