0

The task is to write the unique_file function, which takes an input filename and an output filename as parameters. The function should read the contents of the input file and build a list of unique words, meaning no word may be written to the output file more than once. The code I used is:

def unique_file(input_filename, output_filename):
    file = open(input_filename,"r")
    contents = file.read()
    word_list = contents.split()
    output_file = open(output_filename,'w+')
    for word in word_list:
        if word not in output_file:
            output_file.write(word + '\n')
    file.close()
    output_file.close()
    print('Done')

But this function just copies everything from the input file to the output file. So I get words like 'and' 'I' that occur more than once in the output file.

Please help.

Artsiom Rudzenka
asked Apr 15, 2014 at 10:41
  • Not 100% clear what you are asking: Do you want each word in the output file, but just once, or just the words that occurred just once? Commented Apr 15, 2014 at 10:45
  • wordlist = set(contents.split()) -- convert it to a set and you'll only have unique entries Commented Apr 15, 2014 at 10:46

2 Answers

1

You can't really check if word not in output_file: like that. I would suggest you use a set to get unique words:

def unique_file(input_filename, output_filename):
    with open(input_filename) as file:
        contents = file.read()
        word_set = set(contents.split())
    with open(output_filename, "w+") as output_file:
        for word in word_set:
            output_file.write(word + '\n')
    print("Done")

Note the use of with to handle files - see the last paragraph of the docs.
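For a quick check of the set-based approach, here is a minimal sketch that runs it against a temporary file. The sample sentence and the demo helper are made up for illustration; the result is sorted only because a set has no guaranteed iteration order.

```python
import os
import tempfile


def unique_file(input_filename, output_filename):
    # Same set-based idea as in the answer: split on whitespace,
    # keep one copy of each word.
    with open(input_filename) as infile:
        words = set(infile.read().split())
    with open(output_filename, "w") as outfile:
        for word in words:
            outfile.write(word + "\n")


def demo():
    # Hypothetical sample text; any whitespace-separated input would do.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.txt")
        dst = os.path.join(tmp, "out.txt")
        with open(src, "w") as f:
            f.write("I came and I saw and I left\n")
        unique_file(src, dst)
        with open(dst) as f:
            # Sort because the set's iteration order is arbitrary.
            return sorted(f.read().split())


print(demo())  # ['I', 'and', 'came', 'left', 'saw']
```

Note that 'I' and 'i' would count as different words here, since nothing normalizes case or strips punctuation.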

answered Apr 15, 2014 at 10:46

1

That's because you cannot ask whether a file contains a word like that. You'll have to keep track of the words you have already written. EDIT: You should actually make seen a set() rather than a list, since membership checks on a set are cheaper than on a list.

def unique_file(input_filename, output_filename):
    file = open(input_filename,"r")
    contents = file.read()
    word_list = contents.split()
    output_file = open(output_filename,'w+')
    seen = set()
    for word in word_list:
        if word not in seen:
            output_file.write(word + '\n')
            seen.add(word)
    file.close()
    output_file.close()
    print('Done')

If you don't need to worry about the order of the words you can just use the builtin set() which is a container that does not allow duplicates. Something like this should work:

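def unique_file(input_filename, output_filename):
    with open(input_filename, "r") as inp, open(output_filename, "w") as out:
        # Split into words first; set(inp.readlines()) would only deduplicate whole lines.
        out.writelines(word + "\n" for word in set(inp.read().split()))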
answered Apr 15, 2014 at 10:47

1 Comment

Even if the OP does want ordered output, it would be more efficient to make seen a set.
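If ordered output is wanted, one compact alternative to the explicit seen set is dict.fromkeys, which keeps the first occurrence of each key. This is a sketch, not something from the answers above: the function names are hypothetical, and it assumes Python 3.7 or later, where dicts preserve insertion order.

```python
def unique_words_in_order(text):
    # dict.fromkeys keeps only the first occurrence of each word, and
    # dicts preserve insertion order (guaranteed since Python 3.7).
    return list(dict.fromkeys(text.split()))


def unique_file_ordered(input_filename, output_filename):
    # Hypothetical variant of unique_file that keeps first-seen order.
    with open(input_filename) as inp, open(output_filename, "w") as out:
        out.writelines(word + "\n" for word in unique_words_in_order(inp.read()))


print(unique_words_in_order("I came and I saw and I left"))
# ['I', 'came', 'and', 'saw', 'left']
```

Like a set, the dict gives constant-time lookups on average, so this stays efficient on large inputs.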
