Can't get this function to work in Python

Question 1

The task is to write the unique_file function, which takes an input filename and an output filename as parameters. The function should read the contents of the input file and create a list of unique words; that is, no word may be written to the output file more than once. The code I used is:

def unique_file(input_filename, output_filename):
    file = open(input_filename,"r")
    contents = file.read()
    word_list = contents.split()
    output_file = open(output_filename,'w+')
    for word in word_list:
        if word not in output_file:
            output_file.write(word + '\n')
    file.close()
    output_file.close()
    print('Done')

However, this function just copies every word from the input file to the output file, so words like 'and' and 'I' appear more than once in the output file.

Please help.
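A small sketch of why the check in the question never skips a word. It uses io.StringIO as a stand-in for a real text file (an assumption made so the example runs without touching the disk); a real file opened with 'w+' behaves analogously. Testing membership with in on a file object iterates over its lines from the current position, and each line still ends in '\n':

```python
import io

# A file-like object positioned at the start, ready to be read
f = io.StringIO("and\nI\n")
found_bare = "and" in f        # lines are 'and\n' and 'I\n', so no exact match
print(found_bare)              # False

f.seek(0)                      # rewind, because the previous test consumed the lines
found_newline = "and\n" in f
print(found_newline)           # True

# After writing, the position is at the end, so there is nothing left to iterate
g = io.StringIO()
g.write("and\n")
found_after_write = "and\n" in g
print(found_after_write)       # False
```

In the question's loop the test therefore always succeeds, and every word gets written.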

Question 2

It's not 100% clear what you are asking: do you want every word in the output file exactly once, or only the words that occur exactly once in the input?
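The two readings give different results. A minimal sketch, assuming the words have already been split into a list; each_word_once and words_seen_once are hypothetical helper names, and collections.Counter is used only to illustrate the second reading:

```python
from collections import Counter

def each_word_once(words):
    """Every distinct word, in order of first appearance."""
    seen = set()
    result = []
    for word in words:
        if word not in seen:
            seen.add(word)
            result.append(word)
    return result

def words_seen_once(words):
    """Only the words that occur exactly once in the input."""
    counts = Counter(words)
    return [word for word in words if counts[word] == 1]

words = "I like tea and I like cake and".split()
print(each_word_once(words))   # ['I', 'like', 'tea', 'and', 'cake']
print(words_seen_once(words))  # ['tea', 'cake']
```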

Question 3

Convert it to a set and you'll only have unique entries: wordlist = set(contents.split())
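A quick illustration, assuming a short sample string in place of the file contents. A set has no defined order, so the result is sorted only to make the printout predictable:

```python
contents = "the cat and the dog and the bird"
wordlist = set(contents.split())
print(len(contents.split()))  # 8 words in total
print(sorted(wordlist))       # ['and', 'bird', 'cat', 'dog', 'the']
```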

jonrsharpe · Answer 1 · 2014-04-15 10:46:36Z

You can't really check 'if word not in output_file:' like that, because the file object doesn't keep track of the words you have already written. I would suggest using a set to get unique words:

def unique_file(input_filename, output_filename):
    with open(input_filename) as file:
        contents = file.read()
        word_set = set(contents.split())
    with open(output_filename, "w+") as output_file:
        for word in word_set:
            output_file.write(word + '\n')
    print("Done")

Note the use of with to handle files, which closes them automatically when the block ends; see the last paragraph of the docs.
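A short sketch of what with does for files, assuming a throwaway file in a temporary directory (the path is hypothetical, created only for the demonstration). It shows that the file is closed as soon as the block exits, and the roughly equivalent try/finally form:

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as f:
    f.write("hello\n")
    inside = f.closed       # False: still inside the with block

after = f.closed            # True: closed automatically when the block exits
print(inside, after)        # False True

# The with statement behaves roughly like this try/finally form
g = open(path)
try:
    data = g.read()
finally:
    g.close()
print(repr(data))           # 'hello\n'
```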

msvalkon · Answer 2 · 2014-04-15 10:47:29Z

That's because you cannot ask whether a file contains a word like that. You'll have to keep a record of the words you're adding. EDIT: You should actually make seen a set() instead of a list; membership checking is cheaper for a set (average constant time) than for a list (a linear scan).

def unique_file(input_filename, output_filename):
    file = open(input_filename,"r")
    contents = file.read()
    word_list = contents.split()
    output_file = open(output_filename,'w+')
    seen = set()
    for word in word_list:
        if word not in seen:
            output_file.write(word + '\n')
            seen.add(word)
    file.close()
    output_file.close()
    print('Done')
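For comparison, a compact order-preserving variant, assuming Python 3.7+ (where dicts keep insertion order). dict.fromkeys is a swapped-in technique, not what this answer uses; the scratch file names are hypothetical:

```python
import os
import tempfile

def unique_file(input_filename, output_filename):
    with open(input_filename) as inp:
        words = inp.read().split()
    # dict keys are unique and remember insertion order (Python 3.7+),
    # so this keeps the first occurrence of each word, in order
    unique_words = dict.fromkeys(words)
    with open(output_filename, "w") as out:
        for word in unique_words:
            out.write(word + "\n")

# Demo with hypothetical scratch files
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "in.txt")
dst = os.path.join(tmp, "out.txt")
with open(src, "w") as f:
    f.write("and I went and I saw\n")
unique_file(src, dst)
with open(dst) as f:
    result = f.read().split()
print(result)  # ['and', 'I', 'went', 'saw']
```

This does the same job as the seen set; it is just shorter.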

If you don't need to preserve the order of the words, you can just use the built-in set(), a container that does not allow duplicates. Something like this should work:

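def unique_file(input_filename, output_filename):
    with open(input_filename, "r") as inp, open(output_filename, "w") as out:
        # split into words, not lines, so each word is written only once
        unique_words = set(inp.read().split())
        out.writelines(word + '\n' for word in unique_words)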
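Because a set has no defined order (and string hashing is randomized between interpreter runs), the output of a set-based version can differ from run to run. If a reproducible order is acceptable (alphabetical, not the original order), sorting the set is one option. A sketch with a hypothetical function name, unique_file_sorted, and hypothetical scratch files:

```python
import os
import tempfile

def unique_file_sorted(input_filename, output_filename):
    """Hypothetical variant: unique words, written in alphabetical order."""
    with open(input_filename) as inp:
        words = set(inp.read().split())
    with open(output_filename, "w") as out:
        # sorted() turns the unordered set into a list, so every run
        # produces the same file for the same input
        out.writelines(word + "\n" for word in sorted(words))

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "in.txt")
dst = os.path.join(tmp, "out.txt")
with open(src, "w") as f:
    f.write("the cat and the dog\nand the bird\n")
unique_file_sorted(src, dst)
with open(dst) as f:
    result = f.read().split()
print(result)  # ['and', 'bird', 'cat', 'dog', 'the']
```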

Even if the OP does want ordered output, it would be more efficient to make seen a set.
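A rough way to check the efficiency claim, assuming a synthetic collection of 10,000 distinct strings. Absolute timings depend on the machine, so no expected numbers are given; the set lookup should come out far faster than the list scan:

```python
import timeit

words = [str(i) for i in range(10_000)]
as_list = list(words)
as_set = set(words)

# Look up the last element, the worst case for the list's linear scan
t_list = timeit.timeit(lambda: "9999" in as_list, number=1_000)
t_set = timeit.timeit(lambda: "9999" in as_set, number=1_000)
print(f"list: {t_list:.4f}s  set: {t_set:.6f}s")
```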
