Printing random lines from a file in Python

Question 1

Is this the most efficient solution or can the performance be improved? Just opens a file and prints random words (lines with newlines stripped out).

import random
filename = "20k.txt"
def main(filename):
 words_file = open(filename, 'r')
 words = words_file.read().split("\n")
 temp_words = []
 for word in words:
 temp_words.append(word.strip("\r"))
 words = temp_words
 while True:
 length = int(raw_input("Words:"))
 for i in range(0,length):
 print(words[random.randrange(0,len(words)+1)]),
 print("\n")
if __name__ == "__main__":
 main(filename)

Question 2

First, you should always close your files. Or, use the with keyword, which takes care of this for you.

Then, the 'r' mode is already the default mode for opening a file. Also, consider storing your words in a set, instead of a list (gets rid of duplicate words, which might be wanted, or not if you want a higher probability for some words).

You should consider using random.choice to get a random element from an iterable (actually an iterable with len defined).

I would also separate the file reading into a separate method. and make it more explicit what the prompt actually means (the number of words to generate).

I would use str.join and give it a generator expression rather than multiple calls to print.

import random
def get_words(file_name):
 with open(filename) as word_file:
 return list(set(word.strip() for word in word_file))
def random_words(n, words)
 return " ".join(random.choice(words) for _ in range(n))
def main(file_name):
 words = get_words(file_name)
 while True:
 n = int(raw_input("Number of words:"))
 print(random_words(n, words))
if __name__ == "__main__":
 file_name = "20k.txt"
 main(file_name)

Question 3

Right, I forgot about the close. Using a set is a better option, yes. Didn't know about random.choice. Thanks a lot.

Question 4

Your code ran: TypeError: 'set' object does not support indexing

Question 5

What does .strip() do without any arguments?

Question 6

@SamuelShifterovich It strips away whitespace from the edges, including all kinds of line-breaks. So " asgsg \n\r\n".strip() == 'asgsg'.

Question 7

Your answer contains words_file = open(filename) and with open(filename) as word_file:. I guess the first line is a mistake?

Question 8

You would greatly improve performance if you stop reading the whole 20k lines file into memory. The difference for 100 executions (3 words per execution) is 0.018 vs 0.793 seconds.

I suggest you to use word_file.seek method instead. It manipulates stream position to read just the word you need, not the whole file.

To get random position, use random.randint(0, EOF).

Details are here: https://github.com/artgromov/CorrectHorse.

I have made pull request to your repo.

Question 9

I should've specified it's Python 2.7

Graipher Graipher 41.6k7 gold badges70 silver badges134 bronze badges · Accepted Answer · 2016-10-04 16:12:29Z

First, you should always close your files. Or, use the with keyword, which takes care of this for you.

Then, the 'r' mode is already the default mode for opening a file. Also, consider storing your words in a set, instead of a list (gets rid of duplicate words, which might be wanted, or not if you want a higher probability for some words).

You should consider using random.choice to get a random element from an iterable (actually an iterable with len defined).

I would also separate the file reading into a separate method. and make it more explicit what the prompt actually means (the number of words to generate).

I would use str.join and give it a generator expression rather than multiple calls to print.

import random
def get_words(file_name):
 with open(filename) as word_file:
 return list(set(word.strip() for word in word_file))
def random_words(n, words)
 return " ".join(random.choice(words) for _ in range(n))
def main(file_name):
 words = get_words(file_name)
 while True:
 n = int(raw_input("Number of words:"))
 print(random_words(n, words))
if __name__ == "__main__":
 file_name = "20k.txt"
 main(file_name)

Right, I forgot about the close. Using a set is a better option, yes. Didn't know about random.choice. Thanks a lot.
Your code ran: TypeError: 'set' object does not support indexing
@SamuelShifterovich It strips away whitespace from the edges, including all kinds of line-breaks. So " asgsg \n\r\n".strip() == 'asgsg'.
Your answer contains words_file = open(filename) and with open(filename) as word_file:. I guess the first line is a mistake?

Stack Exchange Network

Printing random lines from a file in Python

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Printing random lines from a file in Python

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions