
I read about how Markov chains are handy for creating text generators and wanted to give it a try in Python.

I'm not sure if this is the proper way to make a Markov chain. I've left comments in the code. Any feedback would be appreciated.

import random

def Markov(text_file):
    with open(text_file) as f: # provide a text-file to parse
        data = f.read()
    data = [i for i in data.split(' ') if i != ''] # create a list of all words
    data = [i.lower() for i in data if i.isalpha()] # i've been removing punctuation
    markov = {i:[] for i in data} # i create a dict with the words as keys and empty lists as values
    pos = 0
    while pos < len(data) - 1: # add a word to the word-key's list if it immediately follows that word
        markov[data[pos]].append(data[pos+1])
        pos += 1
    new = {k:v for k,v in zip(range(len(markov)), [i for i in markov])} # create another dict for the seed to match up with
    length_sentence = random.randint(15, 20) # create a random length for a sentence stopping point
    start_index = random.randint(0, len(new) - 1) # randomly pick a starting point
    sentence_data = [new[start_index]] # use that word as the first word and starting point
    current_word = new[start_index]
    while len(sentence_data) < length_sentence:
        next_index = random.randint(0, len(markov[current_word]) - 1) # randomly pick a word from the last words list.
        next_word = markov[current_word][next_index]
        sentence_data.append(next_word)
        current_word = next_word
    return ' '.join([i for i in sentence_data])
Jamal
asked Mar 23, 2013 at 11:57

1 Answer

import random
def Markov(text_file):

Python convention is to name functions lowercase_with_underscores. I'd also probably have this function take a string as input rather than a filename. That way, this function doesn't make assumptions about where the data is coming from.
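A minimal sketch of that separation, using hypothetical names build_chain and build_chain_from_file that aren't in the original: the chain-building function only ever sees a string, and a thin wrapper is the one place that knows about files.

```python
def build_chain(text):
    """Build the word -> followers mapping from a string; no file I/O here."""
    words = [w.lower() for w in text.split(' ') if w.isalpha()]
    chain = {word: [] for word in words}
    for before, after in zip(words, words[1:]):
        chain[before].append(after)
    return chain


def build_chain_from_file(path):
    """Hypothetical thin wrapper: the caller decides where the text comes from."""
    with open(path) as f:
        return build_chain(f.read())


# A plain string works directly, so the core logic is easy to test without a file.
chain = build_chain("the cat saw the dog")
```

Because build_chain never opens anything itself, the same function works on text from a file, a network response, or a literal in a test.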

    with open(text_file) as f: # provide a text-file to parse
        data = f.read()

data is a bit too generic. I'd call it text.

    data = [i for i in data.split(' ') if i != ''] # create a list of all words
    data = [i.lower() for i in data if i.isalpha()] # i've been removing punctuation

Since ''.isalpha() is False, you could easily combine these two lines into one.
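A quick check of the combined filter on a made-up sample string with double spaces and a comma. Note that, like the original, it drops a token such as "cat," entirely rather than stripping the comma off it.

```python
text = "The  cat, the dog  and the bird"

# split(' ') yields '' for each extra space; ''.isalpha() is False,
# so this single filter drops both the empty strings and punctuated tokens.
words = [w.lower() for w in text.split(' ') if w.isalpha()]
```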

    markov = {i:[] for i in data} # i create a dict with the words as keys and empty lists as values
    pos = 0
    while pos < len(data) - 1: # add a word to the word-key's list if it immediately follows that word
        markov[data[pos]].append(data[pos+1])
        pos += 1

Whenever possible, avoid iterating over indexes. In this case I'd use

    for before, after in zip(data, data[1:]):
        markov[before].append(after)

I think that's much clearer.
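To make the zip version concrete, here is what it pairs up on a tiny made-up word list. The loop body uses append rather than +=: on a list, += with a string extends the list with the string's individual characters, so markov[before] += after would store 'c', 'a', 't' instead of 'cat'.

```python
words = ['the', 'cat', 'saw', 'the', 'dog']

# zip stops at the shorter input, so the last word never appears as `before`.
pairs = list(zip(words, words[1:]))

chain = {word: [] for word in words}
for before, after in pairs:
    # append adds the whole word as a single list element.
    chain[before].append(after)

# For comparison: += with a string adds it character by character.
broken = []
broken += 'cat'
```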

    new = {k:v for k,v in zip(range(len(markov)), [i for i in markov])} # create another dict for the seed to match up with

[i for i in markov] can be written list(markov); both build a new list of the dict's keys. But there is no reason to build that intermediate list here, so just pass markov directly.

zip(range(len(x)), x) can be written as enumerate(x)

{k:v for k,v in x} is the same as dict(x)

So that whole line can be written as

    new = dict(enumerate(markov))

But that's a strange construct to build. Since you are indexing with numbers, it'd make more sense to have a list. An equivalent list would be

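    new = list(markov)

Which gives you a list of the keys. (In Python 2, markov.keys() also returns a list, but in Python 3 it returns a view object that can't be indexed; list(markov) works in both.)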
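A small sketch checking those equivalences on a toy dict. It assumes Python 3.7 or later, where dicts preserve insertion order, so list(markov) lists the keys in the order they were added.

```python
markov = {'the': ['cat'], 'cat': ['saw'], 'saw': ['the']}

# The original construction.
original = {k: v for k, v in zip(range(len(markov)), [i for i in markov])}
# enumerate pairs each key with its position; dict() turns those pairs into a mapping.
simplified = dict(enumerate(markov))
# The lookups use keys 0..n-1, so a plain list of the keys serves the same purpose.
keys = list(markov)
```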

    length_sentence = random.randint(15, 20) # create a random length for a sentence stopping point
    start_index = random.randint(0, len(new) - 1) # randomly pick a starting point

Python has a function, random.randrange, such that random.randrange(x) is equivalent to random.randint(0, x - 1). It's good to use that when selecting from a range of indexes like this.
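A sketch of the difference in bounds, on a made-up word list: randint includes its upper bound, so it needs the - 1, while randrange excludes its stop value, which lines up exactly with the valid list indexes.

```python
import random

words = ['the', 'cat', 'saw', 'dog']

# randint(a, b) can return b itself, hence the "- 1".
with_randint = [random.randint(0, len(words) - 1) for _ in range(1000)]
# randrange(n) returns 0 <= i < n, exactly the valid indexes of `words`.
with_randrange = [random.randrange(len(words)) for _ in range(1000)]
```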

    sentence_data = [new[start_index]] # use that word as the first word and starting point
    current_word = new[start_index]

To select a random item from a list, use random.choice, so in this case I'd use

    current_word = random.choice(list(markov))
    sentence_data = [current_word]

    while len(sentence_data) < length_sentence:

Since you know how many iterations you'll need, I'd use a for loop here.
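Putting the for loop together with random.choice, here is a sketch on a hand-made toy chain. The chain is chosen so that every word has at least one follower; a real chain can contain dead ends, such as a last word of the text that appears nowhere else.

```python
import random

chain = {'the': ['cat', 'dog'], 'cat': ['saw'], 'saw': ['the'], 'dog': ['saw']}
length_sentence = random.randint(15, 20)

current_word = random.choice(list(chain))
sentence_data = [current_word]
# The first word is already in place, so length_sentence - 1 more are needed.
for _ in range(length_sentence - 1):
    current_word = random.choice(chain[current_word])
    sentence_data.append(current_word)
```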

        next_index = random.randint(0, len(markov[current_word]) - 1) # randomly pick a word from the last words list.
        next_word = markov[current_word][next_index]

Instead, use next_word = random.choice(markov[current_word])

        sentence_data.append(next_word)
        current_word = next_word
    return ' '.join([i for i in sentence_data])

Again, there's no reason for this i for i dance. Just use ' '.join(sentence_data).
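For reference, a sketch of the whole function with the suggestions above applied. The names markov_sentence and build_chain and the min_length/max_length parameters are my own, not from the question. It also handles one case the original doesn't: if the last word of the text appears nowhere else, its list of followers is empty, so random.choice would raise IndexError (and randint(0, -1) raises ValueError in the original). The sketch assumes that restarting from a random word is acceptable in that case.

```python
import random


def build_chain(text):
    """Map each lower-cased, purely alphabetic word to the words that follow it."""
    words = [w.lower() for w in text.split(' ') if w.isalpha()]
    chain = {word: [] for word in words}
    for before, after in zip(words, words[1:]):
        chain[before].append(after)
    return chain


def markov_sentence(text, min_length=15, max_length=20):
    """Generate a sentence of min_length..max_length words by walking the chain."""
    chain = build_chain(text)
    if not chain:
        return ''
    length_sentence = random.randint(min_length, max_length)
    current_word = random.choice(list(chain))
    sentence_data = [current_word]
    for _ in range(length_sentence - 1):
        followers = chain[current_word]
        if not followers:
            # Dead end: this word only appeared last in the text.
            # Assumption: restart from a random word rather than stopping early.
            followers = list(chain)
        current_word = random.choice(followers)
        sentence_data.append(current_word)
    return ' '.join(sentence_data)
```

Reading from a file then stays at the call site, e.g. with open(path) as f: markov_sentence(f.read()).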

answered Mar 23, 2013 at 17:08
  • thanks for taking the time to respond. Your markups will be very helpful. Commented Mar 23, 2013 at 18:02
  • It's a bit difficult to figure out which comment belongs to which code snippet (above or below?). Also sometimes I think you wanted to have two separate code snippets, but they were merged because there was no text in between. Commented Jun 2, 2015 at 11:59
