Implementation of a Markov Chain

Question 1

I read about how Markov chains are handy for building text generators and wanted to give it a try in Python.

I'm not sure if this is the proper way to make a Markov chain. I've left comments in the code. Any feedback would be appreciated.

import random
def Markov(text_file):
    with open(text_file) as f: # provide a text-file to parse
        data = f.read()
    data = [i for i in data.split(' ') if i != ''] # create a list of all words
    data = [i.lower() for i in data if i.isalpha()] # i've been removing punctuation
    markov = {i:[] for i in data} # i create a dict with the words as keys and empty lists as values
    pos = 0
    while pos < len(data) - 1: # add a word to the word-key's list if it immediately follows that word
        markov[data[pos]].append(data[pos+1])
        pos += 1
    new = {k:v for k,v in zip(range(len(markov)), [i for i in markov])} # create another dict for the seed to match up with
    length_sentence = random.randint(15, 20) # create a random length for a sentence stopping point
    seed = random.randint(0, len(new) - 1) # randomly pick a starting point
    sentence_data = [new[seed]] # use that word as the first word and starting point
    current_word = new[seed]
    while len(sentence_data) < length_sentence:
        next_index = random.randint(0, len(markov[current_word]) - 1) # randomly pick a word from the last words list.
        next_word = markov[current_word][next_index]
        sentence_data.append(next_word)
        current_word = next_word
    return ' '.join([i for i in sentence_data])

Answer

import random
def Markov(text_file):

Python convention is to name functions lowercase_with_underscores. I'd also probably have this function take a string as input rather than a filename. That way this function doesn't make assumptions about where the data is coming from.

    with open(text_file) as f: # provide a text-file to parse
        data = f.read()

data is a bit too generic. I'd call it text.

    data = [i for i in data.split(' ') if i != ''] # create a list of all words
    data = [i.lower() for i in data if i.isalpha()] # i've been removing punctuation

Since ''.isalpha() is False, the empty-string check is redundant, and you could easily combine these two lines into one.
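
A minimal sketch of the combined line, run on a made-up sample string (the name `words` and the sample text are illustrative and not from the original post):

```python
# Hypothetical sample text; the double space yields an empty string on split.
text = "The quick  brown fox, the lazy dog"

# ''.isalpha() is False, so empty strings are dropped by the same test
# that drops tokens containing punctuation (such as "fox,").
words = [w.lower() for w in text.split(' ') if w.isalpha()]
print(words)
```

Note that a token like "fox," is discarded entirely rather than stripped of its comma, which matches the behavior of the original two lines.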

    markov = {i:[] for i in data} # i create a dict with the words as keys and empty lists as values
    pos = 0
    while pos < len(data) - 1: # add a word to the word-key's list if it immediately follows that word
        markov[data[pos]].append(data[pos+1])
        pos += 1

Whenever possible, avoid iterating over indexes. In this case I'd use

    for before, after in zip(data, data[1:]):
        markov[before].append(after)

I think that's much clearer.
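
As a rough illustration of the pairwise idiom, here is a small self-contained sketch on a hypothetical word list (the sample words are an assumption for demonstration only):

```python
words = ["the", "cat", "sat", "on", "the", "mat"]

markov = {w: [] for w in words}
# zip(words, words[1:]) yields each word together with its successor.
for before, after in zip(words, words[1:]):
    # append adds the whole word as one item; `+=` with a str would
    # extend the list with that word's individual characters instead.
    markov[before].append(after)

print(markov)
```

The last word ("mat" here) ends up with an empty follower list when it appears nowhere else in the text, which is worth keeping in mind when picking the next word later.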

    new = {k:v for k,v in zip(range(len(markov)), [i for i in markov])} # create another dict for the seed to match up with

[i for i in markov] can be written list(markov); either way it produces a copy of markov's keys as a list. But there is no reason to make a copy here, so just pass markov directly.

zip(range(len(x)), x) can be written as enumerate(x)

{k:v for k,v in x} is the same as dict(x)

So that whole line can be written as

    new = dict(enumerate(markov))

But that's a strange construct to build. Since you are indexing with numbers, it'd make more sense to have a list. An equivalent list would be

    new = list(markov)

Which gives you a list of the keys. (In Python 3, markov.keys() returns a view that cannot be indexed, so list(markov) is the portable spelling.)
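
To make the equivalence concrete, here is a short sketch with a hypothetical three-word chain (the names `index_to_word` and `word_list` are illustrative):

```python
markov = {"the": ["cat", "mat"], "cat": ["sat"], "sat": ["on"]}

index_to_word = dict(enumerate(markov))  # integer index -> key
word_list = list(markov)                 # keys in the same iteration order

# Both containers return the same key for every valid index.
for i in range(len(markov)):
    assert index_to_word[i] == word_list[i]
```

Both are built from the same iteration over the dict, so they always agree with each other; the list is simply the lighter structure for integer indexing.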

    length_sentence = random.randint(15, 20) # create a random length for a sentence stopping point
    seed = random.randint(0, len(new) - 1) # randomly pick a starting point

Python has a function random.randrange such that random.randrange(x) is equivalent to random.randint(0, x - 1). It's good to use that when selecting from a range of indexes like this.
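
A quick sketch of the two calls side by side, using a hypothetical list named `items`:

```python
import random

items = ["a", "b", "c", "d"]

for _ in range(1000):
    i = random.randrange(len(items))       # half-open: 0 <= i < len(items)
    j = random.randint(0, len(items) - 1)  # closed: needs the manual "- 1"
    # Either result is always a valid index into items.
    assert 0 <= i < len(items)
    assert 0 <= j < len(items)
```

randrange follows the same half-open convention as range, so there is no "- 1" to forget.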

    sentence_data = [new[seed]] # use that word as the first word and starting point
    current_word = new[seed]

To select a random item from a list, use random.choice, so in this case I'd use

    current_word = random.choice(list(markov))
    while len(sentence_data) < length_sentence:

Since you know how many iterations you'll need, I'd use a for loop here.

        next_index = random.randint(0, len(markov[current_word]) - 1) # randomly pick a word from the last words list.
        next_word = markov[current_word][next_index]

Instead do next_word = random.choice(markov[current_word])
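
For comparison, a small sketch showing the index-based lookup next to random.choice, on a hypothetical one-entry chain:

```python
import random

markov = {"the": ["cat", "mat", "cat"]}
current_word = "the"

# Index-based version, as in the original code.
next_index = random.randint(0, len(markov[current_word]) - 1)
next_word_by_index = markov[current_word][next_index]

# Equivalent one-liner. Repeated followers ("cat" appears twice) are
# proportionally more likely to be picked in both versions.
next_word = random.choice(markov[current_word])
```

Note that random.choice raises IndexError on an empty list, so a word with no recorded followers still needs separate handling.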

        sentence_data.append(next_word)
        current_word = next_word
    return ' '.join([i for i in sentence_data])

Again, no reason to be doing this i for i dance. Just use ' '.join(sentence_data)
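
Putting the suggestions together, one possible shape for the revised code is sketched below. The names `build_chain`, `generate_sentence`, `min_words`, and `max_words` are hypothetical, and restarting from a random word when the current word has no followers is an assumption that the original code does not make.

```python
import random


def build_chain(words):
    """Map each word to the list of words that immediately follow it."""
    chain = {w: [] for w in words}
    for before, after in zip(words, words[1:]):
        chain[before].append(after)
    return chain


def generate_sentence(text, min_words=15, max_words=20):
    """Return a random sentence built from the word transitions in text."""
    words = [w.lower() for w in text.split(' ') if w.isalpha()]
    if not words:
        return ''
    chain = build_chain(words)
    length = random.randint(min_words, max_words)
    current_word = random.choice(list(chain))
    sentence = [current_word]
    # The number of remaining words is known, so a for loop is enough.
    for _ in range(length - 1):
        followers = chain[current_word]
        if not followers:
            # Assumption: restart from a random word when the current one
            # has no recorded followers (e.g. the final word of the text).
            followers = list(chain)
        current_word = random.choice(followers)
        sentence.append(current_word)
    return ' '.join(sentence)


sample = "the cat sat on the mat and the dog sat on the cat"
print(generate_sentence(sample))
```

Because the function takes a string, a caller that starts from a file can read it first and pass the contents in, which keeps file handling out of the chain logic.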

Comment 1

Thanks for taking the time to respond. Your markups will be very helpful.

Comment 2

It's a bit difficult to figure out which comment belongs to which code snippet (above or below?). Also sometimes I think you wanted to have two separate code snippets, but they were merged because there was no text in between.

Winston Ewert · Accepted Answer · 2013-03-23 17:08:12Z
