DFA creation from string

Question 1

I have fairly simple Python 3 code that creates a DFA from a string.

It works great, but I believe it can be made more efficient. There are a total of 3 loops:

First iterates over the string
Second iterates the alphabet the string can be composed from. In this scenario it is always length 26. The standard English alphabet.
Third iterates over the string again

From what I believe, I can only break out of the third loop and only if a prefixed string is found.

My comments provide a good explanation of what I am doing in the code.

def create_from_text(self, text):
 """
 Create/edit the DFA from the provided text
 """
 # We start by editing the first state to accept the first character in
 # the text provided. After that we check the next character and create
 # a new state that goes to the 2nd character. Here we check if a
 # character that does not go to the 2nd character will create a prefix
 # of the text in combination with an any number of characters preceeding
 # it that . e.g. (oopoops)
 # oo -> o We want a `p` but got `o`.
 # But we check if, starting with 2nd character, `o`,
 # any combination after that, appended with the bad
 # character creates a prefix of the text. The first
 # found will always be the best match.
 # `oo` is a prefix of `oopoops` so we can start back
 # at index 2 since `oo` is of length 2. We mark the
 # state we are on with that state at that character.
 self.states[0].transitions[text[0]] = 1
 for i in range(len(text)):
 if i == 0:
 continue
 new_state = State(i)
 for j in self.alphabet:
 if j == text[i]:
 # Easiest part, just set the transition to the next state
 new_state.addTransition(i + 1, j)
 else:
 # Now do that check we spoke of earlier
 found = False
 for k in range(len(text)):
 if k == 0:
 continue
 try:
 if text.index(text[k:i] + j) == 0 and k <= i:
 new_state.addTransition(len(text[k:i] + j), j)
 found = True
 break
 except ValueError:
 pass
 if not found:
 new_state.addTransition(0, j)
 self.states.append(new_state)
 # Make every transition in the last state point to itself
 last_state = State(len(self.states))
 for j in self.alphabet:
 last_state.addTransition(len(self.states), j)
 self.states.append(last_state)

A sample output for string oopoops:

Number of states: 1
Accepting states: 0
Alphabet: abcdefghijklmnopqrstuvwxyz
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 3 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 6 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 7 0 0 0 0 0 0 0 
7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7

Run time on a string of 2048 characters was about 5 minutes.

Question 2

I am not sure I understand. What language does the input string define?

Question 3

The language is defined. The input string is contained by that language. Alphabet in the output code is the defined language.

Question 4

A lot of the code you are using is missing. But to generate a DFA from a string, you don't need all that logic. The following code would generate a DFA (which is just a transition map, really) for a string: {k:{(v,k+1)} for (k,v) in enumerate("abcd")} 0 is the start state and 4 the only accepting state.

Question 5

Constructors should be class methods

Your method suggests, that an instance of the respective class is being created using the given text. Thus you should make it a classmethod:

@classmethod
def create_from_text(cls, text):
 ...

Comments vs. docstrings

Your multi-line comment describes the underlying algorithm. You should summarize it in the method's docstring and move the complete description to your respective documentation docs.

Useful comments

This is useless:

# Easiest part, just set the transition to the next state
# Now do that check we spoke of earlier

while that may be helpful:

# Make every transition in the last state point to itself

Loop variables

While the i in for i in range(len(text)) is pretty self-explanatory, the j in for j in self.alphabet: is not.
What is j? Another index? A word? A character?
Consider renaming j to index, word or char respectively.

PEP 8

You follow it consistently with the exception of self.addTransition.

Iteration over strings.

Instead of

for i in range(len(text)):
 ...
 if j == text[i]:

Use enumerate:

for index, char in enumerate(text):
 ...
 if j == char:

Question 6

Use for i in range(1, len(text)) and avoid the if i == 0: continue
Use for k in range(1, i + 1) instead of having the k <= i condition inside the loop.
Use if text.startswith(text[k:i] + j) instead of if text.index(text[k:i] + j) == 0 to avoid unnecessary searching.

Put an else clause on your for k loop to avoid the found variable:

for k in range(1, i + 1):
 if text.startswith(text[k:i] + j):
 new_state.addTransition(len(text[k:i] + j), j)
 break
else:
 new_state.addTransition(0, j)

Richard Neumann Richard Neumann 6,42122 silver badges41 bronze badges · Answer 1 · 2017-11-13 10:58:29Z

Constructors should be class methods

Your method suggests, that an instance of the respective class is being created using the given text. Thus you should make it a classmethod:

@classmethod
def create_from_text(cls, text):
 ...

Comments vs. docstrings

Your multi-line comment describes the underlying algorithm. You should summarize it in the method's docstring and move the complete description to your respective documentation docs.

Useful comments

This is useless:

# Easiest part, just set the transition to the next state
# Now do that check we spoke of earlier

while that may be helpful:

# Make every transition in the last state point to itself

Loop variables

While the i in for i in range(len(text)) is pretty self-explanatory, the j in for j in self.alphabet: is not.
What is j? Another index? A word? A character?
Consider renaming j to index, word or char respectively.

PEP 8

You follow it consistently with the exception of self.addTransition.

Iteration over strings.

Instead of

for i in range(len(text)):
 ...
 if j == text[i]:

Use enumerate:

for index, char in enumerate(text):
 ...
 if j == char:

Janne Karila Janne Karila 10.6k21 silver badges34 bronze badges · Answer 2 · 2017-11-13 20:14:15Z

Use for i in range(1, len(text)) and avoid the if i == 0: continue
Use for k in range(1, i + 1) instead of having the k <= i condition inside the loop.
Use if text.startswith(text[k:i] + j) instead of if text.index(text[k:i] + j) == 0 to avoid unnecessary searching.

Put an else clause on your for k loop to avoid the found variable:

for k in range(1, i + 1):
 if text.startswith(text[k:i] + j):
 new_state.addTransition(len(text[k:i] + j), j)
 break
else:
 new_state.addTransition(0, j)

Stack Exchange Network

DFA creation from string

2 Answers 2

Constructors should be class methods

Comments vs. docstrings

Useful comments

Loop variables

PEP 8

Iteration over strings.

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

DFA creation from string

2 Answers 2

Constructors should be class methods

Comments vs. docstrings

Useful comments

Loop variables

PEP 8

Iteration over strings.

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions