I have fairly simple Python 3 code that creates a DFA from a string.
It works great, but I believe it can be made more efficient. There are a total of 3 loops:
- First iterates over the string
- Second iterates the alphabet the string can be composed from. In this scenario it is always length 26. The standard English alphabet.
- Third iterates over the string again
From what I believe, I can only break out of the third loop and only if a prefixed string is found.
My comments provide a good explanation of what I am doing in the code.
def create_from_text(self, text):
"""
Create/edit the DFA from the provided text
"""
# We start by editing the first state to accept the first character in
# the text provided. After that we check the next character and create
# a new state that goes to the 2nd character. Here we check if a
# character that does not go to the 2nd character will create a prefix
# of the text in combination with an any number of characters preceeding
# it that . e.g. (oopoops)
# oo -> o We want a `p` but got `o`.
# But we check if, starting with 2nd character, `o`,
# any combination after that, appended with the bad
# character creates a prefix of the text. The first
# found will always be the best match.
# `oo` is a prefix of `oopoops` so we can start back
# at index 2 since `oo` is of length 2. We mark the
# state we are on with that state at that character.
self.states[0].transitions[text[0]] = 1
for i in range(len(text)):
if i == 0:
continue
new_state = State(i)
for j in self.alphabet:
if j == text[i]:
# Easiest part, just set the transition to the next state
new_state.addTransition(i + 1, j)
else:
# Now do that check we spoke of earlier
found = False
for k in range(len(text)):
if k == 0:
continue
try:
if text.index(text[k:i] + j) == 0 and k <= i:
new_state.addTransition(len(text[k:i] + j), j)
found = True
break
except ValueError:
pass
if not found:
new_state.addTransition(0, j)
self.states.append(new_state)
# Make every transition in the last state point to itself
last_state = State(len(self.states))
for j in self.alphabet:
last_state.addTransition(len(self.states), j)
self.states.append(last_state)
A sample output for string oopoops
:
Number of states: 1
Accepting states: 0
Alphabet: abcdefghijklmnopqrstuvwxyz
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 3 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 6 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 7 0 0 0 0 0 0 0
7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
Run time on a string of 2048 characters was about 5 minutes.
2 Answers 2
Constructors should be class methods
Your method suggests, that an instance of the respective class is being created using the given text.
Thus you should make it a classmethod
:
@classmethod
def create_from_text(cls, text):
...
Comments vs. docstrings
Your multi-line comment describes the underlying algorithm. You should summarize it in the method's docstring and move the complete description to your respective documentation docs.
Useful comments
This is useless:
# Easiest part, just set the transition to the next state
# Now do that check we spoke of earlier
while that may be helpful:
# Make every transition in the last state point to itself
Loop variables
While the i
in for i in range(len(text))
is pretty self-explanatory, the j
in for j in self.alphabet:
is not.
What is j
? Another index? A word? A character?
Consider renaming j
to index
, word
or char
respectively.
PEP 8
You follow it consistently with the exception of self.addTransition
.
Iteration over strings.
Instead of
for i in range(len(text)):
...
if j == text[i]:
Use enumerate:
for index, char in enumerate(text):
...
if j == char:
- Use
for i in range(1, len(text))
and avoid theif i == 0: continue
- Use
for k in range(1, i + 1)
instead of having thek <= i
condition inside the loop. - Use
if text.startswith(text[k:i] + j)
instead ofif text.index(text[k:i] + j) == 0
to avoid unnecessary searching. Put an
else
clause on yourfor k
loop to avoid thefound
variable:for k in range(1, i + 1): if text.startswith(text[k:i] + j): new_state.addTransition(len(text[k:i] + j), j) break else: new_state.addTransition(0, j)
Explore related questions
See similar questions with these tags.
{k:{(v,k+1)} for (k,v) in enumerate("abcd")}
0 is the start state and 4 the only accepting state. \$\endgroup\$