I made a program today that solves the following problem on a programming competition site (open.kattis.com): Check if all statements with the following format X is Y or X not Y are consistent. X and Y are words of the length 1-20 and a-z.
The catch is that every word that rhymes with X in any provided statement is said to be equivalent to X. For simplicity it is said that two words rhyme if the last min(3, |X|, |Y|) character are the same, so for example foo and boo would be the same word but not skoo and troo). Also words which do not appear in any statement do not exist, and should not be considered rhyming with anything. For example, the two words foo and moo do not rhyme with each other, unless the word oo or o exists in the input.
So to break it down, here are the rules:
- Check if all statements of the form "X is Y" or "X not Y" is consistent.
- Two words rhyme if the last min(3, |X|, |Y|) characters are the same. So if this condition holds for two words X and Y that are in the list of statements, it is equivalent to having a statement "X is Y".
- Words which do not appear in any statement do not exist, and should not be considered rhyming with anything
The first line consists of an integer 0≤N≤100000, the number of statements. The next N lines consists of statements of the two given forms.
Output should the string "yes", if the statements are consistent with each other or if there is a contradiction the output should be "wait what?".
The problem provides 4 sample inputs and corresponding outputs:
4
herp is derp
derp is herp
herp is herp
derp is derp
Output to the input above should be yes
.
3
oskar not lukas
oskar is poptart
lukas is smart
... should output wait what?
1
moo not foo
... should output yes
2
moo not foo
oo is blah
... should output wait what?
Here is my program that should hopefully be commented well enough to understand:
import sys
# read all input
indata = sys.stdin.readlines()
inlen = int(indata[0])
# return yes if the first line was 0
if inlen == 0:
print("yes")
exit(0)
Xs = []
Ys = []
equivalent = {}
words = []
# build lists of words in data
for i in range(inlen):
item = indata[i+1]
parts = item.split(" ")
word1 = parts[0].strip()
word2 = parts[2].strip()
Xs.append(word1)
Ys.append(word2)
words.append(word1)
words.append(word2)
# add word in equivalent list if not present
if word1 not in equivalent:
equivalent[word1] = [word1]
if word2 not in equivalent:
equivalent[word2] = [word2]
# check which words rhyme and add to equivalent list
for i in range(len(words)):
word1 = words[i].strip()
for j in range(len(words)):
word2 = words[j]
if word1 == word2:
continue
endinglen = min(3, len(word1), len(word2)) # min(3, |X|, |Y|)
ending1 = word1[-endinglen:]
ending2 = word2[-endinglen:]
if ending1 == ending2:
equivalent[word1].append(word2)
equivalent[word2].append(word1)
# check if any word in eq1 matches any word in eq2
def any_equals(eq1, eq2):
for i in eq1:
for j in eq2:
if i == j:
return True
return False
# build the statements data structure
statements = {}
for i in range(inlen):
item = indata[i+1]
parts = item.split(" ")
X = parts[0].strip()
b = True if parts[1] == "is" else False
Y = parts[2].strip()
if X not in statements:
statements[X] = []
# this tuiple is of the format (boolean, word) where boolean means if X is equal to or not equal to Y
statements[X].append((b, Y))
# check consistency of statements
for i in range(inlen):
item = indata[i+1]
parts = item.split(" ")
X = parts[0].strip()
Y = parts[2].strip()
for eq1 in equivalent[X]:
if eq1 not in statements:
continue
for key in statements[eq1]:
equals = any_equals(equivalent[X], equivalent[Y])
# we check if the statement is true for all words that rhyme with X
if (key[0] and equals) or (not key[0] and not equals):
continue
else:
print("wait what?") # the statement was not consistent
sys.exit(0)
print("yes") # all statements were consistent
I think due to the fact I have two nested loops the time complexity is O(n^2) but I think there is a O(n) algorithm but I can't figure it out. Any tips on how to improve my code to run faster would be much appreciated!
1 Answer 1
General remarks
Organizing your code into functions is a good thing. Unfortunately, you only have one function,
any_equals()
, and its definition is buried in the middle of a sea of sequential statements, which makes it weird and hard to find. As it turns out, your function# check if any word in eq1 matches any word in eq2 def any_equals(eq1, eq2): for i in eq1: for j in eq2: if i == j: return True return False
... would be better written as
def any_equals(eq1, eq2): """Check if any element of eq1 equals any element of eq2""" return not(set.isdisjoint(set(eq1), set(eq2)))
... with a docstring, and a data structure that takes advantage of hashing to avoid O(n2) performance. With an implementation that is so succinct, it might not be worth defining as a function after all.
So, basically, you have no functions in your code. It's hard to analyze eighty-something lines of code all at once without having it broken down into smaller self-contained chunks. There is also no way to get an overview of how your program works.
Code should be compartmentalized and composable. This is just a corollary to the first point. Say, hypothetically, that we no longer cared about rhyming. With a well designed program, you should be able to easily disable the code that implements rhyming. That's not the case with your code — your
equivalent
dictionary is mentioned all over the place.Trivial special cases aren't worth the trouble. Why bother writing this?
# return yes if the first line was 0 if inlen == 0: print("yes") exit(0)
It's a very unlikely case — basically dead code. Furthermore, the existence of the special case makes it hard for you to verify whether your code works in the general case.
A more interesting special case, if you wanted to code it, is that any ruleset with no "not" rules is automatically consistent.
Avoid parsing the input repeatedly. You have not just two, but three loops that look like this:
for i in range(inlen): item = indata[i+1] parts = item.split(" ") X = parts[0].strip() Y = parts[2].strip()
That means that you failed to parse the input into a good representation. You should be able to read the input in only one pass.
Algorithms and efficiency
Building the
equivalent
structure for rhymesThis is the code you use to read the rules and note any rhyming equivalence:
# build lists of words in data for i in range(inlen): item = indata[i+1] parts = item.split(" ") word1 = parts[0].strip() word2 = parts[2].strip() Xs.append(word1) Ys.append(word2) words.append(word1) words.append(word2) # add word in equivalent list if not present if word1 not in equivalent: equivalent[word1] = [word1] if word2 not in equivalent: equivalent[word2] = [word2] # check which words rhyme and add to equivalent list for i in range(len(words)): word1 = words[i].strip() for j in range(len(words)): word2 = words[j] if word1 == word2: continue endinglen = min(3, len(word1), len(word2)) # min(3, |X|, |Y|) ending1 = word1[-endinglen:] ending2 = word2[-endinglen:] if ending1 == ending2: equivalent[word1].append(word2) equivalent[word2].append(word1)
The result, I believe, is a dictionary, whose keys are every word encountered, and whose values are a list of words that rhyme with it. That's not what you really want, though. You want a dictionary that maps each word to the shortest word that rhymes with it. In other words, you want one canonical representation of each word; you don't want to expand each word into more alternative representations.
You can actually take a shortcut here. Only the last three letters of each word matter. You might as well immediately ignore the first few characters of any longer word. One way to accomplish would be using a regular expression match, instead of splitting on spaces.
Note that the second loop above, with
i
andj
, is O(N2).As a nitpick:
word1 = words[i].strip()
is not appropriate. The time for sanitizing the input is when you read it. You already stripped them earlier anyway. And if you had doneitem.split()
, it would have split on any whitespace, such that you wouldn't even have tostrip
anything.Building the
statements
structure# build the statements data structure statements = {} for i in range(inlen): item = indata[i+1] parts = item.split(" ") X = parts[0].strip() b = True if parts[1] == "is" else False Y = parts[2].strip() if X not in statements: statements[X] = [] # this tuiple is of the format (boolean, word) where boolean means if X is equal to or not equal to Y statements[X].append((b, Y))
statements[X].append((b, Y))
results in a weird data structure. The most important aspect of a statement is whether it is an "is" rule or a "not" rule, so I would classify the statements primarily according to that dichotomy. The order of the statements does not matter, nor does it matter whether each word appears on the left or right. So, it seems arbitrary thatstatements
is keyed byX
, when in factX
andY
should be treated symmetrically.Statement consistency check
# check consistency of statements for i in range(inlen): item = indata[i+1] parts = item.split(" ") X = parts[0].strip() Y = parts[2].strip() for eq1 in equivalent[X]: if eq1 not in statements: continue for key in statements[eq1]: equals = any_equals(equivalent[X], equivalent[Y]) # we check if the statement is true for all words that rhyme with X if (key[0] and equals) or (not key[0] and not equals): continue else: print("wait what?") # the statement was not consistent sys.exit(0)
The most important line here is
if (key[0] and equals) or (not key[0] and not equals)
— and unfortunately it is buried quite deep in the loops.The right strategy to use, I believe, is to transform the ruleset twice: once by rhyming equivalence, then by "is" equivalence. Then, examine the "not" rules, and see if any of them is of the form
X not X
, which is an indication of a contradiction.So, the consistency check should primarily iterate through the "not" rules. (This observation leads to the insight I pointed out above: that any ruleset without any "not" rules is automatically self-consistent.)
Suggested solution
I'd rewrite the code altogether. Instead of posting the solution here, I have posted it as a Code Review question.
Observe, in particular, that the main()
function is only five lines long, and gives an overview of the strategy:
def main(fileinput):
rules = read_rules(fileinput)
if rules['not']: # Ruleset without "not" rules must be self-consistent
rules = subst_rule_words(rules, rhyme_map(rules))
rules = subst_rule_words(rules, union_map(rules))
print('wait what?' if any(x == y for x, y in rules['not']) else 'yes')
Also note that the functions are reusable and composable. If I take out the line with rhyme_map()
, then all support for rhyming equivalence is dropped. If I take out the line with union_map()
, then the code will only detect self-inconsistencies within each independent statement.
-
\$\begingroup\$ I think a regex might be overkill here, if only because split and substring are probably significantly faster \$\endgroup\$Vogel612– Vogel6122017年03月04日 10:29:56 +00:00Commented Mar 4, 2017 at 10:29
Explore related questions
See similar questions with these tags.