2
\$\begingroup\$

I am working my way through some code challenges — partly to improve my problem solving, but also to improve my code quality.

I think my current (fully-functional) solution to a challenge is pretty solid, and fairly understandable. I would love to know, however, if there is anything in my style which could be improved.

Readers are encouraged to be opinionated, but to clearly-demarcate what is purely personal preference from what is a preference, but also strongly advised.

Good morning! Here's your coding interview problem for today.

This problem was asked by Facebook.

Implement regular expression matching with the following special characters:

. (period) which matches any single character
* (asterisk) which matches zero or more of the preceding element

That is, implement a function that takes in a string and a valid regular expression and returns whether or not the string matches the regular expression.

For example, given the regular expression "ra." and the string "ray", your function should return true. The same regular expression on the string "Raymond" should return false.

Given the regular expression ".\*at" and the string "chat", your function should return true. The same regular expression on the string "chats" should return false.

# solution.py
from collections import deque
import logging
from typing import Deque, NamedTuple, Optional, Union
logging.basicConfig(level=logging.INFO,format="%(asctime)s - %(levelname)s - %(name)s - %(funcName)s - %(lineno)d : %(message)s")
class Attempt(NamedTuple):
 matched_string : str
 remaining_string : str
 remaining_pattern : str
class DecisionPoint(NamedTuple):
 greedy : Attempt
 non_greedy : Attempt
def advance_attempt(attempt: Attempt, greedy = False) -> Attempt:
 char = attempt.remaining_pattern[0]
 if greedy and char!= '*':
 raise RuntimeError("Can't perform a greedy advancement on a non wildcard char.")
 new_matched_string = attempt.matched_string + char
 if greedy:
 new_remaining_pattern = attempt.remaining_pattern
 else:
 new_remaining_pattern = attempt.remaining_pattern[1:]
 new_remaining_string = attempt.remaining_string[1:]
 return Attempt(matched_string=new_matched_string,
 remaining_pattern=new_remaining_pattern,
 remaining_string=new_remaining_string)
def next_char_is_match(next_char : str, string : str) -> bool:
 """Returns true if the supplied next_char in the pattern matches the 
 head of the string. Next char is expected to be a single char or an 
 empty string."""
 if next_char == '':
 return False
 if next_char == '*':
 return True
 if next_char == '.' and len(string) != 0:
 return True
 return next_char == string[0]
 
def take_next_greedy(attempt: Attempt) -> Attempt:
 return advance_attempt(attempt=attempt,greedy=True)
def take_next_non_greedy(attempt : Attempt) -> Attempt:
 return advance_attempt(attempt)
def take_next(attempt : Attempt) -> Optional[Union[DecisionPoint,Attempt]]:
 """Returns None if no next match is possible; otherwise, returns either 
 a single progression of the attempt — i.e., a single character moved into
 the matched field from the remaining field — or, in the case of a wildcard,
 both the greedy and non-greedy next possibilities."""
 if (len(attempt.remaining_string) == 0 \
 or len(attempt.remaining_pattern) == 0):
 return None
 next_char_to_match_in_pattern = attempt.remaining_pattern[0]
 next_char_to_match_in_string = attempt.remaining_string[0]
 
 if next_char_to_match_in_pattern == '*':
 return DecisionPoint(greedy=take_next_greedy(attempt),
 non_greedy=take_next_non_greedy(attempt))
 if (next_char_to_match_in_pattern == next_char_to_match_in_string\
 or next_char_to_match_in_pattern == '.'):
 return advance_attempt(attempt)
 
 
 
def main_loop(queue : Deque[Attempt]) -> bool:
 logging.debug(queue)
 logging.info(len(queue))
 attempt = queue.popleft()
 logging.info(attempt)
 if attempt.remaining_pattern == ''\
 and attempt.remaining_string == '':
 return True
 next_step = take_next(attempt)
 if next_step is None:
 return False
 elif isinstance(next_step,DecisionPoint):
 queue.appendleft(next_step.non_greedy)
 queue.appendleft(next_step.greedy)
 return False
 else:
 queue.appendleft(next_step)
 return False
def main(pattern : str, string : str) -> bool:
 q = deque()
 q.append(Attempt(matched_string='',
 remaining_string=string,
 remaining_pattern=pattern))
 match_found = False
 while len(q) != 0 and not match_found:
 match_found = main_loop(q)
 
 logging.info(match_found) 
 return match_found
if __name__ == '__main__':
 main('.*at*.rq','chatsdafrzafafrq')
 
 

# tests.py
import unittest
from .solution import main
class BasicTests(unittest.TestCase):
 def test_given_which_is_expected_to_match_matches(self):
 computed_result = main('ra.','ray')
 self.assertTrue(computed_result)
 def test_given_which_is_expected_to_not_match_does_not_match(self):
 computed_result = main('ra.','raymond')
 self.assertFalse(computed_result)
 def test_other_given_which_is_expected_to_match_matches(self):
 computed_result = main('.*at','chat')
 self.assertTrue(computed_result)
 
 def test_complex_which_is_expected_to_match_matches(self):
 computed_result = main('.*jk*.wee*.weq','jkhkh;llkjkljklkjljl;weeklsfdjdsfj;weq')
 self.assertTrue(computed_result)
 def test_complex_which_is_expected_not_to_match_does_not_match(self):
 computed_result = main('.*jk*.wee*.weqr','jkhkh;llkjkljklkjljl;weeklsfdjdsfj;weqtjhuafjs')
 self.assertFalse(computed_result)
if __name__ == '__main__':
 unittest.main()
Sᴀᴍ Onᴇᴌᴀ
29.5k16 gold badges45 silver badges202 bronze badges
asked Feb 4, 2021 at 23:15
\$\endgroup\$
1

4 Answers 4

3
\$\begingroup\$

PEP 8

The Style Guide for Python Code enumerates several conventions for Python programs

Imports

All import ... statements should precede from ... import ...

Reading through the PEP 8 documentation, I'm not seeing this, but my pylinter complains if this ordering is not obeyed.

Indentation

Use 4 spaces per indentation level.

Functions like advance_attempt are indented too far.

Code Organization

  • Imports
  • Classes & function definitions
  • Code

Your logging.basicConfig(level=logging.INFO, ...) statement falls under the "Code" category, and should be after the class/function definitions.

Moreover, it is EXECUTED when the module is imported. It should be inside the main-guard, to prevent unexpected side-effects.

Spaces

Comma

logging.basicConfig(level=logging.INFO,format="...

There should be a space after every comma.

Binary operators

if greedy and char!= '*':

There should be a space on both sides of the binary operator !=.

Type Hints

def take_next_non_greedy(attempt : Attempt) -> Attempt:

There should be no space before the :

Unnecessary line continuations

 if (len(attempt.remaining_string) == 0 \
 or len(attempt.remaining_pattern) == 0):
 return None

There is no need for the \, since the expression is inside parenthesis.

Broken implementation

Why is '.*jk*.wee*.weq' expected to match 'jkhkh;llkjkljklkjljl;weeklsfdjdsfj;weq'?

That pattern is:

  • .* any number of any character
  • j a single j
  • k* 0 or more k's
  • . any character
  • we two explicit characters
  • e* any number of e's
  • . any character
  • weq three explicit characters

The given string does not match that pattern. The test asserts that it should match. If the pattern was written with .* instead of *. patterns, I could actually see the match succeeding. But it wasn't. Assuming a match is declared (I haven't run the code), then the implementation must be broken.

answered Feb 5, 2021 at 1:49
\$\endgroup\$
1
  • \$\begingroup\$ Appreciated! I was already aware of PEP8, but I didn't realise that it recommended putting all imports .... before from ... import ... Broken Implementation: I think I actually misread the brief: I thought that * matches any character, not that is matched any number of the preceding character. I'm not sure why I didn't realise that, given that that is how actual regex definitions work. \$\endgroup\$ Commented Feb 5, 2021 at 9:14
1
\$\begingroup\$

For completeness, here is an updated answer which incorporates the feedback:

# solution.py
from __future__ import annotations
import logging
from collections import deque
from typing import Deque, List,NamedTuple, Optional, Union
class Attempt(NamedTuple):
 matched_string : str
 remaining_string : str
 remaining_tokens : List[Token]
class DecisionPoint(NamedTuple):
 greedy : Attempt
 non_greedy : Attempt
class Token(NamedTuple):
 character : str
 is_wildcarded : bool = False
def tokenize(pattern: str) -> List[Token]:
 tokens = []
 while pattern:
 if len(pattern) == 1:
 tokens.append(Token(character=pattern))
 pattern = ''
 else:
 head = pattern[0]
 pattern = pattern[1:]
 if pattern[0] == '*':
 pattern = pattern[1:]
 tokens.append(Token(character=head,
 is_wildcarded=True))
 else:
 tokens.append(Token(character=head))
 return tokens
def advance_attempt(attempt: Attempt, greedy: bool = False) -> Attempt:
 next_token = attempt.remaining_tokens[0]
 char = next_token.character
 new_matched_string = attempt.matched_string + char
 if greedy:
 new_remaining_tokens = attempt.remaining_tokens
 else:
 new_remaining_tokens = attempt.remaining_tokens[1:]
 new_remaining_string = attempt.remaining_string[1:]
 return Attempt(matched_string=new_matched_string,
 remaining_tokens=new_remaining_tokens,
 remaining_string=new_remaining_string)
def next_token_is_match(next_token: Token, string: str) -> bool:
 """Returns true if the supplied next_char in the pattern matches the 
 head of the string. Next char is expected to be a single char or an 
 empty string."""
 if next_token.character == '.':
 return True
 return next_token.character == string[0]
 
def take_next_greedy(attempt: Attempt) -> Attempt:
 return advance_attempt(attempt=attempt,greedy=True)
def take_next_non_greedy(attempt: Attempt) -> Attempt:
 return advance_attempt(attempt)
def take_next(attempt : Attempt) -> Optional[Union[DecisionPoint,Attempt]]:
 """Returns None if no next match is possible; otherwise, returns either 
 a single progression of the attempt — i.e., a single character moved into
 the matched field from the remaining field — or, in the case of a wildcard,
 both the greedy and non-greedy next possibilities."""
 if (len(attempt.remaining_string) == 0 
 or len(attempt.remaining_tokens) == 0):
 return None
 next_token_to_match_in_pattern = attempt.remaining_tokens[0]
 next_char_to_match_in_string = attempt.remaining_string[0]
 
 if next_token_to_match_in_pattern.is_wildcarded:
 return DecisionPoint(greedy=take_next_greedy(attempt),
 non_greedy=take_next_non_greedy(attempt))
 if (next_token_to_match_in_pattern.character == next_char_to_match_in_string
 or next_token_to_match_in_pattern.character == '.'):
 return advance_attempt(attempt)
 
 
 
def main_loop(queue: Deque[Attempt]) -> bool:
 logging.debug(queue)
 logging.info(len(queue))
 attempt = queue.popleft()
 logging.info(attempt)
 if (attempt.remaining_tokens == []
 and attempt.remaining_string == ''):
 return True
 next_step = take_next(attempt)
 if next_step is None:
 return False
 elif isinstance(next_step,DecisionPoint):
 queue.appendleft(next_step.non_greedy)
 queue.appendleft(next_step.greedy)
 return False
 else:
 queue.appendleft(next_step)
 return False
def main(pattern: str, string : str) -> bool:
 q = deque()
 q.append(Attempt(matched_string='',
 remaining_string=string,
 remaining_tokens=tokenize(pattern)))
 match_found = False
 while len(q) != 0 and not match_found:
 match_found = main_loop(q)
 
 logging.info(match_found) 
 return match_found
logging.basicConfig(level=logging.INFO,
 format="%(asctime)s - %(levelname)s - %(name)s - %(funcName)s\
 - %(lineno)d : %(message)s")
if __name__ == '__main__':
 main('.*at.*rq','chatsdatfrzafafrq')
# tests.py
import unittest
from .solution import main, tokenize, Token
class TestTokenize(unittest.TestCase):
 def test_basic(self):
 test_pattern = 'a*.d.*'
 expected_result = [Token('a',True),
 Token('.'),
 Token('d'),
 Token('.',True)
 ]
 computed_result = tokenize(test_pattern)
 self.assertEqual(expected_result,computed_result)
class TestSolution(unittest.TestCase):
 def test_given_which_is_expected_to_match_matches(self):
 computed_result = main('ra.','ray')
 self.assertTrue(computed_result)
 def test_given_which_is_expected_to_not_match_does_not_match(self):
 computed_result = main('ra.','raymond')
 self.assertFalse(computed_result)
 def test_other_given_which_is_expected_to_match_matches(self):
 computed_result = main('.*at','chat')
 self.assertTrue(computed_result)
 
 def test_complex_which_is_expected_to_match_matches(self):
 computed_result = main('.*jk.*wee*.*weq','jkhkh;llkjkljklkjljl;weeklsfdjdsfj;weq')
 self.assertTrue(computed_result)
 def test_complex_which_is_expected_not_to_match_does_not_match(self):
 computed_result = main('.*jk*.wee.*weqr','jkhkh;llkjkljklkjljl;weeklsfdjdsfj;weqtjhuafjs')
 self.assertFalse(computed_result)
if __name__ == '__main__':
 unittest.main()
answered Feb 5, 2021 at 22:16
\$\endgroup\$
1
  • 2
    \$\begingroup\$ FYI the updated implementation doesn't implement * matching 0 of the preceding character. For example, main('.*chat','chat') returns False when it should return True. \$\endgroup\$ Commented Feb 6, 2021 at 2:00
1
\$\begingroup\$

This has been very illuminating in that it highlights that even a fairly simple brief can have all kinds of corner cases which must be considered in testing.

Hopefully, this solution is the one final one.

# solution.py
from __future__ import annotations
import logging
from collections import deque
from typing import Deque, List, NamedTuple, Optional, Union
class Attempt(NamedTuple):
 matched_string: str
 remaining_string: str
 remaining_tokens: List[Token]
class DecisionPoint(NamedTuple):
 greedy: Attempt
 non_greedy: Attempt
class Token(NamedTuple):
 character: str
 is_starred: bool = False
def tokenize(pattern: str) -> List[Token]:
 tokens = []
 while pattern:
 if len(pattern) == 1:
 tokens.append(Token(character=pattern))
 pattern = ''
 else:
 head = pattern[0]
 pattern = pattern[1:]
 if pattern[0] == '*':
 pattern = pattern[1:]
 tokens.append(Token(character=head,
 is_starred=True))
 else:
 tokens.append(Token(character=head))
 return tokens
def advance_attempt_starred(attempt: Attempt, greedy: bool = False) -> Attempt:
 next_token = attempt.remaining_tokens[0]
 next_token_char = next_token.character
 new_matched_string = attempt.matched_string + next_token_char
 if greedy:
 new_remaining_tokens = attempt.remaining_tokens
 new_remaining_string = attempt.remaining_string[1:]
 else:
 new_remaining_tokens = attempt.remaining_tokens[1:]
 new_remaining_string = attempt.remaining_string
 return Attempt(matched_string=new_matched_string,
 remaining_tokens=new_remaining_tokens,
 remaining_string=new_remaining_string)
def advance_attempt(attempt: Attempt, greedy: bool = False) -> Attempt:
 next_token = attempt.remaining_tokens[0]
 next_token_char = next_token.character
 new_matched_string = attempt.matched_string + next_token_char
 new_remaining_tokens = attempt.remaining_tokens[1:]
 new_remaining_string = attempt.remaining_string[1:]
 return Attempt(matched_string=new_matched_string,
 remaining_tokens=new_remaining_tokens,
 remaining_string=new_remaining_string)
def take_next_greedy(attempt: Attempt) -> Attempt:
 return advance_attempt_starred(attempt, greedy=True)
def take_next_non_greedy(attempt: Attempt, starred: bool = False) -> Attempt:
 if starred:
 return advance_attempt_starred(attempt)
 return advance_attempt(attempt)
def take_next(attempt: Attempt) -> Optional[Union[DecisionPoint, Attempt]]:
 """Returns None if no next match is possible; otherwise, returns either 
 a single progression of the attempt — i.e., a single character moved into
 the matched field from the remaining field — or, in the case of a wildcard,
 both the greedy and non-greedy next possibilities."""
 if (len(attempt.remaining_string) == 0
 or len(attempt.remaining_tokens) == 0):
 return None
 next_token_to_match_in_pattern = attempt.remaining_tokens[0]
 next_char_to_match_in_string = attempt.remaining_string[0]
 if next_token_to_match_in_pattern.is_starred:
 return DecisionPoint(greedy=take_next_greedy(attempt),
 non_greedy=take_next_non_greedy(attempt,
 starred=True))
 if (next_token_to_match_in_pattern.character == next_char_to_match_in_string
 or next_token_to_match_in_pattern.character == '.'):
 return take_next_non_greedy(attempt)
def main_loop(queue: Deque[Attempt]) -> bool:
 logging.debug(queue)
 logging.info(len(queue))
 attempt = queue.popleft()
 logging.info(attempt)
 if (attempt.remaining_tokens == []
 and attempt.remaining_string == ''):
 return True
 next_step = take_next(attempt)
 if next_step is None:
 return False
 elif isinstance(next_step, DecisionPoint):
 queue.appendleft(next_step.non_greedy)
 queue.appendleft(next_step.greedy)
 return False
 else:
 queue.appendleft(next_step)
 return False
def main(pattern: str, string: str) -> bool:
 q = deque()
 q.append(Attempt(matched_string='',
 remaining_string=string,
 remaining_tokens=tokenize(pattern)))
 match_found = False
 while len(q) != 0 and not match_found:
 match_found = main_loop(q)
 logging.info(match_found)
 return match_found
logging.basicConfig(level=logging.INFO,
 format="%(asctime)s - %(levelname)s - %(name)s - %(funcName)s\
 - %(lineno)d : %(message)s")
if __name__ == '__main__':
 main('.*at.*rq', 'chatsdatfrzafafrq')
# solution.py
import unittest
from .solution import main, tokenize, Token
class TestTokenize(unittest.TestCase):
 def test_basic(self):
 test_pattern = 'a*.d.*'
 expected_result = [Token('a',True),
 Token('.'),
 Token('d'),
 Token('.',True)
 ]
 computed_result = tokenize(test_pattern)
 self.assertEqual(expected_result,computed_result)
class TestSolution(unittest.TestCase):
 def test_given_which_is_expected_to_match_matches(self):
 computed_result = main('ra.','ray')
 self.assertTrue(computed_result)
 def test_given_which_is_expected_to_not_match_does_not_match(self):
 computed_result = main('ra.','raymond')
 self.assertFalse(computed_result)
 def test_other_given_which_is_expected_to_match_matches(self):
 computed_result = main('.*at','chat')
 self.assertTrue(computed_result)
 
 def test_complex_which_is_expected_to_match_matches(self):
 computed_result = main('.*jk.*wee*.*weq','jkhkh;llkjkljklkjljl;weeklsfdjdsfj;weq')
 self.assertTrue(computed_result)
 def test_complex_which_is_expected_not_to_match_does_not_match(self):
 computed_result = main('.*jk*.wee.*weqr','jkhkh;llkjkljklkjljl;weeklsfdjdsfj;weqtjhuafjs')
 self.assertFalse(computed_result)
 def test_starred_consumes_none_or_many(self):
 computed_result = main('.*chat','chat')
 self.assertTrue(computed_result)
if __name__ == '__main__':
 unittest.main()
answered Feb 13, 2021 at 22:48
\$\endgroup\$
1
  • 1
    \$\begingroup\$ This is very close, I think you just need to cover one more test case: main('.*', '') being True and then it should be complete. \$\endgroup\$ Commented Feb 13, 2021 at 23:14
1
\$\begingroup\$

There are a few bugs that are still there starting from the original implementation, so I'm posting this as a late review because this won't fit in a comment. The shape of Attempt has changed from the original with the most recent updates from @Moye, but the root causes of the bugs are essentially the same.

Let's temporarily add some debugging print statements to take_next:

def take_next(attempt: Attempt) -> Optional[Union[DecisionPoint, Attempt]]:
 """Returns None if no next match is possible; otherwise, returns either 
 a single progression of the attempt — i.e., a single character moved into
 the matched field from the remaining field — or, in the case of a wildcard,
 both the greedy and non-greedy next possibilities."""
 print(attempt)
 if (len(attempt.remaining_string) == 0
 or len(attempt.remaining_tokens) == 0):
 print(" No next steps.")
 return None
 next_token_to_match_in_pattern = attempt.remaining_tokens[0]
 next_char_to_match_in_string = attempt.remaining_string[0]
 if next_token_to_match_in_pattern.is_starred:
 greedy = take_next_greedy(attempt)
 non_greedy = take_next_non_greedy(attempt, starred=True)
 print(f" greedy --> {greedy}")
 print(f" non-greedy --> {non_greedy}")
 return DecisionPoint(greedy=greedy, non_greedy=non_greedy)
 if (next_token_to_match_in_pattern.character == next_char_to_match_in_string
 or next_token_to_match_in_pattern.character == '.'):
 next_step = take_next_non_greedy(attempt)
 print(f" --> {next_step}")
 return next_step
 print(" No next steps.")

The first bug: when comparing with a starred character in the pattern, we don't check if that character is the same as the current character in the string (either that, or if the starred character is the wildcard .).

For example, for main(pattern='x*', string='a'), we have:

Attempt(matched_string='', remaining_string='a', remaining_tokens=[Token(character='x', is_starred=True)])
 greedy --> Attempt(matched_string='x', remaining_string='', remaining_tokens=[Token(character='x', is_starred=True)])
 non-greedy --> Attempt(matched_string='x', remaining_string='a', remaining_tokens=[])
Attempt(matched_string='x', remaining_string='', remaining_tokens=[Token(character='x', is_starred=True)])
 No next steps.
Attempt(matched_string='x', remaining_string='a', remaining_tokens=[])
 No next steps.
  • The greedy next step consumed a from the string, which shouldn't have happened given that the pattern is x*.
  • The non-greedy next step didn't consume any characters from the string yet it has matched_string='x'. Only matched_string being wrong here makes this more of a cosmetic issue since the implementation doesn't depend on the value of matched_string for decision making.
  • Examining an Attempt with remaining_string='', remaining_tokens=[Token(character='x', is_starred=True)] yields no next steps, which is incorrect. This is actually a concrete example of the second bug; more on this below.

For this query the implementation returns the correct answer of False, but for the wrong reasons.


The second bug: take_next immediately concludes that there are no valid next steps if either of the following are true:

  • attempt.remaining_string is an empty string
  • attempt.remaining_tokens is empty

This is incorrect because even if attempt.remaining_string is an empty string, we can still have a match if all the tokens in attempt.remaining_tokens are starred.

So for main(pattern='.*', string=''),

Attempt(matched_string='', remaining_string='', remaining_tokens=[Token(character='.', is_starred=True)])
 No next steps.

the current implementation doesn't see a path forward from an Attempt with an empty remaining_string and immediately returns None from take_next. What it should return is Attempt(matched_string='', remaining_string='', remaining_tokens=[]).

To summarize what we can conclude based on the state of Attempt:

Attempt state Conclusion
remaining_string=='' and remaining_tokens==[] pattern successfully matches the string (which you've correctly covered in main_loop)
remaining_string=='' and remaining_tokens is not empty could have next valid steps; must inspect further to find out
remaining_string is not empty and remaining_tokens==[] no possible next valid steps
remaining_string is not empty and remaining_tokens is not empty could have next valid steps; must inspect further to find out
answered Mar 7, 2021 at 0:48
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.