edited Nov 24, 2017 at 18:09

200_success


Strictly speaking, your regex "([A-Z][a-z]?)" makes an assumption that does not meet the specification:

All atom names consist of lowercase letters, except for the first character which is uppercase.

Three-letter element abbreviations such as Uue are possible.
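As a quick illustration of the difference, here is a minimal sketch comparing the original pattern with a more general one. The formula 'Uue2O' is made up purely for demonstration, and the names narrow and general are not from the original code.

```python
import re

# The pattern from the question: one uppercase letter, at most one lowercase.
narrow = re.compile(r'[A-Z][a-z]?')
# A pattern that follows the specification: any number of lowercase letters.
general = re.compile(r'[A-Z][a-z]*')

formula = 'Uue2O'  # hypothetical formula, chosen only to show the mismatch

narrow_tokens = narrow.findall(formula)
general_tokens = general.findall(formula)

print(narrow_tokens)   # ['Uu', 'O']   (the trailing 'e' is silently lost)
print(general_tokens)  # ['Uue', 'O']
```

With the narrow pattern, the three-letter symbol is truncated rather than rejected, so the mistake would not even surface as an error.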

Finally, your add_up() function did not need to be written — you've just reinvented the addition method of collections.Counter.


answered Nov 24, 2017 at 17:49

200_success


You have a couple of good ideas:

Using regular expressions to assist with the parsing
Splitting out some of the code into smaller functions

Unfortunately, those ideas were ineffectively applied, such that the main code is still very complicated and difficult to follow.

Iterating through the formula character by character (using your while i < len(formula) loop) defeats the purpose of using regular expressions. Furthermore, you should not need tests like c == "(", c == ")", and .isdigit().

Rather, the parsing should be mainly driven by re.finditer(), using one regular expression that is constructed to classify everything that it can encounter:

atomic element (possibly followed by a number)
opening parenthesis
closing parenthesis (possibly followed by a number)

Each of those tokens should have a named capture group to help you figure out what was matched.
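To make the idea concrete, here is a minimal tokenizer sketch. The group names (atom, atom_count, open, close_count) and the tokenize() helper are illustrative assumptions, not part of your code. It relies on m.lastgroup, which reports the name of the last capture group that took part in the match, even if that group matched an empty string.

```python
import re

# Hypothetical token pattern; the group names are chosen for illustration.
TOKEN = re.compile(
    r'(?P<atom>[A-Z][a-z]*)(?P<atom_count>\d*)|'   # atom, optional count
    r'(?P<open>\()|'                               # opening parenthesis
    r'\)(?P<close_count>\d*)'                      # closing paren, optional count
)

def tokenize(formula):
    """Yield (kind, groups) pairs, where kind is the match's lastgroup."""
    for m in TOKEN.finditer(formula):
        # Keep only the groups that participated in this particular match.
        groups = {k: v for k, v in m.groupdict().items() if v is not None}
        yield m.lastgroup, groups

tokens = list(tokenize('Mg(OH)2'))
for kind, groups in tokens:
    print(kind, groups)
```

Each token kind can then be dispatched to its own small handler, so the main loop needs no character-level tests at all.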

Furthermore, your add_up() function did not need to be written — you've just reinvented the addition method of collections.Counter.
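For reference, a short sketch of what Counter addition does. The variable names and the sample counts are invented for the example.

```python
from collections import Counter

water = Counter({'H': 2, 'O': 1})
peroxide = Counter({'H': 2, 'O': 2})

# Binary + builds a new Counter with the counts summed per key.
total = water + peroxide

# += merges another Counter into an existing one in place.
water += Counter({'Mg': 1})

print(total)  # Counter({'H': 4, 'O': 3})
print(water)
```

One caveat worth knowing: Counter addition discards entries whose resulting count is zero or negative, which is harmless here because atom counts are always positive.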

Suggested solution

Unfortunately, LeetCode expects the solution to be packaged inside a weird Solution class that is not really a class, but a namespace. (It calls the countOfAtoms() "method" even though there is no meaningful constructor.) I've decided to tweak it into a @classmethod instead.

from collections import Counter
import re

class Solution(object):
    RE = re.compile(
        r'(?P<atom>[A-Z][a-z]*)(?P<atom_count>\d*)|'
        r'(?P<new_group>\()|'
        r'\)(?P<group_count>\d*)|'
        r'(?P<UNEXPECTED_CHARACTER_IN_FORMULA>.+)'
    )

    @classmethod
    def atom_count(cls, stack, atom, atom_count='', **_):
        """Handle an atom with an optional count, e.g. H or Mg2"""
        stack[-1][atom] += (1 if atom_count == '' else int(atom_count))

    @classmethod
    def new_group(cls, stack, **_):
        """Handle an opening parenthesis"""
        stack.append(Counter())

    @classmethod
    def group_count(cls, stack, group_count='', **_):
        """Handle a closing parenthesis with an optional group count"""
        group_count = 1 if group_count == '' else int(group_count)
        group = stack.pop()
        for atom in group:
            group[atom] *= group_count
        stack[-1] += group

    @classmethod
    def countOfAtoms(cls, formula):
        stack = []
        cls.new_group(stack)
        for m in cls.RE.finditer(formula):
            getattr(cls, m.lastgroup)(stack, **m.groupdict())
        return ''.join(
            atom + (str(count) if count > 1 else '')
            for atom, count in sorted(stack.pop().items())
        )
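A note on the @classmethod tweak: a classmethod can still be called through an instance, so the change should remain compatible with a judge that instantiates the class first (I am assuming that is how LeetCode invokes it). A minimal sketch, using a made-up Demo class rather than the real Solution:

```python
class Demo(object):
    """Hypothetical stand-in class, used only to show the calling convention."""
    PREFIX = 'formula: '

    @classmethod
    def describe(cls, formula):
        # cls is the class itself, whether called via the class or an instance.
        return cls.PREFIX + formula

via_class = Demo.describe('H2O')       # called on the class
via_instance = Demo().describe('H2O')  # called on a throwaway instance

print(via_class)
print(via_instance)
```

Both calls return the same string, since no per-instance state is involved.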
