Strictly speaking, your regex "([A-Z][a-z]?)" makes an assumption that the specification does not guarantee. The specification says only: "All atom names consist of lowercase letters, except for the first character which is uppercase." Your pattern allows at most one lowercase letter, but three-letter element abbreviations such as Uue are possible.
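To make the difference concrete, here is a small sketch comparing the two patterns. The formula string is a made-up test input, not something from your code.

```python
import re

# Pattern from the reviewed code: at most one lowercase letter.
too_narrow = re.compile(r'[A-Z][a-z]?')
# Pattern matching the specification: any number of lowercase letters.
per_spec = re.compile(r'[A-Z][a-z]*')

formula = 'Uue2O'  # hypothetical input using a three-letter symbol

narrow_atoms = too_narrow.findall(formula)
spec_atoms = per_spec.findall(formula)

print(narrow_atoms)  # the symbol Uue is cut short to Uu
print(spec_atoms)    # the symbol Uue is kept whole
```

With the narrower pattern the trailing "e" is silently skipped, so the atom would be counted under the wrong name.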
You have a couple of good ideas:
- Using regular expressions to assist with the parsing
- Splitting out some of the code into smaller functions
Unfortunately, those ideas were ineffectively applied, such that the main code is still very complicated and difficult to follow.
Iterating through the formula character by character (using your while i < len(formula) loop) defeats the purpose of using regular expressions. Furthermore, you should not need tests like c == "(", c == ")", and .isdigit().
Rather, the parsing should be mainly driven by re.finditer(), using one regular expression that is constructed to classify everything that it can encounter:
- atomic element (possibly followed by a number)
- opening parenthesis
- closing parenthesis (possibly followed by a number)
Each of those tokens should have a named capture group to help you figure out what was matched.
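As a rough illustration of that idea, the sketch below tokenizes a formula with a single pattern and reports which named group matched last. The names TOKEN, tokenize, open, and close_count are illustrative choices for this example only.

```python
import re

# Hypothetical token pattern: one alternative per kind of token.
TOKEN = re.compile(
    r'(?P<atom>[A-Z][a-z]*)(?P<atom_count>\d*)'  # atom, optional count
    r'|(?P<open>\()'                             # opening parenthesis
    r'|\)(?P<close_count>\d*)'                   # closing parenthesis, optional count
)

def tokenize(formula):
    """Yield (kind, groups) pairs; kind is the last named group that matched."""
    for m in TOKEN.finditer(formula):
        # Keep only the groups that took part in this particular match.
        groups = {k: v for k, v in m.groupdict().items() if v is not None}
        yield m.lastgroup, groups

tokens = list(tokenize('Mg(OH)2'))
for kind, groups in tokens:
    print(kind, groups)
```

Note that finditer() silently skips characters that no alternative matches, so a real parser would want a catch-all alternative to report malformed input.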
Furthermore, your add_up() function did not need to be written: you've just reinvented the addition method of collections.Counter.
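For reference, a minimal sketch of what Counter gives you for free, assuming add_up() merged two mappings of atom counts. The variable names and sample counts are made up for the example.

```python
from collections import Counter

water = Counter({'H': 2, 'O': 1})
hydroxide = Counter({'O': 1, 'H': 1})

# The + operator adds counts key by key and returns a new Counter.
total = water + hydroxide

# The += operator merges another Counter in place.
total += Counter({'H': 1})

print(total)
```

One caveat: Counter addition keeps only positive counts, which is harmless here because atom counts are never zero or negative.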
Suggested solution
Unfortunately, LeetCode expects the solution to be packaged inside a weird Solution class that is not really a class, but a namespace. (It calls the countOfAtoms() "method" even though there is no meaningful constructor.) I've decided to tweak it into a @classmethod instead.
    from collections import Counter
    import re

    class Solution(object):
        # One alternative per token; the last alternative catches anything else.
        RE = re.compile(
            r'(?P<atom>[A-Z][a-z]*)(?P<atom_count>\d*)|'
            r'(?P<new_group>\()|'
            r'\)(?P<group_count>\d*)|'
            r'(?P<UNEXPECTED_CHARACTER_IN_FORMULA>.+)'
        )

        @classmethod
        def atom_count(cls, stack, atom, atom_count='', **_):
            """Handle an atom with an optional count, e.g. H or Mg2"""
            stack[-1][atom] += (1 if atom_count == '' else int(atom_count))

        @classmethod
        def new_group(cls, stack, **_):
            """Handle an opening parenthesis"""
            stack.append(Counter())

        @classmethod
        def group_count(cls, stack, group_count='', **_):
            """Handle a closing parenthesis with an optional group count"""
            group_count = 1 if group_count == '' else int(group_count)
            group = stack.pop()
            # Scale every atom in the finished group, then merge it into its parent.
            for atom in group:
                group[atom] *= group_count
            stack[-1] += group

        @classmethod
        def countOfAtoms(cls, formula):
            stack = []
            cls.new_group(stack)
            for m in cls.RE.finditer(formula):
                # Dispatch to the handler named after the last matched group.
                # An unexpected character has no handler, so getattr() raises
                # an AttributeError naming UNEXPECTED_CHARACTER_IN_FORMULA.
                getattr(cls, m.lastgroup)(stack, **m.groupdict())
            return ''.join(
                atom + (str(count) if count > 1 else '')
                for atom, count in sorted(stack.pop().items())
            )