LeetCode 726. Number of Atoms
Given a chemical formula (given as a string), return the count of each atom.
An atomic element always starts with an uppercase character, then zero or more lowercase letters, representing the name.
1 or more digits representing the count of that element may follow if the count is greater than 1. If the count is 1, no digits will follow. For example,
H2O
andH2O2
are possible, butH1O2
is impossible.Two formulas concatenated together produce another formula. For example,
H2O2He3Mg4
is also a formula.A formula placed in parentheses, and a count (optionally added) is also a formula. For example,
(H2O2)
and(H2O2)3
are formulas.Given a formula, output the count of all elements as a string in the following form: the first name (in sorted order), followed by its count (if that count is more than 1), followed by the second name (in sorted order), followed by its count (if that count is more than 1), and so on.
Formula =
K4(ON(SO3)2)2
Output:
K4N2O14S4
Solution:
There are three cases then needs to be handled:
In case it is
)
then next character can be either(
or a digit. In both cases we multiply by either 1 or the digit. Before multiplying, add all the elements till(
is encountered in stack.In case of
(
then just add to stack.In case of any other character then just add to stack in the form of dictionary with its count added as value and element as key.
In the end just add all elements in the stack and sort it.
import re
class Solution:
def countOfAtoms(self, formula):
def add_up(formulas):
''' add up all the elements provided in a list'''
result = {}
for i in range(len(formulas)):
result = {f: formulas[i].get(f, 0) + result.get(f, 0) for f in set(result).union(formulas[i])}
return result
def mul_up(formula, factor):
'''multiply the formula with the factor'''
result = {}
for k, v in formula.iteritems():
result[k] = v*factor
return result
def get_matches(regex, string):
return re.match(regex, string)
formula = "(" + formula + ")"
stack = []
i = 0
while i < len(formula):
c = formula[i]
if c == "(":
stack.append(c)
i += 1
continue
elif (c == ")" and i == len(formula)-1):
cc = stack.pop()
list_add_open_element = []
while cc != "(":
list_add_open_element.append(cc)
cc = stack.pop()
stack.append(add_up(list_add_open_element))
i += 1
continue
elif (c == ")" and (i+1 < len(formula) and formula[i+1].isdigit())) or (c == ")" and (i+1 < len(formula) and formula[i+1] == "(")):
cc = stack.pop()
list_add_open_element = [] #list to add all element up to (
while cc != "(":
list_add_open_element.append(cc)
cc = stack.pop()
r = add_up(list_add_open_element)
j = i+1
factor = ""
while j < len(formula) and formula[j].isdigit():
factor += formula[j]
j += 1
if not factor:
factor = 1
i += 1
else:
i = j
cc = mul_up(r, int(factor))
stack.append(cc)
continue
element = get_matches(ur"([A-Z][a-z]?\d*)", formula[i:])
key = get_matches(ur"([A-Z][a-z]?)", element.group())
value = re.search(ur"([0-9]+)", element.group())
stack.append({key.group():(1 if not value else int(value.group()))})
i += len(key.group()) + (len(value.group()) if value else 0)
result = [(k, v) for k, v in stack[0].iteritems()]
result.sort()
return "".join(str(i) + (str(j) if int(j) > 1 else "") for i,j in result)
1 Answer 1
You have a couple of good ideas:
- Using regular expressions to assist with the parsing
- Splitting out some of the code into smaller functions
Unfortunately, those ideas were ineffectively applied, such that the main code is still very complicated and difficult to follow.
Iterating through the formula character by character (using your while i < len(formula)
loop) defeats the purpose of using regular expressions. Furthermore, you should not need tests like c == "("
, c == ")"
, and .isdigit()
.
Rather, the parsing should be mainly driven by re.finditer()
, using one regular expression that is constructed to classify everything that it can encounter:
- atomic element (possibly followed by a number)
- opening parenthesis
- closing parenthesis (possibly followed by a number)
Each of those tokens should have a named capture group to help you figure out what was matched.
Strictly speaking, your regex "([A-Z][a-z]?)"
makes an assumption that does not meet the specification:
All atom names consist of lowercase letters, except for the first character which is uppercase.
Three-letter element abbreviations such as Uue
are possible.
Finally, your add_up()
function did not need to be written — you've just reinvented the addition method of collections.Counter
.
Suggested solution
Unfortunately, LeetCode expects the solution to be packaged inside a weird Solution
class that is not really a class, but a namespace. (It calls the countOfAtoms()
"method" even though there is no meaningful constructor.) I've decided to tweak it into a @classmethod
instead.
from collections import Counter
import re
class Solution(object):
RE = re.compile(
r'(?P<atom>[A-Z][a-z]*)(?P<atom_count>\d*)|'
r'(?P<new_group>\()|'
r'\)(?P<group_count>\d*)|'
r'(?P<UNEXPECTED_CHARACTER_IN_FORMULA>.+)'
)
@classmethod
def atom_count(cls, stack, atom, atom_count='', **_):
"""Handle an atom with an optional count, e.g. H or Mg2"""
stack[-1][atom] += (1 if atom_count == '' else int(atom_count))
@classmethod
def new_group(cls, stack, **_):
"""Handle an opening parenthesis"""
stack.append(Counter())
@classmethod
def group_count(cls, stack, group_count='', **_):
"""Handle a closing parenthesis with an optional group count"""
group_count = 1 if group_count == '' else int(group_count)
group = stack.pop()
for atom in group:
group[atom] *= group_count
stack[-1] += group
@classmethod
def countOfAtoms(cls, formula):
stack = []
cls.new_group(stack)
for m in cls.RE.finditer(formula):
getattr(cls, m.lastgroup)(stack, **m.groupdict())
return ''.join(
atom + (str(count) if count > 1 else '')
for atom, count in sorted(stack.pop().items())
)
Explore related questions
See similar questions with these tags.