Leetcode number of atoms solution using stack

Question 1

LeetCode 726. Number of Atoms

Given a chemical formula (given as a string), return the count of each atom.

An atomic element always starts with an uppercase character, then zero or more lowercase letters, representing the name.

1 or more digits representing the count of that element may follow if the count is greater than 1. If the count is 1, no digits will follow. For example, H2O and H2O2 are possible, but H1O2 is impossible.

Two formulas concatenated together produce another formula. For example, H2O2He3Mg4 is also a formula.

A formula placed in parentheses, and a count (optionally added) is also a formula. For example, (H2O2) and (H2O2)3 are formulas.

Given a formula, output the count of all elements as a string in the following form: the first name (in sorted order), followed by its count (if that count is more than 1), followed by the second name (in sorted order), followed by its count (if that count is more than 1), and so on.

Formula = K4(ON(SO3)2)2

Output: K4N2O14S4

Solution:

There are three cases then needs to be handled:

In case it is ) then next character can be either ( or a digit. In both cases we multiply by either 1 or the digit. Before multiplying, add all the elements till ( is encountered in stack.
In case of ( then just add to stack.
In case of any other character then just add to stack in the form of dictionary with its count added as value and element as key.

In the end just add all elements in the stack and sort it.

import re
class Solution:
 def countOfAtoms(self, formula):
 def add_up(formulas):
 ''' add up all the elements provided in a list'''
 result = {}
 for i in range(len(formulas)):
 result = {f: formulas[i].get(f, 0) + result.get(f, 0) for f in set(result).union(formulas[i])}
 return result
 def mul_up(formula, factor):
 '''multiply the formula with the factor'''
 result = {}
 for k, v in formula.iteritems():
 result[k] = v*factor
 return result
 def get_matches(regex, string):
 return re.match(regex, string)
 formula = "(" + formula + ")"
 stack = []
 i = 0
 while i < len(formula):
 c = formula[i]
 if c == "(":
 stack.append(c)
 i += 1
 continue
 elif (c == ")" and i == len(formula)-1):
 cc = stack.pop()
 list_add_open_element = []
 while cc != "(":
 list_add_open_element.append(cc)
 cc = stack.pop()
 stack.append(add_up(list_add_open_element))
 i += 1
 continue
 elif (c == ")" and (i+1 < len(formula) and formula[i+1].isdigit())) or (c == ")" and (i+1 < len(formula) and formula[i+1] == "(")):
 cc = stack.pop()
 list_add_open_element = [] #list to add all element up to (
 while cc != "(":
 list_add_open_element.append(cc)
 cc = stack.pop()
 r = add_up(list_add_open_element)
 j = i+1
 factor = ""
 while j < len(formula) and formula[j].isdigit():
 factor += formula[j]
 j += 1
 if not factor:
 factor = 1
 i += 1
 else:
 i = j
 cc = mul_up(r, int(factor))
 stack.append(cc)
 continue
 element = get_matches(ur"([A-Z][a-z]?\d*)", formula[i:])
 key = get_matches(ur"([A-Z][a-z]?)", element.group())
 value = re.search(ur"([0-9]+)", element.group())
 stack.append({key.group():(1 if not value else int(value.group()))})
 i += len(key.group()) + (len(value.group()) if value else 0)
 result = [(k, v) for k, v in stack[0].iteritems()]
 result.sort()
 return "".join(str(i) + (str(j) if int(j) > 1 else "") for i,j in result)

Question 2

You have a couple of good ideas:

Using regular expressions to assist with the parsing
Splitting out some of the code into smaller functions

Unfortunately, those ideas were ineffectively applied, such that the main code is still very complicated and difficult to follow.

Iterating through the formula character by character (using your while i < len(formula) loop) defeats the purpose of using regular expressions. Furthermore, you should not need tests like c == "(", c == ")", and .isdigit().

Rather, the parsing should be mainly driven by re.finditer(), using one regular expression that is constructed to classify everything that it can encounter:

atomic element (possibly followed by a number)
opening parenthesis
closing parenthesis (possibly followed by a number)

Each of those tokens should have a named capture group to help you figure out what was matched.

Strictly speaking, your regex "([A-Z][a-z]?)" makes an assumption that does not meet the specification:

All atom names consist of lowercase letters, except for the first character which is uppercase.

Three-letter element abbreviations such as Uue are possible.

Finally, your add_up() function did not need to be written — you've just reinvented the addition method of collections.Counter.

Suggested solution

Unfortunately, LeetCode expects the solution to be packaged inside a weird Solution class that is not really a class, but a namespace. (It calls the countOfAtoms() "method" even though there is no meaningful constructor.) I've decided to tweak it into a @classmethod instead.

from collections import Counter
import re
class Solution(object):
 RE = re.compile(
 r'(?P<atom>[A-Z][a-z]*)(?P<atom_count>\d*)|'
 r'(?P<new_group>\()|'
 r'\)(?P<group_count>\d*)|'
 r'(?P<UNEXPECTED_CHARACTER_IN_FORMULA>.+)'
 )
 @classmethod
 def atom_count(cls, stack, atom, atom_count='', **_):
 """Handle an atom with an optional count, e.g. H or Mg2"""
 stack[-1][atom] += (1 if atom_count == '' else int(atom_count))
 @classmethod
 def new_group(cls, stack, **_):
 """Handle an opening parenthesis"""
 stack.append(Counter())
 @classmethod
 def group_count(cls, stack, group_count='', **_):
 """Handle a closing parenthesis with an optional group count"""
 group_count = 1 if group_count == '' else int(group_count)
 group = stack.pop()
 for atom in group:
 group[atom] *= group_count
 stack[-1] += group
 @classmethod
 def countOfAtoms(cls, formula):
 stack = []
 cls.new_group(stack)
 for m in cls.RE.finditer(formula):
 getattr(cls, m.lastgroup)(stack, **m.groupdict())
 return ''.join(
 atom + (str(count) if count > 1 else '')
 for atom, count in sorted(stack.pop().items())
 )

200_success 146k22 gold badges192 silver badges481 bronze badges · Accepted Answer · 2017-11-24 17:49:12Z

You have a couple of good ideas:

Using regular expressions to assist with the parsing
Splitting out some of the code into smaller functions

Unfortunately, those ideas were ineffectively applied, such that the main code is still very complicated and difficult to follow.

Iterating through the formula character by character (using your while i < len(formula) loop) defeats the purpose of using regular expressions. Furthermore, you should not need tests like c == "(", c == ")", and .isdigit().

Rather, the parsing should be mainly driven by re.finditer(), using one regular expression that is constructed to classify everything that it can encounter:

atomic element (possibly followed by a number)
opening parenthesis
closing parenthesis (possibly followed by a number)

Each of those tokens should have a named capture group to help you figure out what was matched.

Strictly speaking, your regex "([A-Z][a-z]?)" makes an assumption that does not meet the specification:

All atom names consist of lowercase letters, except for the first character which is uppercase.

Three-letter element abbreviations such as Uue are possible.

Finally, your add_up() function did not need to be written — you've just reinvented the addition method of collections.Counter.

Stack Exchange Network

Leetcode number of atoms solution using stack

LeetCode 726. Number of Atoms

1 Answer 1

Suggested solution

You must log in to answer this question.

Hot Network Questions

1 Answer 1

Suggested solution

You must log in to answer this question.

Related