49
\$\begingroup\$

The Challenge

Write a program that can break down an input chemical formula (see below), and output its respective atoms in the form element: atom-count.


Input

Sample input:

H2O

Your input will always contain at least one element, but no more than ten. Your program should accept inputs that contain parentheses, which may be nested.

Elements in the strings will always match [A-Z][a-z]*, meaning they will always start with an uppercase letter. Numbers will always be single digits.


Output

Sample output (for the above input):

H: 2
O: 1

Your output can be optionally followed by a newline.


Breaking Down Molecules

Numbers to the right of a set of parentheses are distributed to each element inside:

Mg(OH)2

Should output:

Mg: 1
O: 2
H: 2

The same principle applies to individual atoms:

O2

Should output:

O: 2

And also chaining:

Ba(NO2)2

Should output:

Ba: 1
N: 2
O: 4

Testcases

Ba(PO3)2
->
Ba: 1
P: 2
O: 6
C13H18O2
->
C: 13
H: 18
O: 2
K4(ON(SO3)2)2
->
K: 4
O: 14
N: 2
S: 4
(CH3)3COOC(CH3)3
->
C: 8
H: 18
O: 2
(C2H5)2NH
->
C: 4
H: 11
N: 1
Co3(Fe(CN)6)2
->
Co: 3
Fe: 2
C: 12
N: 12
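
For anyone who wants a non-golfed baseline to check answers against, here is a minimal sketch of the rules above as a token-driven parser. The names `TOKEN`, `count_atoms` and `format_counts` are made up for illustration, and the sketch assumes the input is well-formed. It accepts multi-digit counts, since the test cases use them.

```python
import re
from collections import Counter

# Element symbols, numbers, and parentheses are the only tokens we expect.
TOKEN = re.compile(r"[A-Z][a-z]*|\d+|[()]")


def count_atoms(formula):
    """Return a Counter mapping element symbol -> atom count."""
    stack = [Counter()]  # one Counter per currently open parenthesis level
    tokens = TOKEN.findall(formula)
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        i += 1
        if tok == "(":
            stack.append(Counter())
            continue
        # A closing parenthesis yields the finished group; otherwise the
        # token is a single element symbol.
        group = stack.pop() if tok == ")" else Counter({tok: 1})
        # An optional number right after the atom or group multiplies it.
        mult = 1
        if i < len(tokens) and tokens[i].isdigit():
            mult = int(tokens[i])
            i += 1
        for sym, n in group.items():
            stack[-1][sym] += n * mult
    return stack[0]


def format_counts(counts):
    """Format counts as 'element: atom-count' lines in first-seen order."""
    return "\n".join(f"{sym}: {n}" for sym, n in counts.items())
```

Calling `print(format_counts(count_atoms("Mg(OH)2")))` prints the three lines shown for that example. The order of elements follows first appearance, which happens to match the listed test cases.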

Scoreboard

For your score to appear on the board, it should be in this format:

# Language, Score

Or if you earned a bonus:

# Language, Score (Bytes - Bonus%)

function getURL(e){return"https://api.stackexchange.com/2.2/questions/"+QUESTION_ID+"/answers?page="+e+"&pagesize=100&order=desc&sort=creation&site=codegolf&filter="+ANSWER_FILTER}function getAnswers(){$.ajax({url:getURL(answer_page++),method:"get",dataType:"jsonp",crossDomain:!0,success:function(e){answers.push.apply(answers,e.items),answers_hash=[],answer_ids=[],e.items.forEach(function(e){var s=+e.share_link.match(/\d+/);answer_ids.push(s),answers_hash[s]=e}),useData(answers)}})}function getOwnerName(e){return e.owner.display_name}function useData(e){var s=[];e.forEach(function(e){var a=e.body.replace(/<s>.*<\/s>/,"").replace(/<strike>.*<\/strike>/,"");console.log(a),VALID_HEAD.test(a)&&s.push({user:getOwnerName(e),language:a.match(VALID_HEAD)[1],score:+a.match(VALID_HEAD)[2],link:e.share_link})}),s.sort(function(e,s){var a=e.score,r=s.score;return a-r}),s.forEach(function(e,s){var a=$("#score-template").html();a=a.replace("{{RANK}}",s+1+"").replace("{{NAME}}",e.user).replace("{{LANGUAGE}}",e.language).replace("{{SCORE}}",e.score),a=$(a),$("#scores").append(a)})}var QUESTION_ID=58469,ANSWER_FILTER="!t)IWYnsLAZle2tQ3KqrVveCRJfxcRLe",answers=[],answer_ids,answers_hash,answer_page=1;getAnswers();var VALID_HEAD=/<h\d>([^\n,]*)[, ]*(\d+).*<\/h\d>/;
body{text-align:left!important}table thead{font-weight:700}table td{padding:10px 0 0 30px}#scores-cont{padding:10px;width:600px}#scores tr td:first-of-type{padding-left:0}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script><link rel="stylesheet" type="text/css" href="//cdn.sstatic.net/codegolf/all.css?v=83c949450c8b"><div id="scores-cont"><h2>Scores</h2><table class="score-table"><thead> <tr><td></td><td>User</td><td>Language</td><td>Score</td></tr></thead> <tbody id="scores"></tbody></table></div><table style="display: none"> <tbody id="score-template"><tr><td>{{RANK}}</td><td>{{NAME}}</td><td>{{LANGUAGE}}</td><td>{{SCORE}}</td></tr></tbody></table>

Edit: Square brackets are no longer a part of the question. Any answers posted before 3AM UTC time, September 23, are safe and will not be affected by this change.

DialFrost
5,1892 gold badges15 silver badges58 bronze badges
asked Sep 22, 2015 at 15:37
\$\endgroup\$
18
  • \$\begingroup\$ What are the allowed forms of input? \$\endgroup\$ Commented Sep 22, 2015 at 15:48
  • 1
    \$\begingroup\$ @ZachGates It's better that we're allowed to support either, but keep in mind that square brackets are still incorrect. AFAIK in chemical formulae square brackets are only used to indicate concentration. E.g.: [HCl] = 0.01 mol L^-1. \$\endgroup\$ Commented Sep 22, 2015 at 15:50
  • \$\begingroup\$ See the "Examples" section. Is there something specific you're asking about? @Oberon Inputs are denoted by a >. \$\endgroup\$ Commented Sep 22, 2015 at 15:51
  • 1
    \$\begingroup\$ Challenge would be so much better if you had to output the elements in Atomic Number order :-) \$\endgroup\$ Commented Sep 23, 2015 at 16:08
  • 1
    \$\begingroup\$ Just a note, the examples still have elements with multiple digit atom counts. \$\endgroup\$ Commented Sep 23, 2015 at 22:07

14 Answers

11
\$\begingroup\$

CJam, <s>59</s> 57 bytes

q{:Ci32/")("C#-"[ ] aC~* Ca C+"S/=~}%`La`-S%$e`{~": "@N}/

Try it online in the CJam interpreter.

How it works

q e# Read all input from STDIN.
{ e# For each character:
 :Ci e# Save it in C and cast to integer.
 32/ e# Divide the code point by 32. This pushes
 e# 2 for uppercase, 3 for lowercase and 1 for non-letters.
 ")("C# e# Find the index of C in that string. (-1 if not found.)
 - e# Subtract. This pushes 0 for (, 1 for ), 2 for digits,
 e# 3 for uppercase letters and 4 for lowercase letters.
 "[ ] aC~* Ca C+"
 S/ e# Split it at spaces into ["[" "]" "aC~*" "Ca" "C+"].
 =~ e# Select and evaluate the corresponding chunk.
 e# ( : [ : Begin an array.
 e# ) : ] : End an array.
 e# 0-9 : aC~* : Wrap the top of the stack into an array
 e# and repeat that array eval(C) times.
 e# A-Z : Ca : Push "C".
 e# a-z : C+ : Append C to the string on top of the stack.
}% e#
` e# Push a string representation of the resulting array.
 e# For input (Au(CH)2)2, this pushes the string
 e# [[["Au" [["C" "H"] ["C" "H"]]] ["Au" [["C" "H"] ["C" "H"]]]]]
La` e# Push the string [""].
- e# Remove square brackets and double quotes from the first string.
S% e# Split the result at runs of spaces.
$e` e# Sort and perform run-length encoding.
{ e# For each pair [run-length string]:
 ~ e# Dump both on the stack.
 ": " e# Push that string.
 @N e# Rotate the run-length on top and push a linefeed.
}/ e#
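
For readers who don't speak CJam, the same idea can be sketched in Python: build nested lists, repeat the most recent item on a digit, then flatten, sort and run-length encode. This is only a rough emulation, not a translation of the program; `cjam_style_counts` and `flatten` are made-up names. It looks at one character at a time, so it relies on the single-digit rule from the question.

```python
from itertools import groupby


def flatten(item):
    """Yield every element symbol inside an arbitrarily nested list."""
    if isinstance(item, str):
        yield item
    else:
        for sub in item:
            yield from flatten(sub)


def cjam_style_counts(formula):
    stack = [[]]  # stack of arrays that are still being built
    for ch in formula:
        if ch == "(":
            stack.append([])              # begin an array
        elif ch == ")":
            finished = stack.pop()        # end an array
            stack[-1].append(finished)
        elif ch.isdigit():
            top = stack[-1].pop()         # wrap the last item and repeat it
            stack[-1].append([top] * int(ch))
        elif ch.isupper():
            stack[-1].append(ch)          # start a new element symbol
        else:
            stack[-1][-1] += ch           # lowercase: extend the last symbol
    symbols = sorted(flatten(stack[0]))
    # Run-length encode the sorted symbols into (symbol, count) pairs.
    return [(sym, len(list(run))) for sym, run in groupby(symbols)]
```

For example, `cjam_style_counts("(Au(CH)2)2")` gives `[("Au", 2), ("C", 4), ("H", 4)]`. As with the sort step in the CJam program, elements come out in sorted order rather than input order.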
answered Sep 23, 2015 at 6:19
\$\endgroup\$
11
\$\begingroup\$

Python 3, <s>157</s> 154 bytes

import re
s=re.sub
f=s("[()',]",'',str(eval(s(',?(\d+)',r'*\1,',s('([A-Z][a-z]*)',r'("\1",),',input()))))).split()
for c in set(f):print(c+":",f.count(c))

Try it online!

Only supports input using regular brackets.
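
To make the eval trick easier to follow, here is an unpacked sketch of the same two substitutions, one step at a time. The function name `atoms_via_eval` is invented for this illustration, it takes the formula as an argument instead of reading stdin, and `eval` should only ever see trusted input.

```python
import re
from collections import Counter


def atoms_via_eval(formula):
    # Step 1: every element symbol becomes a one-item tuple plus a comma,
    # e.g. Mg -> ("Mg",),
    expr = re.sub(r"([A-Z][a-z]*)", r'("\1",),', formula)
    # Step 2: every count becomes a tuple repetition. The optional comma in
    # front of the number is swallowed and re-added after it.
    expr = re.sub(r",?(\d+)", r"*\1,", expr)
    # The string is now a Python tuple expression; evaluating it performs
    # all of the repetition, including nested groups.
    nested = eval(expr, {"__builtins__": {}})
    # Strip the tuple punctuation from the repr and split on whitespace.
    flat = re.sub(r"[()',]", "", str(nested)).split()
    return expr, Counter(flat)
```

For `Mg(OH)2` the intermediate expression is `("Mg",),(("O",),("H",),)*2,` and the resulting counts are Mg 1, O 2, H 2.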

Before creating the golfed solution using eval above I created this reference solution, which I found very elegant:

import re, collections
# Split into element symbols, parentheses, and the numbers between them.
parts = filter(bool, re.split(r'([A-Z][a-z]*|\(|\))', input()))
stack = [[]]
for part in parts:
    if part == '(':
        stack.append([])
    elif part == ')':
        stack[-2].append(stack.pop())
    elif part.isdigit():
        stack[-1].append(int(part) * stack[-1].pop())
    else:
        stack[-1].append([part])
count = collections.Counter()
# Flatten the nested lists, counting each symbol as it surfaces.
while stack:
    if isinstance(stack[-1], list):
        stack.extend(stack.pop())
    else:
        count[stack.pop()] += 1
for e, i in count.items():
    print("{}: {}".format(e, i))
answered Sep 22, 2015 at 19:21
\$\endgroup\$
10
\$\begingroup\$

Pyth, <s>66</s> 65 bytes

VrSc-`v::z"([A-Z][a-z]*)""('\\1',),"",?(\d+)""*\\1,"`(k))8j": "_N

Try it online!

Port of my Python answer. Only supports input using regular brackets.

answered Sep 22, 2015 at 19:50
\$\endgroup\$
1
  • 3
    \$\begingroup\$ +1. Three answers in an hour? Nice. \$\endgroup\$ Commented Sep 22, 2015 at 19:51
6
\$\begingroup\$

JavaScript ES6, 366 bytes

function f(i){function g(a,b,c){b=b.replace(/[[(]([^[(\])]+?)[\])](\d*)/g,g).replace(/([A-Z][a-z]?)(\d*)/g,function(x,y,z){return y+((z||1)*(c||1))});return(b.search(/[[(]/)<0)?b:g(0,b)}return JSON.stringify(g(0,i).split(/(\d+)/).reduce(function(q,r,s,t){(s%2)&&(q[t[s-1]]=+r+(q[t[s-1]]||0));return q},{})).replace(/["{}]/g,'').replace(/:/g,': ').replace(/,/g,'\n')}

JS Fiddle: https://jsfiddle.net/32tunzkr/1/

I'm pretty sure this can be shortened, but I need to get back to work. ;-)

answered Sep 22, 2015 at 18:51
\$\endgroup\$
2
  • 2
    \$\begingroup\$ I'm pretty sure it can be shortened as well. Since you claim to be using ES6, you could start by using the big-arrow notation to create functions. And the implicit return statement. That should be enough for now. \$\endgroup\$ Commented Sep 22, 2015 at 22:59
  • \$\begingroup\$ You also use replace a lot so you can save some bytes by using xyz[R='replace'](...) the first time and abc[R] (...) each subsequent time. \$\endgroup\$ Commented Sep 23, 2015 at 15:25
6
\$\begingroup\$

SageMath, <s>156</s> 148 bytes

import re
i=input()
g=re.sub
var(re.findall("[A-Z][a-z]?",i))
print g("(\d+).(\S+)\D*",r"\2: \1\n",`eval(g("(\d+)",r"*\1",g("([A-Z(])",r"+\1",i)))`)

Try it online here (hopefully link will work, might need an online account)

Note: If trying online, you will need to replace input() with the string (e.g. "(CH3)3COOC(CH3)3")

Explanation

Sage allows you to simplify algebraic expressions, providing they are in the right format (see 'symbolic manipulation' of this link). The regexes inside the eval() basically serve to get the input string into the right format, for example something like:

+(+C+H*3)*3+C+O+O+C+(+C+H*3)*3

eval() will then simplify this to: 8*C + 18*H + 2*O, and then it's just a matter of formatting the output with another regex substitution.
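
If you don't have Sage at hand, the idea can be imitated in plain Python. This is only a stand-in for Sage's symbolic simplification, not how Sage works internally: a small hypothetical `Bag` class supports `+`, unary `+` and `*` by an integer, and each element symbol is bound to a one-atom `Bag` before the rewritten string is evaluated. `Bag` and `atoms_via_algebra` are invented names.

```python
import re


class Bag(dict):
    """A multiset of element symbols that behaves like a linear expression."""

    def __add__(self, other):
        out = Bag(self)
        for sym, n in other.items():
            out[sym] = out.get(sym, 0) + n
        return out

    def __pos__(self):
        # Unary plus appears before every symbol and group; it is a no-op.
        return self

    def __mul__(self, k):
        return Bag({sym: n * k for sym, n in self.items()})


def atoms_via_algebra(formula):
    # Same two substitutions as in the answer: "+" before every capital
    # letter or "(", then "*" before every number.
    expr = re.sub(r"([A-Z(])", r"+\1", formula)
    expr = re.sub(r"(\d+)", r"*\1", expr)
    # Bind each element symbol to a Bag holding one atom of itself
    # (this plays the role of var() in the Sage code).
    names = {s: Bag({s: 1}) for s in re.findall(r"[A-Z][a-z]?", formula)}
    return expr, dict(eval(expr, {"__builtins__": {}}, names))
```

With the input `(CH3)3COOC(CH3)3` the intermediate expression is exactly the string shown above, and the result is `{"C": 8, "H": 18, "O": 2}`.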

answered Sep 22, 2015 at 23:28
\$\endgroup\$
6
\$\begingroup\$

Python 3, 414 bytes

I hope that the order of the result doesn't count.

import re
t=input().replace("[", '(').replace("]", ')')
d={}
p,q="(\([^\(\)]*\))(\d*)","([A-Z][a-z]*)(\d*)"
for i in re.findall(q,t):t = t.replace(i[0]+i[1],i[0]*(1if i[1]==''else int(i[1])))
r=re.findall(p,t)
while len(r)>0:t=t.replace(r[0][0]+r[0][1],r[0][0][1:-1]*(1if r[0][1]==''else int(r[0][1])));r=re.findall(p,t)
for i in re.findall(q[:-5], t):d[i]=d[i]+1if i in d else 1
for i in d:print(i+': '+str(d[i]))

Try it online!

answered Sep 22, 2015 at 18:40
\$\endgroup\$
5
\$\begingroup\$

Javascript (ES6), <s>286</s> 284

Not that much shorter than the other ES6 one, but I gave it my best. Note: this will error out if you give it an empty string or most invalid inputs. It also expects every group to have a count of more than 1 (i.e., no CO[OH]). If this breaks any challenge rules, let me know.

a=>(b=[],c={},a.replace(/([A-Z][a-z]*)(?![0-9a-z])/g, "$11").match(/[A-Z][a-z]*|[0-9]+|[\[\(]/g).reverse().map(d=>(d*1==d&&b.push(d*1),d.match(/\(|\[/)&&b.pop(),d.match(/[A-Z]/)&&eval('e=b.reduce((f,g)=>f*g,1),c[d]=c[d]?c[d]+e:e,b.pop()'))),eval('g="";for(x in c)g+=x+`: ${c[x]}\n`'))

Uses a stack-based approach. First, it preprocesses the string to add 1 to any element without a number, e.g. Co3(Fe(CN)6)2 becomes Co3(Fe1(C1N1)6)2. Then it loops through the tokens in reverse order, keeping the currently active multipliers on a stack, and accumulates element counts.

a=>(
  // b: stack, c: accumulator
  b=[], c={},
  // adds the 1 to every element that doesn't have a count
  a.replace(/([A-Z][a-z]*)(?![0-9a-z])/g, "$11")
    // gathers a list of all the elements, counts, and grouping chars
    .match(/[A-Z][a-z]*|[0-9]+|[\[\(]/g)
    // loops in reverse order
    .reverse().map(d=>(
      // d*1 is shorthand here for parseInt(d)
      // d*1==d: true only if d is a number
      // if it's a number, add it to the stack
      d * 1 == d && b.push(d * 1),
      // if there's an opening grouping character, pop the last item
      // the item being popped is that group's count which isn't needed anymore
      d.match(/\(|\[/) && b.pop(),
      // if it's an element, update the accumulator
      d.match(/[A-Z]/) && eval('
        // multiplies out the current stack
        e = b.reduce((f, g)=> f * g, 1),
        // if the element exists, add to it, otherwise create an index for it
        c[d] = c[d] ? c[d] + e : e,
        // pops this element's count to get ready for the next element
        b.pop()
      ')
    )),
  // turns the accumulator into an output string and returns the string
  eval('
    g="";
    // loops through each item of the accumulator and adds it to the string
    // for loops in eval always return the last statement in the for loop
    // which in this case evaluates to g
    for(x in c)
      g+=x+`: ${c[x]}\n`
  ')
)
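
The reverse scan described in this answer can also be sketched in Python, which may be easier to step through than the annotated JavaScript. This is a re-expression under the same assumption (every group is followed by a count), restricted to round parentheses; `atoms_reverse_scan` is a made-up name.

```python
import math
import re


def atoms_reverse_scan(formula):
    # Give every element an explicit count:
    # Co3(Fe(CN)6)2 -> Co3(Fe1(C1N1)6)2
    padded = re.sub(r"([A-Z][a-z]*)(?![0-9a-z])", r"\g<1>1", formula)
    # Keep elements, numbers and opening parentheses. Closing parentheses
    # are not needed because the number after them is already a token.
    tokens = re.findall(r"[A-Z][a-z]*|[0-9]+|\(", padded)
    multipliers = []  # stack of counts that apply to what we scan next
    counts = {}
    for tok in reversed(tokens):
        if tok.isdigit():
            multipliers.append(int(tok))
        elif tok == "(":
            multipliers.pop()  # leaving a group: drop that group's count
        else:
            # An element is worth the product of all active multipliers.
            counts[tok] = counts.get(tok, 0) + math.prod(multipliers)
            multipliers.pop()  # drop this element's own count
    return padded, counts
```

For `Co3(Fe(CN)6)2` this returns the padded string `Co3(Fe1(C1N1)6)2` and the counts Co 3, Fe 2, C 12, N 12.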

Fiddle

answered Sep 22, 2015 at 20:11
\$\endgroup\$
5
\$\begingroup\$

Perl, <s>177</s> 172 bytes

171 bytes code + 1 byte command line parameter

Ok, so I may have got a little carried away with regex on this one...

s/(?>[A-Z][a-z]?)(?!\d)/$&1/g;while(s/\(([A-Z][a-z]?)(\d+)(?=\w*\W(\d+))/$2.($3*$4).$1/e||s/([A-Z][a-z]?)(\d*)(\w*)\1(\d*)/$1.($2+$4).$3/e||s/\(\)\d+//g){};s/\d+/: $&\n/g

Usage example:

echo "(CH3)3COOC(CH3)3" | perl -p entry.pl
answered Sep 22, 2015 at 20:57
\$\endgroup\$
2
\$\begingroup\$

Mathematica, 152 bytes

f=TableForm@Cases[PowerExpand@Log@ToExpression@StringReplace[#,{a:(_?UpperCaseQ~~___?LowerCaseQ):>"\""<>a<>"\"",b__?DigitQ:>"^"<>b}],a_. Log[b_]:>{b,a}]&

The above defines a function f which takes a string as input. The function takes the string and wraps each element name into quotes and adds an infix exponentiation operator before each number, then interprets the string as an expression:

"YBa2Cu3O7" -> ""Y""Ba"^2"Cu"^3"O"^7" -> "Y" "Ba"^2 "Cu"^3 "O"^7

Then it takes the logarithm of that and expands it out (Mathematica doesn't care what it is asked to take the logarithm of :)):

Log["Y" "Ba"^2 "Cu"^3 "O"^7] -> Log["Y"] + 2 Log["Ba"] + 3 Log["Cu"] + 7 Log["O"]

and then it finds all occurrences of multiplication of a Log by a number and parses it into the form of {log-argument, number} and outputs those in a table. Some examples:

f@"K4(ON(SO3)2)2"
K 4
N 2
O 14
S 4
f@"(CH3)3COOC(CH3)3"
C 8
H 18
O 2
f@"Co3(Fe(CN)6)2"
C 12
Co 3
Fe 2
N 12
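
To see why the same trick also handles nested groups, here is the expansion for K4(ON(SO3)2)2 written out by hand. It treats the element symbols as formal positive quantities, so that the identities log(ab) = log a + log b and log(a^n) = n log a can be applied freely, which is the kind of assumption PowerExpand makes.

```latex
\begin{align*}
\log\bigl(\mathrm{K}^{4}\,(\mathrm{O}\,\mathrm{N}\,(\mathrm{S}\,\mathrm{O}^{3})^{2})^{2}\bigr)
&= 4\log\mathrm{K} + 2\log\bigl(\mathrm{O}\,\mathrm{N}\,(\mathrm{S}\,\mathrm{O}^{3})^{2}\bigr) \\
&= 4\log\mathrm{K} + 2\log\mathrm{O} + 2\log\mathrm{N} + 4\log\bigl(\mathrm{S}\,\mathrm{O}^{3}\bigr) \\
&= 4\log\mathrm{K} + 2\log\mathrm{O} + 2\log\mathrm{N} + 4\log\mathrm{S} + 12\log\mathrm{O} \\
&= 4\log\mathrm{K} + 2\log\mathrm{N} + 14\log\mathrm{O} + 4\log\mathrm{S}
\end{align*}
```

The coefficients 4, 2, 14 and 4 are the atom counts listed for that input.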
answered Oct 26, 2015 at 9:42
\$\endgroup\$
1
\$\begingroup\$

Java, 827 bytes

import java.util.*;class C{String[]x=new String[10];public static void main(String[]a){new C(a[0]);}C(String c){I p=new I();int[]d=d(c,p);for(int i=0;i<10;i++)if(x[i]!=null)System.out.println(x[i]+": "+d[i]);}int[]d(String c,I p){int[]f;int i,j;Vector<int[]>s=new Vector();while(p.v<c.length()){char q=c.charAt(p.v);if(q=='(')s.add(d(c,p.i()));if(q==')')break;if(q>='A'&&q<='Z'){f=new int[10];char[]d=new char[]{c.charAt(p.v),0};i=1;if(c.length()-1>p.v){d[1]=c.charAt(p.v+1);if(d[1]>='a'&&d[1]<='z'){i++;p.i();}}String h=new String(d,0,i);i=0;for(String k:x){if(k==null){x[i]=h;break;}if(k.equals(h))break;i++;}f[i]++;s.add(f);}if(q>='0'&&q<='9'){j=c.charAt(p.v)-'0';f=s.get(s.size()-1);for(i=0;i<10;)f[i++]*=j;}p.i();}f=new int[10];for(int[]w:s){j=0;for(int k:w)f[j++]+=k;}return f;}class I{int v=0;I i(){v++;return this;}}}

Git repository w/ ungolfed source (not perfect parity, ungolfed supports multi-character numbers).

Been a while, figured I'd give Java some representation. Definitely not going to win any awards :).

answered Sep 24, 2015 at 3:02
\$\endgroup\$
1
\$\begingroup\$

ES6, 198 bytes

f=s=>(t=s.replace(/(([A-Z][a-z]?)|\(([A-Za-z]+)\))(\d+)/,(a,b,x,y,z)=>(x||y).repeat(z)))!=s?f(t):(m=new Map,s.match(/[A-Z][a-z]?/g).map(x=>m.set(x,-~m.get(x))),[...m].map(([x,y])=>x+": "+y).join`\n`)

Where \n is a literal newline character.

Ungolfed:

function f(str) {
    // replace all multiple elements with individual copies
    // then replace all groups with copies working outwards
    while (/([A-Z][a-z]?)(\d+)/.test(str) || /\(([A-Za-z]+)\)(\d+)/.test(str)) {
        str = RegExp.leftContext + RegExp.$1.repeat(RegExp.$2) + RegExp.rightContext;
    }
    // count the number of each element in the expansion
    map = new Map;
    str.match(/[A-Z][a-z]?/g).forEach(function(x) {
        if (!map.has(x)) map.set(x, 1);
        else map.set(x, map.get(x) + 1);
    });
    // convert to string
    res = "";
    map.forEach(function(value, key) {
        res += key + ": " + value + "\n";
    });
    return res;
}
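
The same expand-until-nothing-changes idea, sketched in Python for comparison. It replaces every match in one pass instead of one match per step, which should reach the same fixed point; `EXPAND` and `atoms_by_expansion` are invented names. Like the code above, it assumes element symbols have at most one lowercase letter.

```python
import re
from collections import Counter

# Either an element followed by a count, or a group that contains only
# letters (so all inner counts are already expanded) followed by a count.
EXPAND = re.compile(r"(?:([A-Z][a-z]?)|\(([A-Za-z]+)\))(\d+)")


def atoms_by_expansion(formula):
    while True:
        expanded = EXPAND.sub(
            lambda m: (m.group(1) or m.group(2)) * int(m.group(3)), formula
        )
        if expanded == formula:  # fixed point: nothing left to expand
            break
        formula = expanded
    # Every atom is now written out individually, so just count symbols.
    return formula, Counter(re.findall(r"[A-Z][a-z]?", formula))
```

For `Ba(NO2)2` the passes are `Ba(NOO)2`, then `BaNOONOO`, which counts as Ba 1, N 2, O 4.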
answered Jan 16, 2016 at 1:22
\$\endgroup\$
1
\$\begingroup\$

Perl 5 -MList::Util=pairmap -n, 94 bytes

s/\(([^()]+?)\)(\d)/$1x$2/e&&redo;s/([A-Z][a-z]*)(\d?)/$k{$1}+=$2||1/ge;pairmap{say"$a: $b"}%k

Try it online!

answered Dec 22, 2020 at 1:35
\$\endgroup\$
1
\$\begingroup\$

Pip -l, <s>85</s> <s>77</s> 62 bytes

Takes the formula as a command-line argument, and uses the -l flag for proper output formatting.

_.": ".(_N Y(VaRl:XU+KXL`"&"`RXI`X&`R`[\d"][("]`{aJ'.}l))MUQy

Try it online!

The main trick is to transform the formula via regex replacements into a Pip expression. This, when eval'd, will do the repetition and resolve parentheses for us. We then post-process a bit to get the atom counts and format everything correctly.

Ungolfed, with comments:

 a is command-line arg (implicit)
l:`[A-Z][a-z]*` Regex matching element symbols
aR: l `"&"` Replace each symbol in a with symbol wrapped in quotes
aR: XI `X&` Add X before each number (XI is a regex matching integers)
aR: `[\d"][("]` {aJ'.} Insert . between a digit or " and a ( or "
Y (Va)@l Eval result, findall matches of l, and yank that list into y
_.": ".(_Ny) M UQy Map function to y uniquified:
 Each element, concatenated with ": ", concatenated with
 the count of that element in y
 Print that list, one element per line (implicit, -l flag)

Here's how the input Co3(Fe(CN)6)2 is transformed:

Co3(Fe(CN)6)2
"Co"3("Fe"("C""N")6)2
"Co"X3("Fe"("C""N")X6)X2
"Co"X3.("Fe".("C"."N")X6)X2
CoCoCoFeCNCNCNCNCNCNFeCNCNCNCNCNCN

Then:

["Co" "Co" "Co" "Fe" "C" "N" "C" "N" "C" "N" "C" "N" "C" "N" "C" "N" "Fe" "C" "N" "C" "N" "C" "N" "C" "N" "C" "N" "C" "N"]
["Co" "Fe" "C" "N"]
[3 2 12 12]
["Co: 3" "Fe: 2" "C: 12" "N: 12"]
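
The same transform-then-eval trick carries over to Python if Pip's `X` (repeat) is swapped for `*` and `.` (concatenate) for `+`. This is a loose analogue rather than a translation: `SYMBOL` and `atoms_via_string_expr` are made-up names, the sketch assumes every closing parenthesis is followed by a count, and `eval` should only see trusted input.

```python
import re

SYMBOL = r"[A-Z][a-z]*"


def atoms_via_string_expr(formula):
    # Wrap each element symbol in double quotes: Co -> "Co"
    expr = re.sub(SYMBOL, lambda m: '"%s"' % m.group(0), formula)
    # Put the repetition operator before each number: 3 -> *3
    expr = re.sub(r"(\d+)", r"*\1", expr)
    # Insert "+" wherever a digit or quote is directly followed by "(" or a quote.
    expr = re.sub(r'(?<=[\d"])(?=[("])', "+", expr)
    # Evaluating the expression writes every atom out individually.
    expanded = eval(expr, {"__builtins__": {}})
    symbols = re.findall(SYMBOL, expanded)
    # Count each distinct symbol, keeping first-appearance order.
    lines = [f"{s}: {symbols.count(s)}" for s in dict.fromkeys(symbols)]
    return expr, lines
```

For `Co3(Fe(CN)6)2` the expression becomes `"Co"*3+("Fe"+("C"+"N")*6)*2`, and the returned lines are `Co: 3`, `Fe: 2`, `C: 12`, `N: 12`.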
answered Apr 22, 2016 at 4:59
\$\endgroup\$
0
\$\begingroup\$

JavaScript (Node.js), 133 bytes

x=>x.replace(/[A-Z][a-z]*|\d+|./g,c=>d=[c<'/'?c<')'?'}':'{':+c?`for(_ in"${Array(-~c)}")`:`o.${c}=-~o.${c};`]+d,d='',o={})+eval(d)&&o

Try it online!

This would still be shorter than the existing JavaScript answers even when restricted to the same older ECMAScript features.

answered Apr 2, 2022 at 4:49
\$\endgroup\$
