The Challenge
Write a program that can break down an input chemical formula (see below), and output its respective atoms in the form element: atom-count.
Input
Sample input:
H2O
Your input will always contain at least one element, but no more than ten. Your program should accept inputs that contain parentheses, which may be nested.
Elements in the strings will always match [A-Z][a-z]*, meaning they will always start with an uppercase letter. Numbers will always be single digits.
Output
Sample output (for the above input):
H: 2
O: 1
Your output can be optionally followed by a newline.
Breaking Down Molecules
Numbers to the right of a set of parentheses are distributed to each element inside:
Mg(OH)2
Should output:
Mg: 1
O: 2
H: 2
The same principle applies to individual atoms:
O2
Should output:
O: 2
And also chaining:
Ba(NO2)2
Should output:
Ba: 1
N: 2
O: 4
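For reference, the distribution rules above can be sketched as a small recursive parser. This is only an illustrative, ungolfed sketch and not part of the challenge; the names `count_atoms`, `read_number` and `read_group` are made up for this example, and the input is assumed to be well formed.

```python
from collections import Counter


def count_atoms(formula):
    """Return a Counter mapping each element symbol to its atom count."""
    pos = 0  # current read position in the formula

    def read_number():
        # Read an optional run of digits; a missing number means 1.
        nonlocal pos
        start = pos
        while pos < len(formula) and formula[pos].isdigit():
            pos += 1
        return int(formula[start:pos]) if pos > start else 1

    def read_group():
        # Read items until the end of the input or a closing parenthesis.
        nonlocal pos
        counts = Counter()
        while pos < len(formula) and formula[pos] != ")":
            if formula[pos] == "(":
                pos += 1              # skip "("
                inner = read_group()  # nested group
                pos += 1              # skip ")"
            else:
                start = pos
                pos += 1              # the uppercase letter
                while pos < len(formula) and formula[pos].islower():
                    pos += 1
                inner = Counter({formula[start:pos]: 1})
            multiplier = read_number()
            # Distribute the multiplier over everything in the item.
            for element, n in inner.items():
                counts[element] += n * multiplier
        return counts

    return read_group()


if __name__ == "__main__":
    for element, n in count_atoms("Ba(NO2)2").items():
        print(f"{element}: {n}")
```

The Counter keeps elements in the order they are first seen, which happens to match the order used in the examples.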
Testcases
Ba(PO3)2
->
Ba: 1
P: 2
O: 6
C13H18O2
->
C: 13
H: 18
O: 2
K4(ON(SO3)2)2
->
K: 4
O: 14
N: 2
S: 4
(CH3)3COOC(CH3)3
->
C: 8
H: 18
O: 2
(C2H5)2NH
->
C: 4
H: 11
N: 1
Co3(Fe(CN)6)2
->
Co: 3
Fe: 2
C: 12
N: 12
Scoreboard
For your score to appear on the board, it should be in this format:
# Language, Score
Or if you earned a bonus:
# Language, Score (Bytes - Bonus%)
Edit: Square brackets are no longer a part of the question. Any answers posted before 3AM UTC time, September 23, are safe and will not be affected by this change.
CJam, <s>59</s> 57 bytes
q{:Ci32/")("C#-"[ ] aC~* Ca C+"S/=~}%`La`-S%$e`{~": "@N}/
Try it online in the CJam interpreter.
How it works
q                  e# Read all input from STDIN.
{                  e# For each character:
  :Ci              e#   Save it in C and cast to integer.
  32/              e#   Divide the code point by 32. This pushes
                   e#   2 for uppercase, 3 for lowercase and 1 for non-letters.
  ")("C#           e#   Find the index of C in that string. (-1 if not found.)
  -                e#   Subtract. This pushes 0 for (, 1 for ), 2 for digits,
                   e#   3 for uppercase letters and 4 for lowercase letters.
  "[ ] aC~* Ca C+"
  S/               e#   Split it at spaces into ["[" "]" "aC~*" "Ca" "C+"].
  =~               e#   Select and evaluate the corresponding chunk.
                   e#     (   : [    : Begin an array.
                   e#     )   : ]    : End an array.
                   e#     0-9 : aC~* : Wrap the top of the stack into an array
                   e#                  and repeat that array eval(C) times.
                   e#     A-Z : Ca   : Push "C".
                   e#     a-z : C+   : Append C to the string on top of the stack.
}%                 e#
`                  e# Push a string representation of the resulting array.
                   e# For input (Au(CH)2)2, this pushes the string
                   e# [[["Au" [["C" "H"] ["C" "H"]]] ["Au" [["C" "H"] ["C" "H"]]]]]
La`                e# Push the string [""].
-                  e# Remove square brackets and double quotes from the first string.
S%                 e# Split the result at runs of spaces.
$e`                e# Sort and perform run-length encoding.
{                  e# For each pair [run-length string]:
  ~                e#   Dump both on the stack.
  ": "             e#   Push that string.
  @N               e#   Rotate the run-length on top and push a linefeed.
}/                 e#
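For readers who don't know CJam, the same idea can be sketched in Python. This is a rough analogue and not a translation of the golfed program: the function names `count_atoms_nested` and `flatten` are made up here, and, like the per-character CJam loop, it assumes every number is a single digit.

```python
from itertools import groupby


def flatten(item):
    """Yield every string found in an arbitrarily nested list."""
    if isinstance(item, str):
        yield item
    else:
        for sub in item:
            yield from flatten(sub)


def count_atoms_nested(formula):
    stack = [[]]  # one list under construction per open parenthesis
    for ch in formula:
        if ch == "(":
            stack.append([])                  # begin an array
        elif ch == ")":
            group = stack.pop()               # end an array
            stack[-1].append(group)
        elif ch.isdigit():
            last = stack[-1].pop()            # wrap the last item and repeat it
            stack[-1].append([last] * int(ch))
        elif ch.isupper():
            stack[-1].append(ch)              # start a new element symbol
        else:
            stack[-1][-1] += ch               # extend the current symbol
    # Sort the flattened symbols and run-length encode them.
    symbols = sorted(flatten(stack[0]))
    return [(symbol, len(list(run))) for symbol, run in groupby(symbols)]


if __name__ == "__main__":
    for symbol, n in count_atoms_nested("(Au(CH)2)2"):
        print(f"{symbol}: {n}")
```

As in the CJam answer, the output comes out sorted by element symbol rather than in input order.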
Python 3, <s>157</s> 154 bytes
import re
s=re.sub
f=s("[()',]",'',str(eval(s(',?(\d+)',r'*\1,',s('([A-Z][a-z]*)',r'("\1",),',input()))))).split()
for c in set(f):print(c+":",f.count(c))
Only supports input using regular brackets.
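To make the eval trick easier to follow, here is an ungolfed sketch that exposes the intermediate strings. The function name `count_atoms_via_eval` and the variable names are made up for illustration, and `eval` is only reasonable here because the input is assumed to be a well-formed formula.

```python
import re
from collections import Counter


def count_atoms_via_eval(formula):
    # Step 1: wrap every element symbol in a one-item tuple,
    # e.g. Mg -> ("Mg",),
    step1 = re.sub(r"([A-Z][a-z]*)", r'("\1",),', formula)
    # Step 2: turn each number (and the comma before it, if any) into a
    # tuple repetition, e.g. ,2 -> *2,
    step2 = re.sub(r",?(\d+)", r"*\1,", step1)
    # Step 3: let Python evaluate the nested tuple expression.
    nested = eval(step2)
    # Step 4: strip the tuple punctuation from its repr and count the symbols.
    flat = re.sub(r"[()',]", "", str(nested)).split()
    return step1, step2, Counter(flat)


if __name__ == "__main__":
    step1, step2, counts = count_atoms_via_eval("Mg(OH)2")
    print(step1)
    print(step2)
    for element, n in counts.items():
        print(f"{element}: {n}")
```

The optional comma in the second pattern is what makes a count bind to the tuple right before it instead of starting a new tuple item.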
Before creating the golfed solution using eval above I created this reference solution, which I found very elegant:
import re, collections

parts = filter(bool, re.split(r'([A-Z][a-z]*|\(|\))', input()))
stack = [[]]
for part in parts:
    if part == '(':
        stack.append([])
    elif part == ')':
        stack[-2].append(stack.pop())
    elif part.isdigit():
        stack[-1].append(int(part) * stack[-1].pop())
    else:
        stack[-1].append([part])

count = collections.Counter()
while stack:
    if isinstance(stack[-1], list):
        stack.extend(stack.pop())
    else:
        count[stack.pop()] += 1

for e, i in count.items():
    print("{}: {}".format(e, i))
Pyth, <s>66</s> 65 bytes
VrSc-`v::z"([A-Z][a-z]*)""('\\1',),"",?(\d+)""*\\1,"`(k))8j": "_N
Port of my Python answer. Only supports input using regular brackets.
+1. Three answers in an hour? Nice. – Zach Gates, Sep 22, 2015 at 19:51
JavaScript ES6, 366 bytes
function f(i){function g(a,b,c){b=b.replace(/[[(]([^[(\])]+?)[\])](\d*)/g,g).replace(/([A-Z][a-z]?)(\d*)/g,function(x,y,z){return y+((z||1)*(c||1))});return(b.search(/[[(]/)<0)?b:g(0,b)}return JSON.stringify(g(0,i).split(/(\d+)/).reduce(function(q,r,s,t){(s%2)&&(q[t[s-1]]=+r+(q[t[s-1]]||0));return q},{})).replace(/["{}]/g,'').replace(/:/g,': ').replace(/,/g,'\n')}
JS Fiddle: https://jsfiddle.net/32tunzkr/1/
I'm pretty sure this can be shortened, but I need to get back to work. ;-)
I'm pretty sure it can be shortened as well. Since you claim to be using ES6, you could start by using the big-arrow notation to create functions, and the implicit return statement. That should be enough for now. – Ismael Miguel, Sep 22, 2015 at 22:59
You also use replace a lot, so you can save some bytes by using xyz[R='replace'](...) the first time and abc[R](...) each subsequent time. – DankMemes, Sep 23, 2015 at 15:25
SageMath, <s>156</s> 148 bytes
import re
i=input()
g=re.sub
var(re.findall("[A-Z][a-z]?",i))
print g("(\d+).(\S+)\D*",r"\2: \1\n",`eval(g("(\d+)",r"*\1",g("([A-Z(])",r"+\1",i)))`)
Try it online here (hopefully link will work, might need an online account)
Note: If trying online, you will need to replace input() with the string (e.g. "(CH3)3COOC(CH3)3")
Explanation
Sage allows you to simplify algebraic expressions, providing they are in the right format (see 'symbolic manipulation' of this link). The regexes inside the eval() basically serve to get the input string into the right format, for example something like:
+(+C+H*3)*3+C+O+O+C+(+C+H*3)*3
eval() will then simplify this to: 8*C + 18*H + 2*O, and then it's just a matter of formatting the output with another regex substitution.
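The same rewrite-then-evaluate idea can be tried in plain Python without Sage. The sketch below is only an analogue: the `Atoms` class and the function `count_atoms_symbolic` are made up for this illustration and stand in for Sage's symbolic variables, supporting just the three operations the rewritten expression needs.

```python
import re
from collections import Counter


class Atoms:
    """A tiny stand-in for a symbolic sum such as 8*C + 18*H + 2*O."""

    def __init__(self, counts):
        self.counts = Counter(counts)

    def __add__(self, other):   # C + H
        return Atoms(self.counts + other.counts)

    def __mul__(self, n):       # H*3
        return Atoms({k: v * n for k, v in self.counts.items()})

    def __pos__(self):          # the leading + in +C or +( ... )
        return self


def count_atoms_symbolic(formula):
    # Same two substitutions as in the answer: put + before every uppercase
    # letter or "(", then * before every number.
    expr = re.sub(r"([A-Z(])", r"+\1", formula)
    expr = re.sub(r"(\d+)", r"*\1", expr)
    # One "variable" per element symbol found in the input.
    names = {s: Atoms({s: 1}) for s in re.findall(r"[A-Z][a-z]*", formula)}
    # eval is acceptable only because the input is assumed well formed.
    result = eval(expr, {"__builtins__": {}}, names)
    return dict(result.counts)


if __name__ == "__main__":
    for element, n in count_atoms_symbolic("(CH3)3COOC(CH3)3").items():
        print(f"{element}: {n}")
```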
Python 3, 414 bytes
I hope that the order of the result doesn't count.
import re
t=input().replace("[", '(').replace("]", ')')
d={}
p,q="(\([^\(\)]*\))(\d*)","([A-Z][a-z]*)(\d*)"
for i in re.findall(q,t):t = t.replace(i[0]+i[1],i[0]*(1if i[1]==''else int(i[1])))
r=re.findall(p,t)
while len(r)>0:t=t.replace(r[0][0]+r[0][1],r[0][0][1:-1]*(1if r[0][1]==''else int(r[0][1])));r=re.findall(p,t)
for i in re.findall(q[:-5], t):d[i]=d[i]+1if i in d else 1
for i in d:print(i+': '+str(d[i]))
Javascript (ES6), <s>286</s> 284
Not that much shorter than the other ES6 one but I gave it my best. Note: this will error out if you give it an empty string or most invalid inputs. Also expects all groups to have a count of more than 1 (ie, no CO[OH]). If this breaks any challenge rules, let me know.
a=>(b=[],c={},a.replace(/([A-Z][a-z]*)(?![0-9a-z])/g, "$11").match(/[A-Z][a-z]*|[0-9]+|[\[\(]/g).reverse().map(d=>(d*1==d&&b.push(d*1),d.match(/\(|\[/)&&b.pop(),d.match(/[A-Z]/)&&eval('e=b.reduce((f,g)=>f*g,1),c[d]=c[d]?c[d]+e:e,b.pop()'))),eval('g="";for(x in c)g+=x+`: ${c[x]}\n`'))
Uses a stack-based approach. First, it preprocesses the string to add 1 to any element without a number, ie Co3(Fe(CN)6)2 becomes Co3(Fe1(C1N1)6)2. Then it loops through in reverse order and accumulates element counts.
a=>(
  // b: stack, c: accumulator
  b=[], c={},
  // adds the 1 to every element that doesn't have a count
  a.replace(/([A-Z][a-z]*)(?![0-9a-z])/g, "$11")
    // gathers a list of all the elements, counts, and grouping chars
    .match(/[A-Z][a-z]*|[0-9]+|[\[\(]/g)
    // loops in reverse order
    .reverse().map(d=>(
      // d*1 is shorthand here for parseInt(d)
      // d*1==d: true only if d is a number
      // if it's a number, add it to the stack
      d * 1 == d && b.push(d * 1),
      // if there's an opening grouping character, pop the last item
      // the item being popped is that group's count which isn't needed anymore
      d.match(/\(|\[/) && b.pop(),
      // if it's an element, update the accumulator
      d.match(/[A-Z]/) && eval('
        // multiplies out the current stack
        e = b.reduce((f, g)=> f * g, 1),
        // if the element exists, add to it, otherwise create an index for it
        c[d] = c[d] ? c[d] + e : e,
        // pops this element's count to get ready for the next element
        b.pop()
      ')
    )),
  // turns the accumulator into an output string and returns the string
  eval('
    g="";
    // loops through each item of the accumulator and adds it to the string
    // for loops in eval always return the last statement in the for loop
    // which in this case evaluates to g
    for(x in c)
      g+=x+`: ${c[x]}\n`
  ')
)
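The reverse scan with a stack of multipliers may be easier to see in a short Python sketch. This is an illustrative rewrite under the same restriction mentioned in this answer (every parenthesised group must be followed by a count); the function name `count_atoms_reverse_scan` is made up for the example.

```python
import math
import re
from collections import Counter


def count_atoms_reverse_scan(formula):
    # Give every element an explicit count, e.g. Fe( becomes Fe1(
    explicit = re.sub(r"([A-Z][a-z]*)(?![0-9a-z])", r"\g<1>1", formula)
    # Elements, numbers and opening parentheses; ")" is not needed because
    # the number that follows it is met first when scanning backwards.
    tokens = re.findall(r"[A-Z][a-z]*|\d+|\(", explicit)

    multipliers = []  # counts currently in effect, innermost last
    counts = Counter()
    for token in reversed(tokens):
        if token.isdigit():
            multipliers.append(int(token))
        elif token == "(":
            multipliers.pop()  # this group's count no longer applies
        else:
            # An element: its total is the product of all active counts.
            counts[token] += math.prod(multipliers)
            multipliers.pop()  # drop the element's own count
    return counts


if __name__ == "__main__":
    for element, n in count_atoms_reverse_scan("Co3(Fe(CN)6)2").items():
        print(f"{element}: {n}")
```

Because the scan runs right to left, elements are recorded in reverse order of appearance.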
Perl, <s>177</s> 172 bytes
171 bytes code + 1 byte command line parameter
Ok, so I may have got a little carried away with regex on this one...
s/(?>[A-Z][a-z]?)(?!\d)/$&1/g;while(s/\(([A-Z][a-z]?)(\d+)(?=\w*\W(\d+))/$2.($3*$4).$1/e||s/([A-Z][a-z]?)(\d*)(\w*)\1(\d*)/$1.($2+$4).$3/e||s/\(\)\d+//g){};s/\d+/: $&\n/g
Usage example:
echo "(CH3)3COOC(CH3)3" | perl -p entry.pl
Mathematica, 152 bytes
f=TableForm@Cases[PowerExpand@Log@ToExpression@StringReplace[#,{a:(_?UpperCaseQ~~___?LowerCaseQ):>"\""<>a<>"\"",b__?DigitQ:>"^"<>b}],a_. Log[b_]:>{b,a}]&
The above defines a function f which takes a string as input. The function takes the string and wraps each element name into quotes and adds an infix exponentiation operator before each number, then interprets the string as an expression:
"YBa2Cu3O7" -> ""Y""Ba"^2"Cu"^3"O"^7" -> "Y" "Ba"^2 "Cu"^3 "O"^7
Then it takes the logarithm of that and expands it out (Mathematica doesn't care what it is taking the logarithm of :)):
Log["Y" "Ba"^2 "Cu"^3 "O"^7] -> Log["Y"] + 2 Log["Ba"] + 3 Log["Cu"] + 7 Log["O"]
and then it finds all occurrences of multiplication of a Log by a number and parses it into the form of {log-argument, number} and outputs those in a table. Some examples:
f@"K4(ON(SO3)2)2"
K 4
N 2
O 14
S 4
f@"(CH3)3COOC(CH3)3"
C 8
H 18
O 2
f@"Co3(Fe(CN)6)2"
C 12
Co 3
Fe 2
N 12
Java, 827 bytes
import java.util.*;class C{String[]x=new String[10];public static void main(String[]a){new C(a[0]);}C(String c){I p=new I();int[]d=d(c,p);for(int i=0;i<10;i++)if(x[i]!=null)System.out.println(x[i]+": "+d[i]);}int[]d(String c,I p){int[]f;int i,j;Vector<int[]>s=new Vector();while(p.v<c.length()){char q=c.charAt(p.v);if(q=='(')s.add(d(c,p.i()));if(q==')')break;if(q>='A'&&q<='Z'){f=new int[10];char[]d=new char[]{c.charAt(p.v),0};i=1;if(c.length()-1>p.v){d[1]=c.charAt(p.v+1);if(d[1]>='a'&&d[1]<='z'){i++;p.i();}}String h=new String(d,0,i);i=0;for(String k:x){if(k==null){x[i]=h;break;}if(k.equals(h))break;i++;}f[i]++;s.add(f);}if(q>='0'&&q<='9'){j=c.charAt(p.v)-'0';f=s.get(s.size()-1);for(i=0;i<10;)f[i++]*=j;}p.i();}f=new int[10];for(int[]w:s){j=0;for(int k:w)f[j++]+=k;}return f;}class I{int v=0;I i(){v++;return this;}}}
Git repository w/ ungolfed source (not perfect parity, ungolfed supports multi-character numbers).
Been a while, figured I'd give Java some representation. Definitely not going to win any awards :).
ES6, 198 bytes
f=s=>(t=s.replace(/(([A-Z][a-z]?)|\(([A-Za-z]+)\))(\d+)/,(a,b,x,y,z)=>(x||y).repeat(z)))!=s?f(t):(m=new Map,s.match(/[A-Z][a-z]?/g).map(x=>m.set(x,-~m.get(x))),[...m].map(([x,y])=>x+": "+y).join`\n`)
Where \n is a literal newline character.
Ungolfed:
function f(str) {
    // replace all multiple elements with individual copies
    // then replace all groups with copies working outwards
    while (/([A-Z][a-z]?)(\d+)/.test(str) || /\(([A-Za-z]+)\)(\d+)/.test(str)) {
        str = RegExp.leftContext + RegExp.$1.repeat(RegExp.$2) + RegExp.rightContext;
    }
    // count the number of each element in the expansion
    map = new Map;
    str.match(/[A-Z][a-z]?/g).forEach(function(x) {
        if (!map.has(x)) map.set(x, 1);
        else map.set(x, map.get(x) + 1);
    });
    // convert to string
    res = "";
    map.forEach(function(value, key) {
        res += key + ": " + value + "\n";
    });
    return res;
}
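The expand-then-count idea carries over to other languages too. Here is a rough Python sketch of it; the names `PATTERN`, `expand` and `count_atoms_by_expansion` are made up for this illustration, and unlike the golfed version it allows element symbols with more than one lowercase letter.

```python
import re
from collections import Counter

# Either an element symbol or a parenthesised run of letters, followed by a count.
PATTERN = re.compile(r"(?:([A-Z][a-z]*)|\(([A-Za-z]+)\))(\d+)")


def expand(formula):
    """Repeat counted elements and groups until no counts are left."""
    while True:
        expanded = PATTERN.sub(
            lambda m: (m.group(1) or m.group(2)) * int(m.group(3)), formula
        )
        if expanded == formula:
            return formula
        formula = expanded


def count_atoms_by_expansion(formula):
    # After expansion every atom appears literally, so just count the symbols.
    return Counter(re.findall(r"[A-Z][a-z]*", expand(formula)))


if __name__ == "__main__":
    print(expand("(C2H5)2NH"))
    for element, n in count_atoms_by_expansion("(C2H5)2NH").items():
        print(f"{element}: {n}")
```

A group can only be expanded once its contents are pure letters, which is why the loop naturally works from the innermost parentheses outwards.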
Pip -l, <s>85</s> <s>77</s> 62 bytes
Takes the formula as a command-line argument, and uses the -l flag for proper output formatting.
_.": ".(_N Y(VaRl:XU+KXL`"&"`RXI`X&`R`[\d"][("]`{aJ'.}l))MUQy
The main trick is to transform the formula via regex replacements into a Pip expression. This, when eval'd, will do the repetition and resolve parentheses for us. We then post-process a bit to get the atom counts and format everything correctly.
Ungolfed, with comments:
                          a is command-line arg (implicit)
l:`[A-Z][a-z]*`           Regex matching element symbols
aR: l `"&"`               Replace each symbol in a with symbol wrapped in quotes
aR: XI `X&`               Add X before each number (XI is a regex matching integers)
aR: `[\d"][("]` {aJ'.}    Insert . between a digit or " and a ( or "
Y (Va)@l                  Eval result, findall matches of l, and yank that list into y
_.": ".(_Ny) M UQy        Map function to y uniquified:
                            Each element, concatenated with ": ", concatenated with
                            the count of that element in y
                          Print that list, one element per line (implicit, -l flag)
Here's how the input Co3(Fe(CN)6)2 is transformed:
Co3(Fe(CN)6)2
"Co"3("Fe"("C""N")6)2
"Co"X3("Fe"("C""N")X6)X2
"Co"X3.("Fe".("C"."N")X6)X2
CoCoCoFeCNCNCNCNCNCNFeCNCNCNCNCNCN
Then:
["Co" "Co" "Co" "Fe" "C" "N" "C" "N" "C" "N" "C" "N" "C" "N" "C" "N" "Fe" "C" "N" "C" "N" "C" "N" "C" "N" "C" "N" "C" "N"]
["Co" "Fe" "C" "N"]
[3 2 12 12]
["Co: 3" "Fe: 2" "C: 12" "N: 12"]
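The same transformation can be imitated in Python, where `*` repeats a string and `+` concatenates. The sketch below is only an analogue of the Pip steps: the function names `to_expression` and `count_atoms_by_string_eval` are made up, a trailing space is added to each symbol so the result can be split afterwards, and `eval` is assumed to be fed well-formed formulas only.

```python
import re
from collections import Counter


def to_expression(formula):
    # Wrap each element symbol in quotes (with a trailing space as separator).
    expr = re.sub(r"[A-Z][a-z]*", r'"\g<0> "', formula)
    # Put the repetition operator before each number.
    expr = re.sub(r"\d+", r"*\g<0>", expr)
    # Insert the concatenation operator between a digit, closing quote or ")"
    # and a following "(" or opening quote.
    return re.sub(r'(?<=[\d")])(?=[("])', "+", expr)


def count_atoms_by_string_eval(formula):
    # Evaluating the expression yields one long space-separated string.
    expanded = eval(to_expression(formula), {"__builtins__": {}})
    return Counter(expanded.split())


if __name__ == "__main__":
    print(to_expression("Co3(Fe(CN)6)2"))
    for element, n in count_atoms_by_string_eval("Co3(Fe(CN)6)2").items():
        print(f"{element}: {n}")
```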
JavaScript (Node.js), 133 bytes
x=>x.replace(/[A-Z][a-z]*|\d+|./g,c=>d=[c<'/'?c<')'?'}':'{':+c?`for(_ in"${Array(-~c)}")`:`o.${c}=-~o.${c};`]+d,d='',o={})+eval(d)&&o
This would remain shorter than the existing answers even if written in the same older ECMAScript.