Molecules to Atoms

Question 1

The Challenge

Write a program that can break down an input chemical formula (see below), and output its respective atoms in the form element: atom-count.

Input

Sample input:

H2O

Your input will always contain at least one element, but no more than ten. Your program should accept inputs that contain parentheses, which may be nested.

Elements in the strings will always match [A-Z][a-z]*, meaning they will always start with an uppercase letter. Numbers will always be single digits.

Output

Sample output (for the above input):

H: 2
O: 1

Your output can be optionally followed by a newline.

Breaking Down Molecules

Numbers to the right of a set of parentheses are distributed to each element inside:

Mg(OH)2

Should output:

Mg: 1
O: 2
H: 2

The same principle applies to individual atoms:

O2

Should output:

O: 2

And also chaining:

Ba(NO2)2

Should output:

Ba: 1
N: 2
O: 4

Testcases

Ba(PO3)2
->
Ba: 1
P: 2
O: 6
C13H18O2
->
C: 13
H: 18
O: 2
K4(ON(SO3)2)2
->
K: 4
O: 14
N: 2
S: 4
(CH3)3COOC(CH3)3
->
C: 8
H: 18
O: 2
(C2H5)2NH
->
C: 4
H: 11
N: 1
Co3(Fe(CN)6)2
->
Co: 3
Fe: 2
C: 12
N: 12

Scoreboard

For your score to appear on the board, it should be in this format:

# Language, Score

Or if you earned a bonus:

# Language, Score (Bytes - Bonus%)

function getURL(e){return"https://api.stackexchange.com/2.2/questions/"+QUESTION_ID+"/answers?page="+e+"&pagesize=100&order=desc&sort=creation&site=codegolf&filter="+ANSWER_FILTER}function getAnswers(){$.ajax({url:getURL(answer_page++),method:"get",dataType:"jsonp",crossDomain:!0,success:function(e){answers.push.apply(answers,e.items),answers_hash=[],answer_ids=[],e.items.forEach(function(e){var s=+e.share_link.match(/\d+/);answer_ids.push(s),answers_hash[s]=e}),useData(answers)}})}function getOwnerName(e){return e.owner.display_name}function useData(e){var s=[];e.forEach(function(e){var a=e.body.replace(/<s>.*<\/s>/,"").replace(/<strike>.*<\/strike>/,"");console.log(a),VALID_HEAD.test(a)&&s.push({user:getOwnerName(e),language:a.match(VALID_HEAD)[1],score:+a.match(VALID_HEAD)[2],link:e.share_link})}),s.sort(function(e,s){var a=e.score,r=s.score;return a-r}),s.forEach(function(e,s){var a=$("#score-template").html();a=a.replace("{{RANK}}",s+1+"").replace("{{NAME}}",e.user).replace("{{LANGUAGE}}",e.language).replace("{{SCORE}}",e.score),a=$(a),$("#scores").append(a)})}var QUESTION_ID=58469,ANSWER_FILTER="!t)IWYnsLAZle2tQ3KqrVveCRJfxcRLe",answers=[],answer_ids,answers_hash,answer_page=1;getAnswers();var VALID_HEAD=/<h\d>([^\n,]*)[, ]*(\d+).*<\/h\d>/;

body{text-align:left!important}table thead{font-weight:700}table td{padding:10px 0 0 30px}#scores-cont{padding:10px;width:600px}#scores tr td:first-of-type{padding-left:0}

<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script><link rel="stylesheet" type="text/css" href="//cdn.sstatic.net/codegolf/all.css?v=83c949450c8b"><div id="scores-cont"><h2>Scores</h2><table class="score-table"><thead> <tr><td></td><td>User</td><td>Language</td><td>Score</td></tr></thead> <tbody id="scores"></tbody></table></div><table style="display: none"> <tbody id="score-template"><tr><td>{{RANK}}</td><td>{{NAME}}</td><td>{{LANGUAGE}}</td><td>{{SCORE}}</td></tr></tbody></table>

Edit: Square brackets are no longer a part of the question. Any answers posted before 3AM UTC time, September 23, are safe and will not be affected by this change.

Question 2

What are the allowed forms of input?

Question 3

@ZachGates It's better that we're allowed to support either, but keep in mind that square brackets is still incorrect. AFAIK in chemical formulae square brackets are only used to indicated concentration. E.g.: [HCl] = 0.01 mol L^-1.

Question 4

See the "Examples" section. Is there something specific you're asking about? @Oberon Inputs are denoted by a >.

Question 5

Challenge would be so much better if you had to output the elements in Atomic Number order :-)

Question 6

Just a note, the examples still have elements with multiple digit atom counts.

Question 7

CJam, (削除) 59 (削除ここまで) 57 bytes

q{:Ci32/")("C#-"[ ] aC~* Ca C+"S/=~}%`La`-S%$e`{~": "@N}/

Try it online in the CJam interpreter.

How it works

q e# Read all input from STDIN.
{ e# For each character:
 :Ci e# Save it in C and cast to integer.
 32/ e# Divide the code point by 32. This pushes
 e# 2 for uppercase, 3 for lowercase and 1 for non-letters.
 ")("C# e# Find the index of C in that string. (-1 if not found.)
 - e# Subtract. This pushes 0 for (, 1 for ), 2 for digits,
 e# 3 for uppercase letters and 4 for lowercase letters.
 "[ ] aC~* Ca C+"
 S/ e# Split it at spaces into ["[" "]" "aC~*" "Ca" "C+"].
 =~ e# Select and evaluate the corresponding chunk.
 e# ( : [ : Begin an array.
 e# ) : ] : End an array.
 e# 0-9 : aC~* : Wrap the top of the stack into an array
 e# and repeat that array eval(C) times.
 e# A-Z : Ca : Push "C".
 e# a-z : C+ : Append C to the string on top of the stack.
}% e#
` e# Push a string representation of the resulting array.
 e# For input (Au(CH)2)2, this pushes the string
 e# [[["Au" [["C" "H"] ["C" "H"]]] ["Au" [["C" "H"].["C" "H"]]]]]
La` e# Push the string [""].
- e# Remove square brackets and double quotes from the first string.
S% e# Split the result at runs of spaces.
$e` e# Sort and perform run-length encoding.
{ e# For each pair [run-length string]:
 ~ e# Dump both on the stack.
 ": " e# Push that string.
 @N e# Rotate the run-length on top and push a linefeed.
}/ e#

Question 8

Python 3, (削除) 157 (削除ここまで) 154 bytes

import re
s=re.sub
f=s("[()',]",'',str(eval(s(',?(\d+)',r'*1,円',s('([A-Z][a-z]*)',r'("1円",),',input()))))).split()
for c in set(f):print(c+":",f.count(c))

Try it online!

Only supports input using regular brackets.

Before creating the golfed solution using eval above I created this reference solution, which I found very elegant:

import re, collections
parts = filter(bool, re.split('([A-Z][a-z]*|\(|\))', input()))
stack = [[]]
for part in parts:
 if part == '(':
 stack.append([])
 elif part == ')':
 stack[-2].append(stack.pop())
 elif part.isdigit():
 stack[-1].append(int(part) * stack[-1].pop())
 else:
 stack[-1].append([part])
count = collections.Counter()
while stack:
 if isinstance(stack[-1], list):
 stack.extend(stack.pop())
 else:
 count[stack.pop()] += 1
for e, i in count.items():
 print("{}: {}".format(e, i))

Question 9

Pyth, (削除) 66 (削除ここまで) 65 bytes

VrSc-`v::z"([A-Z][a-z]*)""('\1円',),"",?(\d+)""*\1,円"`(k))8j": "_N

Try it online!

Port of my Python answer. Only supports input using regular brackets.

Question 10

+1. Three answers in an hour? Nice.

Question 11

JavaScript ES6, 366 bytes

function f(i){function g(a,b,c){b=b.replace(/[[(]([^[(\])]+?)[\])](\d*)/g,g).replace(/([A-Z][a-z]?)(\d*)/g,function(x,y,z){return y+((z||1)*(c||1))});return(b.search(/[[(]/)<0)?b:g(0,b)}return JSON.stringify(g(0,i).split(/(\d+)/).reduce(function(q,r,s,t){(s%2)&&(q[t[s-1]]=+r+(q[t[s-1]]||0));return q},{})).replace(/["{}]/g,'').replace(/:/g,': ').replace(/,/g,'\n')}

JS Fiddle: https://jsfiddle.net/32tunzkr/1/

I'm pretty sure this can be shortened, but I need to get back to work. ;-)

Question 12

I'm pretty sure it can be shortened as well. Since you clame to be using ES6, you could start by using the big-arrow notation to create functions. And the implicit return statement. That should be enough for now.

Question 13

You also use replace a lot so you can save some bytes by using xyz[R='replace'](...) the first time and abc[R] (...) each subsequent time.

Question 14

SageMath, (削除) 156 (削除ここまで) 148 bytes

import re
i=input()
g=re.sub
var(re.findall("[A-Z][a-z]?",i))
print g("(\d+).(\S+)\D*",r"2円: 1円\n",`eval(g("(\d+)",r"*1円",g("([A-Z(])",r"+1円",i)))`)

Try it online here (hopefully link will work, might need an online account)

Note: If trying online, you will need to replace input() with the string (e.g. "(CH3)3COOC(CH3)3")

Explanation

Sage allows you to simplify algebraic expressions, providing they are in the right format (see 'symbolic manipulation' of this link). The regexes inside the eval() basically serve to get the input string into the right format, for example something like:

+(+C+H*3)*3+C+O+O+C+(+C+H*3)*3

eval() will then simplify this to: 8*C + 18*H + 2*O, and then it's just a matter of formatting the output with another regex substitution.

Question 15

Python 3, 414 bytes

I hope that the order of the result doesn't count.

import re
t=input().replace("[", '(').replace("]", ')')
d={}
p,q="(\([^\(\)]*\))(\d*)","([A-Z][a-z]*)(\d*)"
for i in re.findall(q,t):t = t.replace(i[0]+i[1],i[0]*(1if i[1]==''else int(i[1])))
r=re.findall(p,t)
while len(r)>0:t=t.replace(r[0][0]+r[0][1],r[0][0][1:-1]*(1if r[0][1]==''else int(r[0][1])));r=re.findall(p,t)
for i in re.findall(q[:-5], t):d[i]=d[i]+1if i in d else 1
for i in d:print(i+': '+str(d[i]))

Try it online!

Question 16

Javascript (ES6), (削除) 286 (削除ここまで) 284

Not that much shorter than the other ES6 one but I gave it my best. Note: this will error out if you give it an empty string or most invalid inputs. Also expects all groups to have a count of more than 1 (ie, no CO[OH]). If this breaks any challenge rules, let me know.

a=>(b=[],c={},a.replace(/([A-Z][a-z]*)(?![0-9a-z])/g, "11ドル").match(/[A-Z][a-z]*|[0-9]+|[\[\(]/g).reverse().map(d=>(d*1==d&&b.push(d*1),d.match(/\(|\[/)&&b.pop(),d.match(/[A-Z]/)&&eval('e=b.reduce((f,g)=>f*g,1),c[d]=c[d]?c[d]+e:e,b.pop()'))),eval('g="";for(x in c)g+=x+`: ${c[x]}\n`'))

Uses a stack-based approach. First, it preprocesses the string to add 1 to any element without a number, ie Co3(Fe(CN)6)2 becomes Co3(Fe1(C1N1)6)2. Then it loops through in reverse order and accumulates element counts.

a=>(
 // b: stack, c: accumulator
 b=[], c={},
 // adds the 1 to every element that doesn't have a count
 a.replace(/([A-Z][a-z]*)(?![0-9a-z])/g, "11ドル")
 // gathers a list of all the elements, counts, and grouping chars
 .match(/[A-Z][a-z]*|[0-9]+|[\[\(]/g)
 // loops in reverse order
 .reverse().map(d=>(
 // d*1 is shorthand here for parseInt(d)
 // d*1==d: true only if d is a number
 // if it's a number, add it to the stack
 d * 1 == d && b.push(d * 1),
 // if there's an opening grouping character, pop the last item
 // the item being popped is that group's count which isn't needed anymore
 d.match(/\(|\[/) && b.pop(),
 // if it's an element, update the accumulator
 d.match(/[A-Z]/) && eval('
 // multiplies out the current stack
 e = b.reduce((f, g)=> f * g, 1),
 // if the element exists, add to it, otherwise create an index for it
 c[d] = c[d] ? c[d] + e : e,
 // pops this element's count to get ready for the next element
 b.pop()
 ')
 )),
 // turns the accumulator into an output string and returns the string
 eval('
 g="";
 // loops through each item of the accumulator and adds it to the string
 // for loops in eval always return the last statement in the for loop
 // which in this case evaluates to g
 for(x in c)
 g+=x+`: ${c[x]}\n`
 ')
)

Fiddle

Question 17

Perl, (削除) 177 (削除ここまで) 172 bytes

171 bytes code + 1 byte command line parameter

Ok, so I may have got a little carried away with regex on this one...

s/(?>[A-Z][a-z]?)(?!\d)/$&1/g;while(s/\(([A-Z][a-z]?)(\d+)(?=\w*\W(\d+))/2ドル.(3ドル*4ドル).1ドル/e||s/([A-Z][a-z]?)(\d*)(\w*)1円(\d*)/1ドル.(2ドル+4ドル).3ドル/e||s/\(\)\d+//g){};s/\d+/: $&\n/g

Usage example:

echo "(CH3)3COOC(CH3)3" | perl -p entry.pl

Question 18

Mathematica, 152 bytes

f=TableForm@Cases[PowerExpand@Log@ToExpression@StringReplace[#,{a:(_?UpperCaseQ~~___?LowerCaseQ):>"\""<>a<>"\"",b__?DigitQ:>"^"<>b}],a_. Log[b_]:>{b,a}]&

The above defines a function f which takes a string as input. The function takes the string and wraps each element name into quotes and adds an infix exponentiation operator before each number, then interprets the string as an expression:

"YBa2Cu3O7" -> ""Y""Ba"^2"Cu"^3"O"^7" -> "Y" "Ba"^2 "Cu"^3 "O"^7

Then it takes the logarithm of that and expands it out (mathematica doesn't care, what to take the logarithm of :)):

Log["Y" "Ba"^2 "Cu"^3 "O"^7] -> Log["Y"] + 2 Log["Ba"] + 3 Log["Cu"] + 7 Log["O"]

and then it finds all occurrences of multiplication of a Log by a number and parses it into the form of {log-argument, number} and outputs those in a table. Some examples:

f@"K4(ON(SO3)2)2"
K 4
N 2
O 14
S 4
f@"(CH3)3COOC(CH3)3"
C 8
H 18
O 2
f@"Co3(Fe(CN)6)2"
C 12
Co 3
Fe 2
N 12

Question 19

Java, 827 bytes

import java.util.*;class C{String[]x=new String[10];public static void main(String[]a){new C(a[0]);}C(String c){I p=new I();int[]d=d(c,p);for(int i=0;i<10;i++)if(x[i]!=null)System.out.println(x[i]+": "+d[i]);}int[]d(String c,I p){int[]f;int i,j;Vector<int[]>s=new Vector();while(p.v<c.length()){char q=c.charAt(p.v);if(q=='(')s.add(d(c,p.i()));if(q==')')break;if(q>='A'&&q<='Z'){f=new int[10];char[]d=new char[]{c.charAt(p.v),0};i=1;if(c.length()-1>p.v){d[1]=c.charAt(p.v+1);if(d[1]>='a'&&d[1]<='z'){i++;p.i();}}String h=new String(d,0,i);i=0;for(String k:x){if(k==null){x[i]=h;break;}if(k.equals(h))break;i++;}f[i]++;s.add(f);}if(q>='0'&&q<='9'){j=c.charAt(p.v)-'0';f=s.get(s.size()-1);for(i=0;i<10;)f[i++]*=j;}p.i();}f=new int[10];for(int[]w:s){j=0;for(int k:w)f[j++]+=k;}return f;}class I{int v=0;I i(){v++;return this;}}}

Git repository w/ ungolfed source (not perfect parity, ungolfed supports multi-character numbers).

Been a while, figured I'd give Java some representation. Definitely not going to win any awards :).

Question 20

ES6, 198 bytes

f=s=>(t=s.replace(/(([A-Z][a-z]?)|\(([A-Za-z]+)\))(\d+)/,(a,b,x,y,z)=>(x||y).repeat(z)))!=s?f(t):(m=new Map,s.match(/[A-Z][a-z]?/g).map(x=>m.set(x,-~m.get(x))),[...m].map(([x,y])=>x+": "+y).join`\n`)

Where \n is a literal newline character.

Ungolfed:

function f(str) {
 // replace all multiple elements with individual copies
 // then replace all groups with copies working outwards
 while (/([A-Z][a-z]?)(\d+)/.test(str) || /\(([A-Za-z]+)\)(\d+)/.test(str)) {
 str = RegExp.leftContext + RegExp.1ドル.repeat(RegExp.2ドル) + RegExp.rightContext;
 }
 // count the number of each element in the expansion
 map = new Map;
 str.match(/[A-Z][a-z]?/g).forEach(function(x) {
 if (!map.has(x)) map.set(x, 1);
 else map.set(x, map.get(x) + 1);
 }
 // convert to string
 res = "";
 map.forEach(function(value, key) {
 res += key + ": " + value + "\n";
 }
 return res;
}

Question 21

Perl 5 `-MList::Util=pairmap -n`, 94 bytes

s/\(([^()]+?)\)(\d)/1ドルx2ドル/e&&redo;s/([A-Z][a-z]*)(\d?)/$k{1ドル}+=2ドル||1/ge;pairmap{say"$a: $b"}%k

Try it online!

Question 22

Pip `-l`, (削除) 85 (削除ここまで) (削除) 77 (削除ここまで) 62 bytes

Takes the formula as a command-line argument, and uses the -l flag for proper output formatting.

_.": ".(_N Y(VaRl:XU+KXL`"&"`RXI`X&`R`[\d"][("]`{aJ'.}l))MUQy

Try it online!

The main trick is to transform the formula via regex replacements into a Pip expression. This, when eval'd, will do the repetition and resolve parentheses for us. We then post-process a bit to get the atom counts and format everything correctly.

Ungolfed, with comments:

 a is command-line arg (implicit)
l:`[A-Z][a-z]*` Regex matching element symbols
aR: l `"&"` Replace each symbol in a with symbol wrapped in quotes
aR: XI `X&` Add X before each number (XI is a regex matching integers)
aR: `[\d"][("]` {aJ'.} Insert . between a digit or " and a ( or "
Y (Va)@l Eval result, findall matches of l, and yank that list into y
_.": ".(_Ny) M UQy Map function to y uniquified:
 Each element, concatenated with ": ", concatenated with
 the count of that element in y
 Print that list, one element per line (implicit, -l flag)

Here's how the input Co3(Fe(CN)6)2 is transformed:

Co3(Fe(CN)6)2
"Co"3("Fe"("C""N")6)2
"Co"X3("Fe"("C""N")X6)X2
"Co"X3.("Fe".("C"."N")X6)X2
CoCoCoFeCNCNCNCNCNCNFeCNCNCNCNCNCN

Then:

["Co" "Co" "Co" "Fe" "C" "N" "C" "N" "C" "N" "C" "N" "C" "N" "C" "N" "Fe" "C" "N" "C" "N" "C" "N" "C" "N" "C" "N" "C" "N"]
["Co" "Fe" "C" "N"]
[3 2 12 12]
["Co: 3" "Fe: 2" "C: 12" "N: 12"]

Question 23

JavaScript (Node.js), 133 bytes

x=>x.replace(/[A-Z][a-z]*|\d+|./g,c=>d=[c<'/'?c<')'?'}':'{':+c?`for(_ in"${Array(-~c)}")`:`o.${c}=-~o.${c};`]+d,d='',o={})+eval(d)&&o

Try it online!

Would remain shorter than the existing ones using same old ECMAscript

Dennis 212k41 gold badges380 silver badges830 bronze badges · Accepted Answer · 2015-09-23 06:19:46Z

CJam, (削除) 59 (削除ここまで) 57 bytes

q{:Ci32/")("C#-"[ ] aC~* Ca C+"S/=~}%`La`-S%$e`{~": "@N}/

Try it online in the CJam interpreter.

How it works

q e# Read all input from STDIN.
{ e# For each character:
 :Ci e# Save it in C and cast to integer.
 32/ e# Divide the code point by 32. This pushes
 e# 2 for uppercase, 3 for lowercase and 1 for non-letters.
 ")("C# e# Find the index of C in that string. (-1 if not found.)
 - e# Subtract. This pushes 0 for (, 1 for ), 2 for digits,
 e# 3 for uppercase letters and 4 for lowercase letters.
 "[ ] aC~* Ca C+"
 S/ e# Split it at spaces into ["[" "]" "aC~*" "Ca" "C+"].
 =~ e# Select and evaluate the corresponding chunk.
 e# ( : [ : Begin an array.
 e# ) : ] : End an array.
 e# 0-9 : aC~* : Wrap the top of the stack into an array
 e# and repeat that array eval(C) times.
 e# A-Z : Ca : Push "C".
 e# a-z : C+ : Append C to the string on top of the stack.
}% e#
` e# Push a string representation of the resulting array.
 e# For input (Au(CH)2)2, this pushes the string
 e# [[["Au" [["C" "H"] ["C" "H"]]] ["Au" [["C" "H"].["C" "H"]]]]]
La` e# Push the string [""].
- e# Remove square brackets and double quotes from the first string.
S% e# Split the result at runs of spaces.
$e` e# Sort and perform run-length encoding.
{ e# For each pair [run-length string]:
 ~ e# Dump both on the stack.
 ": " e# Push that string.
 @N e# Rotate the run-length on top and push a linefeed.
}/ e#

Stack Exchange Network

Molecules to Atoms

The Challenge

Input

Output

Breaking Down Molecules

Testcases

Scoreboard

14 Answers 14

CJam, (削除) 59 (削除ここまで) 57 bytes

How it works

Python 3, (削除) 157 (削除ここまで) 154 bytes

Pyth, (削除) 66 (削除ここまで) 65 bytes

JavaScript ES6, 366 bytes

SageMath, (削除) 156 (削除ここまで) 148 bytes

Explanation

Python 3, 414 bytes

Javascript (ES6), (削除) 286 (削除ここまで) 284

Perl, (削除) 177 (削除ここまで) 172 bytes

Mathematica, 152 bytes

Java, 827 bytes

ES6, 198 bytes

Perl 5 `-MList::Util=pairmap -n`, 94 bytes

Pip `-l`, (削除) 85 (削除ここまで) (削除) 77 (削除ここまで) 62 bytes

JavaScript (Node.js), 133 bytes

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Molecules to Atoms

The Challenge

Input

Output

Breaking Down Molecules

Testcases

Scoreboard

14 Answers 14

CJam, (削除) 59 (削除ここまで) 57 bytes

How it works

Python 3, (削除) 157 (削除ここまで) 154 bytes

Pyth, (削除) 66 (削除ここまで) 65 bytes

JavaScript ES6, 366 bytes

SageMath, (削除) 156 (削除ここまで) 148 bytes

Explanation

Python 3, 414 bytes

Javascript (ES6), (削除) 286 (削除ここまで) 284

Perl, (削除) 177 (削除ここまで) 172 bytes

Mathematica, 152 bytes

Java, 827 bytes

ES6, 198 bytes

Perl 5 -MList::Util=pairmap -n, 94 bytes

Pip -l, (削除) 85 (削除ここまで) (削除) 77 (削除ここまで) 62 bytes

JavaScript (Node.js), 133 bytes

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions

Perl 5 `-MList::Util=pairmap -n`, 94 bytes

Pip `-l`, (削除) 85 (削除ここまで) (削除) 77 (削除ここまで) 62 bytes