9
\$\begingroup\$

A SMILES (Simplified molecular-input line-entry system) string is a string that represents a chemical structure using ASCII characters. For example, water (\$H_2O\$) can be written in SMILES as H-O-H.

However, for simplicity, single bonds (-) and hydrogen atoms (H) are frequently omitted. Thus, molecules with only single bonds, like n-pentane (\$CH_3CH_2CH_2CH_2CH_3\$), can be represented as simply CCCCC, and ethanol (\$CH_3CH_2OH\$) as CCO or OCC (which atom you start from does not matter).

In SMILES, double bonds are represented with = and triple bonds with #. So ethene can be represented as C=C, and hydrogen cyanide as C#N or N#C.

SMILES uses parentheses to represent branching. For example, bromochlorodifluoromethane can be represented as FC(Br)(Cl)F, BrC(F)(F)Cl, C(F)(Cl)(F)Br, etc.

For rings, the two atoms that close a ring are marked with the same number. Take cyclohexane: first strip the H atoms and start from any C. Going round the ring, we get CCCCCC. Since the first and last C are bonded, we write C1CCCCC1.

You can use this tool to draw your own structures and convert them to SMILES, or vice versa: https://pubchem.ncbi.nlm.nih.gov/edit3/index.html

Task

Your program shall receive two SMILES strings. The first is a molecule; the second is a substructure (a portion of a molecule). The program should return true if the substructure is found in the molecule and false if not. For simplicity, only the SMILES features explained above will be used (no need to consider stereochemistry such as cis-trans isomerism, or aromaticity), and the only atoms will be:

  • O
  • C
  • N
  • F

Also, the substructure does not contain H.

Examples

CCCC C
true
CCCC CC
true
CCCC F
false
C1CCCCC1 CC
true
C1CCCCC1 C=C
false
COC(C1)CCCC1C#N C(C)(C)C // substructure is a C connected to 3 other Cs
true
COC(C1)CCCC1C#N COC1CC(CCC1)C#N // SMILES strings representing the same molecule
true
OC(CC1)CCC1CC(N)C(O)=O CCCCO
true
OC(CC1)CCC1CC(N)C(O)=O NCCO
true
OC(CC1)CCC1CC(N)C(O)=O COC
false
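
For reference, here is an ungolfed Python sketch of one possible approach; it is not a competing answer and not the only way to do this. It parses each string into a labelled graph and then tries, by backtracking, to map every substructure atom onto a distinct molecule atom. The names `parse_smiles` and `contains` are made up for this illustration. It assumes single-digit ring numbers, no bracket atoms, that a bond in the substructure must match a bond of exactly the same order in the molecule, and that extra bonds between matched atoms in the molecule are allowed.

```python
def parse_smiles(smiles):
    """Parse a simplified SMILES string into (atoms, bonds).

    atoms: list of element symbols, in order of appearance.
    bonds: dict mapping frozenset({i, j}) to the bond order (1, 2 or 3).
    """
    atoms, bonds = [], {}
    prev = None        # atom that the next atom or ring closure attaches to
    branch_stack = []  # atoms to return to when a branch closes
    open_rings = {}    # ring digit -> (atom index, bond order seen at opening)
    order = 1          # order of the next bond; single unless = or # was seen
    for ch in smiles:
        if ch in "-=#":
            order = "-=#".index(ch) + 1
        elif ch == "(":
            branch_stack.append(prev)
        elif ch == ")":
            prev = branch_stack.pop()
        elif ch.isdigit():
            if ch in open_rings:  # second occurrence closes the ring
                other, opening_order = open_rings.pop(ch)
                bonds[frozenset((prev, other))] = max(order, opening_order)
            else:
                open_rings[ch] = (prev, order)
            order = 1
        elif ch.islower():  # second letter of Br or Cl
            atoms[-1] += ch
        else:  # a new atom, bonded to the previous one if there is one
            atoms.append(ch)
            if prev is not None:
                bonds[frozenset((prev, len(atoms) - 1))] = order
            prev, order = len(atoms) - 1, 1
    return atoms, bonds


def contains(molecule, pattern):
    """Return True if `pattern` occurs as a substructure of `molecule`."""
    m_atoms, m_bonds = parse_smiles(molecule)
    p_atoms, p_bonds = parse_smiles(pattern)

    def extend(mapping):
        # mapping[j] is the molecule atom assigned to pattern atom j
        k = len(mapping)
        if k == len(p_atoms):
            return True
        for cand in range(len(m_atoms)):
            if cand in mapping or m_atoms[cand] != p_atoms[k]:
                continue
            # Each pattern bond between k and an already placed atom must
            # exist in the molecule with the same order.
            ok = all(
                m_bonds.get(frozenset((cand, mapping[j]))) == bond_order
                for pair, bond_order in p_bonds.items() if k in pair
                for j in pair if j < k
            )
            if ok and extend(mapping + [cand]):
                return True
        return False

    return extend([])


if __name__ == "__main__":
    print(contains("C1CCCCC1", "CC"))   # True
    print(contains("C1CCCCC1", "C=C"))  # False
```

Because atoms in a SMILES string (after the first) always attach to an earlier atom, placing the substructure atoms in the order they were written keeps the partial match connected, which prunes most dead ends early. The worst case is still exponential, as with any subgraph search.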

Shortest code wins. Refrain from using external libraries.

asked Jun 12, 2020 at 6:09
\$\endgroup\$
3
  • 1
    \$\begingroup\$ Could you add a few more complex test cases like the last one? The first five test cases can be solved using a single contains builtin. \$\endgroup\$ Commented Jun 12, 2020 at 7:06
  • 1
    \$\begingroup\$ As for that last test case, COC(C1)CCCC1C#N can't be pasted into the tool you've linked (it automatically changes to COC1CC(CCC1)C#N). Also, would COC(C1)CCCCC1#N with CCCCCC result in truthy, since it does contain a substructure of six consecutive C atoms (the entire ring (C1)CCCCC1, and the additional branch to a C in COC)? \$\endgroup\$ Commented Jun 12, 2020 at 7:07
  • 1
    \$\begingroup\$ @KevinCruijssen I will add more examples. The tool converts the SMILES string to a canonical SMILES string (it uses a standard algorithm that ensures a unique output). COC(C1)CCCCC1#N with CCCCCC will be truthy. COC(C1)CCCCC1#N with COC1CC(CCC1)C#N will be truthy. \$\endgroup\$ Commented Jun 12, 2020 at 7:17

1 Answer

3
\$\begingroup\$

Mathematica, 31 bytes

MoleculeContainsQ@@Molecule/@#&

Takes input as a list of 2 strings (the source and the pattern molecules).

As you might guess, this parses both strings with Molecule and then uses MoleculeContainsQ to check whether the first molecule contains the second.

This doesn't seem to work in the online interpreter on TIO; I'm not sure what I am doing wrong. It works on my local machine, though. Of course, this is not using an external library: it's completely built-in functionality!

answered Jun 12, 2020 at 6:19
\$\endgroup\$
1
  • 1
    \$\begingroup\$ I'm looking for answers that do some parsing, but as your answer uses built-in functionality, I guess you've found a loophole! +1 \$\endgroup\$ Commented Jun 12, 2020 at 6:25
