6
\$\begingroup\$

This question is part of a series solving the Rosalind challenges. For the previous question in this series, see Wascally wabbits. The repository with all my up-to-date solutions so far can be found here.


Problem: PROT

The 20 commonly occurring amino acids are abbreviated by using 20 letters from the English alphabet (all letters except for B, J, O, U, X, and Z). Protein strings are constructed from these 20 symbols. Henceforth, the term genetic string will incorporate protein strings along with DNA strings and RNA strings.

The RNA codon table dictates the details regarding the encoding of specific codons into the amino acid alphabet.

Given:

An RNA string \$s\$ corresponding to a strand of mRNA (of length at most 10 kbp).

Return:

The protein string encoded by \$s\$.

Sample Dataset:

AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA

Sample Output:

MAMAPRTEINSTRING

My solution solves the sample dataset and the actual dataset given.

Dataset:

AUGCGCCCUUGGUCGCUCCUUGGAUCAGAGCAUAUUCUAUCACGGCGCGUCGAAGGAAUAACCCACGACAUCUCUCUAUAUUGGAUUCCCUUUUUUUCGGUUCAGGUAGAUCAUUUGGCUACUGGACUUUCUAAGAUUUACUCCGCCAUGUUCCUAUAUGUUACAUUCUCAGCCGAAGUCGCUGUAUAUCACGUUAAGGUAGACGGUUCCUUGACUACCAGCGACGCCUGUAGGGAGAAUUCCAUCCAUCAUGCAUGUAUGGGCAGUGCGCACUUACAGCGCCAUAGGCAACGGACGGACAGACCUCCUUUCCUGUCGGACGGUAAGCCGCGAUCCAAUACAGAGCAAAGUCCCACGCCCUCCUAUAGACUCACGCCAAGAUUGUAUUCCCCGUUAACCGCUCUCUCAGGGAAGUUGUAUCUACUCGGAUCGGGAUGUCCUUGGAAAUGUAGGAAAAUGGCUCAAACUACGAUUGUAUACCGUGCGAGACGUUGGAUCCCGCUUAUCACUGAUACCAUAAUCUGUGUGGCCCCCUUACCACAACCUAACCAUGGAGUAGUAGCCCUGGCCGUCCCUUCAAGGGCAAGACCUCAUUGUCUUGUACGCUUAUCACAAGGGCCAUCUAACAAUGUGUACCGGUAUAAUUUUACGUGGUAUUGUCCAGACGGCGGUACGGCCGAUCCGUUGCCAUUUCGUCAUGGCAUAACCUCGGUCUAUCUUCCUUCCUACUCGGGAAUAGUUCGCAGUACACCAUACCUCAUCGGCACUUACGCUGUUCCAACACAAAAUUCUGAUCCCUUCGCUACCACCCGCUGGGUAUCUGUCAGGUUACUGGCCUCCACUACCGGAGAGGGCGAUACGGGGGACCGCGGAACACUUUCUACAUUUUUGGACUGCUGUAUGUCUACUUCGGCUCUUCCUCCCGCGAGAUUUAUAUCGGCAUACAGAGUAAACGCUACCCAUGGCGACGACACUCGUCUCACCGUUAAGAAGCUGUGCACACCUUAUAAAAGCCUUUGGUCAGGUAAUGUAUGUAGACCUCUAUCGGUCUAUAAGUUGAAGGAAAGAUAUAUAGUCCUCACCGUUGGUUAUCAUUCCCUCCUAUCCCGCAUGUCCACCCUAAGGAAAACUUACUCCGUGGACCAAGCGGGACCGGCAAGGCCCUUGGUUGCCGAAAAAUCAGGAGUUUGCGAGUGCGACUAUGAUCCCAGGUUUGUAAUCGUUAUCGCGACUGUUCAUUUCGGGAAGGCUACUGGGCUGCAUUCAAAGGCGGACCGCUGCUCUCGCUGGGUCGGGUCCAGUGGACGGGUCGAGGCUGCUCAGCUCCAUCAUCAAAGAAAUAAGAAGCCCACCGGAAGAGACAGUGAUGCCGAGAAGAUGUCCGGCAAAGGCAGACUCUGGUCCCAAGUUAGACUGGGUGGUUUAGAGAUCCAGAUAGGUCGUCGUUACCAGUUCUACGAAUACUUUCUUGGUAAACUGAUUAUCCGUCAAAGGCCGACUCACGAGAGGACACAGGGUGCAAAGAAACCCUCACAGAAGGGAACGCGAUUGGCGACACGUAUGGCGCUCGCGUGGUGUGACAGGUUUGGUAGGAUCUAUAUCCCCCAGUGCAACAUAUUAGUUCACACUAUAAUGAAGGUCCGAUUGCACCAAACAGCCUGCGAUGAUAACACGUGGACUGCUGGAGAGUAUGACUUGUACGACUGCACGCGCGAUGUACCCAACAUCGUCCUGUCCCUACCGCACCAUCUUUAUGAACUGCUGGUCUCUGAUGCACUCCCCGCCCCCCACCUCCUCUUUCUGGGGGAAUCUGGUUCCGCCAGGCAUAGGACACGUUCGGUUGGACUAACUAUUGCUAACUACACGAUCAUUCGUGAUAGGUGUCGGUCCACAUGUAUAAGGUUGGAUGAACCUAACCCCUACCGACGAUUUGAUGUAUUGUGGCCACUACUAAUGACCCCCGCUCGUAACACCCAAACACGCGCAUUUCUCUGCUCCGU
GGCUGGCUGGAAUUGCCAGUAUUACAGACCCCCUGACAGAUCCAGUAGUAGGACGUACUUGAUCGCACUACAUUUAGCAAUCAAAUUGCGUUCACGCUACCCCAAUCGUUCAGAUCUGGCUUAUCCCACAUCUGUAAACAGGAACACGGUGUAUAUCACCGUAGUCUCUCUACGCACAGCGAAACUAAGAUACAACAGUUACACACCCAGCAAACUGCGGCCCGAUCAAGGCAAUGACCCCAGCUACCGAACUUCGGAAAGAGGGAAUUUCGUCCGGAACUUGCCAUCAGUGAAACCGUACCGAGACUUCAUGAAAACGAUCAGCAUUUCCUUUACAGGAUUCCGGACCAAAUUUGAUCGUCAAAUUGGGAAUUCCAUAGGCCGAGGAUUCACGGGAGCAAGGCCGACCUCAUUGAAGCGUCAUAGUCGCUUUUCGCUCACCGUACAUUCUAGCAAGCGCUAUCUCUCCCGCCUCAACGCCUACGUCUCUUUUACAAUAAAACAUCGGAUAACGAGAUUCGACGUGGCUACGCGCGAAGUUAAGGCUUCCGCCGUGGUACCUAAUGCGGAAAUAGUCCGGAAAAGUACGAAGGUGUGCUGGUUUAUGUGCAUCAUCAGACUCCAAACUGUCCGAGCUACCAAACCGACCAUUCAGAAGCAGUUGUUGAGAUUAAGUGGCCCGUUUCAAUGCGGGGAUUCCACCAAUUACGUACAACCUGGUUACUUCCACAGUUCAAACGCGCCCCGCCCGGCGUGUGUCAGUGUGUGUAUCAGCCCGGGGUUAUGGGACCUGUUGGUAAAAACCCGGAAUGCUUUCUCCCGUGUGCACGGGGGUACUAUCCUUACUUUAGUUCAGCAUGACAUUCAUAAAGUAGAAUUAUCGUCAGCAUGCACUCGCGAGCGGGCUACCAACCUGGCAAUGACUGAAAGCGUAACGUCAUACUCUCAGUGCGAGGCCUCGACUCGCCAUACCGAAAUACAAGCUGUUAGCUCAAUUGUGUAUCUCCACUUGACUGCGGCCCGCAGGGAGAAACACACAGAGAAGAGGGCCGACGCGAAGCAUCAUGUGUCUACUUGGCGCGAGGGUAAAACCGAAAACGUCAUUGGAAGGCUCAGAUCGUCACAUACAUUAACCCUUAGGUUACAUCCAUCCUCGUUGGACAACUGGCCGUUCAUUCUUGGGGAAUGCCAGAGAGGAACGGAUAUCGAGGAACGCAUCCCGGCACAUGCGGAAUGUACAGACAAGGCUGUAGGCGUAGCCUUUCAGUCGACGCUAUGGGCAGAUUCGGCGAAUCCGCGAGGUGGAUCUCGCUUGAGAAGAGGGAUUAGGGGCCCCAACGCGAUGAAUAUUGAAUGCGGAUAUUAUCUGGCGAGACAGCUUCUUGACCGCUCUUGUAGUCGCAAGAUAGGCGAGACUCUAAGACAAACUAGUUCCCGCACGCCAUUGCCAUGCAAGCGAGGCCGCGUCCCAAAACCCCUGGAACCUAAAGAGUCGAACAGGUCAGGAGCAAGUAGCGUAUGGAUACCAGUCGGAGUUAGGCUGGGUCCCUCCGCUGCGAAGACUCCGCCCUGGCGACAUGGUCGCCCGCGACAACUUCUAAUCUCUCCUCAAGUUAUUCCGUUAAGACACCCGAGCAAACGAGCUAGUCAAAGGGAUCAGUGCGAGCUCCCAUCAUGUCCUGAGUACAAGACCCCAGUGUGCCGACUUGCUUUGGGUAGCUCCAGAAUGGUUCACGAAUUAGCCCUUAAGAUGCUCUCCCCUGUUCCGGAGUUCGUGUGGAGGGUCGGAGGCGGGAAAGUCUAUUUAACGGCGGACCCACCUAGGGUAAAUCUGACGCAUAUGUCUGAACACGCCCUGGUGGUACCAGGAGUUUCCCUAUGGGCCCUUUUUCUUUUAAGACCUUUAUUUUUCCAUCUCCAACCUCGAUUAUCGACAACAUACCGUUGGCGCAGACACUUAUGCUUACCCGUUCAACUGCAUUCGUACAGGCUGGGUGAUA
UGCAAUUAGGAGCAUCUCGACGUUGGGUAGGCCCCCGAAAUAUAUGGAGACAGGGUGUGUAUGCGUGGGAGAUAUAUGAGAUUCGAACUGUACCGGCUCCAAGGGUGUCACUGUUCCGCCGUUGGAAGGAAAACACUUACACCCUCUUUGGAUCAGGGGAAAUUACAGCGAAUGUUAAGACGGCUAUGUAUCGGAUAACGUCACAUCCGUUUAGACUGUAUGCGGGCGCCCGAGAAUUCAGUCCAUUCCGACUAAACGAGAAAAAGUUUGCCCCCGCGGGGAUUACGUACAAAACCGGAUGCGAUUAUAGCCGUUCUGGAGAACUCUGUGAGGGGCGGGGAAGGAAAAAUAGUUUUAUGUACCAUUGGGCGGCCCUUCCUCUCCAUGCACAUAAAACAAACAGCCUCAUUGAUUUCUACACUCCGUGCAACCCAAAUGCGGCUGCUGACAUGCUAGUGCGUAGUACAGAAGAUGCCCGAGCUCGAAUACAUUGCAGAUAUUGGGUUAACAAUUCGUUUUGCAAAUAUAUAUGUUGGCACUCCAGGUUCUUGAUAGAACCGAUUCAGAAGAAAUGGUGUACACCCAUUGAGAGGCGUCGCCCCGUAAUUAACGGGGAUGUCUUAAACGGGUCAGAGGUAACUACUAAGACGCGGUGCUGUUUCAGAUGGGCAAGUCAUACGGGCCGUUCUUACGGAAGAGAUCGUGCUAAUACAAACCUGCUUGUUAUGGGCGACGCAUCGCCCGAGGGGGGCGCGAAUCGGAGACUUGCAAGUACGACCGGUGGAUUCGUGCAAUUUAAGGUAUACAUUUCACGCGGAGACCCCGGGAAGGAGCUCCCCUACAUAACACGAAUAUCACCCGGCCGAAUUAGGGCUCGACGGUCCUUCCGCCUAAUGUGUGCAGUGAACGAGUUGGUGCCUGAAGAUGGUUUCACUCACAGGCGCCAAAGUACGACUCCUCCUUCCCGAUCAGUUUGCGACGGGCCUGUCCGCUUUAGAAUAAAGACCCACUUCCAAACUUCGACCGGCUGGGGGAAACAUUGGAGCAGCUUCCAAUGUGAUCAAAACUGUAGCAGAUCUCUAGCAUACUUCAAACAGGAAGUUACUGUAUGUGGUGCACGACCUGGCCAAGGUAGUUUCCUCCCCCUAGGACUGGUGAACGAUGGCGGGUGGAUUGUCAUUCAUAGUGAGAGAUUAGCCGUGCCUGCUUAUGACGGGAUCGGCGAUCUAGUAAGCGAUUCCAAAUCGAACAGCAUGCGCCGAGGGGACACUUACUUGGAGGUACUUAUCCGAGCGAAAAGGAGGGAGCCCAUAUCCAAAUGCGCUAGUAGAGGAGCACUGUCGAGUCAUGACCGCGGCCAUUCACUCGUAAGUACAGGGACACACUUCCAUAUCCUCCGGGGACUGAUGGGUAUUCGUACGAUUCGGCUGAGCGGGUCGCGGGACCCUACGGUCCGAACGUCUCGCGAGGGGUGUCAGGCUCCUCAUUUCAUGGUGCAGUCUGUCUGGAAGCCGACUACACUACGGGGUAGUCCUGCCCUUGAUAAUGCUAAUGAGAGUAACUCACGCCCCGCCCAUAACAAAGGCCGGGGGCCCUCUCAAUCAAACAGGCGGACGGGGAAUGUCAGGCAGGUUGUCGUUGGCAGAGUUACGCUCUCGCAGGAGAUUAAUCCUUUUGUAAAGCAUUUGGAACUAGUCCCCGGCUAUUAUUUAGCUGAGUAUCCAAUGCCUAGAAGCCUUGCGUCCCGUUCUAACCUGCGCGUAAUUCAUACAUCGCAUGAGAGAGCAAGGCAAACAAUCCAUUCGCCUGGCAAGAGAAACCGAGGAGCAAGUCACCGAACGCCCGCCGGGAAGCACCGCGAGUACCCACGACAAAACAGCUGCUUGGACUAUUAUGAACCCUCCAUACGUAGGAAGGAGGCCUAUGGGUGCGUCAAUAACGCACUCCCUGAUUGUCCUGACAAGGACGAUCGCGAAUGGACGCGCUCGCAAUCCAUGAUU
GAAAUGUCCAGACCAACCGAGUCCCUGCUCAGUGCCUCCUGGCAUCGGCCAUUGGUUCUUGGAAGCCUCAACUACGGAUUCAUCACUGACCCCGUGGCGCUCACUGGUCAAAGAAAACUAGGAUGCCGUGGAAUGAUGAACACGUUAAUGUUAAUAUGGAACCAUCAUUUCGGCCCCUAUGGUUCAACCCCAAGAUUAGUUUUCGUUUGUGAAGCCAAGCGGCACCGGGGAUCUUGGGCAAACUACACUGAAGCAAAACUCCCUUCCUAUUAUGUAAUAACACUGGCACAGGGUCUUGGCCCGCGCGCUGGGCUCCACCACAACGUGUACUGUCUUCACCCUCAACGAGUUUUCCACUUCUGUCCCUUCGUAUCAGUUCACUUGCAAUUCCUAUCCCAUGUUUCGACUAGCCCAAGCGCUAAGUGUGCCCGCCUAGAUCCAGUCCAUCUUCCGGCUGAGGUGGGGAUCGUCAAACCUGCGGGGCGGAUUAAGAAGUCAUUUGUUGGUGGCGCGGGGCCUCUCAGAAUGUUAAAUAGGCAACGUGUAUGUUCGGUGGGGCCUGGAAGUGGACCGCCGUCCGUGGCGGAUUGUGCCAAAUUAACGGCUGAAGUGGAGUGGACUUCCAUCCACCCAGCUGCAGCAGAUCGGGGGUUAUCCCAAAGCACCAUCCAUGCCAGCAUGAUGCUGACUCACCAAAUAAGCUUCACUGAAUGCGACAAGUUCGCGCAAAUGGCUCAGAGCAGCGUGUCCCACACCGUGGGACAAAGGGUAUACUCGACUUCUCCACCUUGCGCGAAACCUGGCCCCGCUGGAUACAGACUGAUCAGUUCUAUCGAAUGUACCGUGCACAAAUGUAAACGUCGACAUAUGGCCGGCGCGCUACUGCGGCCCCGGGAACCUGGCCUCCUACCCGAUGACAAUGUAUCACCCGUCCCUCUCCGGUACGGCGAUAAUAUAUUGGCUCAUCGGGUGGCAUCUUACUCCCGGCGUACUUCCGACCCGUCGCAUCAAGUCCGAUCUGACCAAUUCUGGGACAUUAAUGUACAACCACCUAUACCCUUCUUCGCACCUCUGCUGAAUUUGGCUUCACGAACCUAUGGGCGAGGAGCGCUGCUGUCCCCGCCGGAACCACAGAUUCACGCUGCCACUAUGGCUUCAGCUAGAUGUGAGUCAAAUAAUAGAUCAGUACUCGUUAUGCGGCAUGAUCAUGAAGGUAAACUGCCCUUGCACCGAUCCAAGCUAAGCGGGCUAGCUGUAAUCCUUAGCCGGGGAUCUUCCGAUGUAUGUGCCCCCUCGGACAUGAAACACAUCCACAGUGGAGAUAGACAAAUGACUGAGGAGCUUAGAUUUCUGGAGAACAAAAACUUGAUGGGCUUAAGAUAUGGUCUAUACUCAUUAACAUCGAGAUGUGCUCGAAACGUCGAUAGACUCAUUCCUUUUAUUCGCCUACAGCAAGUGUUCGGGGAAUCAAAGUUGGAGUCACUUGCCCCAGGGGUCAAGCCGCUCCCGAUUUUCGUCGAGCGUCGUAGGAUGUGGCCGCCGGUUAUAUGGAUAAGUAUACGUUGCGGACACCAGACCAGACCCUAUAUACGAGACCGUUCUGCAGCUAAAUGUCGGGGGGGCCAGUCGCGGCCCGCCCUCUCUAAACAACUUAUUUACGUUCGGCGGGUAAGGCAGUGGCGGUUACAUCCAGGCAGACAGAUGGUCCUUGGUCAUACGUUCGCGAGCUCUUUCCGUCAGGAACUCUCCGCACAGCAUACUGCAACUCGCCGGAUUACAAGACCCCUUAGUGCUCCCUUUUAUGUACGUCCCCGGCCCUGGACUGGCGGUACUGUCGAGCUUUGCAUUUUUAGAGGCGCCUCAUGCAGGACUUCAGAAUUCGGCAAGGGAGCUACCCCCAAAGAGCUCCUCGUAAUGAACGGGUUCCUAGUGGUGUAUUACCCAGCCGGACAAAGGCCCGGUCUAACGUCUUUUGUCCGUUCGCAUUCCAUACGUCCCGU
GUACGCCGAGCUACUCGGUAGUGAAACUAGGCGAGACUUGCGGAGGUCUUUUUGGUCAGUAAACGUAGUACUUGGUGUAUAUCGUCACUUACGCCACGGCACAAGGCAAAGGAGUGCGUCAUCCGGAUUGAAGGGCACUCUCAAGGUUGAUUCGCCAAUGGGUGUUGUUCGUCGCAAACCGAACCCGAUCCACUUUUUACCCUGGAAAGGGGUGUCAAGGGCGGACUUCGUGGCUCUAUCCGUCCAUGGAGUAUACUCGUCCUCAGUAAGUAGUGUAGGAUGGUUCACCGGAUGGAAAGGUAACGUUAAAAGACCGCUUCGUUGUUUAAUUGCGCAAGACUUCAAGUGCUCGAGCUUAGGUCUUCCCAUUAUGUUUAGGGAUGUAUUCUCACAAAUGCCUUAUUGUAGAUUGAGACAAGCUCCAUACGUAGUAGCACCCUUUGACUCGGGCGUUCUAUGGAUAGCUCGCAAGACGUGGAUCGCAUUCAGUCACUUACGAAAAUCCAGAUUCUGCCCUGCCUGGCUGUCAACAGACAACACCUUCGAUCAAUACGGAUCUAUCUUGGUGAGCGAAUUUUCUCCCACCCCGCGGGGAAUCGCACUGGUGGUCUGUGUGCCCCGAUCCAUUGUCUGCCGGAGCCACGGGAAAAAUUUUAAAUUCUGUAUCCUACUCCCCCGUGUGGCUGUAGCCCAGCUGAGGUCAGUAUGUCACCUUGUCGCUAUUAGGUGUUUCACCAUCCUAAUUGGCAAACUGUUUCAACCCUGCCAGAUAAGGUCAGAGCAGCCCCUUCGCUGGUAUUUAUCCCACAGCCCCCCUUCGAAGCGUUCCGCUAAGGCAAUACCAGCUCCGUACAGAGCGCCGGGUACCUUCCUCAUCUACUCCUGGAUCUACUUCUUACUUUGUAGGUCCACGGAUCAAGGCUGUUACUUUUGCAUAGUUCAUCGUGCCAUUACGCAGAGGACUGGAUGUCCCAGAAUACUUCUUGGAUUCACACUUGUCUCAAAUGAGCUUACGGUGGCGCACGGGAUUCAAGCUCCCGUGUUAGAGCCUCGGGCGCUGCCGUACAAUAGGGCAACUCCCAGAACCGAUCACGGAGUUUCUCCGGUGCGUAGACGUAGGUGCAGUAAUAUUCCUAUAAAUGUUGGAGAGUACCGCUGGUUGUUUACUUUUUCGGUAUGCAUACCUACCGACAGUCGCAAGGCAGUACAUGCCACGCAAGUUAGUUGUUUAAUGGUUUUGCCGCGCACAGCUCGUGCCUAUCAUAGGGUAAGGUACACCAGCUUCGGGCUUGCUUCUGAGCAGACCCAAACUAUUUUUCUGAUCCACAUAUCAUCAGACAACAAUUUUGCUCGAAAAGUAUGCAUACCCCCAUUAGUCCUUCUCUGA

Output:

MRPWSLLGSEHILSRRVEGITHDISLYWIPFFSVQVDHLATGLSKIYSAMFLYVTFSAEVAVYHVKVDGSLTTSDACRENSIHHACMGSAHLQRHRQRTDRPPFLSDGKPRSNTEQSPTPSYRLTPRLYSPLTALSGKLYLLGSGCPWKCRKMAQTTIVYRARRWIPLITDTIICVAPLPQPNHGVVALAVPSRARPHCLVRLSQGPSNNVYRYNFTWYCPDGGTADPLPFRHGITSVYLPSYSGIVRSTPYLIGTYAVPTQNSDPFATTRWVSVRLLASTTGEGDTGDRGTLSTFLDCCMSTSALPPARFISAYRVNATHGDDTRLTVKKLCTPYKSLWSGNVCRPLSVYKLKERYIVLTVGYHSLLSRMSTLRKTYSVDQAGPARPLVAEKSGVCECDYDPRFVIVIATVHFGKATGLHSKADRCSRWVGSSGRVEAAQLHHQRNKKPTGRDSDAEKMSGKGRLWSQVRLGGLEIQIGRRYQFYEYFLGKLIIRQRPTHERTQGAKKPSQKGTRLATRMALAWCDRFGRIYIPQCNILVHTIMKVRLHQTACDDNTWTAGEYDLYDCTRDVPNIVLSLPHHLYELLVSDALPAPHLLFLGESGSARHRTRSVGLTIANYTIIRDRCRSTCIRLDEPNPYRRFDVLWPLLMTPARNTQTRAFLCSVAGWNCQYYRPPDRSSSRTYLIALHLAIKLRSRYPNRSDLAYPTSVNRNTVYITVVSLRTAKLRYNSYTPSKLRPDQGNDPSYRTSERGNFVRNLPSVKPYRDFMKTISISFTGFRTKFDRQIGNSIGRGFTGARPTSLKRHSRFSLTVHSSKRYLSRLNAYVSFTIKHRITRFDVATREVKASAVVPNAEIVRKSTKVCWFMCIIRLQTVRATKPTIQKQLLRLSGPFQCGDSTNYVQPGYFHSSNAPRPACVSVCISPGLWDLLVKTRNAFSRVHGGTILTLVQHDIHKVELSSACTRERATNLAMTESVTSYSQCEASTRHTEIQAVSSIVYLHLTAARREKHTEKRADAKHHVSTWREGKTENVIGRLRSSHTLTLRLHPSSLDNWPFILGECQRGTDIEERIPAHAECTDKAVGVAFQSTLWADSANPRGGSRLRRGIRGPNAMNIECGYYLARQLLDRSCSRKIGETLRQTSSRTPLPCKRGRVPKPLEPKESNRSGASSVWIPVGVRLGPSAAKTPPWRHGRPRQLLISPQVIPLRHPSKRASQRDQCELPSCPEYKTPVCRLALGSSRMVHELALKMLSPVPEFVWRVGGGKVYLTADPPRVNLTHMSEHALVVPGVSLWALFLLRPLFFHLQPRLSTTYRWRRHLCLPVQLHSYRLGDMQLGASRRWVGPRNIWRQGVYAWEIYEIRTVPAPRVSLFRRWKENTYTLFGSGEITANVKTAMYRITSHPFRLYAGAREFSPFRLNEKKFAPAGITYKTGCDYSRSGELCEGRGRKNSFMYHWAALPLHAHKTNSLIDFYTPCNPNAAADMLVRSTEDARARIHCRYWVNNSFCKYICWHSRFLIEPIQKKWCTPIERRRPVINGDVLNGSEVTTKTRCCFRWASHTGRSYGRDRANTNLLVMGDASPEGGANRRLASTTGGFVQFKVYISRGDPGKELPYITRISPGRIRARRSFRLMCAVNELVPEDGFTHRRQSTTPPSRSVCDGPVRFRIKTHFQTSTGWGKHWSSFQCDQNCSRSLAYFKQEVTVCGARPGQGSFLPLGLVNDGGWIVIHSERLAVPAYDGIGDLVSDSKSNSMRRGDTYLEVLIRAKRREPISKCASRGALSSHDRGHSLVSTGTHFHILRGLMGIRTIRLSGSRDPTVRTSREGCQAPHFMVQSVWKPTTLRGSPALDNANESNSRPAHNKGRGPSQSNRRTGNVRQVVVGRVTLSQEINPFVKHLELVPGYYLAEYPMPRSLASRSNLRVIHTSHERARQTIHSPGKRNRGASHRTPAGKHREYPRQNSCLDYYEPSIRRKEAYGCVNNALPDCPDKDDREWTRSQSMI
EMSRPTESLLSASWHRPLVLGSLNYGFITDPVALTGQRKLGCRGMMNTLMLIWNHHFGPYGSTPRLVFVCEAKRHRGSWANYTEAKLPSYYVITLAQGLGPRAGLHHNVYCLHPQRVFHFCPFVSVHLQFLSHVSTSPSAKCARLDPVHLPAEVGIVKPAGRIKKSFVGGAGPLRMLNRQRVCSVGPGSGPPSVADCAKLTAEVEWTSIHPAAADRGLSQSTIHASMMLTHQISFTECDKFAQMAQSSVSHTVGQRVYSTSPPCAKPGPAGYRLISSIECTVHKCKRRHMAGALLRPREPGLLPDDNVSPVPLRYGDNILAHRVASYSRRTSDPSHQVRSDQFWDINVQPPIPFFAPLLNLASRTYGRGALLSPPEPQIHAATMASARCESNNRSVLVMRHDHEGKLPLHRSKLSGLAVILSRGSSDVCAPSDMKHIHSGDRQMTEELRFLENKNLMGLRYGLYSLTSRCARNVDRLIPFIRLQQVFGESKLESLAPGVKPLPIFVERRRMWPPVIWISIRCGHQTRPYIRDRSAAKCRGGQSRPALSKQLIYVRRVRQWRLHPGRQMVLGHTFASSFRQELSAQHTATRRITRPLSAPFYVRPRPWTGGTVELCIFRGASCRTSEFGKGATPKELLVMNGFLVVYYPAGQRPGLTSFVRSHSIRPVYAELLGSETRRDLRRSFWSVNVVLGVYRHLRHGTRQRSASSGLKGTLKVDSPMGVVRRKPNPIHFLPWKGVSRADFVALSVHGVYSSSVSSVGWFTGWKGNVKRPLRCLIAQDFKCSSLGLPIMFRDVFSQMPYCRLRQAPYVVAPFDSGVLWIARKTWIAFSHLRKSRFCPAWLSTDNTFDQYGSILVSEFSPTPRGIALVVCVPRSIVCRSHGKNFKFCILLPRVAVAQLRSVCHLVAIRCFTILIGKLFQPCQIRSEQPLRWYLSHSPPSKRSAKAIPAPYRAPGTFLIYSWIYFLLCRSTDQGCYFCIVHRAITQRTGCPRILLGFTLVSNELTVAHGIQAPVLEPRALPYNRATPRTDHGVSPVRRRRCSNIPINVGEYRWLFTFSVCIPTDSRKAVHATQVSCLMVLPRTARAYHRVRYTSFGLASEQTQTIFLIHISSDNNFARKVCIPPLVLL

PROT.rb:

def abbreviate(str)
  list = ""
  str.scan(/.../) do |sub|
    case sub
    when "UUU", "UUC"
      list += "F"
    when "UUA", "UUG"
      list += "L"
    when "UCU", "UCC", "UCA", "UCG", "AGU", "AGC"
      list += "S"
    when "UAU", "UAC"
      list += "Y"
    when "UGU", "UGC"
      list += "C"
    when "UGG"
      list += "W"
    when "CUU", "CUC", "CUA", "CUG"
      list += "L"
    when "CCU", "CCC", "CCA", "CCG"
      list += "P"
    when "CAU", "CAC"
      list += "H"
    when "CAA", "CAG"
      list += "Q"
    when "CGU", "CGC", "CGA", "CGG", "AGA", "AGG"
      list += "R"
    when "AUU", "AUC", "AUA"
      list += "I"
    when "AUG"
      list += "M"
    when "ACU", "ACC", "ACA", "ACG"
      list += "T"
    when "AAU", "AAC"
      list += "N"
    when "AAA", "AAG"
      list += "K"
    when "GUU", "GUC", "GUA", "GUG"
      list += "V"
    when "GCU", "GCC", "GCA", "GCG"
      list += "A"
    when "GAU", "GAC"
      list += "D"
    when "GAA", "GAG"
      list += "E"
    when "GGU", "GGC", "GGA", "GGG"
      list += "G"
    else
      return list
    end
  end
end
user_input = gets.chomp
abbreviate(user_input)

Yea, that's a very long switch.

Basically it's a translator. Take 3 characters, output 1 other. I've thought about implementing this using a key-value map, but that wouldn't solve the repetitiveness.

The readability is quite good and so is the speed. However, I'm quite sure it isn't idiomatic. I have no clue how this would score on maintainability.

There's one thing striking me as odd: return should be implicit. But simply stating list instead of return list modifies the behaviour and I'm not sure why.

asked May 12, 2016 at 18:14
\$\endgroup\$

3 Answers

7
\$\begingroup\$

Some notes:

  • As others have already pointed out, you should use a hash instead of a gigantic case. But make sure your get operations on that hash are O(1), otherwise the method will be very inefficient.

  • You can use Enumerable#take_while to manage the stop amino acids.

  • Encapsulate the code in a module/class.

  • You need a return because it's not the last expression of the method, it's within the scan, which you want to break.

  • Note that this works: "123456".gsub(/.../) { |triplet| triplet[0] } #=> "14"

  • This is a common pattern: write the data structure in the most declarative/simple way and then programmatically build (on initialization) whatever (efficient) data structures you need in the algorithm.
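To make the point about return concrete, here is a minimal sketch. The two method names are made up for illustration and are not part of the original code. Inside a block, return exits the enclosing method, while a bare expression is only the block's own value, which scan discards; scan called with a block hands back the original string.

```ruby
# Hypothetical helpers, for illustration only.

# `return` inside the block leaves the whole method immediately.
def first_vowel_with_return(str)
  str.scan(/./) do |char|
    return char if "aeiou".include?(char)
  end
  nil
end

# Without `return`, the expression is just the block's value, which
# scan ignores. The method's result is whatever scan itself returns,
# and scan with a block returns its receiver.
def first_vowel_without_return(str)
  str.scan(/./) do |char|
    char if "aeiou".include?(char)
  end
end

p first_vowel_with_return("ruby")    # => "u"
p first_vowel_without_return("ruby") # => "ruby"
```

This is why swapping return list for a bare list in the else branch changes the result: the scan keeps going and the method ends up returning the input string.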

I'd write it in functional style:

module Rosalind
  # Declarative table: each amino acid (or STOP) with the codons that encode it.
  CODONS_BY_AMINOACID = {
    "F" => ["UUU", "UUC"],
    "L" => ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"],
    "S" => ["UCU", "UCC", "UCA", "UCG", "AGU", "AGC"],
    "Y" => ["UAU", "UAC"],
    "C" => ["UGU", "UGC"],
    "W" => ["UGG"],
    "P" => ["CCU", "CCC", "CCA", "CCG"],
    "H" => ["CAU", "CAC"],
    "Q" => ["CAA", "CAG"],
    "R" => ["CGU", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "I" => ["AUU", "AUC", "AUA"],
    "M" => ["AUG"],
    "T" => ["ACU", "ACC", "ACA", "ACG"],
    "N" => ["AAU", "AAC"],
    "K" => ["AAA", "AAG"],
    "V" => ["GUU", "GUC", "GUA", "GUG"],
    "A" => ["GCU", "GCC", "GCA", "GCG"],
    "D" => ["GAU", "GAC"],
    "E" => ["GAA", "GAG"],
    "G" => ["GGU", "GGC", "GGA", "GGG"],
    "STOP" => ["UGA", "UAA", "UAG"],
  }

  # Inverted once at load time so that each codon lookup is a single hash access.
  AMINOACID_BY_CODON = CODONS_BY_AMINOACID.
    flat_map { |aminoacid, codons| codons.map { |codon| [codon, aminoacid] } }.to_h

  # Translates an RNA string into a protein string, stopping at the first stop codon.
  def self.problem_prot(rna_string)
    rna_string.
      scan(/[UGTCA]{3}/).
      map { |codon| AMINOACID_BY_CODON[codon] }.
      take_while { |aminoacid| aminoacid != "STOP" }.
      join
  end
end
answered May 12, 2016 at 22:30
\$\endgroup\$
5
  • 1
    \$\begingroup\$ Consider merging the Stop codon list into the main table, then moving the take_while after the map. \$\endgroup\$ Commented May 13, 2016 at 7:09
  • 2
    \$\begingroup\$ Let's try it. I already considered it, but was afraid conceptually it wasn't fitting to mix them. Maybe adding lazy in there would be a good idea. \$\endgroup\$ Commented May 13, 2016 at 7:13
  • 1
    \$\begingroup\$ That's a good idea to programmatically invert the hash from the more maintainable/readable version and store it in a constant. I'll have to remember that :) \$\endgroup\$ Commented May 13, 2016 at 16:22
  • 1
    \$\begingroup\$ James, I had this "aha moment" some years ago reading this: norvig.com/sudoku.html \$\endgroup\$ Commented May 13, 2016 at 17:44
  • \$\begingroup\$ That looks very fancy. Quite advanced. Thanks for the input! \$\endgroup\$ Commented May 15, 2016 at 20:09
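The lazy idea from the comments could look roughly like the sketch below. It is only an illustration under stated assumptions: the table is cut down to three codons, and SAMPLE_CODONS and translate_lazily are invented names. It uses each_char.each_slice(3) instead of scan so that the codons themselves are produced on demand, and fetch so that a lookup of an unknown codon would raise.

```ruby
# Abbreviated, hypothetical table: just enough codons for the demo.
SAMPLE_CODONS = {
  "AUG"  => "M",
  "GCC"  => "A",
  "UGA"  => "STOP",
}.freeze

# Lazy pipeline: codons are looked up one at a time, and nothing
# after the first stop codon is ever touched.
def translate_lazily(rna)
  rna.each_char.each_slice(3).lazy
     .map { |bases| SAMPLE_CODONS.fetch(bases.join) }
     .take_while { |amino_acid| amino_acid != "STOP" }
     .to_a
     .join
end

# "XXX" is not in the table, so an eager map would raise KeyError here.
# The lazy version never reaches it.
p translate_lazily("AUGGCCUGAXXX") # => "MA"
```

For inputs of at most 10 kbp the difference is unlikely to matter in practice, so this is more about expressing "stop as soon as you are done" than about speed.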
5
\$\begingroup\$

In your case, the 'return' is required to break out of the scan do/end block. You can remove the else part and just have 'list' alone as the last line of the method. Unless you actually wanted it to stop scanning on an invalid sequence like "UGA"?

I would use a key-value map here; it's much more straightforward and keeps the logic part clean.

def replace(sub)
  hash = {
    "F" => ["UUU", "UUC"],
    "L" => ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"],
    "S" => ["UCU", "UCC", "UCA", "UCG", "AGU", "AGC"],
    "Y" => ["UAU", "UAC"],
    "C" => ["UGU", "UGC"],
    "W" => ["UGG"],
    "P" => ["CCU", "CCC", "CCA", "CCG"],
    "H" => ["CAU", "CAC"],
    "Q" => ["CAA", "CAG"],
    "R" => ["CGU", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "I" => ["AUU", "AUC", "AUA"],
    "M" => ["AUG"],
    "T" => ["ACU", "ACC", "ACA", "ACG"],
    "N" => ["AAU", "AAC"],
    "K" => ["AAA", "AAG"],
    "V" => ["GUU", "GUC", "GUA", "GUG"],
    "A" => ["GCU", "GCC", "GCA", "GCG"],
    "D" => ["GAU", "GAC"],
    "E" => ["GAA", "GAG"],
    "G" => ["GGU", "GGC", "GGA", "GGG"]
  }
  hash.detect do |key, array|
    break key if array.include? sub
  end
end

def abbreviate(str)
  list = ""
  str.scan(/.../) do |sub|
    letter = replace(sub)
    break if letter.nil? # if you want to break because of "UGA" ?
    list += letter
  end
  list
end
answered May 12, 2016 at 19:58
\$\endgroup\$
2
  • \$\begingroup\$ Unless you actually wanted it to stop scanning on an invalid sequence like "UGA"? That's what the table implied, so I went with that. It also seemed to fit the idea of returning as soon as you know you're done. In hindsight I should probably return behind the switch anyway, at least that's what I'd do in other languages. \$\endgroup\$ Commented May 12, 2016 at 20:42
  • 1
    \$\begingroup\$ I don't know ruby so I may be missing something here, but wouldn't it be better to have the codons as keys and the amino acids as values for the hash? Something like "AAG" => "K", "AAA" => "K" instead of "K" => ["AAA", "AAG"]. That way the lookup is direct and you don't need to scan an array each time. \$\endgroup\$ Commented May 13, 2016 at 8:13
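The codon-keyed hash suggested in the comment above can also be generated instead of typed out, which addresses the repetitiveness concern from the question. The following is a sketch, not anyone's posted solution: BASES, AMINO_ACIDS, CODON_TABLE and translate are names invented here, and using "*" as the stop marker is my own convention. It relies on the standard genetic code being listed in U, C, A, G order with the third base varying fastest.

```ruby
# Assumed ordering: U, C, A, G for every codon position.
BASES = %w[U C A G].freeze

# One letter per codon in the order UUU, UUC, UUA, UUG, UCU, ... GGG.
# "*" marks the three stop codons (UAA, UAG, UGA).
AMINO_ACIDS =
  "FFLLSSSSYY**CC*W" \
  "LLLLPPPPHHQQRRRR" \
  "IIIMTTTTNNKKSSRR" \
  "VVVVAAAADDEEGGGG"

# product varies the last base fastest, which matches the string above.
CODON_TABLE = BASES.product(BASES, BASES)
                   .map(&:join)
                   .zip(AMINO_ACIDS.chars)
                   .to_h
                   .freeze

# Direct hash lookup per codon; translation ends at the first stop codon.
def translate(rna)
  rna.scan(/.{3}/)
     .map { |codon| CODON_TABLE.fetch(codon) }
     .take_while { |amino_acid| amino_acid != "*" }
     .join
end

puts translate("AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA")
# prints MAMAPRTEINSTRING
```

The trade-off is that the table is compact but harder to eyeball than an explicit hash, so a quick check against the sample dataset is worth keeping around.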
3
\$\begingroup\$

If you wanted to look more ruby-ish, you could convert your switch statement into a hash and then inject/reduce your input string.

def abbreviate(str)
  codes = {
    'F' => ["UUU", "UUC"],
    # All six leucine codons share one key; a second 'L' key would silently replace the first.
    'L' => ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"],
    'S' => ["UCU", "UCC", "UCA", "UCG", "AGU", "AGC"],
    'Y' => ["UAU", "UAC"],
    'C' => ["UGU", "UGC"],
    'W' => ["UGG"],
    'P' => ["CCU", "CCC", "CCA", "CCG"],
    'H' => ["CAU", "CAC"],
    'Q' => ["CAA", "CAG"],
    'R' => ["CGU", "CGC", "CGA", "CGG", "AGA", "AGG"],
    'I' => ["AUU", "AUC", "AUA"],
    'M' => ['AUG'],
    'T' => ["ACU", "ACC", "ACA", "ACG"],
    'N' => ["AAU", "AAC"],
    'K' => ["AAA", "AAG"],
    'V' => ["GUU", "GUC", "GUA", "GUG"],
    'A' => ["GCU", "GCC", "GCA", "GCG"],
    'D' => ["GAU", "GAC"],
    'E' => ["GAA", "GAG"],
    'G' => ["GGU", "GGC", "GGA", "GGG"]
  }
  str.scan(/.../).inject('') do |s, sub|
    c = codes.select { |_, value| value.include? sub }.keys.first
    c.nil? ? s : s + c
  end
end

I'm sure there are a few variations of this idea that would work well too. This example is just the first that occurred to me.

answered May 12, 2016 at 19:58
\$\endgroup\$
