This question is part of a series solving the Rosalind challenges. For the previous question in this series, see Wascally wabbits. The repository with all my up-to-date solutions so far can be found here.
Problem: PROT
The 20 commonly occurring amino acids are abbreviated by using 20 letters from the English alphabet (all letters except for B, J, O, U, X, and Z). Protein strings are constructed from these 20 symbols. Henceforth, the term genetic string will incorporate protein strings along with DNA strings and RNA strings.
The RNA codon table dictates the details regarding the encoding of specific codons into the amino acid alphabet.
Given:
An RNA string \$s\$ corresponding to a strand of mRNA (of length at most 10 kbp).
Return:
The protein string encoded by \$s\$.
Sample Dataset:
AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA
Sample Output:
MAMAPRTEINSTRING
My solution solves the sample dataset and the actual dataset given.
Dataset:
AUGCGCCCUUGGUCGCUCCUUGGAUCAGAGCAUAUUCUAUCACGGCGCGUCGAAGGAAUAACCCACGACAUCUCUCUAUAUUGGAUUCCCUUUUUUUCGGUUCAGGUAGAUCAUUUGGCUACUGGACUUUCUAAGAUUUACUCCGCCAUGUUCCUAUAUGUUACAUUCUCAGCCGAAGUCGCUGUAUAUCACGUUAAGGUAGACGGUUCCUUGACUACCAGCGACGCCUGUAGGGAGAAUUCCAUCCAUCAUGCAUGUAUGGGCAGUGCGCACUUACAGCGCCAUAGGCAACGGACGGACAGACCUCCUUUCCUGUCGGACGGUAAGCCGCGAUCCAAUACAGAGCAAAGUCCCACGCCCUCCUAUAGACUCACGCCAAGAUUGUAUUCCCCGUUAACCGCUCUCUCAGGGAAGUUGUAUCUACUCGGAUCGGGAUGUCCUUGGAAAUGUAGGAAAAUGGCUCAAACUACGAUUGUAUACCGUGCGAGACGUUGGAUCCCGCUUAUCACUGAUACCAUAAUCUGUGUGGCCCCCUUACCACAACCUAACCAUGGAGUAGUAGCCCUGGCCGUCCCUUCAAGGGCAAGACCUCAUUGUCUUGUACGCUUAUCACAAGGGCCAUCUAACAAUGUGUACCGGUAUAAUUUUACGUGGUAUUGUCCAGACGGCGGUACGGCCGAUCCGUUGCCAUUUCGUCAUGGCAUAACCUCGGUCUAUCUUCCUUCCUACUCGGGAAUAGUUCGCAGUACACCAUACCUCAUCGGCACUUACGCUGUUCCAACACAAAAUUCUGAUCCCUUCGCUACCACCCGCUGGGUAUCUGUCAGGUUACUGGCCUCCACUACCGGAGAGGGCGAUACGGGGGACCGCGGAACACUUUCUACAUUUUUGGACUGCUGUAUGUCUACUUCGGCUCUUCCUCCCGCGAGAUUUAUAUCGGCAUACAGAGUAAACGCUACCCAUGGCGACGACACUCGUCUCACCGUUAAGAAGCUGUGCACACCUUAUAAAAGCCUUUGGUCAGGUAAUGUAUGUAGACCUCUAUCGGUCUAUAAGUUGAAGGAAAGAUAUAUAGUCCUCACCGUUGGUUAUCAUUCCCUCCUAUCCCGCAUGUCCACCCUAAGGAAAACUUACUCCGUGGACCAAGCGGGACCGGCAAGGCCCUUGGUUGCCGAAAAAUCAGGAGUUUGCGAGUGCGACUAUGAUCCCAGGUUUGUAAUCGUUAUCGCGACUGUUCAUUUCGGGAAGGCUACUGGGCUGCAUUCAAAGGCGGACCGCUGCUCUCGCUGGGUCGGGUCCAGUGGACGGGUCGAGGCUGCUCAGCUCCAUCAUCAAAGAAAUAAGAAGCCCACCGGAAGAGACAGUGAUGCCGAGAAGAUGUCCGGCAAAGGCAGACUCUGGUCCCAAGUUAGACUGGGUGGUUUAGAGAUCCAGAUAGGUCGUCGUUACCAGUUCUACGAAUACUUUCUUGGUAAACUGAUUAUCCGUCAAAGGCCGACUCACGAGAGGACACAGGGUGCAAAGAAACCCUCACAGAAGGGAACGCGAUUGGCGACACGUAUGGCGCUCGCGUGGUGUGACAGGUUUGGUAGGAUCUAUAUCCCCCAGUGCAACAUAUUAGUUCACACUAUAAUGAAGGUCCGAUUGCACCAAACAGCCUGCGAUGAUAACACGUGGACUGCUGGAGAGUAUGACUUGUACGACUGCACGCGCGAUGUACCCAACAUCGUCCUGUCCCUACCGCACCAUCUUUAUGAACUGCUGGUCUCUGAUGCACUCCCCGCCCCCCACCUCCUCUUUCUGGGGGAAUCUGGUUCCGCCAGGCAUAGGACACGUUCGGUUGGACUAACUAUUGCUAACUACACGAUCAUUCGUGAUAGGUGUCGGUCCACAUGUAUAAGGUUGGAUGAACCUAACCCCUACCGACGAUUUGAUGUAUUGUGGCCACUACUAAUGACCCCCGCUCGUAACACCCAAACACGCGCAUUUCUCUGCUCCGU
GGCUGGCUGGAAUUGCCAGUAUUACAGACCCCCUGACAGAUCCAGUAGUAGGACGUACUUGAUCGCACUACAUUUAGCAAUCAAAUUGCGUUCACGCUACCCCAAUCGUUCAGAUCUGGCUUAUCCCACAUCUGUAAACAGGAACACGGUGUAUAUCACCGUAGUCUCUCUACGCACAGCGAAACUAAGAUACAACAGUUACACACCCAGCAAACUGCGGCCCGAUCAAGGCAAUGACCCCAGCUACCGAACUUCGGAAAGAGGGAAUUUCGUCCGGAACUUGCCAUCAGUGAAACCGUACCGAGACUUCAUGAAAACGAUCAGCAUUUCCUUUACAGGAUUCCGGACCAAAUUUGAUCGUCAAAUUGGGAAUUCCAUAGGCCGAGGAUUCACGGGAGCAAGGCCGACCUCAUUGAAGCGUCAUAGUCGCUUUUCGCUCACCGUACAUUCUAGCAAGCGCUAUCUCUCCCGCCUCAACGCCUACGUCUCUUUUACAAUAAAACAUCGGAUAACGAGAUUCGACGUGGCUACGCGCGAAGUUAAGGCUUCCGCCGUGGUACCUAAUGCGGAAAUAGUCCGGAAAAGUACGAAGGUGUGCUGGUUUAUGUGCAUCAUCAGACUCCAAACUGUCCGAGCUACCAAACCGACCAUUCAGAAGCAGUUGUUGAGAUUAAGUGGCCCGUUUCAAUGCGGGGAUUCCACCAAUUACGUACAACCUGGUUACUUCCACAGUUCAAACGCGCCCCGCCCGGCGUGUGUCAGUGUGUGUAUCAGCCCGGGGUUAUGGGACCUGUUGGUAAAAACCCGGAAUGCUUUCUCCCGUGUGCACGGGGGUACUAUCCUUACUUUAGUUCAGCAUGACAUUCAUAAAGUAGAAUUAUCGUCAGCAUGCACUCGCGAGCGGGCUACCAACCUGGCAAUGACUGAAAGCGUAACGUCAUACUCUCAGUGCGAGGCCUCGACUCGCCAUACCGAAAUACAAGCUGUUAGCUCAAUUGUGUAUCUCCACUUGACUGCGGCCCGCAGGGAGAAACACACAGAGAAGAGGGCCGACGCGAAGCAUCAUGUGUCUACUUGGCGCGAGGGUAAAACCGAAAACGUCAUUGGAAGGCUCAGAUCGUCACAUACAUUAACCCUUAGGUUACAUCCAUCCUCGUUGGACAACUGGCCGUUCAUUCUUGGGGAAUGCCAGAGAGGAACGGAUAUCGAGGAACGCAUCCCGGCACAUGCGGAAUGUACAGACAAGGCUGUAGGCGUAGCCUUUCAGUCGACGCUAUGGGCAGAUUCGGCGAAUCCGCGAGGUGGAUCUCGCUUGAGAAGAGGGAUUAGGGGCCCCAACGCGAUGAAUAUUGAAUGCGGAUAUUAUCUGGCGAGACAGCUUCUUGACCGCUCUUGUAGUCGCAAGAUAGGCGAGACUCUAAGACAAACUAGUUCCCGCACGCCAUUGCCAUGCAAGCGAGGCCGCGUCCCAAAACCCCUGGAACCUAAAGAGUCGAACAGGUCAGGAGCAAGUAGCGUAUGGAUACCAGUCGGAGUUAGGCUGGGUCCCUCCGCUGCGAAGACUCCGCCCUGGCGACAUGGUCGCCCGCGACAACUUCUAAUCUCUCCUCAAGUUAUUCCGUUAAGACACCCGAGCAAACGAGCUAGUCAAAGGGAUCAGUGCGAGCUCCCAUCAUGUCCUGAGUACAAGACCCCAGUGUGCCGACUUGCUUUGGGUAGCUCCAGAAUGGUUCACGAAUUAGCCCUUAAGAUGCUCUCCCCUGUUCCGGAGUUCGUGUGGAGGGUCGGAGGCGGGAAAGUCUAUUUAACGGCGGACCCACCUAGGGUAAAUCUGACGCAUAUGUCUGAACACGCCCUGGUGGUACCAGGAGUUUCCCUAUGGGCCCUUUUUCUUUUAAGACCUUUAUUUUUCCAUCUCCAACCUCGAUUAUCGACAACAUACCGUUGGCGCAGACACUUAUGCUUACCCGUUCAACUGCAUUCGUACAGGCUGGGUGAUA
UGCAAUUAGGAGCAUCUCGACGUUGGGUAGGCCCCCGAAAUAUAUGGAGACAGGGUGUGUAUGCGUGGGAGAUAUAUGAGAUUCGAACUGUACCGGCUCCAAGGGUGUCACUGUUCCGCCGUUGGAAGGAAAACACUUACACCCUCUUUGGAUCAGGGGAAAUUACAGCGAAUGUUAAGACGGCUAUGUAUCGGAUAACGUCACAUCCGUUUAGACUGUAUGCGGGCGCCCGAGAAUUCAGUCCAUUCCGACUAAACGAGAAAAAGUUUGCCCCCGCGGGGAUUACGUACAAAACCGGAUGCGAUUAUAGCCGUUCUGGAGAACUCUGUGAGGGGCGGGGAAGGAAAAAUAGUUUUAUGUACCAUUGGGCGGCCCUUCCUCUCCAUGCACAUAAAACAAACAGCCUCAUUGAUUUCUACACUCCGUGCAACCCAAAUGCGGCUGCUGACAUGCUAGUGCGUAGUACAGAAGAUGCCCGAGCUCGAAUACAUUGCAGAUAUUGGGUUAACAAUUCGUUUUGCAAAUAUAUAUGUUGGCACUCCAGGUUCUUGAUAGAACCGAUUCAGAAGAAAUGGUGUACACCCAUUGAGAGGCGUCGCCCCGUAAUUAACGGGGAUGUCUUAAACGGGUCAGAGGUAACUACUAAGACGCGGUGCUGUUUCAGAUGGGCAAGUCAUACGGGCCGUUCUUACGGAAGAGAUCGUGCUAAUACAAACCUGCUUGUUAUGGGCGACGCAUCGCCCGAGGGGGGCGCGAAUCGGAGACUUGCAAGUACGACCGGUGGAUUCGUGCAAUUUAAGGUAUACAUUUCACGCGGAGACCCCGGGAAGGAGCUCCCCUACAUAACACGAAUAUCACCCGGCCGAAUUAGGGCUCGACGGUCCUUCCGCCUAAUGUGUGCAGUGAACGAGUUGGUGCCUGAAGAUGGUUUCACUCACAGGCGCCAAAGUACGACUCCUCCUUCCCGAUCAGUUUGCGACGGGCCUGUCCGCUUUAGAAUAAAGACCCACUUCCAAACUUCGACCGGCUGGGGGAAACAUUGGAGCAGCUUCCAAUGUGAUCAAAACUGUAGCAGAUCUCUAGCAUACUUCAAACAGGAAGUUACUGUAUGUGGUGCACGACCUGGCCAAGGUAGUUUCCUCCCCCUAGGACUGGUGAACGAUGGCGGGUGGAUUGUCAUUCAUAGUGAGAGAUUAGCCGUGCCUGCUUAUGACGGGAUCGGCGAUCUAGUAAGCGAUUCCAAAUCGAACAGCAUGCGCCGAGGGGACACUUACUUGGAGGUACUUAUCCGAGCGAAAAGGAGGGAGCCCAUAUCCAAAUGCGCUAGUAGAGGAGCACUGUCGAGUCAUGACCGCGGCCAUUCACUCGUAAGUACAGGGACACACUUCCAUAUCCUCCGGGGACUGAUGGGUAUUCGUACGAUUCGGCUGAGCGGGUCGCGGGACCCUACGGUCCGAACGUCUCGCGAGGGGUGUCAGGCUCCUCAUUUCAUGGUGCAGUCUGUCUGGAAGCCGACUACACUACGGGGUAGUCCUGCCCUUGAUAAUGCUAAUGAGAGUAACUCACGCCCCGCCCAUAACAAAGGCCGGGGGCCCUCUCAAUCAAACAGGCGGACGGGGAAUGUCAGGCAGGUUGUCGUUGGCAGAGUUACGCUCUCGCAGGAGAUUAAUCCUUUUGUAAAGCAUUUGGAACUAGUCCCCGGCUAUUAUUUAGCUGAGUAUCCAAUGCCUAGAAGCCUUGCGUCCCGUUCUAACCUGCGCGUAAUUCAUACAUCGCAUGAGAGAGCAAGGCAAACAAUCCAUUCGCCUGGCAAGAGAAACCGAGGAGCAAGUCACCGAACGCCCGCCGGGAAGCACCGCGAGUACCCACGACAAAACAGCUGCUUGGACUAUUAUGAACCCUCCAUACGUAGGAAGGAGGCCUAUGGGUGCGUCAAUAACGCACUCCCUGAUUGUCCUGACAAGGACGAUCGCGAAUGGACGCGCUCGCAAUCCAUGAUU
GAAAUGUCCAGACCAACCGAGUCCCUGCUCAGUGCCUCCUGGCAUCGGCCAUUGGUUCUUGGAAGCCUCAACUACGGAUUCAUCACUGACCCCGUGGCGCUCACUGGUCAAAGAAAACUAGGAUGCCGUGGAAUGAUGAACACGUUAAUGUUAAUAUGGAACCAUCAUUUCGGCCCCUAUGGUUCAACCCCAAGAUUAGUUUUCGUUUGUGAAGCCAAGCGGCACCGGGGAUCUUGGGCAAACUACACUGAAGCAAAACUCCCUUCCUAUUAUGUAAUAACACUGGCACAGGGUCUUGGCCCGCGCGCUGGGCUCCACCACAACGUGUACUGUCUUCACCCUCAACGAGUUUUCCACUUCUGUCCCUUCGUAUCAGUUCACUUGCAAUUCCUAUCCCAUGUUUCGACUAGCCCAAGCGCUAAGUGUGCCCGCCUAGAUCCAGUCCAUCUUCCGGCUGAGGUGGGGAUCGUCAAACCUGCGGGGCGGAUUAAGAAGUCAUUUGUUGGUGGCGCGGGGCCUCUCAGAAUGUUAAAUAGGCAACGUGUAUGUUCGGUGGGGCCUGGAAGUGGACCGCCGUCCGUGGCGGAUUGUGCCAAAUUAACGGCUGAAGUGGAGUGGACUUCCAUCCACCCAGCUGCAGCAGAUCGGGGGUUAUCCCAAAGCACCAUCCAUGCCAGCAUGAUGCUGACUCACCAAAUAAGCUUCACUGAAUGCGACAAGUUCGCGCAAAUGGCUCAGAGCAGCGUGUCCCACACCGUGGGACAAAGGGUAUACUCGACUUCUCCACCUUGCGCGAAACCUGGCCCCGCUGGAUACAGACUGAUCAGUUCUAUCGAAUGUACCGUGCACAAAUGUAAACGUCGACAUAUGGCCGGCGCGCUACUGCGGCCCCGGGAACCUGGCCUCCUACCCGAUGACAAUGUAUCACCCGUCCCUCUCCGGUACGGCGAUAAUAUAUUGGCUCAUCGGGUGGCAUCUUACUCCCGGCGUACUUCCGACCCGUCGCAUCAAGUCCGAUCUGACCAAUUCUGGGACAUUAAUGUACAACCACCUAUACCCUUCUUCGCACCUCUGCUGAAUUUGGCUUCACGAACCUAUGGGCGAGGAGCGCUGCUGUCCCCGCCGGAACCACAGAUUCACGCUGCCACUAUGGCUUCAGCUAGAUGUGAGUCAAAUAAUAGAUCAGUACUCGUUAUGCGGCAUGAUCAUGAAGGUAAACUGCCCUUGCACCGAUCCAAGCUAAGCGGGCUAGCUGUAAUCCUUAGCCGGGGAUCUUCCGAUGUAUGUGCCCCCUCGGACAUGAAACACAUCCACAGUGGAGAUAGACAAAUGACUGAGGAGCUUAGAUUUCUGGAGAACAAAAACUUGAUGGGCUUAAGAUAUGGUCUAUACUCAUUAACAUCGAGAUGUGCUCGAAACGUCGAUAGACUCAUUCCUUUUAUUCGCCUACAGCAAGUGUUCGGGGAAUCAAAGUUGGAGUCACUUGCCCCAGGGGUCAAGCCGCUCCCGAUUUUCGUCGAGCGUCGUAGGAUGUGGCCGCCGGUUAUAUGGAUAAGUAUACGUUGCGGACACCAGACCAGACCCUAUAUACGAGACCGUUCUGCAGCUAAAUGUCGGGGGGGCCAGUCGCGGCCCGCCCUCUCUAAACAACUUAUUUACGUUCGGCGGGUAAGGCAGUGGCGGUUACAUCCAGGCAGACAGAUGGUCCUUGGUCAUACGUUCGCGAGCUCUUUCCGUCAGGAACUCUCCGCACAGCAUACUGCAACUCGCCGGAUUACAAGACCCCUUAGUGCUCCCUUUUAUGUACGUCCCCGGCCCUGGACUGGCGGUACUGUCGAGCUUUGCAUUUUUAGAGGCGCCUCAUGCAGGACUUCAGAAUUCGGCAAGGGAGCUACCCCCAAAGAGCUCCUCGUAAUGAACGGGUUCCUAGUGGUGUAUUACCCAGCCGGACAAAGGCCCGGUCUAACGUCUUUUGUCCGUUCGCAUUCCAUACGUCCCGU
GUACGCCGAGCUACUCGGUAGUGAAACUAGGCGAGACUUGCGGAGGUCUUUUUGGUCAGUAAACGUAGUACUUGGUGUAUAUCGUCACUUACGCCACGGCACAAGGCAAAGGAGUGCGUCAUCCGGAUUGAAGGGCACUCUCAAGGUUGAUUCGCCAAUGGGUGUUGUUCGUCGCAAACCGAACCCGAUCCACUUUUUACCCUGGAAAGGGGUGUCAAGGGCGGACUUCGUGGCUCUAUCCGUCCAUGGAGUAUACUCGUCCUCAGUAAGUAGUGUAGGAUGGUUCACCGGAUGGAAAGGUAACGUUAAAAGACCGCUUCGUUGUUUAAUUGCGCAAGACUUCAAGUGCUCGAGCUUAGGUCUUCCCAUUAUGUUUAGGGAUGUAUUCUCACAAAUGCCUUAUUGUAGAUUGAGACAAGCUCCAUACGUAGUAGCACCCUUUGACUCGGGCGUUCUAUGGAUAGCUCGCAAGACGUGGAUCGCAUUCAGUCACUUACGAAAAUCCAGAUUCUGCCCUGCCUGGCUGUCAACAGACAACACCUUCGAUCAAUACGGAUCUAUCUUGGUGAGCGAAUUUUCUCCCACCCCGCGGGGAAUCGCACUGGUGGUCUGUGUGCCCCGAUCCAUUGUCUGCCGGAGCCACGGGAAAAAUUUUAAAUUCUGUAUCCUACUCCCCCGUGUGGCUGUAGCCCAGCUGAGGUCAGUAUGUCACCUUGUCGCUAUUAGGUGUUUCACCAUCCUAAUUGGCAAACUGUUUCAACCCUGCCAGAUAAGGUCAGAGCAGCCCCUUCGCUGGUAUUUAUCCCACAGCCCCCCUUCGAAGCGUUCCGCUAAGGCAAUACCAGCUCCGUACAGAGCGCCGGGUACCUUCCUCAUCUACUCCUGGAUCUACUUCUUACUUUGUAGGUCCACGGAUCAAGGCUGUUACUUUUGCAUAGUUCAUCGUGCCAUUACGCAGAGGACUGGAUGUCCCAGAAUACUUCUUGGAUUCACACUUGUCUCAAAUGAGCUUACGGUGGCGCACGGGAUUCAAGCUCCCGUGUUAGAGCCUCGGGCGCUGCCGUACAAUAGGGCAACUCCCAGAACCGAUCACGGAGUUUCUCCGGUGCGUAGACGUAGGUGCAGUAAUAUUCCUAUAAAUGUUGGAGAGUACCGCUGGUUGUUUACUUUUUCGGUAUGCAUACCUACCGACAGUCGCAAGGCAGUACAUGCCACGCAAGUUAGUUGUUUAAUGGUUUUGCCGCGCACAGCUCGUGCCUAUCAUAGGGUAAGGUACACCAGCUUCGGGCUUGCUUCUGAGCAGACCCAAACUAUUUUUCUGAUCCACAUAUCAUCAGACAACAAUUUUGCUCGAAAAGUAUGCAUACCCCCAUUAGUCCUUCUCUGA
Output:
MRPWSLLGSEHILSRRVEGITHDISLYWIPFFSVQVDHLATGLSKIYSAMFLYVTFSAEVAVYHVKVDGSLTTSDACRENSIHHACMGSAHLQRHRQRTDRPPFLSDGKPRSNTEQSPTPSYRLTPRLYSPLTALSGKLYLLGSGCPWKCRKMAQTTIVYRARRWIPLITDTIICVAPLPQPNHGVVALAVPSRARPHCLVRLSQGPSNNVYRYNFTWYCPDGGTADPLPFRHGITSVYLPSYSGIVRSTPYLIGTYAVPTQNSDPFATTRWVSVRLLASTTGEGDTGDRGTLSTFLDCCMSTSALPPARFISAYRVNATHGDDTRLTVKKLCTPYKSLWSGNVCRPLSVYKLKERYIVLTVGYHSLLSRMSTLRKTYSVDQAGPARPLVAEKSGVCECDYDPRFVIVIATVHFGKATGLHSKADRCSRWVGSSGRVEAAQLHHQRNKKPTGRDSDAEKMSGKGRLWSQVRLGGLEIQIGRRYQFYEYFLGKLIIRQRPTHERTQGAKKPSQKGTRLATRMALAWCDRFGRIYIPQCNILVHTIMKVRLHQTACDDNTWTAGEYDLYDCTRDVPNIVLSLPHHLYELLVSDALPAPHLLFLGESGSARHRTRSVGLTIANYTIIRDRCRSTCIRLDEPNPYRRFDVLWPLLMTPARNTQTRAFLCSVAGWNCQYYRPPDRSSSRTYLIALHLAIKLRSRYPNRSDLAYPTSVNRNTVYITVVSLRTAKLRYNSYTPSKLRPDQGNDPSYRTSERGNFVRNLPSVKPYRDFMKTISISFTGFRTKFDRQIGNSIGRGFTGARPTSLKRHSRFSLTVHSSKRYLSRLNAYVSFTIKHRITRFDVATREVKASAVVPNAEIVRKSTKVCWFMCIIRLQTVRATKPTIQKQLLRLSGPFQCGDSTNYVQPGYFHSSNAPRPACVSVCISPGLWDLLVKTRNAFSRVHGGTILTLVQHDIHKVELSSACTRERATNLAMTESVTSYSQCEASTRHTEIQAVSSIVYLHLTAARREKHTEKRADAKHHVSTWREGKTENVIGRLRSSHTLTLRLHPSSLDNWPFILGECQRGTDIEERIPAHAECTDKAVGVAFQSTLWADSANPRGGSRLRRGIRGPNAMNIECGYYLARQLLDRSCSRKIGETLRQTSSRTPLPCKRGRVPKPLEPKESNRSGASSVWIPVGVRLGPSAAKTPPWRHGRPRQLLISPQVIPLRHPSKRASQRDQCELPSCPEYKTPVCRLALGSSRMVHELALKMLSPVPEFVWRVGGGKVYLTADPPRVNLTHMSEHALVVPGVSLWALFLLRPLFFHLQPRLSTTYRWRRHLCLPVQLHSYRLGDMQLGASRRWVGPRNIWRQGVYAWEIYEIRTVPAPRVSLFRRWKENTYTLFGSGEITANVKTAMYRITSHPFRLYAGAREFSPFRLNEKKFAPAGITYKTGCDYSRSGELCEGRGRKNSFMYHWAALPLHAHKTNSLIDFYTPCNPNAAADMLVRSTEDARARIHCRYWVNNSFCKYICWHSRFLIEPIQKKWCTPIERRRPVINGDVLNGSEVTTKTRCCFRWASHTGRSYGRDRANTNLLVMGDASPEGGANRRLASTTGGFVQFKVYISRGDPGKELPYITRISPGRIRARRSFRLMCAVNELVPEDGFTHRRQSTTPPSRSVCDGPVRFRIKTHFQTSTGWGKHWSSFQCDQNCSRSLAYFKQEVTVCGARPGQGSFLPLGLVNDGGWIVIHSERLAVPAYDGIGDLVSDSKSNSMRRGDTYLEVLIRAKRREPISKCASRGALSSHDRGHSLVSTGTHFHILRGLMGIRTIRLSGSRDPTVRTSREGCQAPHFMVQSVWKPTTLRGSPALDNANESNSRPAHNKGRGPSQSNRRTGNVRQVVVGRVTLSQEINPFVKHLELVPGYYLAEYPMPRSLASRSNLRVIHTSHERARQTIHSPGKRNRGASHRTPAGKHREYPRQNSCLDYYEPSIRRKEAYGCVNNALPDCPDKDDREWTRSQSMI
EMSRPTESLLSASWHRPLVLGSLNYGFITDPVALTGQRKLGCRGMMNTLMLIWNHHFGPYGSTPRLVFVCEAKRHRGSWANYTEAKLPSYYVITLAQGLGPRAGLHHNVYCLHPQRVFHFCPFVSVHLQFLSHVSTSPSAKCARLDPVHLPAEVGIVKPAGRIKKSFVGGAGPLRMLNRQRVCSVGPGSGPPSVADCAKLTAEVEWTSIHPAAADRGLSQSTIHASMMLTHQISFTECDKFAQMAQSSVSHTVGQRVYSTSPPCAKPGPAGYRLISSIECTVHKCKRRHMAGALLRPREPGLLPDDNVSPVPLRYGDNILAHRVASYSRRTSDPSHQVRSDQFWDINVQPPIPFFAPLLNLASRTYGRGALLSPPEPQIHAATMASARCESNNRSVLVMRHDHEGKLPLHRSKLSGLAVILSRGSSDVCAPSDMKHIHSGDRQMTEELRFLENKNLMGLRYGLYSLTSRCARNVDRLIPFIRLQQVFGESKLESLAPGVKPLPIFVERRRMWPPVIWISIRCGHQTRPYIRDRSAAKCRGGQSRPALSKQLIYVRRVRQWRLHPGRQMVLGHTFASSFRQELSAQHTATRRITRPLSAPFYVRPRPWTGGTVELCIFRGASCRTSEFGKGATPKELLVMNGFLVVYYPAGQRPGLTSFVRSHSIRPVYAELLGSETRRDLRRSFWSVNVVLGVYRHLRHGTRQRSASSGLKGTLKVDSPMGVVRRKPNPIHFLPWKGVSRADFVALSVHGVYSSSVSSVGWFTGWKGNVKRPLRCLIAQDFKCSSLGLPIMFRDVFSQMPYCRLRQAPYVVAPFDSGVLWIARKTWIAFSHLRKSRFCPAWLSTDNTFDQYGSILVSEFSPTPRGIALVVCVPRSIVCRSHGKNFKFCILLPRVAVAQLRSVCHLVAIRCFTILIGKLFQPCQIRSEQPLRWYLSHSPPSKRSAKAIPAPYRAPGTFLIYSWIYFLLCRSTDQGCYFCIVHRAITQRTGCPRILLGFTLVSNELTVAHGIQAPVLEPRALPYNRATPRTDHGVSPVRRRRCSNIPINVGEYRWLFTFSVCIPTDSRKAVHATQVSCLMVLPRTARAYHRVRYTSFGLASEQTQTIFLIHISSDNNFARKVCIPPLVLL
PROT.rb:
def abbreviate(str)
  list = ""
  str.scan(/.../) do |sub|
    case sub
    when "UUU", "UUC"
      list += "F"
    when "UUA", "UUG"
      list += "L"
    when "UCU", "UCC", "UCA", "UCG", "AGU", "AGC"
      list += "S"
    when "UAU", "UAC"
      list += "Y"
    when "UGU", "UGC"
      list += "C"
    when "UGG"
      list += "W"
    when "CUU", "CUC", "CUA", "CUG"
      list += "L"
    when "CCU", "CCC", "CCA", "CCG"
      list += "P"
    when "CAU", "CAC"
      list += "H"
    when "CAA", "CAG"
      list += "Q"
    when "CGU", "CGC", "CGA", "CGG", "AGA", "AGG"
      list += "R"
    when "AUU", "AUC", "AUA"
      list += "I"
    when "AUG"
      list += "M"
    when "ACU", "ACC", "ACA", "ACG"
      list += "T"
    when "AAU", "AAC"
      list += "N"
    when "AAA", "AAG"
      list += "K"
    when "GUU", "GUC", "GUA", "GUG"
      list += "V"
    when "GCU", "GCC", "GCA", "GCG"
      list += "A"
    when "GAU", "GAC"
      list += "D"
    when "GAA", "GAG"
      list += "E"
    when "GGU", "GGC", "GGA", "GGG"
      list += "G"
    else
      return list
    end
  end
end

user_input = gets.chomp
abbreviate(user_input)
Yea, that's a very long switch.
Basically it's a translator. Take 3 characters, output 1 other. I've thought about implementing this using a key-value map, but that wouldn't solve the repetitiveness.
The readability is quite good and so is the speed. However, I'm quite sure it isn't idiomatic. I have no clue how this would score on maintainability.
There's one thing striking me as odd: `return` should be implicit. But simply stating `list` instead of `return list` modifies the behaviour and I'm not sure why.
3 Answers
Some notes:
- As others have already pointed out, you should use a hash instead of a gigantic `case`. But make sure your get operations on that hash are O(1), otherwise the method will be very inefficient.
- You can use `Enumerable#take_while` to manage the stop codons.
- Encapsulate the code in a module/class.
- You need a `return` because it's not the last expression of the method; it's within the `scan`, which you want to break out of.
Note that this works:
"123456".gsub(/.../) { |triplet| triplet[0] } #=> "14"
This is a common pattern: write the data structure in the most declarative/simple way and then programmatically build (on initialization) whatever (efficient) data structures you need in the algorithm.
I'd write it in functional style:
module Rosalind
  CODONS_BY_AMINOACID = {
    "F" => ["UUU", "UUC"],
    "L" => ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"],
    "S" => ["UCU", "UCC", "UCA", "UCG", "AGU", "AGC"],
    "Y" => ["UAU", "UAC"],
    "C" => ["UGU", "UGC"],
    "W" => ["UGG"],
    "P" => ["CCU", "CCC", "CCA", "CCG"],
    "H" => ["CAU", "CAC"],
    "Q" => ["CAA", "CAG"],
    "R" => ["CGU", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "I" => ["AUU", "AUC", "AUA"],
    "M" => ["AUG"],
    "T" => ["ACU", "ACC", "ACA", "ACG"],
    "N" => ["AAU", "AAC"],
    "K" => ["AAA", "AAG"],
    "V" => ["GUU", "GUC", "GUA", "GUG"],
    "A" => ["GCU", "GCC", "GCA", "GCG"],
    "D" => ["GAU", "GAC"],
    "E" => ["GAA", "GAG"],
    "G" => ["GGU", "GGC", "GGA", "GGG"],
    "STOP" => ["UGA", "UAA", "UAG"],
  }

  AMINOACID_BY_CODON = CODONS_BY_AMINOACID.
    flat_map { |c, as| as.map { |a| [a, c] } }.to_h

  def self.problem_prot(aminoacids_string)
    aminoacids_string.
      scan(/[UGTCA]{3}/).
      map { |codon| AMINOACID_BY_CODON[codon] }.
      take_while { |aminoacid| aminoacid != "STOP" }.
      join
  end
end
Consider merging the Stop codon list into the main table, then moving the `take_while` after the `map`. – 200_success, May 13, 2016 at 7:09
Let's try it. I already considered it, but was afraid conceptually it wasn't fitting to mix them. Maybe adding `lazy` in there would be a good idea. – tokland, May 13, 2016 at 7:13
That's a good idea to programmatically invert the hash from the more maintainable/readable version and store it in a constant. I'll have to remember that :) – James Robey, May 13, 2016 at 16:22
James, I had this "aha moment" some years ago reading this: norvig.com/sudoku.html – tokland, May 13, 2016 at 17:44
That looks very fancy. Quite advanced. Thanks for the input! – May 15, 2016 at 20:09
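The `lazy` idea mentioned in the comments can be sketched roughly as follows. This is only an illustration under stated assumptions: the table is a deliberately tiny subset rather than the full codon table, and `translate_lazily` and its `lookups` parameter are hypothetical names added here to make the early stop visible. Note that `scan` still splits the whole string up front; only the `map`/`take_while` steps become lazy.

```ruby
# Illustrative subset only (assumption): a real table needs all 64 codons.
AMINOACID_BY_CODON = {
  "AUG" => "M",
  "GCC" => "A",
  "AAA" => "K",
  "UGA" => "STOP"
}.freeze

# `lookups` is a hypothetical array that records which codons were looked up,
# so the effect of `lazy` can be observed from outside.
def translate_lazily(rna, lookups)
  rna.scan(/.../).lazy
     .map { |codon| lookups << codon; AMINOACID_BY_CODON[codon] }
     .take_while { |aminoacid| aminoacid != "STOP" }
     .to_a   # forces the lazy chain; evaluation stops at the first STOP
     .join
end

lookups = []
protein = translate_lazily("AUGGCCUGAAAAAAA", lookups)
p protein  # => "MA"
p lookups  # => ["AUG", "GCC", "UGA"]
```

Without `lazy`, `map` would look up all five codons before `take_while` discards everything from the stop codon onwards. For inputs of at most 10 kbp the difference is unlikely to matter in practice.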
In your case, the 'return' is required to break out of the scan do/end block. You can remove the else part and just have 'list' alone as the last line of the method. Unless you actually wanted it to stop scanning on an invalid sequence like "UGA"?
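To make the difference concrete, here is a small sketch with three hypothetical helper methods (none of them are from the question; they keep only the first letter of each triplet instead of translating it). A bare `list` inside the block is just the block's value for that iteration, so `scan` carries on and the method ends up returning whatever `scan` returns, which is the original string.

```ruby
# Bare value in the block: scanning continues, and the method returns
# the result of scan itself (its receiver) rather than list.
def bare_value(str)
  list = ""
  str.scan(/.../) do |sub|
    if sub == "UGA"
      list              # only the block's value for this iteration
    else
      list += sub[0]
    end
  end
end

# `return` inside the block leaves the whole method immediately.
def with_return(str)
  list = ""
  str.scan(/.../) do |sub|
    return list if sub == "UGA"
    list += sub[0]
  end
  list
end

# `break` leaves only the scan; the method then returns list as its last expression.
def with_break(str)
  list = ""
  str.scan(/.../) do |sub|
    break if sub == "UGA"
    list += sub[0]
  end
  list
end

p bare_value("AUGGCCUGAAAA")   # => "AUGGCCUGAAAA"
p with_return("AUGGCCUGAAAA")  # => "AG"
p with_break("AUGGCCUGAAAA")   # => "AG"
```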
I would use a key-value map here; it's much more straightforward and keeps the logic part clean.
def replace(sub)
  hash = {
    "F" => ["UUU", "UUC"],
    "L" => ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"],
    "S" => ["UCU", "UCC", "UCA", "UCG", "AGU", "AGC"],
    "Y" => ["UAU", "UAC"],
    "C" => ["UGU", "UGC"],
    "W" => ["UGG"],
    "P" => ["CCU", "CCC", "CCA", "CCG"],
    "H" => ["CAU", "CAC"],
    "Q" => ["CAA", "CAG"],
    "R" => ["CGU", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "I" => ["AUU", "AUC", "AUA"],
    "M" => ["AUG"],
    "T" => ["ACU", "ACC", "ACA", "ACG"],
    "N" => ["AAU", "AAC"],
    "K" => ["AAA", "AAG"],
    "V" => ["GUU", "GUC", "GUA", "GUG"],
    "A" => ["GCU", "GCC", "GCA", "GCG"],
    "D" => ["GAU", "GAC"],
    "E" => ["GAA", "GAG"],
    "G" => ["GGU", "GGC", "GGA", "GGG"]
  }
  hash.detect do |key, array|
    break key if array.include? sub
  end
end

def abbreviate(str)
  list = ""
  str.scan(/.../) do |sub|
    letter = replace(sub)
    break if letter.nil? # if you want to break because of "UGA" ?
    list += letter
  end
  list
end
"Unless you actually wanted it to stop scanning on an invalid sequence like "UGA"?" That's what the table implied, so I went with that. It also seemed to fit the idea of returning as soon as you know you're done. In hindsight I should probably return behind the switch anyway, at least that's what I'd do in other languages. – May 12, 2016 at 20:42
I don't know ruby so I may be missing something here, but wouldn't it be better to have the codons as keys and the amino acids as values for the hash? Something like `"AAG" => "K", "AAA" => "K"` instead of `"K" => ["AAA", "AAG"]`. That way the lookup is direct and you don't need to scan an array each time. – terdon, May 13, 2016 at 8:13
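A minimal sketch of the direct codon-to-amino-acid lookup suggested in the comment above, which also avoids typing out 64 entries by hand. It assumes the conventional compact form of the standard genetic code: one letter per codon, with codons enumerated over the bases in the order U, C, A, G and `*` marking a stop. The names `CODON_TABLE` and `translate` are illustrative, not taken from any answer.

```ruby
BASES = %w[U C A G].freeze

# One letter per codon in UCAG order: UUU, UUC, UUA, UUG, UCU, ... GGG.
# "*" marks the three stop codons (UAA, UAG, UGA).
AMINO_ACIDS =
  "FFLLSSSSYY**CC*W" \
  "LLLLPPPPHHQQRRRR" \
  "IIIMTTTTNNKKSSRR" \
  "VVVVAAAADDEEGGGG"

# Codon => amino acid, so each lookup is a single hash access.
CODON_TABLE = BASES.product(BASES, BASES)
                   .map(&:join)
                   .zip(AMINO_ACIDS.chars)
                   .to_h
                   .freeze

def translate(rna)
  protein = "".dup
  rna.scan(/.../) do |codon|
    amino_acid = CODON_TABLE[codon]
    # Stop at a stop codon, or at anything that is not a valid RNA codon.
    break if amino_acid.nil? || amino_acid == "*"
    protein << amino_acid
  end
  protein
end

puts translate("AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA")
# prints MAMAPRTEINSTRING
```

The trade-off is readability: the compact string is harder to check by eye than an explicit table, so the declarative hash inverted at load time may still be the more maintainable choice.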
If you wanted to look more Ruby-ish, you could convert your switch statement into a hash and then `inject`/`reduce` your input string.
def abbreviate(str)
  codes = {
    'F' => ["UUU", "UUC"],
    # All six leucine codons go under one key: a repeated hash key
    # would overwrite the earlier entry.
    'L' => ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"],
    'S' => ["UCU", "UCC", "UCA", "UCG", "AGU", "AGC"],
    'Y' => ["UAU", "UAC"],
    'C' => ["UGU", "UGC"],
    'W' => ["UGG"],
    'P' => ["CCU", "CCC", "CCA", "CCG"],
    'H' => ["CAU", "CAC"],
    'Q' => ["CAA", "CAG"],
    'R' => ["CGU", "CGC", "CGA", "CGG", "AGA", "AGG"],
    'I' => ["AUU", "AUC", "AUA"],
    'M' => ["AUG"],
    'T' => ["ACU", "ACC", "ACA", "ACG"],
    'N' => ["AAU", "AAC"],
    'K' => ["AAA", "AAG"],
    'V' => ["GUU", "GUC", "GUA", "GUG"],
    'A' => ["GCU", "GCC", "GCA", "GCG"],
    'D' => ["GAU", "GAC"],
    'E' => ["GAA", "GAG"],
    'G' => ["GGU", "GGC", "GGA", "GGG"]
  }
  str.scan(/.../).inject('') do |s, sub|
    c = codes.select { |_, value| value.include? sub }.keys.first
    # Codons without an entry (such as the stop codons) are skipped.
    c.nil? ? s : s + c
  end
end
I'm sure there are a few variations of this idea that would work well too. This example is just the first that occurred to me.