Consider a function that, given an RNA sequence string, returns a string
representing the corresponding amino acids.
RNA is represented as a string of the letters A, C, G, and U,
representing the bases Adenine, Cytosine, Guanine,
and Uracil respectively. This differs from DNA in that
Uracil replaces Thymine, which is why this is ACGU instead
of the familiar ACGT (e.g. 'GATTACA').
The amino acids are also
represented by a single letter.
As an example, the string 'UUCGAACACUGAG' would be transformed into
'FEH.' and returned.
RNA works such that each group of three "letters" (i.e. bases, forming a codon)
corresponds to the use of a particular amino acid,
or the STOP sequence, which is represented here as a period.
If there are one or two extra letters at the end of the sequence, these should
be ignored. All input to the function is assumed to contain only the letters
A, C, G, and U, and nothing else, though the number of characters may be arbitrary.
Below is a reference implementation that is not optimized, and
includes comments for the curious:
sub f {
my %g = (
# . - Stop
'UAA'=>'.','UAG'=>'.','UGA'=>'.',
# A - Alanine
'GCU'=>'A','GCC'=>'A','GCA'=>'A','GCG'=>'A',
# C - Cysteine
'UGU'=>'C','UGC'=>'C',
# D - Aspartic Acid
'GAU'=>'D','GAC'=>'D',
# E - Glutamic Acid
'GAA'=>'E','GAG'=>'E',
# F - Phenylalanine
'UUU'=>'F','UUC'=>'F',
# G - Glycine
'GGU'=>'G','GGC'=>'G','GGA'=>'G','GGG'=>'G',
# H - Histidine
'CAU'=>'H','CAC'=>'H',
# I - Isoleucine
'AUU'=>'I','AUC'=>'I','AUA'=>'I',
# K - Lysine
'AAA'=>'K','AAG'=>'K',
# L - Leucine
'CUU'=>'L','CUC'=>'L','CUA'=>'L','CUG'=>'L',
'UUA'=>'L','UUG'=>'L',
# M - Methionine
'AUG'=>'M',
# N - Asparagine
'AAU'=>'N','AAC'=>'N',
# P - Proline
'CCU'=>'P','CCC'=>'P','CCA'=>'P','CCG'=>'P',
# Q - Glutamine
'CAA'=>'Q','CAG'=>'Q',
# R - Arginine
'CGU'=>'R','CGC'=>'R','CGA'=>'R','CGG'=>'R',
'AGA'=>'R','AGG'=>'R',
# S - Serine
'UCU'=>'S','UCC'=>'S','UCA'=>'S','UCG'=>'S',
'AGU'=>'S','AGC'=>'S',
# T - Threonine
'ACU'=>'T','ACC'=>'T','ACA'=>'T','ACG'=>'T',
# V - Valine
'GUU'=>'V','GUC'=>'V','GUA'=>'V','GUG'=>'V',
# W - Tryptophan
'UGG'=>'W',
# Y - Tyrosine
'UAU'=>'Y','UAC'=>'Y',
);
$_=pop;s/.{1,3}/$g{$&}/g;$_
}
print f("ACCCACAUUUCAUAAAUAUCCCCUGAGCGGCUCUGAGGGCAACUGUUCUAAUC");
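For readers who want an ungolfed baseline to test entries against, here is a minimal sketch of the same spec. The name translate_rna and the way the hash is built are assumptions made for illustration, not part of the original post; the 64-letter table is the one that appears in later replies, with bases ordered U, C, A, G.

```perl
use strict;
use warnings;

# 64 amino-acid letters, indexed by codon with bases ordered U, C, A, G.
my $TABLE = 'FFLLSSSSYY..CC.WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my @BASES = qw(U C A G);

# Build the codon => letter hash by enumerating all 64 codons in table order.
my %codon;
my $i = 0;
for my $x (@BASES) {
    for my $y (@BASES) {
        for my $z (@BASES) {
            $codon{"$x$y$z"} = substr $TABLE, $i++, 1;
        }
    }
}

# translate_rna is a hypothetical name, not taken from the thread.
sub translate_rna {
    my ($rna) = @_;
    # /(...)/g yields only complete triplets, so one or two
    # leftover bases at the end are silently dropped.
    return join '', map { $codon{$_} } $rna =~ /(...)/g;
}

print translate_rna('UUCGAACACUGAG'), "\n";   # FEH.
```

Unlike the reference sub, this version never looks up an incomplete codon, so it runs cleanly under warnings.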
Interesting Links: Genetic Code, Golf challange: match U.S. State names
Update: Typo in the example 'GAG'->'CAC' fixed.
Re: (Golf) RNA Genetic Code Translator
by MeowChow (Vicar) on Jul 06, 2001 at 02:45 UTC
|
Here's a swing, strict at 222:
sub f {
my@r=qw(UA[AG]|UGA GC. - UG[UC] GA[UC] GA[AG] UU[UC] GG. CA[UC] AU[^G] - AA[AG] CU.|UU[AG]
AUG AA[UC] - CC. CA[AG] CG.|AG[AG] UC.|AG[UC] AC. - GU. UGG - UA[UC]);
((my$t=pop)=~s|...|chr 64+(grep$&=~/$r[$_]/,0..25)[0]|eg);$t=~y/@/./;$t
}
Update: I can't count; the one above is actually 232. And, as no_slogan points out, it's sometimes helpful to read the spec. 238 chars:
sub f {
my@r=qw(UA[AG]|UGA GC. - UG[UC] GA[UC] GA[AG] UU[UC] GG. CA[UC] AU[^G] - AA[AG] CU.|UU[AG] AUG AA[UC] - CC. CA[AG] CG.|AG[AG] UC.|AG[UC] AC. - GU. UGG - UA[UC] ^);
((my$t=pop)=~s|..?.?|chr 64+(grep$&=~/$r[$_]/,0..26)[0]|eg);$t=~y/@Z/./d;$t
}
MeowChow
s aamecha.s a..a\u$&owag.print
|
That's a pretty nifty way to do the encoding. It doesn't eliminate trailing characters that aren't part of a group of three, though.
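The difference is easy to see on a short input. A minimal sketch, assuming a two-entry lookup hash (%g here is a hypothetical subset of the full table, and both helper names are made up for illustration): a pattern of exactly three characters never touches the leftover bases, so they survive in the result, while ..?.? also matches the short tail and replaces it with nothing.

```perl
use strict;
use warnings;

# Hypothetical two-codon subset of the full lookup table.
my %g = (UUC => 'F', GAA => 'E');

# Exactly three characters per match: a short tail is never matched,
# so it is left in the string untouched.
sub exact_triplets {
    my ($t) = @_;
    $t =~ s/.../exists $g{$&} ? $g{$&} : ''/ge;
    return $t;
}

# One to three characters per match: the short tail is matched too,
# finds no entry in the hash, and is replaced by the empty string.
sub loose_triplets {
    my ($t) = @_;
    $t =~ s/..?.?/exists $g{$&} ? $g{$&} : ''/ge;
    return $t;
}

print exact_triplets('UUCGAAG'), "\n";   # FEG (stray G survives)
print loose_triplets('UUCGAAG'), "\n";   # FE
```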
Re: (Golf) RNA Genetic Code Translator
by japhy (Canon) on Jul 06, 2001 at 01:30 UTC
|
There's no whitespace here except for newlines (which I've decided not to count, although they've been placed to my advantage). It's 418 chars. I love my hash slices.
sub
B(){''}sub
Z(){(B)x13}sub
U(){(B)x31}sub
O(){(B)x83}sub
J(){(B)x343}sub
b(){B,B,B}@g{AAA..UUU}=(K,B,N,b,K,Z,N,U,T,B,T,b,T,Z,T,O,R,B,S,b,R,Z,S,J,
I,B,I,b,M,Z,I,(B)x811,Q,B,H,b,Q,Z,H,U,P,B,P,b,P,Z,P,O,R,B,R,b,R,Z,R,J,L,
B,L,b,L,Z,L,(B)x2163,E,B,D,b,E,Z,D,U,A,B,A,b,A,Z,A,O,G,B,G,b,G,Z,G,J,V,B
,V,b,V,Z,V,(B)x8923,'.',B,Y,b,'.',Z,Y,U,S,B,S,b,S,Z,S,O,'.',B,C,b,W,Z,C,
J,L,B,F,b,L,Z,F);sub
f{$_=pop;s/..?.?/$g{$&}/g;$_}
japhy --
Perl and Regex Hacker
Re: (Golf) RNA Genetic Code Translator
by no_slogan (Deacon) on Jul 06, 2001 at 02:48 UTC
|
$_="KNNKtIIIMRSSRQHHQplr.YY.sLFFL.CCWEDDEavg";s/[a-z]/uc$&x4/eg;@x=/./g;join"",@x[map{$x=0;$x=$x*4|6&ord for/./g;$x/2}pop=~/.../g]
|
I can shave 4 chars off of that:
$_="KNNKtIIIMRSSRQHHQplr.YY.sLFFL.CCWEDDEavg";s/[a-z]/uc$&x4/eg;
join"",(/./g)[map{$x=0;$x=$x*4|6&ord for/./g;$x/2}pop=~/.../g]
The 15 year old, freshman programmer,
Stephen Rawls
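For anyone puzzling over the two entries above, here is an ungolfed sketch of what they appear to do; the names base_index and translate_bits are hypothetical. Two ideas are combined. First, a lowercase letter in the table stands for four copies of its uppercase form, which shrinks 64 entries to 40 characters. Second, 6 & ord maps the ASCII codes of A, C, U and G to 0, 2, 4 and 6, so halving gives a base-4 digit in A, C, U, G order.

```perl
use strict;
use warnings;

# Expand the compressed table: each lowercase letter becomes four
# uppercase copies, giving 64 entries in A, C, U, G codon order.
(my $table = 'KNNKtIIIMRSSRQHHQplr.YY.sLFFL.CCWEDDEavg')
    =~ s/([a-z])/uc($1) x 4/ge;

# ord('A')=65, 'C'=67, 'U'=85, 'G'=71; masking with 6 keeps bits 1 and 2,
# which gives 0, 2, 4, 6. Halving yields A=>0, C=>1, U=>2, G=>3.
sub base_index {
    my ($base) = @_;
    my $masked = 6 & ord($base);
    return $masked / 2;
}

sub translate_bits {
    my ($rna) = @_;
    my $out = '';
    for my $codon ($rna =~ /(...)/g) {
        my $i = 0;
        # Treat the three bases as a three-digit base-4 number.
        $i = $i * 4 + base_index($_) for split //, $codon;
        $out .= substr $table, $i, 1;
    }
    return $out;
}

print translate_bits('UUCGAACACUGAG'), "\n";   # FEH.
```

The golfed versions skip the halving until the end and use | instead of +, which works because the masked values never overlap once shifted.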
|
Bending the spec a bit at 123 (regarding treatment of leftover base pairs):
sub f {
$_=pop;y/ACUG/0123/;s|(.)(.)(.)|(map{ord>91?uc:(),uc}
'KnKttiIMRsRQhQppllrr.y.ssLfL.cWEdEaavvgg'=~/./g)[$1*16+$2*4+$3]|eg;$_
}
MeowChow
s aamecha.s a..a\u$&owag.print
|
Stunning, to say the least, but what is more stunning is
the amateurish oversight that I made myself when posting my
entry. How could I have not used the range feature of tr?
I feel silly, but at least I'm not alone:
sub f {
$_=pop;y/ACUG/0-3/;s|(.)(.)(.)|(map{ord>91?uc:(),uc}
'KnKttiIMRsRQhQppllrr.y.ssLfL.cWEdEaavvgg'=~/./g)[$1*16+$2*4+$3]|eg;$_
}
I was looking at my entry, trying to save a few strokes,
motivated by scain's Benchmarks posted below. It was
immediately obvious how to save a few strokes, now that I'm
awake and caffeinated and all.
Revised, mine ended up at 133, still a ways off of MeowChow at the
new and improved 122 posted above:
sub f{
$_=pop;y/UCAG/0-3/;s/(.)(.)(.)/substr
"FFLLSSSSYY..CC.WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
$1<<4|$2*4|$3,1/ge;s/\d//g;$_
}
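Spelled out without the golf, the digit approach looks roughly like this. It is a sketch under the assumption that the input holds only A, C, G and U, and translate_digits is a hypothetical name. Each base becomes a digit 0 to 3, each run of three digits is read as a base-4 index into the 64-letter table, and any digits still left afterwards are the one or two trailing bases, which are deleted.

```perl
use strict;
use warnings;

# Codon table in U, C, A, G order, as used in the entry above.
my $TABLE = 'FFLLSSSSYY..CC.WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';

sub translate_digits {
    my ($rna) = @_;
    # U=>0, C=>1, A=>2, G=>3; the range 0-3 saves a stroke over 0123.
    (my $digits = $rna) =~ tr/UCAG/0-3/;
    # $1*16 + $2*4 + $3 is the same index as $1<<4|$2*4|$3.
    $digits =~ s/(\d)(\d)(\d)/substr($TABLE, $1 * 16 + $2 * 4 + $3, 1)/ge;
    # Whatever is still a digit was a leftover base: remove it.
    $digits =~ tr/0-3//d;
    return $digits;
}

print translate_digits('UUCGAACACUGAG'), "\n";   # FEH.
```

Because the table letters are never digits, the cleanup step cannot remove part of the translated output.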
Re: (Golf) RNA Genetic Code Translator
by tadman (Prior) on Jul 06, 2001 at 13:54 UTC
|
My first real attempt came in at 137 characters, not
including cosmetic linebreaks:
sub f{
$_=pop;y/UCAG/0123/;s/(.)(.)(.)/substr
"FFLLSSSSYY..CC.WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
,$1<<4|$2<<2|$3,1/ge;y/0123//d;$_
}
Which I thought was pretty decent, but it's a little behind
the times. Strangely, when I use the simple but elegant
compression technique introduced by no_slogan,
this code expands. I'll have to look into that more.
Re: (Golf) RNA Genetic Code Translator
by japhy (Canon) on Jul 06, 2001 at 00:48 UTC
|
Are you asking us to golf the entire function, or just the substitution part? Here's a small savings:
sub RNA {
# hash here
$_=pop;s/..?.?/$g{$&}/g;$_
}
The hash will take me some more time. I'll do that later.
japhy --
Perl and Regex Hacker
|
When I saw that there was already a reply I figured
someone pulled a use RNA out of their back
pocket.
The reference function has the hash in it, so yes, any
compatible one would also have to, though presumably in a
more compact format, which you could obtain by using the
reference function hash as input data. Saves typing it in
yourself and all that.
Re: (Golf) RNA Genetic Code Translator
by scain (Curate) on Jul 06, 2001 at 17:50 UTC
|
Re: (Golf) RNA Genetic Code Translator
by tachyon (Chancellor) on Jul 06, 2001 at 03:01 UTC
|
sub RNA {
@_{'UAAUAGUGAGCUGCCGCAGCGUGUUGCGAUGACGAAGAGUUUUUCGGUGGCGGAGGGCAUCACAUUAUCAUAAAAAAGCUUCUCCUACUGUUAUUGAUGAAUAACCCUCCCCCACCGCAACAGCGUCGCCGACGGAGAAGGUCUUCCUCAUCGAGUAGCACUACCACAACGGUUGUCGUAGUGUGGUAUUAC'=~/(...)/g}=split//,'...AAAACCDDEEFFGGGGHHIIIKKLLLLLLMNNPPPPQQRRRRRRSSSSSSTTTTVVVVWYY';$_=pop;s/..?.?/$_{$&}/g;$_
}
cheers
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
Re: (Golf) RNA Genetic Code Translator
by srawls (Friar) on Jul 06, 2001 at 03:45 UTC
|
As an example, the string 'UUCGAAGAGUGAG' would be transformed into 'FEH.' and returned.
I hope this is a typo, cause your hash makes it: 'FEE.'
The 15 year old, freshman programmer,
Stephen Rawls
Re: (Golf) RNA Genetic Code Translator
by scain (Curate) on Jul 06, 2001 at 21:04 UTC
|
update: DNA, RNA what's the difference? My original
code used the cDNA, not the mRNA. I changed it and reran it,
and everyone's code now works except for japhy's.
OK, this is going to be a long one...
I was going to benchmark these golf examples to see which one
was fastest, but there seems to be some cheating going on.
Honestly, I don't really understand what any of these is doing,
so I don't know if the cheating was intentional or not. To
do the benchmarking, I was going to use the
CFTR mRNA (CFTR
is the protein that, when mutated, causes cystic fibrosis).
The mRNA (with leading and trailing sequence removed)
is in the __DATA__ section of the code. The correct
translation looks like this:
MQRSPLEKASVVSKLFFSWTRPILRKGYRQRLELSDIYQIPSVDSADNLSEKLEREWDRELASKKNPKLI
+NALRRCFFWRFMFYGIFLYLGEVTKAVQPLLLGRIIASYDPDNKEERSIAIYLGIGLCLLFIVRTLLLH
+PAIFGLHHIGMQMRIAMFSLIYKKTLKLSSRVLDKISIGQLVSLLSNNLNKFDEGLALAHFVWIAPLQV
+ALLMGLIWELLQASAFCGLGFLIVLALFQAGLGRMMMKYRDQRAGKISERLVITSEMIENIQSVKAYCW
+EEAMEKMIENLRQTELKLTRKAAYVRYFNSSAFFFSGFFVVFLSVLPYALIKGIILRKIFTTISFCIVL
+RMAVTRQFPWAVQTWYDSLGAINKIQDFLQKQEYKTLEYNLTTTEVVMENVTAFWEEGFGELFEKAKQN
+NNNRKTSNGDDSLFFSNFSLLGTPVLKDINFKIERGQLLAVAGSTGAGKTSLLMMIMGELEPSEGKIKH
+SGRISFCSQFSWIMPGTIKENIIFGVSYDEYRYRSVIKACQLEEDISKFAEKDNIVLGEGGITLSGGQR
+ARISLARAVYKDADLYLLDSPFGYLDVLTEKEIFESCVCKLMANKTRILVTSKMEHLKKADKILILNEG
+SSYFYGTFSELQNLQPDFSSKLMGCDSFDQFSAERRNSILTETLHRFSLEGDAPVSWTETKKQSFKQTG
+EFGEKRKNSILNPINSIRKFSIVQKTPLQMNGIEEDSDEPLERRLSLVPDSEQGEAILPRISVISTGPT
+LQARRRQSVLNLMTHSVNQGQNIHRKTTASTRKVSLAPQANLTELDIYSRRLSQETGLEISEEINEEDL
+KECLFDDMESIPAVTTWNTYLRYITVHKSLIFVLIWCLVIFLAEVAASLVVLWLLGNTPLQDKGNSTHS
+RNNSYAVIITSTSSYYVFYIYVGVADTLLAMGFFRGLPLVHTLITVSKILHHKMLHSVLQAPMSTLNTL
+KAGGILNRFSKDIAILDDLLPLTIFDFIQLLLIVIGAIAVVAVLQPYIFVATVPVIVAFIMLRAYFLQT
+SQQLKQLESEGRSPIFTHLVTSLKGLWTLRAFGRQPYFETLFHKALNLHTANWFLYLSTLRWFQMRIEM
+IFVIFFIAVTFISILTTGEGEGRVGIILTLAMNIMSTLQWAVNSSIDVDSLMRSVSRVFKFIDMPTEGK
+PTKSTKPYKNGQLSKVMIIENSHVKKDDIWPSGGQMTVKDLTAKYTEGGNAILENISFSISPGQRVGLL
+GRTGSGKSTLLSAFLRLLNTEGEIQIDGVSWDSITLQQWRKAFGVIPQKVFIFSGTFRKNLDPYEQWSD
+QEIWKVADEVGLRSVIEQFPGKLDFVLVDGGCVLSHGHKQLMCLARSVLSKAKILLLDEPSAHLDPVTY
+QIIRRTLKQAFADCTVILCEHRIEAMLECQQFLVIEENKVRQYDSIQKLLNERSLFRQAISPSDRVKLF
+PHRNSSKCKSKPQIAALKEETEEEVQDTRL.
However, tachyon's, MeowChow's and tadman's original code all gave this:
QRPEKASKSTRPRKGRQREDQPADEKEREREAKKPKARRRGGETKAQPGRADPNKEERAGGRTHPAGHGQ
+RAKKTKSRKGQNNNKEGAAAPQAGEQAAGGAQAGGRKRQRAGKERTEEQKAEEAEKENRQTEKTRKAAR
+SAGPAKGRKTTRATRQPAQTDGANKQQKQEKTENTTTEETAEEGGEEKAKQNNRKTGDSGTPKKERGQA
+AGTGAGKTGEEPEGKKHGRQPGTKEGERRSKAQEEDKAEKDGEGGTGGQRARARAKADPGTEKEESKAN
+KTRTKEKKADKEGSSGTEQQPDSKGDQAERRTETHREGAPTETKKQKQTGEGEKRKPNRKQKTPQGEEE
+PERRPEQGEAPRSSTGPTQARRRQNTHNQGQNHRKTTATRKAPQANTERRQETGEEENEEDKEESPATT
+NTRTHKSAEAAGNTPQDKGTRNSATSTGADTAGRGPTTKHHKQAPTNTKAGGRKADPTDQGAAAQPATP
+ARAQTQQKQEEGRPTTSKGTRAGRQPETHKATANTRQREATTTGEGEGRGTATQANSSRSRKDPTEGKP
+TKTKPKGQKEHKKDPGGQTKTAKTEGGAENPGQRGGRTGGKTARNTEGEQGTQQRKAGPQKGTRKNPEQ
+QEKAEGREQPGKDGGSGHKQARKAKEPAPTQRRTKQAATEHREAEQQEENKRQQKNERSRQASPDRKPH
+RNSKKKPQAAKEETEEEQTR
It is not at all clear to me why, and it is not at all related
to CFTR. For that matter, it's not related to any protein
in public databases. Congratulations, you did gene discovery;
pharmaceutical companies spent billions of dollars to do that :-)
Also, japhy's code returns nothing (except some line feeds apparently).
So, can anyone point out the problems with these subs? I
copied them directly from the html, and only removed "+" at
the beginning of code wrapped lines, and changed the name of
the subs. Here is my code:
#!/usr/bin/perl
while (<DATA>) {
$cftr=$_;
}
print "tadman original\n".f0($cftr)."\n\n";
print "japhy\n".f1($cftr)."\n\n";
print "MeowChow\n".f2($cftr)."\n\n";
print "no_slogan\n".f3($cftr)."\n\n";
print "srawls\n".f4($cftr)."\n\n";
print "tachyon\n".RNA($cftr)."\n\n";
print "tadman golf\n".f5($cftr)."\n\n";
sub f0 { # original by tadman
my %g = (
# . - Stop
'UAA'=>'.','UAG'=>'.','UGA'=>'.',
# A - Alanine
'GCU'=>'A','GCC'=>'A','GCA'=>'A','GCG'=>'A',
# C - Cysteine
'UGU'=>'C','UGC'=>'C',
# D - Aspartic Acid
'GAU'=>'D','GAC'=>'D',
# E - Glutamic Acid
'GAA'=>'E','GAG'=>'E',
# F - Phenylalanine
'UUU'=>'F','UUC'=>'F',
# G - Glycine
'GGU'=>'G','GGC'=>'G','GGA'=>'G','GGG'=>'G',
# H - Histidine
'CAU'=>'H','CAC'=>'H',
# I - Isoleucine
'AUU'=>'I','AUC'=>'I','AUA'=>'I',
# K - Lysine
'AAA'=>'K','AAG'=>'K',
# L - Leucine
'CUU'=>'L','CUC'=>'L','CUA'=>'L','CUG'=>'L',
'UUA'=>'L','UUG'=>'L',
# M - Methionine
'AUG'=>'M',
# N - Asparagine
'AAU'=>'N','AAC'=>'N',
# P - Proline
'CCU'=>'P','CCC'=>'P','CCA'=>'P','CCG'=>'P',
# Q - Glutamine
'CAA'=>'Q','CAG'=>'Q',
# R - Arginine
'CGU'=>'R','CGC'=>'R','CGA'=>'R','CGG'=>'R',
'AGA'=>'R','AGG'=>'R',
# S - Serine
'UCU'=>'S','UCC'=>'S','UCA'=>'S','UCG'=>'S',
'AGU'=>'S','AGC'=>'S',
# T - Threonine
'ACU'=>'T','ACC'=>'T','ACA'=>'T','ACG'=>'T',
# V - Valine
'GUU'=>'V','GUC'=>'V','GUA'=>'V','GUG'=>'V',
# W - Tryptophan
'UGG'=>'W',
# Y - Tyrosine
'UAU'=>'Y','UAC'=>'Y',
);
$_=pop;s/.{1,3}/$g{$&}/g;$_
}
sub #japhy
B(){''}sub
Z(){(B)x13}sub
U(){(B)x31}sub
O(){(B)x83}sub
J(){(B)x343}sub
b(){B,B,B}@g{AAA..UUU}=(K,B,N,b,K,Z,N,U,T,B,T,b,T,Z,T,O,R,B,S,b,R,Z,S,J,
I,B,I,b,M,Z,I,(B)x811,Q,B,H,b,Q,Z,H,U,P,B,P,b,P,Z,P,O,R,B,R,b,R,Z,R,J,L,
B,L,b,L,Z,L,(B)x2163,E,B,D,b,E,Z,D,U,A,B,A,b,A,Z,A,O,G,B,G,b,G,Z,G,J,V,B
,V,b,V,Z,V,(B)x8923,'.',B,Y,b,'.',Z,Y,U,S,B,S,b,S,Z,S,O,'.',B,C,b,W,Z,C,
J,L,B,F,b,L,Z,F);sub
f1{$_=pop;s/..?.?/$g{$&}/g;$_}
sub f2{ #MeowChow
my@r=qw(UA[AG]|UGA GC. - UG[UC] GA[UC] GA[AG] UU[UC] GG. CA[UC] AU[^G] - AA[AG] CU.|UU[AG] AUG AA[UC] - CC. CA[AG] CG.|AG[AG] UC.|AG[UC] AC. - GU. UGG - UA[UC] ^);
((my$t=pop)=~s|..?.?|chr 64+(grep$&=~/$r[$_]/,0..26)[0]|eg);$t=~y/@Z/./d;$t
}
sub f3 { #no_slogan
$_="KNNKtIIIMRSSRQHHQplr.YY.sLFFL.CCWEDDEavg";s/[a-z]/uc$&x4/eg;@x=/./g;join"",@x[map{$x=0;$x=$x*4|6&ord for/./g;$x/2}pop=~/.../g]
}
sub f4 { #srawls
$_="KNNKtIIIMRSSRQHHQplr.YY.sLFFL.CCWEDDEavg";s/[a-z]/uc$&x4/eg;
join"",(/./g)[map{$x=0;$x=$x*4|6&ord for/./g;$x/2}pop=~/.../g]
}
sub RNA { #tachyon
@_{'UAAUAGUGAGCUGCCGCAGCGUGUUGCGAUGACGAAGAGUUUUUCGGUGGCGGAGGGCAUCACAUUAUCAUAAAAAAGCUUCUCCUACUGUUAUUGAUGAAUAACCCUCCCCCACCGCAACAGCGUCGCCGACGGAGAAGG
UCUUCCUCAUCGAGUAGCACUACCACAACGGUUGUCGUAGUGUGGUAUUAC'=~/(...)/g}=split//,'...AAAACCDDEEFFGGGGHHIIIKKLLLLLLMNNPPPPQQRRRRRRSSSSSSTTTTVVVVWYY';$_=pop
;s/..?.?/$_{$&}/g;$_
}
sub f5{ #tadman
$_=pop;y/UCAG/0123/;s/(.)(.)(.)/substr
"FFLLSSSSYY..CC.WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
,$1<<4|$2<<2|$3,1/ge;y/0123//d;$_
}
#>gi|6995995|ref|NM_000492.2| Homo sapiens cystic fibrosis transmembrane conductance regulator, ATP-binding cassette (sub-family C, member 7) (CFTR), mRNA
__DATA__
AUGCAGAGGUCGCCUCUGGAAAAGGCCAGCGUUGUCUCCAAACUUUUUUUCAGCUGGACCAGACCAAUUU
+UGAGGAAAGGAUACAGACAGCGCCUGGAAUUGUCAGACAUAUACCAAAUCCCUUCUGUUGAUUCUGCUG
+ACAAUCUAUCUGAAAAAUUGGAAAGAGAAUGGGAUAGAGAGCUGGCUUCAAAGAAAAAUCCUAAACUCA
+UUAAUGCCCUUCGGCGAUGUUUUUUCUGGAGAUUUAUGUUCUAUGGAAUCUUUUUAUAUUUAGGGGAAG
+UCACCAAAGCAGUACAGCCUCUCUUACUGGGAAGAAUCAUAGCUUCCUAUGACCCGGAUAACAAGGAGG
+AACGCUCUAUCGCGAUUUAUCUAGGCAUAGGCUUAUGCCUUCUCUUUAUUGUGAGGACACUGCUCCUAC
+ACCCAGCCAUUUUUGGCCUUCAUCACAUUGGAAUGCAGAUGAGAAUAGCUAUGUUUAGUUUGAUUUAUA
+AGAAGACUUUAAAGCUGUCAAGCCGUGUUCUAGAUAAAAUAAGUAUUGGACAACUUGUUAGUCUCCUUU
+CCAACAACCUGAACAAAUUUGAUGAAGGACUUGCAUUGGCACAUUUCGUGUGGAUCGCUCCUUUGCAAG
+UGGCACUCCUCAUGGGGCUAAUCUGGGAGUUGUUACAGGCGUCUGCCUUCUGUGGACUUGGUUUCCUGA
+UAGUCCUUGCCCUUUUUCAGGCUGGGCUAGGGAGAAUGAUGAUGAAGUACAGAGAUCAGAGAGCUGGGA
+AGAUCAGUGAAAGACUUGUGAUUACCUCAGAAAUGAUUGAAAAUAUCCAAUCUGUUAAGGCAUACUGCU
+GGGAAGAAGCAAUGGAAAAAAUGAUUGAAAACUUAAGACAAACAGAACUGAAACUGACUCGGAAGGCAG
+CCUAUGUGAGAUACUUCAAUAGCUCAGCCUUCUUCUUCUCAGGGUUCUUUGUGGUGUUUUUAUCUGUGC
+UUCCCUAUGCACUAAUCAAAGGAAUCAUCCUCCGGAAAAUAUUCACCACCAUCUCAUUCUGCAUUGUUC
+UGCGCAUGGCGGUCACUCGGCAAUUUCCCUGGGCUGUACAAACAUGGUAUGACUCUCUUGGAGCAAUAA
+ACAAAAUACAGGAUUUCUUACAAAAGCAAGAAUAUAAGACAUUGGAAUAUAACUUAACGACUACAGAAG
+UAGUGAUGGAGAAUGUAACAGCCUUCUGGGAGGAGGGAUUUGGGGAAUUAUUUGAGAAAGCAAAACAAA
+ACAAUAACAAUAGAAAAACUUCUAAUGGUGAUGACAGCCUCUUCUUCAGUAAUUUCUCACUUCUUGGUA
+CUCCUGUCCUGAAAGAUAUUAAUUUCAAGAUAGAAAGAGGACAGUUGUUGGCGGUUGCUGGAUCCACUG
+GAGCAGGCAAGACUUCACUUCUAAUGAUGAUUAUGGGAGAACUGGAGCCUUCAGAGGGUAAAAUUAAGC
+ACAGUGGAAGAAUUUCAUUCUGUUCUCAGUUUUCCUGGAUUAUGCCUGGCACCAUUAAAGAAAAUAUCA
+UCUUUGGUGUUUCCUAUGAUGAAUAUAGAUACAGAAGCGUCAUCAAAGCAUGCCAACUAGAAGAGGACA
+UCUCCAAGUUUGCAGAGAAAGACAAUAUAGUUCUUGGAGAAGGUGGAAUCACACUGAGUGGAGGUCAAC
+GAGCAAGAAUUUCUUUAGCAAGAGCAGUAUACAAAGAUGCUGAUUUGUAUUUAUUAGACUCUCCUUUUG
+GAUACCUAGAUGUUUUAACAGAAAAAGAAAUAUUUGAAAGCUGUGUCUGUAAACUGAUGGCUAACAAAA
+CUAGGAUUUUGGUCACUUCUAAAAUGGAACAUUUAAAGAAAGCUGACAAAAUAUUAAUUUUGAAUGAAG
+GUAGCAGCUAUUUUUAUGGGACAUUUUCAGAACUCCAAAAUCUACAGCCAGACUUUAGCUCAAAACUCA
+UGGGAUGUGAUUCUUUCGACCAAUUUAGUGCAGAAAGAAGAAAUUCAAUCCUAACUGAGACCUUACACC
+GUUUCUCAUUAGAAGGAGAUGCUCCUGUCUCCUGGACAGAAACAAAAAAACAAUCUUUUAAACAGACUG
+GAGAGUUUGGGGAAAAAAGGAAGAAUUCUAUUCUCAAUCCAAUCAACUCUAUACGAAAAUUUUCCAUUG
+UGCAAAAGACUCCCUUACAAAUGAAUGGCAUCGAAGAGGAUUCUGAUGAGCCUUUAGAGAGAAGGCUGU
+CCUUAGUACCAGAUUCUGAGCAGGGAGAGGCGAUACUGCCUCGCAUCAGCGUGAUCAGCACUGGCCCCA
+CGCUUCAGGCACGAAGGAGGCAGUCUGUCCUGAACCUGAUGACACACUCAGUUAACCAAGGUCAGAACA
+UUCACCGAAAGACAACAGCAUCCACACGAAAAGUGUCACUGGCCCCUCAGGCAAACUUGACUGAACUGG
+AUAUAUAUUCAAGAAGGUUAUCUCAAGAAACUGGCUUGGAAAUAAGUGAAGAAAUUAACGAAGAAGACU
+UAAAGGAGUGCCUUUUUGAUGAUAUGGAGAGCAUACCAGCAGUGACUACAUGGAACACAUACCUUCGAU
+AUAUUACUGUCCACAAGAGCUUAAUUUUUGUGCUAAUUUGGUGCUUAGUAAUUUUUCUGGCAGAGGUGG
+CUGCUUCUUUGGUUGUGCUGUGGCUCCUUGGAAACACUCCUCUUCAAGACAAAGGGAAUAGUACUCAUA
+GUAGAAAUAACAGCUAUGCAGUGAUUAUCACCAGCACCAGUUCGUAUUAUGUGUUUUACAUUUACGUGG
+GAGUAGCCGACACUUUGCUUGCUAUGGGAUUCUUCAGAGGUCUACCACUGGUGCAUACUCUAAUCACAG
+UGUCGAAAAUUUUACACCACAAAAUGUUACAUUCUGUUCUUCAAGCACCUAUGUCAACCCUCAACACGU
+UGAAAGCAGGUGGGAUUCUUAAUAGAUUCUCCAAAGAUAUAGCAAUUUUGGAUGACCUUCUGCCUCUUA
+CCAUAUUUGACUUCAUCCAGUUGUUAUUAAUUGUGAUUGGAGCUAUAGCAGUUGUCGCAGUUUUACAAC
+CCUACAUCUUUGUUGCAACAGUGCCAGUGAUAGUGGCUUUUAUUAUGUUGAGAGCAUAUUUCCUCCAAA
+CCUCACAGCAACUCAAACAACUGGAAUCUGAAGGCAGGAGUCCAAUUUUCACUCAUCUUGUUACAAGCU
+UAAAAGGACUAUGGACACUUCGUGCCUUCGGACGGCAGCCUUACUUUGAAACUCUGUUCCACAAAGCUC
+UGAAUUUACAUACUGCCAACUGGUUCUUGUACCUGUCAACACUGCGCUGGUUCCAAAUGAGAAUAGAAA
+UGAUUUUUGUCAUCUUCUUCAUUGCUGUUACCUUCAUUUCCAUUUUAACAACAGGAGAAGGAGAAGGAA
+GAGUUGGUAUUAUCCUGACUUUAGCCAUGAAUAUCAUGAGUACAUUGCAGUGGGCUGUAAACUCCAGCA
+UAGAUGUGGAUAGCUUGAUGCGAUCUGUGAGCCGAGUCUUUAAGUUCAUUGACAUGCCAACAGAAGGUA
+AACCUACCAAGUCAACCAAACCAUACAAGAAUGGCCAACUCUCGAAAGUUAUGAUUAUUGAGAAUUCAC
+ACGUGAAGAAAGAUGACAUCUGGCCCUCAGGGGGCCAAAUGACUGUCAAAGAUCUCACAGCAAAAUACA
+CAGAAGGUGGAAAUGCCAUAUUAGAGAACAUUUCCUUCUCAAUAAGUCCUGGCCAGAGGGUGGGCCUCU
+UGGGAAGAACUGGAUCAGGGAAGAGUACUUUGUUAUCAGCUUUUUUGAGACUACUGAACACUGAAGGAG
+AAAUCCAGAUCGAUGGUGUGUCUUGGGAUUCAAUAACUUUGCAACAGUGGAGGAAAGCCUUUGGAGUGA
+UACCACAGAAAGUAUUUAUUUUUUCUGGAACAUUUAGAAAAAACUUGGAUCCCUAUGAACAGUGGAGUG
+AUCAAGAAAUAUGGAAAGUUGCAGAUGAGGUUGGGCUCAGAUCUGUGAUAGAACAGUUUCCUGGGAAGC
+UUGACUUUGUCCUUGUGGAUGGGGGCUGUGUCCUAAGCCAUGGCCACAAGCAGUUGAUGUGCUUGGCUA
+GAUCUGUUCUCAGUAAGGCGAAGAUCUUGCUGCUUGAUGAACCCAGUGCUCAUUUGGAUCCAGUAACAU
+ACCAAAUAAUUAGAAGAACUCUAAAACAAGCAUUUGCUGAUUGCACAGUAAUUCUCUGUGAACACAGGA
+UAGAAGCAAUGCUGGAAUGCCAACAAUUUUUGGUCAUAGAAGAGAACAAAGUGCGGCAGUACGAUUCCA
+UCCAGAAACUGCUGAACGAGAGGAGCCUCUUCCGGCAAGCCAUCAGCCCCUCCGACAGGGUGAAGCUCU
+UUCCCCACCGGAACUCAAGCAAGUGCAAGUCUAAGCCCCAGAUUGCUGCUCUGAAAGAGGAGACAGAAG
+AAGAGGUGCAAGAUACAAGGCUUUAG
Happy debugging golfed code,
Scott
|
I'm not sure how you got those results, and the code you
posted had some trouble running too. Apparently the __DATA__
wasn't being imported correctly.
I changed that to a definition:
$cftr="AUGCAGAGGUCGCCUCUGGAAA...";
Everything ran fine after that, except that japhy
just spins for a while and then outputs nothing.
Otherwise, the results appear to be as expected.
Update: With respect to scain's update, this
update basically says that I didn't actually read
his update, and so, this entire node is kind of pointless.
Re: (Golf) RNA Genetic Code Translator
by scain (Curate) on Jul 09, 2001 at 19:19 UTC
|
#!/usr/bin/perl
use Benchmark;
my $iter=100000;
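while (my $rna = <DATA>){ #tadman original
chomp $rna;
timethis($iter, sub {
# timethis does not hand the sequence to the sub, so pop would not see it;
# copy the current line into $_ instead.
local $_=$rna;y/UCAG/0123/;s/(.)(.)(.)/substr
"FFLLSSSSYY..CC.WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
,$1<<4|$2<<2|$3,1/ge;y/0123//d;$_
});
}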
__DATA__
yada yada... the CFTR mRNA from above.
|
I think MeowChow's would be nearly as fast,
except that for reasons of brevity it performs this
crazy map operation on every base-pair triplet
substitution.
On this train of thought, is there such a thing as a
Benchmark-type routine that will test performance on
a variety of data sizes? So many times people benchmark
a variety of routines with only one set of data, which
has the result of being a 1-dimensional test where there
are actually 2 independent variables (function and data set
size).
In line with Big-O Notation, is it possible and/or has someone
written a Benchmark-type module which would estimate what
kind of O(f(n)) function would best represent how the
algorithm in question scales? Certainly not trivial by any
means, but not impossible either.
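Short of a module that fits an O(f(n)) curve, the second independent variable can at least be swept by hand. The following is a rough sketch rather than a proper benchmark: it assumes wall-clock timing from Time::HiRes is good enough, uses synthetic input ('AUG' repeated), and the helper name seconds_per_call is made up for illustration. Comparing the time per call as the input doubles gives a crude hint at how a routine scales.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Codon table in U, C, A, G order, as used in the golfed entries.
my $TABLE = 'FFLLSSSSYY..CC.WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';

# Function under test: a readable version of the digit-index approach.
sub translate {
    my ($rna) = @_;
    (my $digits = $rna) =~ tr/UCAG/0-3/;
    $digits =~ s/(\d)(\d)(\d)/substr($TABLE, $1 * 16 + $2 * 4 + $3, 1)/ge;
    $digits =~ tr/0-3//d;
    return $digits;
}

# Hypothetical helper: average wall-clock seconds per call over $reps calls.
sub seconds_per_call {
    my ($code, $input, $reps) = @_;
    my $start = time();
    $code->($input) for 1 .. $reps;
    my $elapsed = time() - $start;
    return $elapsed / $reps;
}

# Sweep the input size, doubling it at each step.
for my $codons (1_000, 2_000, 4_000, 8_000) {
    my $rna = 'AUG' x $codons;
    printf "%5d codons: %.6f s/call\n",
        $codons, seconds_per_call(\&translate, $rna, 20);
}
```

If the time per call roughly doubles with each step, the routine is behaving linearly on this input; a steeper growth would suggest something worse. Real sequences and more repetitions would be needed before trusting the numbers.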
|
This is not exactly what you asked for, but it would be easy
to place several sequences in the __DATA__ chunk,
one to a line, and since the timethis is in a
while loop, it will iterate over each sequence.
If you wanted to be really anal, you could then pipe the output
to another perl program to parse and do statistics.
Scott