[Bioperl-l] Fwd: spliced

Tristan Lefebure tristan.lefebure at gmail.com
Mon Jul 15 13:45:35 UTC 2013


Forwarding to the list (sorry):

---------- Forwarded message ----------
From: Tristan Lefebure <tristan.lefebure at gmail.com>
Date: Mon, Jul 15, 2013 at 3:44 PM
Subject: Re: [Bioperl-l] spliced
To: "Fields, Christopher J" <cjfields at illinois.edu>


And I just looked at the GenBank files distributed by Ensembl: the problems
are gone. So this is indeed poor support of the EMBL format, either by BioPerl
or by Ensembl.
Regards,
--
Tristan


On Mon, Jul 15, 2013 at 3:06 PM, Tristan Lefebure <
tristan.lefebure at gmail.com> wrote:

> Hi Chris,
> These are Ensembl files; my understanding is that for this species they
> use FlyBase as a starting point.
> The file can be found here (sorry, it is quite large):
>
> ftp://ftp.ensemblgenomes.org/pub/release-19/metazoa/embl/drosophila_melanogaster/Drosophila_melanogaster.BDGP5.19.dat.gz
>
> I selected another CDS (FBtr0310395) that also had premature STOP codons
> but a simpler structure. Here are the parts of the Ensembl file related to
> this CDS:
>
>
> ----------------------
> ID   X    standard; DNA; HTG; 22422827 BP.
> XX
> AC   chromosome:BDGP5:X:1:22422827:1
> XX
> SV   X.BDGP5
> XX
> DT   18-JUN-2013
>
> [...]
>
> FT   gene            15673179..15674095
> FT                   /gene=FBgn0263748
> FT                   /locus_tag="CG43673"
> FT   mRNA            join(15673179..15673224,15673413..15674095)
> FT                   /gene="FBgn0263748"
> FT                   /note="transcript_id=FBtr0310395"
> FT   CDS             join(15673215..15673224,15673413..15674095)
> FT                   /gene="FBgn0263748"
> FT                   /protein_id="FBpp0302543"
> FT                   /note="transcript_id=FBtr0310395"
> FT                   /db_xref="flybase_transcript_id:FBtr0310395"
> FT                   /db_xref="FlyBaseCGID_transcript:CG43673-RA"
> FT                   /db_xref="FlyBaseCGID_translation:CG43673-PA"
> FT                   /db_xref="FlyBaseName_transcript:FBtr0310395"
> FT                   /db_xref="FlyBaseName_translation:FBpp0302543"
> FT                   /db_xref="GO:0005576"
> FT                   /db_xref="GO:0006030"
> FT                   /db_xref="GO:0008061"
> FT                   /db_xref="flybase_translation_id:FBpp0302543"
> FT                   /db_xref="goslim_goa:GO:0003674"
> FT                   /db_xref="goslim_goa:GO:0005575"
> FT                   /db_xref="goslim_goa:GO:0005576"
> FT                   /db_xref="goslim_goa:GO:0008150"
> FT                   /db_xref="UniParc:UPI0002945AD1"
> FT                   /translation="MKVWIAQHLFVVILVSSAVPLTEALGSTVCADRFNGLSFADPASC
> FT                   SSFFVCQRGNAVRRECSNGLYYDPKIQTCNLPGLVKCFNGDRGGSVLGDVKANVTLVPN
> FT                   GKANGEVTTTPPQTTTCPPTTTVTPAVTTKKSKLILDTEDADDAHSIFQVTPHPLTNRI
> FT                   DVLRSQRDCRGINDGEYLTDPKHCRRFYMCHKNRVKRHNCPRNQWFDRETKSCQDRELV
> FT                   LNCPVNRN"
>
> [...]
>
>
> FT   exon            15673413..15674095
> FT                   /note="exon_id=FBgn0263748:2"
> FT   exon            15673179..15673224
> FT                   /note="exon_id=FBgn0263748:1"
>
> [...]
>
> XX
> SQ   Sequence 22422827 BP; 6409325 A; 4742952 C; 4748415 G; 6432035 T; 90100
> SQ   other;
>      CAACATTAGC GCCATGCCCA CTGTGGGGAA TTTACCAGCA GCCCGCACAC TTAGCCGGCC        60
>
> [...]
>
> --------------------------
>
>
> I encountered two problems using this file with BioPerl:
>
> *1- While converting the file into FASTA format, I got the following:*
>
> >X Drosophila melanogaster chromosome X BDGP5 full sequence 1..22422827 annotated by Ensembl Genomes
> SQOTHERCAACATTAGCGCCATGCCCACTGTGGGGAATTTACCAGCAGCCCGCACACTTA
> GCCGGCCTGCTGCAAAGCGGGATTTATTTAATTCATCCTCCAAGAGCCCAAACGAGCATC
> CTATGAGTTTCTCGGAAGTGGTAGCTGGAGCAGGTCCAGTTTCTATGGCACCCCCTAATC
> [...]
>
> See the "SQOTHER"? It looks like the second SQ header line ("SQ   other;")
> is read as sequence...
>
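In the excerpt above, the SQ header appears to be split over two lines (`... 6432035 T; 90100` followed by `SQ   other;`). A standard EMBL header keeps all the base counts on one line ending in `other;`. Note that 6409325 + 4742952 + 4748415 + 6432035 + 90100 = 22422827, so the `90100` belongs to that count. A possible workaround, sketched here under that assumption and not tested against the real file, is to merge the stray line back into the header before handing the file to `Bio::SeqIO`. The helper name `fix_sq_header` is hypothetical. The sketch assumes the `SQ   other;` line immediately follows the `SQ   Sequence` line, and it loads the whole file into memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: re-join a wrapped "SQ   other;" line with the
# "SQ   Sequence ..." header that precedes it; all other lines pass through.
sub fix_sq_header {
    my @out;
    for my $line (@_) {
        if (   @out
            && $out[-1] =~ /^SQ   Sequence /
            && $line    =~ /^SQ\s+(other;.*)$/ )
        {
            chomp $out[-1];          # drop the header's newline before appending
            $out[-1] .= " $1\n";     # e.g. "... 6432035 T; 90100 other;"
            next;                    # the continuation line itself is discarded
        }
        push @out, $line;
    }
    return @out;
}

# Filter mode: only runs when file names are given on the command line,
# e.g. perl fix_sq.pl Drosophila_melanogaster.BDGP5.19.dat > fixed.dat
print fix_sq_header(<>) if @ARGV;
```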
>
> *2- If you extract the CDS with the attached script, you don't get the
> expected sequence for this locus (nor for many others, though not all):*
>
> >FBtr0310395|FBgn0263748|X
> YQHVFRDCTASICGDPSVVGGAID*SFGQHRLCGSLQWPVVCGSGQLLQLLCVPAW*CRS
> ARVLQWPVLRSKDPDLQSTRTSQMFQRGSRRFCAGRRQSKRHFGAQWKGQWGGHHDATTN
> NHLSTDDDSDTRSDHQKIETYSRYRGCR*CPFHLPSYSASTH*QN*CAEIPARLPWNKRW
> RVFDRSQTLPSFLYVP*ESGQAP*LPTESVVRSGDEILPRSRVGTELPSQS
>
> It's totally off. I realigned the exported sequence to the genome; what
> BioPerl actually extracted is:
>    join(15673208..15673216,15673405..15674088)
>
>  it was supposed to be:
>   join(15673215..15673224,15673413..15674095)
>
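One possible link between the two problems is offered here as a hypothesis, not a confirmed diagnosis. "SQOTHER" is 7 characters long. If it is glued onto the start of the parsed sequence, every 1-based coordinate read from that sequence lands 7 bases upstream of where it should. Shifting the annotated CDS by -7 gives join(15673208..15673217,15673406..15674088). The outer ends match the realigned coordinates above exactly, and the inner exon boundaries differ by one base. Both locations span 693 bases (10 + 683 versus 9 + 684). That is consistent with a pure shift plus a splice junction that the realignment placed one base differently. The toy sequence and the helper names below are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: join() of 1-based, inclusive, forward-strand segments.
sub splice_segments {
    my ($seq, @segments) = @_;
    return join '',
        map { substr($seq, $_->[0] - 1, $_->[1] - $_->[0] + 1) } @segments;
}

# Hypothetical helper: move every [start, end] segment by $delta positions.
sub shift_segments {
    my ($delta, @segments) = @_;
    return map { [ $_->[0] + $delta, $_->[1] + $delta ] } @segments;
}

my $prefix    = 'SQOTHER';
my $offset    = length $prefix;                      # 7 stray characters
my $genomic   = 'ACGTTGCAAGGCTTACGATCGGATCCTAGCTA';  # toy stand-in for chromosome X
my $corrupted = $prefix . $genomic;                  # sequence as mis-parsed
my @annotated = ([10, 14], [20, 26]);                # toy join(10..14,20..26)

# Extracting the annotated coordinates from the corrupted sequence...
my $got = splice_segments($corrupted, @annotated);
# ...returns the genomic bases lying $offset positions upstream.
my $upstream = splice_segments($genomic, shift_segments(-$offset, @annotated));
printf "%s\n%s\n", $got, $upstream;

# The same shift applied to the FBtr0310395 CDS location from the thread.
my @cds = shift_segments(-$offset, [15673215, 15673224], [15673413, 15674095]);
printf "join(%s)\n", join ',', map { "$_->[0]..$_->[1]" } @cds;
```

If this hypothesis holds, removing the stray `SQ   other;` line before parsing should fix both symptoms for this record. It would not by itself explain every affected CDS.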
>
> These look like real bugs; what do you think? It would be easier to play
> with a smaller example...
>
> Thanks!
> --
> Tristan
>
> On Fri, Jul 12, 2013 at 7:59 PM, Fields, Christopher J <
> cjfields at illinois.edu> wrote:
>
>> According to FlyBase, that particular gene is listed as incomplete. Do
>> you have the original EMBL nucleotide accession number so we can test this?
>>
>> chris
>>
>> On Jul 12, 2013, at 10:16 AM, Tristan Lefebure <
>> tristan.lefebure at gmail.com> wrote:
>>
>> > Dear bioperlers,
>> >
>> > I am trying to extract CDS sequences from Ensembl EMBL files (*.dat),
>> > but I get STOP codons where I would not expect any. I am using the usual
>> > way of extracting coding sequences: splice the sequence, then translate
>> > it:
>> >
>> > $feat_object->spliced_seq->translate(-codontable_id => $genetcode)->seq
>> >
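For cross-checking, here is a core-Perl sketch of what the `spliced_seq->translate` chain is expected to compute for a forward-strand `join()` CDS with the standard genetic code (NCBI table 1). It is not BioPerl's implementation. `splice_segments` and `translate_cds` are hypothetical names, and reverse-strand (`complement(...)`) locations are deliberately not handled.

```perl
use strict;
use warnings;

# NCBI standard genetic code (table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
my $BASES = 'TCAG';
my $AA    = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';

my %CODON;
{
    my $i = 0;
    for my $first (split //, $BASES) {
        for my $second (split //, $BASES) {
            for my $third (split //, $BASES) {
                $CODON{"$first$second$third"} = substr($AA, $i++, 1);
            }
        }
    }
}

# Hypothetical helper: concatenate 1-based, inclusive [start, end] segments,
# as in an EMBL location join(a..b,c..d) on the forward strand.
sub splice_segments {
    my ($seq, @segments) = @_;
    return join '',
        map { substr($seq, $_->[0] - 1, $_->[1] - $_->[0] + 1) } @segments;
}

# Hypothetical helper: translate codon by codon; unknown codons become 'X',
# and a trailing partial codon is ignored.
sub translate_cds {
    my ($cds) = @_;
    $cds = uc $cds;
    my $protein = '';
    for (my $i = 0; $i + 3 <= length $cds; $i += 3) {
        $protein .= $CODON{ substr($cds, $i, 3) } // 'X';
    }
    return $protein;
}

my $genome = 'CCATGAAAGGGGGTTTTAGCC';                    # toy sequence
my $cds    = splice_segments($genome, [3, 8], [14, 19]); # toy join(3..8,14..19)
print translate_cds($cds), "\n";
```

Running both this and BioPerl on the same record and comparing the results is one way to tell a location or parsing problem from a translation problem.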
>> > An example with this CDS:
>> >
>> > FT   CDS             join(81211..83052,122064..122745,122800..123184,
>> > FT                   135429..135872,138620..139940,140189..141784,
>> > FT                   141841..145922,145985..146089,146143..147085,
>> > FT                   147145..147638,147696..148243)
>> > FT                   /gene="FBgn0001313"
>> > FT                   /protein_id="FBpp0289299"
>> > FT                   /note="transcript_id=FBtr0300022"
>> > FT                   /db_xref="flybase_transcript_id:FBtr0300022"
>> > FT                   /db_xref="RefSeq_peptide:NP_001015505"
>> > FT                   /db_xref="RefSeq_mRNA:NM_001015505"
>> > FT                   /db_xref="Uniprot/SPTREMBL:Q281X0"
>> > FT                   /db_xref="Uniprot/SPTREMBL:Q5LJP0"
>> > FT                   /db_xref="FlyBaseCGID_transcript:CG17866-RB"
>> > FT                   /db_xref="FlyBaseCGID_translation:CG17866-PB"
>> > FT                   /db_xref="FlyBaseName_transcript:FBtr0300022"
>> > FT                   /db_xref="FlyBaseName_translation:FBpp0289299"
>> > FT                   /db_xref="GO:0003774"
>> > FT                   /db_xref="GO:0003777"
>> > FT                   /db_xref="GO:0003777"
>> > FT                   /db_xref="GO:0005524"
>> > FT                   /db_xref="GO:0005524"
>> > FT                   /db_xref="GO:0005858"
>> > FT                   /db_xref="GO:0005875"
>> > FT                   /db_xref="GO:0007018"
>> > FT                   /db_xref="GO:0007018"
>> > FT                   /db_xref="GO:0016887"
>> > FT                   /db_xref="GO:0030286"
>> > FT                   /db_xref="GO:0030286"
>> > FT                   /db_xref="GO:0042623"
>> > FT                   /db_xref="flybase_translation_id:FBpp0289299"
>> > FT                   /db_xref="goslim_goa:GO:0003674"
>> > FT                   /db_xref="goslim_goa:GO:0005575"
>> > FT                   /db_xref="goslim_goa:GO:0005622"
>> > FT                   /db_xref="goslim_goa:GO:0005623"
>> > FT                   /db_xref="goslim_goa:GO:0005856"
>> > FT                   /db_xref="goslim_goa:GO:0008150"
>> > FT                   /db_xref="goslim_goa:GO:0016887"
>> > FT                   /db_xref="goslim_goa:GO:0043226"
>> > FT                   /db_xref="goslim_goa:GO:0043234"
>> > FT                   /db_xref="UniParc:UPI00018FBBAC"
>> > FT                   /translation="LQGLNAQLDQVDVQQIIRVLRSTHSVYIKQIDELIFESTHELMEA
>> > FT                   MENIKFLHLLMQPCSQLDFSESPTFVSQLIPRTIHLIRFIWLNSEQYNRRDLITGIFRN
>> > FT                   LSNQIIRFCTEKVNVEKILSGSSRFGIKICNMCIDCCLTYKGIYDIMSKTHAKINIRIG
>> > FT                   WSLDNAMIFNHVDAFMERLNDVIDICESMMVFGRLDESESIPKPQFGGTSGTEFEATAD
>> > FT                   NVENEFLVTLTALCTDSKEIILNVHKNEWYEEVIKYRRTVQSMEETVQRLMSNVFQHIC
>> > FT                   NVEEALESLNVMIFYSYRSTIRKTFLRQVSSAWVFFSNEIDSSVHMLMDRSKMHESWVP
>> > FT                   YYASRALGYRVHLDRLVWLCNRLNSSDWLPNVSEASVVLKKFESVRREFDKEVKKSFDE
>> > FT                   WQKNCCSLLLNQKLDRYLLIRSKKKKGLIECNIDRTILTICEQAQHFERLGLGVPGMVR
>> > FT                   KIYEKHETLRFVYNSVVQVCLNYNHILSALSEQERKLFRALIQACDRKIAPGVFKLTYG
>> > FT                   GELSDAYIADCAKHTNKLQETMDIYKRAIQNIARFCEKICDTPMLKFNFSGAVTISIFE
>> > FT                   NHLSSYLRRVSNILRGFYSTITDLIFAVFKEFQAVIEDMPIEWYGFVNVFDDMLATAFL
>> > FT                   TSSKNSLNMLTNALHRDPDMAAAPILVMESDVRERCIVLTPDIDVIANLLSGYIDRIHN
>> > FT                   ILEQFPRIGIKMKLPKEHQYESFSKAFLEDSESTQLICNIEAEINHEREEIDGYITFWN
>> > FT                   SHRMLWETTELEFTKRVKATQMTADIFEASIEYYSAMADDISYVDAITHVYFILMNQNY
>> > FT                   IKSSILDCIEKWQALNIKILLSHSFSLIRAIYRYMRKNERKMMMVPRTLKESLLAKQFF
>> > FT                   ERIINEVPLKQAGFPPTLELFAILDKYQVEIPEEIRVKVIGLEAAWHHYLKRLGEADEM
>> > FT                   LDNNREEFKKILVQQAEKFKIILKEFLDDFFLKLPTSANINPRIALKFLRIIALKIEDC
>> > FT                   FTFEESLMRDLAVFNVNQPESIDLRKLDFEVRIVKNIWELIFEWQTNWEGWKKGYFWKM
>> > FT                   NINEMEDTALNLYKEFTTLNKKFYDRHWEMLEATTKNVDSFRRTLPLITALKNPCMRER
>> > FT                   HWNRVRDVIHVNFDENSKNFTLELIINLDFQAFSEDIQDISNPATMELQIENSIKNIAT
>> > FT                   IGKNKVLKCFYHDGIYRIKNVEDCFQLLEEHMVQISAMKATRFVEPFITIVDYWEKTLS
>> > FT                   YISETLEKGLTVQRQWLYLENIFQGDDIRKQLPEEAKRFATITEEFRTISSKMFQAKTA
>> > FT                   VKATNLRPPPFLLNRFSRMDERLELIQRALEIYLEAKRQLFPRFYFISNDDLLEILGNS
>> > FT                   KRPDLVQTHLKKLFDNLYKLELKRVGKTLSRWQASGMHSDDGEYVEFMMVIYIDGPSER
>> > FT                   WLKQVEEYMLVVMKEMLKLTRGSLKKLVGNREKWISLWPGQMVLTTAQIQWTTECTRSL
>> > FT                   IHCSMVDQKKPLRKLKKKQIKVLSKLSEMSRKDLTKTMRLKVNTLITLEIHGRDVIERM
>> > FT                   YKSNCKDTGHFEWFSQLRFYWHRESELCVIRQTNTEHWGCFDEFNRINIEVLSVVAQQI
>> > FT                   MSIMAALSTKALELMFEGQMIKLKHTVGLFITMNPGYAGRTELPDNLKSMFRPISMMVP
>> > FT                   DNIIIAENLLFSDGFTNTRNLARKVYTLYELAKQQLSKQYHYDFGLRSMVALLRYAGRK
>> > FT                   RRQLPNTTEEEIVYLAMKDMNVARLTANDLPLFNGIMSDIFPGVSLPTIDYSEFNIAIY
>> > FT                   EEFREAGLQPITIAVKKVIELFETKNSRHSVMIIGDTGTAKSVTWRTLQNCFYRMNSQR
>> > FT                   FSGWEAVTVYPVNPKALNLAELYGEYNLSTGEWLDGVLSSIMRIICGDEEPTQKWLLFD
>> > FT                   GPVDAVWIENMNSVMDDNKLLTLVNSERITMPVQVSLLFEVGDLAVASPATVSRCGMVY
>> > FT                   NDYNDWGWKPFVNSWLQRLRIKEFADFLRIHFDYMVPKILDFKRMRCKEPVRTNELNGV
>> > FT                   VSLCKLLEIFGTKVNGINPINLELLEEMTRLWFMFCLVWSICSSVDEDSRQRLDSFIRE
>> > FT                   LESCFPIKDTVFDYFVDPNERTFLPWDSKLLSSWKCDFESPFYKIIVPTGDTVRYEYVV
>> > FT                   SKLLAEEYPVMLVGNVGTGKTSTAISVMEACDKNKFCILAVNMSAQTTAAGLQESIENR
>> > FT                   TEKRTKTQFVPIGGKRMICFMDDFNMPAKDIYGSQPPLELIRQWIDYKYWFNRKTQQKI
>> > FT                   YVQNTLLMAAMGPPGGGRQTISSRTQSRFVLLNLTFPSQETIIRIFGTMLCQKLESYPN
>> > FT                   EVREMWLPITLCTINLYVSMISKMLPTPNKSHYLFNLRDISKVFQGLLRSEKELQNKKN
>> > FT                   FFLRLWVHECFRVFSDRLVDDSDQFWFVNTINDILGKHFEVTFHSLCPSKVPPFFGDFA
>> > FT                   HPQGFYEDLQVDFLRTFMKNQLEEYNNFPGMTRMNLVFFREAIEHIVRILRVISQPRGH
>> > FT                   ILNMGIGGSGRQVLTKLAAFILEMAVFQIEVTKKYKTGDFREDLKNLYKVTGIKQRLTI
>> > FT                   FIFSSDQIAEVSFLEITNNMLSTGEINLFKSDEFDELKPELERPAKKNGVLLTTEALYS
>> > FT                   YFILNVRDFLHVALCFSPIGENFRSYIRQYPALLSSTTPNWFRFWPQEALLEVASHFLI
>> > FT                   GFPLNVVVSGKEDEKHRESLVISTEAILQRDIAYVFSVIHSSVAKMSENMYAEVKRYNY
>> > FT                   VTSPNYLQLVSGFKKLLEKKRLEVSTASNRLRNGLSKISETQEKVSLMSEELKASSEQV
>> > FT                   KILARECEDFISMIEIQKSEATEQKEKVDAEAVLIRRDEIICLELAATARADLEVVMPM
>> > FT                   IDAAVKALDALNKKDISEVKSYGRPPMKIEKVMEAVLILLGKEPTWENAKKVLSESTFL
>> > FT                   NDLKNFDRDHISDKTLKRIAIYTKNPELEPDKVAVVSLACKSLMQWIMAIENYGKVYRI
>> > FT                   VAPKQEKLDSAMKSLEEKQAALAAAKKKLEELQVVIEELYRQLEEKTNLLNELRAKEER
>> > FT                   LRKQLERAIILVESLSGERERWIETVNQLDLSFEKLPGDCLLSVAFMSYLGAFDTKYRE
>> > FT                   ELLVKWSLLIKDLLIPATLELKVTYFLVDAVSIREWNIQGLPADDLSTENGVIVTQGSR
>> > FT                   WPLIIDPQMQANNWIKNMEERNQLMTLDFGMADYLRQLERALKEGLPVLLQNVGEYLDQ
>> > FT                   AINPILRQSFTIQSGERLLKFNDKYISYNNSFRFYITTKISNPHYPPEISSKTTIVNFA
>> > FT                   LKQDGLEAQLLGIIVRKEKPALEEQKDELVMTIARNKRTLIDLDNEILRLLNESRGSLL
>> > FT                   DDDELFSTLQKSRQTSVLVKESLSIAEVTEVEIDAARQEYKPASERASILFFVLMDMSK
>> > FT                   IDPMYVFSLAAYILLFTQSIERSPRNQLVHERIQNINEYHSYAVYRNTCRGLFERHKLL
>> > FT                   FSIHMTAKILSNAGKLLEEEYDFILKGGIVLDKLGQAPNPAPWWISEQNWDNITELDKV
>> > FT                   SGFHGIIDSFEQHYKAWNGWYATTFPEQEDLVGEWNDKLTDFQKICVLRSLRPDRISFC
>> > FT                   LTQFIITKLGPRYVDPPVLDLKATFDESISQTPLIFVLSPGVDPAQSLISLSESVKMAQ
>> > FT                   RMYSLSLGQGQAPIATKLIMDGIKDGNWVFLANCHLSLSWMPTLDKMIATMQSMKLHKK
>> > FT                   FRLWLSSSPHPDFPISILQTSIKMTTEPPRGIKSNMKRLYNNINEANMENCSEPSKYKK
>> > FT                   LLFALCFFHTVLLERKKFLELGWNVIYSFNDSDFEVSEILLLLYLNEYEDTPWGALKYL
>> > FT                   IAGVNYGGHITDDWDRRLLITYINQFFCDQALQTRKFRLSTLPNYFIPDDGDVQSYLDQ
>> > FT                   IQMFPNFDKPDAFGQHSNADIASLIGETRMLFEALLSMQVQTNSTSSNENGETKVFDLA
>> > FT                   KEILMNTPDEINYEQTAKIIGINRTPLEVVLLQEIERYNKLLVDMSTQLRDLRRGIQGL
>> > FT                   VVMSSDLEDIYLAVSEGRVPLQWLKAYNSLKPLAAWARDLIHRVGHFNSWAKTLRPPIL
>> > FT                   FWLAAYTFPTGFVTAVLQTSARATKTPIDELSWDFYVFVEEDTAAARIIREGGGVYIRS
>> > FT                   LFLEGGGWLRKNQCLQDPLPMELICPLPVIHFKPVENLKKRCRGVYQCPAYYYPVRSGS
>> > FT                   FVIAVDLKSGNEKADYWIKRGTALLLSLAN"
>> >
>> >
>> > Which gives:
>> >
>> > >FBtr0300022__FBgn0001313
>> >
>> LQGLNAQLDQVDVQQIIRVLRSTHSVYIKQIDELIFESTHELMEAMENIKFLHLLMQPCSQLDFSESPTFVSQLIPRTIHLIRFIWLNSEQYNRRDLITGIFRNLSNQIIRFCTEKVNVEKILSGSSRFGIKICNMCIDCCLTYKGIYDIMSKTHAKINIRIGWSLDNAMIFNHVDAFMERLNDVIDICESMMVFGRLDESESIPKPQFGGTSGTEFEATADNVENEFLVTLTALCTDSKEIILNVHKNEWYEEVIKYRRTVQSMEETVQRLMSNVFQHICNVEEALESLNVMIFYSYRSTIRKTFLRQVSSAWVFFSNEIDSSVHMLMDRSKMHESWVPYYASRALGYRVHLDRLVWLCNRLNSSDWLPNVSEASVVLKKFESVRREFDKEVKKSFDEWQKNCCSLLLNQKLDRYLLIRSKKKKGLIECNIDRTILTICEQAQHFERLGLGVPGMVRKIYEKHETLRFVYNSVVQVCLNYNHILSALSEQERKLFRALIQACDRKIAPGVFKLTYGGELSDAYIADCAKHTNKLQETMDIYKRAIQNIARFCEKICDTPMLKFNFSGAVTISIFENHLSSYLRRVSNILRGFYSTITDLIFAVFKEFQAVIEDMPIEWYGFVNVFDDMLATAFLTSSKNSLNMLTNALHRDPDMAAAPILVMESDVRERCIVLTPDIDVIANLLSGYIDRIHNILEQFPRIGIKMKLPKEHQYESFSKAFLEDSESTQLICNIEAEINHEREEIDGYITFWNSHRMLWETTELEFTKRVKATQMTADIFEASIEYYSAMADDISYVDAITHVYFILMNQNYIKSSILDCIEKWQALNIKILLSHSFSLIRAIYRYMRKNERKMMMVPRTLKESLLAKQFFERIINEVPLKQAGFPPTLELFAILDKYQVEIPEEIRVKVIGLEAAWHHYLKRLGEADEMLDNNREEFKKILVQQAEKFKIILKEFLDDFFLKLPTSANINPRIALKFLRIIALKIEDC!
>> >
>> FTFEESLMRDLAVFNVNQPESIDLRKLDFEVRIVKNIWELIFEWQTNWEGWKKGYFWKMNINEMEDTALNLYKEFTTLNKKFYDRHWEMLEATTKNVDSFRRTLPLITALKNPCMRERHWNRVRDVIHVNFDENSKNFTLELIINLDFQAFSEDIQDISNPATMELQIENSIKNIATIGKNKVLKWLLS*WYL*NKKR*GLFSAP*RTHGTNIGYESNSFC*AIYNHC*LLGKNTIVHK*DSGKGFNCSAPMALPRKYIPRRRHKKTTSRRGKTFCNNN*RVSNNIKQNVPGKDSRKSH*LTPSTVFIKPF*SNGRKTGTYSTCLRNLS*G*TTTFSKILFYF***PFRNFRKF*AAGLSSNPP*EVI**FIQA*AQARWENFKSVASFWNAFRRWRIC*VHDGYLYRWTIGALAKTSRRVHACCYERDA*TYSRIS*KTCREQRKMDFALARTNGANHSSDPMDN*VYA*PNSL*YG*SKKTPTQAKEKANKSSF*IIRNESKRPNKNNAP*SKYPHNA*NTWS*CYRKNV*IKL*GYGPF*MVFTTQILLAP*IGTMCNKADKHRALGMFR*V*SNKY*SALSRGTTNNVYNGSAFYKGVGAYVRGSNDKVKAHSWSIHYYESWICRTD*TS**FKVNV*THINDGT**YNYCGKFTFFGWFY*YKKLGPKGIYVV*AG*AATFKAISL*FWSSLNGGFASLRGSKKTSITKYY*RRNCLFGNERYECCEINS**FTPF*WYYV*HISWC*LTNYRLQ*I*YCDL*RI*GGGSPTNYHSRKKSN*AF*NKKL*ALSYDHRGYGNSQISYMENITKLFLSNE*SKIFRMGSSHRLPSKSKSIESSRALWGIQLVDW*MA*RSFKFYYANNLWR*RADSEMVVV*WTCGCSMD*KHELSNG***TSYACK*RTYNHASSSIAIV*SRRPGCCFTSNCFPMWNGL*RLQ*LGMETFCKLMVTAPKN*GVR*FFTNTF*LHGAKNTGF*T!
>> > NEVQRACKDK*VKWSCVAL*IARNIWHKGKWDKSH*FRTS*GDD*IVVYVLFSMVNLFKCG*RQSPKTR*
>> >
>> LYTGTRKLLSNKRYCV*LFCGSQ*TNLFTMG*QAVEQLEMRFRISFLQDYCSYWRHCSL*ICCFKTSC*RISCDACWKCWYRKNVNGYKCNGGL**K*ILHFSCEHVSTDNSSRVTRINRKSD*ETYKNAICTYRWQTDDMFYGRL*YACKRHLWISATFGAYSAMDRLQVLV**KNSTKNICAKHIINGCDGTAWRGQTNNFQSNSKSVCFIKLNFSFTRNNYSHIWNDALSKTRVIPK*SS*DVATYNPLYH*LICIND**NVTDAK*ISLLI*S*RYIQSLSRTIKK*KRTSKQKKFFFTALGS*VF*SVQRPIG*RLRSVLVCKYY**YTW*TF*SYFSQSLSFKGSTIFR*LCSPSRVLRRSTGRFLKNIYEKST*GI*QLSRND*NEPSIF*RSYRTYCSNPESYFPTAWTHFKYGDRWIRPTSINQVSCVYFGNGSFPN*GYQKIQNRRLSRRPKKLIQSNWN*TETNDFYI*QRPNSRSLISRNNKQYAKYWRNKLI*IR*IRRAKA*T*TPGKKKWGSANN*STIFLFYFKCARLPACCALF*PNRRKFSKLYKTISGFVKFNNSKLV*ILATRSPFGSSFAFSNRISIKRSGFWKRGRKTSRKFGYKHRSHSSTRYCLCIFSNSLKCC*NVGKYVCRS*AL*LCNLTKLFAACKWF*KTIRKEKIRSINCFQ*ITQWAFKNF*NSGKSILNVRRA*S*L*TS*NTC*RM*RFYIHD*NSKE*SNGTKGKSGCRSRAY*KG*NNLS*ISSYSSCGLGGGNAYDRCCCKSIRCIE*ERHFRS*IIWTAANENRKGYGSCIDLTWKRTNMGKC*KSFK*INIFERPKKL**RSYFR*NS*TYCNLYKKS*VRAR*SGCCIACVQIIDAMDNGHRKLRKSLPNSRSKAGKIR*CNEVT*RKASCFSCGKKKTRRASGCH*RTLPAA*RKN*PS**IACQGRTT*KATGACHYFGRIAFWRERKVD*NG*SVGLIL*KTSR*LL!
>> >
>> AFCCVYVVLRGF*HQIPRRITCKMVFIN*RSFNTSNFGA*GYVFSSRCCFDSRMEYSRSTC**FKY*KRSNSYSR*SLASYY*PSNAS**LDKKYGRA*SINDTRFRYGRLLTSARTSSKRRFACIVAKRGGILRSSY*SNFAAELYHSKWRKVIKI**QVYFIQ*FVQILHNDKNIKSTLPTGNLIKNYYCKFCTKARWA*SPTTRNYCSKRKTRPRRTKRRTGNDNSSKQTDINRSR**DSTAT**KSRFLIR*R*VIFNFTKIPSDISAS*GVA*HCRGN*SRN*CGPTRIQTSIGTRIHFILCFNGYV*N*SNVCFFSGSVYIIIHTVY*AKSS*SASP*KNSKY**ISFLCGLPKYLSWAFRAT*ATIFNSYDSKDSFKRWKAFGRRV*FYSERRYSIR*TGTSAQPGTMVDK*AKLG*YNRIR*SFWISWDNRFF*ATLQGLEWLVCHDLPRTRRSRWRME**TYRFSKNLCFTFTSTG*NFFLFDTIYYYQTWASIC*SASS*SQGNF**IDFTDSPHIRIITRCRSSPISHITIRIS*NGTTNVLT*LGSRTSTYCNKAYNGWHQGW*LGIFSKLSFVS*LDAYS*QDDSHYAVHETT*KISTVAKLKPSSGLSNIYFANQY*DDN*TSSWNQIKYETSI*QHK*G*YGKL**TQQV*EVIIRFVLLSYSPTRTKKIFRTWLECYLQL*RF*F*SFRNTTIIVS**I*RHSLGSFKVSHSRSKLRRTHYRRLGSPTINNLYKPIFL*PSIAD*KV*IINPSKLFYSR*RRCAIIFRPNTNVSQF**A*CFWTTFKCRYSVINRRNKNAF*GSAFYASPD**HK***KR*DKSI*SR*RNFNEYTG*DKL*TDGKNYWNQSNSLRSCLTSRN*AL**TSR*HVHSIT*LKTWNTGTCCNEFGLRGYLSSCL*RKGAITMVKSI*FIETISGMG*RLNTSCRTF**LGENTPPSNIILACSLHVSNWICYSSTTNFSSSYQN!
>> > TN**TLLGFLCFC*RRYCRSSYNKGRRRRLHSKFVFGGWRMVEEKPMPSGSTTDGTNLSITSNTL*ASRK
>> > PKKTMSWCLPVSRILLSR*VRIICNSRGLKVW**KG*LLDKARYCTFIKFSKL
>> >
>> >
>> > The start is correct, but then it goes wrong...
>> > The problem seems to be the same as this old one:
>> > http://bioperl.org/pipermail/bioperl-l/2004-August/016735.html
>> >
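To see exactly where a translation "goes wrong", you can compare it residue by residue with the `/translation` qualifier and report the first difference. In BioPerl the qualifier is available via `$feat->get_tag_values('translation')`, and a trailing `*` from `translate` should be stripped first. The sketch below is minimal, and the helper `first_mismatch` is hypothetical. It uses short excerpts from the FBtr0300022 output above around the point of divergence.

```perl
use strict;
use warnings;

# Hypothetical helper: 1-based index of the first residue at which two
# protein strings differ; undef if they are identical. If one string is a
# prefix of the other, the first position beyond the shorter one is returned.
sub first_mismatch {
    my ($got, $expected) = @_;
    my $shorter = length($got) < length($expected) ? length($got) : length($expected);
    for my $i (0 .. $shorter - 1) {
        return $i + 1 if substr($got, $i, 1) ne substr($expected, $i, 1);
    }
    return length($got) == length($expected) ? undef : $shorter + 1;
}

# Excerpts from the thread around the point where FBtr0300022 diverges.
my $translated = 'NIATIGKNKVLKWLLS';   # from the spliced_seq->translate output
my $annotated  = 'NIATIGKNKVLKCFYH';   # from the /translation qualifier

my $pos = first_mismatch($translated, $annotated);
printf "first mismatch at residue %d (%s vs %s)\n",
    $pos, substr($translated, $pos - 1, 1), substr($annotated, $pos - 1, 1);
```

Run on the full strings, the returned residue index n corresponds to nucleotide 3(n - 1) + 1 of the spliced CDS. That position can then be compared with the exon boundaries in the `join()` location.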
>> > I must be missing something...
>> >
>> > Thanks for your help!
>> >
>> > --
>> > Tristan
>> >
>> > _______________________________________________
>> > Bioperl-l mailing list
>> > Bioperl-l at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
