[Bioperl-l] spliced
Tristan Lefebure
tristan.lefebure at gmail.com
Fri Jul 12 15:16:20 UTC 2013
Dear bioperlers,
I am trying to extract CDS sequences from Ensembl EMBL files (*.dat),
but I get stop codons where I would not expect any. I am using the
usual approach for extracting a coding sequence: splice the feature's
sequence, then translate it:
$feat_object->spliced_seq->translate(-codontable_id => $genetcode)->seq
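Conceptually, this call concatenates the segments of the feature's location and translates the result codon by codon. Here is a minimal plain-Perl sketch of that idea. It is not BioPerl's implementation: the subroutine names (splice_segments, translate_dna) and the toy sequence are made up for illustration, only forward-strand segments are handled, and the standard genetic code is assumed.

```perl
use strict;
use warnings;

# Standard genetic code (table 1), codons ordered T, C, A, G at each position.
my @bases = qw(T C A G);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon2aa;
my $i = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $codon2aa{"$b1$b2$b3"} = substr($aas, $i++, 1);
        }
    }
}

# Concatenate 1-based, inclusive [start, end] segments of a forward-strand sequence.
sub splice_segments {
    my ($dna, @segments) = @_;
    return join '', map { substr($dna, $_->[0] - 1, $_->[1] - $_->[0] + 1) } @segments;
}

# Translate in frame 0; an incomplete trailing codon is dropped,
# and codons not in the table become X.
sub translate_dna {
    my ($cds) = @_;
    my $protein = '';
    for (my $p = 0; $p + 3 <= length $cds; $p += 3) {
        $protein .= $codon2aa{ uc substr($cds, $p, 3) } // 'X';
    }
    return $protein;
}

# Toy sequence: exon ATGGCC (1..6), intron GTAAGT (7..12), exon AAATGGTAA (13..21).
my $genome = 'ATGGCCGTAAGTAAATGGTAA';
my $good = translate_dna(splice_segments($genome, [1, 6], [13, 21]));
my $bad  = translate_dna(splice_segments($genome, [1, 6], [12, 21]));  # one extra base
print "$good\n$bad\n";
```

In the toy case, the second call pulls in a single extra base at the segment boundary. The first residues still match, but everything after the boundary is read out of frame. This is only one way such a symptom can arise, not a diagnosis of the record below.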
An example with this CDS:
FT CDS join(81211..83052,122064..122745,122800..123184,
FT 135429..135872,138620..139940,140189..141784,
FT 141841..145922,145985..146089,146143..147085,
FT 147145..147638,147696..148243)
FT /gene="FBgn0001313"
FT /protein_id="FBpp0289299"
FT /note="transcript_id=FBtr0300022"
FT /db_xref="flybase_transcript_id:FBtr0300022"
FT /db_xref="RefSeq_peptide:NP_001015505"
FT /db_xref="RefSeq_mRNA:NM_001015505"
FT /db_xref="Uniprot/SPTREMBL:Q281X0"
FT /db_xref="Uniprot/SPTREMBL:Q5LJP0"
FT /db_xref="FlyBaseCGID_transcript:CG17866-RB"
FT /db_xref="FlyBaseCGID_translation:CG17866-PB"
FT /db_xref="FlyBaseName_transcript:FBtr0300022"
FT /db_xref="FlyBaseName_translation:FBpp0289299"
FT /db_xref="GO:0003774"
FT /db_xref="GO:0003777"
FT /db_xref="GO:0003777"
FT /db_xref="GO:0005524"
FT /db_xref="GO:0005524"
FT /db_xref="GO:0005858"
FT /db_xref="GO:0005875"
FT /db_xref="GO:0007018"
FT /db_xref="GO:0007018"
FT /db_xref="GO:0016887"
FT /db_xref="GO:0030286"
FT /db_xref="GO:0030286"
FT /db_xref="GO:0042623"
FT /db_xref="flybase_translation_id:FBpp0289299"
FT /db_xref="goslim_goa:GO:0003674"
FT /db_xref="goslim_goa:GO:0005575"
FT /db_xref="goslim_goa:GO:0005622"
FT /db_xref="goslim_goa:GO:0005623"
FT /db_xref="goslim_goa:GO:0005856"
FT /db_xref="goslim_goa:GO:0008150"
FT /db_xref="goslim_goa:GO:0016887"
FT /db_xref="goslim_goa:GO:0043226"
FT /db_xref="goslim_goa:GO:0043234"
FT /db_xref="UniParc:UPI00018FBBAC"
FT /translation="LQGLNAQLDQVDVQQIIRVLRSTHSVYIKQIDELIFESTHELMEA
FT MENIKFLHLLMQPCSQLDFSESPTFVSQLIPRTIHLIRFIWLNSEQYNRRDLITGIFRN
FT LSNQIIRFCTEKVNVEKILSGSSRFGIKICNMCIDCCLTYKGIYDIMSKTHAKINIRIG
FT WSLDNAMIFNHVDAFMERLNDVIDICESMMVFGRLDESESIPKPQFGGTSGTEFEATAD
FT NVENEFLVTLTALCTDSKEIILNVHKNEWYEEVIKYRRTVQSMEETVQRLMSNVFQHIC
FT NVEEALESLNVMIFYSYRSTIRKTFLRQVSSAWVFFSNEIDSSVHMLMDRSKMHESWVP
FT YYASRALGYRVHLDRLVWLCNRLNSSDWLPNVSEASVVLKKFESVRREFDKEVKKSFDE
FT WQKNCCSLLLNQKLDRYLLIRSKKKKGLIECNIDRTILTICEQAQHFERLGLGVPGMVR
FT KIYEKHETLRFVYNSVVQVCLNYNHILSALSEQERKLFRALIQACDRKIAPGVFKLTYG
FT GELSDAYIADCAKHTNKLQETMDIYKRAIQNIARFCEKICDTPMLKFNFSGAVTISIFE
FT NHLSSYLRRVSNILRGFYSTITDLIFAVFKEFQAVIEDMPIEWYGFVNVFDDMLATAFL
FT TSSKNSLNMLTNALHRDPDMAAAPILVMESDVRERCIVLTPDIDVIANLLSGYIDRIHN
FT ILEQFPRIGIKMKLPKEHQYESFSKAFLEDSESTQLICNIEAEINHEREEIDGYITFWN
FT SHRMLWETTELEFTKRVKATQMTADIFEASIEYYSAMADDISYVDAITHVYFILMNQNY
FT IKSSILDCIEKWQALNIKILLSHSFSLIRAIYRYMRKNERKMMMVPRTLKESLLAKQFF
FT ERIINEVPLKQAGFPPTLELFAILDKYQVEIPEEIRVKVIGLEAAWHHYLKRLGEADEM
FT LDNNREEFKKILVQQAEKFKIILKEFLDDFFLKLPTSANINPRIALKFLRIIALKIEDC
FT FTFEESLMRDLAVFNVNQPESIDLRKLDFEVRIVKNIWELIFEWQTNWEGWKKGYFWKM
FT NINEMEDTALNLYKEFTTLNKKFYDRHWEMLEATTKNVDSFRRTLPLITALKNPCMRER
FT HWNRVRDVIHVNFDENSKNFTLELIINLDFQAFSEDIQDISNPATMELQIENSIKNIAT
FT IGKNKVLKCFYHDGIYRIKNVEDCFQLLEEHMVQISAMKATRFVEPFITIVDYWEKTLS
FT YISETLEKGLTVQRQWLYLENIFQGDDIRKQLPEEAKRFATITEEFRTISSKMFQAKTA
FT VKATNLRPPPFLLNRFSRMDERLELIQRALEIYLEAKRQLFPRFYFISNDDLLEILGNS
FT KRPDLVQTHLKKLFDNLYKLELKRVGKTLSRWQASGMHSDDGEYVEFMMVIYIDGPSER
FT WLKQVEEYMLVVMKEMLKLTRGSLKKLVGNREKWISLWPGQMVLTTAQIQWTTECTRSL
FT IHCSMVDQKKPLRKLKKKQIKVLSKLSEMSRKDLTKTMRLKVNTLITLEIHGRDVIERM
FT YKSNCKDTGHFEWFSQLRFYWHRESELCVIRQTNTEHWGCFDEFNRINIEVLSVVAQQI
FT MSIMAALSTKALELMFEGQMIKLKHTVGLFITMNPGYAGRTELPDNLKSMFRPISMMVP
FT DNIIIAENLLFSDGFTNTRNLARKVYTLYELAKQQLSKQYHYDFGLRSMVALLRYAGRK
FT RRQLPNTTEEEIVYLAMKDMNVARLTANDLPLFNGIMSDIFPGVSLPTIDYSEFNIAIY
FT EEFREAGLQPITIAVKKVIELFETKNSRHSVMIIGDTGTAKSVTWRTLQNCFYRMNSQR
FT FSGWEAVTVYPVNPKALNLAELYGEYNLSTGEWLDGVLSSIMRIICGDEEPTQKWLLFD
FT GPVDAVWIENMNSVMDDNKLLTLVNSERITMPVQVSLLFEVGDLAVASPATVSRCGMVY
FT NDYNDWGWKPFVNSWLQRLRIKEFADFLRIHFDYMVPKILDFKRMRCKEPVRTNELNGV
FT VSLCKLLEIFGTKVNGINPINLELLEEMTRLWFMFCLVWSICSSVDEDSRQRLDSFIRE
FT LESCFPIKDTVFDYFVDPNERTFLPWDSKLLSSWKCDFESPFYKIIVPTGDTVRYEYVV
FT SKLLAEEYPVMLVGNVGTGKTSTAISVMEACDKNKFCILAVNMSAQTTAAGLQESIENR
FT TEKRTKTQFVPIGGKRMICFMDDFNMPAKDIYGSQPPLELIRQWIDYKYWFNRKTQQKI
FT YVQNTLLMAAMGPPGGGRQTISSRTQSRFVLLNLTFPSQETIIRIFGTMLCQKLESYPN
FT EVREMWLPITLCTINLYVSMISKMLPTPNKSHYLFNLRDISKVFQGLLRSEKELQNKKN
FT FFLRLWVHECFRVFSDRLVDDSDQFWFVNTINDILGKHFEVTFHSLCPSKVPPFFGDFA
FT HPQGFYEDLQVDFLRTFMKNQLEEYNNFPGMTRMNLVFFREAIEHIVRILRVISQPRGH
FT ILNMGIGGSGRQVLTKLAAFILEMAVFQIEVTKKYKTGDFREDLKNLYKVTGIKQRLTI
FT FIFSSDQIAEVSFLEITNNMLSTGEINLFKSDEFDELKPELERPAKKNGVLLTTEALYS
FT YFILNVRDFLHVALCFSPIGENFRSYIRQYPALLSSTTPNWFRFWPQEALLEVASHFLI
FT GFPLNVVVSGKEDEKHRESLVISTEAILQRDIAYVFSVIHSSVAKMSENMYAEVKRYNY
FT VTSPNYLQLVSGFKKLLEKKRLEVSTASNRLRNGLSKISETQEKVSLMSEELKASSEQV
FT KILARECEDFISMIEIQKSEATEQKEKVDAEAVLIRRDEIICLELAATARADLEVVMPM
FT IDAAVKALDALNKKDISEVKSYGRPPMKIEKVMEAVLILLGKEPTWENAKKVLSESTFL
FT NDLKNFDRDHISDKTLKRIAIYTKNPELEPDKVAVVSLACKSLMQWIMAIENYGKVYRI
FT VAPKQEKLDSAMKSLEEKQAALAAAKKKLEELQVVIEELYRQLEEKTNLLNELRAKEER
FT LRKQLERAIILVESLSGERERWIETVNQLDLSFEKLPGDCLLSVAFMSYLGAFDTKYRE
FT ELLVKWSLLIKDLLIPATLELKVTYFLVDAVSIREWNIQGLPADDLSTENGVIVTQGSR
FT WPLIIDPQMQANNWIKNMEERNQLMTLDFGMADYLRQLERALKEGLPVLLQNVGEYLDQ
FT AINPILRQSFTIQSGERLLKFNDKYISYNNSFRFYITTKISNPHYPPEISSKTTIVNFA
FT LKQDGLEAQLLGIIVRKEKPALEEQKDELVMTIARNKRTLIDLDNEILRLLNESRGSLL
FT DDDELFSTLQKSRQTSVLVKESLSIAEVTEVEIDAARQEYKPASERASILFFVLMDMSK
FT IDPMYVFSLAAYILLFTQSIERSPRNQLVHERIQNINEYHSYAVYRNTCRGLFERHKLL
FT FSIHMTAKILSNAGKLLEEEYDFILKGGIVLDKLGQAPNPAPWWISEQNWDNITELDKV
FT SGFHGIIDSFEQHYKAWNGWYATTFPEQEDLVGEWNDKLTDFQKICVLRSLRPDRISFC
FT LTQFIITKLGPRYVDPPVLDLKATFDESISQTPLIFVLSPGVDPAQSLISLSESVKMAQ
FT RMYSLSLGQGQAPIATKLIMDGIKDGNWVFLANCHLSLSWMPTLDKMIATMQSMKLHKK
FT FRLWLSSSPHPDFPISILQTSIKMTTEPPRGIKSNMKRLYNNINEANMENCSEPSKYKK
FT LLFALCFFHTVLLERKKFLELGWNVIYSFNDSDFEVSEILLLLYLNEYEDTPWGALKYL
FT IAGVNYGGHITDDWDRRLLITYINQFFCDQALQTRKFRLSTLPNYFIPDDGDVQSYLDQ
FT IQMFPNFDKPDAFGQHSNADIASLIGETRMLFEALLSMQVQTNSTSSNENGETKVFDLA
FT KEILMNTPDEINYEQTAKIIGINRTPLEVVLLQEIERYNKLLVDMSTQLRDLRRGIQGL
FT VVMSSDLEDIYLAVSEGRVPLQWLKAYNSLKPLAAWARDLIHRVGHFNSWAKTLRPPIL
FT FWLAAYTFPTGFVTAVLQTSARATKTPIDELSWDFYVFVEEDTAAARIIREGGGVYIRS
FT LFLEGGGWLRKNQCLQDPLPMELICPLPVIHFKPVENLKKRCRGVYQCPAYYYPVRSGS
FT FVIAVDLKSGNEKADYWIKRGTALLLSLAN"
Which gives:
>FBtr0300022__FBgn0001313
LQGLNAQLDQVDVQQIIRVLRSTHSVYIKQIDELIFESTHELMEAMENIKFLHLLMQPCSQLDFSESPTFVSQLIPRTIHLIRFIWLNSEQYNRRDLITGIFRNLSNQIIRFCTEKVNVEKILSGSSRFGIKICNMCIDCCLTYKGIYDIMSKTHAKINIRIGWSLDNAMIFNHVDAFMERLNDVIDICESMMVFGRLDESESIPKPQFGGTSGTEFEATADNVENEFLVTLTALCTDSKEIILNVHKNEWYEEVIKYRRTVQSMEETVQRLMSNVFQHICNVEEALESLNVMIFYSYRSTIRKTFLRQVSSAWVFFSNEIDSSVHMLMDRSKMHESWVPYYASRALGYRVHLDRLVWLCNRLNSSDWLPNVSEASVVLKKFESVRREFDKEVKKSFDEWQKNCCSLLLNQKLDRYLLIRSKKKKGLIECNIDRTILTICEQAQHFERLGLGVPGMVRKIYEKHETLRFVYNSVVQVCLNYNHILSALSEQERKLFRALIQACDRKIAPGVFKLTYGGELSDAYIADCAKHTNKLQETMDIYKRAIQNIARFCEKICDTPMLKFNFSGAVTISIFENHLSSYLRRVSNILRGFYSTITDLIFAVFKEFQAVIEDMPIEWYGFVNVFDDMLATAFLTSSKNSLNMLTNALHRDPDMAAAPILVMESDVRERCIVLTPDIDVIANLLSGYIDRIHNILEQFPRIGIKMKLPKEHQYESFSKAFLEDSESTQLICNIEAEINHEREEIDGYITFWNSHRMLWETTELEFTKRVKATQMTADIFEASIEYYSAMADDISYVDAITHVYFILMNQNYIKSSILDCIEKWQALNIKILLSHSFSLIRAIYRYMRKNERKMMMVPRTLKESLLAKQFFERIINEVPLKQAGFPPTLELFAILDKYQVEIPEEIRVKVIGLEAAWHHYLKRLGEADEMLDNNREEFKKILVQQAEKFKIILKEFLDDFFLKLPTSANINPRIALKFLRIIALKIEDCFTFEESLMRDLAVFNVNQPESIDLRKLDFEVRIVKNIWELIFEWQTNWEGWKKGYFWKMNINEMEDTALNLYKEFTTLNKKFYDRHWEMLEATTKNVDSFRRTLPLITALKNPCMRERHWNRVRDVIHVNFDENSKNFTLELIINLDFQAFSEDIQDISNPATMELQIENSIKNIATIGKNKVLKWLLS*WYL*NKKR*GLFSAP*RTHGTNIGYESNSFC*AIYNHC*LLGKNTIVHK*DSGKGFNCSAPMALPRKYIPRRRHKKTTSRRGKTFCNNN*RVSNNIKQNVPGKDSRKSH*LTPSTVFIKPF*SNGRKTGTYSTCLRNLS*G*TTTFSKILFYF***PFRNFRKF*AAGLSSNPP*EVI**FIQA*AQARWENFKSVASFWNAFRRWRIC*VHDGYLYRWTIGALAKTSRRVHACCYERDA*TYSRIS*KTCREQRKMDFALARTNGANHSSDPMDN*VYA*PNSL*YG*SKKTPTQAKEKANKSSF*IIRNESKRPNKNNAP*SKYPHNA*NTWS*CYRKNV*IKL*GYGPF*MVFTTQILLAP*IGTMCNKADKHRALGMFR*V*SNKY*SALSRGTTNNVYNGSAFYKGVGAYVRGSNDKVKAHSWSIHYYESWICRTD*TS**FKVNV*THINDGT**YNYCGKFTFFGWFY*YKKLGPKGIYVV*AG*AATFKAISL*FWSSLNGGFASLRGSKKTSITKYY*RRNCLFGNERYECCEINS**FTPF*WYYV*HISWC*LTNYRLQ*I*YCDL*RI*GGGSPTNYHSRKKSN*AF*NKKL*ALSYDHRGYGNSQISYMENITKLFLSNE*SKIFRMGSSHRLPSKSKSIESSRALWGIQLVDW*MA*RSFKFYYANNLWR*RADSEMVVV*WTCGCSMD*KHELSNG***TSYACK*RTYNHASSSIAIV*SRRPGCCFTSNCFPMWNGL*RLQ*LGMETFCKLMVTAPKN*GVR*FFTNTF*LHGAKNTGF*TNEVQRACKDK*VKWSCVAL*IAR
NIWHKGKWDKSH*FRTS*GDD*IVVYVLFSMVNLFKCG*RQSPKTR*LYTGTRKLLSNKRYCV*LFCGSQ*TNLFTMG*QAVEQLEMRFRISFLQDYCSYWRHCSL*ICCFKTSC*RISCDACWKCWYRKNVNGYKCNGGL**K*ILHFSCEHVSTDNSSRVTRINRKSD*ETYKNAICTYRWQTDDMFYGRL*YACKRHLWISATFGAYSAMDRLQVLV**KNSTKNICAKHIINGCDGTAWRGQTNNFQSNSKSVCFIKLNFSFTRNNYSHIWNDALSKTRVIPK*SS*DVATYNPLYH*LICIND**NVTDAK*ISLLI*S*RYIQSLSRTIKK*KRTSKQKKFFFTALGS*VF*SVQRPIG*RLRSVLVCKYY**YTW*TF*SYFSQSLSFKGSTIFR*LCSPSRVLRRSTGRFLKNIYEKST*GI*QLSRND*NEPSIF*RSYRTYCSNPESYFPTAWTHFKYGDRWIRPTSINQVSCVYFGNGSFPN*GYQKIQNRRLSRRPKKLIQSNWN*TETNDFYI*QRPNSRSLISRNNKQYAKYWRNKLI*IR*IRRAKA*T*TPGKKKWGSANN*STIFLFYFKCARLPACCALF*PNRRKFSKLYKTISGFVKFNNSKLV*ILATRSPFGSSFAFSNRISIKRSGFWKRGRKTSRKFGYKHRSHSSTRYCLCIFSNSLKCC*NVGKYVCRS*AL*LCNLTKLFAACKWF*KTIRKEKIRSINCFQ*ITQWAFKNF*NSGKSILNVRRA*S*L*TS*NTC*RM*RFYIHD*NSKE*SNGTKGKSGCRSRAY*KG*NNLS*ISSYSSCGLGGGNAYDRCCCKSIRCIE*ERHFRS*IIWTAANENRKGYGSCIDLTWKRTNMGKC*KSFK*INIFERPKKL**RSYFR*NS*TYCNLYKKS*VRAR*SGCCIACVQIIDAMDNGHRKLRKSLPNSRSKAGKIR*CNEVT*RKASCFSCGKKKTRRASGCH*RTLPAA*RKN*PS**IACQGRTT*KATGACHYFGRIAFWRERKVD*NG*SVGLIL*KTSR*LLAFCCVYVVLRGF*HQIPRRITCKMVFIN*RSFNTSNFGA*GYVFSSRCCFDSRMEYSRSTC**FKY*KRSNSYSR*SLASYY*PSNAS**LDKKYGRA*SINDTRFRYGRLLTSARTSSKRRFACIVAKRGGILRSSY*SNFAAELYHSKWRKVIKI**QVYFIQ*FVQILHNDKNIKSTLPTGNLIKNYYCKFCTKARWA*SPTTRNYCSKRKTRPRRTKRRTGNDNSSKQTDINRSR**DSTAT**KSRFLIR*R*VIFNFTKIPSDISAS*GVA*HCRGN*SRN*CGPTRIQTSIGTRIHFILCFNGYV*N*SNVCFFSGSVYIIIHTVY*AKSS*SASP*KNSKY**ISFLCGLPKYLSWAFRAT*ATIFNSYDSKDSFKRWKAFGRRV*FYSERRYSIR*TGTSAQPGTMVDK*AKLG*YNRIR*SFWISWDNRFF*ATLQGLEWLVCHDLPRTRRSRWRME**TYRFSKNLCFTFTSTG*NFFLFDTIYYYQTWASIC*SASS*SQGNF**IDFTDSPHIRIITRCRSSPISHITIRIS*NGTTNVLT*LGSRTSTYCNKAYNGWHQGW*LGIFSKLSFVS*LDAYS*QDDSHYAVHETT*KISTVAKLKPSSGLSNIYFANQY*DDN*TSSWNQIKYETSI*QHK*G*YGKL**TQQV*EVIIRFVLLSYSPTRTKKIFRTWLECYLQL*RF*F*SFRNTTIIVS**I*RHSLGSFKVSHSRSKLRRTHYRRLGSPTINNLYKPIFL*PSIAD*KV*IINPSKLFYSR*RRCAIIFRPNTNVSQF**A*CFWTTFKCRYSVINRRNKNAF*GSAFYASPD**HK***KR*DKSI*SR*RNFNEYTG*DKL*TDGKNYWNQSNSLRSCLTSRN*AL**TSR*HVHSIT*LKTWNTGTCCNEFGLRGYLSSCL*RKGAITMVKSI*FIETISGMG*RLNTSCRTF**LGENTPPSNIIL
ACSLHVSNWICYSSTTNFSSSYQNTN**TLLGFLCFC*RRYCRSSYNKGRRRRLHSKFVFGGWRMVEEKPMPSGSTTDGTNLSITSNTL*ASRKPKKTMSWCLPVSRILLSR*VRIICNSRGLKVW**KG*LLDKARYCTFIKFSKL
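The start of the translation matches the annotated /translation (up to
"...IGKNKVLK"), but after that it diverges and is full of stop codons.
The problem looks the same as the one reported in this old thread: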
http://bioperl.org/pipermail/bioperl-l/2004-August/016735.html
I must be missing something....
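To narrow down where the two translations part ways, a small plain-Perl sketch like the following could help. It uses no BioPerl. The subroutine names are hypothetical, the segments are copied from the join(...) location above, and it assumes a forward-strand feature whose first base is in frame 0. It counts the residues shared by two protein strings, maps a residue index back to a coordinate in the join, and sums the segment lengths so the total can be checked against a multiple of three.

```perl
use strict;
use warnings;

# Number of leading residues on which two protein strings agree.
sub common_prefix_length {
    my ($x, $y) = @_;
    my $max = length($x) < length($y) ? length($x) : length($y);
    my $n = 0;
    $n++ while $n < $max && substr($x, $n, 1) eq substr($y, $n, 1);
    return $n;
}

# Map a 0-based residue index to the coordinate of its codon's first base,
# walking forward-strand [start, end] segments (1-based, inclusive).
sub codon_start_coordinate {
    my ($aa_index, @segments) = @_;
    my $offset = 3 * $aa_index;
    for my $seg (@segments) {
        my $len = $seg->[1] - $seg->[0] + 1;
        return $seg->[0] + $offset if $offset < $len;
        $offset -= $len;
    }
    return;    # index lies beyond the end of the CDS
}

# Segments copied from the join(...) location of the CDS feature.
my @segments = (
    [81211, 83052],   [122064, 122745], [122800, 123184], [135429, 135872],
    [138620, 139940], [140189, 141784], [141841, 145922], [145985, 146089],
    [146143, 147085], [147145, 147638], [147696, 148243],
);

# Total spliced length and its remainder modulo 3.
my $total = 0;
$total += $_->[1] - $_->[0] + 1 for @segments;
printf "spliced length: %d nt, length %% 3 = %d\n", $total, $total % 3;

# Short excerpts around the divergence; pass the full strings in practice.
my $annotated = 'IGKNKVLKCFYHDG';
my $computed  = 'IGKNKVLKWLLS*W';
my $shared    = common_prefix_length($annotated, $computed);
print "excerpts agree on $shared residues\n";

# Residue 614 (0-based) is the first codon after the 1842-nt first segment.
print "codon 614 starts at ", codon_start_coordinate(614, @segments), "\n";
```

With the full annotated and computed protein strings in place of the excerpts, passing the shared-prefix length to codon_start_coordinate gives the coordinate of the first codon that disagrees. That shows whether the break falls at a segment boundary or inside a segment.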
Thanks for your help!
--
Tristan