[Bioperl-l] spliced

Tristan Lefebure tristan.lefebure at gmail.com
Fri Jul 12 15:16:20 UTC 2013


Dear bioperlers,

I am trying to extract CDS sequences from Ensembl EMBL files (*.dat),
but I get stop codons where I would not expect any. I am using the
usual way of extracting a coding sequence: splice the feature's
sequence, then translate it:

$feat_object->spliced_seq->translate(-codontable_id => $genetcode)->seq
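
For context, here is a minimal sketch of how that one-liner might sit in a complete script. It also compares each computed translation with the /translation qualifier stored in the file and reports where the two first differ. The file name and codon table are taken from the command line, the helper first_mismatch and the sub check_file are hypothetical names introduced only for this illustration, and the script is a diagnostic aid under those assumptions, not a fix.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return the 0-based index of the first differing
# residue, or -1 if the shorter string is a prefix of the longer one
# (or the two strings are identical).
sub first_mismatch {
    my ($x, $y) = @_;
    my $n = length($x) < length($y) ? length($x) : length($y);
    for my $i (0 .. $n - 1) {
        return $i if substr($x, $i, 1) ne substr($y, $i, 1);
    }
    return -1;
}

# Hypothetical driver: walk every CDS in an EMBL file and compare the
# spliced-and-translated sequence with the annotated /translation.
sub check_file {
    my ($file, $table) = @_;
    require Bio::SeqIO;    # loaded lazily so the helper works without BioPerl
    my $in = Bio::SeqIO->new(-file => $file, -format => 'EMBL');
    while (my $seq = $in->next_seq) {
        for my $feat ($seq->get_SeqFeatures) {
            next unless $feat->primary_tag eq 'CDS';
            next unless $feat->has_tag('translation');

            my ($annotated) = $feat->get_tag_values('translation');
            $annotated =~ s/\s+//g;    # remove any wrapping whitespace

            my $computed = $feat->spliced_seq
                                ->translate(-codontable_id => $table)
                                ->seq;
            $computed =~ s/\*$//;      # drop a terminal stop, if present

            my $pos = first_mismatch($annotated, $computed);
            next if $pos < 0;

            my ($id) = $feat->has_tag('protein_id')
                     ? $feat->get_tag_values('protein_id')
                     : ('unknown');
            printf "%s: first mismatch at residue %d (offset %d nt in the spliced CDS)\n",
                $id, $pos + 1, 3 * $pos;
        }
    }
}

# Usage: perl check_cds.pl file.dat [codon_table_id]
check_file($ARGV[0], $ARGV[1] // 1) if @ARGV;
```

The reported nucleotide offset can then be checked against the cumulative exon lengths in the join(...) location to see whether the divergence falls at an exon boundary.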

An example with this CDS:

FT   CDS             join(81211..83052,122064..122745,122800..123184,
FT                   135429..135872,138620..139940,140189..141784,
FT                   141841..145922,145985..146089,146143..147085,
FT                   147145..147638,147696..148243)
FT                   /gene="FBgn0001313"
FT                   /protein_id="FBpp0289299"
FT                   /note="transcript_id=FBtr0300022"
FT                   /db_xref="flybase_transcript_id:FBtr0300022"
FT                   /db_xref="RefSeq_peptide:NP_001015505"
FT                   /db_xref="RefSeq_mRNA:NM_001015505"
FT                   /db_xref="Uniprot/SPTREMBL:Q281X0"
FT                   /db_xref="Uniprot/SPTREMBL:Q5LJP0"
FT                   /db_xref="FlyBaseCGID_transcript:CG17866-RB"
FT                   /db_xref="FlyBaseCGID_translation:CG17866-PB"
FT                   /db_xref="FlyBaseName_transcript:FBtr0300022"
FT                   /db_xref="FlyBaseName_translation:FBpp0289299"
FT                   /db_xref="GO:0003774"
FT                   /db_xref="GO:0003777"
FT                   /db_xref="GO:0003777"
FT                   /db_xref="GO:0005524"
FT                   /db_xref="GO:0005524"
FT                   /db_xref="GO:0005858"
FT                   /db_xref="GO:0005875"
FT                   /db_xref="GO:0007018"
FT                   /db_xref="GO:0007018"
FT                   /db_xref="GO:0016887"
FT                   /db_xref="GO:0030286"
FT                   /db_xref="GO:0030286"
FT                   /db_xref="GO:0042623"
FT                   /db_xref="flybase_translation_id:FBpp0289299"
FT                   /db_xref="goslim_goa:GO:0003674"
FT                   /db_xref="goslim_goa:GO:0005575"
FT                   /db_xref="goslim_goa:GO:0005622"
FT                   /db_xref="goslim_goa:GO:0005623"
FT                   /db_xref="goslim_goa:GO:0005856"
FT                   /db_xref="goslim_goa:GO:0008150"
FT                   /db_xref="goslim_goa:GO:0016887"
FT                   /db_xref="goslim_goa:GO:0043226"
FT                   /db_xref="goslim_goa:GO:0043234"
FT                   /db_xref="UniParc:UPI00018FBBAC"
FT                   /translation="LQGLNAQLDQVDVQQIIRVLRSTHSVYIKQIDELIFESTHELMEA
FT                   MENIKFLHLLMQPCSQLDFSESPTFVSQLIPRTIHLIRFIWLNSEQYNRRDLITGIFRN
FT                   LSNQIIRFCTEKVNVEKILSGSSRFGIKICNMCIDCCLTYKGIYDIMSKTHAKINIRIG
FT                   WSLDNAMIFNHVDAFMERLNDVIDICESMMVFGRLDESESIPKPQFGGTSGTEFEATAD
FT                   NVENEFLVTLTALCTDSKEIILNVHKNEWYEEVIKYRRTVQSMEETVQRLMSNVFQHIC
FT                   NVEEALESLNVMIFYSYRSTIRKTFLRQVSSAWVFFSNEIDSSVHMLMDRSKMHESWVP
FT                   YYASRALGYRVHLDRLVWLCNRLNSSDWLPNVSEASVVLKKFESVRREFDKEVKKSFDE
FT                   WQKNCCSLLLNQKLDRYLLIRSKKKKGLIECNIDRTILTICEQAQHFERLGLGVPGMVR
FT                   KIYEKHETLRFVYNSVVQVCLNYNHILSALSEQERKLFRALIQACDRKIAPGVFKLTYG
FT                   GELSDAYIADCAKHTNKLQETMDIYKRAIQNIARFCEKICDTPMLKFNFSGAVTISIFE
FT                   NHLSSYLRRVSNILRGFYSTITDLIFAVFKEFQAVIEDMPIEWYGFVNVFDDMLATAFL
FT                   TSSKNSLNMLTNALHRDPDMAAAPILVMESDVRERCIVLTPDIDVIANLLSGYIDRIHN
FT                   ILEQFPRIGIKMKLPKEHQYESFSKAFLEDSESTQLICNIEAEINHEREEIDGYITFWN
FT                   SHRMLWETTELEFTKRVKATQMTADIFEASIEYYSAMADDISYVDAITHVYFILMNQNY
FT                   IKSSILDCIEKWQALNIKILLSHSFSLIRAIYRYMRKNERKMMMVPRTLKESLLAKQFF
FT                   ERIINEVPLKQAGFPPTLELFAILDKYQVEIPEEIRVKVIGLEAAWHHYLKRLGEADEM
FT                   LDNNREEFKKILVQQAEKFKIILKEFLDDFFLKLPTSANINPRIALKFLRIIALKIEDC
FT                   FTFEESLMRDLAVFNVNQPESIDLRKLDFEVRIVKNIWELIFEWQTNWEGWKKGYFWKM
FT                   NINEMEDTALNLYKEFTTLNKKFYDRHWEMLEATTKNVDSFRRTLPLITALKNPCMRER
FT                   HWNRVRDVIHVNFDENSKNFTLELIINLDFQAFSEDIQDISNPATMELQIENSIKNIAT
FT                   IGKNKVLKCFYHDGIYRIKNVEDCFQLLEEHMVQISAMKATRFVEPFITIVDYWEKTLS
FT                   YISETLEKGLTVQRQWLYLENIFQGDDIRKQLPEEAKRFATITEEFRTISSKMFQAKTA
FT                   VKATNLRPPPFLLNRFSRMDERLELIQRALEIYLEAKRQLFPRFYFISNDDLLEILGNS
FT                   KRPDLVQTHLKKLFDNLYKLELKRVGKTLSRWQASGMHSDDGEYVEFMMVIYIDGPSER
FT                   WLKQVEEYMLVVMKEMLKLTRGSLKKLVGNREKWISLWPGQMVLTTAQIQWTTECTRSL
FT                   IHCSMVDQKKPLRKLKKKQIKVLSKLSEMSRKDLTKTMRLKVNTLITLEIHGRDVIERM
FT                   YKSNCKDTGHFEWFSQLRFYWHRESELCVIRQTNTEHWGCFDEFNRINIEVLSVVAQQI
FT                   MSIMAALSTKALELMFEGQMIKLKHTVGLFITMNPGYAGRTELPDNLKSMFRPISMMVP
FT                   DNIIIAENLLFSDGFTNTRNLARKVYTLYELAKQQLSKQYHYDFGLRSMVALLRYAGRK
FT                   RRQLPNTTEEEIVYLAMKDMNVARLTANDLPLFNGIMSDIFPGVSLPTIDYSEFNIAIY
FT                   EEFREAGLQPITIAVKKVIELFETKNSRHSVMIIGDTGTAKSVTWRTLQNCFYRMNSQR
FT                   FSGWEAVTVYPVNPKALNLAELYGEYNLSTGEWLDGVLSSIMRIICGDEEPTQKWLLFD
FT                   GPVDAVWIENMNSVMDDNKLLTLVNSERITMPVQVSLLFEVGDLAVASPATVSRCGMVY
FT                   NDYNDWGWKPFVNSWLQRLRIKEFADFLRIHFDYMVPKILDFKRMRCKEPVRTNELNGV
FT                   VSLCKLLEIFGTKVNGINPINLELLEEMTRLWFMFCLVWSICSSVDEDSRQRLDSFIRE
FT                   LESCFPIKDTVFDYFVDPNERTFLPWDSKLLSSWKCDFESPFYKIIVPTGDTVRYEYVV
FT                   SKLLAEEYPVMLVGNVGTGKTSTAISVMEACDKNKFCILAVNMSAQTTAAGLQESIENR
FT                   TEKRTKTQFVPIGGKRMICFMDDFNMPAKDIYGSQPPLELIRQWIDYKYWFNRKTQQKI
FT                   YVQNTLLMAAMGPPGGGRQTISSRTQSRFVLLNLTFPSQETIIRIFGTMLCQKLESYPN
FT                   EVREMWLPITLCTINLYVSMISKMLPTPNKSHYLFNLRDISKVFQGLLRSEKELQNKKN
FT                   FFLRLWVHECFRVFSDRLVDDSDQFWFVNTINDILGKHFEVTFHSLCPSKVPPFFGDFA
FT                   HPQGFYEDLQVDFLRTFMKNQLEEYNNFPGMTRMNLVFFREAIEHIVRILRVISQPRGH
FT                   ILNMGIGGSGRQVLTKLAAFILEMAVFQIEVTKKYKTGDFREDLKNLYKVTGIKQRLTI
FT                   FIFSSDQIAEVSFLEITNNMLSTGEINLFKSDEFDELKPELERPAKKNGVLLTTEALYS
FT                   YFILNVRDFLHVALCFSPIGENFRSYIRQYPALLSSTTPNWFRFWPQEALLEVASHFLI
FT                   GFPLNVVVSGKEDEKHRESLVISTEAILQRDIAYVFSVIHSSVAKMSENMYAEVKRYNY
FT                   VTSPNYLQLVSGFKKLLEKKRLEVSTASNRLRNGLSKISETQEKVSLMSEELKASSEQV
FT                   KILARECEDFISMIEIQKSEATEQKEKVDAEAVLIRRDEIICLELAATARADLEVVMPM
FT                   IDAAVKALDALNKKDISEVKSYGRPPMKIEKVMEAVLILLGKEPTWENAKKVLSESTFL
FT                   NDLKNFDRDHISDKTLKRIAIYTKNPELEPDKVAVVSLACKSLMQWIMAIENYGKVYRI
FT                   VAPKQEKLDSAMKSLEEKQAALAAAKKKLEELQVVIEELYRQLEEKTNLLNELRAKEER
FT                   LRKQLERAIILVESLSGERERWIETVNQLDLSFEKLPGDCLLSVAFMSYLGAFDTKYRE
FT                   ELLVKWSLLIKDLLIPATLELKVTYFLVDAVSIREWNIQGLPADDLSTENGVIVTQGSR
FT                   WPLIIDPQMQANNWIKNMEERNQLMTLDFGMADYLRQLERALKEGLPVLLQNVGEYLDQ
FT                   AINPILRQSFTIQSGERLLKFNDKYISYNNSFRFYITTKISNPHYPPEISSKTTIVNFA
FT                   LKQDGLEAQLLGIIVRKEKPALEEQKDELVMTIARNKRTLIDLDNEILRLLNESRGSLL
FT                   DDDELFSTLQKSRQTSVLVKESLSIAEVTEVEIDAARQEYKPASERASILFFVLMDMSK
FT                   IDPMYVFSLAAYILLFTQSIERSPRNQLVHERIQNINEYHSYAVYRNTCRGLFERHKLL
FT                   FSIHMTAKILSNAGKLLEEEYDFILKGGIVLDKLGQAPNPAPWWISEQNWDNITELDKV
FT                   SGFHGIIDSFEQHYKAWNGWYATTFPEQEDLVGEWNDKLTDFQKICVLRSLRPDRISFC
FT                   LTQFIITKLGPRYVDPPVLDLKATFDESISQTPLIFVLSPGVDPAQSLISLSESVKMAQ
FT                   RMYSLSLGQGQAPIATKLIMDGIKDGNWVFLANCHLSLSWMPTLDKMIATMQSMKLHKK
FT                   FRLWLSSSPHPDFPISILQTSIKMTTEPPRGIKSNMKRLYNNINEANMENCSEPSKYKK
FT                   LLFALCFFHTVLLERKKFLELGWNVIYSFNDSDFEVSEILLLLYLNEYEDTPWGALKYL
FT                   IAGVNYGGHITDDWDRRLLITYINQFFCDQALQTRKFRLSTLPNYFIPDDGDVQSYLDQ
FT                   IQMFPNFDKPDAFGQHSNADIASLIGETRMLFEALLSMQVQTNSTSSNENGETKVFDLA
FT                   KEILMNTPDEINYEQTAKIIGINRTPLEVVLLQEIERYNKLLVDMSTQLRDLRRGIQGL
FT                   VVMSSDLEDIYLAVSEGRVPLQWLKAYNSLKPLAAWARDLIHRVGHFNSWAKTLRPPIL
FT                   FWLAAYTFPTGFVTAVLQTSARATKTPIDELSWDFYVFVEEDTAAARIIREGGGVYIRS
FT                   LFLEGGGWLRKNQCLQDPLPMELICPLPVIHFKPVENLKKRCRGVYQCPAYYYPVRSGS
FT                   FVIAVDLKSGNEKADYWIKRGTALLLSLAN"


Which gives:

>FBtr0300022__FBgn0001313
LQGLNAQLDQVDVQQIIRVLRSTHSVYIKQIDELIFESTHELMEAMENIKFLHLLMQPCSQLDFSESPTFVSQLIPRTIHLIRFIWLNSEQYNRRDLITGIFRNLSNQIIRFCTEKVNVEKILSGSSRFGIKICNMCIDCCLTYKGIYDIMSKTHAKINIRIGWSLDNAMIFNHVDAFMERLNDVIDICESMMVFGRLDESESIPKPQFGGTSGTEFEATADNVENEFLVTLTALCTDSKEIILNVHKNEWYEEVIKYRRTVQSMEETVQRLMSNVFQHICNVEEALESLNVMIFYSYRSTIRKTFLRQVSSAWVFFSNEIDSSVHMLMDRSKMHESWVPYYASRALGYRVHLDRLVWLCNRLNSSDWLPNVSEASVVLKKFESVRREFDKEVKKSFDEWQKNCCSLLLNQKLDRYLLIRSKKKKGLIECNIDRTILTICEQAQHFERLGLGVPGMVRKIYEKHETLRFVYNSVVQVCLNYNHILSALSEQERKLFRALIQACDRKIAPGVFKLTYGGELSDAYIADCAKHTNKLQETMDIYKRAIQNIARFCEKICDTPMLKFNFSGAVTISIFENHLSSYLRRVSNILRGFYSTITDLIFAVFKEFQAVIEDMPIEWYGFVNVFDDMLATAFLTSSKNSLNMLTNALHRDPDMAAAPILVMESDVRERCIVLTPDIDVIANLLSGYIDRIHNILEQFPRIGIKMKLPKEHQYESFSKAFLEDSESTQLICNIEAEINHEREEIDGYITFWNSHRMLWETTELEFTKRVKATQMTADIFEASIEYYSAMADDISYVDAITHVYFILMNQNYIKSSILDCIEKWQALNIKILLSHSFSLIRAIYRYMRKNERKMMMVPRTLKESLLAKQFFERIINEVPLKQAGFPPTLELFAILDKYQVEIPEEIRVKVIGLEAAWHHYLKRLGEADEMLDNNREEFKKILVQQAEKFKIILKEFLDDFFLKLPTSANINPRIALKFLRIIALKIEDCFTFEESLMRDLAVFNVNQPESIDLRKLDFEVRIVKNIWELIFEWQTNWEGWKKGYFWKMNINEMEDTALNLYKEFTTLNKKFYDRHWEMLEATTKNVDSFRRTLPLITALKNPCMRERHWNRVRDVIHVNFDENSKNFTLELIINLDFQAFSEDIQDISNPATMELQIENSIKNIATIGKNKVLKWLLS*WYL*NKKR*GLFSAP*RTHGTNIGYESNSFC*AIYNHC*LLGKNTIVHK*DSGKGFNCSAPMALPRKYIPRRRHKKTTSRRGKTFCNNN*RVSNNIKQNVPGKDSRKSH*LTPSTVFIKPF*SNGRKTGTYSTCLRNLS*G*TTTFSKILFYF***PFRNFRKF*AAGLSSNPP*EVI**FIQA*AQARWENFKSVASFWNAFRRWRIC*VHDGYLYRWTIGALAKTSRRVHACCYERDA*TYSRIS*KTCREQRKMDFALARTNGANHSSDPMDN*VYA*PNSL*YG*SKKTPTQAKEKANKSSF*IIRNESKRPNKNNAP*SKYPHNA*NTWS*CYRKNV*IKL*GYGPF*MVFTTQILLAP*IGTMCNKADKHRALGMFR*V*SNKY*SALSRGTTNNVYNGSAFYKGVGAYVRGSNDKVKAHSWSIHYYESWICRTD*TS**FKVNV*THINDGT**YNYCGKFTFFGWFY*YKKLGPKGIYVV*AG*AATFKAISL*FWSSLNGGFASLRGSKKTSITKYY*RRNCLFGNERYECCEINS**FTPF*WYYV*HISWC*LTNYRLQ*I*YCDL*RI*GGGSPTNYHSRKKSN*AF*NKKL*ALSYDHRGYGNSQISYMENITKLFLSNE*SKIFRMGSSHRLPSKSKSIESSRALWGIQLVDW*MA*RSFKFYYANNLWR*RADSEMVVV*WTCGCSMD*KHELSNG***TSYACK*RTYNHASSSIAIV*SRRPGCCFTSNCFPMWNGL*RLQ*LGMETFCKLMVTAPKN*GVR*FFTNTF*LHGAKNTGF*TNEVQRACKDK*VKWSCVAL*IAR
NIWHKGKWDKSH*FRTS*GDD*IVVYVLFSMVNLFKCG*RQSPKTR*LYTGTRKLLSNKRYCV*LFCGSQ*TNLFTMG*QAVEQLEMRFRISFLQDYCSYWRHCSL*ICCFKTSC*RISCDACWKCWYRKNVNGYKCNGGL**K*ILHFSCEHVSTDNSSRVTRINRKSD*ETYKNAICTYRWQTDDMFYGRL*YACKRHLWISATFGAYSAMDRLQVLV**KNSTKNICAKHIINGCDGTAWRGQTNNFQSNSKSVCFIKLNFSFTRNNYSHIWNDALSKTRVIPK*SS*DVATYNPLYH*LICIND**NVTDAK*ISLLI*S*RYIQSLSRTIKK*KRTSKQKKFFFTALGS*VF*SVQRPIG*RLRSVLVCKYY**YTW*TF*SYFSQSLSFKGSTIFR*LCSPSRVLRRSTGRFLKNIYEKST*GI*QLSRND*NEPSIF*RSYRTYCSNPESYFPTAWTHFKYGDRWIRPTSINQVSCVYFGNGSFPN*GYQKIQNRRLSRRPKKLIQSNWN*TETNDFYI*QRPNSRSLISRNNKQYAKYWRNKLI*IR*IRRAKA*T*TPGKKKWGSANN*STIFLFYFKCARLPACCALF*PNRRKFSKLYKTISGFVKFNNSKLV*ILATRSPFGSSFAFSNRISIKRSGFWKRGRKTSRKFGYKHRSHSSTRYCLCIFSNSLKCC*NVGKYVCRS*AL*LCNLTKLFAACKWF*KTIRKEKIRSINCFQ*ITQWAFKNF*NSGKSILNVRRA*S*L*TS*NTC*RM*RFYIHD*NSKE*SNGTKGKSGCRSRAY*KG*NNLS*ISSYSSCGLGGGNAYDRCCCKSIRCIE*ERHFRS*IIWTAANENRKGYGSCIDLTWKRTNMGKC*KSFK*INIFERPKKL**RSYFR*NS*TYCNLYKKS*VRAR*SGCCIACVQIIDAMDNGHRKLRKSLPNSRSKAGKIR*CNEVT*RKASCFSCGKKKTRRASGCH*RTLPAA*RKN*PS**IACQGRTT*KATGACHYFGRIAFWRERKVD*NG*SVGLIL*KTSR*LLAFCCVYVVLRGF*HQIPRRITCKMVFIN*RSFNTSNFGA*GYVFSSRCCFDSRMEYSRSTC**FKY*KRSNSYSR*SLASYY*PSNAS**LDKKYGRA*SINDTRFRYGRLLTSARTSSKRRFACIVAKRGGILRSSY*SNFAAELYHSKWRKVIKI**QVYFIQ*FVQILHNDKNIKSTLPTGNLIKNYYCKFCTKARWA*SPTTRNYCSKRKTRPRRTKRRTGNDNSSKQTDINRSR**DSTAT**KSRFLIR*R*VIFNFTKIPSDISAS*GVA*HCRGN*SRN*CGPTRIQTSIGTRIHFILCFNGYV*N*SNVCFFSGSVYIIIHTVY*AKSS*SASP*KNSKY**ISFLCGLPKYLSWAFRAT*ATIFNSYDSKDSFKRWKAFGRRV*FYSERRYSIR*TGTSAQPGTMVDK*AKLG*YNRIR*SFWISWDNRFF*ATLQGLEWLVCHDLPRTRRSRWRME**TYRFSKNLCFTFTSTG*NFFLFDTIYYYQTWASIC*SASS*SQGNF**IDFTDSPHIRIITRCRSSPISHITIRIS*NGTTNVLT*LGSRTSTYCNKAYNGWHQGW*LGIFSKLSFVS*LDAYS*QDDSHYAVHETT*KISTVAKLKPSSGLSNIYFANQY*DDN*TSSWNQIKYETSI*QHK*G*YGKL**TQQV*EVIIRFVLLSYSPTRTKKIFRTWLECYLQL*RF*F*SFRNTTIIVS**I*RHSLGSFKVSHSRSKLRRTHYRRLGSPTINNLYKPIFL*PSIAD*KV*IINPSKLFYSR*RRCAIIFRPNTNVSQF**A*CFWTTFKCRYSVINRRNKNAF*GSAFYASPD**HK***KR*DKSI*SR*RNFNEYTG*DKL*TDGKNYWNQSNSLRSCLTSRN*AL**TSR*HVHSIT*LKTWNTGTCCNEFGLRGYLSSCL*RKGAITMVKSI*FIETISGMG*RLNTSCRTF**LGENTPPSNIIL
ACSLHVSNWICYSSTTNFSSSYQNTN**TLLGFLCFC*RRYCRSSYNKGRRRRLHSKFVFGGWRMVEEKPMPSGSTTDGTNLSITSNTL*ASRKPKKTMSWCLPVSRILLSR*VRIICNSRGLKVW**KG*LLDKARYCTFIKFSKL


The start matches the annotated /translation, but then the output
diverges and fills up with stop codons.
The problem seems to be the same as in this old thread:
http://bioperl.org/pipermail/bioperl-l/2004-August/016735.html

I must be missing something...

Thanks for your help!

--
Tristan



