[Bioperl-l] spliced

Fields, Christopher J cjfields at illinois.edu
Fri Jul 12 17:59:11 UTC 2013


According to FlyBase that particular gene is listed as incomplete.  Do you have the original EMBL nuc accession # so we can test this?

chris

On Jul 12, 2013, at 10:16 AM, Tristan Lefebure <tristan.lefebure at gmail.com> wrote:

> Dear bioperlers,
> 
> I am trying to extract CDS sequences from ensembl EMBL files (*.dat),
> but I get STOP codons where I suppose I should not get some. I am
> using the usual way of extracting coding seq, splice the seq, and then
> translate them:
> 
> $feat_object->spliced_seq->translate(-codontable_id => $genetcode)->seq
> 
> An example with this CDS:
> 
> FT   CDS             join(81211..83052,122064..122745,122800..123184,
> FT                   135429..135872,138620..139940,140189..141784,
> FT                   141841..145922,145985..146089,146143..147085,
> FT                   147145..147638,147696..148243)
> FT                   /gene="FBgn0001313"
> FT                   /protein_id="FBpp0289299"
> FT                   /note="transcript_id=FBtr0300022"
> FT                   /db_xref="flybase_transcript_id:FBtr0300022"
> FT                   /db_xref="RefSeq_peptide:NP_001015505"
> FT                   /db_xref="RefSeq_mRNA:NM_001015505"
> FT                   /db_xref="Uniprot/SPTREMBL:Q281X0"
> FT                   /db_xref="Uniprot/SPTREMBL:Q5LJP0"
> FT                   /db_xref="FlyBaseCGID_transcript:CG17866-RB"
> FT                   /db_xref="FlyBaseCGID_translation:CG17866-PB"
> FT                   /db_xref="FlyBaseName_transcript:FBtr0300022"
> FT                   /db_xref="FlyBaseName_translation:FBpp0289299"
> FT                   /db_xref="GO:0003774"
> FT                   /db_xref="GO:0003777"
> FT                   /db_xref="GO:0003777"
> FT                   /db_xref="GO:0005524"
> FT                   /db_xref="GO:0005524"
> FT                   /db_xref="GO:0005858"
> FT                   /db_xref="GO:0005875"
> FT                   /db_xref="GO:0007018"
> FT                   /db_xref="GO:0007018"
> FT                   /db_xref="GO:0016887"
> FT                   /db_xref="GO:0030286"
> FT                   /db_xref="GO:0030286"
> FT                   /db_xref="GO:0042623"
> FT                   /db_xref="flybase_translation_id:FBpp0289299"
> FT                   /db_xref="goslim_goa:GO:0003674"
> FT                   /db_xref="goslim_goa:GO:0005575"
> FT                   /db_xref="goslim_goa:GO:0005622"
> FT                   /db_xref="goslim_goa:GO:0005623"
> FT                   /db_xref="goslim_goa:GO:0005856"
> FT                   /db_xref="goslim_goa:GO:0008150"
> FT                   /db_xref="goslim_goa:GO:0016887"
> FT                   /db_xref="goslim_goa:GO:0043226"
> FT                   /db_xref="goslim_goa:GO:0043234"
> FT                   /db_xref="UniParc:UPI00018FBBAC"
> FT                   /translation="LQGLNAQLDQVDVQQIIRVLRSTHSVYIKQIDELIFESTHELMEA
> FT                   MENIKFLHLLMQPCSQLDFSESPTFVSQLIPRTIHLIRFIWLNSEQYNRRDLITGIFRN
> FT                   LSNQIIRFCTEKVNVEKILSGSSRFGIKICNMCIDCCLTYKGIYDIMSKTHAKINIRIG
> FT                   WSLDNAMIFNHVDAFMERLNDVIDICESMMVFGRLDESESIPKPQFGGTSGTEFEATAD
> FT                   NVENEFLVTLTALCTDSKEIILNVHKNEWYEEVIKYRRTVQSMEETVQRLMSNVFQHIC
> FT                   NVEEALESLNVMIFYSYRSTIRKTFLRQVSSAWVFFSNEIDSSVHMLMDRSKMHESWVP
> FT                   YYASRALGYRVHLDRLVWLCNRLNSSDWLPNVSEASVVLKKFESVRREFDKEVKKSFDE
> FT                   WQKNCCSLLLNQKLDRYLLIRSKKKKGLIECNIDRTILTICEQAQHFERLGLGVPGMVR
> FT                   KIYEKHETLRFVYNSVVQVCLNYNHILSALSEQERKLFRALIQACDRKIAPGVFKLTYG
> FT                   GELSDAYIADCAKHTNKLQETMDIYKRAIQNIARFCEKICDTPMLKFNFSGAVTISIFE
> FT                   NHLSSYLRRVSNILRGFYSTITDLIFAVFKEFQAVIEDMPIEWYGFVNVFDDMLATAFL
> FT                   TSSKNSLNMLTNALHRDPDMAAAPILVMESDVRERCIVLTPDIDVIANLLSGYIDRIHN
> FT                   ILEQFPRIGIKMKLPKEHQYESFSKAFLEDSESTQLICNIEAEINHEREEIDGYITFWN
> FT                   SHRMLWETTELEFTKRVKATQMTADIFEASIEYYSAMADDISYVDAITHVYFILMNQNY
> FT                   IKSSILDCIEKWQALNIKILLSHSFSLIRAIYRYMRKNERKMMMVPRTLKESLLAKQFF
> FT                   ERIINEVPLKQAGFPPTLELFAILDKYQVEIPEEIRVKVIGLEAAWHHYLKRLGEADEM
> FT                   LDNNREEFKKILVQQAEKFKIILKEFLDDFFLKLPTSANINPRIALKFLRIIALKIEDC
> FT                   FTFEESLMRDLAVFNVNQPESIDLRKLDFEVRIVKNIWELIFEWQTNWEGWKKGYFWKM
> FT                   NINEMEDTALNLYKEFTTLNKKFYDRHWEMLEATTKNVDSFRRTLPLITALKNPCMRER
> FT                   HWNRVRDVIHVNFDENSKNFTLELIINLDFQAFSEDIQDISNPATMELQIENSIKNIAT
> FT                   IGKNKVLKCFYHDGIYRIKNVEDCFQLLEEHMVQISAMKATRFVEPFITIVDYWEKTLS
> FT                   YISETLEKGLTVQRQWLYLENIFQGDDIRKQLPEEAKRFATITEEFRTISSKMFQAKTA
> FT                   VKATNLRPPPFLLNRFSRMDERLELIQRALEIYLEAKRQLFPRFYFISNDDLLEILGNS
> FT                   KRPDLVQTHLKKLFDNLYKLELKRVGKTLSRWQASGMHSDDGEYVEFMMVIYIDGPSER
> FT                   WLKQVEEYMLVVMKEMLKLTRGSLKKLVGNREKWISLWPGQMVLTTAQIQWTTECTRSL
> FT                   IHCSMVDQKKPLRKLKKKQIKVLSKLSEMSRKDLTKTMRLKVNTLITLEIHGRDVIERM
> FT                   YKSNCKDTGHFEWFSQLRFYWHRESELCVIRQTNTEHWGCFDEFNRINIEVLSVVAQQI
> FT                   MSIMAALSTKALELMFEGQMIKLKHTVGLFITMNPGYAGRTELPDNLKSMFRPISMMVP
> FT                   DNIIIAENLLFSDGFTNTRNLARKVYTLYELAKQQLSKQYHYDFGLRSMVALLRYAGRK
> FT                   RRQLPNTTEEEIVYLAMKDMNVARLTANDLPLFNGIMSDIFPGVSLPTIDYSEFNIAIY
> FT                   EEFREAGLQPITIAVKKVIELFETKNSRHSVMIIGDTGTAKSVTWRTLQNCFYRMNSQR
> FT                   FSGWEAVTVYPVNPKALNLAELYGEYNLSTGEWLDGVLSSIMRIICGDEEPTQKWLLFD
> FT                   GPVDAVWIENMNSVMDDNKLLTLVNSERITMPVQVSLLFEVGDLAVASPATVSRCGMVY
> FT                   NDYNDWGWKPFVNSWLQRLRIKEFADFLRIHFDYMVPKILDFKRMRCKEPVRTNELNGV
> FT                   VSLCKLLEIFGTKVNGINPINLELLEEMTRLWFMFCLVWSICSSVDEDSRQRLDSFIRE
> FT                   LESCFPIKDTVFDYFVDPNERTFLPWDSKLLSSWKCDFESPFYKIIVPTGDTVRYEYVV
> FT                   SKLLAEEYPVMLVGNVGTGKTSTAISVMEACDKNKFCILAVNMSAQTTAAGLQESIENR
> FT                   TEKRTKTQFVPIGGKRMICFMDDFNMPAKDIYGSQPPLELIRQWIDYKYWFNRKTQQKI
> FT                   YVQNTLLMAAMGPPGGGRQTISSRTQSRFVLLNLTFPSQETIIRIFGTMLCQKLESYPN
> FT                   EVREMWLPITLCTINLYVSMISKMLPTPNKSHYLFNLRDISKVFQGLLRSEKELQNKKN
> FT                   FFLRLWVHECFRVFSDRLVDDSDQFWFVNTINDILGKHFEVTFHSLCPSKVPPFFGDFA
> FT                   HPQGFYEDLQVDFLRTFMKNQLEEYNNFPGMTRMNLVFFREAIEHIVRILRVISQPRGH
> FT                   ILNMGIGGSGRQVLTKLAAFILEMAVFQIEVTKKYKTGDFREDLKNLYKVTGIKQRLTI
> FT                   FIFSSDQIAEVSFLEITNNMLSTGEINLFKSDEFDELKPELERPAKKNGVLLTTEALYS
> FT                   YFILNVRDFLHVALCFSPIGENFRSYIRQYPALLSSTTPNWFRFWPQEALLEVASHFLI
> FT                   GFPLNVVVSGKEDEKHRESLVISTEAILQRDIAYVFSVIHSSVAKMSENMYAEVKRYNY
> FT                   VTSPNYLQLVSGFKKLLEKKRLEVSTASNRLRNGLSKISETQEKVSLMSEELKASSEQV
> FT                   KILARECEDFISMIEIQKSEATEQKEKVDAEAVLIRRDEIICLELAATARADLEVVMPM
> FT                   IDAAVKALDALNKKDISEVKSYGRPPMKIEKVMEAVLILLGKEPTWENAKKVLSESTFL
> FT                   NDLKNFDRDHISDKTLKRIAIYTKNPELEPDKVAVVSLACKSLMQWIMAIENYGKVYRI
> FT                   VAPKQEKLDSAMKSLEEKQAALAAAKKKLEELQVVIEELYRQLEEKTNLLNELRAKEER
> FT                   LRKQLERAIILVESLSGERERWIETVNQLDLSFEKLPGDCLLSVAFMSYLGAFDTKYRE
> FT                   ELLVKWSLLIKDLLIPATLELKVTYFLVDAVSIREWNIQGLPADDLSTENGVIVTQGSR
> FT                   WPLIIDPQMQANNWIKNMEERNQLMTLDFGMADYLRQLERALKEGLPVLLQNVGEYLDQ
> FT                   AINPILRQSFTIQSGERLLKFNDKYISYNNSFRFYITTKISNPHYPPEISSKTTIVNFA
> FT                   LKQDGLEAQLLGIIVRKEKPALEEQKDELVMTIARNKRTLIDLDNEILRLLNESRGSLL
> FT                   DDDELFSTLQKSRQTSVLVKESLSIAEVTEVEIDAARQEYKPASERASILFFVLMDMSK
> FT                   IDPMYVFSLAAYILLFTQSIERSPRNQLVHERIQNINEYHSYAVYRNTCRGLFERHKLL
> FT                   FSIHMTAKILSNAGKLLEEEYDFILKGGIVLDKLGQAPNPAPWWISEQNWDNITELDKV
> FT                   SGFHGIIDSFEQHYKAWNGWYATTFPEQEDLVGEWNDKLTDFQKICVLRSLRPDRISFC
> FT                   LTQFIITKLGPRYVDPPVLDLKATFDESISQTPLIFVLSPGVDPAQSLISLSESVKMAQ
> FT                   RMYSLSLGQGQAPIATKLIMDGIKDGNWVFLANCHLSLSWMPTLDKMIATMQSMKLHKK
> FT                   FRLWLSSSPHPDFPISILQTSIKMTTEPPRGIKSNMKRLYNNINEANMENCSEPSKYKK
> FT                   LLFALCFFHTVLLERKKFLELGWNVIYSFNDSDFEVSEILLLLYLNEYEDTPWGALKYL
> FT                   IAGVNYGGHITDDWDRRLLITYINQFFCDQALQTRKFRLSTLPNYFIPDDGDVQSYLDQ
> FT                   IQMFPNFDKPDAFGQHSNADIASLIGETRMLFEALLSMQVQTNSTSSNENGETKVFDLA
> FT                   KEILMNTPDEINYEQTAKIIGINRTPLEVVLLQEIERYNKLLVDMSTQLRDLRRGIQGL
> FT                   VVMSSDLEDIYLAVSEGRVPLQWLKAYNSLKPLAAWARDLIHRVGHFNSWAKTLRPPIL
> FT                   FWLAAYTFPTGFVTAVLQTSARATKTPIDELSWDFYVFVEEDTAAARIIREGGGVYIRS
> FT                   LFLEGGGWLRKNQCLQDPLPMELICPLPVIHFKPVENLKKRCRGVYQCPAYYYPVRSGS
> FT                   FVIAVDLKSGNEKADYWIKRGTALLLSLAN"
> 
> 
> Which gives:
> 
>> FBtr0300022__FBgn0001313
> LQGLNAQLDQVDVQQIIRVLRSTHSVYIKQIDELIFESTHELMEAMENIKFLHLLMQPCSQLDFSESPTFVSQLIPRTIHLIRFIWLNSEQYNRRDLITGIFRNLSNQIIRFCTEKVNVEKILSGSSRFGIKICNMCIDCCLTYKGIYDIMSKTHAKINIRIGWSLDNAMIFNHVDAFMERLNDVIDICESMMVFGRLDESESIPKPQFGGTSGTEFEATADNVENEFLVTLTALCTDSKEIILNVHKNEWYEEVIKYRRTVQSMEETVQRLMSNVFQHICNVEEALESLNVMIFYSYRSTIRKTFLRQVSSAWVFFSNEIDSSVHMLMDRSKMHESWVPYYASRALGYRVHLDRLVWLCNRLNSSDWLPNVSEASVVLKKFESVRREFDKEVKKSFDEWQKNCCSLLLNQKLDRYLLIRSKKKKGLIECNIDRTILTICEQAQHFERLGLGVPGMVRKIYEKHETLRFVYNSVVQVCLNYNHILSALSEQERKLFRALIQACDRKIAPGVFKLTYGGELSDAYIADCAKHTNKLQETMDIYKRAIQNIARFCEKICDTPMLKFNFSGAVTISIFENHLSSYLRRVSNILRGFYSTITDLIFAVFKEFQAVIEDMPIEWYGFVNVFDDMLATAFLTSSKNSLNMLTNALHRDPDMAAAPILVMESDVRERCIVLTPDIDVIANLLSGYIDRIHNILEQFPRIGIKMKLPKEHQYESFSKAFLEDSESTQLICNIEAEINHEREEIDGYITFWNSHRMLWETTELEFTKRVKATQMTADIFEASIEYYSAMADDISYVDAITHVYFILMNQNYIKSSILDCIEKWQALNIKILLSHSFSLIRAIYRYMRKNERKMMMVPRTLKESLLAKQFFERIINEVPLKQAGFPPTLELFAILDKYQVEIPEEIRVKVIGLEAAWHHYLKRLGEADEMLDNNREEFKKILVQQAEKFKIILKEFLDDFFLKLPTSANINPRIALKFLRIIALKIEDC!
> FTFEESLMRDLAVFNVNQPESIDLRKLDFEVRIVKNIWELIFEWQTNWEGWKKGYFWKMNINEMEDTALNLYKEFTTLNKKFYDRHWEMLEATTKNVDSFRRTLPLITALKNPCMRERHWNRVRDVIHVNFDENSKNFTLELIINLDFQAFSEDIQDISNPATMELQIENSIKNIATIGKNKVLKWLLS*WYL*NKKR*GLFSAP*RTHGTNIGYESNSFC*AIYNHC*LLGKNTIVHK*DSGKGFNCSAPMALPRKYIPRRRHKKTTSRRGKTFCNNN*RVSNNIKQNVPGKDSRKSH*LTPSTVFIKPF*SNGRKTGTYSTCLRNLS*G*TTTFSKILFYF***PFRNFRKF*AAGLSSNPP*EVI**FIQA*AQARWENFKSVASFWNAFRRWRIC*VHDGYLYRWTIGALAKTSRRVHACCYERDA*TYSRIS*KTCREQRKMDFALARTNGANHSSDPMDN*VYA*PNSL*YG*SKKTPTQAKEKANKSSF*IIRNESKRPNKNNAP*SKYPHNA*NTWS*CYRKNV*IKL*GYGPF*MVFTTQILLAP*IGTMCNKADKHRALGMFR*V*SNKY*SALSRGTTNNVYNGSAFYKGVGAYVRGSNDKVKAHSWSIHYYESWICRTD*TS**FKVNV*THINDGT**YNYCGKFTFFGWFY*YKKLGPKGIYVV*AG*AATFKAISL*FWSSLNGGFASLRGSKKTSITKYY*RRNCLFGNERYECCEINS**FTPF*WYYV*HISWC*LTNYRLQ*I*YCDL*RI*GGGSPTNYHSRKKSN*AF*NKKL*ALSYDHRGYGNSQISYMENITKLFLSNE*SKIFRMGSSHRLPSKSKSIESSRALWGIQLVDW*MA*RSFKFYYANNLWR*RADSEMVVV*WTCGCSMD*KHELSNG***TSYACK*RTYNHASSSIAIV*SRRPGCCFTSNCFPMWNGL*RLQ*LGMETFCKLMVTAPKN*GVR*FFTNTF*LHGAKNTGF*T!
> NEVQRACKDK*VKWSCVAL*IARNIWHKGKWDKSH*FRTS*GDD*IVVYVLFSMVNLFKCG*RQSPKTR*
> LYTGTRKLLSNKRYCV*LFCGSQ*TNLFTMG*QAVEQLEMRFRISFLQDYCSYWRHCSL*ICCFKTSC*RISCDACWKCWYRKNVNGYKCNGGL**K*ILHFSCEHVSTDNSSRVTRINRKSD*ETYKNAICTYRWQTDDMFYGRL*YACKRHLWISATFGAYSAMDRLQVLV**KNSTKNICAKHIINGCDGTAWRGQTNNFQSNSKSVCFIKLNFSFTRNNYSHIWNDALSKTRVIPK*SS*DVATYNPLYH*LICIND**NVTDAK*ISLLI*S*RYIQSLSRTIKK*KRTSKQKKFFFTALGS*VF*SVQRPIG*RLRSVLVCKYY**YTW*TF*SYFSQSLSFKGSTIFR*LCSPSRVLRRSTGRFLKNIYEKST*GI*QLSRND*NEPSIF*RSYRTYCSNPESYFPTAWTHFKYGDRWIRPTSINQVSCVYFGNGSFPN*GYQKIQNRRLSRRPKKLIQSNWN*TETNDFYI*QRPNSRSLISRNNKQYAKYWRNKLI*IR*IRRAKA*T*TPGKKKWGSANN*STIFLFYFKCARLPACCALF*PNRRKFSKLYKTISGFVKFNNSKLV*ILATRSPFGSSFAFSNRISIKRSGFWKRGRKTSRKFGYKHRSHSSTRYCLCIFSNSLKCC*NVGKYVCRS*AL*LCNLTKLFAACKWF*KTIRKEKIRSINCFQ*ITQWAFKNF*NSGKSILNVRRA*S*L*TS*NTC*RM*RFYIHD*NSKE*SNGTKGKSGCRSRAY*KG*NNLS*ISSYSSCGLGGGNAYDRCCCKSIRCIE*ERHFRS*IIWTAANENRKGYGSCIDLTWKRTNMGKC*KSFK*INIFERPKKL**RSYFR*NS*TYCNLYKKS*VRAR*SGCCIACVQIIDAMDNGHRKLRKSLPNSRSKAGKIR*CNEVT*RKASCFSCGKKKTRRASGCH*RTLPAA*RKN*PS**IACQGRTT*KATGACHYFGRIAFWRERKVD*NG*SVGLIL*KTSR*LL!
> AFCCVYVVLRGF*HQIPRRITCKMVFIN*RSFNTSNFGA*GYVFSSRCCFDSRMEYSRSTC**FKY*KRSNSYSR*SLASYY*PSNAS**LDKKYGRA*SINDTRFRYGRLLTSARTSSKRRFACIVAKRGGILRSSY*SNFAAELYHSKWRKVIKI**QVYFIQ*FVQILHNDKNIKSTLPTGNLIKNYYCKFCTKARWA*SPTTRNYCSKRKTRPRRTKRRTGNDNSSKQTDINRSR**DSTAT**KSRFLIR*R*VIFNFTKIPSDISAS*GVA*HCRGN*SRN*CGPTRIQTSIGTRIHFILCFNGYV*N*SNVCFFSGSVYIIIHTVY*AKSS*SASP*KNSKY**ISFLCGLPKYLSWAFRAT*ATIFNSYDSKDSFKRWKAFGRRV*FYSERRYSIR*TGTSAQPGTMVDK*AKLG*YNRIR*SFWISWDNRFF*ATLQGLEWLVCHDLPRTRRSRWRME**TYRFSKNLCFTFTSTG*NFFLFDTIYYYQTWASIC*SASS*SQGNF**IDFTDSPHIRIITRCRSSPISHITIRIS*NGTTNVLT*LGSRTSTYCNKAYNGWHQGW*LGIFSKLSFVS*LDAYS*QDDSHYAVHETT*KISTVAKLKPSSGLSNIYFANQY*DDN*TSSWNQIKYETSI*QHK*G*YGKL**TQQV*EVIIRFVLLSYSPTRTKKIFRTWLECYLQL*RF*F*SFRNTTIIVS**I*RHSLGSFKVSHSRSKLRRTHYRRLGSPTINNLYKPIFL*PSIAD*KV*IINPSKLFYSR*RRCAIIFRPNTNVSQF**A*CFWTTFKCRYSVINRRNKNAF*GSAFYASPD**HK***KR*DKSI*SR*RNFNEYTG*DKL*TDGKNYWNQSNSLRSCLTSRN*AL**TSR*HVHSIT*LKTWNTGTCCNEFGLRGYLSSCL*RKGAITMVKSI*FIETISGMG*RLNTSCRTF**LGENTPPSNIILACSLHVSNWICYSSTTNFSSSYQN!
> TN**TLLGFLCFC*RRYCRSSYNKGRRRRLHSKFVFGGWRMVEEKPMPSGSTTDGTNLSITSNTL*ASRK
> PKKTMSWCLPVSRILLSR*VRIICNSRGLKVW**KG*LLDKARYCTFIKFSKL
> 
> 
> The start is good, but then it gets bad...
> The problem seems the same as this old one:
> http://bioperl.org/pipermail/bioperl-l/2004-August/016735.html
> 
> I must be missing something....
> 
> Thanks for your help!
> 
> --
> Tristan
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l





More information about the Bioperl-l mailing list