[Bioperl-l] spliced
Fields, Christopher J
cjfields at illinois.edu
Fri Jul 12 17:59:11 UTC 2013
FlyBase lists that particular gene as incomplete. Do you have the original EMBL nucleotide accession number so we can test this?
chris
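
For anyone reproducing this, here is a minimal sketch of the extraction described in the quoted message, extended to compare each translated CDS with its own /translation qualifier and report where the two first diverge. It is an illustration, not a diagnosis. The names first_mismatch and check_cds are hypothetical helpers, the input file and codon table come from the command line, and the use of /codon_start is an assumption (the record quoted below does not show one, in which case frame 0 is used).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return the 0-based index of the first position where two protein strings
# differ, or -1 if they are identical. A length difference counts as a
# mismatch at the end of the shorter string.
sub first_mismatch {
    my ($x, $y) = @_;
    my $n = length($x) < length($y) ? length($x) : length($y);
    for my $i (0 .. $n - 1) {
        return $i if substr($x, $i, 1) ne substr($y, $i, 1);
    }
    return length($x) == length($y) ? -1 : $n;
}

# Compare every CDS translation in an EMBL file with its /translation tag.
# Hypothetical helper; $table is the NCBI codon table id (1 = standard).
sub check_cds {
    my ($file, $table) = @_;
    require Bio::SeqIO;    # BioPerl, loaded only when a file is given
    my $in = Bio::SeqIO->new(-file => $file, -format => 'embl');
    while (my $seq = $in->next_seq) {
        for my $feat (grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures) {
            next unless $feat->has_tag('translation');
            my ($expected) = $feat->get_tag_values('translation');
            $expected =~ s/\s+//g;    # remove any whitespace from line wrapping

            # Assumption: /codon_start (1-based) may be present on partial
            # genes; translate() expects a 0-based -frame.
            my ($start) = $feat->has_tag('codon_start')
                ? $feat->get_tag_values('codon_start') : (1);

            my $got = $feat->spliced_seq->translate(
                -codontable_id => $table,
                -frame         => $start - 1,
            )->seq;
            $got =~ s/\*$//;          # drop a terminal stop, if any

            my ($id) = $feat->has_tag('protein_id')
                ? $feat->get_tag_values('protein_id') : ('unknown');
            my $pos = first_mismatch($got, $expected);
            print "$id: ",
                ($pos < 0 ? "matches" : "diverges at residue " . ($pos + 1)),
                "\n";
        }
    }
}

# Usage: perl check_cds.pl file.dat [codon_table_id]
check_cds($ARGV[0], $ARGV[1] // 1) if @ARGV;
```

Knowing the residue at which the two sequences diverge makes it possible to map the problem back to a specific exon boundary in the join(...) location.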
On Jul 12, 2013, at 10:16 AM, Tristan Lefebure <tristan.lefebure at gmail.com> wrote:
> Dear bioperlers,
>
> I am trying to extract CDS sequences from Ensembl EMBL files (*.dat),
> but I get stop codons where I do not expect any. I am using the usual
> approach for extracting coding sequences: splice the feature, then
> translate it:
>
> $feat_object->spliced_seq->translate(-codontable_id => $genetcode)->seq
>
> An example with this CDS:
>
> FT CDS join(81211..83052,122064..122745,122800..123184,
> FT 135429..135872,138620..139940,140189..141784,
> FT 141841..145922,145985..146089,146143..147085,
> FT 147145..147638,147696..148243)
> FT /gene="FBgn0001313"
> FT /protein_id="FBpp0289299"
> FT /note="transcript_id=FBtr0300022"
> FT /db_xref="flybase_transcript_id:FBtr0300022"
> FT /db_xref="RefSeq_peptide:NP_001015505"
> FT /db_xref="RefSeq_mRNA:NM_001015505"
> FT /db_xref="Uniprot/SPTREMBL:Q281X0"
> FT /db_xref="Uniprot/SPTREMBL:Q5LJP0"
> FT /db_xref="FlyBaseCGID_transcript:CG17866-RB"
> FT /db_xref="FlyBaseCGID_translation:CG17866-PB"
> FT /db_xref="FlyBaseName_transcript:FBtr0300022"
> FT /db_xref="FlyBaseName_translation:FBpp0289299"
> FT /db_xref="GO:0003774"
> FT /db_xref="GO:0003777"
> FT /db_xref="GO:0003777"
> FT /db_xref="GO:0005524"
> FT /db_xref="GO:0005524"
> FT /db_xref="GO:0005858"
> FT /db_xref="GO:0005875"
> FT /db_xref="GO:0007018"
> FT /db_xref="GO:0007018"
> FT /db_xref="GO:0016887"
> FT /db_xref="GO:0030286"
> FT /db_xref="GO:0030286"
> FT /db_xref="GO:0042623"
> FT /db_xref="flybase_translation_id:FBpp0289299"
> FT /db_xref="goslim_goa:GO:0003674"
> FT /db_xref="goslim_goa:GO:0005575"
> FT /db_xref="goslim_goa:GO:0005622"
> FT /db_xref="goslim_goa:GO:0005623"
> FT /db_xref="goslim_goa:GO:0005856"
> FT /db_xref="goslim_goa:GO:0008150"
> FT /db_xref="goslim_goa:GO:0016887"
> FT /db_xref="goslim_goa:GO:0043226"
> FT /db_xref="goslim_goa:GO:0043234"
> FT /db_xref="UniParc:UPI00018FBBAC"
> FT /translation="LQGLNAQLDQVDVQQIIRVLRSTHSVYIKQIDELIFESTHELMEA
> FT MENIKFLHLLMQPCSQLDFSESPTFVSQLIPRTIHLIRFIWLNSEQYNRRDLITGIFRN
> FT LSNQIIRFCTEKVNVEKILSGSSRFGIKICNMCIDCCLTYKGIYDIMSKTHAKINIRIG
> FT WSLDNAMIFNHVDAFMERLNDVIDICESMMVFGRLDESESIPKPQFGGTSGTEFEATAD
> FT NVENEFLVTLTALCTDSKEIILNVHKNEWYEEVIKYRRTVQSMEETVQRLMSNVFQHIC
> FT NVEEALESLNVMIFYSYRSTIRKTFLRQVSSAWVFFSNEIDSSVHMLMDRSKMHESWVP
> FT YYASRALGYRVHLDRLVWLCNRLNSSDWLPNVSEASVVLKKFESVRREFDKEVKKSFDE
> FT WQKNCCSLLLNQKLDRYLLIRSKKKKGLIECNIDRTILTICEQAQHFERLGLGVPGMVR
> FT KIYEKHETLRFVYNSVVQVCLNYNHILSALSEQERKLFRALIQACDRKIAPGVFKLTYG
> FT GELSDAYIADCAKHTNKLQETMDIYKRAIQNIARFCEKICDTPMLKFNFSGAVTISIFE
> FT NHLSSYLRRVSNILRGFYSTITDLIFAVFKEFQAVIEDMPIEWYGFVNVFDDMLATAFL
> FT TSSKNSLNMLTNALHRDPDMAAAPILVMESDVRERCIVLTPDIDVIANLLSGYIDRIHN
> FT ILEQFPRIGIKMKLPKEHQYESFSKAFLEDSESTQLICNIEAEINHEREEIDGYITFWN
> FT SHRMLWETTELEFTKRVKATQMTADIFEASIEYYSAMADDISYVDAITHVYFILMNQNY
> FT IKSSILDCIEKWQALNIKILLSHSFSLIRAIYRYMRKNERKMMMVPRTLKESLLAKQFF
> FT ERIINEVPLKQAGFPPTLELFAILDKYQVEIPEEIRVKVIGLEAAWHHYLKRLGEADEM
> FT LDNNREEFKKILVQQAEKFKIILKEFLDDFFLKLPTSANINPRIALKFLRIIALKIEDC
> FT FTFEESLMRDLAVFNVNQPESIDLRKLDFEVRIVKNIWELIFEWQTNWEGWKKGYFWKM
> FT NINEMEDTALNLYKEFTTLNKKFYDRHWEMLEATTKNVDSFRRTLPLITALKNPCMRER
> FT HWNRVRDVIHVNFDENSKNFTLELIINLDFQAFSEDIQDISNPATMELQIENSIKNIAT
> FT IGKNKVLKCFYHDGIYRIKNVEDCFQLLEEHMVQISAMKATRFVEPFITIVDYWEKTLS
> FT YISETLEKGLTVQRQWLYLENIFQGDDIRKQLPEEAKRFATITEEFRTISSKMFQAKTA
> FT VKATNLRPPPFLLNRFSRMDERLELIQRALEIYLEAKRQLFPRFYFISNDDLLEILGNS
> FT KRPDLVQTHLKKLFDNLYKLELKRVGKTLSRWQASGMHSDDGEYVEFMMVIYIDGPSER
> FT WLKQVEEYMLVVMKEMLKLTRGSLKKLVGNREKWISLWPGQMVLTTAQIQWTTECTRSL
> FT IHCSMVDQKKPLRKLKKKQIKVLSKLSEMSRKDLTKTMRLKVNTLITLEIHGRDVIERM
> FT YKSNCKDTGHFEWFSQLRFYWHRESELCVIRQTNTEHWGCFDEFNRINIEVLSVVAQQI
> FT MSIMAALSTKALELMFEGQMIKLKHTVGLFITMNPGYAGRTELPDNLKSMFRPISMMVP
> FT DNIIIAENLLFSDGFTNTRNLARKVYTLYELAKQQLSKQYHYDFGLRSMVALLRYAGRK
> FT RRQLPNTTEEEIVYLAMKDMNVARLTANDLPLFNGIMSDIFPGVSLPTIDYSEFNIAIY
> FT EEFREAGLQPITIAVKKVIELFETKNSRHSVMIIGDTGTAKSVTWRTLQNCFYRMNSQR
> FT FSGWEAVTVYPVNPKALNLAELYGEYNLSTGEWLDGVLSSIMRIICGDEEPTQKWLLFD
> FT GPVDAVWIENMNSVMDDNKLLTLVNSERITMPVQVSLLFEVGDLAVASPATVSRCGMVY
> FT NDYNDWGWKPFVNSWLQRLRIKEFADFLRIHFDYMVPKILDFKRMRCKEPVRTNELNGV
> FT VSLCKLLEIFGTKVNGINPINLELLEEMTRLWFMFCLVWSICSSVDEDSRQRLDSFIRE
> FT LESCFPIKDTVFDYFVDPNERTFLPWDSKLLSSWKCDFESPFYKIIVPTGDTVRYEYVV
> FT SKLLAEEYPVMLVGNVGTGKTSTAISVMEACDKNKFCILAVNMSAQTTAAGLQESIENR
> FT TEKRTKTQFVPIGGKRMICFMDDFNMPAKDIYGSQPPLELIRQWIDYKYWFNRKTQQKI
> FT YVQNTLLMAAMGPPGGGRQTISSRTQSRFVLLNLTFPSQETIIRIFGTMLCQKLESYPN
> FT EVREMWLPITLCTINLYVSMISKMLPTPNKSHYLFNLRDISKVFQGLLRSEKELQNKKN
> FT FFLRLWVHECFRVFSDRLVDDSDQFWFVNTINDILGKHFEVTFHSLCPSKVPPFFGDFA
> FT HPQGFYEDLQVDFLRTFMKNQLEEYNNFPGMTRMNLVFFREAIEHIVRILRVISQPRGH
> FT ILNMGIGGSGRQVLTKLAAFILEMAVFQIEVTKKYKTGDFREDLKNLYKVTGIKQRLTI
> FT FIFSSDQIAEVSFLEITNNMLSTGEINLFKSDEFDELKPELERPAKKNGVLLTTEALYS
> FT YFILNVRDFLHVALCFSPIGENFRSYIRQYPALLSSTTPNWFRFWPQEALLEVASHFLI
> FT GFPLNVVVSGKEDEKHRESLVISTEAILQRDIAYVFSVIHSSVAKMSENMYAEVKRYNY
> FT VTSPNYLQLVSGFKKLLEKKRLEVSTASNRLRNGLSKISETQEKVSLMSEELKASSEQV
> FT KILARECEDFISMIEIQKSEATEQKEKVDAEAVLIRRDEIICLELAATARADLEVVMPM
> FT IDAAVKALDALNKKDISEVKSYGRPPMKIEKVMEAVLILLGKEPTWENAKKVLSESTFL
> FT NDLKNFDRDHISDKTLKRIAIYTKNPELEPDKVAVVSLACKSLMQWIMAIENYGKVYRI
> FT VAPKQEKLDSAMKSLEEKQAALAAAKKKLEELQVVIEELYRQLEEKTNLLNELRAKEER
> FT LRKQLERAIILVESLSGERERWIETVNQLDLSFEKLPGDCLLSVAFMSYLGAFDTKYRE
> FT ELLVKWSLLIKDLLIPATLELKVTYFLVDAVSIREWNIQGLPADDLSTENGVIVTQGSR
> FT WPLIIDPQMQANNWIKNMEERNQLMTLDFGMADYLRQLERALKEGLPVLLQNVGEYLDQ
> FT AINPILRQSFTIQSGERLLKFNDKYISYNNSFRFYITTKISNPHYPPEISSKTTIVNFA
> FT LKQDGLEAQLLGIIVRKEKPALEEQKDELVMTIARNKRTLIDLDNEILRLLNESRGSLL
> FT DDDELFSTLQKSRQTSVLVKESLSIAEVTEVEIDAARQEYKPASERASILFFVLMDMSK
> FT IDPMYVFSLAAYILLFTQSIERSPRNQLVHERIQNINEYHSYAVYRNTCRGLFERHKLL
> FT FSIHMTAKILSNAGKLLEEEYDFILKGGIVLDKLGQAPNPAPWWISEQNWDNITELDKV
> FT SGFHGIIDSFEQHYKAWNGWYATTFPEQEDLVGEWNDKLTDFQKICVLRSLRPDRISFC
> FT LTQFIITKLGPRYVDPPVLDLKATFDESISQTPLIFVLSPGVDPAQSLISLSESVKMAQ
> FT RMYSLSLGQGQAPIATKLIMDGIKDGNWVFLANCHLSLSWMPTLDKMIATMQSMKLHKK
> FT FRLWLSSSPHPDFPISILQTSIKMTTEPPRGIKSNMKRLYNNINEANMENCSEPSKYKK
> FT LLFALCFFHTVLLERKKFLELGWNVIYSFNDSDFEVSEILLLLYLNEYEDTPWGALKYL
> FT IAGVNYGGHITDDWDRRLLITYINQFFCDQALQTRKFRLSTLPNYFIPDDGDVQSYLDQ
> FT IQMFPNFDKPDAFGQHSNADIASLIGETRMLFEALLSMQVQTNSTSSNENGETKVFDLA
> FT KEILMNTPDEINYEQTAKIIGINRTPLEVVLLQEIERYNKLLVDMSTQLRDLRRGIQGL
> FT VVMSSDLEDIYLAVSEGRVPLQWLKAYNSLKPLAAWARDLIHRVGHFNSWAKTLRPPIL
> FT FWLAAYTFPTGFVTAVLQTSARATKTPIDELSWDFYVFVEEDTAAARIIREGGGVYIRS
> FT LFLEGGGWLRKNQCLQDPLPMELICPLPVIHFKPVENLKKRCRGVYQCPAYYYPVRSGS
> FT FVIAVDLKSGNEKADYWIKRGTALLLSLAN"
>
>
> Which gives:
>
> >FBtr0300022__FBgn0001313
> LQGLNAQLDQVDVQQIIRVLRSTHSVYIKQIDELIFESTHELMEAMENIKFLHLLMQPCSQLDFSESPTFVSQLIPRTIHLIRFIWLNSEQYNRRDLITGIFRNLSNQIIRFCTEKVNVEKILSGSSRFGIKICNMCIDCCLTYKGIYDIMSKTHAKINIRIGWSLDNAMIFNHVDAFMERLNDVIDICESMMVFGRLDESESIPKPQFGGTSGTEFEATADNVENEFLVTLTALCTDSKEIILNVHKNEWYEEVIKYRRTVQSMEETVQRLMSNVFQHICNVEEALESLNVMIFYSYRSTIRKTFLRQVSSAWVFFSNEIDSSVHMLMDRSKMHESWVPYYASRALGYRVHLDRLVWLCNRLNSSDWLPNVSEASVVLKKFESVRREFDKEVKKSFDEWQKNCCSLLLNQKLDRYLLIRSKKKKGLIECNIDRTILTICEQAQHFERLGLGVPGMVRKIYEKHETLRFVYNSVVQVCLNYNHILSALSEQERKLFRALIQACDRKIAPGVFKLTYGGELSDAYIADCAKHTNKLQETMDIYKRAIQNIARFCEKICDTPMLKFNFSGAVTISIFENHLSSYLRRVSNILRGFYSTITDLIFAVFKEFQAVIEDMPIEWYGFVNVFDDMLATAFLTSSKNSLNMLTNALHRDPDMAAAPILVMESDVRERCIVLTPDIDVIANLLSGYIDRIHNILEQFPRIGIKMKLPKEHQYESFSKAFLEDSESTQLICNIEAEINHEREEIDGYITFWNSHRMLWETTELEFTKRVKATQMTADIFEASIEYYSAMADDISYVDAITHVYFILMNQNYIKSSILDCIEKWQALNIKILLSHSFSLIRAIYRYMRKNERKMMMVPRTLKESLLAKQFFERIINEVPLKQAGFPPTLELFAILDKYQVEIPEEIRVKVIGLEAAWHHYLKRLGEADEMLDNNREEFKKILVQQAEKFKIILKEFLDDFFLKLPTSANINPRIALKFLRIIALKIEDC!
> FTFEESLMRDLAVFNVNQPESIDLRKLDFEVRIVKNIWELIFEWQTNWEGWKKGYFWKMNINEMEDTALNLYKEFTTLNKKFYDRHWEMLEATTKNVDSFRRTLPLITALKNPCMRERHWNRVRDVIHVNFDENSKNFTLELIINLDFQAFSEDIQDISNPATMELQIENSIKNIATIGKNKVLKWLLS*WYL*NKKR*GLFSAP*RTHGTNIGYESNSFC*AIYNHC*LLGKNTIVHK*DSGKGFNCSAPMALPRKYIPRRRHKKTTSRRGKTFCNNN*RVSNNIKQNVPGKDSRKSH*LTPSTVFIKPF*SNGRKTGTYSTCLRNLS*G*TTTFSKILFYF***PFRNFRKF*AAGLSSNPP*EVI**FIQA*AQARWENFKSVASFWNAFRRWRIC*VHDGYLYRWTIGALAKTSRRVHACCYERDA*TYSRIS*KTCREQRKMDFALARTNGANHSSDPMDN*VYA*PNSL*YG*SKKTPTQAKEKANKSSF*IIRNESKRPNKNNAP*SKYPHNA*NTWS*CYRKNV*IKL*GYGPF*MVFTTQILLAP*IGTMCNKADKHRALGMFR*V*SNKY*SALSRGTTNNVYNGSAFYKGVGAYVRGSNDKVKAHSWSIHYYESWICRTD*TS**FKVNV*THINDGT**YNYCGKFTFFGWFY*YKKLGPKGIYVV*AG*AATFKAISL*FWSSLNGGFASLRGSKKTSITKYY*RRNCLFGNERYECCEINS**FTPF*WYYV*HISWC*LTNYRLQ*I*YCDL*RI*GGGSPTNYHSRKKSN*AF*NKKL*ALSYDHRGYGNSQISYMENITKLFLSNE*SKIFRMGSSHRLPSKSKSIESSRALWGIQLVDW*MA*RSFKFYYANNLWR*RADSEMVVV*WTCGCSMD*KHELSNG***TSYACK*RTYNHASSSIAIV*SRRPGCCFTSNCFPMWNGL*RLQ*LGMETFCKLMVTAPKN*GVR*FFTNTF*LHGAKNTGF*T!
> NEVQRACKDK*VKWSCVAL*IARNIWHKGKWDKSH*FRTS*GDD*IVVYVLFSMVNLFKCG*RQSPKTR*
> LYTGTRKLLSNKRYCV*LFCGSQ*TNLFTMG*QAVEQLEMRFRISFLQDYCSYWRHCSL*ICCFKTSC*RISCDACWKCWYRKNVNGYKCNGGL**K*ILHFSCEHVSTDNSSRVTRINRKSD*ETYKNAICTYRWQTDDMFYGRL*YACKRHLWISATFGAYSAMDRLQVLV**KNSTKNICAKHIINGCDGTAWRGQTNNFQSNSKSVCFIKLNFSFTRNNYSHIWNDALSKTRVIPK*SS*DVATYNPLYH*LICIND**NVTDAK*ISLLI*S*RYIQSLSRTIKK*KRTSKQKKFFFTALGS*VF*SVQRPIG*RLRSVLVCKYY**YTW*TF*SYFSQSLSFKGSTIFR*LCSPSRVLRRSTGRFLKNIYEKST*GI*QLSRND*NEPSIF*RSYRTYCSNPESYFPTAWTHFKYGDRWIRPTSINQVSCVYFGNGSFPN*GYQKIQNRRLSRRPKKLIQSNWN*TETNDFYI*QRPNSRSLISRNNKQYAKYWRNKLI*IR*IRRAKA*T*TPGKKKWGSANN*STIFLFYFKCARLPACCALF*PNRRKFSKLYKTISGFVKFNNSKLV*ILATRSPFGSSFAFSNRISIKRSGFWKRGRKTSRKFGYKHRSHSSTRYCLCIFSNSLKCC*NVGKYVCRS*AL*LCNLTKLFAACKWF*KTIRKEKIRSINCFQ*ITQWAFKNF*NSGKSILNVRRA*S*L*TS*NTC*RM*RFYIHD*NSKE*SNGTKGKSGCRSRAY*KG*NNLS*ISSYSSCGLGGGNAYDRCCCKSIRCIE*ERHFRS*IIWTAANENRKGYGSCIDLTWKRTNMGKC*KSFK*INIFERPKKL**RSYFR*NS*TYCNLYKKS*VRAR*SGCCIACVQIIDAMDNGHRKLRKSLPNSRSKAGKIR*CNEVT*RKASCFSCGKKKTRRASGCH*RTLPAA*RKN*PS**IACQGRTT*KATGACHYFGRIAFWRERKVD*NG*SVGLIL*KTSR*LL!
> AFCCVYVVLRGF*HQIPRRITCKMVFIN*RSFNTSNFGA*GYVFSSRCCFDSRMEYSRSTC**FKY*KRSNSYSR*SLASYY*PSNAS**LDKKYGRA*SINDTRFRYGRLLTSARTSSKRRFACIVAKRGGILRSSY*SNFAAELYHSKWRKVIKI**QVYFIQ*FVQILHNDKNIKSTLPTGNLIKNYYCKFCTKARWA*SPTTRNYCSKRKTRPRRTKRRTGNDNSSKQTDINRSR**DSTAT**KSRFLIR*R*VIFNFTKIPSDISAS*GVA*HCRGN*SRN*CGPTRIQTSIGTRIHFILCFNGYV*N*SNVCFFSGSVYIIIHTVY*AKSS*SASP*KNSKY**ISFLCGLPKYLSWAFRAT*ATIFNSYDSKDSFKRWKAFGRRV*FYSERRYSIR*TGTSAQPGTMVDK*AKLG*YNRIR*SFWISWDNRFF*ATLQGLEWLVCHDLPRTRRSRWRME**TYRFSKNLCFTFTSTG*NFFLFDTIYYYQTWASIC*SASS*SQGNF**IDFTDSPHIRIITRCRSSPISHITIRIS*NGTTNVLT*LGSRTSTYCNKAYNGWHQGW*LGIFSKLSFVS*LDAYS*QDDSHYAVHETT*KISTVAKLKPSSGLSNIYFANQY*DDN*TSSWNQIKYETSI*QHK*G*YGKL**TQQV*EVIIRFVLLSYSPTRTKKIFRTWLECYLQL*RF*F*SFRNTTIIVS**I*RHSLGSFKVSHSRSKLRRTHYRRLGSPTINNLYKPIFL*PSIAD*KV*IINPSKLFYSR*RRCAIIFRPNTNVSQF**A*CFWTTFKCRYSVINRRNKNAF*GSAFYASPD**HK***KR*DKSI*SR*RNFNEYTG*DKL*TDGKNYWNQSNSLRSCLTSRN*AL**TSR*HVHSIT*LKTWNTGTCCNEFGLRGYLSSCL*RKGAITMVKSI*FIETISGMG*RLNTSCRTF**LGENTPPSNIILACSLHVSNWICYSSTTNFSSSYQN!
> TN**TLLGFLCFC*RRYCRSSYNKGRRRRLHSKFVFGGWRMVEEKPMPSGSTTDGTNLSITSNTL*ASRK
> PKKTMSWCLPVSRILLSR*VRIICNSRGLKVW**KG*LLDKARYCTFIKFSKL
>
>
> The start of the translation is correct, but after that it goes wrong.
> The problem looks the same as the one in this old thread:
> http://bioperl.org/pipermail/bioperl-l/2004-August/016735.html
>
> I must be missing something...
>
> Thanks for your help!
>
> --
> Tristan