[Bioperl-l] Re: [Bioperl-guts-l] Notification: incoming/930 (fwd)
gert thijs
gert.thijs@esat.kuleuven.ac.be
Wed, 28 Mar 2001 18:02:59 +0200
I have been trying to solve this problem for a few days now and I think I
finally have found the real cause.
I have written a script that first downloads a genbank entry and than writes
this entry to a flat file using Bio::SeqIO. If I compare the original entry
and the one written in the file there is a important difference. Here is an
example of the sequence with accession umber AP000413, where I have selected
the description of the second CDS.
In the original sequence the location is given as
complement(join(6390..6420,...)), while in the written entry the same location
is represented as: complement(join(complement(6390..6420),...))
Parsing the original sequence is no problem, but when reading the sequence
from flat file an warning message is printed:
-------------------- WARNING ---------------------
MSG: unable to parse location successfully out of
complement(join(complement(60316..60427), ignoring feature (seqid=AP000413)
---------------------------------------------------
Original entry:
--
CDS complement(join(6390..6420,6562..6717,6805..6929,
7023..7155,7233..7363,7518..7682,7771..7863,8124..8391,
8467..8595,8778..8857,8947..9015,9102..9218,9350..9418,
9888..9941,10054..10091,10184..10295,10447..10569,
10651..10678,10938..10994,11149..11243))
/note="gb|AAF34831.1
gene_id:K7L4.2"
/codon_start=1
/evidence=not_experimental
/product="MAP kinase"
/protein_id="BAB02151.1"
/db_xref="GI:9294249"
/translation="MDDVAGLQEAAGARFSQIELIGRGSFGDVYKAFDKDLNKEVAIK
VIDLEESEDEIEDIQKEISVLSQCRCPYITEYYGSYLHQTKLWIIMEYMAGGSVADLL
QSNNPLDETSIACITRDLLHAVEYLHNEGKIHRDIKAANILLSENGDVKVADFGVSAQ
LTRTISRRKTFVGTPFWMAPEVIQNSEGYNEKADIWSLGITVIEMAKGEPPLADLHPM
RVLFIIPRETPPQLDEHFSRQVKEFVSLCLKKAPAERPSAKELIKHRFIKNARKSPKL
LERIRERPKYQVKEDEETPRNGAKAPVESSGTVRIARDERSQGAPGYSFQGNTVKNAG
WDFTVGGSQSIGTVRALKPPQARERRQEVSPNRISQRTTRPSGNQWSSATGSTISEAS
EGGFVRRHPFQNDHEDGFHEEDDSSLSGSGTVVIRTPRSSQSSSVFREPSSGSSGRYA
AFDDASASGTVVVRGQYDDSGSPRTPKSRLGIQERTSSASEDSNANLAEAKAALDAGF
RRGKARERLGMGNNNNDGKVNRRREQMADDSDYSRNSGDKSSKQKVVPRSEQVSDEED
DSIWESLPASLSVLLIPSLKEALGDDSKESTVRTVSRSLVMMEREKPGSCEAFVAKLI
ELLGSSKEASVKELHDMAVCVFAKTTPDNAENKMKQANKEFSSNTNVSPLGRFLLSRW
LGQSSRDL"
--
The same entry as written in the flatfile:
--
CDS complement(join(complement(6390..6420),6562..6717,
6805..6929,7023..7155,7233..7363,7518..7682,7771..7863,
8124..8391,8467..8595,8778..8857,8947..9015,9102..9218,
9350..9418,9888..9941,10054..10091,10184..10295,
10447..10569,10651..10678,10938..10994,11149..11243))
/product="MAP kinase"
/protein_id="BAB02151.1"
/codon_start=1
/translation="MDDVAGLQEAAGARFSQIELIGRGSFGDVYKAFDKDLNKEVAIKV
IDLEESEDEIEDIQKEISVLSQCRCPYITEYYGSYLHQTKLWIIMEYMAGGSVADLLQS
NNPLDETSIACITRDLLHAVEYLHNEGKIHRDIKAANILLSENGDVKVADFGVSAQLTR
TISRRKTFVGTPFWMAPEVIQNSEGYNEKADIWSLGITVIEMAKGEPPLADLHPMRVLF
IIPRETPPQLDEHFSRQVKEFVSLCLKKAPAERPSAKELIKHRFIKNARKSPKLLERIR
ERPKYQVKEDEETPRNGAKAPVESSGTVRIARDERSQGAPGYSFQGNTVKNAGWDFTVG
GSQSIGTVRALKPPQARERRQEVSPNRISQRTTRPSGNQWSSATGSTISEASEGGFVRR
HPFQNDHEDGFHEEDDSSLSGSGTVVIRTPRSSQSSSVFREPSSGSSGRYAAFDDASAS
GTVVVRGQYDDSGSPRTPKSRLGIQERTSSASEDSNANLAEAKAALDAGFRRGKARERL
GMGNNNNDGKVNRRREQMADDSDYSRNSGDKSSKQKVVPRSEQVSDEEDDSIWESLPAS
LSVLLIPSLKEALGDDSKESTVRTVSRSLVMMEREKPGSCEAFVAKLIELLGSSKEASV
KELHDMAVCVFAKTTPDNAENKMKQANKEFSSNTNVSPLGRFLLSRWLGQSSRDL"
/db_xref="GI:9294249"
/evidence="not_experimental"
/note="gb|AAF34831.1gene_id:K7L4.2"
--
Jason Stajich wrote:
>
> Gert, Petre :
>
> (Petre I *think* this is the same type of bug you were getting)
>
> I fixed this bug on the 07 branch and main trunk - added your suggested
> protection to test that the seqfeature was actually still existant (this
> was a bad code decision, I think the rationale was to die when an
> unparseable location line was read, but that is not very smart IMHO). I
> don't know if you are working off CVS or just downloading bioperl
> packages. If you are working off CVS, anonymous cvs will have these
> changes in about 3 hours. We can generate patches for you if you want to
> patch your local distribution, but CVS will be easier for you in the long
> run as we continue to shake out the bugs (assuming firewalls are not an
> issue for you).
>
> Please let me know if these changes do not fix your situation.
> -Jason
>
> On Wed, 21 Mar 2001, gert thijs wrote:
>
> > Jason Stajich wrote:
> > >
> > > Please send me the accession number of the offending genbank file.
> > >
> >
> > The accession number was AP000413
> >
> >
> > Gert Thijs
> >
>
> Jason Stajich
> jason@chg.mc.duke.edu
> Center for Human Genetics
> Duke University Medical Center
> http://www.chg.duke.edu/
--
+ Gert Thijs
+
+ email: gert.thijs@esat.kuleuven.ac.be
+ homepage: http://www.esat.kuleuven.ac.be/~thijs
+
+ K.U.Leuven
+ ESAT-SISTA
+ Kasteelpark Arenberg 10
+ B-3001 Leuven-Heverlee
+ Belgium
+ Tel : +32 16 32 18 84
+ Fax : +32 16 32 19 70