[Bioperl-l] *major* error in genbank parser or am i just insane?
Jason Stajich
jason@cgt.mc.duke.edu
Tue, 6 Aug 2002 18:13:32 -0400 (EDT)
I would really love for someone to overhaul this. If you have use cases
that break, then something is wrong. I think the Location objects were
changed recently, and I don't remember how we were setting the strand
originally, or in FTHelper.
The current regular expressions cannot parse every possible case of crazy
join(complement(..)) nesting; handling them all would need a grammar or a
more formal regular expression.
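
As a rough illustration of the grammar approach mentioned above, here is a
minimal recursive-descent sketch in Python. It is not BioPerl code and does
not reflect how FTHelper works; the function names (parse_location, tokenize)
and the (start, end, strand) tuple layout are made up for this example. It
handles only plain spans and single positions, join(...) and complement(...),
with no fuzzy or remote locations. The point it shows is that complement()
flips the strand of every sublocation it encloses, so the strand lives on each
simple span rather than only on the enclosing split location.

```python
import re

# Tokens: the two operators (with their opening paren), ")", ",", and spans.
TOKEN = re.compile(r"complement\(|join\(|\)|,|\d+\.\.\d+|\d+")


def tokenize(text):
    """Split a location string into tokens, ignoring whitespace."""
    text = re.sub(r"\s+", "", text)
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected input at offset {pos}: {text[pos:]!r}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


def _expect(tokens, pos, want):
    if pos >= len(tokens) or tokens[pos] != want:
        raise ValueError(f"expected {want!r} at token {pos}")
    return pos + 1


def _parse(tokens, pos):
    """Parse one location starting at tokens[pos]; return (spans, next_pos)."""
    if pos >= len(tokens):
        raise ValueError("unexpected end of location")
    tok = tokens[pos]
    if tok == "complement(":
        inner, pos = _parse(tokens, pos + 1)
        pos = _expect(tokens, pos, ")")
        # Reverse the order and flip the strand of EVERY enclosed span.
        return [(s, e, -strand) for s, e, strand in reversed(inner)], pos
    if tok == "join(":
        spans, pos = _parse(tokens, pos + 1)
        while pos < len(tokens) and tokens[pos] == ",":
            more, pos = _parse(tokens, pos + 1)
            spans = spans + more
        pos = _expect(tokens, pos, ")")
        return spans, pos
    if tok[0].isdigit():
        start, _, end = tok.partition("..")
        return [(int(start), int(end or start), 1)], pos + 1
    raise ValueError(f"unexpected token {tok!r}")


def parse_location(text):
    """Return a flat list of (start, end, strand) tuples in reading order."""
    tokens = tokenize(text)
    spans, pos = _parse(tokens, 0)
    if pos != len(tokens):
        raise ValueError("trailing tokens after location")
    return spans
```

With made-up coordinates, parse_location("complement(join(1..100,201..300))")
and parse_location("join(complement(201..300),complement(1..100))") both give
[(201, 300, -1), (1, 100, -1)], so the two equivalent spellings end up with
the same per-span strands. A truncated string such as an unclosed join( raises
ValueError instead of silently dropping the complement.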
-jason
On Tue, 6 Aug 2002, Chris Mungall wrote:
>
> maybe i'm just hugely confused about split locations, but i think there is
> something deeply terribly wrong with how the genbank parser is dealing
> with revcomped split locations.
>
> it seems that if you parse this
>
> mRNA complement(join(1..100,201..200,
>
> then use a seqio stream of format genbank to spit it out again you get
> this:
>
> mRNA join(1..100,201..200,
>
> which is highly disturbing
>
> looking at FTHelper it seems that when a split location object is created,
> the strand is set in the parent splitlocation, but not in the individual
> simple sublocations.
>
> I'm about to commit a fix for this, but I just need a sanity check first:
> surely this is one of the most commonly used modules in bioperl? someone
> would have noticed this by now? I mean this is 50% of mRNAs on a genomic
> entry, and orientation is kind of important in the scheme of things. Elia
> - haven't you populated a biosql instance from genbank? Didn't all your
> mRNAs come out on the forward strand? Or were you only doing cDNA records?
>
> Looking at SeqIO, it seems to test for a similar case, using
> testfuzzy.genbank (although this seems to be a weird made up example of a
> different case altogether) and this doesn't round trip correctly.
>
> I'm not sure if this is a recently introduced bug (surely it must be?) or
> something that's been around for a while. i only tested 1.0.2 and the main
> cvs branch.
>
> hmm, not sure if i know how to fix this - i can set the strand of the
> sublocations but this has weird results when you try and export in genbank
> format again. i'll go ahead and commit this anyway sometime today unless
> anyone has any objections, as it can't be worse than the current handling.
>
> i committed a test datafile in t/data/revcomp_mrna.gb
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu