[Bioperl-l] *major* error in genbank parser or am i just insane?

Chris Mungall cjm@fruitfly.org
Tue, 6 Aug 2002 15:36:57 -0700 (PDT)


On Tue, 6 Aug 2002, Jason Stajich wrote:

> I would really love it for someone to do an overhaul on this, if you have
> use cases which break for you then something is wrong.  I think the
> Location objects were messed with recently, and I don't remember how
> we were setting strand originally or in FTHelper.

the semantics of start/end seem a bit odd too - one would expect start/end
on a split location to be the min/max over the individual sublocs, but this
doesn't seem to be the case
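
to make the expectation concrete, here is a minimal sketch in plain Python
(not the BioPerl API; the class names SimpleLoc and SplitLoc and the
coordinates are made up for illustration) of a split location whose
start/end are derived from its sublocations:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SimpleLoc:
    """Hypothetical contiguous location; strand is 1 or -1."""
    start: int
    end: int
    strand: int = 1


@dataclass
class SplitLoc:
    """Hypothetical split location made of simple sublocations."""
    sublocs: List[SimpleLoc]

    @property
    def start(self) -> int:
        # Leftmost coordinate covered by any sublocation.
        return min(loc.start for loc in self.sublocs)

    @property
    def end(self) -> int:
        # Rightmost coordinate covered by any sublocation.
        return max(loc.end for loc in self.sublocs)


# Illustrative coordinates only, not taken from a real record.
split = SplitLoc([SimpleLoc(201, 300), SimpleLoc(1, 100)])
print(split.start, split.end)
```

with this definition the order of the sublocations does not matter, which
is what one would want for a revcomped feature.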

> The regular expressions currently cannot parse all possible cases of crazy
> join(complement(..)) and would need to be addressed with a grammar or a
> more formal regular expression.

ok; but this is just the standard case of a revcomped mrna, e.g. roughly
half the features on a genbank NT record
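
as a rough sketch of the round trip one would expect for this simple case
(plain Python, not FTHelper or any BioPerl code; parse_location and
format_location are hypothetical helpers, and only uniform-strand
complement(join(...)) strings with plain a..b ranges are handled), the
strand is pushed down onto every sublocation on parsing and read back
from them on writing:

```python
import re
from typing import List, Tuple

Range = Tuple[int, int, int]  # (start, end, strand)


def parse_location(text: str) -> List[Range]:
    """Parse 'a..b', 'join(...)' or 'complement(...)' of either form."""
    text = text.strip()
    strand = 1
    m = re.fullmatch(r"complement\((.*)\)", text)
    if m:
        strand = -1
        text = m.group(1)
    m = re.fullmatch(r"join\((.*)\)", text)
    body = m.group(1) if m else text
    ranges = []
    for part in body.split(","):
        start, end = part.strip().split("..")
        # The strand is recorded on each sublocation, not only the parent.
        ranges.append((int(start), int(end), strand))
    return ranges


def format_location(ranges: List[Range]) -> str:
    """Write the ranges back out; mixed strands are out of scope here."""
    strands = {strand for _, _, strand in ranges}
    if len(strands) != 1:
        raise ValueError("this sketch only handles uniform-strand locations")
    spans = ",".join(f"{start}..{end}" for start, end, _ in ranges)
    text = f"join({spans})" if len(ranges) > 1 else spans
    if strands == {-1}:
        text = f"complement({text})"
    return text


# Illustrative coordinates only.
original = "complement(join(1..100,201..300))"
print(format_location(parse_location(original)))
```

the mixed-strand and fuzzy cases are deliberately left out; those are the
ones that would need a real grammar.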

> -jason
>
> On Tue, 6 Aug 2002, Chris Mungall wrote:
>
> >
> > maybe i'm just hugely confused about split locations, but i think there is
> > something deeply terribly wrong with how the genbank parser is dealing
> > with revcomped split locations.
> >
> > it seems that if you parse this
> >
> >  mRNA    complement(join(1..100,201..200,
> >
> > then use a seqio stream of format genbank to spit it out again you get
> > this:
> >
> >  mRNA    join(1..100,201..200,
> >
> > which is highly disturbing
> >
> > looking at FTHelper it seems that when a split location object is created,
> > the strand is set in the parent splitlocation, but not in the individual
> > simple sublocations.
> >
> > I'm about to commit a fix for this, but I just need a sanity check first:
> > surely this is one of the most commonly used modules in bioperl? someone
> > would have noticed this by now? I mean this is 50% of mRNAs on a genomic
> > entry, and orientation is kind of important in the scheme of things. Elia
> > - haven't you populated a biosql instance from genbank? Didn't all your
> > mRNAs come out on the forward strand? Or were you only doing cDNA records?
> >
> > Looking at SeqIO, it seems to test for a similar case, using
> > testfuzzy.genbank (although this seems to be a weird made up example of a
> > different case altogether) and this doesn't round trip correctly.
> >
> > I'm not sure if this is a recently introduced bug (surely it must be?) or
> > something that's been around for a while. i only tested 1.0.2 and the main
> > cvs branch.
> >
> > hmm, not sure if i know how to fix this - i can set the strand of the
> > sublocations but this has weird results when you try and export in genbank
> > format again. i'll go ahead and commit this anyway sometime today unless
> > anyone has any objections, as it can't be worse than the current handling.
> >
> > i committed a test datafile in t/data/revcomp_mrna.gb