[Bioperl-l] Re: *major* error in genbank parser or am i just insane?

Lincoln Stein lstein@cshl.org
Fri, 9 Aug 2002 16:06:22 -0400


I'll grant that CDS without exons can be dealt with semi-rationally, but if 
the exon and CDS entries are out of sync (i.e. they show different internal 
splice sites), then the parser should not try to resolve that!
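A minimal sketch of the two cases, written in Python purely for illustration. It is not BioPerl code and uses no BioPerl or BioSQL API. All names here (`parse_join`, `introns`, `placeholder_exons`, `sync_problem`, and the `presumed_exon` label) are hypothetical. It assumes simple numeric join(...) locations, ignores strand and fuzzy-end markers, and rejects remote joins rather than interpreting them. For a CDS with no exons it creates placeholder sub-features, as Brian King suggests further down. For an out-of-sync mRNA it compares internal splice sites and reports the disagreement instead of trying to resolve it.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Range = Tuple[int, int]

# One "start..end" range; "<" and ">" fuzzy-end markers are accepted but ignored.
SEGMENT = re.compile(r"<?(\d+)\.\.>?(\d+)")


@dataclass
class SubFeature:
    kind: str  # hypothetical label, e.g. "presumed_exon"
    start: int
    end: int


def parse_join(location: str) -> List[Range]:
    """Split a simple join(...) location into (start, end) ranges sorted by start."""
    if ":" in location:
        # Remote join such as "X12345.1:1..100": keep the string, do not interpret it.
        raise ValueError(f"remote join not interpreted: {location}")
    segments = [(int(s), int(e)) for s, e in SEGMENT.findall(location)]
    if not segments:
        raise ValueError(f"no ranges found in: {location}")
    # Sorting lets join(complement(...),complement(...)) in reverse order pair up too.
    return sorted(segments)


def introns(segments: List[Range]) -> Set[Range]:
    """(last base of one range, first base of the next) for every internal gap."""
    return {(a_end, b_start) for (_, a_end), (b_start, _) in zip(segments, segments[1:])}


def placeholder_exons(cds_location: str) -> List[SubFeature]:
    """Case (a): no exon features, so each CDS range stands in for a presumed exon."""
    return [SubFeature("presumed_exon", s, e) for s, e in parse_join(cds_location)]


def sync_problem(cds_location: str, mrna_location: str) -> Optional[str]:
    """Case (b): describe a CDS/mRNA disagreement for human review, or return None."""
    cds = parse_join(cds_location)
    cds_start, cds_end = cds[0][0], cds[-1][1]
    # Only mRNA introns overlapping the coding span must match; UTR introns may differ.
    mrna_introns = {
        (d, a) for d, a in introns(parse_join(mrna_location))
        if d < cds_end and a > cds_start
    }
    mismatched = introns(cds) ^ mrna_introns
    if mismatched:
        return f"CDS and mRNA splice sites disagree at {sorted(mismatched)}; needs human review"
    return None
```

With these assumptions, sync_problem("join(12..50,100..180)", "join(1..5,9..50,100..250)") returns None, while changing the mRNA's second range to 9..60 returns a message flagging the introns (50, 100) and (60, 100) for review. The function deliberately reports rather than repairs, in line with the view that the parser should not guess.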

The biggest problem with Genbank entries is that the submitter sometimes says
exon when he means CDS and sometimes CDS when he means exon.  I've also seen
cases where intron and exon were swapped.

Lincoln

On Friday 09 August 2002 02:38 pm, Lin, Xiaoying J. wrote:
> Lincoln,
>
> I agree that the code should not play a guessing game over human
> mistakes like out-of-sync mRNA + CDS joins.
>
> But for records with CDS features and no exon features, I am not sure I
> understand you correctly. There are lots of submissions in GenBank that
> come only with CDS (join) features and no separate exon features. If that
> is a mistake, it is a systematic one. How does the current parser handle
> a record like this one?
> http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=nucleotide&list_uids=1458097&dopt=GenBank
>
> I have not finished the older e-mails on this subject, so I may have
> missed something here. I thought everyone was busy having fun in
> Edmonton; when did you guys find time to flood everyone's e-mail box? ;-)
>
>
> BTW, I enjoyed your talk and the others' talks at BOSC.
>
> Thanks.
>
> Xiaoying
>
> > -----Original Message-----
> > From: Lincoln Stein [mailto:lstein@cshl.org]
> > Sent: Friday, August 09, 2002 1:28 PM
> > To: brian.king@animorphics.net; Brian King; Ewan Birney
> > Cc: bioperl-l@bioperl.org
> > Subject: Re: [Bioperl-l] Re: *major* error in genbank parser or am i just insane?
> >
> >
> > Here's my 2c:
> >
> > If the genbank entry has CDS features but no exons, or an mRNA join
> > operator which is out of sync with the CDS join, then in my opinion the
> > quality of the annotation is so questionable that BioSQL should throw up
> > its hands and seek human assistance in interpretation.  Asking the import
> > software to read the minds of the submitters is beyond what can be
> > reasonably expected, and only ends up propagating errors.
> >
> > Lincoln
> >
> > On Friday 09 August 2002 04:49 am, Brian King wrote:
> > > > This is very hard to do because you have to handle:
> > > >
> > > >
> > > >    (a) CDS with no Exons
> > > >
> > > > and, my particular favourite
> > > >
> > > >    (b) an mRNA join operator which is out of sync with the CDS join operator (!)
> > >
> > > For (a) I'd put generic sub-features in the CDS to
> > > hold the places of the presumed exons, and for (b) use
> > > generic sub-features for the CDS and the mRNA joins
> > > and just let them be out of sync.  I surrender on
> > > remote joins!  I'd keep the location string as
> > > documentation in the data, but not try to interpret
> > > it.  Ideally the parser would download the remote
> > > record, but...
> > >
> > > Regards,
> > > Brian
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l@bioperl.org
> > > http://bioperl.org/mailman/listinfo/bioperl-l
> >
> > --
> > ========================================================================
> > Lincoln D. Stein                           Cold Spring Harbor Laboratory
> > lstein@cshl.org                            Cold Spring Harbor, NY
> > ========================================================================
