[Biojava-l] Genbank parsing problem

Cox, Greg gcox@netgenics.com
Wed, 1 May 2002 10:34:06 -0400


For our purposes, it's important to be able to reconstruct the Genbank
record from a BioJava sequence.  I wish Genbank didn't allow this
construction, but since it does, we have to deal with it.  Even though this
isn't a BioJava-type feature, I'd prefer to see BioJava's definition changed
to fit Genbank/EMBL rather than vice versa.

Looking at the docs, I'd rather have this mapped to a fuzzy point location,
and relax the restriction on where features can be constructed.  
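
A minimal sketch of that idea follows, as an illustration rather than BioJava code.  Every class and method name in it (FuzzyPointSketch, FeatureSketch, isContainedIn, toGenbank, the requireContainment flag) is hypothetical, and it conservatively assumes that an open-left point such as "<1" can never be shown to lie inside the sequence.  It shows a location that keeps the original "<1" text form so the record can be written back out, and a feature factory whose containment check can be switched off.

```java
// Hypothetical stand-ins for illustration only; none of these are BioJava classes.

/** A point location that may be open on the left, as in the Genbank text "<1". */
final class FuzzyPointSketch {
    final int position;      // the coordinate written in the record
    final boolean openLeft;  // true when the record says "<position"

    FuzzyPointSketch(int position, boolean openLeft) {
        this.position = position;
        this.openLeft = openLeft;
    }

    /** Parses "<N" or "N"; all other Genbank location syntax is out of scope. */
    static FuzzyPointSketch parse(String text) {
        String t = text.trim();
        boolean open = t.startsWith("<");
        int pos = Integer.parseInt(open ? t.substring(1) : t);
        return new FuzzyPointSketch(pos, open);
    }

    /** Assumption: only an exact point within 1..seqLength counts as contained. */
    boolean isContainedIn(int seqLength) {
        return !openLeft && position >= 1 && position <= seqLength;
    }

    /** Writes the location back in the form it had in the record. */
    String toGenbank() {
        return (openLeft ? "<" : "") + position;
    }
}

/** A feature whose containment check can be relaxed by the caller. */
final class FeatureSketch {
    final String type;
    final FuzzyPointSketch location;

    private FeatureSketch(String type, FuzzyPointSketch location) {
        this.type = type;
        this.location = location;
    }

    static FeatureSketch create(String type, FuzzyPointSketch location,
                                int seqLength, boolean requireContainment) {
        // Strict mode rejects locations outside the sequence;
        // relaxed mode keeps them so nothing from the record is lost.
        if (requireContainment && !location.isContainedIn(seqLength)) {
            throw new IllegalArgumentException("Location " + location.toGenbank()
                    + " is not contained in a sequence of length " + seqLength);
        }
        return new FeatureSketch(type, location);
    }
}

final class WhollyRemoteDemo {
    public static void main(String[] args) {
        FuzzyPointSketch loc = FuzzyPointSketch.parse("<1");
        // Relaxed construction: the wholly remote feature is kept.
        FeatureSketch f = FeatureSketch.create("-35_signal", loc, 100, false);
        System.out.println(f.type + "      " + f.location.toGenbank());
    }
}
```

With requireContainment set to true, the same call throws IllegalArgumentException, which is the kind of failure the strict check produces for this record.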

Greg Cox

> -----Original Message-----
> From: Schreiber, Mark [mailto:mark.schreiber@agresearch.co.nz]
> Sent: Tuesday, April 30, 2002 5:33 PM
> To: Thomas Down; Simon Foote
> Cc: biojava-l@biojava.org
> Subject: RE: [Biojava-l] Genbank parsing problem
> 
> 
> To my mind a wholly remote feature is not really a Feature in the
> biojava sense and might be best handled as an Annotation. Perhaps a
> special kind of value (with a nice toString() method) could be
> constructed for it.
> 
> - Mark
> 
> 
> > -----Original Message-----
> > From: Thomas Down [mailto:td2@sanger.ac.uk] 
> > Sent: Wednesday, 1 May 2002 3:26 a.m.
> > To: Simon Foote
> > Cc: biojava-l@biojava.org
> > Subject: Re: [Biojava-l] Genbank parsing problem
> > 
> > 
> > On Tue, Apr 30, 2002 at 09:12:59AM -0400, Simon Foote wrote:
> > > I've recently run across a problem with parsing of Genbank files
> > > containing unbounded locations.
> > > Does anyone have any idea what's causing it?  I tried to trace it back
> > > through but got lost.  But I think it has to do with the single <1 for
> > > the -35_signal as shown in the example.
> > >
> > >      -35_signal      <1
> > >                       /gene="entD"
> > 
> > The default Feature implementations in the BioJava 
> > development tree explicitly forbid construction of Features 
> > with locations which aren't contained by the sequence to 
> > which they're attached. As a quick fix, you can just remove 
> > the check from the constructor of 
> > org.biojava.bio.seq.impl.SimpleFeature (lines 281--283 in my copy).
> > 
> > I'm not sure what the proper solution for this problem is.  
> > Normally, features which extend beyond the sequence can be 
> > transformed into RemoteFeatures.  However, this particular 
> > feature is nasty in that it doesn't even partially overlap 
> > the sequence.  To my mind, it's actually pretty much 
> > meaningless, and the best thing to do would be to drop it.  
> > But some people like to be able to represent the whole of Genbank.
> > 
> > Does anyone know how many more `wholly remote' features there 
> > are in the databases?  And any great ideas about how they 
> > could be usefully represented?
> > 
> >    Thomas.
> > 
> > _______________________________________________
> > Biojava-l mailing list  -  Biojava-l@biojava.org 
> > http://biojava.org/mailman/listinfo/biojava-l
> > 