[Bioperl-l] Fuzzy Locations and GenBank

Chris Fields cjfields at uiuc.edu
Mon Aug 21 21:56:37 UTC 2006


I don't think the <1..30 type is going away this time around.  The only
changes noted for GenBank and EMBL were of the 1.10..30 variety (still
fuzzy, but within a region).
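The distinction above can be sketched in a few lines. Below is a minimal Python sketch of a hypothetical helper. It is not part of BioPerl, Biopython or any NCBI/EBI tool. It classifies the two endpoints of a plain START..END span. In "<1..30" the "<" marks a partial start, and that form is staying. In "1.10..30" the start uses the X.Y "one base somewhere between X and Y" form that GenBank and EMBL are dropping. The sketch deliberately ignores join(), complement() and the other location operators.

```python
import re


def parse_endpoint(token: str) -> tuple[str, int, int]:
    """Hypothetical helper: return (kind, low, high) for one endpoint token."""
    if m := re.fullmatch(r"(\d+)\.(\d+)", token):
        # X.Y: a single base somewhere between X and Y inclusive
        # (the syntax GenBank 154.0 says will be discontinued).
        return ("within", int(m.group(1)), int(m.group(2)))
    if m := re.fullmatch(r"<(\d+)", token):
        # <N: the true endpoint lies before N (e.g. a partial feature).
        n = int(m.group(1))
        return ("before", n, n)
    if m := re.fullmatch(r">(\d+)", token):
        # >N: the true endpoint lies after N.
        n = int(m.group(1))
        return ("after", n, n)
    if re.fullmatch(r"\d+", token):
        # Plain N: an exact base position.
        n = int(token)
        return ("exact", n, n)
    raise ValueError(f"unrecognized endpoint: {token!r}")


def parse_span(location: str):
    """Hypothetical helper: split 'START..END' and parse both endpoints."""
    start, sep, end = location.partition("..")
    if not sep:
        raise ValueError(f"not a START..END span: {location!r}")
    return parse_endpoint(start), parse_endpoint(end)
```

For example, `parse_span("<1..30")` returns `(("before", 1, 1), ("exact", 30, 30))`. `parse_span("1.10..30")` flags the start as `("within", 1, 10)`. A bare `10.11` has no `..`, so this sketch rejects it rather than guessing whether it means an X.Y position or an insertion site.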

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> Sent: Monday, August 21, 2006 3:05 PM
> To: lincoln.lstein at gmail.com
> Cc: Steve Chervitz; Chris Fields; Bioperl List
> Subject: Re: [Bioperl-l] Fuzzy Locations and GenBank
> 
> I'm not sure. It sounded more like it was the rarest variant.
> 
> Aside from GenBank, Swiss-Prot used to use the <1..30 sort of location
> a lot for feature annotation when they only had a partial peptide,
> i.e., primarily in TrEMBL. I'm not sure whether that's changed in
> UniProt.
> 
> Also note that the location for insertion features uses (or no
> longer uses?) the 10.11 notation (for an insertion between bases 10
> and 11). In Bioperl, that's a fuzzy location too. In this case,
> though, I guess you can blame Bioperl for using the wrong coordinate
> system, as in reality there's nothing fuzzy about where the insertion
> is.
> 
> 	-hilmar
> 
> On Aug 21, 2006, at 3:18 PM, Lincoln Stein wrote:
> 
> > This was the most common variant, right?
> >
> > Lincoln
> >
> > On 8/21/06, Hilmar Lapp <hlapp at gmx.net> wrote:
> > Well, they're actually not dead yet. Just one variant died. I'm
> > hoping though that this is just a step on the road that indeed ends
> > in their death.
> >
> >         -hilmar
> >
> > On Aug 21, 2006, at 1:34 PM, Lincoln Stein wrote:
> >
> > > I am tempted to start dancing around my office singing "Ding dong
> > > the fuzzy feature is dead!" Break out the champagne!!
> > >
> > > Lincoln
> > >
> > > On 8/21/06, Chris Fields < cjfields at uiuc.edu> wrote:
> > >>
> > >> Steve
> > >>
> > >> There is this in the EMBL Release 87 notes:
> > >>
> > >> http://www.ebi.ac.uk/embl/Documentation/Release_notes/current/relnotes.html
> > >>
> > >> ..
> > >> 2 CHANGES IN THIS RELEASE
> > >>
> > >> 2.1 Changes to the Feature Table Document: Chapter 3.5 "Location"
> > >>
> > >> The use of range (.) descriptor within location spans is no longer
> > >> legal.
> > >> ..
> > >>
> > >> So, yes, looks like EMBL is doing the same thing. I am guessing
> > >> DDBJ is also.
> > >>
> > >> I didn't see anything in the recent revision of the INSDSeqXML DTD,
> > >> but I don't think a change in the DTD would be needed to accommodate
> > >> the removal of 'fuzzy' locations of X.Y type, unless the DTD has
> > >> specific rules on how to format fuzzy location data. Same for the
> > >> other formats (EMBLXML, etc.), as the change is rather small (but
> > >> very significant).
> > >>
> > >> I'm guessing changes to other formats (GAME, etc.) that rely on
> > >> GenBank/EMBL will occur if they specifically deal with these in some
> > >> way.
> > >>
> > >> It is nice to know that BioPerl won't be seriously affected by this.
> > >> As you noted, we'll have to keep X.Y fuzzy functionality around to
> > >> accommodate legacy data, but should we add warnings for this?
> > >>
> > >> Chris
> > >>
> > >>
> > >>> -----Original Message-----
> > >>> From: Steve Chervitz [mailto:sac at open-bio.org]
> > >>> Sent: Sunday, August 20, 2006 10:56 PM
> > >>> To: Hilmar Lapp
> > >>> Cc: Chris Fields; Bioperl List
> > >>> Subject: Re: [Bioperl-l] Fuzzy Locations and GenBank
> > >>>
> > >>> Ah, one of the banes of bioinformatics data modeling is finally
> > >>> being laid to rest. Those who have struggled with it (myself
> > >>> included) should not let this occasion pass without notice. Here
> > >>> are some reflections.
> > >>>
> > >>> Check out the captions under photos #2 and 3 here:
> > >>> http://gallery.open-bio.org/gallery2/v/hackathon2002/dagphotos/?g2_page=7
> > >>>
> > >>> Isn't it fitting that, now that the open-bio.org toolkits have
> > >>> systems in place to deal with fuzzy locations, the NCBI says, "well,
> > >>> they're not really used all that much, and so are not worth the
> > >>> trouble"? This is perhaps something we all knew in our hearts, but
> > >>> nevertheless felt a compulsion to tackle anyway, right?
> > >>>
> > >>> The number of fuzzy-location-related cycles the open-bio community
> > >>> has collectively burned over the years perhaps isn't for naught:
> > >>> there will still be legacy data to deal with, and perhaps other
> > >>> feature annotation data models still use them. EMBLxml does. I
> > >>> know DAS/2 does not and has no plans to, and it looks like GAME XML
> > >>> also does not. Anyone else?
> > >>>
> > >>> I imagine EMBL and DDBJ will follow suit in banishing fuzzy
> > >>> locations as well. Anyone know?
> > >>>
> > >>> Steve
> > >>>
> > >>> On Aug 18, 2006, at 9:08 PM, Hilmar Lapp wrote:
> > >>>
> > >>>> Great, the fewer fuzzy locations the better. -hilmar
> > >>>>
> > >>>> On Aug 19, 2006, at 12:03 AM, Chris Fields wrote:
> > >>>>
> > >>>>> Don't know how much this will affect Bio::Location::Fuzzy, but I
> > >>>>> thought it might be worth a heads-up in case something pops up:
> > >>>>>
> > >>>>> From the latest GenBank release (154.0):
> > >>>>>
> > >>>>> ...
> > >>>>>
> > >>>>> 1.4.6 Feature location syntax X.Y to be discontinued
> > >>>>>
> > >>>>>    The Feature Table currently supports feature locations of the
> > >>>>> format X.Y, to represent a base position which is greater or
> > >>>>> equal to X, and less than or equal to Y. For example:
> > >>>>>
> > >>>>>    misc_feature    1.10..20
> > >>>>>    misc_feature    join(100..150, 200.210..250)
> > >>>>>
> > >>>>>    In the first example, the misc_feature starts somewhere between
> > >>>>> bases 1 and 10 (inclusive), and ends at basepair 20. In the second,
> > >>>>> the 51 bases from 100..150 are joined together with a second
> > >>>>> basepair interval, which could be anywhere from 200..250 to
> > >>>>> 210..250.
> > >>>>>
> > >>>>>    Although this syntax seems like a reasonable way to capture an
> > >>>>> uncertain interval, it is used for features on a vanishingly small
> > >>>>> number of sequence records, most database submission mechanisms
> > >>>>> don't support it, and the meaning of its use in a join() context
> > >>>>> is not entirely clear.
> > >>>>>
> > >>>>>    As of October 2006, this type of location will no longer be
> > >>>>> supported. Those records with features which utilize X.Y locations
> > >>>>> will be reviewed and converted to a non-uncertain format prior to
> > >>>>> that date.
> > >>>>>
> > >>>>>
> > >>>>> Christopher Fields
> > >>>>> Postdoctoral Researcher
> > >>>>> Lab of Dr. Robert Switzer
> > >>>>> Dept of Biochemistry
> > >>>>> University of Illinois Urbana-Champaign
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> _______________________________________________
> > >>>>> Bioperl-l mailing list
> > >>>>> Bioperl-l at lists.open-bio.org
> > >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >>>>>
> > >>>>
> > >>>> --
> > >>>> ===========================================================
> > >>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> > >>>> ===========================================================
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>
> > >>
> > >
> > >
> > >
> > > --
> > > Lincoln D. Stein
> > > Cold Spring Harbor Laboratory
> > > 1 Bungtown Road
> > > Cold Spring Harbor, NY 11724
> > > (516) 367-8380 (voice)
> > > (516) 367-8389 (fax)
> > > FOR URGENT MESSAGES & SCHEDULING,
> > > PLEASE CONTACT MY ASSISTANT,
> > > SANDRA MICHELSEN, AT michelse at cshl.edu
> > >
> >
> >
> >
> >
> 