[Bioperl-l] Fuzzy Locations and GenBank

Lincoln Stein lstein at cshl.edu
Mon Aug 21 17:34:10 UTC 2006


I am tempted to start dancing around my office singing "Ding dong the fuzzy
feature is dead!" Break out the champagne!!

Lincoln

On 8/21/06, Chris Fields <cjfields at uiuc.edu> wrote:
>
> Steve
>
> There is this in the EMBL Release 87 notes:
>
>
> http://www.ebi.ac.uk/embl/Documentation/Release_notes/current/relnotes.html
>
> ..
> 2 CHANGES IN THIS RELEASE
>
> 2.1 Changes to the Feature Table Document: Chapter 3.5 "Location"
>
> The use of range (.) descriptor within location spans is no longer legal.
> ..
>
> So, yes, looks like EMBL is doing the same thing.  I am guessing DDBJ is
> also.
>
> I didn't see anything in the recent revision for the INSDSeqXML DTD, but I
> don't think a change in the DTD would be needed to accommodate the removal
> of 'fuzzy' locations of X.Y type, unless the DTD has specific rules on how
> to format fuzzy location data.  Same for the other formats (EMBLXML, etc.)
> as the change is rather small (but very significant).
>
> I'm guessing changes to other formats (game, etc.) that rely on
> GenBank/EMBL will occur if they specifically deal with these in some way.
>
> It is nice to know that BioPerl won't be seriously affected by this.
> As you noted, we'll have to keep X.Y fuzzy functionality around to
> accommodate legacy data, but should we add warnings for this?
>
> Chris
>
>
> > -----Original Message-----
> > From: Steve Chervitz [mailto:sac at open-bio.org]
> > Sent: Sunday, August 20, 2006 10:56 PM
> > To: Hilmar Lapp
> > Cc: Chris Fields; Bioperl List
> > Subject: Re: [Bioperl-l] Fuzzy Locations and GenBank
> >
> > Ah, one of the banes of bioinformatics data modeling is finally being
> > laid to rest. Those who have struggled with it (myself included)
> > should not let this occasion pass without notice. Here are some
> > reflections.
> >
> > Check out the captions under photos #2 and #3 here:
> > http://gallery.open-bio.org/gallery2/v/hackathon2002/dagphotos/?g2_page=7
> >
> > Isn't it fitting, now that the open-bio.org toolkits have systems in
> > place to deal with fuzzy locations, the NCBI says, "well, they're not
> > really used all that much, and so are not worth the trouble". This is
> > perhaps something we all knew in our hearts, but nevertheless felt a
> > compulsion to tackle anyway, right?
> >
> > The number of fuzzy location-related cycles the open-bio community
> > has collectively burned over the years perhaps isn't for naught:
> > There will still be legacy data to deal with, and perhaps other
> > feature annotation data models still use them. EMBLxml does. I know
> > DAS/2 does not and has no plans to, and looks like GAME XML also does
> > not. Anyone else?
> >
> > I imagine EMBL and DDBJ will follow suit in banishing fuzzy locations
> > as well. Anyone know?
> >
> > Steve
> >
> > On Aug 18, 2006, at 9:08 PM, Hilmar Lapp wrote:
> >
> > > Great, the fewer fuzzy locations the better. -hilmar
> > >
> > > On Aug 19, 2006, at 12:03 AM, Chris Fields wrote:
> > >
> > >> Don't know how much this will affect Bio::Location::Fuzzy, but I
> > >> thought it might be worth a heads-up in case something pops up:
> > >>
> > >>  From the latest GenBank release (154.0):
> > >>
> > >> ...
> > >>
> > >> 1.4.6 Feature location syntax X.Y to be discontinued
> > >>
> > >>    The Feature Table currently supports feature locations of the
> > >> format X.Y, to represent a base position which is greater or
> > >> equal to X, and less than or equal to Y. For example:
> > >>
> > >>    misc_feature    1.10..20
> > >>    misc_feature    join(100..150,200.210..250)
> > >>
> > >>    In the first example, the misc_feature starts somewhere between
> > >> bases 1 and 10 (inclusive), and ends at basepair 20. In the second,
> > >> the 51 bases from 100..150 are joined together with a second basepair
> > >> interval, which could be anywhere from 200..250 to 210..250 .
> > >>
> > >>    Although this syntax seems like a reasonable way to capture an
> > >> uncertain interval, it is used for features on a vanishingly small
> > >> number of sequence records, most database submission mechanisms
> > >> don't support it, and the meaning of its use in a join() context
> > >> is not entirely clear.
> > >>
> > >>    As of October 2006, this type of location will no longer be
> > >> supported. Those records with features which utilize X.Y locations
> > >> will be reviewed and converted to a non-uncertain format prior to
> > >> that date.
> > >>
> > >>
> > >> Christopher Fields
> > >> Postdoctoral Researcher
> > >> Lab of Dr. Robert Switzer
> > >> Dept of Biochemistry
> > >> University of Illinois Urbana-Champaign
> > >>
> > >>
> > >>
> > >> _______________________________________________
> > >> Bioperl-l mailing list
> > >> Bioperl-l at lists.open-bio.org
> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >>
> > >
> > > --
> > > ===========================================================
> > > : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> > > ===========================================================
> > >
> > >
> > >
> > >
> > >
>
>
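
For anyone who still has to read legacy records, the X.Y syntax described in
the GenBank release notes quoted above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not how
Bio::Location::Fuzzy or any GenBank tool actually does it: the names
parse_position, parse_span, parse_location and FuzzyLocationWarning are
hypothetical, and the sketch handles only plain integers, X.Y positions,
simple start..end spans and a flat join(). Other location operators are
assumed to be out of scope. The warning is one possible answer to the
question of whether legacy X.Y data should be flagged.

```python
import re
import warnings


class FuzzyLocationWarning(UserWarning):
    """Hypothetical warning category for legacy X.Y positions."""


# A position is a plain integer ("20") or the legacy X.Y form ("1.10").
_POSITION = re.compile(r"^(\d+)(?:\.(\d+))?$")


def parse_position(text):
    """Return (low, high) bounds for one position; low == high when exact."""
    match = _POSITION.match(text)
    if match is None:
        raise ValueError(f"unsupported position: {text!r}")
    low = int(match.group(1))
    high = int(match.group(2)) if match.group(2) is not None else low
    if low > high:
        raise ValueError(f"X must not exceed Y in {text!r}")
    if low != high:
        # Flag the discontinued syntax but keep parsing legacy data.
        warnings.warn(f"legacy X.Y position {text!r}", FuzzyLocationWarning)
    return low, high


def parse_span(text):
    """Parse 'start..end' (or a single position) into a pair of bounds."""
    start, sep, end = text.partition("..")
    if not sep:
        bounds = parse_position(text)
        return bounds, bounds
    return parse_position(start), parse_position(end)


def parse_location(text):
    """Parse one span, or a flat join(...) of spans, into a list of spans."""
    text = text.strip()
    if text.startswith("join(") and text.endswith(")"):
        parts = text[len("join("):-1].split(",")
    else:
        parts = [text]
    return [parse_span(part.strip()) for part in parts]
```

With the two examples from the release notes, parse_location("1.10..20")
returns [((1, 10), (20, 20))], and
parse_location("join(100..150,200.210..250)") returns
[((100, 100), (150, 150)), ((200, 210), (250, 250))]; each call raises one
FuzzyLocationWarning. Keeping both bounds leaves the choice of how to
collapse an uncertain start to the caller, since the release notes do not
say how the records will be converted.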



-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


