[Bioperl-l] Fuzzy Locations and GenBank

Chris Fields cjfields at uiuc.edu
Mon Aug 21 17:44:31 UTC 2006


Glad to be the bearer of good news!

 

Chris

 

  _____  

From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of
Lincoln Stein
Sent: Monday, August 21, 2006 12:34 PM
To: Chris Fields
Cc: Steve Chervitz; Hilmar Lapp; Bioperl List
Subject: Re: [Bioperl-l] Fuzzy Locations and GenBank

 

I am tempted to start dancing around my office singing "Ding dong the fuzzy
feature is dead!" Break out the champagne!!

Lincoln

On 8/21/06, Chris Fields <cjfields at uiuc.edu> wrote:

Steve

There is this in the EMBL Release 87 notes:

http://www.ebi.ac.uk/embl/Documentation/Release_notes/current/relnotes.html 

..
2 CHANGES IN THIS RELEASE

2.1 Changes to the Feature Table Document: Chapter 3.5 "Location"

The use of range (.) descriptor within location spans is no longer legal.
..

So, yes, looks like EMBL is doing the same thing.  I am guessing DDBJ is
also.

I didn't see anything in the recent revision for the INSDSeqXML DTD, but I
don't think a change in the DTD would be needed to accommodate the removal 
of 'fuzzy' locations of X.Y type, unless the DTD has specific rules on how
to format fuzzy location data.  Same for the other formats (EMBLXML, etc) as
the change is rather small (but very significant).

I'm guessing changes to other formats (game, etc) that rely on GenBank/EMBL 
will occur if they specifically deal with these in some way.

It is nice to know that BioPerl won't be seriously affected by this.
As you noted, we'll have to keep X.Y fuzzy functionality around to
accommodate legacy data, but should we add warnings for this? 

Chris


> -----Original Message-----
> From: Steve Chervitz [mailto:sac at open-bio.org]
> Sent: Sunday, August 20, 2006 10:56 PM
> To: Hilmar Lapp
> Cc: Chris Fields; Bioperl List
> Subject: Re: [Bioperl-l] Fuzzy Locations and GenBank
>
> Ah, one of the banes of bioinformatics data modeling is finally being
> laid to rest. Those who have struggled with it (myself included) 
> should not let this occasion pass without notice. Here are some
> reflections.
>
> Check out the captions under photos #2 and #3 here:
> http://gallery.open-bio.org/gallery2/v/hackathon2002/dagphotos/?g2_page=7
>
> Isn't it fitting that, now that the open-bio.org toolkits have systems in
> place to deal with fuzzy locations, the NCBI says, "well, they're not
> really used all that much, and so are not worth the trouble". This is
> perhaps something we all knew in our hearts, but nevertheless felt a
> compulsion to tackle anyway, right?
>
> The fuzzy location-related cycles the open-bio community has
> collectively burned over the years perhaps weren't for naught:
> There will still be legacy data to deal with, and perhaps other
> feature annotation data models still use them. EMBLxml does. I know 
> DAS/2 does not and has no plans to, and looks like GAME XML also does
> not. Anyone else?
>
> I imagine EMBL and DDBJ will follow suit in banishing fuzzy locations
> as well. Anyone know?
>
> Steve
>
> On Aug 18, 2006, at 9:08 PM, Hilmar Lapp wrote:
>
> > Great, the fewer fuzzy locations the better. -hilmar
> >
> > On Aug 19, 2006, at 12:03 AM, Chris Fields wrote: 
> >
> >> Don't know how much this will affect Bio::Location::Fuzzy, but I
> >> thought it might be worth a heads-up in case something pops up:
> >>
> >>  From the latest GenBank release ( 154.0):
> >>
> >> ...
> >>
> >> 1.4.6 Feature location syntax X.Y to be discontinued
> >>
> >>    The Feature Table currently supports feature locations of the 
> >> format X.Y, to represent a base position which is greater or
> >> equal to X, and less than or equal to Y. For example:
> >>
> >>    misc_feature    1.10..20
> >>    misc_feature    join(100..150, 200.210..250)
> >>
> >>    In the first example, the misc_feature starts somewhere between
> >> bases 1 and 10 (inclusive), and ends at basepair 20. In the second,
> >> the 51 bases from 100..150 are joined together with a second basepair 
> >> interval, which could be anywhere from 200..250 to 210..250 .
> >>
> >>    Although this syntax seems like a reasonable way to capture an
> >> uncertain interval, it is used for features on a vanishingly small 
> >> number of sequence records, most database submission mechanisms
> >> don't support it, and the meaning of its use in a join() context
> >> is not entirely clear.
> >>
> >>    As of October 2006, this type of location will no longer be
> >> supported. Those records with features which utilize X.Y locations
> >> will be reviewed and converted to a non-uncertain format prior to 
> >> that date.
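
The X.Y semantics described in the release note above can be sketched in a few lines. This is only an illustration in Python, not BioPerl's Bio::Location::Fuzzy code: the names parse_position and parse_span are hypothetical, join() and other operators are deliberately not handled, and the warning on legacy X.Y input is one possible answer to the "should we add warnings" question raised in this thread, not an agreed design.

```python
import re
import warnings

# Hypothetical helpers for illustration; not part of BioPerl or any library.
# A position is either an exact base "N" or a legacy fuzzy position "X.Y".
_POSITION = re.compile(r"^(\d+)(?:\.(\d+))?$")


def parse_position(text):
    """Return (min_base, max_base) for an exact or X.Y position."""
    match = _POSITION.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported position: {text!r}")
    low = int(match.group(1))
    if match.group(2) is None:
        # Exact position: both bounds are the same base.
        return (low, low)
    high = int(match.group(2))
    if high < low:
        raise ValueError(f"X.Y position with Y < X: {text!r}")
    # Assumption: legacy data is still accepted, but flagged to the caller.
    warnings.warn(f"legacy X.Y position {text!r}", UserWarning, stacklevel=2)
    return (low, high)


def parse_span(text):
    """Split 'start..end' and return the bounds of each endpoint."""
    start, sep, end = text.strip().partition("..")
    if not sep:
        raise ValueError(f"not a start..end span: {text!r}")
    return (parse_position(start), parse_position(end))
```

With these definitions, parse_span("1.10..20") returns ((1, 10), (20, 20)): the start lies somewhere between bases 1 and 10 inclusive and the end is exactly base 20, matching the first misc_feature example.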
> >>
> >>
> >> Christopher Fields
> >> Postdoctoral Researcher
> >> Lab of Dr. Robert Switzer
> >> Dept of Biochemistry
> >> University of Illinois Urbana-Champaign
> >>
> >>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >> 
> >
> > --
> > ===========================================================
> > : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> > =========================================================== 
> >




-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu 



