[Bioperl-l] Fuzzy Locations and GenBank
Lincoln Stein
lstein at cshl.edu
Mon Aug 21 19:18:52 UTC 2006
This was the most common variant, right?
Lincoln
On 8/21/06, Hilmar Lapp <hlapp at gmx.net> wrote:
>
> Well, they're actually not dead yet. Just one variant died. I'm
> hoping though that this is just a step on the road that indeed ends
> in their death.
>
> -hilmar
>
> On Aug 21, 2006, at 1:34 PM, Lincoln Stein wrote:
>
> > I am tempted to start dancing around my office singing "Ding dong the
> > fuzzy feature is dead!" Break out the champagne!!
> >
> > Lincoln
> >
> > On 8/21/06, Chris Fields <cjfields at uiuc.edu> wrote:
> >>
> >> Steve
> >>
> >> There is this in the EMBL Release 87 notes:
> >>
> >>
> >> http://www.ebi.ac.uk/embl/Documentation/Release_notes/current/relnotes.html
> >>
> >> ..
> >> 2 CHANGES IN THIS RELEASE
> >>
> >> 2.1 Changes to the Feature Table Document: Chapter 3.5 "Location"
> >>
> >> The use of range (.) descriptor within location spans is no longer
> >> legal.
> >> ..
> >>
> >> So, yes, it looks like EMBL is doing the same thing. I am guessing DDBJ
> >> is also.
> >>
> >> I didn't see anything in the recent revision of the INSDSeqXML DTD, but I
> >> don't think a change in the DTD would be needed to accommodate the removal
> >> of 'fuzzy' locations of the X.Y type, unless the DTD has specific rules on
> >> how to format fuzzy location data. The same goes for the other formats
> >> (EMBLXML, etc.), as the change is rather small (but very significant).
> >>
> >> I'm guessing changes to other formats (GAME, etc.) that rely on
> >> GenBank/EMBL will occur if they specifically deal with these in some way.
> >>
> >> It is nice to know that BioPerl won't be seriously affected by this.
> >> As you noted, we'll have to keep X.Y fuzzy functionality around to
> >> accommodate legacy data, but should we add warnings for this?
> >>
> >> Chris
> >>
> >>
> >>> -----Original Message-----
> >>> From: Steve Chervitz [mailto:sac at open-bio.org]
> >>> Sent: Sunday, August 20, 2006 10:56 PM
> >>> To: Hilmar Lapp
> >>> Cc: Chris Fields; Bioperl List
> >>> Subject: Re: [Bioperl-l] Fuzzy Locations and GenBank
> >>>
> >>> Ah, one of the banes of bioinformatics data modeling is finally being
> >>> laid to rest. Those who have struggled with it (myself included)
> >>> should not let this occasion pass without notice. Here are some
> >>> reflections.
> >>>
> >>> Check out the captions under photos #2 and #3 here:
> >>> http://gallery.open-bio.org/gallery2/v/hackathon2002/dagphotos/?g2_page=7
> >>>
> >>> Isn't it fitting, now that the open-bio.org toolkits have systems in
> >>> place to deal with fuzzy locations, the NCBI says, "well, they're not
> >>> really used all that much, and so are not worth the trouble". This is
> >>> perhaps something we all knew in our hearts, but nevertheless felt a
> >>> compulsion to tackle anyway, right?
> >>>
> >>> The fuzzy location-related cycles the open-bio community has
> >>> collectively burned over the years perhaps weren't for naught:
> >>> There will still be legacy data to deal with, and perhaps other
> >>> feature annotation data models still use them. EMBLxml does. I know
> >>> DAS/2 does not and has no plans to, and it looks like GAME XML also
> >>> does not. Anyone else?
> >>>
> >>> I imagine EMBL and DDBJ will follow suit in banishing fuzzy locations
> >>> as well. Anyone know?
> >>>
> >>> Steve
> >>>
> >>> On Aug 18, 2006, at 9:08 PM, Hilmar Lapp wrote:
> >>>
> >>>> Great, the fewer fuzzy locations the better. -hilmar
> >>>>
> >>>> On Aug 19, 2006, at 12:03 AM, Chris Fields wrote:
> >>>>
> >>>>> Don't know how much this will affect Bio::Location::Fuzzy, but I
> >>>>> thought it might be worth a heads-up in case something pops up:
> >>>>>
> >>>>> From the latest GenBank release (154.0):
> >>>>>
> >>>>> ...
> >>>>>
> >>>>> 1.4.6 Feature location syntax X.Y to be discontinued
> >>>>>
> >>>>> The Feature Table currently supports feature locations of the
> >>>>> format X.Y, to represent a base position which is greater or
> >>>>> equal to X, and less than or equal to Y. For example:
> >>>>>
> >>>>> misc_feature 1.10..20
> >>>>> misc_feature join(100..150,200.210..250)
> >>>>>
> >>>>> In the first example, the misc_feature starts somewhere between
> >>>>> bases 1 and 10 (inclusive), and ends at basepair 20. In the second,
> >>>>> the 51 bases from 100..150 are joined together with a second basepair
> >>>>> interval, which could be anywhere from 200..250 to 210..250.
> >>>>>
> >>>>> Although this syntax seems like a reasonable way to capture an
> >>>>> uncertain interval, it is used for features on a vanishingly small
> >>>>> number of sequence records, most database submission mechanisms
> >>>>> don't support it, and the meaning of its use in a join() context
> >>>>> is not entirely clear.
> >>>>>
> >>>>> As of October 2006, this type of location will no longer be
> >>>>> supported. Those records with features which utilize X.Y locations
> >>>>> will be reviewed and converted to a non-uncertain format prior to
> >>>>> that date.
> >>>>>
> >>>>>
> >>>>> Christopher Fields
> >>>>> Postdoctoral Researcher
> >>>>> Lab of Dr. Robert Switzer
> >>>>> Dept of Biochemistry
> >>>>> University of Illinois Urbana-Champaign
> >>>>>
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> Bioperl-l mailing list
> >>>>> Bioperl-l at lists.open-bio.org
> >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>
> >>>>
> >>>> --
> >>>> ===========================================================
> >>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> >>>> ===========================================================
> >>>>
> >>
> >
> >
> >
> > --
> > Lincoln D. Stein
> > Cold Spring Harbor Laboratory
> > 1 Bungtown Road
> > Cold Spring Harbor, NY 11724
> > (516) 367-8380 (voice)
> > (516) 367-8389 (fax)
> > FOR URGENT MESSAGES & SCHEDULING,
> > PLEASE CONTACT MY ASSISTANT,
> > SANDRA MICHELSEN, AT michelse at cshl.edu
> >
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
>
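
For anyone who still has to read legacy records, the X.Y semantics quoted
from the GenBank 154.0 notes above can be sketched as follows. This is a
minimal illustration in Python, not BioPerl code and not how
Bio::Location::Fuzzy is implemented. The names parse_endpoint and parse_span
are hypothetical, only simple "start..end" spans are handled (no join(), no
< or > markers), and the warning is just one possible answer to Chris's
question about warning on legacy data.

```python
import re
import warnings

# One endpoint of a span: a plain position ("20") or a legacy
# X.Y range ("1.10"), meaning "some base from X to Y inclusive".
_ENDPOINT = re.compile(r"^(\d+)(?:\.(\d+))?$")


def parse_endpoint(text):
    """Return (min_pos, max_pos) for a single endpoint (hypothetical helper)."""
    match = _ENDPOINT.match(text)
    if match is None:
        raise ValueError(f"unsupported endpoint: {text!r}")
    low = int(match.group(1))
    if match.group(2) is None:
        # Exact position: both bounds are the same base.
        return (low, low)
    high = int(match.group(2))
    if high < low:
        raise ValueError(f"X.Y endpoint with Y < X: {text!r}")
    # Assumption: warn (rather than fail) so legacy data stays readable.
    warnings.warn(f"legacy X.Y location {text!r} is being discontinued")
    return (low, high)


def parse_span(text):
    """Return ((start_min, start_max), (end_min, end_max)) for 'start..end'."""
    start_text, sep, end_text = text.partition("..")
    if not sep:
        # A lone endpoint: start and end share the same bounds.
        bounds = parse_endpoint(text)
        return (bounds, bounds)
    return (parse_endpoint(start_text), parse_endpoint(end_text))
```

With this sketch, parse_span("1.10..20") gives ((1, 10), (20, 20)): a start
somewhere in bases 1 to 10 and an exact end at base 20, matching the first
example in the release notes.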
--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu