[Bioperl-l] Fuzzy Locations and GenBank
Hilmar Lapp
hlapp at gmx.net
Mon Aug 21 18:59:57 UTC 2006
Well, they're actually not dead yet. Just one variant died. I'm
hoping though that this is just a step on the road that indeed ends
in their death.
-hilmar
On Aug 21, 2006, at 1:34 PM, Lincoln Stein wrote:
> I am tempted to start dancing around my office singing "Ding dong
> the fuzzy feature is dead!" Break out the champagne!!
>
> Lincoln
>
> On 8/21/06, Chris Fields <cjfields at uiuc.edu> wrote:
>>
>> Steve
>>
>> There is this in the EMBL Release 87 notes:
>>
>>
>> http://www.ebi.ac.uk/embl/Documentation/Release_notes/current/relnotes.html
>>
>> ..
>> 2 CHANGES IN THIS RELEASE
>>
>> 2.1 Changes to the Feature Table Document: Chapter 3.5 "Location"
>>
>> The use of range (.) descriptor within location spans is no longer
>> legal.
>> ..
>>
>> So, yes, looks like EMBL is doing the same thing. I am guessing DDBJ
>> is also.
>>
>> I didn't see anything in the recent revision for the INSDSeqXML DTD,
>> but I don't think a change in the DTD would be needed to accommodate
>> the removal of 'fuzzy' locations of X.Y type, unless the DTD has
>> specific rules on how to format fuzzy location data. Same for the
>> other formats (EMBLXML, etc) as the change is rather small (but very
>> significant).
>>
>> I'm guessing changes to other formats (game, etc) that rely on
>> GenBank/EMBL will occur if they specifically deal with these in some
>> way.
>>
>> It is nice to know that BioPerl won't be seriously affected by this.
>> As you noted, we'll have to keep X.Y fuzzy functionality around to
>> accommodate legacy data, but should we add warnings for this?
>>
>> Chris
>>
>>
>>> -----Original Message-----
>>> From: Steve Chervitz [mailto:sac at open-bio.org]
>>> Sent: Sunday, August 20, 2006 10:56 PM
>>> To: Hilmar Lapp
>>> Cc: Chris Fields; Bioperl List
>>> Subject: Re: [Bioperl-l] Fuzzy Locations and GenBank
>>>
>>> Ah, one of the banes of bioinformatics data modeling is finally
>>> being laid to rest. Those who have struggled with it (myself
>>> included) should not let this occasion pass without notice. Here
>>> are some reflections.
>>>
>>> Check out the captions under photos #2 and #3 here:
>>> http://gallery.open-bio.org/gallery2/v/hackathon2002/dagphotos/?g2_page=7
>>>
>>> Isn't it fitting, now that the open-bio.org toolkits have systems in
>>> place to deal with fuzzy locations, the NCBI says, "well, they're not
>>> really used all that much, and so are not worth the trouble". This
>>> is perhaps something we all knew in our hearts, but nevertheless
>>> felt compelled to tackle anyway, right?
>>>
>>> The amount of fuzzy location-related cycles the open-bio community
>>> has collectively burned over the years perhaps isn't for naught:
>>> There will still be legacy data to deal with, and perhaps other
>>> feature annotation data models still use them. EMBLxml does. I know
>>> DAS/2 does not and has no plans to, and looks like GAME XML also
>>> does not. Anyone else?
>>>
>>> I imagine EMBL and DDBJ will follow suit in banishing fuzzy
>>> locations as well. Anyone know?
>>>
>>> Steve
>>>
>>> On Aug 18, 2006, at 9:08 PM, Hilmar Lapp wrote:
>>>
>>>> Great, the fewer fuzzy locations the better. -hilmar
>>>>
>>>> On Aug 19, 2006, at 12:03 AM, Chris Fields wrote:
>>>>
>>>>> Don't know how much this will affect Bio::Location::Fuzzy, but I
>>>>> thought it might be worth a heads-up in case something pops up:
>>>>>
>>>>> From the latest GenBank release (154.0):
>>>>>
>>>>> ...
>>>>>
>>>>> 1.4.6 Feature location syntax X.Y to be discontinued
>>>>>
>>>>> The Feature Table currently supports feature locations of the
>>>>> format X.Y, to represent a base position which is greater than or
>>>>> equal to X, and less than or equal to Y. For example:
>>>>>
>>>>> misc_feature 1.10..20
>>>>> misc_feature join(100..150,200.210..250)
>>>>>
>>>>> In the first example, the misc_feature starts somewhere between
>>>>> bases 1 and 10 (inclusive), and ends at basepair 20. In the second,
>>>>> the 51 bases from 100..150 are joined together with a second
>>>>> basepair interval, which could be anywhere from 200..250 to
>>>>> 210..250.
>>>>>
>>>>> Although this syntax seems like a reasonable way to capture an
>>>>> uncertain interval, it is used for features on a vanishingly small
>>>>> number of sequence records, most database submission mechanisms
>>>>> don't support it, and the meaning of its use in a join() context
>>>>> is not entirely clear.
>>>>>
>>>>> As of October 2006, this type of location will no longer be
>>>>> supported. Those records with features which utilize X.Y locations
>>>>> will be reviewed and converted to a non-uncertain format prior to
>>>>> that date.
>>>>>
>>>>>
>>>>> Christopher Fields
>>>>> Postdoctoral Researcher
>>>>> Lab of Dr. Robert Switzer
>>>>> Dept of Biochemistry
>>>>> University of Illinois Urbana-Champaign
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>
>>>> --
>>>> ===========================================================
>>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
>>>> ===========================================================
>>>>
>>
>>
>
>
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu
>
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================
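
For anyone who still has to read legacy records, here is a rough sketch of the X.Y syntax quoted from the GenBank 154.0 notes further down the thread, with the kind of warning Chris asks about. It is written in Python only for illustration and is not how Bio::Location::Fuzzy does it. The names Position, parse_position and parse_span are made up for this example. It handles a single start..end span only, not join() or any other operator.

```python
import re
import warnings
from typing import NamedTuple, Tuple


class Position(NamedTuple):
    """One end of a feature span (hypothetical type, for illustration only)."""
    min_pos: int
    max_pos: int  # same as min_pos when the position is exact

    @property
    def is_fuzzy(self) -> bool:
        return self.min_pos != self.max_pos


# "20" is an exact position; "1.10" means somewhere from 1 to 10 inclusive.
_POSITION = re.compile(r"^(\d+)(?:\.(\d+))?$")


def parse_position(text: str) -> Position:
    """Parse an exact position or a legacy X.Y position."""
    match = _POSITION.match(text)
    if match is None:
        raise ValueError(f"unsupported position: {text!r}")
    low = int(match.group(1))
    high = int(match.group(2)) if match.group(2) else low
    if high < low:
        raise ValueError(f"X must not exceed Y in {text!r}")
    if high != low:
        # Still accept the legacy form, but tell the caller about it.
        warnings.warn(f"legacy X.Y position {text!r} is no longer legal syntax",
                      stacklevel=2)
    return Position(low, high)


def parse_span(text: str) -> Tuple[Position, Position]:
    """Parse a simple 'start..end' span such as '1.10..20'."""
    start, sep, end = text.partition("..")
    if not sep:
        raise ValueError(f"not a start..end span: {text!r}")
    return parse_position(start), parse_position(end)


if __name__ == "__main__":
    start, end = parse_span("1.10..20")
    print(start, start.is_fuzzy)
    print(end, end.is_fuzzy)
```

Whether to warn or to stay silent on legacy input is a judgment call; the sketch warns once per fuzzy position and otherwise parses as before.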