[Bioperl-l] [Gmod-schema] Circular genomes in Chado/BioPerl
Chris Mungall
cjm at berkeleybop.org
Tue Sep 9 22:56:55 UTC 2008
I think I am happy with the modulo approach.
Though I believe we first of all need a formal specification of
genome interval semantics that is independent of any particular syntax
or implementation. This can be a fairly short specification - along
the lines of what Lincoln has written below (although I would
naturally prefer the normative version to be interbase - this doesn't
preclude derived axioms in GFF coordinates).
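
As an illustration of what "derived axioms in GFF coordinates" could look
like, here is a minimal Python sketch. It assumes the usual conventions:
interbase intervals are 0-based and half-open (as with Chado's fmin/fmax),
and GFF intervals are 1-based and closed. The function names are
hypothetical and not part of any existing API.

```python
def interbase_to_gff(fmin, fmax):
    """Convert an interbase interval (fmin, fmax) to 1-based closed (start, end)."""
    # Assumption: fmin is the number of bases to the left of the feature.
    return fmin + 1, fmax


def gff_to_interbase(start, end):
    """Convert a 1-based closed interval (start, end) to interbase (fmin, fmax)."""
    return start - 1, end


def interbase_length(fmin, fmax):
    """Length in interbase coordinates: a plain difference."""
    return fmax - fmin


def gff_length(start, end):
    """Length in GFF coordinates: the derived form needs the '+ 1'."""
    return end - start + 1
```

Under these assumptions the two length definitions agree for any interval
that has been converted from one system to the other, so the GFF form can be
derived from the interbase one.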
This spec should also define and standardize the terminology used:
Lincoln draws a distinction between 'stop' and 'end'. I'm relatively
happy with these terms - however, the choice we make needs to become
enshrined, otherwise we'll end up with confusion and mismatches between
software and specification.
One clarification:
> revolutions = int (length/genome)
This axiom is presumably contextual on the genome being circular, which
will have to be indicated using a new flag, as Jim suggests, yep?
So the context-independent axiom would be:
> revolutions = IF src_is_circular THEN int (length/genome) ELSE 0
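
The context-independent axiom can be sketched in Python as follows. This is
only an illustration under stated assumptions: coordinates are 1-based and
closed, as in Lincoln's example quoted below; `genome_length` and
`src_is_circular` are hypothetical parameter names; and the wrapped stop
position is derived directly from `end`. For the quoted example
(start = 50, end = 260, genome = 100) that derivation gives 60, matching the
stated end position, whereas `length % genome + 1` as written gives 12, so
the intended formula may differ from this sketch.

```python
def feature_length(start, end):
    """Length of a 1-based closed interval."""
    return end - start + 1


def revolutions(start, end, genome_length, src_is_circular):
    """Number of complete turns around the source; always 0 for a linear source."""
    if not src_is_circular:
        return 0
    return feature_length(start, end) // genome_length


def wrapped_stop(end, genome_length, src_is_circular):
    """Position of `end` on the source after wrapping (assumed 1-based)."""
    if not src_is_circular:
        return end
    # Shift to 0-based, wrap, shift back, so that end == genome_length
    # maps to genome_length rather than to 0.
    return (end - 1) % genome_length + 1


if __name__ == "__main__":
    print(revolutions(50, 260, 100, True))
    print(wrapped_stop(260, 100, True))
```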
On Sep 9, 2008, at 10:52 AM, Lincoln Stein wrote:
> It seems to me that the proposed modulus syntax handles multiple
> revolutions. Consider a 100 bp genome (to make it simple) and a
> feature that starts at 50, goes around twice, and ends at position 60:
>
> start = 50
> end = 260
>
> length = end - start + 1
> revolutions = int (length/genome)
> stop position = length % genome + 1
>
> Lincoln
>
> On Mon, Sep 8, 2008 at 3:57 PM, Aaron Mackey <ajmackey at gmail.com>
> wrote:
>
>> How can you handle features that may cross the origin more than once?
>> The modulus, though simple, seems to be only half the solution. It
>> also makes it difficult to place features in the genome "by eye"
>> (having to do the modulus subtraction in my head), or in
>> sorting/filtering operations.
>>
>> I have an alternative that I wondered if you considered: allow the
>> start/end to have an additional "circular revolution" prefix:
>>
>> a typical range tuple like: 100 200 -
>> is thus shorthand for: 0:100 0:200 -
>> (i.e. both the 100 and 200 are in the same "revolution" around the
>> genome)
>>
>> and is then distinguishable from an "around the genome + 100"
>> feature of:
>> 1:100 0:200 -
>>
>> Just an alternative to consider (if you haven't already). I'm not
>> wedded to the syntax, but I wouldn't want to see new columns in GFF
>> just for this. Essentially, what you want is some form of compound
>> polar coordinates, it seems.
>>
>> -Aaron
>>
>> On Mon, Sep 8, 2008 at 2:44 PM, Jim Hu <jimhu at tamu.edu> wrote:
>>> In discussions with GMOD about Gbrowse, we've come up with a proposal for
>>> handling circular genomes and features that cross the origin in such
>>> genomes. This applies to lots of prokaryotic and viral genomes, and might
>>> be valuable for some ways of representing terminally redundant linear
>>> genomes.
>>> 1) Keep the requirement that start < end
>>> 2) allow end > parent feature length
>>> 3) parent feature gets an is_circular boolean
>>> 4) use modular arithmetic to calculate the real position of end on the
>>> parent feature.
>>> We'd like to do this in a way that will be consistent with Chado and BioPerl
>>> representation of features as much as possible (realizing that there is the
>>> usual interbase or not coordinate issue). What do people think? Lincoln is
>>> on board for modifying the GFF3 spec.
>>> Thanks!
>>> Jim Hu
>>>
>>> =====================================
>>> Jim Hu
>>> Associate Professor
>>> Dept. of Biochemistry and Biophysics
>>> 2128 TAMU
>>> Texas A&M Univ.
>>> College Station, TX 77843-2128
>>> 979-862-4054
>>>
>>> _______________________________________________
>>> Gmod-schema mailing list
>>> Gmod-schema at lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>>>
>>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>
>
> --
> Lincoln D. Stein
>
> Ontario Institute for Cancer Research
> 101 College St., Suite 800
> Toronto, ON, Canada M5G0A3
> 416 673-8514
> Assistant: Stacey Quinn <Stacey.Quinn at oicr.on.ca>
>
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724 USA
> (516) 367-8380
> Assistant: Sandra Michelsen <michelse at cshl.edu>