[Bioperl-l] [Gmod-schema] Circular genomes in Chado/BioPerl

Lincoln Stein lincoln.stein at gmail.com
Tue Sep 9 17:52:36 UTC 2008


It seems to me that the proposed modulus syntax handles multiple
revolutions. Consider a 100 bp genome (to make it simple) and a feature that
starts at 50, goes around twice, and ends at position 60:

  start = 50
  end  = 260

length = end - start + 1 = 211
revolutions = int (length/genome) = 2
stop position = (end - 1) % genome + 1 = 60
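
The arithmetic for this example can be checked with a short Python sketch. It assumes 1-based, inclusive coordinates and takes the stop position from `end` as (end - 1) % genome + 1, so that the example lands on position 60. The function name `decode_circular` is hypothetical and is not part of BioPerl, Chado, or GFF3.

```python
def decode_circular(start, end, genome_len):
    """Return (length, revolutions, stop) for a feature on a circular genome.

    Assumes 1-based inclusive coordinates, start inside the genome, and
    an end that may exceed genome_len when the feature wraps the origin.
    """
    if not 1 <= start <= genome_len:
        raise ValueError("start must lie within the genome")
    if end < start:
        raise ValueError("end must not be smaller than start")
    length = end - start + 1
    revolutions = length // genome_len   # complete trips around the genome
    stop = (end - 1) % genome_len + 1    # 1-based position where the feature ends
    return length, revolutions, stop


# The 100 bp example: start at 50, end at 260.
print(decode_circular(50, 260, 100))
```

For the example this gives a length of 211, 2 revolutions, and a stop position of 60. The "- 1 ... + 1" wrap keeps a feature that ends exactly on the last base at position 100 rather than 0.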

Lincoln

On Mon, Sep 8, 2008 at 3:57 PM, Aaron Mackey <ajmackey at gmail.com> wrote:

> How can you handle features that may cross the origin more than once?
> The modulus, though simple, seems to be only half the solution.  It
> also makes it difficult to place features in the genome "by eye"
> (having to do the modulus subtraction in my head), or in
> sorting/filtering operations.
>
> I have an alternative that I wondered if you considered: allow the
> start/end to have an additional "circular revolution" prefix:
>
> a typical range tuple like: 100 200 -
> is thus shorthand for: 0:100 0:200 -
> (i.e. both the 100 and 200 are in the same "revolution" around the genome)
>
> and is then distinguishable from an "around the genome + 100" feature of:
> 1:100 0:200 -
>
> Just an alternative to consider (if you haven't already).  I'm not
> wedded to the syntax, but I wouldn't want to see new columns in GFF
> just for this.  Essentially, what you want is some form of compound
> polar coordinates, it seems.
>
> -Aaron
>
> On Mon, Sep 8, 2008 at 2:44 PM, Jim Hu <jimhu at tamu.edu> wrote:
> > In discussions with GMOD about Gbrowse, we've come up with a proposal for
> > handling circular genomes and features that cross the origin in such
> > genomes.  This applies to lots of prokaryotic and viral genomes, and might
> > be valuable for some ways of representing terminally redundant linear
> > genomes.
> > 1) Keep the requirement that start < end
> > 2) allow end > parent feature length
> > 3) parent feature gets an is_circular boolean
> > 4) use modular arithmetic to calculate the real position of end on the
> > parent feature.
> > We'd like to do this in a way that will be consistent with Chado and BioPerl
> > representation of features as much as possible (realizing that there is the
> > usual interbase or not coordinate issue).  What do people think?  Lincoln is
> > on board for modifying the GFF3 spec.
> > Thanks!
> > Jim Hu
> >
> > =====================================
> > Jim Hu
> > Associate Professor
> > Dept. of Biochemistry and Biophysics
> > 2128 TAMU
> > Texas A&M Univ.
> > College Station, TX 77843-2128
> > 979-862-4054
> >
>
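
For comparison, the revolution-prefix notation quoted above can be related to the single extended coordinate used in the modulus proposal. This is a minimal Python sketch of one possible reading. It assumes the prefix counts completed trips around the genome (so `2:60` on a 100 bp genome is extended position 260) and that a bare number means revolution 0. The names `to_extended` and `from_extended` are hypothetical and are not defined by GFF3, Chado, or BioPerl.

```python
def to_extended(token, genome_len):
    """Convert 'rev:pos' (or a bare 'pos') to one extended 1-based coordinate."""
    rev, sep, pos = token.rpartition(":")
    revolution = int(rev) if sep else 0   # bare number: assume revolution 0
    position = int(pos)
    if not 1 <= position <= genome_len:
        raise ValueError("position must lie within the genome")
    return revolution * genome_len + position


def from_extended(coord, genome_len):
    """Convert an extended 1-based coordinate back to 'rev:pos' form."""
    revolution, offset = divmod(coord - 1, genome_len)
    return f"{revolution}:{offset + 1}"


print(to_extended("2:60", 100))
print(from_extended(260, 100))
```

Under these assumptions the two notations carry the same information, and the choice between them is mainly about readability and sorting.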



-- 
Lincoln D. Stein

Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Stacey Quinn <Stacey.Quinn at oicr.on.ca>

Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724 USA
(516) 367-8380
Assistant: Sandra Michelsen <michelse at cshl.edu>


