[Bioperl-l] [Gmod-schema] Circular genomes in Chado/BioPerl

Scott Cain cain.cshl at gmail.com
Tue Sep 9 17:33:12 UTC 2008


Hi Jim and All,

While I agree with Aaron's point that it is not easy to place features
by visual inspection, this seems like a fairly minor point.  The vast
majority of GFF3 manipulation will be done in software, so as long as
the API handles everything correctly, life is good.  If we discount
that objection, there doesn't seem to be much advantage to using
Aaron's suggested method over Jim's.
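
To make the comparison concrete, here is a minimal sketch in Python of
how software might treat the two notations as interchangeable.  It is
not code from GFF3, Chado, BioPerl, or Ensembl.  The function names are
hypothetical, coordinates are assumed to be 1-based and inclusive (as in
GFF3), and the "revolution:position" strings are only one possible
reading of Aaron's prefix syntax.

```python
def wrap_coordinate(coord, parent_length):
    """Split a 1-based coordinate into (revolutions, position on parent).

    Assumption: coordinates are 1-based and inclusive, so a coordinate
    equal to parent_length is still in revolution 0.
    """
    if parent_length <= 0:
        raise ValueError("parent_length must be positive")
    if coord < 1:
        raise ValueError("coordinates are assumed to be 1-based")
    # Shift to 0-based for the modulus, then shift back.
    revolutions, offset = divmod(coord - 1, parent_length)
    return revolutions, offset + 1


def origin_crossings(start, end, parent_length):
    """Count how many times a feature passes the origin (Jim's scheme)."""
    if start > end:
        raise ValueError("the proposal keeps the requirement start <= end")
    start_rev, _ = wrap_coordinate(start, parent_length)
    end_rev, _ = wrap_coordinate(end, parent_length)
    return end_rev - start_rev


def to_revolution_prefix(coord, parent_length):
    """Render a coordinate as 'revolution:position' (hypothetical syntax)."""
    revolutions, position = wrap_coordinate(coord, parent_length)
    return f"{revolutions}:{position}"


def from_revolution_prefix(text, parent_length):
    """Convert 'revolution:position' back to a single extended coordinate."""
    revolutions, position = text.split(":")
    return int(revolutions) * parent_length + int(position)
```

Under these assumptions, a feature at 900..1100 on a 1000 bp circular
parent has a real end position of 100 and crosses the origin once, and
converting to the prefixed form and back returns the original
coordinate, so neither notation carries information the other lacks.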

(As a side note, I have the same complaint about anything in XML: it
is awful for a human to read.  I still live with XML when I have to,
though :-)

Additionally, the fact that Ensembl uses the same method Jim describes
is a fairly powerful argument for doing the same.  Hopefully there can
be some code reuse.

Scott


On Tue, Sep 9, 2008 at 12:05 PM, Jim Hu <jimhu at tamu.edu> wrote:
> Hi Aaron,
> I was thinking this would be handled by making the end = (parent feature
> length x 2) + end coord.  Then end / parent length (integer division) = the
> number of times the feature crosses the origin.
> Jim
> On Sep 8, 2008, at 2:57 PM, Aaron Mackey wrote:
>
> How can you handle features that may cross the origin more than once?
> The modulus, though simple, seems to be only half the solution.  It
> also makes it difficult to place features in the genome "by eye"
> (having to do the modulus subtraction in my head), or in
> sorting/filtering operations.
>
> I have an alternative that I wondered if you considered: allow the
> start/end to have an additional "circular revolution" prefix:
>
> a typical range tuple like: 100 200 -
> is thus shorthand for: 0:100 0:200 -
> (i.e. both the 100 and 200 are in the same "revolution" around the genome)
>
> and is then distinguishable from an "around the genome + 100" feature of:
> 1:100 0:200 -
>
> Just an alternative to consider (if you haven't already).  I'm not
> wedded to the syntax, but I wouldn't want to see new columns in GFF
> just for this.  Essentially, what you want is some form of compound
> polar coordinates, it seems.
>
> -Aaron
>
> On Mon, Sep 8, 2008 at 2:44 PM, Jim Hu <jimhu at tamu.edu> wrote:
>
> In discussions with GMOD about Gbrowse, we've come up with a proposal for
> handling circular genomes and features that cross the origin in such
> genomes.  This applies to lots of prokaryotic and viral genomes, and might
> be valuable for some ways of representing terminally redundant linear
> genomes.
>
> 1) Keep the requirement that start < end
> 2) allow end > parent feature length
> 3) parent feature gets an is_circular boolean
> 4) use modular arithmetic to calculate the real position of end on the
>    parent feature.
>
> We'd like to do this in a way that will be consistent with Chado and BioPerl
> representation of features as much as possible (realizing that there is the
> usual interbase or not coordinate issue).  What do people think?  Lincoln is
> on board for modifying the GFF3 spec.
>
> Thanks!
>
> Jim Hu
>
> =====================================
> Jim Hu
> Associate Professor
> Dept. of Biochemistry and Biophysics
> 2128 TAMU
> Texas A&M Univ.
> College Station, TX 77843-2128
> 979-862-4054
>
> -------------------------------------------------------------------------
> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
> Build the coolest Linux based applications with Moblin SDK & win great
> prizes
> Grand prize is a trip for two to an Open Source event anywhere in the world
> http://moblin-contest.org/redirect.php?banner_id=100&url=/
> _______________________________________________
> Gmod-schema mailing list
> Gmod-schema at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gmod-schema



-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


