[Bioperl-l] [Gmod-schema] Circular genomes in Chado/BioPerl

Chris Fields cjfields at illinois.edu
Tue Sep 9 18:24:49 UTC 2008


Is there any particular reason we don't handle this the way BioPerl
does, which is to simply treat the origin-overlapping feature as a
split location?  GenBank treats this similarly.  For a faux example,
the bug I just fixed in Bugzilla has one:

http://bugzilla.open-bio.org/show_bug.cgi?id=2579

An actual GenBank case is the Sulfolobus solfataricus genome  
(NC_002754), and I'm sure Jim could come up with more.  The only  
caveat is whether we should represent this

As for multiple revolutions, I'm not sure the hand-wringing about
specifics is worth it unless we have explicit, workable examples to
test against (preferably ones likely to come up in real data), but
Lincoln's proposal sounds fine.
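
To make the split-location idea concrete, here is a minimal Python
sketch. It is not BioPerl code, and the function names split_circular
and to_genbank are hypothetical. It assumes 1-based inclusive
coordinates and at most one origin crossing, and turns a range whose
end runs past the parent length into two linear pieces, written in the
join(...) style GenBank uses for such features:

```python
def split_circular(start, end, length):
    """Split a 1-based inclusive range on a circular parent into linear pieces.

    Assumes 1 <= start <= length and that the feature covers at most one
    full revolution, so it crosses the origin at most once.
    """
    if length < 1 or not 1 <= start <= length or end < start:
        raise ValueError("need 1 <= start <= length and start <= end")
    if end - start + 1 > length:
        raise ValueError("feature is longer than the parent; not handled here")
    if end <= length:
        return [(start, end)]  # never reaches the origin
    return [(start, length), (1, end - length)]  # wraps past the origin


def to_genbank(parts):
    """Render the pieces as a GenBank-style location string."""
    spans = ["%d..%d" % part for part in parts]
    return spans[0] if len(spans) == 1 else "join(" + ",".join(spans) + ")"


print(to_genbank(split_circular(900, 1100, 1000)))  # join(900..1000,1..100)
```

Features that go around more than once are deliberately rejected
here, since that case is still under discussion.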

chris

On Sep 9, 2008, at 11:05 AM, Jim Hu wrote:

> Hi Aaron,
>
> I was thinking this would be handled by making end = parent
> feature length x 2 + end coord.  end / parent length = number of
> times the feature crosses the origin.
>
> Jim
>
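
One possible reading of that arithmetic, as a small Python sketch.
The function name origin_crossings is hypothetical, and the "- 1" is
an assumption for 1-based inclusive coordinates (so that a feature
ending exactly on the last base of a revolution is not counted as
having crossed); with interbase coordinates the adjustment would
differ.

```python
def origin_crossings(end, length):
    """Count how often a feature passes the origin of a circular parent.

    Assumes 1-based inclusive coordinates, a start inside 1..length,
    and an 'unrolled' end that may exceed the parent length.
    """
    if length < 1 or end < 1:
        raise ValueError("length and end must be positive")
    return (end - 1) // length


# Parent of length 1000; a feature starting at 900 that passes the
# origin twice and stops at base 100 gets end = 1000 * 2 + 100.
print(origin_crossings(1000 * 2 + 100, 1000))  # 2
print(origin_crossings(1100, 1000))            # 1
print(origin_crossings(1000, 1000))            # 0
```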
> On Sep 8, 2008, at 2:57 PM, Aaron Mackey wrote:
>
>> How can you handle features that may cross the origin more than once?
>> The modulus, though simple, seems to be only half the solution.  It
>> also makes it difficult to place features in the genome "by eye"
>> (having to do the modulus subtraction in my head), or in
>> sorting/filtering operations.
>>
>> I have an alternative that I wondered if you considered: allow the
>> start/end to have an additional "circular revolution" prefix:
>>
>> a typical range tuple like: 100 200 -
>> is thus shorthand for: 0:100 0:200 -
>> (i.e. both the 100 and 200 are in the same "revolution" around the genome)
>>
>> and is then distinguishable from an "around the genome + 100" feature of:
>> 1:100 0:200 -
>>
>> Just an alternative to consider (if you haven't already).  I'm not
>> wedded to the syntax, but I wouldn't want to see new columns in GFF
>> just for this.  Essentially, what you want is some form of compound
>> polar coordinates, it seems.
>>
>> -Aaron
>>
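
For reference, the prefix notation is easy to read mechanically. A
minimal Python sketch, with hypothetical function names parse_coord
and parse_range; it only parses the tuple and makes no assumption
about how revolutions map onto strand or unrolled coordinates:

```python
def parse_coord(token):
    """Parse 'REV:POS' or plain 'POS' into (revolution, position).

    A bare position is shorthand for revolution 0.
    """
    rev, sep, pos = token.rpartition(":")
    return (int(rev) if sep else 0, int(pos))


def parse_range(text):
    """Parse a 'start end strand' tuple such as '1:100 0:200 -'."""
    start, end, strand = text.split()
    return parse_coord(start), parse_coord(end), strand


print(parse_range("100 200 -"))      # ((0, 100), (0, 200), '-')
print(parse_range("1:100 0:200 -"))  # ((1, 100), (0, 200), '-')
```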
>> On Mon, Sep 8, 2008 at 2:44 PM, Jim Hu <jimhu at tamu.edu> wrote:
>>> In discussions with GMOD about Gbrowse, we've come up with a proposal for
>>> handling circular genomes and features that cross the origin in such
>>> genomes.  This applies to lots of prokaryotic and viral genomes, and might
>>> be valuable for some ways of representing terminally redundant linear
>>> genomes.
>>> 1) Keep the requirement that start < end
>>> 2) allow end > parent feature length
>>> 3) parent feature gets an is_circular boolean
>>> 4) use modular arithmetic to calculate the real position of end on the
>>> parent feature.
>>> We'd like to do this in a way that will be consistent with Chado and BioPerl
>>> representation of features as much as possible (realizing that there is the
>>> usual interbase or not coordinate issue).  What do people think?  Lincoln is
>>> on board for modifying the GFF3 spec.
>>> Thanks!
>>> Jim Hu
>>>
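
The four rules in the proposal can be sketched roughly as follows in
Python. The Parent class and real_end function are hypothetical names
for illustration, not Chado or BioPerl API, and the modulus is written
for 1-based inclusive coordinates (an assumption; interbase
coordinates would use a plain end % length).

```python
from dataclasses import dataclass


@dataclass
class Parent:
    length: int
    is_circular: bool = False  # rule 3: flag on the parent feature


def real_end(parent, start, end):
    """Map a possibly 'unrolled' end back onto the parent feature."""
    if not 1 <= start <= parent.length:
        raise ValueError("start must lie on the parent feature")
    if not start < end:
        raise ValueError("rule 1: start < end is still required")
    if end > parent.length and not parent.is_circular:
        raise ValueError("rule 2: end > length only on a circular parent")
    # rule 4: modular arithmetic, shifted for 1-based coordinates
    return (end - 1) % parent.length + 1


plasmid = Parent(length=1000, is_circular=True)
print(real_end(plasmid, 900, 1100))  # 100
print(real_end(plasmid, 100, 200))   # 200
```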
>>> =====================================
>>> Jim Hu
>>> Associate Professor
>>> Dept. of Biochemistry and Biophysics
>>> 2128 TAMU
>>> Texas A&M Univ.
>>> College Station, TX 77843-2128
>>> 979-862-4054
>>>
>>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign
