[Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap start and stop coordinates??

Tue May 22 00:59:20 UTC 2007

You can add the necessary patch to the bug report when it's ready; no  
need to close it out.

The most complete file format to parse seems to be the detail file;  
it contains the sequence length:

 >BCTDNA
Sequence length = 29940

which can be used for the split location.  As Torsten points out, use  
of -X could also potentially produce fuzzy locations.
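
As a rough illustration of pulling that length out of the detail file, here is a minimal Python sketch. The helper name read_seq_lengths is hypothetical, and the header layout is assumed only from the two lines quoted above (a ">ID" line followed by "Sequence length = N"); real detail files may differ.

```python
import re

# Hypothetical helper; the header layout is assumed from the two lines
# quoted above (a ">ID" line followed by "Sequence length = N").
_ID_RE = re.compile(r"^\s*>(\S+)")
_LEN_RE = re.compile(r"^\s*Sequence length\s*=\s*(\d+)")


def read_seq_lengths(lines):
    """Return {sequence id: length} for each header found in the lines."""
    lengths = {}
    current_id = None
    for line in lines:
        id_match = _ID_RE.match(line)
        if id_match:
            # Remember the most recent sequence id seen.
            current_id = id_match.group(1)
            continue
        len_match = _LEN_RE.match(line)
        if len_match and current_id is not None:
            lengths[current_id] = int(len_match.group(1))
    return lengths


detail = [" >BCTDNA", "Sequence length = 29940"]
print(read_seq_lengths(detail))  # {'BCTDNA': 29940}
```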

Since the parser currently reads only predict files, you could let  
the caller optionally supply the sequence length and emit a warning  
when a seqfeature that needs it is produced without one (such as the  
sporadic predictions that wrap around the origin).  With the  
bioperl-run module this could be automated by adding the sequence  
length to the parser's constructor argument list.
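
To make the idea concrete, here is a small language-neutral sketch in Python rather than the actual BioPerl parser code. The function names, the seq_length argument, and the choice to return None after warning are all assumptions for illustration; the wrap-around test is the one Mark proposes below (plus strand with start > end, minus strand with start < end).

```python
import warnings


def wraps_origin(start, end, strand):
    """Heuristic from the thread: the coordinates run 'backwards' for the strand."""
    return (strand > 0 and start > end) or (strand < 0 and start < end)


def location_string(start, end, strand, seq_length=None):
    """Build a GenBank-style location string (hypothetical helper).

    Returns None, after warning, when the prediction wraps around the
    origin and no sequence length was supplied.
    """
    low, high = min(start, end), max(start, end)
    if wraps_origin(start, end, strand):
        if seq_length is None:
            warnings.warn(
                "prediction %d..%d wraps the origin; sequence length needed"
                % (start, end)
            )
            return None
        # Two segments: from the high coordinate to the end of the
        # sequence, then from position 1 to the low coordinate.
        loc = "join(%d..%d,1..%d)" % (high, seq_length, low)
    else:
        loc = "%d..%d" % (low, high)
    return "complement(%s)" % loc if strand < 0 else loc


print(location_string(29263, 9, 1, seq_length=29940))
# join(29263..29940,1..9)
```

Returning None is only one option; a real parser might instead build a fuzzy location or skip the feature.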

chris

On May 21, 2007, at 6:57 PM, Mark Johnson wrote:

>     Alrighty then.  That's a feature, not a bug.  Hmmmm.  How about
> this for a fix?  For plus strand predictions with start > end, use a
> split location.  For minus strand predictions with start < end, use a
> split location.  Without knowing the length of the sequence, that's
> the best that can be done, I think.
>     Unless there are objections, I'll go code that up.  Close that bug
> out as 'requester is an idiot'.  8)
>
> On 5/21/07, Chris Fields <cjfields at uiuc.edu> wrote:
>> glimmer2/3 both assume the genome is circular by default (I'm
>> assuming because Glimmer2/3 are used for bacterial genomes).  According
>> to the Glimmer3 release notes, the detail file has this information in
>> its header; from the Glimmer3 data used for tests:
>>
>> Command:  /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA Glimmer3.icm Glimmer3
>>
>> Sequence file = ../BCTDNA
>> ICM model file = Glimmer3.icm
>> Excluded regions file = none
>> List of orfs file = none
>> Truncated orfs = false
>> Circular genome = true
>> ...
>>
>> There are options available for glimmer3 (-L, -X) that specify a
>> linear sequence or allow ORFs to extend past the end of the sequence
>> analyzed (the latter assumes a linear sequence).
>>
>> chris
>>
>> On May 21, 2007, at 4:21 PM, Mark Johnson wrote:
>>
>>>     That makes sense.  Is that behavior documented anywhere?  I'll
>>> feel like less of an idiot if it's not.  8)  Either way, if you're
>>> sure that's what's going on, I'll fix up the parser to handle that
>>> as a split location.
>>>
>>>> I think I know what it is.  If you mean these predictions:
>>>>
>>>> Glimmer2:
>>>>
>>>>     27    29263        6  [+1 L= 684 r=-1.187]
>>>>
>>>> Glimmer3:
>>>>
>>>> orf00001    29263        9  +1     9.60
>>>>
>>>> Glimmer2/3 are predicting a gene on a circular chromosome that
>>>> starts at 29263 and ends at +9 (+6 for Glimmer2, which leaves off
>>>> the stop codon).  Note that in the Glimmer2 detailed output the end
>>>> is 29946 and the length of the sequence is 29940, so Glimmer2
>>>> artificially extends the end of the sequence with part of the start.
>>>>
>>>> This is handled as a split location in bioperl and in most GenBank
>>>> files; the above would be a location string like
>>>> 'join(29263..29940,1..9)'.  If you switched the start and stop, the
>>>> location would be '9..29263', which wouldn't be correct (and would be
>>>> a huge gene).
>>>>
>>>> chris
>>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign