[Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap start and stop coordinates??

Chris Fields cjfields at uiuc.edu
Mon May 21 21:17:37 UTC 2007


On May 21, 2007, at 3:48 PM, Mark Johnson wrote:

> Check the test data for Glimmer2 and Glimmer3.  They both predict one
> large gene, I'd guess covering most of the sequence, in frame +1.
> That's probably a bogus prediction, but that's not up to the parser to
> decide.  I hadn't noticed it until recently.
>
> I sent a patch via bugzilla to swap the coordinates if start > end and
> strand > 0.

I think I know what it is.  If you mean these predictions:

Glimmer2:

    27    29263        6  [+1 L= 684 r=-1.187]

Glimmer3:

orf00001    29263        9  +1     9.60

Glimmer2/3 are predicting a gene on a circular chromosome that
starts at 29263 and ends at 9 (6 for Glimmer2, which leaves off
the stop codon).  Note that in the Glimmer2 detailed output the end is
29946 while the length of the sequence is 29940, so Glimmer2
artificially extends the end of the sequence with part of the start.

This is handled as a split location in bioperl and in most GenBank
files; the above would be a location string like
'join(29263..29940,1..9)'.  If you switched the start and stop, the
location would be '9..29263', which wouldn't be correct (and would be
a huge gene).
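To illustrate the difference, here is a rough sketch in Python of building the location string for a forward-strand prediction. The function name genbank_location is made up for this example and is not a bioperl call; it assumes the sequence is circular and that start > end on the + strand means the feature wraps around the origin.

```python
def genbank_location(start, end, seq_length):
    """Build a GenBank-style location string for a forward-strand feature.

    Assumption: start > end means the feature crosses the origin of a
    circular sequence, so it is split there instead of being swapped.
    """
    if start > end:
        return f"join({start}..{seq_length},1..{end})"
    return f"{start}..{end}"

# Glimmer3 prediction orf00001 on the 29940 bp test sequence.
print(genbank_location(29263, 9, 29940))  # join(29263..29940,1..9)
```

Swapping the coordinates first would send 9 and 29263 down the second branch and give '9..29263'.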

chris


