[Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap start and stop coordinates??
cjfields at uiuc.edu
Mon May 21 21:17:37 UTC 2007
On May 21, 2007, at 3:48 PM, Mark Johnson wrote:
> Check the test data for Glimmer2 and Glimmer3. They both predict one
> large gene, I'd guess covering most of the sequence, in frame +1.
> That's probably a bogus prediction, but that's not up to the parser to
> decide. I hadn't noticed it until recently.
> I sent a patch via bugzilla to swap the coordinates if start > end and
> strand > 0.
I think I know what it is. If you mean these predictions:
27 29263 6 [+1 L= 684 r=-1.187]
orf00001 29263 9 +1 9.60
Glimmer2/3 are predicting a gene on a circular chromosome that
starts at 29263 and ends at 9 (6 for Glimmer2, which leaves off
the stop codon). Note that in the Glimmer2 detailed output the end is
29946 and the length of the sequence is 29940, so Glimmer2 artificially
extends the end of the sequence with part of the start.
This is handled as a split location in bioperl and in most GenBank
files; the above would be a location string like
'join(29263..29940,1..9)'. If you switched the start and stop the
location would be '9..29263', which wouldn't be correct (and would be
a huge gene).
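
To make the arithmetic concrete, here is a rough sketch in plain Python
(not BioPerl code; the function names circular_location and
feature_length are made up for illustration). It assumes 1-based
inclusive coordinates, a forward-strand feature, and a circular sequence
of known length, and treats start > end as a wrap past the origin:

```python
def circular_location(start, end, seq_length):
    """Build a GenBank-style location string for a forward-strand
    feature on a circular sequence (1-based, inclusive coordinates).

    Hypothetical helper for illustration only.
    """
    if not (1 <= start <= seq_length and 1 <= end <= seq_length):
        raise ValueError("coordinates must lie within 1..seq_length")
    if start <= end:
        # Ordinary feature: no wrap around the origin.
        return f"{start}..{end}"
    # start > end on the forward strand: the feature runs to the end of
    # the sequence and continues from position 1.
    return f"join({start}..{seq_length},1..{end})"


def feature_length(start, end, seq_length):
    """Length in bases of the feature described above."""
    if start <= end:
        return end - start + 1
    return (seq_length - start + 1) + end


if __name__ == "__main__":
    # Glimmer3 prediction from the test data: 29263 -> 9 on a 29940 bp sequence.
    print(circular_location(29263, 9, 29940))
    # Glimmer2 prediction (stop codon left off): 29263 -> 6.
    print(feature_length(29263, 6, 29940))
```

With the Glimmer2 coordinates (29263 to 6) the computed length is 684,
which agrees with the L= 684 in the detailed output, whereas a swapped
'6..29263' would span 29258 bases.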