[Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap start and stop coordinates??
Mark Johnson
johnsonm at gmail.com
Mon May 21 23:57:03 UTC 2007
Alrighty then. That's a feature, not a bug. Hmmmm. How about
this for a fix? For plus strand predictions with start > end, use a
split location. For minus strand predictions with start < end, use a
split location. Without knowing the length of the sequence, that's
the best that can be done, I think.
Unless there are objections, I'll go code that up and close that bug
out as 'requester is an idiot'. 8)
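
For reference, here is a minimal sketch of that rule, written in Python rather than as BioPerl code. The names glimmer_location and location_length are hypothetical, and the sketch assumes the sequence length is known (29940 in the example quoted below), which the parser itself may not have. The minus-strand wrapped form is my reading of the usual GenBank convention, so treat it as an assumption.

```python
def glimmer_location(start, end, strand, seq_len):
    """Build a GenBank-style location string for one Glimmer prediction.

    Hypothetical helper, not part of BioPerl. Assumes a circular
    sequence whose length (seq_len) is known.
    """
    if strand > 0:
        if start <= end:
            return f"{start}..{end}"
        # Plus strand with start > end: the gene runs off the end of the
        # circular sequence and continues from position 1.
        return f"join({start}..{seq_len},1..{end})"
    # Minus strand: start is normally the higher coordinate.
    if start >= end:
        return f"complement({end}..{start})"
    # Minus strand with start < end: the gene wraps backwards through
    # position 1 (assumed GenBank form for the wrapped case).
    return f"complement(join({end}..{seq_len},1..{start}))"


def location_length(start, end, strand, seq_len):
    """Number of bases covered, going at most once around the circle."""
    lo, hi = (start, end) if strand > 0 else (end, start)
    if lo <= hi:
        return hi - lo + 1
    return (seq_len - lo + 1) + hi


print(glimmer_location(29263, 9, 1, 29940))  # join(29263..29940,1..9)
print(location_length(29263, 9, 1, 29940))   # 687
```

The 687 bases are the 684 that Glimmer2 reports (L= 684) plus the 3-base stop codon that Glimmer2 leaves off.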
On 5/21/07, Chris Fields <cjfields at uiuc.edu> wrote:
> Glimmer2/3 both assume the genome is circular by default (I'm
> assuming because Glimmer2/3 are used for bacterial genomes). According
> to the Glimmer3 release notes, the detail file has this information in
> the header; from the Glimmer3 data used for tests:
>
> Command: /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA Glimmer3.icm Glimmer3
>
> Sequence file = ../BCTDNA
> ICM model file = Glimmer3.icm
> Excluded regions file = none
> List of orfs file = none
> Truncated orfs = false
> Circular genome = true
> ...
>
> There are options available for glimmer3 (-L, -X) that specify a
> linear sequence or allow ORFs to extend past the end of the sequence
> analyzed (the latter assumes a linear sequence).
>
> chris
>
> On May 21, 2007, at 4:21 PM, Mark Johnson wrote:
>
> > That makes sense. Is that behavior documented anywhere? I'll
> > feel like less of an idiot if it's not. 8) Either way, if you're
> > sure that's what's going on, I'll fix up the parser to handle that as a
> > split location.
> >
> >> I think I know what it is. If you mean these predictions:
> >>
> >> Glimmer2:
> >>
> >> 27 29263 6 [+1 L= 684 r=-1.187]
> >>
> >> Glimmer3:
> >>
> >> orf00001 29263 9 +1 9.60
> >>
> >> Glimmer2/3 are predicting a gene for a circular chromosome that
> >> starts at 29263 and ends at +9 (+6 for Glimmer2, which leaves off
> >> the stop codon). Note in Glimmer2 detailed output the end is 29946
> >> and the length of the sequence is 29940, so Glimmer2 artificially
> >> extends the end of the sequence with part of the start.
> >>
> >> This is handled as a split location in bioperl and in most GenBank
> >> files; the above would be a location string like
> >> 'join(29263..29940,1..9)'. If you switched the start and stop, the
> >> location would be '9..29263', which wouldn't be correct (and would be
> >> a huge gene).
> >>
> >> chris
> >>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>