[Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap start and stop coordinates??

Mark Johnson johnsonm at gmail.com
Mon May 21 23:57:03 UTC 2007


    Alrighty then.  That's a feature, not a bug.  Hmmmm.  How about
this for a fix?  For plus strand predictions with start > end, use a
split location.  For minus strand predictions with start < end, use a
split location.  Without knowing the length of the sequence, that's
the best that can be done, I think.
    Unless there are objections, I'll go code that up.  Close that bug
out as 'requester is an idiot'.  8)
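
For illustration, here is a minimal sketch of that rule in Python rather than BioPerl. It assumes the sequence length is available, which, as noted above, the parser may not know. The function name `glimmer_location` and its parameters are hypothetical and are not part of any BioPerl or Glimmer API. It also assumes that ordinary minus-strand predictions are reported with start > end, as the rule above implies.

```python
def glimmer_location(start, end, strand, seq_length=None):
    """Build a GenBank-style location string for one Glimmer prediction.

    Hypothetical helper: strand is +1 or -1, and seq_length is only
    needed when the prediction wraps around the origin.
    """
    if strand not in (1, -1):
        raise ValueError("strand must be +1 or -1")

    # Assumption: a prediction wraps the origin when the coordinate
    # order contradicts the strand (plus with start > end, minus with
    # start < end).
    wraps = (start > end) if strand == 1 else (start < end)

    if wraps:
        if seq_length is None:
            raise ValueError("sequence length is needed to close the first segment")
        # The segment before the origin runs to the end of the sequence;
        # the segment after the origin starts again at position 1.
        upstream, downstream = (start, end) if strand == 1 else (end, start)
        loc = f"join({upstream}..{seq_length},1..{downstream})"
    else:
        low, high = min(start, end), max(start, end)
        loc = f"{low}..{high}"

    return loc if strand == 1 else f"complement({loc})"


# The Glimmer3 prediction discussed in this thread (sequence length 29940):
print(glimmer_location(29263, 9, 1, 29940))  # join(29263..29940,1..9)
```

When the length is unknown, the sketch raises an error instead of guessing where the first segment ends.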

On 5/21/07, Chris Fields <cjfields at uiuc.edu> wrote:
> Glimmer2/3 both assume the genome is circular by default (I'm
> assuming that's because Glimmer2/3 are used for bacterial genomes).
> According to the Glimmer3 release notes, the detail file has this
> information in the header; from the Glimmer3 data used for tests:
>
> Command:  /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA Glimmer3.icm Glimmer3
>
> Sequence file = ../BCTDNA
> ICM model file = Glimmer3.icm
> Excluded regions file = none
> List of orfs file = none
> Truncated orfs = false
> Circular genome = true
> ...
>
> There are options available for glimmer3 (-L, -X) that specify a
> linear sequence or allow ORFs to extend past the end of the sequence
> analyzed (the latter assumes a linear sequence).
>
> chris
>
> On May 21, 2007, at 4:21 PM, Mark Johnson wrote:
>
> >     That makes sense.  Is that behavior documented anywhere?  I'll
> > feel like less of an idiot if it's not.  8)  Either way, if you're
> > sure that's what's going on, I'll fix up the parser to handle that as a
> > split location.
> >
> >> I think I know what it is.  If you mean these predictions:
> >>
> >> Glimmer2:
> >>
> >>     27    29263        6  [+1 L= 684 r=-1.187]
> >>
> >> Glimmer3:
> >>
> >> orf00001    29263        9  +1     9.60
> >>
> >> Glimmer2/3 are predicting a gene for a circular chromosome that
> >> starts at 29263 and ends at +9 (+6 for Glimmer2, which leaves off
> >> the stop codon).  Note that in the Glimmer2 detailed output the end
> >> is 29946 and the length of the sequence is 29940, so Glimmer2
> >> artificially extends the end of the sequence with part of the start.
> >>
> >> This is handled as a split location in bioperl and in most GenBank
> >> files; the above would be a location string like
> >> 'join(29263..29940,1..9)'.  If you switched the start and stop, the
> >> location would be '9..29263', which wouldn't be correct (and would
> >> be a huge gene).
> >>
> >> chris
> >>
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


