[Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap start and stop coordinates??
Chris Fields
cjfields at uiuc.edu
Tue May 22 00:59:20 UTC 2007
You can add the necessary patch to the bug report when it's ready; no
need to close it out.
The most complete format to parse seems to be the Glimmer3 detail
file; its header contains the sequence length:
>BCTDNA
Sequence length = 29940
which can be used for the split location. As Torsten points out, use
of -X could also potentially produce fuzzy locations.
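
A minimal sketch of pulling that length out of the detail header, in
Python purely for illustration (the BioPerl parser itself is Perl). The
function name and the returned keys are hypothetical, and the only
header lines assumed are the ones quoted in this thread:

```python
import re

# Hypothetical helper; matches only the "key = value" header lines
# that appear in the Glimmer3 detail output quoted in this thread.
HEADER_LINE = re.compile(r"^\s*(Sequence length|Circular genome)\s*=\s*(\S+)")

def parse_detail_header(text):
    """Return the sequence length and circular flag, or None if absent."""
    info = {"seq_length": None, "circular": None}
    for line in text.splitlines():
        match = HEADER_LINE.match(line)
        if not match:
            continue  # not one of the two header lines we care about
        key, value = match.groups()
        if key == "Sequence length":
            info["seq_length"] = int(value)
        else:
            info["circular"] = (value.lower() == "true")
    return info
```

Anything else in the detail file is ignored by this sketch.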
Since the parser currently parses only predict files, you could
optionally supply it with the sequence length, and emit a warning if
the length is missing when seqfeatures that need it are produced, such
as the occasional ones that wrap around the origin. With the
bioperl-run module this could be automated a bit by adding the
sequence length to the parser's constructor argument list.
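
Roughly, the logic could look like the sketch below. It is written in
Python only to show the idea and is not the BioPerl implementation; the
function name, its arguments, and the choice to return None when the
length is unknown are all assumptions made for illustration:

```python
import warnings

def wraparound_location(start, end, strand, seq_length=None):
    """Build a GenBank-style location string for one prediction.

    start/end are as reported by Glimmer; strand is +1 or -1.
    seq_length is optional and only needed for wraparound genes.
    """
    # Plus strand with start > end, or minus strand with start < end,
    # means the prediction crosses the origin of a circular sequence.
    wraps = (strand > 0 and start > end) or (strand < 0 and start < end)
    if not wraps:
        low, high = sorted((start, end))
        simple = f"{low}..{high}"
        return simple if strand > 0 else f"complement({simple})"
    if seq_length is None:
        # Assumed behaviour: warn and give up rather than guess.
        warnings.warn("wraparound prediction needs the sequence length")
        return None
    if strand > 0:
        return f"join({start}..{seq_length},1..{end})"
    return f"complement(join({end}..{seq_length},1..{start}))"
```

With the example from this thread, wraparound_location(29263, 9, 1,
29940) gives 'join(29263..29940,1..9)'.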
chris
On May 21, 2007, at 6:57 PM, Mark Johnson wrote:
> Alrighty then. That's a feature, not a bug. Hmmmm. How about
> this for a fix? For plus strand predictions with start > end, use a
> split location. For minus strand predictions with start < end, use a
> split location. Without knowing the length of the sequence, that's
> the best that can be done, I think.
> Unless there are objections, I'll go code that up. Close that bug
> out as 'requester is an idiot'. 8)
>
> On 5/21/07, Chris Fields <cjfields at uiuc.edu> wrote:
>> Glimmer2/3 both assume the genome is circular by default (I'm
>> assuming because Glimmer2/3 are used for bacterial genomes). According
>> to the Glimmer3 release notes, the detail file records this in its
>> header; from the Glimmer3 data used for tests:
>>
>> Command: /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA
>> Glimmer3.icm Glimmer3
>>
>> Sequence file = ../BCTDNA
>> ICM model file = Glimmer3.icm
>> Excluded regions file = none
>> List of orfs file = none
>> Truncated orfs = false
>> Circular genome = true
>> ...
>>
>> There are options available for glimmer3 (-L, -X) that specify a
>> linear sequence or allow ORFs to extend past the end of the sequence
>> analyzed (the latter assumes a linear sequence).
>>
>> chris
>>
>> On May 21, 2007, at 4:21 PM, Mark Johnson wrote:
>>
>>> That makes sense. Is that behavior documented anywhere? I'll
>>> feel like less of an idiot if it's not. 8) Either way, if you're
>>> sure that's what's going on, I'll fix up the parser to handle that
>>> as a split location.
>>>
>>>> I think I know what it is. If you mean these predictions:
>>>>
>>>> Glimmer2:
>>>>
>>>> 27 29263 6 [+1 L= 684 r=-1.187]
>>>>
>>>> Glimmer3:
>>>>
>>>> orf00001 29263 9 +1 9.60
>>>>
>>>> Glimmer2/3 are predicting a gene on a circular chromosome that
>>>> starts at 29263 and ends at +9 (+6 for Glimmer2, which leaves off
>>>> the stop codon). Note that in the Glimmer2 detailed output the end is
>>>> 29946 and the length of the sequence is 29940, so Glimmer2
>>>> artificially extends the end of the sequence with part of the start.
>>>>
>>>> This is handled as a split location in bioperl and in most GenBank
>>>> files; the above would be a location string like
>>>> 'join(29263..29940,1..9)'. If you switched the start and stop, the
>>>> location would be '9..29263', which wouldn't be correct (and would
>>>> be a huge gene).
>>>>
>>>> chris
>>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign