[Bioperl-l] bp_genbank2gff3.pl error with circular genomes
David Breimann
david.breimann at gmail.com
Wed Aug 18 06:46:58 UTC 2010
Dear Chrises,
I tested the updated version on multiple genomes that previously
returned errors (for future reference: NC_005707, NC_006578,
NC_007103, NC_007104, NC_007106, NC_007107, NC_008573, NC_008762,
NC_008763, NC_008785, NC_009457, NC_012040). The script now ends
normally on all of them. However, as you mentioned, the resulting GFF3
file does not comply with the GFF3 specification for circular genomes.
This in turn causes some unexpected results in other applications.
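
For reference, here is a rough sketch of the adjustment Chris Mungall
describes in the quoted thread. It is only a Python illustration, not code
from bp_genbank2gff3.pl: the function name to_gff3_span is made up, and the
sequence length of 208369 is an assumption inferred from the range in the
error message quoted below. As I read the GFF3 specification, a feature that
crosses the origin keeps its start and gets the landmark length added to its
end, so that start <= end still holds.

```python
def to_gff3_span(segments, seq_length):
    """Collapse GenBank join() segments into one GFF3 (start, end) pair.

    segments   -- list of 1-based inclusive (start, end) tuples, in the
                  order they appear in the GenBank location
    seq_length -- length of the circular landmark sequence

    Hypothetical helper for illustration; it assumes the feature wraps
    around the origin at most once.
    """
    start = segments[0][0]
    end = segments[-1][1]
    if end < start:
        # The feature crosses the origin: push the end past the
        # landmark length so that start <= end, as GFF3 requires.
        end += seq_length
    return start, end


# Range from the quoted error message; 208369 is an assumed sequence length.
print(to_gff3_span([(207497, 208369), (1, 687)], 208369))
```

A feature that does not cross the origin passes through unchanged, so the
same helper could be applied to every feature on a circular sequence.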
Best,
Dave
On Wed, Aug 18, 2010 at 6:42 AM, Chris Fields <cjfields at illinois.edu> wrote:
> Chris, David,
>
> The branch is now merged back to trunk. David, let us know if this helps.
>
> chris (f)
>
> On Aug 17, 2010, at 2:24 PM, Chris Fields wrote:
>
>> On Aug 17, 2010, at 10:53 AM, Chris Mungall wrote:
>>
>>> You can merge this in. It should allow David to proceed.
>>
>> Will do. I'll go ahead and delete the remote branch as well.
>>
>>> I haven't kept up on synchrony between bioperl and GFF on circular genomes. The above fix is conservative in that it essentially preserves the genbank coordinates even when the origin is crossed:
>>>
>>> http://github.com/bioperl/bioperl-live/commit/d752a4cb5168d1bb01f8c80247a57f66b2bd9daf
>>>
>>> However, if this is to conform to GFF3, then the resulting coordinates that cross the origin should have start/end incremented by the genome length.
>>
>> Yes, that is a problem that needs to be addressed. Might be worth filing a bug report for tracking this; we can use David's example, or the one I recently added for phi-X174.
>>
>> chris
>>
>>> On Aug 17, 2010, at 6:51 AM, Chris Fields wrote:
>>>
>>>> I think Chris Mungall has a branch set up for this in bioperl:
>>>>
>>>> http://github.com/bioperl/bioperl-live/tree/circular
>>>>
>>>> Is that correct? Should we merge that code into the master branch?
>>>>
>>>> chris
>>>>
>>>> On Aug 17, 2010, at 8:44 AM, David Breimann wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> The following GenBank file has a gene that runs over the "end" of the
>>>>> chromosome and into its "beginning", and the script generates an
>>>>> error.
>>>>>
>>>>> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk
>>>>>
>>>>> NC_005707 Unflattening error:
>>>>> Details:
>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>> MSG: PROBLEM, SEVERITY==2
>>>>> Ranges not in correct order. Strange ensembl genbank entry? Range:
>>>>> [207497,208369] [1,687]
>>>>> STACK: Error::throw
>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473
>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::problem
>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952
>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent
>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842
>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS
>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713
>>>>> STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq
>>>>> /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532
>>>>> STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023
>>>>> STACK: /usr/local/bin/bp_genbank2gff3.pl:506
>>>>> -----------------------------------------------------------
>>>>>
>>>>> Best,
>>>>> Dave
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>
>>
>>
>
>