[Bioperl-l] bp_genbank2gff3.pl - circular genomes, origin-spanning features, and GFF3
Chris Fields
cjfields at illinois.edu
Fri Apr 9 13:29:36 UTC 2010
Leighton,
Didn't see the GFF3 in question.
chris
On Apr 9, 2010, at 8:06 AM, Leighton Pritchard wrote:
> Hi,
>
> (cc'd to Lincoln due to GFF3 relevance)
>
> I've recently been trying to use BioPerl, CHADO and GBROWSE to represent
> bacterial genome sequences. In doing this, I've been testing with GenBank
> genome/feature files, converting these to GFF3 with bp_genbank2gff3.pl to
> get a CHADO-friendly gene model. There appears to be an issue when
> converting GenBank files that contain features which span the genomic
> origin.
>
> For example, the GenBank file NC_002128.gbk describes a plasmid from E. coli
> O157:H7. This contains the following feature, which spans the reference
> sequence origin:
>
>      gene            join(92527..92721,1..2502)
>                      /gene="tagA"
>                      /locus_tag="pO157p01"
>                      /db_xref="GeneID:1789672"
>      CDS             join(92527..92721,1..2502)
>                      /gene="tagA"
>                      /locus_tag="pO157p01"
>                      /codon_start=1
>                      /transl_table=11
>                      /product="ToxR-regulated lipoprotein"
>                      /protein_id="NP_052607.1"
>                      /db_xref="GI:10955349"
>                      /db_xref="GeneID:1789672"
>
> When I use the bp_genbank2gff3.pl script (either from bioperl-live or
> release 1.6.1) to convert NC_002128.gbk to GFF3 with the command line
>
> $ bp_genbank2gff3.pl ./Escherichia_coli_O157H7/NC_002128.gbk -out stdout > test.gff3
>
> it produces the following GFF, which is not Sequence Ontology-compatible:
> --
> Dr Leighton Pritchard MRSC
> D131, Plant Pathology Programme, SCRI
> Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
> e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard
> gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405
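
The origin-spanning location quoted above, join(92527..92721,1..2502), can be recognised programmatically: its first segment ends on the last base of the sequence and its second segment restarts at base 1. The following is a minimal Python sketch of that check, not part of BioPerl or of bp_genbank2gff3.pl. The function names are hypothetical, the sequence length of 92721 is an assumption inferred from the location itself, and only simple forward-strand `join(a..b,c..d)` strings are handled. The last helper shows one possible single-interval representation, using the convention the current GFF3 specification describes for circular landmarks (end = end + landmark length); whether any given tool accepts that is not something this thread establishes.

```python
import re


def parse_join(location):
    """Extract (start, end) pairs from a simple GenBank location string."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]


def spans_origin(segments, seq_length):
    """True if one segment ends on the last base and the next starts at base 1."""
    return any(
        prev_end == seq_length and next_start == 1
        for (_, prev_end), (next_start, _) in zip(segments, segments[1:])
    )


def unwrapped_extent(segments, seq_length):
    """Return one (start, end) pair; for an origin-spanning feature the end
    is shifted past the sequence length so that start <= end still holds."""
    start = segments[0][0]
    end = segments[-1][1]
    if spans_origin(segments, seq_length):
        end += seq_length
    return start, end


SEQ_LENGTH = 92721  # assumed plasmid length, inferred from the join() above

segments = parse_join("join(92527..92721,1..2502)")
print(segments)                                # [(92527, 92721), (1, 2502)]
print(spans_origin(segments, SEQ_LENGTH))      # True
print(unwrapped_extent(segments, SEQ_LENGTH))  # (92527, 95223)
```

An ordinary spliced location such as join(10..20,30..40) does not trip the check, so the same code can be run over every feature in a record to list only those that wrap the origin.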