[Bioperl-l] Error parsing Genbank file
Jason Stajich
jason.stajich at duke.edu
Wed Jan 5 16:36:55 EST 2005
We can't currently parse WGS entries. The fix should be very similar to
how we handle CONTIG entries, if you want to have a go at it.
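
As a rough starting point, the first step of such a fix would be recognizing the WGS line and pulling out the accession range it names. The sketch below does only that, in plain Perl with no BioPerl dependency. The names parse_wgs_line and wgs_range_size are hypothetical and are not part of Bio::SeqIO::genbank; how the range would be stored on the sequence object is left open.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): extract the accession range
# from a GenBank "WGS" line. Returns (first, last), or an empty list if the
# line is not a WGS line. A single accession yields first == last.
sub parse_wgs_line {
    my ($line) = @_;
    if ($line =~ /^WGS\s+([A-Z]+\d+)(?:-([A-Z]+\d+))?\s*$/) {
        my $first = $1;
        my $last  = defined $2 ? $2 : $1;
        return ($first, $last);
    }
    return;
}

# Hypothetical helper: count the records in the range, assuming both
# accessions share the same alphabetic prefix. Returns nothing otherwise.
sub wgs_range_size {
    my ($first, $last) = @_;
    my ($p1, $n1) = $first =~ /^([A-Z]+)(\d+)$/ or return;
    my ($p2, $n2) = $last  =~ /^([A-Z]+)(\d+)$/ or return;
    return unless $p1 eq $p2;
    return $n2 - $n1 + 1;
}

# The WGS line from the entry quoted below.
my ($first, $last) = parse_wgs_line('WGS         CAAB01000001-CAAB01012381');
print "$first..$last (", wgs_range_size($first, $last), " records)\n";
```

For the entry quoted below, the computed range size matches the 12381 given on the LOCUS line, which makes a cheap sanity check when parsing.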
On Jan 5, 2005, at 3:41 PM, Ryan Golhar wrote:
> Hi all,
>
> I have a Genbank file that Bio::SeqIO::genbank is choking on. The
> entry is just a WGS entry referencing a bunch of other entries. It
> dies on line 492 with the error "Unexpected error in feature table for
> Skipping feature, attempting to recover".
>
> I'm using the following code:
>
> #!/usr/bin/perl
>
> use strict;
> use Bio::SeqIO;
>
> my $usage = "$0 <genbank file> <fasta file>\n";
> my $file = shift or die $usage;
> my $outfilename = shift or die $usage;
>
> my $infile = Bio::SeqIO->new('-file'   => "<$file",
>                              '-format' => "genbank");
>
> my $outfile = Bio::SeqIO->new('-file'   => ">$outfilename",
>                               '-format' => "fasta");
>
> while (my $seq = $infile->next_seq) {
>     # print STDERR $seq->accession_number,"\n";
>
>     $outfile->write_seq($seq);
> }
>
> Here is the contents of the genbank entry:
>
> LOCUS       CAAB01000000 12381 rc DNA linear VRT 22-AUG-2002
> DEFINITION  Takifugu rubripes whole genome shotgun sequencing project.
> ACCESSION   CAAB00000000
> VERSION     CAAB00000000.1 GI:22418063
> KEYWORDS    WGS.
> SOURCE      Takifugu rubripes (Fugu rubripes)
>   ORGANISM  Takifugu rubripes
>             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
>             Actinopterygii; Neopterygii; Teleostei; Euteleostei; Neoteleostei;
>             Acanthomorpha; Acanthopterygii; Percomorpha; Tetraodontiformes;
>             Tetradontoidea; Tetraodontidae; Takifugu.
> REFERENCE   1 (bases 1 to 12381)
>   AUTHORS   The Fugu Genome Sequencing Consortium.
>   TITLE     Direct Submission
>   JOURNAL   Submitted (01-JUL-2002) The Fugu Genome Sequencing Consortium,
>             http://www.fugubase.org/ http://www.jgi.doe.gov/fugu
> COMMENT     The Takifugu rubripes whole genome shotgun (WGS) project has the
>             project accession CAAB00000000. This version of the project (01)
>             has the accession number CAAB01000000, and consists of sequences
>             CAAB01000001-CAAB01012381.
> FEATURES             Location/Qualifiers
>      source          1..12381
>                      /organism="Takifugu rubripes"
>                      /mol_type="genomic DNA"
>                      /db_xref="taxon:31033"
> WGS         CAAB01000001-CAAB01012381
> //
>
>
>
> -----
> Ryan Golhar
> Computational Biologist
> The Informatics Institute at
> The University of Medicine & Dentistry of NJ
>
> Phone: 973-972-5034
> Fax: 973-972-7412
> Email: golharam at umdnj.edu
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/