[Bioperl-l] Error parsing Genbank file
Jason Stajich
jason.stajich at duke.edu
Thu Jan 6 17:14:04 EST 2005
Fixed in CVS. You can grab the changes from http://cvs.open-bio.org/
Index: Bio/SeqIO/genbank.pm
===================================================================
RCS file: /home/repository/bioperl/bioperl-live/Bio/SeqIO/genbank.pm,v
retrieving revision 1.116
diff -r1.116 genbank.pm
71a72
> wgs - Should contain a Bio::Annotation::SimpleValue object
465,466c466
< last if(($buffer =~ /^BASE/o) || ($buffer =~ /^ORIGIN/o) ||
< ($buffer =~ /^CONTIG/o) );
---
> last if( $buffer =~ /^BASE|ORIGIN|CONTIG|WGS/o);
517a518,522
> } elsif( s/^WGS\s+// ) {
> chomp;
> $annotation->add_Annotation(
> 'wgs',
> Bio::Annotation::SimpleValue->new(-value => $_));
522c527,528
< }
---
>
> } else { warn($_); }
775a782,788
> # deal with WGS
> foreach my $wgs ( $seq->annotation->get_Annotations('wgs') ) {
> $self->_print(sprintf ("%-11s %s\n",'WGS',
> $wgs->value));
> $self->_show_dna(0);
> }
>
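
A note on the combined pattern in the 465,466c466 hunk above: in a Perl
regex, alternation binds more loosely than the '^' anchor, so
/^BASE|ORIGIN|CONTIG|WGS/ anchors only BASE and lets the other three
keywords match anywhere in the line. Whether that matters at this point
in the parser depends on which lines actually reach the check, which is
not shown here. The sketch below only illustrates the difference; the
helper names is_section_start_loose and is_section_start_grouped are
made up for this example and are not part of genbank.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pattern as written in the patch: '^' applies to BASE only.
sub is_section_start_loose {
    my ($line) = @_;
    return $line =~ /^BASE|ORIGIN|CONTIG|WGS/ ? 1 : 0;
}

# Grouped alternative: '^' applies to every keyword.
sub is_section_start_grouped {
    my ($line) = @_;
    return $line =~ /^(?:BASE|ORIGIN|CONTIG|WGS)/ ? 1 : 0;
}

# Sample lines taken from the GenBank entry quoted in this thread.
my @lines = (
    "ORIGIN",
    "WGS         CAAB01000001-CAAB01012381",
    "KEYWORDS    WGS.",
    "COMMENT     The Takifugu rubripes whole genome shotgun (WGS) project",
);

# Print loose/grouped results side by side for each line.
for my $line (@lines) {
    printf "loose=%d grouped=%d  %s\n",
        is_section_start_loose($line),
        is_section_start_grouped($line),
        $line;
}
```

With these sample lines, the KEYWORDS and COMMENT lines satisfy the
loose pattern because they contain "WGS", while the grouped pattern
accepts only the lines that begin with one of the keywords.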
On Jan 6, 2005, at 4:21 PM, Ryan Golhar wrote:
> What is the fix for CONTIG entries....
>
> BTW- I'm new to bioperl...
>
> Ryan
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at duke.edu]
> Sent: Wednesday, January 05, 2005 4:37 PM
> To: golharam at umdnj.edu
> Cc: 'Bioperl List'
> Subject: Re: [Bioperl-l] Error parsing Genbank file
>
>
> We can't parse WGS files. The fix it needs is very similar to how we
> handle CONTIG entries if you want to have a go at fixing it.
>
> On Jan 5, 2005, at 3:41 PM, Ryan Golhar wrote:
>
>> Hi all,
>>
>> I have a Genbank file that Bio::SeqIO::genbank is choking on. The
>> entry is just a WGS entry referencing a bunch of other entries. It
>> dies on line 492 with the error "Unexpected error in feature table for
>> Skipping feature, attempting to recover".
>>
>> I'm using the following code:
>>
>> #!/usr/bin/perl
>>
>> use strict;
>> use Bio::SeqIO;
>>
>> my $usage = "$0 <genbank file> <fasta file>\n";
>> my $file = shift or die $usage;
>> my $outfilename = shift or die $usage;
>>
>> my $infile = Bio::SeqIO->new('-file' => "<$file",
>> '-format' => "genbank");
>>
>> my $outfile = Bio::SeqIO->new('-file' => ">$outfilename",
>> '-format' => "fasta");
>>
>> while (my $seq = $infile->next_seq) {
>> # print STDERR $seq->accession_number,"\n";
>>
>> $outfile->write_seq($seq);
>> }
>>
>> Here is the contents of the genbank entry:
>>
>> LOCUS CAAB01000000 12381 rc DNA linear VRT 22-AUG-2002
>> DEFINITION Takifugu rubripes whole genome shotgun sequencing project.
>> ACCESSION CAAB00000000
>> VERSION CAAB00000000.1 GI:22418063
>> KEYWORDS WGS.
>> SOURCE Takifugu rubripes (Fugu rubripes)
>> ORGANISM Takifugu rubripes
>> Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
>> Actinopterygii; Neopterygii; Teleostei; Euteleostei; Neoteleostei;
>> Acanthomorpha; Acanthopterygii; Percomorpha; Tetraodontiformes;
>> Tetradontoidea; Tetraodontidae; Takifugu.
>> REFERENCE 1 (bases 1 to 12381)
>> AUTHORS The Fugu Genome Sequencing Consortium.
>> TITLE Direct Submission
>> JOURNAL Submitted (01-JUL-2002) The Fugu Genome Sequencing Consortium,
>> http://www.fugubase.org/ http://www.jgi.doe.gov/fugu
>> COMMENT The Takifugu rubripes whole genome shotgun (WGS) project has the
>> project accession CAAB00000000. This version of the project (01)
>> has the accession number CAAB01000000, and consists of sequences
>> CAAB01000001-CAAB01012381.
>> FEATURES Location/Qualifiers
>> source 1..12381
>> /organism="Takifugu rubripes"
>> /mol_type="genomic DNA"
>> /db_xref="taxon:31033"
>> WGS CAAB01000001-CAAB01012381
>> //
>>
>>
>>
>> -----
>> Ryan Golhar
>> Computational Biologist
>> The Informatics Institute at
>> The University of Medicine & Dentistry of NJ
>>
>> Phone: 973-972-5034
>> Fax: 973-972-7412
>> Email: golharam at umdnj.edu
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/