[Bioperl-l] Problem writing sequence features for GenBank/GenPept sequences (file: Bio/SeqIO/genbank.pm)

sgoegel at gmail.com sgoegel at gmail.com
Sat Apr 16 13:12:50 EDT 2005


write_sequence does not write genbank sequences correctly.
Specifically, features appear as shown below.

 I traced the problem write_seq in SeqIO::genbank.pm,
and then to the $self->_print_GenBank_FTHelper($fth); line within the foreach
my $fth ( @fth ) loop.

Here's what I'm actually doing within my script:
use Bio::Perl;
use Bio::DB::GenBank;
use Bio::DB::GenPept;
use Bio::Seq;
use Bio::SeqIO;
# --- cut ---
my $seq = $gp->get_Seq_by_gi($id); # where $gp is a Bio::DB::GenPept object
and $id is a valid gi num
# ...
write_sequence(">$filename", "genbank", $seq);

this results in a mostly-correct looking genpept output file, but the feature
section looks like so:
FEATURES             Location/Qualifiers
     source          1..194
                     /db_xref="Bio::Annotation::SimpleValue=HASH(0x87eecbc)"
                     /organism="Bio::Annotation::SimpleValue=HASH(0x87eee00)"
     gene            1..194

 /locus_tag="Bio::Annotation::SimpleValue=HASH(0x87eefe0)"
 /gene="Bio::Annotation::SimpleValue=HASH(0x87f0728)" Protein         1..194

 /locus_tag="Bio::Annotation::SimpleValue=HASH(0x87f07ac)"
 /gene="Bio::Annotation::SimpleValue=HASH(0x87f0908)"
 /product="Bio::Annotation::SimpleValue=HASH(0x87f0914)" Region
 15..42

 /locus_tag="Bio::Annotation::SimpleValue=HASH(0x87f0a04)"
 /region_name="Bio::Annotation::SimpleValue=HASH(0x87f1c4c) "
                     /evidence=Bio::Annotation::SimpleValue=HASH(0x87f1cb8)
                     /gene="Bio::Annotation::SimpleValue=HASH(0x87f1d00)"
                     /note="Bio::Annotation::SimpleValue=HASH(0x87f1d48)"
     Region          50..99

 /locus_tag="Bio::Annotation::SimpleValue=HASH(0x87f1dd8)"
 /region_name="Bio::Annotation::SimpleValue=HASH(0x87f1ed4) "
                     /evidence=Bio::Annotation::SimpleValue=HASH(0x87f1f40)
                     /gene="Bio::Annotation::SimpleValue=HASH(0x87f2e2c)"
                     /note="Bio::Annotation::SimpleValue=HASH(0x87f2e74)"
     Site            118

 /locus_tag="Bio::Annotation::SimpleValue=HASH(0x87f2f04)"
 /evidence=Bio::Annotation::SimpleValue=HASH(0x87f3000)
 /gene="Bio::Annotation::SimpleValue=HASH(0x87f306c)"
 /site_type="Bio::Annotation::SimpleValue=HASH(0x87f30fc)"
 /note="Bio::Annotation::SimpleValue=HASH(0x87f30b4)"


!!

As I said, I traced this problem to write_seq in the Bio/SeqIO/genbank.pm
file, which calls $self->_print_GenBank_FTHelper($fth);

This happened for all of about 4000 nucleotide sequences I downloaded in
genbank format, as well as a protein sequence I tested.

I fixed the problem with the following modification to
 _print_GenBank_FTHelper: (modified lines marked with ### comments)

sub _print_GenBank_FTHelper {
   my ($self,$fth) = @_;

   if( ! ref $fth || ! $fth->isa('Bio::SeqIO::FTHelper') ) {
       $fth->warn("$fth is not a FTHelper class. Attempting to print, but
 there could be tears!");
   }
   if( defined $fth->key &&
       $fth->key eq 'CONTIG' ) {
       $self->_show_dna(0);
       $self->_write_line_GenBank_regex(sprintf("%-12s",$fth->key),
					' 'x12,$fth->loc,"\,\|\$",80);
   } else {
       $self->_write_line_GenBank_regex(sprintf("     %-16s",$fth->key),
					" "x21,
					$fth->loc,"\,\|\$",80);
   }

   foreach my $tag ( keys %{$fth->field} ) {
       foreach my $value ( @{$fth->field->{$tag}} ) {
       ####### I added the next 3 lines.
	   if (ref ($value) =~ /Bio::Annotation::SimpleValue/) {
	       $value = $value->value;
	   }

## --- cut ---

I don't know whether my fix is sufficient or more should be changed to
 maintain coherence... however, my fix seems to work.


More information about the Bioperl-l mailing list