[Bioperl-l] Problem writing sequence features for GenBank/GenPept
sequences (file: Bio/SeqIO/genbank.pm)
sgoegel at gmail.com
sgoegel at gmail.com
Sat Apr 16 13:12:50 EDT 2005
write_sequence does not write genbank sequences correctly.
Specifically, features appear as shown below.
I traced the problem write_seq in SeqIO::genbank.pm,
and then to the $self->_print_GenBank_FTHelper($fth); line within the foreach
my $fth ( @fth ) loop.
Here's what I'm actually doing within my script:
use Bio::Perl;
use Bio::DB::GenBank;
use Bio::DB::GenPept;
use Bio::Seq;
use Bio::SeqIO;
# --- cut ---
my $seq = $gp->get_Seq_by_gi($id); # where $gp is a Bio::DB::GenPept object
and $id is a valid gi num
# ...
write_sequence(">$filename", "genbank", $seq);
this results in a mostly-correct looking genpept output file, but the feature
section looks like so:
FEATURES Location/Qualifiers
source 1..194
/db_xref="Bio::Annotation::SimpleValue=HASH(0x87eecbc)"
/organism="Bio::Annotation::SimpleValue=HASH(0x87eee00)"
gene 1..194
/locus_tag="Bio::Annotation::SimpleValue=HASH(0x87eefe0)"
/gene="Bio::Annotation::SimpleValue=HASH(0x87f0728)" Protein 1..194
/locus_tag="Bio::Annotation::SimpleValue=HASH(0x87f07ac)"
/gene="Bio::Annotation::SimpleValue=HASH(0x87f0908)"
/product="Bio::Annotation::SimpleValue=HASH(0x87f0914)" Region
15..42
/locus_tag="Bio::Annotation::SimpleValue=HASH(0x87f0a04)"
/region_name="Bio::Annotation::SimpleValue=HASH(0x87f1c4c) "
/evidence=Bio::Annotation::SimpleValue=HASH(0x87f1cb8)
/gene="Bio::Annotation::SimpleValue=HASH(0x87f1d00)"
/note="Bio::Annotation::SimpleValue=HASH(0x87f1d48)"
Region 50..99
/locus_tag="Bio::Annotation::SimpleValue=HASH(0x87f1dd8)"
/region_name="Bio::Annotation::SimpleValue=HASH(0x87f1ed4) "
/evidence=Bio::Annotation::SimpleValue=HASH(0x87f1f40)
/gene="Bio::Annotation::SimpleValue=HASH(0x87f2e2c)"
/note="Bio::Annotation::SimpleValue=HASH(0x87f2e74)"
Site 118
/locus_tag="Bio::Annotation::SimpleValue=HASH(0x87f2f04)"
/evidence=Bio::Annotation::SimpleValue=HASH(0x87f3000)
/gene="Bio::Annotation::SimpleValue=HASH(0x87f306c)"
/site_type="Bio::Annotation::SimpleValue=HASH(0x87f30fc)"
/note="Bio::Annotation::SimpleValue=HASH(0x87f30b4)"
!!
As I said, I traced this problem to write_seq in the Bio/SeqIO/genbank.pm
file, which calls $self->_print_GenBank_FTHelper($fth);
This happened for all of about 4000 nucleotide sequences I downloaded in
genbank format, as well as a protein sequence I tested.
I fixed the problem with the following modification to
_print_GenBank_FTHelper: (modified lines marked with ### comments)
sub _print_GenBank_FTHelper {
my ($self,$fth) = @_;
if( ! ref $fth || ! $fth->isa('Bio::SeqIO::FTHelper') ) {
$fth->warn("$fth is not a FTHelper class. Attempting to print, but
there could be tears!");
}
if( defined $fth->key &&
$fth->key eq 'CONTIG' ) {
$self->_show_dna(0);
$self->_write_line_GenBank_regex(sprintf("%-12s",$fth->key),
' 'x12,$fth->loc,"\,\|\$",80);
} else {
$self->_write_line_GenBank_regex(sprintf(" %-16s",$fth->key),
" "x21,
$fth->loc,"\,\|\$",80);
}
foreach my $tag ( keys %{$fth->field} ) {
foreach my $value ( @{$fth->field->{$tag}} ) {
####### I added the next 3 lines.
if (ref ($value) =~ /Bio::Annotation::SimpleValue/) {
$value = $value->value;
}
## --- cut ---
I don't know whether my fix is sufficient or more should be changed to
maintain coherence... however, my fix seems to work.
More information about the Bioperl-l
mailing list