[Bioperl-l] Problem writing sequence features for GenBank/GenPept
sequences (file: Bio/SeqIO/genbank.pm)
Hilmar Lapp
hlapp at gmx.net
Sat Apr 16 15:23:59 EDT 2005
This is a known problem in the 1.5.0 release. Do one of the following:
- downgrade to 1.4.0 (preferably using the 1.4 branch CVS head, but
depending on what you want to do the stock 1.4.0 may do fine), or
- upgrade to the main trunk CVS head.
Hth,
-hilmar
On Saturday, April 16, 2005, at 10:12 AM, sgoegel at gmail.com wrote:
> write_sequence does not write genbank sequences correctly.
> Specifically, features appear as shown below.
>
> I traced the problem write_seq in SeqIO::genbank.pm,
> and then to the $self->_print_GenBank_FTHelper($fth); line within the
> foreach
> my $fth ( @fth ) loop.
>
> Here's what I'm actually doing within my script:
> use Bio::Perl;
> use Bio::DB::GenBank;
> use Bio::DB::GenPept;
> use Bio::Seq;
> use Bio::SeqIO;
> # --- cut ---
> my $seq = $gp->get_Seq_by_gi($id); # where $gp is a Bio::DB::GenPept
> object
> and $id is a valid gi num
> # ...
> write_sequence(">$filename", "genbank", $seq);
>
> this results in a mostly-correct looking genpept output file, but the
> feature
> section looks like so:
> FEATURES Location/Qualifiers
> source 1..194
>
> /db_xref="Bio::Annotation::SimpleValue=HASH(0x87eecbc)"
>
> /organism="Bio::Annotation::SimpleValue=HASH(0x87eee00)"
> gene 1..194
>
> /locus_tag="Bio::Annotation::SimpleValue=HASH(0x87eefe0)"
> /gene="Bio::Annotation::SimpleValue=HASH(0x87f0728)" Protein
> 1..194
>
> /locus_tag="Bio::Annotation::SimpleValue=HASH(0x87f07ac)"
> /gene="Bio::Annotation::SimpleValue=HASH(0x87f0908)"
> /product="Bio::Annotation::SimpleValue=HASH(0x87f0914)" Region
> 15..42
>
> /locus_tag="Bio::Annotation::SimpleValue=HASH(0x87f0a04)"
> /region_name="Bio::Annotation::SimpleValue=HASH(0x87f1c4c) "
>
> /evidence=Bio::Annotation::SimpleValue=HASH(0x87f1cb8)
>
> /gene="Bio::Annotation::SimpleValue=HASH(0x87f1d00)"
>
> /note="Bio::Annotation::SimpleValue=HASH(0x87f1d48)"
> Region 50..99
>
> /locus_tag="Bio::Annotation::SimpleValue=HASH(0x87f1dd8)"
> /region_name="Bio::Annotation::SimpleValue=HASH(0x87f1ed4) "
>
> /evidence=Bio::Annotation::SimpleValue=HASH(0x87f1f40)
>
> /gene="Bio::Annotation::SimpleValue=HASH(0x87f2e2c)"
>
> /note="Bio::Annotation::SimpleValue=HASH(0x87f2e74)"
> Site 118
>
> /locus_tag="Bio::Annotation::SimpleValue=HASH(0x87f2f04)"
> /evidence=Bio::Annotation::SimpleValue=HASH(0x87f3000)
> /gene="Bio::Annotation::SimpleValue=HASH(0x87f306c)"
> /site_type="Bio::Annotation::SimpleValue=HASH(0x87f30fc)"
> /note="Bio::Annotation::SimpleValue=HASH(0x87f30b4)"
>
>
> !!
>
> As I said, I traced this problem to write_seq in the
> Bio/SeqIO/genbank.pm
> file, which calls $self->_print_GenBank_FTHelper($fth);
>
> This happened for all of about 4000 nucleotide sequences I downloaded
> in
> genbank format, as well as a protein sequence I tested.
>
> I fixed the problem with the following modification to
> _print_GenBank_FTHelper: (modified lines marked with ### comments)
>
> sub _print_GenBank_FTHelper {
> my ($self,$fth) = @_;
>
> if( ! ref $fth || ! $fth->isa('Bio::SeqIO::FTHelper') ) {
> $fth->warn("$fth is not a FTHelper class. Attempting to print,
> but
> there could be tears!");
> }
> if( defined $fth->key &&
> $fth->key eq 'CONTIG' ) {
> $self->_show_dna(0);
> $self->_write_line_GenBank_regex(sprintf("%-12s",$fth->key),
> ' 'x12,$fth->loc,"\,\|\$",80);
> } else {
> $self->_write_line_GenBank_regex(sprintf("
> %-16s",$fth->key),
> " "x21,
> $fth->loc,"\,\|\$",80);
> }
>
> foreach my $tag ( keys %{$fth->field} ) {
> foreach my $value ( @{$fth->field->{$tag}} ) {
> ####### I added the next 3 lines.
> if (ref ($value) =~ /Bio::Annotation::SimpleValue/) {
> $value = $value->value;
> }
>
> ## --- cut ---
>
> I don't know whether my fix is sufficient or more should be changed to
> maintain coherence... however, my fix seems to work.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------
More information about the Bioperl-l
mailing list