[Bioperl-l] Parsing Blast reports: Length of sequence of a hit?

Edith Schlagenhauf ediths at unizh.ch
Thu Oct 28 09:44:30 EDT 2004


1) With the deprecated Bio::Tools::Blast module, one could get the
total length of the hit sequence, as given in the GenBank file
(and in the Blast report), by using the length() method.

Is there any equivalent functionality in Bio::SearchIO?
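A minimal sketch of the Bio::SearchIO route, assuming a plain-text Blast report. Hit objects returned by next_hit() implement Bio::Search::Hit::HitI, whose length() method gives the full length of the hit sequence (the "Length =" value in the report), not the length of the aligned region. The helper name hit_lengths and the file name report.blast are hypothetical; Bio::SearchIO is only loaded when a report file is passed on the command line.

```perl
use strict;
use warnings;

# Return [hit name, hit sequence length] pairs for one Blast result.
# $result is assumed to behave like a Bio::Search::Result::ResultI object,
# e.g. the value returned by Bio::SearchIO's next_result().
sub hit_lengths {
    my ($result) = @_;
    my @rows;
    while ( my $hit = $result->next_hit ) {
        # length() is the full length of the hit (subject) sequence,
        # not the length of the aligned region
        push @rows, [ $hit->name, $hit->length ];
    }
    return @rows;
}

# Usage: perl hit_lengths.pl report.blast   (hypothetical file name)
# Bio::SearchIO is only loaded when a report file is given.
if (@ARGV) {
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new( -format => 'blast', -file => $ARGV[0] );
    while ( my $result = $in->next_result ) {
        printf "%s\t%d\n", @$_ for hit_lengths($result);
    }
}
```

The sub takes any object with next_hit(), so the same loop works for every result in a multi-query report.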

2) I use the GenBank.pm module to map a hit accession, as given in a
Blast report, to its gi number:

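use strict;
use warnings;
use Bio::DB::GenBank;

my $gb = Bio::DB::GenBank->new();    # GenBank query object used below

my $gb_hit_accession  = "AF091802";   # GenBank accession
my $ref_hit_accession = "XM_480600";  # RefSeq accession

# fetch the record; for GenBank-format records primary_id() is the gi number
my $seq_obj = $gb->get_Seq_by_acc($ref_hit_accession);
my $primary_id = $seq_obj->primary_id();

print STDOUT "\$primary_id is: $primary_id\n";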

a) For RefSeq sequences, the script exits with:

-------------------- WARNING ---------------------
MSG: acc (gb|XM_480600) does not exist
Can't call method "primary_id" on an undefined value at ./gbTest.pl line

The reason is that ref| is replaced with gb|.

When I changed the following line of code in GenBank.pm

sub get_Seq_by_acc {
   my ($self,$seqid) = @_;

to:

sub get_Seq_by_acc {
   my ($self,$seqid) = @_;

i.e., omitting the "gb|" string, the script ran without problems for
all sequences (including the gb| ones).
So why is this "gb|" prefix added?

b) Is there a more convenient way to get gi numbers from accession
numbers using Bioperl?
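If the Blast database uses NCBI-style identifiers of the form gi|number|db|accession| (as in databases distributed by NCBI), the gi number is already part of each hit name, and the db field (gb for GenBank, ref for RefSeq, emb, dbj, ...) shows which namespace an accession belongs to. A pure-Perl sketch under that assumption follows; the sub name parse_ncbi_id is hypothetical, and with Bio::SearchIO the string to pass would typically be $hit->name. Identifiers from custom databases may not follow this layout, in which case a lookup such as the Bio::DB::GenBank one above is still needed.

```perl
use strict;
use warnings;

# Split an NCBI-style hit identifier such as "gi|<number>|ref|XM_480600.1|"
# into its gi number, database tag and accession (version suffix removed).
# Assumes the gi|number|db|accession layout; returns an empty list otherwise.
sub parse_ncbi_id {
    my ($id) = @_;
    return unless $id =~ /^gi\|(\d+)\|(\w+)\|([^|.]+)(?:\.\d+)?(?:\||$)/;
    return ( gi => $1, db => $2, accession => $3 );
}
```

Usage: my %f = parse_ncbi_id($hit->name); then $f{gi}, $f{db} and $f{accession} hold the three fields, with no extra GenBank query needed.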

Thanks for your input,

Dr Edith Schlagenhauf
Institute of Plant Biology
University of Zurich
Zollikerstrasse 107
CH-8008 Zurich

e-mail: ediths AT botinst DOT unizh DOT ch
Tel.:	+41 1 634 82 78
Fax :	+41 1 634 82 04

