[Bioperl-l] Parsing Blast reports: Length of sequence of a hit ?
Edith Schlagenhauf
ediths at unizh.ch
Thu Oct 28 09:44:30 EDT 2004
Hi,
1) With the deprecated Bio::Tools::Blast module one could get
the total length of the hit sequence, as given in the GenBank
file (and in the Blast report), by calling the length() method
on the hit ($hit->length()).
Is there equivalent functionality in Bio::SearchIO?
2) I use the GenBank.pm module (Bio::DB::GenBank) to get from a hit
accession, as given in a Blast report, to its gi number,
i.e.:
-------------------------------------------------------
use strict;
use Bio::DB::GenBank;
my $gb_hit_accession = "AF091802";
my $ref_hit_accession = "XM_480600";
# database handle used for the lookup below
my $gb = Bio::DB::GenBank->new();
# get PID for NCBI ENTREZ
my $seq_obj = $gb->get_Seq_by_acc($ref_hit_accession);
my $primary_id = $seq_obj->primary_id();
print STDOUT "\$primary_id is: $primary_id\n";
-------------------------------------------------------
a) For RefSeq sequences the script dies with:
-------------------- WARNING ---------------------
MSG: acc (gb|XM_480600) does not exist
---------------------------------------------------
Can't call method "primary_id" on an undefined value at ./gbTest.pl line 23.
The reason is that "gb|" is prepended to the accession, even for RefSeq (ref|) entries.
When I changed the following method in GenBank.pm
sub get_Seq_by_acc {
    my ($self,$seqid) = @_;
    $self->SUPER::get_Seq_by_acc("gb|$seqid");
}
to:
sub get_Seq_by_acc {
    my ($self,$seqid) = @_;
    $self->SUPER::get_Seq_by_acc("$seqid");
}
i.e., omitting the "gb|" string, the script ran without problems for all
sequences (including gb| sequences).
So what is the purpose of adding "gb|"?
b) Is there a more convenient way to get gi numbers from accession
numbers using Bioperl?
Thanks for your input,
Edith
******************************************
Dr Edith Schlagenhauf
Bioinformatics
Institute of Plant Biology
University of Zurich
Zollikerstrasse 107
CH-8008 Zurich
SWITZERLAND
e-mail: ediths AT botinst DOT unizh DOT ch
Tel.: +41 1 634 82 78
Fax : +41 1 634 82 04
******************************************