[Bioperl-l] Parsing Blast reports: Length of sequence of a hit ?

Jason Stajich jason.stajich at duke.edu
Thu Oct 28 15:27:14 EDT 2004


On Oct 28, 2004, at 9:44 AM, Edith Schlagenhauf wrote:

> Hi,
>
> 1) with the deprecated Bio::Tools::Blast module one could get
> the total length of the hit sequence as given in the Genbank
> (and in the Blast report) file by using the length() method
> ($hit->length()).
>
> Is there any equivalent functionality in Bio::SearchIO ?
>
$hit->length() does this.

See the SearchIO HOWTO for an explanation - 
http://www.bioperl.org/HOWTOs/
>
> 2) I use the GenBank.pm module to get from hit accession as
> given in a Blast report to gi number,
> ie:
>
> -------------------------------------------------------
> use strict;
> use Bio::DB::GenBank;
>
> my $gb_hit_accession = "AF091802";
> my $ref_hit_accession = "XM_480600";
>
> # get PID for NCBI ENTREZ
>
> my $seq_obj = $gb->get_Seq_by_acc($ref_hit_accession);
> my $primary_id = $seq_obj->primary_id();
>
> print STDOUT "\$primary_id is: $primary_id\n";
> -------------------------------------------------------
>
>
> a) RefSeq sequences exit with :
>
> -------------------- WARNING ---------------------
> MSG: acc (gb|XM_480600) does not exist
> ---------------------------------------------------
> Can't call method "primary_id" on an undefined value at ./gbTest.pl 
> line
> 23.
>
> the reason being that ref| is replaced with gb|.
>
> when I changed the following line of code in GenBank.pm
>
> sub get_Seq_by_acc {
>    my ($self,$seqid) = @_;
>    $self->SUPER::get_Seq_by_acc("gb|$seqid");
> }
> to :
>
> sub get_Seq_by_acc {
>    my ($self,$seqid) = @_;
>    $self->SUPER::get_Seq_by_acc("$seqid");
> }
>
> ie, omitting the "gb|" string, the script proceeded for all seqs
> (also for gb| seqs) without problems.
> Thus, what for is this "gb|" added?
>
NCBI required to specify that we are looking for an accession number it 
if I remember correctly, don't really remember though to be honest. 
Perhaps this hack can be removed now, not sure...

>
> b) is there a more convenient way to get gi numbers from accession
> numbers using Bioperl?

Well NCBI provides the resource so would be their call.  you can 
generate a accession to gi lookup table by grepping through your blast 
db and pulling out the gi number and accession, save this in a file and 
build a persistent lookup DB with DB_File or something equivalent.  
That is what I'd do if I am doing lots of these lookups.

Of course if you are trying to do this for BLAST results you can have 
the GI number in the report if you specify just add the
-I T option in your blastall command -- from the blastall docs:

   -I  Show GI's in deflines [T/F]

> Thanks for your input,
> Edith
>
>
>
> ******************************************
> Dr Edith Schlagenhauf
> Bioinformatics
> Institute of Plant Biology
> University of Zurich
> Zollikerstrasse 107
> CH-8008 Zurich
> SWITZERLAND
>
> e-mail: ediths AT botinst DOT unizh DOT ch
> Tel.:	+41 1 634 82 78
> Fax :	+41 1 634 82 04
> ******************************************
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/



More information about the Bioperl-l mailing list