[Bioperl-l] SearchIO parsing of RPS-BLAST misses last hit within each result?

M L grecian_urn2002 at yahoo.com
Mon May 12 14:56:30 EDT 2003


Hi all,

I modified the example script in the SearchIO HOWTO to
parse a RPS-BLAST report and list query names and all
of their associated hits.  My script (below) misses
the last hit of each query.  I've attached a sample
report with one query and two hits--the script only
lists the first hit.  This also occurs in multi-query
reports.  I'm using the Bioperl 1.2.1 release.  I'm
new at this and am certain that I'm missing the
obvious.  Any help would be greatly appreciated.

Michael       


########################
#Code follows:
########################

#!/usr/bin/perl -w

use strict;
use Bio::SearchIO;

my $in = new Bio::SearchIO(-format => 'blast',
                           -file   =>
'rpsblast_test_seq');

while( my $result = $in->next_result ) {
   while( my $hit = $result->next_hit ) {
      print "Query= ",     $result->query_name,
            ",Hit= ",       $hit->name, 
            "\n";
      }
}

###################
# sample rpsblast report follows
###################

RPS-BLAST 2.2.2 [Jan-08-2002]

Query= ORFP:YAL001C TFC3 SGDID:S0000001, Chr I from
151167-151098,151007-147595, reverse complement
         (1160 letters)


                                                      
           Score     E
Sequences producing significant alignments:           
            (bits)  Value

gnl|CDD|3998 smart00686, DM13, Domain present in fly
proteins (C...    27  0.90
gnl|CDD|7288 smart00045, DAGKa, Diacylglycerol kinase
accessory ...    26  1.5

>gnl|CDD|3998 smart00686, DM13, Domain present in fly
proteins (CG14681, CG12492,
           CG6217), worm H06A10.1 and Arabidopsis
thaliana MBG8.9
          Length = 108

 Score = 26.6 bits (57), Expect = 0.90
 Identities = 11/41 (26%), Positives = 18/41 (43%),
Gaps = 4/41 (9%)

Query: 440 TLNEDNFVALNNT----VRFTTDSDGQDIFFWHGELKIPPN
476
            ++ DN   ++        F+ D +G D +FW G    P N
Sbjct: 5   GVSSDNVEIVDAKTLRIPNFSYDGEGPDAYFWVGAGSRPDN
45


>gnl|CDD|7288 smart00045, DAGKa, Diacylglycerol kinase
accessory domain
          (presumed); Diacylglycerol (DAG) is a second
messenger
          that acts as a protein kinase C activator.
DAG can be
          produced from the hydrolysis of
phosphatidylinositol
          4,5-bisphosphate (PIP2) by a
phosphoinositide-specific
          phospholipase C and by the degradation of
          phosphatidylcholine (PC) by a phospholipase
C or the
          concerted actions of phospholipase D and
phosphatidate
          phosphohydrolase. This domain might either
be an
          accessory domain or else contribute to the
catalytic
          domain. Bacterial homologues are known
          Length = 160

 Score = 25.8 bits (55), Expect = 1.5
 Identities = 15/40 (37%), Positives = 21/40 (52%),
Gaps = 7/40 (17%)

Query: 26 TLNQLWDISGKYFDLSDKKVKQFVLSCVILKKDIEVYCDG 65
            N+LW     YF L  K++  F  +C  L + IE+ CDG
Sbjct: 33 LKNKLW-----YFKLGTKEL--FFRTCKDLHERIELECDG 65

Lambda     K      H
   0.315    0.133    0.372 

Gapped
Lambda     K      H
   0.267   0.0410    0.140 

Matrix: BLOSUM62
Gap Penalties: Existence: 1100, Extension: 100
Number of Hits to DB: 445,647
Number of Sequences: 0
Number of extensions: 46484
Number of successful extensions: 40
Number of sequences better than 10.0: 1
Number of HSP's better than 10.0 without gapping: 1
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test:
0
Number of HSP's gapped (non-prelim): 40
length of query: 66
length of database: 76,874
effective HSP length: 71
effective length of query: 66
effective length of database: 30,511
effective search space:  2013726
effective search space used: 89173840
T: 11
A: 40
X1: 1600 (727.7 bits)
X2: 3800 (1463.8 bits)
X3: 6400 (2465.3 bits)
S1: 4100 (1867.7 bits)
S2: 34 (17.7 bits)




__________________________________
Do you Yahoo!?
The New Yahoo! Search - Faster. Easier. Bingo.
http://search.yahoo.com


More information about the Bioperl-l mailing list