[Bioperl-l] SearchIO parsing of RPS-BLAST misses last hit within each result?

Jason Stajich jason at cgt.duhs.duke.edu
Tue May 13 10:06:42 EDT 2003


can you post this as a bug to bugzilla.bioperl.org instead please - it
makes it easier to track down and get hold of the sample files.

On Mon, 12 May 2003, M L wrote:

> Hi all,
>
> I modified the example script in the SearchIO HOWTO to
> parse a RPS-BLAST report and list query names and all
> of their associated hits.  My script (below) misses
> the last hit of each query.  I've attached a sample
> report with one query and two hits--the script only
> lists the first hit.  This also occurs in multi-query
> reports.  I'm using the Bioperl 1.2.1 release.  I'm
> new at this and am certain that I'm missing the
> obvious.  Any help would be greatly appreciated.
>
> Michael
>
>
> ########################
> #Code follows:
> ########################
>
> #!/usr/bin/perl -w
>
> use strict;
> use Bio::SearchIO;
>
> my $in = new Bio::SearchIO(-format => 'blast',
>                            -file   =>
> 'rpsblast_test_seq');
>
> while( my $result = $in->next_result ) {
>    while( my $hit = $result->next_hit ) {
>       print "Query= ",     $result->query_name,
>             ",Hit= ",       $hit->name,
>             "\n";
>       }
> }
>
> ###################
> # sample rpsblast report follows
> ###################
>
> RPS-BLAST 2.2.2 [Jan-08-2002]
>
> Query= ORFP:YAL001C TFC3 SGDID:S0000001, Chr I from
> 151167-151098,151007-147595, reverse complement
>          (1160 letters)
>
>
>
>            Score     E
> Sequences producing significant alignments:
>             (bits)  Value
>
> gnl|CDD|3998 smart00686, DM13, Domain present in fly
> proteins (C...    27  0.90
> gnl|CDD|7288 smart00045, DAGKa, Diacylglycerol kinase
> accessory ...    26  1.5
>
> >gnl|CDD|3998 smart00686, DM13, Domain present in fly
> proteins (CG14681, CG12492,
>            CG6217), worm H06A10.1 and Arabidopsis
> thaliana MBG8.9
>           Length = 108
>
>  Score = 26.6 bits (57), Expect = 0.90
>  Identities = 11/41 (26%), Positives = 18/41 (43%),
> Gaps = 4/41 (9%)
>
> Query: 440 TLNEDNFVALNNT----VRFTTDSDGQDIFFWHGELKIPPN
> 476
>             ++ DN   ++        F+ D +G D +FW G    P N
> Sbjct: 5   GVSSDNVEIVDAKTLRIPNFSYDGEGPDAYFWVGAGSRPDN
> 45
>
>
> >gnl|CDD|7288 smart00045, DAGKa, Diacylglycerol kinase
> accessory domain
>           (presumed); Diacylglycerol (DAG) is a second
> messenger
>           that acts as a protein kinase C activator.
> DAG can be
>           produced from the hydrolysis of
> phosphatidylinositol
>           4,5-bisphosphate (PIP2) by a
> phosphoinositide-specific
>           phospholipase C and by the degradation of
>           phosphatidylcholine (PC) by a phospholipase
> C or the
>           concerted actions of phospholipase D and
> phosphatidate
>           phosphohydrolase. This domain might either
> be an
>           accessory domain or else contribute to the
> catalytic
>           domain. Bacterial homologues are known
>           Length = 160
>
>  Score = 25.8 bits (55), Expect = 1.5
>  Identities = 15/40 (37%), Positives = 21/40 (52%),
> Gaps = 7/40 (17%)
>
> Query: 26 TLNQLWDISGKYFDLSDKKVKQFVLSCVILKKDIEVYCDG 65
>             N+LW     YF L  K++  F  +C  L + IE+ CDG
> Sbjct: 33 LKNKLW-----YFKLGTKEL--FFRTCKDLHERIELECDG 65
>
> Lambda     K      H
>    0.315    0.133    0.372
>
> Gapped
> Lambda     K      H
>    0.267   0.0410    0.140
>
> Matrix: BLOSUM62
> Gap Penalties: Existence: 1100, Extension: 100
> Number of Hits to DB: 445,647
> Number of Sequences: 0
> Number of extensions: 46484
> Number of successful extensions: 40
> Number of sequences better than 10.0: 1
> Number of HSP's better than 10.0 without gapping: 1
> Number of HSP's successfully gapped in prelim test: 0
> Number of HSP's that attempted gapping in prelim test:
> 0
> Number of HSP's gapped (non-prelim): 40
> length of query: 66
> length of database: 76,874
> effective HSP length: 71
> effective length of query: 66
> effective length of database: 30,511
> effective search space:  2013726
> effective search space used: 89173840
> T: 11
> A: 40
> X1: 1600 (727.7 bits)
> X2: 3800 (1463.8 bits)
> X3: 6400 (2465.3 bits)
> S1: 4100 (1867.7 bits)
> S2: 34 (17.7 bits)
>
>
>
>
> __________________________________
> Do you Yahoo!?
> The New Yahoo! Search - Faster. Easier. Bingo.
> http://search.yahoo.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at bioperl.org
> http://pw600a.bioperl.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu


More information about the Bioperl-l mailing list