[Bioperl-l] SearchIO parsing of RPS-BLAST misses last hit within
each result?
Jason Stajich
jason at cgt.duhs.duke.edu
Tue May 13 10:06:42 EDT 2003
can you post this as a bug to bugzilla.bioperl.org instead please - it
makes it easier to track down and get hold of the sample files.
On Mon, 12 May 2003, M L wrote:
> Hi all,
>
> I modified the example script in the SearchIO HOWTO to
> parse a RPS-BLAST report and list query names and all
> of their associated hits. My script (below) misses
> the last hit of each query. I've attached a sample
> report with one query and two hits--the script only
> lists the first hit. This also occurs in multi-query
> reports. I'm using the Bioperl 1.2.1 release. I'm
> new at this and am certain that I'm missing the
> obvious. Any help would be greatly appreciated.
>
> Michael
>
>
> ########################
> #Code follows:
> ########################
>
> #!/usr/bin/perl -w
>
> use strict;
> use Bio::SearchIO;
>
> my $in = new Bio::SearchIO(-format => 'blast',
> -file =>
> 'rpsblast_test_seq');
>
> while( my $result = $in->next_result ) {
> while( my $hit = $result->next_hit ) {
> print "Query= ", $result->query_name,
> ",Hit= ", $hit->name,
> "\n";
> }
> }
>
> ###################
> # sample rpsblast report follows
> ###################
>
> RPS-BLAST 2.2.2 [Jan-08-2002]
>
> Query= ORFP:YAL001C TFC3 SGDID:S0000001, Chr I from
> 151167-151098,151007-147595, reverse complement
> (1160 letters)
>
>
>
> Score E
> Sequences producing significant alignments:
> (bits) Value
>
> gnl|CDD|3998 smart00686, DM13, Domain present in fly
> proteins (C... 27 0.90
> gnl|CDD|7288 smart00045, DAGKa, Diacylglycerol kinase
> accessory ... 26 1.5
>
> >gnl|CDD|3998 smart00686, DM13, Domain present in fly
> proteins (CG14681, CG12492,
> CG6217), worm H06A10.1 and Arabidopsis
> thaliana MBG8.9
> Length = 108
>
> Score = 26.6 bits (57), Expect = 0.90
> Identities = 11/41 (26%), Positives = 18/41 (43%),
> Gaps = 4/41 (9%)
>
> Query: 440 TLNEDNFVALNNT----VRFTTDSDGQDIFFWHGELKIPPN
> 476
> ++ DN ++ F+ D +G D +FW G P N
> Sbjct: 5 GVSSDNVEIVDAKTLRIPNFSYDGEGPDAYFWVGAGSRPDN
> 45
>
>
> >gnl|CDD|7288 smart00045, DAGKa, Diacylglycerol kinase
> accessory domain
> (presumed); Diacylglycerol (DAG) is a second
> messenger
> that acts as a protein kinase C activator.
> DAG can be
> produced from the hydrolysis of
> phosphatidylinositol
> 4,5-bisphosphate (PIP2) by a
> phosphoinositide-specific
> phospholipase C and by the degradation of
> phosphatidylcholine (PC) by a phospholipase
> C or the
> concerted actions of phospholipase D and
> phosphatidate
> phosphohydrolase. This domain might either
> be an
> accessory domain or else contribute to the
> catalytic
> domain. Bacterial homologues are known
> Length = 160
>
> Score = 25.8 bits (55), Expect = 1.5
> Identities = 15/40 (37%), Positives = 21/40 (52%),
> Gaps = 7/40 (17%)
>
> Query: 26 TLNQLWDISGKYFDLSDKKVKQFVLSCVILKKDIEVYCDG 65
> N+LW YF L K++ F +C L + IE+ CDG
> Sbjct: 33 LKNKLW-----YFKLGTKEL--FFRTCKDLHERIELECDG 65
>
> Lambda K H
> 0.315 0.133 0.372
>
> Gapped
> Lambda K H
> 0.267 0.0410 0.140
>
> Matrix: BLOSUM62
> Gap Penalties: Existence: 1100, Extension: 100
> Number of Hits to DB: 445,647
> Number of Sequences: 0
> Number of extensions: 46484
> Number of successful extensions: 40
> Number of sequences better than 10.0: 1
> Number of HSP's better than 10.0 without gapping: 1
> Number of HSP's successfully gapped in prelim test: 0
> Number of HSP's that attempted gapping in prelim test:
> 0
> Number of HSP's gapped (non-prelim): 40
> length of query: 66
> length of database: 76,874
> effective HSP length: 71
> effective length of query: 66
> effective length of database: 30,511
> effective search space: 2013726
> effective search space used: 89173840
> T: 11
> A: 40
> X1: 1600 (727.7 bits)
> X2: 3800 (1463.8 bits)
> X3: 6400 (2465.3 bits)
> S1: 4100 (1867.7 bits)
> S2: 34 (17.7 bits)
>
>
>
>
> __________________________________
> Do you Yahoo!?
> The New Yahoo! Search - Faster. Easier. Bingo.
> http://search.yahoo.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at bioperl.org
> http://pw600a.bioperl.org/mailman/listinfo/bioperl-l
>
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu
More information about the Bioperl-l
mailing list