[Bioperl-l] SearchIO parsing of RPS-BLAST misses last hit within each result?
M L
grecian_urn2002 at yahoo.com
Mon May 12 14:56:30 EDT 2003
Hi all,
I modified the example script in the SearchIO HOWTO to
parse an RPS-BLAST report and list each query name with
all of its associated hits. My script (below) misses
the last hit of each query. I've attached a sample
report with one query and two hits; the script lists
only the first hit. The same thing happens with
multi-query reports. I'm using the Bioperl 1.2.1
release. I'm new at this and am certain that I'm missing
something obvious. Any help would be greatly appreciated.
Michael
########################
#Code follows:
########################
#!/usr/bin/perl -w
use strict;
use Bio::SearchIO;

my $in = Bio::SearchIO->new(-format => 'blast',
                            -file   => 'rpsblast_test_seq');
while( my $result = $in->next_result ) {
    while( my $hit = $result->next_hit ) {
        print "Query= ", $result->query_name,
              ",Hit= ", $hit->name, "\n";
    }
}
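As a cross-check that does not depend on Bioperl, the number of hits the report text itself contains can be counted directly. The following is a minimal sketch, not part of the original script: the sub name count_hits_per_query is made up for illustration, and it assumes that each hit description in the report starts with '>' at the beginning of a line and that each query starts with a 'Query=' line, as in the sample below.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count hit-description lines (those starting with '>') under each
# 'Query=' line of a plain-text BLAST report. This only inspects the
# report text; it is a cross-check, not a replacement for Bio::SearchIO.
# Returns a list of [query_name, hit_count] pairs in report order.
sub count_hits_per_query {
    my ($fh) = @_;
    my (@order, %count, $current);
    while ( my $line = <$fh> ) {
        if ( $line =~ /^Query=\s*(\S+)/ ) {
            # First whitespace-delimited token after 'Query=' is the name.
            $current = $1;
            push @order, $current unless exists $count{$current};
            $count{$current} ||= 0;
        }
        elsif ( defined $current && $line =~ /^>/ ) {
            $count{$current}++;
        }
    }
    return map { [ $_, $count{$_} ] } @order;
}

# Only runs when a report file name is given on the command line.
my $file = shift @ARGV;
if ( defined $file ) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    printf "Query= %s, hits in report text= %d\n", @$_
        for count_hits_per_query($fh);
    close $fh;
}
```

Run against the report file (for example, rpsblast_test_seq from the script above), this gives a per-query count to compare with the number of lines the SearchIO loop prints. If the two differ, the hits are present in the report and are being lost somewhere in parsing or iteration.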
###################
# sample rpsblast report follows
###################
RPS-BLAST 2.2.2 [Jan-08-2002]
Query= ORFP:YAL001C TFC3 SGDID:S0000001, Chr I from
151167-151098,151007-147595, reverse complement
(1160 letters)
                                                                   Score     E
Sequences producing significant alignments:                        (bits)  Value
gnl|CDD|3998 smart00686, DM13, Domain present in fly proteins (C...    27   0.90
gnl|CDD|7288 smart00045, DAGKa, Diacylglycerol kinase accessory ...    26   1.5
>gnl|CDD|3998 smart00686, DM13, Domain present in fly
proteins (CG14681, CG12492,
CG6217), worm H06A10.1 and Arabidopsis
thaliana MBG8.9
Length = 108
Score = 26.6 bits (57), Expect = 0.90
Identities = 11/41 (26%), Positives = 18/41 (43%), Gaps = 4/41 (9%)
Query: 440 TLNEDNFVALNNT----VRFTTDSDGQDIFFWHGELKIPPN 476
            ++ DN   ++        F+ D +G D +FW G    P N
Sbjct: 5   GVSSDNVEIVDAKTLRIPNFSYDGEGPDAYFWVGAGSRPDN 45
>gnl|CDD|7288 smart00045, DAGKa, Diacylglycerol kinase
accessory domain
(presumed); Diacylglycerol (DAG) is a second
messenger
that acts as a protein kinase C activator.
DAG can be
produced from the hydrolysis of
phosphatidylinositol
4,5-bisphosphate (PIP2) by a
phosphoinositide-specific
phospholipase C and by the degradation of
phosphatidylcholine (PC) by a phospholipase
C or the
concerted actions of phospholipase D and
phosphatidate
phosphohydrolase. This domain might either
be an
accessory domain or else contribute to the
catalytic
domain. Bacterial homologues are known
Length = 160
Score = 25.8 bits (55), Expect = 1.5
Identities = 15/40 (37%), Positives = 21/40 (52%), Gaps = 7/40 (17%)
Query: 26  TLNQLWDISGKYFDLSDKKVKQFVLSCVILKKDIEVYCDG 65
             N+LW     YF L  K++  F  +C  L + IE+ CDG
Sbjct: 33  LKNKLW-----YFKLGTKEL--FFRTCKDLHERIELECDG 65
Lambda K H
0.315 0.133 0.372
Gapped
Lambda K H
0.267 0.0410 0.140
Matrix: BLOSUM62
Gap Penalties: Existence: 1100, Extension: 100
Number of Hits to DB: 445,647
Number of Sequences: 0
Number of extensions: 46484
Number of successful extensions: 40
Number of sequences better than 10.0: 1
Number of HSP's better than 10.0 without gapping: 1
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test: 0
Number of HSP's gapped (non-prelim): 40
length of query: 66
length of database: 76,874
effective HSP length: 71
effective length of query: 66
effective length of database: 30,511
effective search space: 2013726
effective search space used: 89173840
T: 11
A: 40
X1: 1600 (727.7 bits)
X2: 3800 (1463.8 bits)
X3: 6400 (2465.3 bits)
S1: 4100 (1867.7 bits)
S2: 34 (17.7 bits)