[Bioperl-l] SearchIO parsing of RPS-BLAST misses last hit within
each result?
M L
grecian_urn2002 at yahoo.com
Tue May 13 14:33:41 EDT 2003
Done! In addition to submitting to bugzilla, I posted my question,
script, and sample RPS-BLAST file here:
http://llama.med.harvard.edu/~mlu/bioperl_question/
Thanks and apologies about the repost,
Michael
--- Jason Stajich <jason at cgt.duhs.duke.edu> wrote:
> can you post this as a bug to bugzilla.bioperl.org instead please -
> it
> makes it easier to track down and get hold of the sample files.
>
> On Mon, 12 May 2003, M L wrote:
>
> > Hi all,
> >
> > I modified the example script in the SearchIO HOWTO to
> > parse a RPS-BLAST report and list query names and all
> > of their associated hits. My script (below) misses
> > the last hit of each query. I've attached a sample
> > report with one query and two hits--the script only
> > lists the first hit. This also occurs in multi-query
> > reports. I'm using the Bioperl 1.2.1 release. I'm
> > new at this and am certain that I'm missing the
> > obvious. Any help would be greatly appreciated.
> >
> > Michael
> >
> >
> > ########################
> > #Code follows:
> > ########################
> >
> > #!/usr/bin/perl -w
> >
> > use strict;
> > use Bio::SearchIO;
> >
> > my $in = new Bio::SearchIO(-format => 'blast',
> > -file =>
> > 'rpsblast_test_seq');
> >
> > while( my $result = $in->next_result ) {
> > while( my $hit = $result->next_hit ) {
> > print "Query= ", $result->query_name,
> > ",Hit= ", $hit->name,
> > "\n";
> > }
> > }
> >
> > ###################
> > # sample rpsblast report follows
> > ###################
> >
> > RPS-BLAST 2.2.2 [Jan-08-2002]
> >
> > Query= ORFP:YAL001C TFC3 SGDID:S0000001, Chr I from
> > 151167-151098,151007-147595, reverse complement
> > (1160 letters)
> >
> >
> >
> > Score E
> > Sequences producing significant alignments:
> > (bits) Value
> >
> > gnl|CDD|3998 smart00686, DM13, Domain present in fly
> > proteins (C... 27 0.90
> > gnl|CDD|7288 smart00045, DAGKa, Diacylglycerol kinase
> > accessory ... 26 1.5
> >
> > >gnl|CDD|3998 smart00686, DM13, Domain present in fly
> > proteins (CG14681, CG12492,
> > CG6217), worm H06A10.1 and Arabidopsis
> > thaliana MBG8.9
> > Length = 108
> >
> > Score = 26.6 bits (57), Expect = 0.90
> > Identities = 11/41 (26%), Positives = 18/41 (43%),
> > Gaps = 4/41 (9%)
> >
> > Query: 440 TLNEDNFVALNNT----VRFTTDSDGQDIFFWHGELKIPPN
> > 476
> > ++ DN ++ F+ D +G D +FW G P N
> > Sbjct: 5 GVSSDNVEIVDAKTLRIPNFSYDGEGPDAYFWVGAGSRPDN
> > 45
> >
> >
> > >gnl|CDD|7288 smart00045, DAGKa, Diacylglycerol kinase
> > accessory domain
> > (presumed); Diacylglycerol (DAG) is a second
> > messenger
> > that acts as a protein kinase C activator.
> > DAG can be
> > produced from the hydrolysis of
> > phosphatidylinositol
> > 4,5-bisphosphate (PIP2) by a
> > phosphoinositide-specific
> > phospholipase C and by the degradation of
> > phosphatidylcholine (PC) by a phospholipase
> > C or the
> > concerted actions of phospholipase D and
> > phosphatidate
> > phosphohydrolase. This domain might either
> > be an
> > accessory domain or else contribute to the
> > catalytic
> > domain. Bacterial homologues are known
> > Length = 160
> >
> > Score = 25.8 bits (55), Expect = 1.5
> > Identities = 15/40 (37%), Positives = 21/40 (52%),
> > Gaps = 7/40 (17%)
> >
> > Query: 26 TLNQLWDISGKYFDLSDKKVKQFVLSCVILKKDIEVYCDG 65
> > N+LW YF L K++ F +C L + IE+ CDG
> > Sbjct: 33 LKNKLW-----YFKLGTKEL--FFRTCKDLHERIELECDG 65
> >
> > Lambda K H
> > 0.315 0.133 0.372
> >
> > Gapped
> > Lambda K H
> > 0.267 0.0410 0.140
> >
> > Matrix: BLOSUM62
> > Gap Penalties: Existence: 1100, Extension: 100
> > Number of Hits to DB: 445,647
> > Number of Sequences: 0
> > Number of extensions: 46484
> > Number of successful extensions: 40
> > Number of sequences better than 10.0: 1
> > Number of HSP's better than 10.0 without gapping: 1
> > Number of HSP's successfully gapped in prelim test: 0
> > Number of HSP's that attempted gapping in prelim test:
> > 0
> > Number of HSP's gapped (non-prelim): 40
> > length of query: 66
> > length of database: 76,874
> > effective HSP length: 71
> > effective length of query: 66
> > effective length of database: 30,511
> > effective search space: 2013726
> > effective search space used: 89173840
> > T: 11
> > A: 40
> > X1: 1600 (727.7 bits)
> > X2: 3800 (1463.8 bits)
> > X3: 6400 (2465.3 bits)
> > S1: 4100 (1867.7 bits)
> > S2: 34 (17.7 bits)
> >
> >
> >
> >
> > __________________________________
> > Do you Yahoo!?
> > The New Yahoo! Search - Faster. Easier. Bingo.
> > http://search.yahoo.com
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at bioperl.org
> > http://pw600a.bioperl.org/mailman/listinfo/bioperl-l
> >
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
__________________________________
Do you Yahoo!?
The New Yahoo! Search - Faster. Easier. Bingo.
http://search.yahoo.com
More information about the Bioperl-l
mailing list