[Bioperl-l] SearchIO parsing of RPS-BLAST misses last hit within each result?

M L grecian_urn2002 at yahoo.com
Tue May 13 14:33:41 EDT 2003


Done! In addition to filing it in Bugzilla, I have posted my question,
script, and sample RPS-BLAST file here:

http://llama.med.harvard.edu/~mlu/bioperl_question/

Thanks, and apologies for the repost,
Michael



--- Jason Stajich <jason at cgt.duhs.duke.edu> wrote:
> Can you post this as a bug to bugzilla.bioperl.org instead, please? It
> makes it easier to track down and get hold of the sample files.
> 
> On Mon, 12 May 2003, M L wrote:
> 
> > Hi all,
> >
> > I modified the example script in the SearchIO HOWTO to
> > parse an RPS-BLAST report and list each query name along
> > with all of its hits. My script (below) misses
> > the last hit of each query. I've attached a sample
> > report with one query and two hits; the script only
> > lists the first hit. The same thing happens with
> > multi-query reports. I'm using the Bioperl 1.2.1
> > release. I'm new at this and am sure I'm missing
> > something obvious. Any help would be greatly appreciated.
> >
> > Michael
> >
> >
> > ########################
> > #Code follows:
> > ########################
> >
> > #!/usr/bin/perl -w
> >
> > use strict;
> > use Bio::SearchIO;
> >
> > # Open the report; RPS-BLAST output is read with the 'blast' parser
> > my $in = Bio::SearchIO->new(-format => 'blast',
> >                             -file   => 'rpsblast_test_seq');
> >
> > # Outer loop: one result per query; inner loop: each hit for that query
> > while( my $result = $in->next_result ) {
> >    while( my $hit = $result->next_hit ) {
> >       print "Query= ",     $result->query_name,
> >             ",Hit= ",       $hit->name,
> >             "\n";
> >    }
> > }
> >
> > ###################
> > # sample rpsblast report follows
> > ###################
> >
> > RPS-BLAST 2.2.2 [Jan-08-2002]
> >
> > Query= ORFP:YAL001C TFC3 SGDID:S0000001, Chr I from 151167-151098,151007-147595, reverse complement
> >          (1160 letters)
> >
> >
> >
> >                                                                      Score    E
> > Sequences producing significant alignments:                          (bits) Value
> >
> > gnl|CDD|3998 smart00686, DM13, Domain present in fly proteins (C...    27  0.90
> > gnl|CDD|7288 smart00045, DAGKa, Diacylglycerol kinase accessory ...    26  1.5
> >
> > >gnl|CDD|3998 smart00686, DM13, Domain present in fly proteins (CG14681, CG12492,
> >            CG6217), worm H06A10.1 and Arabidopsis thaliana MBG8.9
> >           Length = 108
> >
> >  Score = 26.6 bits (57), Expect = 0.90
> >  Identities = 11/41 (26%), Positives = 18/41 (43%), Gaps = 4/41 (9%)
> >
> > Query: 440 TLNEDNFVALNNT----VRFTTDSDGQDIFFWHGELKIPPN 476
> >             ++ DN   ++        F+ D +G D +FW G    P N
> > Sbjct: 5   GVSSDNVEIVDAKTLRIPNFSYDGEGPDAYFWVGAGSRPDN 45
> >
> >
> > >gnl|CDD|7288 smart00045, DAGKa, Diacylglycerol kinase accessory domain
> >           (presumed); Diacylglycerol (DAG) is a second messenger
> >           that acts as a protein kinase C activator. DAG can be
> >           produced from the hydrolysis of phosphatidylinositol
> >           4,5-bisphosphate (PIP2) by a phosphoinositide-specific
> >           phospholipase C and by the degradation of
> >           phosphatidylcholine (PC) by a phospholipase C or the
> >           concerted actions of phospholipase D and phosphatidate
> >           phosphohydrolase. This domain might either be an
> >           accessory domain or else contribute to the catalytic
> >           domain. Bacterial homologues are known
> >           Length = 160
> >
> >  Score = 25.8 bits (55), Expect = 1.5
> >  Identities = 15/40 (37%), Positives = 21/40 (52%), Gaps = 7/40 (17%)
> >
> > Query: 26 TLNQLWDISGKYFDLSDKKVKQFVLSCVILKKDIEVYCDG 65
> >             N+LW     YF L  K++  F  +C  L + IE+ CDG
> > Sbjct: 33 LKNKLW-----YFKLGTKEL--FFRTCKDLHERIELECDG 65
> >
> > Lambda     K      H
> >    0.315    0.133    0.372
> >
> > Gapped
> > Lambda     K      H
> >    0.267   0.0410    0.140
> >
> > Matrix: BLOSUM62
> > Gap Penalties: Existence: 1100, Extension: 100
> > Number of Hits to DB: 445,647
> > Number of Sequences: 0
> > Number of extensions: 46484
> > Number of successful extensions: 40
> > Number of sequences better than 10.0: 1
> > Number of HSP's better than 10.0 without gapping: 1
> > Number of HSP's successfully gapped in prelim test: 0
> > Number of HSP's that attempted gapping in prelim test: 0
> > Number of HSP's gapped (non-prelim): 40
> > length of query: 66
> > length of database: 76,874
> > effective HSP length: 71
> > effective length of query: 66
> > effective length of database: 30,511
> > effective search space:  2013726
> > effective search space used: 89173840
> > T: 11
> > A: 40
> > X1: 1600 (727.7 bits)
> > X2: 3800 (1463.8 bits)
> > X3: 6400 (2465.3 bits)
> > S1: 4100 (1867.7 bits)
> > S2: 34 (17.7 bits)
> >
> >
> >
> >
> > __________________________________
> > Do you Yahoo!?
> > The New Yahoo! Search - Faster. Easier. Bingo.
> > http://search.yahoo.com
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at bioperl.org
> > http://pw600a.bioperl.org/mailman/listinfo/bioperl-l
> >
> 
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
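Since the loop above follows the SearchIO HOWTO pattern, one way to confirm that hits are being dropped by the parser, rather than missing from the file, is to count the alignment blocks per query directly from the report text. The following is a minimal sketch in core Perl (no BioPerl), assuming the plain-text layout of the sample report: each query begins with a line starting "Query=", and each hit's alignment section begins with a line starting ">". The script name count_hits.pl and the sub count_hits_per_query are hypothetical.

```perl
#!/usr/bin/perl -w
# Hypothetical cross-check (not a BioPerl API): count alignment blocks
# per query straight from a plain-text BLAST/RPS-BLAST report.
# Assumption: a query starts with a "Query=" line and every hit's
# alignment section starts with a line beginning with ">".
use strict;

sub count_hits_per_query {
    my ($fh) = @_;
    my @counts;    # ordered list of [query_name, hit_count]
    while ( my $line = <$fh> ) {
        if ( $line =~ /^Query=\s*(\S+)/ ) {
            # First word after "Query=" is the query name
            push @counts, [ $1, 0 ];
        }
        elsif ( $line =~ /^>/ && @counts ) {
            # One ">" header line per hit, however many HSPs follow it
            $counts[-1][1]++;
        }
        # Alignment lines ("Query: 440 ...", "Sbjct: ...") match neither pattern
    }
    return @counts;
}

# Only read a file when one is given on the command line
if (@ARGV) {
    open( my $fh, '<', $ARGV[0] ) or die "Cannot open $ARGV[0]: $!";
    for my $rec ( count_hits_per_query($fh) ) {
        print "Query= $rec->[0], hits in report: $rec->[1]\n";
    }
    close $fh;
}
```

Run it as `perl count_hits.pl rpsblast_test_seq`. If the file matches the sample report, it should list one query with 2 hits, which can be set against the single hit printed by the SearchIO loop.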



