[Bioperl-l] Blastx parser misses scores

Holland, Richard Richard.Holland at agresearch.co.nz
Thu Sep 4 18:14:23 EDT 2003


> Are you talking about the case where you have 50 hits listed in the
> summary but say only 25 HSP alignments?

Not sure. There are 10 hits listed in the summary and 18 detailed below
it. We only get scores reported by the parser for the 10 in the summary.
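
To make the mismatch concrete, here is a rough sketch in plain Perl (no BioPerl) that counts the entries in the summary table and the '>' detail blocks separately. It assumes the NCBI BLAST 2.2.x plain-text layout with unwrapped lines, i.e. the report file as BLAST wrote it rather than a mail-wrapped copy. count_blast_hits is a made-up name for illustration, not a BioPerl function.

```perl
use strict;
use warnings;

# Hypothetical helper: count one-line summary entries versus '>' detail
# blocks in a plain-text BLAST report passed in as a single string.
sub count_blast_hits {
    my ($text) = @_;
    my ($in_summary, $summary, $detailed) = (0, 0, 0);
    for my $line (split /\n/, $text) {
        if ($line =~ /^Sequences producing significant alignments/) {
            $in_summary = 1;              # summary table starts here
        }
        elsif ($line =~ /^>/) {
            $in_summary = 0;              # first detail block ends the table
            $detailed++;
        }
        elsif ($in_summary && $line =~ /\S\s+[\d.]+\s+[\d.eE+-]+\s*$/) {
            $summary++;                   # line ends in "<bits> <E-value>"
        }
    }
    return ($summary, $detailed);
}
```

Run against our original report file, I would expect this to return 10 and 18, the same numbers as above.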

> Can you please provide an example report and code which doesn't
> behave as you would expect.

The blast report in question is at the end of this email.

Our code follows:

===========

    my $blastin = Bio::SearchIO->new(-fh => $fileRef, -format => "blast");

    while (1) {
        my $result = $blastin->next_result;
        if (not $result) { last; }

        my $QueryID = $result->query_name;
        my $QueryLength = $result->query_length;

        while (my $hit = $result->next_hit()) {
            my $hitid = $hit->name;
            my $score = $hit->raw_score;
            my $description = $hit->name . " " . $hit->description;
            while (my $hsp = $hit->next_hsp) {
                my $expectation = $hsp->evalue;
                my $frame = ($hsp->query->frame + 1) * $hsp->query->strand;
                my $strand = $hsp->strand;
                my $hitlength = $hit->length;
                my $identities = $hsp->num_identical;
                my $overlaps = $hsp->length('total');
                my $gaps = $hsp->gaps;
                my $qstart = $hsp->start('query');
                my $qstop = $hsp->end('query');
                my $hstart = $hsp->start('hit');
                my $hstop = $hsp->end('hit');
                my $positives = $hsp->num_conserved;
                # Truncated - code goes here that processes the results
            }
        }
    }

===========
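
One workaround we are considering, as a sketch only: when the hit-level score is undefined, fall back to the best bit score among that hit's HSPs, since those come from the "Score = ..." lines in the hit details. This assumes $hit->hsps and $hsp->bits behave as described in the Bio::Search::Hit::HitI and Bio::Search::HSP::HSPI documentation, and that the hit-level score corresponds to the bits column of the summary. hit_score_or_best_hsp is a made-up name.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: prefer the hit-level score; if the parser left it
# undefined or empty, use the highest bit score among the hit's HSPs.
sub hit_score_or_best_hsp {
    my ($hit) = @_;
    my $score = $hit->raw_score;
    return $score if defined $score && length $score;

    # Collect the defined HSP bit scores and take the largest one.
    my @bits = grep { defined } map { $_->bits } $hit->hsps;
    return @bits ? max(@bits) : undef;
}
```

In the loop above this would replace the direct call, e.g. my $score = hit_score_or_best_hsp($hit); and it only reads the HSP list, so it should not disturb the later next_hsp iteration (again, an assumption about how hsps is implemented).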

The blast report looks like this. In the code above, all scores
($hit->raw_score) for hits ">SW:SSRP_DROME Q05344 drosophila
melanogaster (fruit fly). single-strand recognition" onwards come out as
null:

===========

BLASTX 2.2.4 [Aug-26-2002]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= 010404CS0701000001
         (668 letters)

Database: /home/seqstore/ncbi/blast/data/swplus 
           954,989 sequences; 303,757,025 total letters

Searching..................................................done

                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

SP_PL:O04235 O04235 vicia faba (broad bean). transcription facto...   358   3e-98
SW:SSTP_CATRO Q39601 catharanthus roseus (rosy periwinkle) (mada...   313   9e-85
SW:SSRP_ARATH Q05153 arabidopsis thaliana (mouse-ear cress). str...   309   1e-83
SP_PL:Q9LGR0 Q9lgr0 oryza sativa (rice). ests au069334(c60619). ...   306   1e-82
SP_PL:Q8LKS8 Q8lks8 oryza sativa (indica cultivar-group). early ...   306   1e-82
SP_PL:Q9LEF5 Q9lef5 zea mays (maize). ssrp1 protein. 10/2002          301   3e-81
SP_OV:Q9W602 Q9w602 xenopus laevis (african clawed frog). duf87....   120   9e-27
SP_RO:Q8CGA6 Q8cga6 mus musculus (mouse). similar to structure s...   115   5e-25
SW:SSRP_HUMAN Q08945 homo sapiens (human). structure-specific re...   114   6e-25
SW:SSRP_MOUSE Q08943 mus musculus (mouse). structure-specific re...   108   5e-23

>SP_PL:O04235 O04235 vicia faba (broad bean). transcription factor.
10/2002
          Length = 642

 Score =  358 bits (919), Expect = 3e-98
 Identities = 172/194 (88%), Positives = 184/194 (94%)
 Frame = +3

Query: 81  MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN
260
           MTDGHLFNNITLG RGGTNPGQIKI+SGGILWKRQGGGK+I+VDK DI+ VTWMKVP++N
Sbjct: 1   MTDGHLFNNITLGXRGGTNPGQIKIYSGGILWKRQGGGKTIDVDKTDIMGVTWMKVPKTN
60

Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA
440
           QLGVQIKDGL YKFTGFRDQDV+SLTNFFQNTFGI V+EKQLSV+GRNWG+VDLNGNMLA
Sbjct: 61  QLGVQIKDGLLYKFTGFRDQDVVSLTNFFQNTFGITVEEKQLSVTGRNWGEVDLNGNMLA
120

Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFV
620
           FMVGSKQAFEV LADVSQTNLQGKNDVILEFHVDDTTGANEKDSLME+SFHIP+SNTQFV
Sbjct: 121 FMVGSKQAFEVSLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEMSFHIPSSNTQFV
180

Query: 621 GDENTPPXQVFRXK 662
           GDEN P  QVFR K
Sbjct: 181 GDENRPSAQVFRDK 194


>SW:SSTP_CATRO Q39601 catharanthus roseus (rosy periwinkle) (madagascar
periwinkle).
            structure-specific recognition protein 1 homolog (hmg
            protein). 9/2003
          Length = 639

 Score =  313 bits (802), Expect = 9e-85
 Identities = 153/194 (78%), Positives = 174/194 (88%)
 Frame = +3

Query: 81  MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN
260
           M DGHLFNNITLGGRGGTNPGQ+++ SGGILWK+QGG K++EVDK+D+V +TWMKVPRSN
Sbjct: 1   MADGHLFNNITLGGRGGTNPGQLRVHSGGILWKKQGGAKAVEVDKSDMVGLTWMKVPRSN
60

Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA
440
           QLGV+IKDGLFYKFTGFRDQDV SLT++ Q+T GI  +EKQLSVSG+NWG+VDLNGNML 
Sbjct: 61  QLGVRIKDGLFYKFTGFRDQDVASLTSYLQSTCGITPEEKQLSVSGKNWGEVDLNGNMLT
120

Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEF/MWMTQLEPM\EKDSLMEISFHIPNSNTQ
614
           F+VGSKQAFEV LADV+QT LQGKNDV+LEF MWM  LE M  K+SLMEISFH+PNSNTQ
Sbjct: 121 FLVGSKQAFEVSLADVAQTQLQGKNDVMLEF MWMILLEQM RKNSLMEISFHVPNSNTQ
178

Query: 615 FVGDENTPPXQVFRXK 662
           FVGDEN PP QVFR K
Sbjct: 179 FVGDENRPPAQVFRDK 194


>SW:SSRP_ARATH Q05153 arabidopsis thaliana (mouse-ear cress).
structure-specific
            recognition protein 1 homolog (hmg protein). 9/2003
          Length = 646

 Score =  309 bits (792), Expect = 1e-83
 Identities = 148/191 (77%), Positives = 167/191 (86%)
 Frame = +3

Query: 81  MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN
260
           M DGH FNNI+L GRGG NPG +KI SGGI WK+QGGGK++EVD++DIVSV+W KV +SN
Sbjct: 1   MADGHSFNNISLSGRGGKNPGLLKINSGGIQWKKQGGGKAVEVDRSDIVSVSWTKVTKSN
60

Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA
440
           QLGV+ KDGL+YKF GFRDQDV SL++FFQ+++G    EKQLSVSGRNWG+VDL+GN L 
Sbjct: 61  QLGVKTKDGLYYKFVGFRDQDVPSLSSFFQSSYGKTPDEKQLSVSGRNWGEVDLHGNTLT
120

Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFV
620
           F+VGSKQAFEV LADVSQT LQGKNDV LEFHVDDT GANEKDSLMEISFHIPNSNTQFV
Sbjct: 121 FLVGSKQAFEVSLADVSQTQLQGKNDVTLEFHVDDTAGANEKDSLMEISFHIPNSNTQFV
180

Query: 621 GDENTPPXQVF 653
           GDEN PP QVF
Sbjct: 181 GDENRPPSQVF 191


>SP_PL:Q9LGR0 Q9lgr0 oryza sativa (rice). ests au069334(c60619). 10/2002
          Length = 641

 Score =  306 bits (784), Expect = 1e-82
 Identities = 141/190 (74%), Positives = 164/190 (86%)
 Frame = +3

Query: 81  MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN
260
           MTDGHLFNNI LGGR G+NPGQ K++SGG+ WKRQGGGK+IE++K+D+ SVTWMKVPR+ 
Sbjct: 1   MTDGHLFNNILLGGRAGSNPGQFKVYSGGLAWKRQGGGKTIEIEKSDLTSVTWMKVPRAY
60

Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA
440
           QLGV+ KDGLFYKF GFR+QDV SLTNF Q   G++  EKQLSVSG+NWG +D+NGNML 
Sbjct: 61  QLGVRTKDGLFYKFIGFREQDVSSLTNFMQKNMGLSPDEKQLSVSGQNWGGIDINGNMLT
120

Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFV
620
           FMVGSKQAFEV LADVSQT +QGK DV+LEFHVDDTTG NEKDSLM++SFH+P SNTQF+
Sbjct: 121 FMVGSKQAFEVSLADVSQTQMQGKTDVLLEFHVDDTTGGNEKDSLMDLSFHVPTSNTQFL
180

Query: 621 GDENTPPXQV 650
           GDEN    QV
Sbjct: 181 GDENRTAAQV 190


>SP_PL:Q8LKS8 Q8lks8 oryza sativa (indica cultivar-group). early drought
induced
            protein. 3/2003
          Length = 641

 Score =  306 bits (784), Expect = 1e-82
 Identities = 141/190 (74%), Positives = 164/190 (86%)
 Frame = +3

Query: 81  MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN
260
           MTDGHLFNNI LGGR G+NPGQ K++SGG+ WKRQGGGK+IE++K+D+ SVTWMKVPR+ 
Sbjct: 1   MTDGHLFNNILLGGRAGSNPGQFKVYSGGLAWKRQGGGKTIEIEKSDLTSVTWMKVPRAY
60

Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA
440
           QLGV+ KDGLFYKF GFR+QDV SLTNF Q   G++  EKQLSVSG+NWG +D+NGNML 
Sbjct: 61  QLGVRTKDGLFYKFIGFREQDVSSLTNFMQKNMGLSPDEKQLSVSGQNWGGIDINGNMLT
120

Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFV
620
           FMVGSKQAFEV LADVSQT +QGK DV+LEFHVDDTTG NEKDSLM++SFH+P SNTQF+
Sbjct: 121 FMVGSKQAFEVSLADVSQTQMQGKTDVLLEFHVDDTTGGNEKDSLMDLSFHVPTSNTQFL
180

Query: 621 GDENTPPXQV 650
           GDEN    QV
Sbjct: 181 GDENRTAAQV 190


>SP_PL:Q9LEF5 Q9lef5 zea mays (maize). ssrp1 protein. 10/2002
          Length = 639

 Score =  301 bits (772), Expect = 3e-81
 Identities = 138/190 (72%), Positives = 162/190 (84%)
 Frame = +3

Query: 81  MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN
260
           MTDGH FNNI LGGRGGTNPGQ K+ SGG+ WKRQGGGK+IE+DKAD+ +VTWMKVPR+ 
Sbjct: 1   MTDGHHFNNILLGGRGGTNPGQFKVHSGGLAWKRQGGGKTIEIDKADVTAVTWMKVPRAY
60

Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA
440
           QLGV+IK GLFY+F GFR+QDV +LTNF Q   G+   EKQLSVSG+NWG +D++GNML 
Sbjct: 61  QLGVRIKAGLFYRFIGFREQDVSNLTNFIQKNMGVTPDEKQLSVSGQNWGGIDIDGNMLT
120

Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFV
620
           FMVGSKQAFEV L DV+QT +QGK DV+LE HVDDTTGANEKDSLM++SFH+P SNTQFV
Sbjct: 121 FMVGSKQAFEVSLPDVAQTQMQGKTDVLLELHVDDTTGANEKDSLMDLSFHVPTSNTQFV
180

Query: 621 GDENTPPXQV 650
           GDE+ PP  +
Sbjct: 181 GDESRPPAHI 190


>SP_OV:Q9W602 Q9w602 xenopus laevis (african clawed frog). duf87.
10/2002
          Length = 693

 Score =  120 bits (302), Expect = 9e-27
 Identities = 64/173 (36%), Positives = 100/173 (56%)
 Frame = +3

Query: 81  MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN
260
           M D   FN+I    +G  N G++++   G+++K    GK   +  ADI  V W +V   +
Sbjct: 1   MADTLEFNDIYQEVKGSMNDGRLRLSRAGLMYKNNKTGKVENISAADIAEVVWRRVALGH
60

Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA
440
            + +    G  YK+ GFR+ +   L ++F++ F + + EK L V G NWG V   G +L+
Sbjct: 61  GIKLLTNGGHVYKYDGFRETEYDKLFDYFKSHFSVELVEKDLCVKGWNWGSVRFGGQLLS
120

Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599
           F +G + AFE+PL++VSQ    GKN+V LEFH +D    + + SLMEI F++P
Sbjct: 121 FDIGDQPAFELPLSNVSQCT-TGKNEVTLEFHQND----DSEVSLMEIRFYVP 168


>SP_RO:Q8CGA6 Q8cga6 mus musculus (mouse). similar to structure specific
            recognition protein 1. 3/2003
          Length = 711

 Score =  115 bits (287), Expect = 5e-25
 Identities = 59/167 (35%), Positives = 97/167 (57%)
 Frame = +3

Query: 99  FNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQI
278
           FN+I    +G  N G++++   GI++K    GK   +   ++    W +V   + L +  
Sbjct: 7   FNDIFQEVKGSMNDGRLRLSRQGIIFKNSKTGKVDNIQAGELTEGIWRRVALGHGLKLLT
66

Query: 279 KDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSK
458
           K+G  YK+ GFR+ +   L++FF+  + + + EK L V G NWG V   G +L+F +G +
Sbjct: 67  KNGHVYKYDGFRESEFEKLSDFFKTHYRLELMEKDLCVKGWNWGTVKFGGQLLSFDIGDQ
126

Query: 459 QAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599
             FE+PL++VSQ    GKN+V LEFH +D    + + SLME+ F++P
Sbjct: 127 PVFEIPLSNVSQCT-TGKNEVTLEFHQND----DAEVSLMEVRFYVP 168


>SW:SSRP_HUMAN Q08945 homo sapiens (human). structure-specific
recognition protein 1
            (ssrp1) (recombination signal sequence recognition
            protein) (t160) (chromatin-specific transcription
            elongation factor 80 kda subunit) (fact 80 kda subunit).
            9/2003
          Length = 709

 Score =  114 bits (286), Expect = 6e-25
 Identities = 58/167 (34%), Positives = 97/167 (57%)
 Frame = +3

Query: 99  FNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQI
278
           FN++    +G  N G++++   GI++K    GK   +   ++    W +V   + L +  
Sbjct: 7   FNDVYQEVKGSMNDGRLRLSRQGIIFKNSKTGKVDNIQAGELTEGIWRRVALGHGLKLLT
66

Query: 279 KDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSK
458
           K+G  YK+ GFR+ +   L++FF+  + + + EK L V G NWG V   G +L+F +G +
Sbjct: 67  KNGHVYKYDGFRESEFEKLSDFFKTHYRLELMEKDLCVKGWNWGTVKFGGQLLSFDIGDQ
126

Query: 459 QAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599
             FE+PL++VSQ    GKN+V LEFH +D    + + SLME+ F++P
Sbjct: 127 PVFEIPLSNVSQCT-TGKNEVTLEFHQND----DAEVSLMEVRFYVP 168


>SW:SSRP_MOUSE Q08943 mus musculus (mouse). structure-specific
recognition protein 1
            (ssrp1) (recombination signal sequence recognition
            protein) (t160). 9/2003
          Length = 708

 Score =  108 bits (270), Expect = 5e-23
 Identities = 56/167 (33%), Positives = 95/167 (56%)
 Frame = +3

Query: 99  FNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQI
278
           FN+I    +G  N G++++   GI++K    GK   +   ++    W +V   + L +  
Sbjct: 7   FNDIFQEVKGSMNDGRLRLSPSGIIFKNSKTGKVDNIQAGELTEGIWPRVALGHGLKLLT
66

Query: 279 KDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSK
458
           K+G  YK+ GFR+ +   L++FF+  + + + EK L V G NWG V   G +L+F +G +
Sbjct: 67  KNGHVYKYDGFRESEFEKLSDFFKTHYRLELMEKDLCVKGWNWGTVKFGGQLLSFDIGDQ
126

Query: 459 QAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599
             FE+PL++VS    Q + +V LEFH +D    + + SLME+ F++P
Sbjct: 127 PVFEIPLSNVSSVP-QARIEVTLEFHQND----DPEVSLMEVRFYVP 168


>SW:SSRP_DROME Q05344 drosophila melanogaster (fruit fly). single-strand
recognition
            protein (ssrp) (chorion-factor 5). 9/2003
          Length = 723

 Score =  101 bits (251), Expect = 7e-21
 Identities = 63/173 (36%), Positives = 92/173 (52%)
 Frame = +3

Query: 81  MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN
260
           MTD   +N+I    RG    G++K+    I++K    GK  ++   DI  +   K   + 
Sbjct: 1   MTDSLEYNDINAEVRGVLCSGRLKMTEQNIIFKNTKTGKVEQISAEDIDLINSQKFVGTW
60

Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA
440
            L V  K G+ ++FTGFRD +   L  F +  +   + EK++ V G NWG     G++L+
Sbjct: 61  GLRVFTKGGVLHRFTGFRDSEHEKLGKFIKAAYSQEMVEKEMCVKGWNWGTARFMGSVLS
120

Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599
           F   SK  FEVPL+ VSQ  + GKN+V LEFH +D         L+E+ FHIP
Sbjct: 121 FDKESKTIFEVPLSHVSQC-VTGKNEVTLEFHQNDDAPV----GLLEMRFHIP 168


>SP_FUN:O94529 O94529 schizosaccharomyces pombe (fission yeast).
putative structure
            specific recognition protein. 3/2003
          Length = 512

 Score = 96.7 bits (239), Expect = 2e-19
 Identities = 48/161 (29%), Positives = 86/161 (52%), Gaps = 2/161 (1%)
 Frame = +3

Query: 138 PGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQIKDGLFYKFTGFRD
317
           PG+++I   G+ WK     +   +  ++I    W +  R  +L + +K        GF  
Sbjct: 19  PGKLRIAPSGLGWKSPSLAEPFTLPISEIRRFCWSRFARGYELKIILKSKDPVSLDGFSQ
78

Query: 318 QDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSKQAFEVPLADVSQT
497
           +D+  L N  +  F + +++K+ S+ G NWG+ +  G+ L F V S+ AFE+P++ V+ T
Sbjct: 79  EDLDDLINVIKQNFDMGIEQKEFSIKGWNWGEANFLGSELVFDVNSRPAFEIPISAVTNT
138

Query: 498 NLQGKNDVILEFHV--DDTTGANEKDSLMEISFHIPNSNTQ 614
           NL GKN+V LEF    D    + + D L+E+  ++P +  +
Sbjct: 139 NLSGKNEVALEFSTTDDKQIPSAQVDELVEMRLYVPGTTAK 179


>SW:SSRP_CHICK Q04678 gallus gallus (chicken). structure-specific
recognition
            protein 1 (ssrp1) (recombination signal sequence
            recognition protein) (t160) (fragment). 9/2003
          Length = 669

 Score = 95.9 bits (237), Expect = 3e-19
 Identities = 48/131 (36%), Positives = 79/131 (59%)
 Frame = +3

Query: 207 VDKADIVSVTWMKVPRSNQLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQL
386
           +  +++    W +V   + L +  K+G  YK+ GFR+ +   L++FF+  + + + EK L
Sbjct: 5   IQASELAEGVWRRVALGHGLKLLTKNGHVYKYDGFRESEFDKLSDFFKAHYRLELAEKDL
64

Query: 387 SVSGRNWGDVDLNGNMLAFMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEK
566
            V G NWG V   G +L+F +G +  FE+PL++VSQ    GKN+V LEFH +D    + +
Sbjct: 65  CVKGWNWGTVRFGGQLLSFDIGEQPVFEIPLSNVSQCT-TGKNEVTLEFHQND----DAE
119

Query: 567 DSLMEISFHIP 599
            SLME+ F++P
Sbjct: 120 VSLMEVRFYVP 130


>SP_IN:Q8IL56 Q8il56 plasmodium falciparum (isolate 3d7). structure
specific
            recognition protein, putative. 3/2003
          Length = 506

 Score = 94.0 bits (232), Expect = 1e-18
 Identities = 50/170 (29%), Positives = 89/170 (51%), Gaps = 5/170 (2%)
 Frame = +3

Query: 120 GRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN-----QLGVQIKD
284
           G GG++ G  ++ +  + WK +      +   +DI    W+K   +N     +LG + K+
Sbjct: 21  GFGGSDFGSFRMSNEFLGWKNKKTNNVYQYKCSDIDEGCWIKTSYNNNRLHLKLG-ESKE
79

Query: 285 GLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSKQA
464
            +   F GF D++V  +T  FQ  F I +  ++++  G NWG+  L  + L F + +K A
Sbjct: 80  NIIIYFDGFPDRNVNEITQHFQKYFNIRLNNRKIATKGWNWGEFKLENSNLCFDIDNKYA
139

Query: 465 FEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQ 614
           F +P  +++Q N+Q K D+ +EF  D+      +D L EI F+ P+ N +
Sbjct: 140 FNLPTNNINQLNVQIKTDIAMEFKNDENNNKGNEDFLAEIRFYYPHENDE 189


>SW:SSRP_CAEEL P41848 caenorhabditis elegans. probable
structure-specific
            recognition protein 1 (ssrp1) (recombination signal
            sequence recognition protein). 9/2003
          Length = 697

 Score = 92.0 bits (227), Expect = 4e-18
 Identities = 48/153 (31%), Positives = 82/153 (53%)
 Frame = +3

Query: 141 GQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQIKDGLFYKFTGFRDQ
320
           G +K+    + +K   GGKS+ V  +DI  + W K+     L V + DG  ++F GF+D 
Sbjct: 20  GTLKLTEKSLNFKGDKGGKSVNVTGSDIDKLKWQKLGNKPGLRVGLNDGGAHRFGGFKDT
79

Query: 321 DVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSKQAFEVPLADVSQTN
500
           D+  + +F  + +  ++ +  L + G N+G  ++ G  + F    K  FE+P  +VS   
Sbjct: 80  DLEKIQSFTSSNWSQSIDQSNLFIKGWNYGQAEVKGKTVEFSWEDKPIFEIPCTNVSNV-
138

Query: 501 LQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599
           +  KN+ +LEFH +D    + K  LME+ FH+P
Sbjct: 139 IANKNEAVLEFHQND----DSKVQLMEMRFHMP 167


>SW:YMG9_YEAST Q04636 saccharomyces cerevisiae (baker's yeast).
hypothetical 63.0
            kda protein in dak1-orc1 intergenic region. 5/2000
          Length = 552

 Score = 89.0 bits (219), Expect = 4e-17
 Identities = 50/161 (31%), Positives = 80/161 (49%), Gaps = 8/161 (4%)
 Frame = +3

Query: 141 GQIKIFSGGILWK--RQGGGKSIEVDK------ADIVSVTWMKVPRSNQLGVQIKDGLFY
296
           G+ +I   G+ WK    GG  + +  K       ++ +V W +  R   L +  K+    
Sbjct: 17  GRFRIADSGLGWKISTSGGSAANQARKPFLLPATELSTVQWSRGCRGYDLKINTKNQGVI
76

Query: 297 KFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSKQAFEVP
476
           +  GF   D   + N F   F I V++++ S+ G NWG  DL  N + F +  K  FE+P
Sbjct: 77  QLDGFSQDDYNLIKNDFHRRFNIQVEQREHSLRGWNWGKTDLARNEMVFALNGKPTFEIP
136

Query: 477 LADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599
            A ++ TNL  KN+V +EF++ D       D L+E+ F+IP
Sbjct: 137 YARINNTNLTSKNEVGIEFNIQDEEYQPAGDELVEMRFYIP 177


>SP_IN:O01683 O01683 caenorhabditis elegans. c32f10.5 protein. 3/2003
          Length = 689

 Score = 86.7 bits (213), Expect = 2e-16
 Identities = 50/186 (26%), Positives = 90/186 (47%)
 Frame = +3

Query: 99  FNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQI
278
           F  + +   G    G + +    I +    GGKS+ +   D+  + W K+     L V +
Sbjct: 6   FKGVYVEDIGHLTCGTLTLTENSINFIGDKGGKSVYITGTDVDKLKWQKLGNKPGLRVGL
65

Query: 279 KDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSK
458
            DG  ++F GF D D+  + +F  + +  ++ +  L ++G N+G  D+ G  + F   ++
Sbjct: 66  SDGGAHRFGGFLDDDLQKIQSFTSSNWSKSINQSNLFINGWNYGQADVKGKNIEFSWENE
125

Query: 459 QAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFVGDENTP
638
             FE+P  +VS   +  KN+ ILEFH ++      K  LME+ FH+P        +E+T 
Sbjct: 126 PIFEIPCTNVSNV-IANKNEAILEFHQNE----QSKVQLMEMRFHMP---VDLENEEDTD
177

Query: 639 PXQVFR 656
             + F+
Sbjct: 178 KVEEFK 183


>SP_FUN:Q9HFC4 Q9hfc4 zygosaccharomyces rouxii (candida mogii).
ssrp1-like protein
            (fragment). 10/2002
          Length = 542

 Score = 85.1 bits (209), Expect = 5e-16
 Identities = 48/165 (29%), Positives = 79/165 (47%), Gaps = 8/165 (4%)
 Frame = +3

Query: 141 GQIKIFSGGILWKRQGGGKSIE--------VDKADIVSVTWMKVPRSNQLGVQIKDGLFY
296
           G+ +I   G+ WK    G S          +   ++ +V W +  R  +L V  K+    
Sbjct: 45  GRFRIADSGLGWKSANAGGSAANQSKQPFLLPATELSTVQWSRGCRGFELKVNTKNQGVV
104

Query: 297 KFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSKQAFEVP
476
           +  GF   D   + N F   F + V+ K+ S+ G NWG  DL  N + F +  + +FEVP
Sbjct: 105 QLDGFAPDDFNLIKNDFHRRFNVQVEPKEHSLRGWNWGKADLARNEMVFALNGRPSFEVP
164

Query: 477 LADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNT 611
            A ++ TNL  K +V +EF++ D       D L+E+  ++P + T
Sbjct: 165 YARINNTNLTSKTEVAIEFNLADENYQPAGDELVEMRLYVPGTVT 209


  Database: /home/seqstore/ncbi/blast/data/swplus
    Posted date:  Apr 15, 2003 12:04 PM
  Number of letters in database: 303,757,025
  Number of sequences in database:  954,989
  
Lambda     K      H
   0.318    0.135    0.401 

Gapped
Lambda     K      H
   0.267   0.0410    0.140 


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 385,793,622
Number of Sequences: 954989
Number of extensions: 8541745
Number of successful extensions: 21678
Number of sequences better than 1.0e-06: 36
Number of HSP's better than  0.0 without gapping: 21171
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test: 0
Number of HSP's gapped (non-prelim): 21664
length of database: 303,757,025
effective HSP length: 116
effective length of database: 192,978,301
effective search space used: 20455699906
frameshift window, decay const: 50,  0.1
T: 12
A: 40
X1: 16 ( 7.3 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.7 bits)

===========

Richard Holland
Bioinformatics Database Developer
ITS, Agresearch Invermay x3279



-----Original Message-----
From: Jason Stajich [mailto:jason at cgt.duhs.duke.edu] 
Sent: Friday, 5 September 2003 9:39 a.m.
To: Holland, Richard
Cc: bioperl-l at bioperl.org; McCulloch, Alan
Subject: Re: [Bioperl-l] Blastx parser misses scores


Can you please provide an example report and code which doesn't behave
as you would expect.

Are you talking about the case where you have 50 hits listed in the
summary but say only 25 HSP alignments?


On Fri, 5 Sep 2003, Holland, Richard wrote:

> Hi,
>
> I have run into a problem with Bio::SearchIO::blast parsing blastx 
> result files. This may affect other blast outputs as well but I'm not 
> sure.
>
> At the top of a blastx output there is a summary of the best hits in 
> the results file. Then, all the hits are listed, even the ones which 
> are not in the best hits list.
>
> The BioPerl parser successfully parses all the hits from the file, 
> however it only returns scores for those which appear in the summary. 
> I have found the code which does this in Bio::SearchIO::blast and 
> noticed that this seems to be deliberate - in all cases, blastx or 
> not, the scores are taken from the summary, and the scores in the hit 
> details appear to be ignored.
>
> Is this a feature or a bug? We would like to be able to use BioPerl 
> to parse out all the results from our blast reports including all 
> their scores and details, regardless of whether or not they appear in 
> the best hits summary.
>
> Can anyone help?
>
> cheers,
> Richard 
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org 
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu