[Bioperl-l] Blastx parser misses scores
Holland, Richard
Richard.Holland at agresearch.co.nz
Thu Sep 4 18:14:23 EDT 2003
> Are you talking about the case where you have 50 hits listed in the
> summary but say only 25 HSP alignments?
Not sure. There are 10 hits listed in the summary and 18 detailed below
it. We only get scores reported by the parser for the 10 in the summary.
> Can you please provide an example report and code which doesn't
> behave as you would expect?
The blast report in question is at the end of this email.
Our code follows:
===========
use strict;
use warnings;
use Bio::SearchIO;

# Path to the BLASTX report (assumed here to be passed on the command line)
my $file = shift @ARGV or die "usage: $0 report.blastx\n";
open(my $fileRef, '<', $file) or die "Cannot open $file: $!\n";

my $blastin = Bio::SearchIO->new(-fh => $fileRef, -format => 'blast');
while (my $result = $blastin->next_result) {
    my $QueryID     = $result->query_name;
    my $QueryLength = $result->query_length;
    while (my $hit = $result->next_hit) {
        my $hitid       = $hit->name;
        my $score       = $hit->raw_score;  # undef for hits not listed in the summary
        my $description = $hit->name . " " . $hit->description;
        my $hitlength   = $hit->length;
        while (my $hsp = $hit->next_hsp) {
            my $expectation = $hsp->evalue;
            # BioPerl frames run 0..2; convert to BLAST-style +1..+3 / -1..-3
            my $frame      = ($hsp->query->frame + 1) * $hsp->query->strand;
            my $strand     = $hsp->strand;
            my $identities = $hsp->num_identical;
            my $overlaps   = $hsp->length('total');
            my $gaps       = $hsp->gaps;
            my $qstart     = $hsp->start('query');
            my $qstop      = $hsp->end('query');
            my $hstart     = $hsp->start('hit');
            my $hstop      = $hsp->end('hit');
            my $positives  = $hsp->num_conserved;
            # Truncated: code that processes the results goes here
        }
    }
}
===========
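A possible workaround until the parser itself is fixed is sketched below. It assumes that each HSP still carries the raw score read from its own "Score = ... bits (...)" line in the detail section. It also assumes that `hsps` (on the hit) and `score` (on the HSP) are the Bio::Search::Hit::GenericHit and Bio::Search::HSP::GenericHSP accessors as we understand them. When `$hit->raw_score` is undefined, the sketch falls back to the best HSP score. The helper name `hit_raw_score` is hypothetical and not part of BioPerl.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: return the hit's raw score, falling back to the
# highest raw score among its HSPs when the summary-derived value is undef.
sub hit_raw_score {
    my ($hit) = @_;
    my $score = $hit->raw_score;
    return $score if defined $score;
    # Assumption: each HSP's score() comes from its own detail-section
    # "Score = ... bits (...)" line, not from the summary table.
    my @scores = grep { defined } map { $_->score } $hit->hsps;
    return @scores ? max(@scores) : undef;
}
```

In the loop above, `my $score = hit_raw_score($hit);` would replace `my $score = $hit->raw_score;`. Note that the best HSP score is not always identical to what BLAST reports as the hit score. For the single-HSP hits in this report, however, the two coincide.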
The BLAST report is below. With our code, every score returned by
$hit->raw_score comes back undefined (null) for the hits from ">SW:SSRP_DROME
Q05344 drosophila melanogaster (fruit fly). single-strand recognition"
onwards, i.e. the eight hits that do not appear in the summary:
===========
BLASTX 2.2.4 [Aug-26-2002]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Query= 010404CS0701000001
(668 letters)
Database: /home/seqstore/ncbi/blast/data/swplus
954,989 sequences; 303,757,025 total letters
Searching..................................................done
                                                                   Score     E
Sequences producing significant alignments:                        (bits)  Value
SP_PL:O04235 O04235 vicia faba (broad bean). transcription facto...   358  3e-98
SW:SSTP_CATRO Q39601 catharanthus roseus (rosy periwinkle) (mada...   313  9e-85
SW:SSRP_ARATH Q05153 arabidopsis thaliana (mouse-ear cress). str...   309  1e-83
SP_PL:Q9LGR0 Q9lgr0 oryza sativa (rice). ests au069334(c60619). ...   306  1e-82
SP_PL:Q8LKS8 Q8lks8 oryza sativa (indica cultivar-group). early ...   306  1e-82
SP_PL:Q9LEF5 Q9lef5 zea mays (maize). ssrp1 protein. 10/2002          301  3e-81
SP_OV:Q9W602 Q9w602 xenopus laevis (african clawed frog). duf87....   120  9e-27
SP_RO:Q8CGA6 Q8cga6 mus musculus (mouse). similar to structure s...   115  5e-25
SW:SSRP_HUMAN Q08945 homo sapiens (human). structure-specific re...   114  6e-25
SW:SSRP_MOUSE Q08943 mus musculus (mouse). structure-specific re...   108  5e-23
>SP_PL:O04235 O04235 vicia faba (broad bean). transcription factor.
10/2002
Length = 642
Score = 358 bits (919), Expect = 3e-98
Identities = 172/194 (88%), Positives = 184/194 (94%)
Frame = +3
Query: 81 MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN
260
MTDGHLFNNITLG RGGTNPGQIKI+SGGILWKRQGGGK+I+VDK DI+ VTWMKVP++N
Sbjct: 1 MTDGHLFNNITLGXRGGTNPGQIKIYSGGILWKRQGGGKTIDVDKTDIMGVTWMKVPKTN
60
Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA
440
QLGVQIKDGL YKFTGFRDQDV+SLTNFFQNTFGI V+EKQLSV+GRNWG+VDLNGNMLA
Sbjct: 61 QLGVQIKDGLLYKFTGFRDQDVVSLTNFFQNTFGITVEEKQLSVTGRNWGEVDLNGNMLA
120
Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFV
620
FMVGSKQAFEV LADVSQTNLQGKNDVILEFHVDDTTGANEKDSLME+SFHIP+SNTQFV
Sbjct: 121 FMVGSKQAFEVSLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEMSFHIPSSNTQFV
180
Query: 621 GDENTPPXQVFRXK 662
GDEN P QVFR K
Sbjct: 181 GDENRPSAQVFRDK 194
>SW:SSTP_CATRO Q39601 catharanthus roseus (rosy periwinkle) (madagascar
periwinkle).
structure-specific recognition protein 1 homolog (hmg
protein). 9/2003
Length = 639
Score = 313 bits (802), Expect = 9e-85
Identities = 153/194 (78%), Positives = 174/194 (88%)
Frame = +3
Query: 81 MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN
260
M DGHLFNNITLGGRGGTNPGQ+++ SGGILWK+QGG K++EVDK+D+V +TWMKVPRSN
Sbjct: 1 MADGHLFNNITLGGRGGTNPGQLRVHSGGILWKKQGGAKAVEVDKSDMVGLTWMKVPRSN
60
Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA
440
QLGV+IKDGLFYKFTGFRDQDV SLT++ Q+T GI +EKQLSVSG+NWG+VDLNGNML
Sbjct: 61 QLGVRIKDGLFYKFTGFRDQDVASLTSYLQSTCGITPEEKQLSVSGKNWGEVDLNGNMLT
120
Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEF/MWMTQLEPM\EKDSLMEISFHIPNSNTQ
614
F+VGSKQAFEV LADV+QT LQGKNDV+LEF MWM LE M K+SLMEISFH+PNSNTQ
Sbjct: 121 FLVGSKQAFEVSLADVAQTQLQGKNDVMLEF MWMILLEQM RKNSLMEISFHVPNSNTQ
178
Query: 615 FVGDENTPPXQVFRXK 662
FVGDEN PP QVFR K
Sbjct: 179 FVGDENRPPAQVFRDK 194
>SW:SSRP_ARATH Q05153 arabidopsis thaliana (mouse-ear cress).
structure-specific
recognition protein 1 homolog (hmg protein). 9/2003
Length = 646
Score = 309 bits (792), Expect = 1e-83
Identities = 148/191 (77%), Positives = 167/191 (86%)
Frame = +3
Query: 81 MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN
260
M DGH FNNI+L GRGG NPG +KI SGGI WK+QGGGK++EVD++DIVSV+W KV +SN
Sbjct: 1 MADGHSFNNISLSGRGGKNPGLLKINSGGIQWKKQGGGKAVEVDRSDIVSVSWTKVTKSN
60
Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA
440
QLGV+ KDGL+YKF GFRDQDV SL++FFQ+++G EKQLSVSGRNWG+VDL+GN L
Sbjct: 61 QLGVKTKDGLYYKFVGFRDQDVPSLSSFFQSSYGKTPDEKQLSVSGRNWGEVDLHGNTLT
120
Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFV
620
F+VGSKQAFEV LADVSQT LQGKNDV LEFHVDDT GANEKDSLMEISFHIPNSNTQFV
Sbjct: 121 FLVGSKQAFEVSLADVSQTQLQGKNDVTLEFHVDDTAGANEKDSLMEISFHIPNSNTQFV
180
Query: 621 GDENTPPXQVF 653
GDEN PP QVF
Sbjct: 181 GDENRPPSQVF 191
>SP_PL:Q9LGR0 Q9lgr0 oryza sativa (rice). ests au069334(c60619). 10/2002
Length = 641
Score = 306 bits (784), Expect = 1e-82
Identities = 141/190 (74%), Positives = 164/190 (86%)
Frame = +3
Query: 81 MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN
260
MTDGHLFNNI LGGR G+NPGQ K++SGG+ WKRQGGGK+IE++K+D+ SVTWMKVPR+
Sbjct: 1 MTDGHLFNNILLGGRAGSNPGQFKVYSGGLAWKRQGGGKTIEIEKSDLTSVTWMKVPRAY
60
Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA
440
QLGV+ KDGLFYKF GFR+QDV SLTNF Q G++ EKQLSVSG+NWG +D+NGNML
Sbjct: 61 QLGVRTKDGLFYKFIGFREQDVSSLTNFMQKNMGLSPDEKQLSVSGQNWGGIDINGNMLT
120
Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFV
620
FMVGSKQAFEV LADVSQT +QGK DV+LEFHVDDTTG NEKDSLM++SFH+P SNTQF+
Sbjct: 121 FMVGSKQAFEVSLADVSQTQMQGKTDVLLEFHVDDTTGGNEKDSLMDLSFHVPTSNTQFL
180
Query: 621 GDENTPPXQV 650
GDEN QV
Sbjct: 181 GDENRTAAQV 190
>SP_PL:Q8LKS8 Q8lks8 oryza sativa (indica cultivar-group). early drought
induced
protein. 3/2003
Length = 641
Score = 306 bits (784), Expect = 1e-82
Identities = 141/190 (74%), Positives = 164/190 (86%)
Frame = +3
Query: 81 MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN
260
MTDGHLFNNI LGGR G+NPGQ K++SGG+ WKRQGGGK+IE++K+D+ SVTWMKVPR+
Sbjct: 1 MTDGHLFNNILLGGRAGSNPGQFKVYSGGLAWKRQGGGKTIEIEKSDLTSVTWMKVPRAY
60
Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA
440
QLGV+ KDGLFYKF GFR+QDV SLTNF Q G++ EKQLSVSG+NWG +D+NGNML
Sbjct: 61 QLGVRTKDGLFYKFIGFREQDVSSLTNFMQKNMGLSPDEKQLSVSGQNWGGIDINGNMLT
120
Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFV
620
FMVGSKQAFEV LADVSQT +QGK DV+LEFHVDDTTG NEKDSLM++SFH+P SNTQF+
Sbjct: 121 FMVGSKQAFEVSLADVSQTQMQGKTDVLLEFHVDDTTGGNEKDSLMDLSFHVPTSNTQFL
180
Query: 621 GDENTPPXQV 650
GDEN QV
Sbjct: 181 GDENRTAAQV 190
>SP_PL:Q9LEF5 Q9lef5 zea mays (maize). ssrp1 protein. 10/2002
Length = 639
Score = 301 bits (772), Expect = 3e-81
Identities = 138/190 (72%), Positives = 162/190 (84%)
Frame = +3
Query: 81 MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN
260
MTDGH FNNI LGGRGGTNPGQ K+ SGG+ WKRQGGGK+IE+DKAD+ +VTWMKVPR+
Sbjct: 1 MTDGHHFNNILLGGRGGTNPGQFKVHSGGLAWKRQGGGKTIEIDKADVTAVTWMKVPRAY
60
Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA
440
QLGV+IK GLFY+F GFR+QDV +LTNF Q G+ EKQLSVSG+NWG +D++GNML
Sbjct: 61 QLGVRIKAGLFYRFIGFREQDVSNLTNFIQKNMGVTPDEKQLSVSGQNWGGIDIDGNMLT
120
Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFV
620
FMVGSKQAFEV L DV+QT +QGK DV+LE HVDDTTGANEKDSLM++SFH+P SNTQFV
Sbjct: 121 FMVGSKQAFEVSLPDVAQTQMQGKTDVLLELHVDDTTGANEKDSLMDLSFHVPTSNTQFV
180
Query: 621 GDENTPPXQV 650
GDE+ PP +
Sbjct: 181 GDESRPPAHI 190
>SP_OV:Q9W602 Q9w602 xenopus laevis (african clawed frog). duf87.
10/2002
Length = 693
Score = 120 bits (302), Expect = 9e-27
Identities = 64/173 (36%), Positives = 100/173 (56%)
Frame = +3
Query: 81 MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN
260
M D FN+I +G N G++++ G+++K GK + ADI V W +V +
Sbjct: 1 MADTLEFNDIYQEVKGSMNDGRLRLSRAGLMYKNNKTGKVENISAADIAEVVWRRVALGH
60
Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA
440
+ + G YK+ GFR+ + L ++F++ F + + EK L V G NWG V G +L+
Sbjct: 61 GIKLLTNGGHVYKYDGFRETEYDKLFDYFKSHFSVELVEKDLCVKGWNWGSVRFGGQLLS
120
Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599
F +G + AFE+PL++VSQ GKN+V LEFH +D + + SLMEI F++P
Sbjct: 121 FDIGDQPAFELPLSNVSQCT-TGKNEVTLEFHQND----DSEVSLMEIRFYVP 168
>SP_RO:Q8CGA6 Q8cga6 mus musculus (mouse). similar to structure specific
recognition protein 1. 3/2003
Length = 711
Score = 115 bits (287), Expect = 5e-25
Identities = 59/167 (35%), Positives = 97/167 (57%)
Frame = +3
Query: 99 FNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQI
278
FN+I +G N G++++ GI++K GK + ++ W +V + L +
Sbjct: 7 FNDIFQEVKGSMNDGRLRLSRQGIIFKNSKTGKVDNIQAGELTEGIWRRVALGHGLKLLT
66
Query: 279 KDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSK
458
K+G YK+ GFR+ + L++FF+ + + + EK L V G NWG V G +L+F +G +
Sbjct: 67 KNGHVYKYDGFRESEFEKLSDFFKTHYRLELMEKDLCVKGWNWGTVKFGGQLLSFDIGDQ
126
Query: 459 QAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599
FE+PL++VSQ GKN+V LEFH +D + + SLME+ F++P
Sbjct: 127 PVFEIPLSNVSQCT-TGKNEVTLEFHQND----DAEVSLMEVRFYVP 168
>SW:SSRP_HUMAN Q08945 homo sapiens (human). structure-specific
recognition protein 1
(ssrp1) (recombination signal sequence recognition
protein) (t160) (chromatin-specific transcription
elongation factor 80 kda subunit) (fact 80 kda subunit).
9/2003
Length = 709
Score = 114 bits (286), Expect = 6e-25
Identities = 58/167 (34%), Positives = 97/167 (57%)
Frame = +3
Query: 99 FNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQI
278
FN++ +G N G++++ GI++K GK + ++ W +V + L +
Sbjct: 7 FNDVYQEVKGSMNDGRLRLSRQGIIFKNSKTGKVDNIQAGELTEGIWRRVALGHGLKLLT
66
Query: 279 KDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSK
458
K+G YK+ GFR+ + L++FF+ + + + EK L V G NWG V G +L+F +G +
Sbjct: 67 KNGHVYKYDGFRESEFEKLSDFFKTHYRLELMEKDLCVKGWNWGTVKFGGQLLSFDIGDQ
126
Query: 459 QAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599
FE+PL++VSQ GKN+V LEFH +D + + SLME+ F++P
Sbjct: 127 PVFEIPLSNVSQCT-TGKNEVTLEFHQND----DAEVSLMEVRFYVP 168
>SW:SSRP_MOUSE Q08943 mus musculus (mouse). structure-specific
recognition protein 1
(ssrp1) (recombination signal sequence recognition
protein) (t160). 9/2003
Length = 708
Score = 108 bits (270), Expect = 5e-23
Identities = 56/167 (33%), Positives = 95/167 (56%)
Frame = +3
Query: 99 FNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQI
278
FN+I +G N G++++ GI++K GK + ++ W +V + L +
Sbjct: 7 FNDIFQEVKGSMNDGRLRLSPSGIIFKNSKTGKVDNIQAGELTEGIWPRVALGHGLKLLT
66
Query: 279 KDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSK
458
K+G YK+ GFR+ + L++FF+ + + + EK L V G NWG V G +L+F +G +
Sbjct: 67 KNGHVYKYDGFRESEFEKLSDFFKTHYRLELMEKDLCVKGWNWGTVKFGGQLLSFDIGDQ
126
Query: 459 QAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599
FE+PL++VS Q + +V LEFH +D + + SLME+ F++P
Sbjct: 127 PVFEIPLSNVSSVP-QARIEVTLEFHQND----DPEVSLMEVRFYVP 168
>SW:SSRP_DROME Q05344 drosophila melanogaster (fruit fly). single-strand
recognition
protein (ssrp) (chorion-factor 5). 9/2003
Length = 723
Score = 101 bits (251), Expect = 7e-21
Identities = 63/173 (36%), Positives = 92/173 (52%)
Frame = +3
Query: 81 MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN
260
MTD +N+I RG G++K+ I++K GK ++ DI + K +
Sbjct: 1 MTDSLEYNDINAEVRGVLCSGRLKMTEQNIIFKNTKTGKVEQISAEDIDLINSQKFVGTW
60
Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA
440
L V K G+ ++FTGFRD + L F + + + EK++ V G NWG G++L+
Sbjct: 61 GLRVFTKGGVLHRFTGFRDSEHEKLGKFIKAAYSQEMVEKEMCVKGWNWGTARFMGSVLS
120
Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599
F SK FEVPL+ VSQ + GKN+V LEFH +D L+E+ FHIP
Sbjct: 121 FDKESKTIFEVPLSHVSQC-VTGKNEVTLEFHQNDDAPV----GLLEMRFHIP 168
>SP_FUN:O94529 O94529 schizosaccharomyces pombe (fission yeast).
putative structure
specific recognition protein. 3/2003
Length = 512
Score = 96.7 bits (239), Expect = 2e-19
Identities = 48/161 (29%), Positives = 86/161 (52%), Gaps = 2/161 (1%)
Frame = +3
Query: 138 PGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQIKDGLFYKFTGFRD
317
PG+++I G+ WK + + ++I W + R +L + +K GF
Sbjct: 19 PGKLRIAPSGLGWKSPSLAEPFTLPISEIRRFCWSRFARGYELKIILKSKDPVSLDGFSQ
78
Query: 318 QDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSKQAFEVPLADVSQT
497
+D+ L N + F + +++K+ S+ G NWG+ + G+ L F V S+ AFE+P++ V+ T
Sbjct: 79 EDLDDLINVIKQNFDMGIEQKEFSIKGWNWGEANFLGSELVFDVNSRPAFEIPISAVTNT
138
Query: 498 NLQGKNDVILEFHV--DDTTGANEKDSLMEISFHIPNSNTQ 614
NL GKN+V LEF D + + D L+E+ ++P + +
Sbjct: 139 NLSGKNEVALEFSTTDDKQIPSAQVDELVEMRLYVPGTTAK 179
>SW:SSRP_CHICK Q04678 gallus gallus (chicken). structure-specific
recognition
protein 1 (ssrp1) (recombination signal sequence
recognition protein) (t160) (fragment). 9/2003
Length = 669
Score = 95.9 bits (237), Expect = 3e-19
Identities = 48/131 (36%), Positives = 79/131 (59%)
Frame = +3
Query: 207 VDKADIVSVTWMKVPRSNQLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQL
386
+ +++ W +V + L + K+G YK+ GFR+ + L++FF+ + + + EK L
Sbjct: 5 IQASELAEGVWRRVALGHGLKLLTKNGHVYKYDGFRESEFDKLSDFFKAHYRLELAEKDL
64
Query: 387 SVSGRNWGDVDLNGNMLAFMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEK
566
V G NWG V G +L+F +G + FE+PL++VSQ GKN+V LEFH +D + +
Sbjct: 65 CVKGWNWGTVRFGGQLLSFDIGEQPVFEIPLSNVSQCT-TGKNEVTLEFHQND----DAE
119
Query: 567 DSLMEISFHIP 599
SLME+ F++P
Sbjct: 120 VSLMEVRFYVP 130
>SP_IN:Q8IL56 Q8il56 plasmodium falciparum (isolate 3d7). structure
specific
recognition protein, putative. 3/2003
Length = 506
Score = 94.0 bits (232), Expect = 1e-18
Identities = 50/170 (29%), Positives = 89/170 (51%), Gaps = 5/170 (2%)
Frame = +3
Query: 120 GRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN-----QLGVQIKD
284
G GG++ G ++ + + WK + + +DI W+K +N +LG + K+
Sbjct: 21 GFGGSDFGSFRMSNEFLGWKNKKTNNVYQYKCSDIDEGCWIKTSYNNNRLHLKLG-ESKE
79
Query: 285 GLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSKQA
464
+ F GF D++V +T FQ F I + ++++ G NWG+ L + L F + +K A
Sbjct: 80 NIIIYFDGFPDRNVNEITQHFQKYFNIRLNNRKIATKGWNWGEFKLENSNLCFDIDNKYA
139
Query: 465 FEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQ 614
F +P +++Q N+Q K D+ +EF D+ +D L EI F+ P+ N +
Sbjct: 140 FNLPTNNINQLNVQIKTDIAMEFKNDENNNKGNEDFLAEIRFYYPHENDE 189
>SW:SSRP_CAEEL P41848 caenorhabditis elegans. probable
structure-specific
recognition protein 1 (ssrp1) (recombination signal
sequence recognition protein). 9/2003
Length = 697
Score = 92.0 bits (227), Expect = 4e-18
Identities = 48/153 (31%), Positives = 82/153 (53%)
Frame = +3
Query: 141 GQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQIKDGLFYKFTGFRDQ
320
G +K+ + +K GGKS+ V +DI + W K+ L V + DG ++F GF+D
Sbjct: 20 GTLKLTEKSLNFKGDKGGKSVNVTGSDIDKLKWQKLGNKPGLRVGLNDGGAHRFGGFKDT
79
Query: 321 DVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSKQAFEVPLADVSQTN
500
D+ + +F + + ++ + L + G N+G ++ G + F K FE+P +VS
Sbjct: 80 DLEKIQSFTSSNWSQSIDQSNLFIKGWNYGQAEVKGKTVEFSWEDKPIFEIPCTNVSNV-
138
Query: 501 LQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599
+ KN+ +LEFH +D + K LME+ FH+P
Sbjct: 139 IANKNEAVLEFHQND----DSKVQLMEMRFHMP 167
>SW:YMG9_YEAST Q04636 saccharomyces cerevisiae (baker's yeast).
hypothetical 63.0
kda protein in dak1-orc1 intergenic region. 5/2000
Length = 552
Score = 89.0 bits (219), Expect = 4e-17
Identities = 50/161 (31%), Positives = 80/161 (49%), Gaps = 8/161 (4%)
Frame = +3
Query: 141 GQIKIFSGGILWK--RQGGGKSIEVDK------ADIVSVTWMKVPRSNQLGVQIKDGLFY
296
G+ +I G+ WK GG + + K ++ +V W + R L + K+
Sbjct: 17 GRFRIADSGLGWKISTSGGSAANQARKPFLLPATELSTVQWSRGCRGYDLKINTKNQGVI
76
Query: 297 KFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSKQAFEVP
476
+ GF D + N F F I V++++ S+ G NWG DL N + F + K FE+P
Sbjct: 77 QLDGFSQDDYNLIKNDFHRRFNIQVEQREHSLRGWNWGKTDLARNEMVFALNGKPTFEIP
136
Query: 477 LADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599
A ++ TNL KN+V +EF++ D D L+E+ F+IP
Sbjct: 137 YARINNTNLTSKNEVGIEFNIQDEEYQPAGDELVEMRFYIP 177
>SP_IN:O01683 O01683 caenorhabditis elegans. c32f10.5 protein. 3/2003
Length = 689
Score = 86.7 bits (213), Expect = 2e-16
Identities = 50/186 (26%), Positives = 90/186 (47%)
Frame = +3
Query: 99 FNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQI
278
F + + G G + + I + GGKS+ + D+ + W K+ L V +
Sbjct: 6 FKGVYVEDIGHLTCGTLTLTENSINFIGDKGGKSVYITGTDVDKLKWQKLGNKPGLRVGL
65
Query: 279 KDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSK
458
DG ++F GF D D+ + +F + + ++ + L ++G N+G D+ G + F ++
Sbjct: 66 SDGGAHRFGGFLDDDLQKIQSFTSSNWSKSINQSNLFINGWNYGQADVKGKNIEFSWENE
125
Query: 459 QAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFVGDENTP
638
FE+P +VS + KN+ ILEFH ++ K LME+ FH+P +E+T
Sbjct: 126 PIFEIPCTNVSNV-IANKNEAILEFHQNE----QSKVQLMEMRFHMP---VDLENEEDTD
177
Query: 639 PXQVFR 656
+ F+
Sbjct: 178 KVEEFK 183
>SP_FUN:Q9HFC4 Q9hfc4 zygosaccharomyces rouxii (candida mogii).
ssrp1-like protein
(fragment). 10/2002
Length = 542
Score = 85.1 bits (209), Expect = 5e-16
Identities = 48/165 (29%), Positives = 79/165 (47%), Gaps = 8/165 (4%)
Frame = +3
Query: 141 GQIKIFSGGILWKRQGGGKSIE--------VDKADIVSVTWMKVPRSNQLGVQIKDGLFY
296
G+ +I G+ WK G S + ++ +V W + R +L V K+
Sbjct: 45 GRFRIADSGLGWKSANAGGSAANQSKQPFLLPATELSTVQWSRGCRGFELKVNTKNQGVV
104
Query: 297 KFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSKQAFEVP
476
+ GF D + N F F + V+ K+ S+ G NWG DL N + F + + +FEVP
Sbjct: 105 QLDGFAPDDFNLIKNDFHRRFNVQVEPKEHSLRGWNWGKADLARNEMVFALNGRPSFEVP
164
Query: 477 LADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNT 611
A ++ TNL K +V +EF++ D D L+E+ ++P + T
Sbjct: 165 YARINNTNLTSKTEVAIEFNLADENYQPAGDELVEMRLYVPGTVT 209
Database: /home/seqstore/ncbi/blast/data/swplus
Posted date: Apr 15, 2003 12:04 PM
Number of letters in database: 303,757,025
Number of sequences in database: 954,989
Lambda     K      H
   0.318    0.135    0.401
Gapped
Lambda     K      H
   0.267   0.0410    0.140
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 385,793,622
Number of Sequences: 954989
Number of extensions: 8541745
Number of successful extensions: 21678
Number of sequences better than 1.0e-06: 36
Number of HSP's better than 0.0 without gapping: 21171
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test: 0
Number of HSP's gapped (non-prelim): 21664
length of database: 303,757,025
effective HSP length: 116
effective length of database: 192,978,301
effective search space used: 20455699906
frameshift window, decay const: 50, 0.1
T: 12
A: 40
X1: 16 ( 7.3 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.7 bits)
===========
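The values the parser leaves undefined are present in the report itself. Every hit's detail block has a line such as "Score = 101 bits (251), Expect = 7e-21" (the SSRP_DROME hit above). The plain-Perl sketch below illustrates what a parser has to pick up from that line. The function name `parse_score_line` is hypothetical and is not taken from Bio::SearchIO::blast. The handling of the "Expect(2)" and bare "e-100" variants is an assumption about other legacy BLAST output, not something shown in this report.

```perl
use strict;
use warnings;

# Hypothetical helper: extract bit score, raw score and E-value from a
# BLAST detail-section line like "Score = 101 bits (251), Expect = 7e-21".
# Returns a hash reference, or undef if the line does not match.
sub parse_score_line {
    my ($line) = @_;
    return undef
        unless $line =~ /Score\s*=\s*([\d.]+)\s+bits\s+\((\d+)\),\s+Expect(?:\(\d+\))?\s*=\s*(\S+)/;
    my ($bits, $raw, $evalue) = ($1, $2, $3);
    $evalue =~ s/,$//;                          # drop a trailing comma, if any
    $evalue = "1$evalue" if $evalue =~ /^e/i;   # legacy BLAST may print "e-100"
    return { bits => $bits, raw => $raw, evalue => $evalue };
}

my $s = parse_score_line(' Score = 101 bits (251), Expect = 7e-21');
printf "%s bits, raw %d, E %s\n", $s->{bits}, $s->{raw}, $s->{evalue};
# prints "101 bits, raw 251, E 7e-21"
```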
Richard Holland
Bioinformatics Database Developer
ITS, Agresearch Invermay x3279
-----Original Message-----
From: Jason Stajich [mailto:jason at cgt.duhs.duke.edu]
Sent: Friday, 5 September 2003 9:39 a.m.
To: Holland, Richard
Cc: bioperl-l at bioperl.org; McCulloch, Alan
Subject: Re: [Bioperl-l] Blastx parser misses scores
Can you please provide an example report and code which doesn't behave
as you would expect?
Are you talking about the case where you have 50 hits listed in the
summary but say only 25 HSP alignments?
On Fri, 5 Sep 2003, Holland, Richard wrote:
> Hi,
>
> I have run into a problem with Bio::SearchIO::blast parsing blastx
> result files. This may affect other blast outputs as well but I'm not
> sure.
>
> At the top of a blastx output there is a summary of the best hits in
> the results file. Then, all the hits are listed, even the ones which
> are not in the best hits list.
>
> The BioPerl parser successfully parses all the hits from the file;
> however, it only returns scores for those which appear in the summary.
> I have found the code which does this in Bio::SearchIO::blast and
> noticed that this seems to be deliberate - in all cases, blastx or
> not, the scores are taken from the summary, and the scores in the hit
> details appear to be ignored.
>
> Is this a feature or a bug? We would like to be able to use BioPerl
> to parse out all the results from our blast reports, including all
> their scores and details, regardless of whether or not they appear in
> the best hits summary.
>
> Can anyone help?
>
> cheers,
> Richard
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu