[Bioperl-l] PSI-BLAST uncommon result
Luis M Rodriguez-R
me at miguel.weapps.com
Thu Mar 11 05:48:17 UTC 2010
Hello all,
I'm getting an unusual (but valid) PSI-BLAST result that BioPerl can't parse: one result in the first round (or results that are identical in the aligned regions) and no hits in the second round. BioPerl treats '*** No hits found ***' as part of the alignment and dies with this exception:
MSG: no data for midline ***** No hits found ******
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:357
STACK: Bio::SearchIO::blast::next_result /usr/local/share/perl/5.10.0/Bio/SearchIO/blast.pm:1792
My workaround was to use the XML output, but I think it's still a bug. I've appended the example PSI-BLAST output at the end of this mail.
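
In case it helps anyone hitting the same report, here is a rough sketch of a pre-check that flags which rounds of a text report came back empty before the report goes to a parser. It is in Python rather than Perl and is not part of BioPerl; the function name rounds_without_hits is made up for illustration. It assumes that every round starts with a "Searching...done" line, as in the output appended below.

```python
import re

# Assumption: each PSI-BLAST round in the text report starts with a
# "Searching......done" line, as in the report appended to this mail.
SEARCH_LINE = re.compile(r"^Searching\.*done$")
# Matches the marker line regardless of how many asterisks surround it.
NO_HITS = re.compile(r"^\s*\*+\s*No hits found\s*\*+\s*$")


def rounds_without_hits(report_text):
    """Return the 1-based numbers of the rounds that reported no hits.

    Hypothetical helper for illustration only; it does not parse the
    alignments, it only counts rounds and looks for the marker line.
    """
    empty_rounds = []
    current_round = 0
    for line in report_text.splitlines():
        if SEARCH_LINE.match(line.strip()):
            # A new search iteration begins here.
            current_round += 1
        elif current_round > 0 and NO_HITS.match(line):
            empty_rounds.append(current_round)
    return empty_rounds
```

For the report below this should flag round 2, so a script could fall back to the XML output (or skip that round) instead of dying in the text parser.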
Best regards,
Luis M. Rodriguez-R
[http://bioinf.uniandes.edu.co/~miguel/]
---------------------------------
Unidad de Bioinformática del Laboratorio de Micología y Fitopatología
Universidad de Los Andes, Colombia
[http://bioinf.uniandes.edu.co]
+ 57 1 3394949 ext 2619
luisrodr at uniandes.edu.co
me at miguel.weapps.com
BLASTP 2.2.18 [Mar-02-2008]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Reference for compositional score matrix adjustment: Altschul, Stephen F.,
John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis,
Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches
using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109.
Reference for composition-based statistics starting in round 2:
Schaffer, Alejandro A., L. Aravind, Thomas L. Madden,
Sergei Shavirin, John L. Spouge, Yuri I. Wolf,
Eugene V. Koonin, and Stephen F. Altschul (2001),
"Improving the accuracy of PSI-BLAST protein database searches with
composition-based statistics and other refinements", Nucleic Acids Res. 29:2994-3005.
Query= eff254
(67 letters)
Database: All non-redundant GenBank CDS
translations+PDB+SwissProt+PIR+PRF excluding environmental samples
from WGS projects
10,383,435 sequences; 3,542,477,638 total letters
Searching..................................................done
Results from round 1
Score E
Sequences producing significant alignments: (bits) Value
ref|YP_002650062.1| hrp/hrc Type III secretion system-Hrp/hrc se... 127 5e-28
>ref|YP_002650062.1| hrp/hrc Type III secretion system-Hrp/hrc secretion/translocation
pathway-hrp pilin [Erwinia pyrifoliae Ep1/96]
sp|Q3HY20.1|HRPA_ERWPY RecName: Full=Hrp pili protein hrpA; AltName: Full=TTSS pilin
hrpA
gb|ABA39805.1| HrpA [Erwinia pyrifoliae]
emb|CAX56860.1| hrp/hrc Type III secretion system-Hrp/hrc secretion/translocation
pathway-hrp pilin [Erwinia pyrifoliae Ep1/96]
emb|CAY75708.1| Hrp pili protein HrpA (TTSS pilin HrpA) [Erwinia pyrifoliae DSM
12163]
Length = 67
Score = 127 bits (318), Expect = 5e-28, Method: Compositional matrix adjust.
Identities = 67/67 (100%), Positives = 67/67 (100%)
Query: 1 MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN 60
MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN
Sbjct: 1 MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN 60
Query: 61 AAKAIQF 67
AAKAIQF
Sbjct: 61 AAKAIQF 67
Searching..................................................done
***** No hits found ******
Database: All non-redundant GenBank CDS
translations+PDB+SwissProt+PIR+PRF excluding environmental samples
from WGS projects
Posted date: Jan 24, 2010 4:41 AM
Number of letters in database: 863,709,833
Number of sequences in database: 2,562,282
Database: /storage1/databases/ncbi-blast/nr.01
Posted date: Jan 24, 2010 4:41 AM
Number of letters in database: 936,189,781
Number of sequences in database: 2,674,439
Database: /storage1/databases/ncbi-blast/nr.02
Posted date: Jan 24, 2010 4:41 AM
Number of letters in database: 974,890,473
Number of sequences in database: 2,826,395
Database: /storage1/databases/ncbi-blast/nr.03
Posted date: Jan 24, 2010 4:41 AM
Number of letters in database: 767,687,551
Number of sequences in database: 2,320,319
Lambda K H
0.297 0.107 0.256
Lambda K H
0.267 0.0344 0.140
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 480,706,425
Number of Sequences: 10383435
Number of extensions: 8598061
Number of successful extensions: 47335
Number of sequences better than 1.0e-25: 1
Number of HSP's better than 0.0 without gapping: 2
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test: 47333
Number of HSP's gapped (non-prelim): 2
length of query: 67
length of database: 3,542,477,638
effective HSP length: 39
effective length of query: 28
effective length of database: 3,137,523,673
effective search space: 87850662844
effective search space used: 87850662844
T: 11
A: 40
X1: 16 ( 6.9 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 43 (21.7 bits)
S2: 298 (119.7 bits)