[Bioperl-l] WUBLASTP parsing problem
Angshu Kar
angshu96 at gmail.com
Wed Jul 26 17:15:35 UTC 2006
Hi,
Is WU-BLASTP sensitive to the length of the sequence names (or to
the sequence names themselves)?
I use FASTA-format protein sequences as input and run a distributed
blastp to build the BLAST report. When I parse the report with
BioPerl, the query column is empty for some results, for example:
          328857  6.6e-135
          325331  6.3e-114
          325329  1.0e-113
          325332  1.7e-113
          325330  2.7e-113
          ...
while for others it is fine:
  267750  280003  7.5e-301
  267750  348279  7.5e-301
  267750  345867  2.0e-300
  267750  251915  2.0e-300
  267750  346539  6.7e-300
  ...
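One way to check whether the empty column comes from the report itself rather than from the parser is to list the name that follows each query header. The sketch below is in Python rather than BioPerl and is not the script that produced the listings above. It assumes that each query section of the text report starts with a line beginning with "Query="; the function name list_query_names and the sample report text are hypothetical.

```python
import re


def list_query_names(report_text):
    """Return the first token after each 'Query=' header line.

    Assumption: every query section in the text report starts with a
    line beginning with 'Query='. An empty string is recorded when no
    name follows the header on that line.
    """
    names = []
    for line in report_text.splitlines():
        match = re.match(r"^Query=\s*(\S*)", line)
        if match:
            names.append(match.group(1))
    return names


# Hypothetical report fragment, made up for illustration only.
sample = """\
Query=  267750 some description
        (120 letters)

Query=
        (85 letters)
"""

names = list_query_names(sample)
for index, name in enumerate(names, 1):
    print(index, name if name else "<empty query name>")
```

If a header shows up with an empty name here, the name is already missing in the report, which would point at the input FASTA headers or the BLAST run rather than at the parsing step.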
Some of my sequences look like this:
IMGA|AC159872_38.1 hypothetical protein AC159872.12 35121-35051 H EGN_Mt050401 20060209 TIGR 1671.m00013
mrsciilhnmivederdtyaqrwtefeqpggngsstpqpystelrdpdvhhklqtdlvkh
iwikfgmyrd*
Here is part of the blastp result where I see the issue:
                                                                     Smallest
                                                                        Sum
                                                              High  Probability
Sequences producing High-scoring Segment Pairs:              Score  P(N)      N
gi|33333045|gb|AAQ11687.1| MADS box protein [Triticum aes...  1318  6.6e-135  1
gi|47681327|gb|AAT37484.1| MADS5 protein [Dendrocalamus l...  1120  6.3e-114  1
gi|47681331|gb|AAT37486.1| MADS7 protein [Dendrocalamus l...  1118  1.0e-113  1
gi|47681325|gb|AAT37483.1| MADS4 protein [Dendrocalamus l...  1116  1.7e-113  1
gi|47681329|gb|AAT37485.1| MADS6 protein [Dendrocalamus l...  1114  2.7e-113  1
gi|47681323|gb|AAT37482.1| MADS3 protein [Dendrocalamus l...  1114  2.7e-113  1
11674.m04224|LOC_Os08g41950|protein K-box region, putative     976  1.1e-98   1
gi|28630961|gb|AAO45877.1| MADS5 [Lolium perenne]              967  1.0e-97   1
gi|44888605|gb|AAS48129.1| AGAMOUS LIKE9-like protein [Ho...   964  2.1e-97   1
11674.m04223|LOC_Os08g41950|protein K-box region, putative     899  1.6e-90   1
gi|34979580|gb|AAQ83834.1| MADS box protein [Asparagus of...   875  5.8e-88   1
Could you please let me know if I'm missing something? Does the gi
identifier have anything to do with this?
Thanking you,
Angshu