[Bioperl-l] Bio::SearchIO::Writer::TextResultWriter is buggy
Lee Katz
lskatz at gmail.com
Tue Mar 13 22:06:48 UTC 2012
Hi, I am splitting a BLAST output file into individual results so that I
can read the results in multiple threads. I cannot pass a result object
between Perl threads because it contains code references, which cannot be
shared via threads::shared (which Thread::Queue uses internally), so I
must pass a sub-file instead. My strategy is to read the whole file with
Bio::SearchIO and then write each result object to its own file, so that a
thread can read that file. Each thread would thus read one file at a time,
containing one query and all of its results.
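For comparison, one possible alternative to round-tripping through a writer is to split the raw report text directly. This is only a minimal sketch in plain Perl, under the assumption that every query section starts with a line beginning "Query=" and that everything before the first such line is a header shared by all queries. Reports that repeat the program banner before each query would need a different boundary. The name split_blast_report is a hypothetical helper, not part of BioPerl.

```perl
use strict;
use warnings;

# Split the text of a multi-query BLAST report into one chunk per query.
# Assumption: each query section starts with a line beginning "Query=",
# and everything before the first such line is a shared header.
sub split_blast_report {
    my ($text) = @_;
    my $header = '';
    my @chunks;
    for my $line (split /^/m, $text) {      # keeps each line's newline
        if ($line =~ /^Query=/) {
            push @chunks, $header . $line;  # start a new per-query chunk
        } elsif (@chunks) {
            $chunks[-1] .= $line;           # continue the current chunk
        } else {
            $header .= $line;               # still in the shared header
        }
    }
    return @chunks;
}
```

Each returned chunk could then be written to its own file and the file name, a plain string, passed through Thread::Queue. Any trailing database statistics stay attached to the last chunk only.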
Reading the original file works, but the BLAST report written back out by
Bio::SearchIO::Writer::TextResultWriter is buggy: the last line of the HSP
alignment is empty and has bad coordinates (Query: 350 349 and Sbjct: 341
340 in the example below). Below are the warning I get when reading the
written file back in with SearchIO, the written report itself, and the
subject's fasta entry.
Any help debugging? Maybe I just need to update BioPerl, since I installed
it somewhere between several months and a year ago. Thanks.
MSG: In sequence lcl|R009125 residue count gives end value 341.
Overriding value [340] with value 341 for Bio::LocatableSeq::end().
ANKTEKPTPKKLKDAAKKGQSFKFKDLTTVVIILVGTFTIISFF-----SLSDVMLL-----YRYVIINDFE-------INEGKYF----FAVVIVFFKIIGFPLFFCVLSAVLPTLVQTKFVLATKAIKIDFSVLNPVKGLKKIFSIKTIKEFFKSILLLIILALTTYFFWINDRKIIFSQVFSSVDGLYLIWGRLFKDIILFFLAFSILVIILDFVIEFILYMKDMMMDKQEIKREYIEQEGHFETKSRRRELHIEILSEQTKSDIRNSKLVVMNPTHIAIGIYFNPEIAPAPFISLIETNQCALAVRKYANEVGIPTVRDVKLARKLYKTHTKYSFVDFEHLDEVLRLIVWLEQVEN1
---------------------------------------------------
>lcl|R009125 (gi:13449103) spa40 (pWR501_p164) - Type III secretion protein
[Shigella flexneri str. M90T (serotype 5a) plasmid pWR501]
Length = 342
Score = 79.3 bits (194), Expect = 2e-15
Identities = 87/360 (24%), Positives = 175/360 (48%), Gaps = 35/360 (9%)
Query: 4 GDKTEQASSQKLDKARKQGQIARSKEFSSAIMLMV----CIGYFYANADSLSGHLMQLFE 59
+KTE+ + +KL A K+GQ + K+ ++ ++++V I +F SLS ++
Sbjct: 2 ANKTEKPTPKKLKDAAKKGQSFKFKDLTTVVIILVGTFTIISFF-----SLSDVMLL--- 53
Query: 60 VSFRFTAESQSDHDHILHLITQSLYLMIKVFAPLIIF-QFIASAIATCLLGGF------- 111
+R+ + + I + Y FA +I+F + I + C+L
Sbjct: 54 --YRYVIINDFE-------INEGKYF----FAVVIVFFKIIGFPLFFCVLSAVLPTLVQT 100
Query: 112 HFNLSLLAPK--FSKINPLSGIKRIFSKQTLVEFLKNVAKISLIFALLYYMISTNFHMIG 169
F L+ A K FS +NP+ G+K+IFS +T+ EF K++ + ++ Y+ + +I
Sbjct: 101 KFVLATKAIKIDFSVLNPVKGLKKIFSIKTIKEFFKSILLLIILALTTYFFWINDRKIIF 160
Query: 170 SLVRASFQTTIHFSLQYVLELLGMLILIAILFGVIDIPYQKMTFGTQMKMTkqevkqehk 229
S V +S + +++ + +IL ++D + + + M M KQE+K+E+
Sbjct: 161 SQVFSSVDGLYLIWGRLFKDIILFFLAFSILVIILDFVIEFILYMKDMMMDKQEIKREYI 220
Query: 230 eqeGRPEIKSRIRQIQMQNARRSASQTVPTADVVLMNPTHFAVALKYDLTKAEAPFVVAK 289
EQEG E KSR R++ ++ + + +V+MNPTH A+ + ++ A APF+
Sbjct: 221 EQEGHFETKSRRRELHIEILSEQTKSDIRNSKLVVMNPTHIAIGIYFNPEIAPAPFISLI 280
Query: 290 GKNEVAFYIRTLAEQHQVEVLVVPEITRSIYHTTQLNQMIPNQLFLAVAQILKYVQQLKS 349
N+ A +R A + + + ++ R +Y T + + V +++ +++Q+++
Sbjct: 281 ETNQCALAVRKYANEVGIPTVRDVKLARKLYKTHTKYSFVDFEHLDEVLRLIVWLEQVEN 340
Query: 350 349
Sbjct: 341 340
And the whole fasta entry is:
>lcl|R009125 (gi:13449103) spa40 (pWR501_p164) - Type III secretion protein
[Shigella flexneri str. M90T (serotype 5a) plasmid pWR501]
MANKTEKPTPKKLKDAAKKGQSFKFKDLTTVVIILVGTFTIISFFSLSDVMLLYRYVIINDFEINEGKYFFAVVIVFFKI
IGFPLFFCVLSAVLPTLVQTKFVLATKAIKIDFSVLNPVKGLKKIFSIKTIKEFFKSILLLIILALTTYFFWINDRKIIF
SQVFSSVDGLYLIWGRLFKDIILFFLAFSILVIILDFVIEFILYMKDMMMDKQEIKREYIEQEGHFETKSRRRELHIEIL
SEQTKSDIRNSKLVVMNPTHIAIGIYFNPEIAPAPFISLIETNQCALAVRKYANEVGIPTVRDVKLARKLYKTHTKYSFV
DFEHLDEVLRLIVWLEQVENTH
--
Lee Katz, Ph.D.