[Bioperl-l] Bio::SearchIO::Writer::TextResultWriter is buggy

Lee Katz lskatz at gmail.com
Wed Mar 14 14:35:59 UTC 2012


I just want to clarify: I already have an existing BLAST output file.  Is
there a non-buggy way to split it?  It is in human-readable text form (-m 0).
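
One way to split such a file without round-tripping it through a writer is
to cut the raw text at report boundaries. The following is only a minimal
sketch, written in Python for illustration rather than with BioPerl. It
assumes (not verified against this particular file) that each query's report
begins with a program banner line such as "BLASTP 2.2.25 [Feb-01-2011]". The
names split_reports, write_reports, and the ".N.blast" file suffix are
hypothetical.

```python
import re
from typing import Iterable, Iterator, List

# Assumption: every per-query report starts with a banner line such as
# "BLASTP 2.2.25 [Feb-01-2011]". Adjust the pattern if your file differs.
BANNER = re.compile(r"^T?BLAST[NPX]\s+\d")


def split_reports(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one list of lines per report, cutting before each banner line."""
    chunk: List[str] = []
    for line in lines:
        # A banner line closes the previous report (if any) and opens a new one.
        if BANNER.match(line) and chunk:
            yield chunk
            chunk = []
        chunk.append(line)
    if chunk:
        yield chunk


def write_reports(in_path: str, out_prefix: str) -> List[str]:
    """Write each report to '<out_prefix>.<N>.blast' and return the file names."""
    names: List[str] = []
    with open(in_path) as handle:
        for i, chunk in enumerate(split_reports(handle), start=1):
            name = f"{out_prefix}.{i}.blast"
            with open(name, "w") as out:
                out.writelines(chunk)  # lines are copied through unchanged
            names.append(name)
    return names
```

Because the lines are copied verbatim, each sub-file is byte-for-byte a slice
of the original report. Each file name can then be handed to a worker thread
and parsed there with Bio::SearchIO, so no result object has to cross a
thread boundary.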

On Tue, Mar 13, 2012 at 6:06 PM, Lee Katz <lskatz at gmail.com> wrote:

> Hi, I am separating a BLAST output file into individual results so that I
> can multithread the reading of the results.  I cannot pass a result object
> through Perl threads because it contains code references, which cannot be
> shared via threads::shared (sharing is used internally by Thread::Queue);
> therefore I must pass a sub-file.  My strategy is to read the whole file
> with Bio::SearchIO and then write each result object to its own file, so
> that a thread can read that file.  Each thread would thus read one file at
> a time, containing one query and all its results.
>
> Reading the original file works, but writing the BLAST report back out is
> buggy: the last line of the HSP is empty and has bad coordinates.  Below
> are an example, the error I get when reading it back in with SearchIO, and
> the FASTA entry for the hit.
>
> Any help debugging?  Maybe I just need to update BioPerl, since I installed
> it several months ago, maybe a year ago?  Thanks.
>
>
> MSG: In sequence lcl|R009125 residue count gives end value 341.
> Overriding value [340] with value 341 for Bio::LocatableSeq::end().
>
> ANKTEKPTPKKLKDAAKKGQSFKFKDLTTVVIILVGTFTIISFF-----SLSDVMLL-----YRYVIINDFE-------INEGKYF----FAVVIVFFKIIGFPLFFCVLSAVLPTLVQTKFVLATKAIKIDFSVLNPVKGLKKIFSIKTIKEFFKSILLLIILALTTYFFWINDRKIIFSQVFSSVDGLYLIWGRLFKDIILFFLAFSILVIILDFVIEFILYMKDMMMDKQEIKREYIEQEGHFETKSRRRELHIEILSEQTKSDIRNSKLVVMNPTHIAIGIYFNPEIAPAPFISLIETNQCALAVRKYANEVGIPTVRDVKLARKLYKTHTKYSFVDFEHLDEVLRLIVWLEQVEN1
> ---------------------------------------------------
>
>
>
> >lcl|R009125 (gi:13449103) spa40 (pWR501_p164) - Type III secretion
> protein [Shigella flexneri str. M90T (serotype 5a) plasmid pWR501]
>          Length = 342
>
>  Score = 79.3 bits (194), Expect = 2e-15
>  Identities = 87/360 (24%), Positives = 175/360 (48%), Gaps = 35/360 (9%)
>
>
> Query: 4   GDKTEQASSQKLDKARKQGQIARSKEFSSAIMLMV----CIGYFYANADSLSGHLMQLFE 59
>             +KTE+ + +KL  A K+GQ  + K+ ++ ++++V     I +F     SLS  ++
> Sbjct: 2   ANKTEKPTPKKLKDAAKKGQSFKFKDLTTVVIILVGTFTIISFF-----SLSDVMLL--- 53
>
> Query: 60  VSFRFTAESQSDHDHILHLITQSLYLMIKVFAPLIIF-QFIASAIATCLLGGF------- 111
>              +R+   +  +       I +  Y     FA +I+F + I   +  C+L
> Sbjct: 54  --YRYVIINDFE-------INEGKYF----FAVVIVFFKIIGFPLFFCVLSAVLPTLVQT 100
>
> Query: 112 HFNLSLLAPK--FSKINPLSGIKRIFSKQTLVEFLKNVAKISLIFALLYYMISTNFHMIG 169
>             F L+  A K  FS +NP+ G+K+IFS +T+ EF K++  + ++    Y+    +  +I
> Sbjct: 101 KFVLATKAIKIDFSVLNPVKGLKKIFSIKTIKEFFKSILLLIILALTTYFFWINDRKIIF 160
>
> Query: 170 SLVRASFQTTIHFSLQYVLELLGMLILIAILFGVIDIPYQKMTFGTQMKMTkqevkqehk 229
>            S V +S         +   +++   +  +IL  ++D   + + +   M M KQE+K+E+
> Sbjct: 161 SQVFSSVDGLYLIWGRLFKDIILFFLAFSILVIILDFVIEFILYMKDMMMDKQEIKREYI 220
>
> Query: 230 eqeGRPEIKSRIRQIQMQNARRSASQTVPTADVVLMNPTHFAVALKYDLTKAEAPFVVAK 289
>            EQEG  E KSR R++ ++         +  + +V+MNPTH A+ + ++   A APF+
> Sbjct: 221 EQEGHFETKSRRRELHIEILSEQTKSDIRNSKLVVMNPTHIAIGIYFNPEIAPAPFISLI 280
>
> Query: 290 GKNEVAFYIRTLAEQHQVEVLVVPEITRSIYHTTQLNQMIPNQLFLAVAQILKYVQQLKS 349
>              N+ A  +R  A +  +  +   ++ R +Y T      +  +    V +++ +++Q+++
> Sbjct: 281 ETNQCALAVRKYANEVGIPTVRDVKLARKLYKTHTKYSFVDFEHLDEVLRLIVWLEQVEN 340
>
> Query: 350  349
>
> Sbjct: 341  340
>
> And the whole fasta entry is:
> >lcl|R009125 (gi:13449103) spa40 (pWR501_p164) - Type III secretion protein [Shigella flexneri str. M90T (serotype 5a) plasmid pWR501]
> MANKTEKPTPKKLKDAAKKGQSFKFKDLTTVVIILVGTFTIISFFSLSDVMLLYRYVIINDFEINEGKYFFAVVIVFFKI
> IGFPLFFCVLSAVLPTLVQTKFVLATKAIKIDFSVLNPVKGLKKIFSIKTIKEFFKSILLLIILALTTYFFWINDRKIIF
> SQVFSSVDGLYLIWGRLFKDIILFFLAFSILVIILDFVIEFILYMKDMMMDKQEIKREYIEQEGHFETKSRRRELHIEIL
> SEQTKSDIRNSKLVVMNPTHIAIGIYFNPEIAPAPFISLIETNQCALAVRKYANEVGIPTVRDVKLARKLYKTHTKYSFV
> DFEHLDEVLRLIVWLEQVENTH
>
>
>
> --
> Lee Katz, Ph.D.
>



-- 
Lee Katz, Ph.D.



