[Bioperl-l] Bio::SearchIO::Writer::TextResultWriter is buggy

Fields, Christopher J cjfields at illinois.edu
Wed Mar 14 15:21:40 UTC 2012


Can you pass a file handle?  SearchIO accepts them using '-fh', but I have no idea if this would work.  See:

http://bytes.com/topic/perl/answers/497453-howto-share-filehandles-between-threads

The problem is whether this will be at the specific file point you need or whether the file pointer will be reset to the beginning.  According to the last answer there, if the variable is sharing scope it should work, but I haven't tried it out myself to be honest.  If you try that out let us know, I would be interested to see if it works.

Re: splitting the file, this is a little tricky as plain text output has changed.  Older versions of BLAST simply concatenate output files, so the header for the file is repeated (lines up to 'Query').  Latter version leave off the repeated header.  You could simply split the file up based on the query using a regex, I believe the result object should still be generated, but the header contains information that you may want, such as BLAST version, etc.

chris

On Mar 14, 2012, at 9:35 AM, Lee Katz wrote:

> I just want to clarify: I have an already existing blast output.  Is there
> a non-buggy way to split it?  It is in human-readable text form (-m 0).
> 
> On Tue, Mar 13, 2012 at 6:06 PM, Lee Katz <lskatz at gmail.com> wrote:
> 
>> Hi, I am separating a blast output file into individual results, so that I
>> can multithread the reading of the results.  I cannot pass a result object
>> through Perl threads because it contains code, which is not sharable via
>> threads::share (sharing is used internally in Thread::Queue)--therefore I
>> must pass a sub-file.  My strategy is to read the whole file into
>> Bio::SearchIO and then write the result objects to a file, so that a thread
>> can read the file.  The thread would thus read one file at a time
>> containing one query and all its results.
>> 
>> Reading the original file works, but then outputting the blast file is
>> buggy.  The last line of the HSP is empty and has bad coordinates.  I have
>> an example, with an error when trying to read it again with SearchIO, and
>> its fasta file below.
>> 
>> Any help debugging?  Maybe I just need to update BioPerl since I installed
>> it around several months ago, maybe a year ago?   Thanks.
>> 
>> 
>> MSG: In sequence lcl|R009125 residue count gives end value 341.
>> Overriding value [340] with value 341 for Bio::LocatableSeq::end().
>> 
>> ANKTEKPTPKKLKDAAKKGQSFKFKDLTTVVIILVGTFTIISFF-----SLSDVMLL-----YRYVIINDFE-------INEGKYF----FAVVIVFFKIIGFPLFFCVLSAVLPTLVQTKFVLATKAIKIDFSVLNPVKGLKKIFSIKTIKEFFKSILLLIILALTTYFFWINDRKIIFSQVFSSVDGLYLIWGRLFKDIILFFLAFSILVIILDFVIEFILYMKDMMMDKQEIKREYIEQEGHFETKSRRRELHIEILSEQTKSDIRNSKLVVMNPTHIAIGIYFNPEIAPAPFISLIETNQCALAVRKYANEVGIPTVRDVKLARKLYKTHTKYSFVDFEHLDEVLRLIVWLEQVEN1
>> ---------------------------------------------------
>> 
>> 
>> 
>>> lcl|R009125 (gi:13449103) spa40 (pWR501_p164) - Type III secretion
>> protein [Shigella flexneri str. M90T (serotype 5a) plasmid pWR501]
>>         Length = 342
>> 
>> Score = 79.3 bits (194), Expect = 2e-15
>> Identities = 87/360 (24%), Positives = 175/360 (48%), Gaps = 35/360 (9%)
>> 
>> 
>> Query: 4   GDKTEQASSQKLDKARKQGQIARSKEFSSAIMLMV----CIGYFYANADSLSGHLMQLFE 59
>>            +KTE+ + +KL  A K+GQ  + K+ ++ ++++V     I +F     SLS  ++
>> Sbjct: 2   ANKTEKPTPKKLKDAAKKGQSFKFKDLTTVVIILVGTFTIISFF-----SLSDVMLL--- 53
>> 
>> Query: 60  VSFRFTAESQSDHDHILHLITQSLYLMIKVFAPLIIF-QFIASAIATCLLGGF------- 111
>>             +R+   +  +       I +  Y     FA +I+F + I   +  C+L
>> Sbjct: 54  --YRYVIINDFE-------INEGKYF----FAVVIVFFKIIGFPLFFCVLSAVLPTLVQT 100
>> 
>> Query: 112 HFNLSLLAPK--FSKINPLSGIKRIFSKQTLVEFLKNVAKISLIFALLYYMISTNFHMIG 169
>>            F L+  A K  FS +NP+ G+K+IFS +T+ EF K++  + ++    Y+    +  +I
>> Sbjct: 101 KFVLATKAIKIDFSVLNPVKGLKKIFSIKTIKEFFKSILLLIILALTTYFFWINDRKIIF 160
>> 
>> Query: 170 SLVRASFQTTIHFSLQYVLELLGMLILIAILFGVIDIPYQKMTFGTQMKMTkqevkqehk 229
>>           S V +S         +   +++   +  +IL  ++D   + + +   M M KQE+K+E+
>> Sbjct: 161 SQVFSSVDGLYLIWGRLFKDIILFFLAFSILVIILDFVIEFILYMKDMMMDKQEIKREYI 220
>> 
>> Query: 230 eqeGRPEIKSRIRQIQMQNARRSASQTVPTADVVLMNPTHFAVALKYDLTKAEAPFVVAK 289
>>           EQEG  E KSR R++ ++         +  + +V+MNPTH A+ + ++   A APF+
>> Sbjct: 221 EQEGHFETKSRRRELHIEILSEQTKSDIRNSKLVVMNPTHIAIGIYFNPEIAPAPFISLI 280
>> 
>> Query: 290 GKNEVAFYIRTLAEQHQVEVLVVPEITRSIYHTTQLNQMIPNQLFLAVAQILKYVQQLKS 349
>>             N+ A  +R  A +  +  +   ++ R +Y T      +  +    V +++ +++Q+++
>> Sbjct: 281 ETNQCALAVRKYANEVGIPTVRDVKLARKLYKTHTKYSFVDFEHLDEVLRLIVWLEQVEN 340
>> 
>> Query: 350  349
>> 
>> Sbjct: 341  340
>> 
>> And the whole fasta entry is:
>>> lcl|R009125 (gi:13449103) spa40 (pWR501_p164) - Type III secretion
>> protein [Shigella flexneri str. M90T (serotype 5a) plasmid pWR501]
>> 
>> MANKTEKPTPKKLKDAAKKGQSFKFKDLTTVVIILVGTFTIISFFSLSDVMLLYRYVIINDFEINEGKYFFAVVIVFFKI
>> 
>> IGFPLFFCVLSAVLPTLVQTKFVLATKAIKIDFSVLNPVKGLKKIFSIKTIKEFFKSILLLIILALTTYFFWINDRKIIF
>> 
>> SQVFSSVDGLYLIWGRLFKDIILFFLAFSILVIILDFVIEFILYMKDMMMDKQEIKREYIEQEGHFETKSRRRELHIEIL
>> 
>> SEQTKSDIRNSKLVVMNPTHIAIGIYFNPEIAPAPFISLIETNQCALAVRKYANEVGIPTVRDVKLARKLYKTHTKYSFV
>> DFEHLDEVLRLIVWLEQVENTH
>> 
>> 
>> 
>> --
>> Lee Katz, Ph.D.
>> 
> 
> 
> 
> -- 
> Lee Katz, Ph.D.
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l





More information about the Bioperl-l mailing list