[Biopython-dev] should we make a BLAT parser?

Yair Benita y.benita at wanadoo.nl
Thu Jul 7 18:31:05 EDT 2005


Since the only differences are in the header and footer, plus some spacing
and numbers, parsing BLAT's blast-like output is essentially the same as
parsing BLAST output. Tomorrow I will post all the changes needed. On my
machine I just made a copy of NCBIStandalone and modified it to fit the
BLAT output, but the correct way to do this is to modify the original
NCBIStandalone to handle all these outputs. The thing is, I don't fully
understand how this parser works (with all those uhandles, scanners,
consumers, etc.), so I would rather someone who does make the changes
in CVS.
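
To make the header/footer point concrete, here is a rough standalone
sketch. It is not the NCBIStandalone code and does not use its
scanner/consumer machinery. The names HEADER_RE, detect_flavor and
parse_footer are made up for illustration, and the sketch assumes the
header and footer look like the examples quoted below.

```python
import re

# Hypothetical helpers for illustration; not part of Bio.Blast.NCBIStandalone.
# Matches lines such as "BLASTN 2.2.4 [blat]" or "BLASTX 2.2.6 [Apr-09-2003]".
HEADER_RE = re.compile(r"^(T?BLAST[NPX])\s+(\S+)\s+\[([^\]]+)\]")


def detect_flavor(header_line):
    """Return (program, version, flavor) for the first line of a report.

    flavor is "blat" when the bracketed field reads [blat], as in the
    BLAT header quoted in this thread, and "blast" otherwise (NCBI BLAST
    prints a build date there).
    """
    match = HEADER_RE.match(header_line.strip())
    if match is None:
        raise ValueError("not a BLAST-like header: %r" % header_line)
    program, version, tag = match.groups()
    flavor = "blat" if tag.lower() == "blat" else "blast"
    return program, version, flavor


def parse_footer(lines):
    """Collect whichever "key: value" footer fields are present.

    BLAT prints only the Database line, so every other key is treated
    as optional instead of being required.
    """
    footer = {}
    for line in lines:
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # blank lines and the Lambda/K/H tables have no colon
        footer[key.strip()] = value.strip()
    return footer
```

A parser that tolerates a missing reference block and missing statistics
in this way could, in principle, hand both kinds of report to the same
downstream code.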

Yair

On Jul 7, 2005, at 20:30, Brandon King wrote:

> Hi Yair,
>     I'm new to the developers list, but I do think it would be a great
> idea to create a BLAT parser based on the NCBIStandalone module. I have
> to do about a million BLATs soon. I have code for processing many BLAST
> results from NCBIStandalone, but I don't have anything nearly as
> good for BLAT. Being able to use the same analysis code for BLAST/BLAT
> would be great (assuming the change you're talking about will return
> result objects the same way the NCBIStandalone module does?).
>
> -Brandon King
>
> Yair Benita wrote:
>
>
>> I noticed a while ago that someone asked for a BLAT parser.
>> I just had to do a few thousand BLATs and I didn't really like the psl
>> output format it uses. It is a bit confusing in my opinion. So I used the
>> blast-like output, and with minor changes to the NCBIStandalone module I
>> was able to parse it with no problems.
>>
>> Should we introduce modifications in the NCBIStandalone file or make a
>> new separate file for parsing BLAT output?
>>
>> The main changes are in the header and footer of the file. I append
>> examples below. There were a few other minor changes.
>>
>> Yair
>>
>> ----- header blat ------
>> BLASTN 2.2.4 [blat]
>>
>> Reference:  Kent, WJ. (2002) BLAT - The BLAST-like alignment tool
>>
>> ----- header blast ------
>> BLASTX 2.2.6 [Apr-09-2003]
>>
>>
>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
>> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
>> programs",  Nucleic Acids Res. 25:3389-3402.
>>
>> ----- footer blat ------
>>  Database: localhost:4303
>>
>> ----- footer blast ------
>>  Database: nr
>>    Posted date:  Aug 11, 2004  8:59 AM
>>  Number of letters in database: 663,053,178
>>  Number of sequences in database:  1,971,122
>>
>> Lambda     K      H
>>   0.310    0.133    0.405
>>
>> Gapped
>> Lambda     K      H
>>   0.267   0.0410    0.140
>>
>>
>> Matrix: BLOSUM62
>> Gap Penalties: Existence: 11, Extension: 1
>> Number of Hits to DB: 111,495,368
>> Number of Sequences: 1971122
>> Number of extensions: 811791
>> Number of successful extensions: 2455
>> Number of sequences better than 1.0e-01: 0
>> Number of HSP's better than  0.1 without gapping: 2446
>> Number of HSP's successfully gapped in prelim test: 0
>> Number of HSP's that attempted gapping in prelim test: 0
>> Number of HSP's gapped (non-prelim): 2455
>> length of database: 663,053,178
>> effective HSP length: 2
>> effective length of database: 659,110,934
>> effective search space used: 15818662416
>> frameshift window, decay const: 50,  0.1
>> T: 12
>> A: 40
>> X1: 16 ( 7.2 bits)
>> X2: 38 (14.6 bits)
>> X3: 64 (24.7 bits)
>> S1: 42 (21.7 bits)
>>


