[Bioperl-l] Blast stuff

Steve Chervitz sac at bioperl.org
Mon Jan 27 14:20:31 EST 2003


Good catch, guys. I would also expect seq_inds('conserved') to include 
identical matches, since identity definitely means conserved. It should 
also jibe with frac_conserved and num_conserved. So I'd call this a bug 
and fix seq_inds('conserved') to do what everyone expects.

As for returning the conserved but non-identical matches, I'd recommend 
defining an unambiguous argument that won't mislead anyone, such as 
seq_inds("conserved-but-not-identical"). Wordy, but explicit. It 
probably won't be of concern to the typical user since it doesn't 
correspond to any BLAST statistic.

As for backward compatibility, I consider this more of a bug than a 
feature, so a change in behavior is warranted. We can of course note it 
in the release notes, and even issue a warning when 
seq_inds("conserved") is called. The warning would be in the 1.2.1 and 
1.2.2 release only.

Steve

On Monday, Jan 27, 2003, at 05:00 US/Pacific, Jason Stajich wrote:

> Ben - That's a fine idea.  Definitely something that should be 
> applied, I
> would have expected the same thing.
>
> Will look into it today should be trivial to implement.
>
> -jason
>
> On Sun, 26 Jan 2003, Benjamin Berman wrote:
>
>>
>> Hey Jason,
>>
>> For some reason, the bioperl list seems to be rejecting posts I make 
>> from
>> this computer, so could you forward this on to steve and the rest of 
>> the list.
>>
>> I found another problem I believe.  In both the blast and psiblast
>> versions, Hit/HSP::seq_inds('conserved') seems to return the
>> similar/conserved bases alone, not including the identical ones.  
>> Common
>> intuition, as well as BLAST itself, would consider conserved (or 
>> positive)
>> matches to include both the conserved and identical residues.  While 
>> the
>> current semantics may be correct under a certain logic, I think it's 
>> very
>> misleading- especially because they contradict what is returned by
>> frac_conserved and num_conserved.  I spent a long time today tracking 
>> down
>> unexpected results I was getting, and it turned out to be because I 
>> was
>> assuming that frac_conserved and seq_inds('conserved') were returning 
>> the
>> same thing.
>>
>> I think it's useful to provide access to the conserved indices apart 
>> from
>> the identical ones.  And I guess for backwards compatibility 
>> 'conserved'
>> should continue to return what it does now (although I think it would
>> prevent future problems to alter the semantics now).  But the 
>> documentation
>> should make it abundantly clear in this case what's going on.  In 
>> addtion,
>> another class such as 'positives', could be added to return the
>> conserved+identical union.
>>
>> thanks for the consideration,
>> ben.
>>
>>
>>
>> At 05:34 PM 1/26/2003 -0500, Jason Stajich wrote:
>>> Ben -
>>>
>>> The reason stems from this: the last HSP is not in the same Frame.  
>>> So the
>>> tile_hsps puts it into a separate group - so the last 18 aa are not
>>> included in the longest contiguous alignable region which is used to
>>> calulate that frac_identical for the overall hit.  This is in lines
>>> 215-226 in Bio::Search::SearchUtils.
>>>
>>> Steve may have comments on what is the appropriate behavior here.
>>>
>>> I'm commiting some code cleanup made during my little audit but it 
>>> should
>>> not functionally change anything.
>>>
>>> Oops - actually one change - I think the logical_length of the hit
>>> should be the length / 3 for a TBLASTN so that put the percent id at 
>>> 84%
>>> which is consistent with what Steve was doing in psiblast objects.
>>> Comments welcome there as to what is the correct thing to return 
>>> there.
>>> Perhaps the user should be able to request either?
>>>
>>> I'm getting an error with psiblast parser right now because a frame 
>>> = '+1'
>>> is flipping out the SeqFeature::Generic obj.  Something to look at I
>>> guess.
>>>
>>> Superbowl will have to interrupt further debugging, sorry...
>>>
>>> [snip]
>>>
>>>  Score = 52 (220.6 bits), Expect = 0.0, Sum P(4) = 0.0, Group = 1
>>>  Identities = 52/52 (100%), Positives = 52/52 (100%), Frame = +1
>>>
>>> Query:   327 STNLAKAAFESKWYEGSLRYKKEILILMAQAQRPLEISARGVIIISLDTFKI 378
>>>              STNLAKAAFESKWYEGSLRYKKEILILMAQAQRPLEISARGVIIISLDTFKI
>>> Sbjct:  1090 STNLAKAAFESKWYEGSLRYKKEILILMAQAQRPLEISARGVIIISLDTFKI 
>>> 1245
>>>
>>>  Score = 18 (79.1 bits), Expect = 0.0, Sum P(4) = 0.0, Group = 1
>>>  Identities = 18/18 (100%), Positives = 18/18 (100%), Frame = +2
>>>
>>> Query:   379 LMTITYRFFAVIRQTVEK 396
>>>              LMTITYRFFAVIRQTVEK
>>> Sbjct:  1304 LMTITYRFFAVIRQTVEK 1357
>>>
>>>
>>> On Sun, 26 Jan 2003, Benjamin Berman wrote:
>>>
>>>> Dear Jason, Steve, others,
>>>>
>>>> I am trying to parse a wublast-2 output file using either the 
>>>> 'blast' or
>>>> 'psiblast' SearchIO modules, and getting results I don't 
>>>> understand.  I am
>>>> attaching both the blast file (test.bla) and the short test script
>>>> (for_jason.pl) to this mail.  Here are the results when I run the 
>>>> test
>>> script:
>>>>
>>>> benb at sos: ./for_jason.pl ./test.bla blast
>>>> Hit to CG13158 ==>  E=0.00e+00  query=(96.00% or 379 AAs of 
>>>> "CG13158-PA" at
>>>> 100.00% iden)  subject=(28.00% or 379 AAs
>>>> of "CG13158" at 100.00% iden)
>>>>          Query inds ==> 1-396
>>>>          Hit inds ==>
>>> 1-259,278-345,364-415,435.333333333333-452.333333333333
>>>> benb at sos:
>>>> benb at sos: ./for_jason.pl ./test.bla psiblast
>>>> Hit to CG13158 ==>  E=0.00e+00  query=(96.00% or 379 AAs of 
>>>> "CG13158-PA" at
>>>> 100.00% iden)  subject=(84.00% or 379 AAs
>>>> of "CG13158" at 100.00% iden)
>>>>          Query inds ==> 1-396
>>>>          Hit inds ==>
>>> 1-259,278-345,364-415,435.333333333333-452.333333333333
>>>>
>>>> When I look at the blast output, it is clear to me that 396 AAs are 
>>>> aligned
>>>> at 100% identity.  This is consistent with the output of seq_inds.  
>>>> But the
>>>> output of length_aln and frac_aligned seem to be wrong (both 
>>>> parsers put
>>>> this at 379, missing the last HSP of 18 AAs).  Additionally, the 
>>>> psiblast
>>>> one says that this is 84% of the subject entry, while it should 
>>>> actually be
>>>> 28% as 'blast' reports.
>>>>
>>>> I am using an up-to-date version of the bioperl-live repository.
>>>>
>>>> Am I doing something wrong?
>>>>
>>>> thanks,
>>>> ben.
>>>>
>>>>
>>>>
>>>>
>>>> At 12:20 PM 12/2/2002 -0800, Steve Chervitz wrote:
>>>>> --- Ewan Birney <birney at ebi.ac.uk> wrote:
>>>>>>
>>>>>> [snip]
>>>>>>
>>>>>> Bio::Tools::Blast.pm
>>>>>>
>>>>>>   I want to kill this - it has lots of confusing documentation; it
>>> is big;
>>>>>> I think Bio::SearchIO is the way forward and is stable enough to
>>> take it
>>>>>> on; I will bug fix Bio::SearchIO on spec for the 1.2 series
>>> (phew!). So...
>>>>>> can we remove it and put in a warning in a BEGIN block saying - 
>>>>>> it has
>>>>>> died, please try out Bio::SearchIO.
>>>>>>
>>>>>>
>>>>>>   SteveC and Jason - what are your votes here.
>>>>>
>>>>> R.I.P. ;-)
>>>>>
>>>>> All of it's functionality has been taken over by other modules for
>>> some time
>>>>> now, so I feel it's ready to go away for 1.2. The Bio/Tools/Blast
>>> directory
>>>>> goes away as well.
>>>>>
>>>>> The warning message in the BEGIN block can be a bit more detailed,
>>> something
>>>>> like a table such as:
>>>>>
>>>>>   Functionality         Module to Use
>>>>>   --------------        --------------
>>>>>   BLAST parsing         Bio::SearchIO
>>>>>   BLAST running         Bio::Tools::Run::StandAloneBlast or
>>>>>                         Bio::Tools::Run::RemoteBlast
>>>>>   BLAST HTML formating  Bio::SearchIO::Writer::HTMLResultWriter
>>>>>
>>>>> TODO: Transfer the non-BLAST-specific docs from the module-level 
>>>>> POD in
>>>>> Bio::Tools::Blast over to Bio::SearchIO, and update it for the 
>>>>> SearchIO
>>>>> system.
>>>>> This would include things like how to use the SearchIO::Writer
>>> modules. This
>>>>> would be task for Jason or I.
>>>>>
>>>>> Lately I've been working on merging my psiblast.pm SearchIO module 
>>>>> into
>>>>> Jason's
>>>>> blast.pm so that we can get down to just a single Blast parsing 
>>>>> system.
>>>>>
>>>>> Steve
>>>>>
>>>>>
>>>>> =====
>>>>> Steve Chervitz
>>>>> sac at bioperl.org
>>>>>
>>>>> __________________________________________________
>>>>> Do you Yahoo!?
>>>>> Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
>>>>> http://mailplus.yahoo.com
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at bioperl.org
>>>>> http://bioperl.org/mailman/listinfo/bioperl-l
>>>>
>>>
>>> --
>>> Jason Stajich
>>> Duke University
>>> jason at cgt.mc.duke.edu
>>
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
>



More information about the Bioperl-l mailing list