[Bioperl-l] Blast stuff

Jason Stajich jason at cgt.mc.duke.edu
Mon Jan 27 08:00:27 EST 2003


Ben - That's a fine idea.  Definitely something that should be applied, I
would have expected the same thing.

Will look into it today should be trivial to implement.

-jason

On Sun, 26 Jan 2003, Benjamin Berman wrote:

>
> Hey Jason,
>
> For some reason, the bioperl list seems to be rejecting posts I make from
> this computer, so could you forward this on to steve and the rest of the list.
>
> I found another problem I believe.  In both the blast and psiblast
> versions, Hit/HSP::seq_inds('conserved') seems to return the
> similar/conserved bases alone, not including the identical ones.  Common
> intuition, as well as BLAST itself, would consider conserved (or positive)
> matches to include both the conserved and identical residues.  While the
> current semantics may be correct under a certain logic, I think it's very
> misleading- especially because they contradict what is returned by
> frac_conserved and num_conserved.  I spent a long time today tracking down
> unexpected results I was getting, and it turned out to be because I was
> assuming that frac_conserved and seq_inds('conserved') were returning the
> same thing.
>
> I think it's useful to provide access to the conserved indices apart from
> the identical ones.  And I guess for backwards compatibility 'conserved'
> should continue to return what it does now (although I think it would
> prevent future problems to alter the semantics now).  But the documentation
> should make it abundantly clear in this case what's going on.  In addtion,
> another class such as 'positives', could be added to return the
> conserved+identical union.
>
> thanks for the consideration,
> ben.
>
>
>
> At 05:34 PM 1/26/2003 -0500, Jason Stajich wrote:
> >Ben -
> >
> >The reason stems from this: the last HSP is not in the same Frame.  So the
> >tile_hsps puts it into a separate group - so the last 18 aa are not
> >included in the longest contiguous alignable region which is used to
> >calulate that frac_identical for the overall hit.  This is in lines
> >215-226 in Bio::Search::SearchUtils.
> >
> >Steve may have comments on what is the appropriate behavior here.
> >
> >I'm commiting some code cleanup made during my little audit but it should
> >not functionally change anything.
> >
> >Oops - actually one change - I think the logical_length of the hit
> >should be the length / 3 for a TBLASTN so that put the percent id at 84%
> >which is consistent with what Steve was doing in psiblast objects.
> >Comments welcome there as to what is the correct thing to return there.
> >Perhaps the user should be able to request either?
> >
> >I'm getting an error with psiblast parser right now because a frame = '+1'
> >is flipping out the SeqFeature::Generic obj.  Something to look at I
> >guess.
> >
> >Superbowl will have to interrupt further debugging, sorry...
> >
> >[snip]
> >
> >  Score = 52 (220.6 bits), Expect = 0.0, Sum P(4) = 0.0, Group = 1
> >  Identities = 52/52 (100%), Positives = 52/52 (100%), Frame = +1
> >
> >Query:   327 STNLAKAAFESKWYEGSLRYKKEILILMAQAQRPLEISARGVIIISLDTFKI 378
> >              STNLAKAAFESKWYEGSLRYKKEILILMAQAQRPLEISARGVIIISLDTFKI
> >Sbjct:  1090 STNLAKAAFESKWYEGSLRYKKEILILMAQAQRPLEISARGVIIISLDTFKI 1245
> >
> >  Score = 18 (79.1 bits), Expect = 0.0, Sum P(4) = 0.0, Group = 1
> >  Identities = 18/18 (100%), Positives = 18/18 (100%), Frame = +2
> >
> >Query:   379 LMTITYRFFAVIRQTVEK 396
> >              LMTITYRFFAVIRQTVEK
> >Sbjct:  1304 LMTITYRFFAVIRQTVEK 1357
> >
> >
> >On Sun, 26 Jan 2003, Benjamin Berman wrote:
> >
> > > Dear Jason, Steve, others,
> > >
> > > I am trying to parse a wublast-2 output file using either the 'blast' or
> > > 'psiblast' SearchIO modules, and getting results I don't understand.  I am
> > > attaching both the blast file (test.bla) and the short test script
> > > (for_jason.pl) to this mail.  Here are the results when I run the test
> > script:
> > >
> > > benb at sos: ./for_jason.pl ./test.bla blast
> > > Hit to CG13158 ==>  E=0.00e+00  query=(96.00% or 379 AAs of "CG13158-PA" at
> > > 100.00% iden)  subject=(28.00% or 379 AAs
> > > of "CG13158" at 100.00% iden)
> > >          Query inds ==> 1-396
> > >          Hit inds ==>
> > 1-259,278-345,364-415,435.333333333333-452.333333333333
> > > benb at sos:
> > > benb at sos: ./for_jason.pl ./test.bla psiblast
> > > Hit to CG13158 ==>  E=0.00e+00  query=(96.00% or 379 AAs of "CG13158-PA" at
> > > 100.00% iden)  subject=(84.00% or 379 AAs
> > > of "CG13158" at 100.00% iden)
> > >          Query inds ==> 1-396
> > >          Hit inds ==>
> > 1-259,278-345,364-415,435.333333333333-452.333333333333
> > >
> > > When I look at the blast output, it is clear to me that 396 AAs are aligned
> > > at 100% identity.  This is consistent with the output of seq_inds.  But the
> > > output of length_aln and frac_aligned seem to be wrong (both parsers put
> > > this at 379, missing the last HSP of 18 AAs).  Additionally, the psiblast
> > > one says that this is 84% of the subject entry, while it should actually be
> > > 28% as 'blast' reports.
> > >
> > > I am using an up-to-date version of the bioperl-live repository.
> > >
> > > Am I doing something wrong?
> > >
> > > thanks,
> > > ben.
> > >
> > >
> > >
> > >
> > > At 12:20 PM 12/2/2002 -0800, Steve Chervitz wrote:
> > > >--- Ewan Birney <birney at ebi.ac.uk> wrote:
> > > > >
> > > > > [snip]
> > > > >
> > > > > Bio::Tools::Blast.pm
> > > > >
> > > > >   I want to kill this - it has lots of confusing documentation; it
> > is big;
> > > > > I think Bio::SearchIO is the way forward and is stable enough to
> > take it
> > > > > on; I will bug fix Bio::SearchIO on spec for the 1.2 series
> > (phew!). So...
> > > > > can we remove it and put in a warning in a BEGIN block saying - it has
> > > > > died, please try out Bio::SearchIO.
> > > > >
> > > > >
> > > > >   SteveC and Jason - what are your votes here.
> > > >
> > > >R.I.P. ;-)
> > > >
> > > >All of it's functionality has been taken over by other modules for
> > some time
> > > >now, so I feel it's ready to go away for 1.2. The Bio/Tools/Blast
> > directory
> > > >goes away as well.
> > > >
> > > >The warning message in the BEGIN block can be a bit more detailed,
> > something
> > > >like a table such as:
> > > >
> > > >   Functionality         Module to Use
> > > >   --------------        --------------
> > > >   BLAST parsing         Bio::SearchIO
> > > >   BLAST running         Bio::Tools::Run::StandAloneBlast or
> > > >                         Bio::Tools::Run::RemoteBlast
> > > >   BLAST HTML formating  Bio::SearchIO::Writer::HTMLResultWriter
> > > >
> > > >TODO: Transfer the non-BLAST-specific docs from the module-level POD in
> > > >Bio::Tools::Blast over to Bio::SearchIO, and update it for the SearchIO
> > > >system.
> > > >This would include things like how to use the SearchIO::Writer
> > modules. This
> > > >would be task for Jason or I.
> > > >
> > > >Lately I've been working on merging my psiblast.pm SearchIO module into
> > > >Jason's
> > > >blast.pm so that we can get down to just a single Blast parsing system.
> > > >
> > > >Steve
> > > >
> > > >
> > > >=====
> > > >Steve Chervitz
> > > >sac at bioperl.org
> > > >
> > > >__________________________________________________
> > > >Do you Yahoo!?
> > > >Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
> > > >http://mailplus.yahoo.com
> > > >_______________________________________________
> > > >Bioperl-l mailing list
> > > >Bioperl-l at bioperl.org
> > > >http://bioperl.org/mailman/listinfo/bioperl-l
> > >
> >
> >--
> >Jason Stajich
> >Duke University
> >jason at cgt.mc.duke.edu
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu


More information about the Bioperl-l mailing list