[Bioperl-l] Blast stuff

Jason Stajich jason at cgt.mc.duke.edu
Sun Jan 26 17:34:18 EST 2003


Ben -

The reason stems from this: the last HSP is not in the same Frame.  So the
tile_hsps puts it into a separate group - so the last 18 aa are not
included in the longest contiguous alignable region which is used to
calulate that frac_identical for the overall hit.  This is in lines
215-226 in Bio::Search::SearchUtils.

Steve may have comments on what is the appropriate behavior here.

I'm commiting some code cleanup made during my little audit but it should
not functionally change anything.

Oops - actually one change - I think the logical_length of the hit
should be the length / 3 for a TBLASTN so that put the percent id at 84%
which is consistent with what Steve was doing in psiblast objects.
Comments welcome there as to what is the correct thing to return there.
Perhaps the user should be able to request either?

I'm getting an error with psiblast parser right now because a frame = '+1'
is flipping out the SeqFeature::Generic obj.  Something to look at I
guess.

Superbowl will have to interrupt further debugging, sorry...

[snip]

 Score = 52 (220.6 bits), Expect = 0.0, Sum P(4) = 0.0, Group = 1
 Identities = 52/52 (100%), Positives = 52/52 (100%), Frame = +1

Query:   327 STNLAKAAFESKWYEGSLRYKKEILILMAQAQRPLEISARGVIIISLDTFKI 378
             STNLAKAAFESKWYEGSLRYKKEILILMAQAQRPLEISARGVIIISLDTFKI
Sbjct:  1090 STNLAKAAFESKWYEGSLRYKKEILILMAQAQRPLEISARGVIIISLDTFKI 1245

 Score = 18 (79.1 bits), Expect = 0.0, Sum P(4) = 0.0, Group = 1
 Identities = 18/18 (100%), Positives = 18/18 (100%), Frame = +2

Query:   379 LMTITYRFFAVIRQTVEK 396
             LMTITYRFFAVIRQTVEK
Sbjct:  1304 LMTITYRFFAVIRQTVEK 1357


On Sun, 26 Jan 2003, Benjamin Berman wrote:

> Dear Jason, Steve, others,
>
> I am trying to parse a wublast-2 output file using either the 'blast' or
> 'psiblast' SearchIO modules, and getting results I don't understand.  I am
> attaching both the blast file (test.bla) and the short test script
> (for_jason.pl) to this mail.  Here are the results when I run the test script:
>
> benb at sos: ./for_jason.pl ./test.bla blast
> Hit to CG13158 ==>  E=0.00e+00  query=(96.00% or 379 AAs of "CG13158-PA" at
> 100.00% iden)  subject=(28.00% or 379 AAs
> of "CG13158" at 100.00% iden)
>          Query inds ==> 1-396
>          Hit inds ==> 1-259,278-345,364-415,435.333333333333-452.333333333333
> benb at sos:
> benb at sos: ./for_jason.pl ./test.bla psiblast
> Hit to CG13158 ==>  E=0.00e+00  query=(96.00% or 379 AAs of "CG13158-PA" at
> 100.00% iden)  subject=(84.00% or 379 AAs
> of "CG13158" at 100.00% iden)
>          Query inds ==> 1-396
>          Hit inds ==> 1-259,278-345,364-415,435.333333333333-452.333333333333
>
> When I look at the blast output, it is clear to me that 396 AAs are aligned
> at 100% identity.  This is consistent with the output of seq_inds.  But the
> output of length_aln and frac_aligned seem to be wrong (both parsers put
> this at 379, missing the last HSP of 18 AAs).  Additionally, the psiblast
> one says that this is 84% of the subject entry, while it should actually be
> 28% as 'blast' reports.
>
> I am using an up-to-date version of the bioperl-live repository.
>
> Am I doing something wrong?
>
> thanks,
> ben.
>
>
>
>
> At 12:20 PM 12/2/2002 -0800, Steve Chervitz wrote:
> >--- Ewan Birney <birney at ebi.ac.uk> wrote:
> > >
> > > [snip]
> > >
> > > Bio::Tools::Blast.pm
> > >
> > >   I want to kill this - it has lots of confusing documentation; it is big;
> > > I think Bio::SearchIO is the way forward and is stable enough to take it
> > > on; I will bug fix Bio::SearchIO on spec for the 1.2 series (phew!). So...
> > > can we remove it and put in a warning in a BEGIN block saying - it has
> > > died, please try out Bio::SearchIO.
> > >
> > >
> > >   SteveC and Jason - what are your votes here.
> >
> >R.I.P. ;-)
> >
> >All of it's functionality has been taken over by other modules for some time
> >now, so I feel it's ready to go away for 1.2. The Bio/Tools/Blast directory
> >goes away as well.
> >
> >The warning message in the BEGIN block can be a bit more detailed, something
> >like a table such as:
> >
> >   Functionality         Module to Use
> >   --------------        --------------
> >   BLAST parsing         Bio::SearchIO
> >   BLAST running         Bio::Tools::Run::StandAloneBlast or
> >                         Bio::Tools::Run::RemoteBlast
> >   BLAST HTML formating  Bio::SearchIO::Writer::HTMLResultWriter
> >
> >TODO: Transfer the non-BLAST-specific docs from the module-level POD in
> >Bio::Tools::Blast over to Bio::SearchIO, and update it for the SearchIO
> >system.
> >This would include things like how to use the SearchIO::Writer modules. This
> >would be task for Jason or I.
> >
> >Lately I've been working on merging my psiblast.pm SearchIO module into
> >Jason's
> >blast.pm so that we can get down to just a single Blast parsing system.
> >
> >Steve
> >
> >
> >=====
> >Steve Chervitz
> >sac at bioperl.org
> >
> >__________________________________________________
> >Do you Yahoo!?
> >Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
> >http://mailplus.yahoo.com
> >_______________________________________________
> >Bioperl-l mailing list
> >Bioperl-l at bioperl.org
> >http://bioperl.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu


More information about the Bioperl-l mailing list