[Bioperl-l] v1.0.1 BLAST SearchIO woes

Jason Stajich jason@cgt.mc.duke.edu
Wed, 26 Jun 2002 10:37:21 -0400 (EDT)


On Tue, 25 Jun 2002 JDiggans@genelogic.com wrote:

>
> > you are more than welcome to make suggestions or improvements.
>
> w00t. :)
>
> I'm interested in where the core sees the dual blast.pm/psiblast.pm setup
> going. Seems like there's quite a bit of redundancy and that both
> have certain advantages. Is there a plan w/i bioperl to merge the two,
> functionally (though this would be tough since internally they're so
> different), or will they remain distinct? Seems like this will get
> confusing to newbies, especially given psiblast's somewhat-misleading name.
> At the very least they should offer similar functionality ... at the moment

My plan is for us to remove psiblast.pm after merging all the relevant
functionality over.  This functionality is being ported over gradually:
seq_inds has been moved, and iteration hooks have been created in the Hit
object.  Still to do: port the PSI-BLAST parsing, and move the tiling
method and the found_again() method.

> I'd propose doing the following (and yes, I'm willing to work on it):


>
>       - Add a score() method to HSPI ... unless someone can fill me in if
>         it was left out intentionally
>
There is already a score() method in GenericHSP; it is inherited from
Bio::SeqFeature::Similarity.

>       - Add an implementing method to BlastHSP to return scores from HSPs
>         pulled from reports (this way it won't matter if a newbie uses
>         blast.pm or psiblast.pm)

Well, the plan really is to remove the BlastHSP objects, because they do
lazy parsing and will not be of much use once psiblast.pm is gone.

>
>       - Fix the strand('query') access method in GenericHSP, which currently
>         returns the word 'query' or 'sbjct' instead of a strand ID.

Yep, that should be very easy, I expect.  I'd actually prefer that we
double-check that we are storing the value in one place:
$hsp->strand('query') and $hsp->query->strand should point to the same
place in memory, or else we need to do housekeeping whenever one or the
other is changed.  So some more tests for these different calls would be
in order.
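
A minimal sketch of the single-storage idea, assuming hypothetical
My::Feature and My::HSP classes (stand-ins for illustration only, not the
real Bio::Search::HSP::GenericHSP code).  Here strand('query') keeps no
copy of its own and simply delegates to the feature object, so the two
call styles cannot drift apart:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the feature object returned by $hsp->query or
# $hsp->hit; it owns the only copy of the strand value.
package My::Feature;

sub new {
    my ($class, %args) = @_;
    return bless { strand => $args{strand} }, $class;
}

sub strand {
    my $self = shift;
    $self->{strand} = shift if @_;    # act as a setter when given a value
    return $self->{strand};
}

# Hypothetical stand-in for an HSP; strand('query'|'hit') stores nothing
# itself and only forwards to the matching feature object.
package My::HSP;

sub new {
    my ($class, %args) = @_;
    return bless { query => $args{query}, hit => $args{hit} }, $class;
}

sub query { return $_[0]->{query} }
sub hit   { return $_[0]->{hit} }

sub strand {
    my ($self, $which, @value) = @_;
    $which = 'query' unless defined $which;
    die "strand() expects 'query' or 'hit', got '$which'\n"
        unless $which eq 'query' || $which eq 'hit';
    return $self->$which()->strand(@value);
}

package main;

my $hsp = My::HSP->new(
    query => My::Feature->new(strand => 1),
    hit   => My::Feature->new(strand => -1),
);

# A change made through either call style is visible through the other.
$hsp->strand('query', -1);
print "query strand: ", $hsp->query->strand, "\n";
$hsp->hit->strand(1);
print "hit strand:   ", $hsp->strand('hit'), "\n";
```

The last four statements double as the kind of test mentioned above: set
the value through one call, read it back through the other.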

Could we un-deprecate the subject() method of SeqFeature::Similarity and
have both hit() and subject()?  This would be a main-trunk change.
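
One way to keep both names without duplicating code is to install
subject() as a second name for the same subroutine.  A rough sketch,
assuming a hypothetical My::SimilarityPair class (not the real
Bio::SeqFeature module):

```perl
use strict;
use warnings;

# Hypothetical class for illustration only.
package My::SimilarityPair;

sub new {
    my $class = shift;
    return bless { hit => undef }, $class;
}

sub hit {
    my $self = shift;
    $self->{hit} = shift if @_;    # act as a setter when given a value
    return $self->{hit};
}

# Make subject() another name for hit(): both names run the same code
# and therefore read and write the same slot.
{
    no warnings 'once';
    *subject = \&hit;
}

package main;

my $pair = My::SimilarityPair->new;
$pair->subject('chr1');
print $pair->hit, "\n";
```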


>
> Do these make sense? Also, I know there's talk of moving to a more
> RecDescent-ish scheme ... how soon is that planned?

I have serious doubts that this will work unless Parse::FastDescent is
finished and REALLY fast.  The event-based parsing push which spawned
SearchIO is meant to cover us for a while.  If we want to write a
grammar-based parser as another implementation, that is fine.

One thing you'll notice is that not having a tied-down API for the
Bio::Search objects has led to a lot of running around to make sure the
implementations agree.  But I think this system has some real advantages
over our old implementations, which were BLAST-centric.  We now have HMMER,
BLAST (WU-blast, NCBI-blast, and NCBI-blast XML), and FastA parsing all
producing the same objects, which plug into generic writers that can
produce TextBlast, HTMLBlast, and plain-text tables (we should probably
have XML and CSV output as well, for easy dumping into databases).


-jason


> -j
>
> Jason Stajich <jason@cgt.mc.duke.edu>
> To:       <JDiggans@genelogic.com>
> cc:       Bioperl <bioperl-l@bioperl.org>
> Subject:  Re: [Bioperl-l] v1.0.1 BLAST SearchIO woes
> 06/25/2002 02:28 PM
>
> HSPs are Bio::SeqFeature::SimilarityPair objects, which have the methods
> 'query' and 'hit' (and 'subject', which is aliased to 'hit' but deprecated).
>
> Now I should note that I may have used bad judgement when I migrated the
> SimilarityPair objects from subject() -> hit(), as this is confusing. In
> hindsight, Hit is a fine object name, but $hsp->subject/query is a more
> proper pairing than $hsp->hit/query.  I did this in conjunction with
> migrating BPlite from $report->next_Sbjct to $report->next_Hit, to be
> more explicit about what you were getting.
>
>
> The following do work, using the Bio::Search::HSP::GenericHSP objects:
> $hsp->query->strand
> $hsp->hit->strand (same as $hsp->subject->strand)
>
>
> I personally only use $hsp->query->strand and $hsp->hit->strand and that
> is what I test/use so I can't speak to the strand('query|hit') etc.  It
> should work but in fact there may be 2 separate storage slots (Data::Dump
> out the HSP object and see if you get this, I think this is still the
> case) for this information which is really a bad idea.  Grr.  Steve C
> likes $hsp->strand('query'), $hsp->length('query') etc style so the clash
> of preferences has caused the confusion.  If someone else wants to help on
> this project
>
>
>
>
> -jason

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu