[Bioperl-l] Naming consistency and Bioperl future search result parsing

Mon, 31 Dec 2001 14:39:54 -0500 (EST)

In an effort to make bioperl approachable to new developers and users
alike, we are trying to establish consistent naming across the toolkit.
In the first wave of these changes I am working on the Search objects and
the assorted supporting cast.

Ewan, Steve, and I have talked about Steve's new Search objects and I'd
like to go with his nomenclature:

Query - the query input to a db search or the first item in a pairwise
        alignment
Hit   - a database search 'hit' (formerly a 'subject') or 2nd item in a
        pairwise alignment
HSP   - a component of a Hit that is an alignment between the query and
        hit - in most cases the best local alignment(s).
Result - a database or pairwise alignment search run (formerly 'report').
        Report can be used to refer to a

My plan is to migrate calls from 'subject' to 'hit' and deprecate the
Bio::SeqFeature::SimilarityPair method 'subject' in favor of a new method
'hit'.
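
One way such a deprecation could look is sketched below. This is only an
illustration: My::SimilarityPair is a hypothetical stand-in class, not the
real Bio::SeqFeature::SimilarityPair, and the warning text is made up. The
old method name stays as a thin wrapper that warns and forwards to the new
one, so existing callers keep working during the transition.

```perl
use strict;
use warnings;

# Hypothetical stand-in class; NOT the real Bio::SeqFeature::SimilarityPair.
package My::SimilarityPair;

sub new {
    my ($class, %args) = @_;
    return bless { hit => $args{hit} }, $class;
}

# New preferred accessor: get or set the hit.
sub hit {
    my ($self, $value) = @_;
    $self->{hit} = $value if defined $value;
    return $self->{hit};
}

# Old name kept as a wrapper that warns and forwards to hit().
sub subject {
    my $self = shift;
    warn "subject() is deprecated, use hit() instead\n";
    return $self->hit(@_);
}

package main;

my $pair = My::SimilarityPair->new(hit => 'seqB');
print $pair->hit, "\n";
```

Callers that still use subject() get the same value back plus a warning
on STDERR, which makes the remaining old call sites easy to find.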

The Bio::Search and Bio::SearchIO classes and directories will be
reorganized to only contain Query, Hit, HSP, & Result in the API.

Bio::Tools::Blast is to be deprecated in 1.0 and eventually removed from
the bioperl distribution in favor of the new Bio::SearchIO system.  It is
unclear how we will proceed with respect to the Blast-Parsing Lite system
(BPlite, BPpsilite, BPbl2seq).  They will certainly be part of the 1.0
release, and it remains to be seen whether we will want to continue to
support 2 frameworks.

Bioperl 1.0 should contain a robust event-based parsing framework for
search results.  We will focus on providing simple access to report data
in the SearchIO system in a standard API for multiple search result
formats.
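
To make the event-based idea concrete, here is a minimal sketch. It is
not the SearchIO code: the input format (QUERY/HIT/HSP lines), the
My::ResultBuilder class, and all method names are assumptions invented
for the example. The parser only recognizes lines and fires events; a
separate listener turns those events into a Result holding Hits, each
holding HSPs.

```perl
use strict;
use warnings;

# Hypothetical listener: builds a nested Result -> Hit -> HSP structure
# from the events it receives.
package My::ResultBuilder;

sub new {
    my ($class) = @_;
    return bless { result => undef }, $class;
}

sub start_result {
    my ($self, $query) = @_;
    $self->{result} = { query => $query, hits => [] };
}

sub start_hit {
    my ($self, $name) = @_;
    push @{ $self->{result}{hits} }, { name => $name, hsps => [] };
}

sub hsp {
    my ($self, $score) = @_;
    # Attach the HSP to the most recently started hit.
    push @{ $self->{result}{hits}[-1]{hsps} }, { score => $score };
}

sub result { return $_[0]{result} }

package main;

# Toy parser for a made-up line format; it knows nothing about how the
# listener stores data, it only emits events.
sub parse_lines {
    my ($listener, @lines) = @_;
    for my $line (@lines) {
        if    ($line =~ /^QUERY\s+(\S+)/) { $listener->start_result($1) }
        elsif ($line =~ /^HIT\s+(\S+)/)   { $listener->start_hit($1) }
        elsif ($line =~ /^HSP\s+(\d+)/)   { $listener->hsp($1) }
    }
    return $listener->result;
}

my $result = parse_lines(
    My::ResultBuilder->new,
    'QUERY seqA', 'HIT seqB', 'HSP 120', 'HSP 45', 'HIT seqC', 'HSP 80',
);
printf "%s: %d hits\n", $result->{query}, scalar @{ $result->{hits} };
```

Supporting another report format would then mean writing another parser
that emits the same events, while the listener and the objects it builds
stay unchanged.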

Additionally, groundwork has been laid by Steve C to provide lazy parsing
for those with specific performance and flexibility needs.  I am still
sifting through the options here, but I think we can meet both goals
of supporting power users and not confusing new users or developers.
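
For readers unfamiliar with the term, lazy parsing can be sketched
roughly as follows. This is not Steve's implementation: My::LazyHit, the
"key=value" record format, and the parse_count field are all
hypothetical and exist only for the example. The object keeps the raw
text and parses it the first time a field is requested, then reuses the
cached result.

```perl
use strict;
use warnings;

# Hypothetical lazily parsed hit; the raw record format is made up.
package My::LazyHit;

sub new {
    my ($class, $raw) = @_;
    # Nothing is parsed at construction time.
    return bless { raw => $raw, parsed => undef, parse_count => 0 }, $class;
}

# Parse the raw text on first use and cache the resulting fields.
sub _parse {
    my ($self) = @_;
    return $self->{parsed} if $self->{parsed};
    $self->{parse_count}++;
    my %field = $self->{raw} =~ /(\w+)=(\S+)/g;
    return $self->{parsed} = \%field;
}

sub name  { return $_[0]->_parse->{name} }
sub score { return $_[0]->_parse->{score} }

package main;

my $hit = My::LazyHit->new('name=seqB score=120');
print $hit->name, " ", $hit->score, "\n";
```

A user who only counts hits never pays the parsing cost, while the
accessors look the same as they would on an eagerly parsed object.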

I hope to have this transition finished in the next few weeks.  Any
comments, volunteers, or ideas are welcome, but unless you can help
code I intend to take the course outlined above.  This will be our 3rd
generation parsing system (3G ;) ) and work will continue to be done to
improve and integrate this system with bioperl and support future formats.

-jason

-- 
Jason Stajich
Duke University
jason@cgt.mc.duke.edu