[Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
Chris Fields
cjfields at uiuc.edu
Thu Jun 29 13:27:00 UTC 2006
On Jun 29, 2006, at 2:02 AM, Sendu Bala wrote:
> Chris Fields wrote:
>>
>> Personally, I don't think right now is the time to think about
>> refactoring this particular module, esp. since I find it essentially
>> works. I believe that energy is better spent elsewhere, such as
>> SeqIO::genbank/swiss/embl for instance, or refactoring SearchIO::blast
>> etc to use hashes instead of objects to speed things up. Or creating
>> something yourself. Or doing what you currently are doing (Bio::Map).
>> In other words, areas where use is high, code is aging, and
>> refactoring is more productive.
>
> Hmmer parsing happens to be important to me, in fact vital for my work.
> I've been using my own parser up till now, so didn't know what the
> Bioperl one was like. I'd like to use Bioperl for more things,
> preferably everything.
We're not deterring you from setting up your own parser, something
both Jason and I suggested. I just don't see what the major issue
is; hmmpfam results never really contain the same number of hits
per query that BLAST does (I get at the very most 30-40 and that is
usually based on repeats). I believe the best place to spend this
energy first and foremost is fixing the bug.
>> I'll add that I'm not trying to dissuade you from trying to build your
>> own variation of a SearchIO HMMER parser; by all means go ahead. The
>> above is how I feel. You can build your own parser to do what you want;
>> you can even base it off the current SearchIO HMMER parser and see if
>> you can set it up to give you the results you want, using a different
>> handler and so on. Just don't break the API or modify the current code
>> based strictly on what your opinion of how it should work is. It was
>> probably set up this way for a particular reason.
>
> Well, I don't like the idea of there being multiple SearchIO parsers
> for the same thing.
See, here's the thing: if the community at large decides to use your
version of the parser then, by default, it will become the only HMMER
SearchIO parser and we'll deprecate the old one. I just don't think
this is the way I would go about it. Jason has mentioned that object
instantiation is a bigger issue with parsing (speed) than anything
else; why not, if you plan on doing this, set up a Handler to return
hashes, or do it completely under-the-hood? Have it be the 'new,
faster way to run SearchIO.' Don't rehash (pardon the bad pun) the
way things were esp. when proposals are out there to improve the
toolkit.
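
To make the "Handler that returns hashes" idea concrete, here is a rough sketch of the general pattern. It is written in Python purely for illustration and is not BioPerl code: `HashHandler`, `parse`, and the event names are all hypothetical, and the real SearchIO event interface may look quite different. The point is only that the handler accumulates plain hashes (dicts here) as events arrive, instead of instantiating one object per hit or HSP.

```python
# Hypothetical sketch of an event handler that builds plain dicts.
# None of these names correspond to actual BioPerl classes or methods.

class HashHandler:
    """Collects parser events into nested dicts rather than objects."""

    def __init__(self):
        self.results = []      # one dict per finished query
        self._current = None   # dict for the query being parsed

    def start_result(self, query_name):
        self._current = {"query": query_name, "hits": []}

    def hit(self, model, start, end, score, evalue):
        # A plain dict per hit: no class instantiation beyond the dict itself.
        self._current["hits"].append(
            {"model": model, "start": start, "end": end,
             "score": score, "evalue": evalue}
        )

    def end_result(self):
        self.results.append(self._current)
        self._current = None


def parse(events, handler):
    """Dispatch (event_name, args) tuples to the handler, in order."""
    for name, args in events:
        getattr(handler, name)(*args)
    return handler.results
```

Because the parser only talks to the handler through events, a hash-returning handler like this could in principle sit next to the existing object-building one without changing what current users get back.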
> [...]
>> And, frankly, it's not up to the user when using code they didn't
>> create. You have to deal with it. Or code something yourself to do
>> things the way you want. You have the power to do that; most bioperl
>> users don't simply b/c they probably don't understand the class
>> structure and OO nature of Bioperl. It's just a matter of where you
>> want to spend your energy: dealing with something that interests you
>> or fixing other people's broken code.
>
> My original question was essentially: does doing it my way make sense?
> And implicitly: would doing it my way be of any harm? Ie. can I go
> ahead and change how the parser reports results and groups them
> together? I don't think it will involve an API change, but the results
> it generates will obviously be very different.
And my point is that both ways make sense, at least to me (and it
sounds like to Jason though I could be wrong). Again, create a new
version of the parser based on what you want to do and accomplish.
Don't just modify something the community at large uses based on your
whims. Make the changes to a new module and let the community
decide. As an example, BioPerl, for the longest time, had several
BLAST parsers; we directed everybody over to SearchIO and most people
seem to like it; hence the others are deprecated.
And changing the results returned could be considered by some to be
an API change or a bug. If someone using this module has an
automated pipeline set up for annotation using Pfam, hmmpfam,
Bioperl, and a database, and their setup expects single model/domain
pairs, yeah, your changes will break that. Maybe small,
inconsequential even, but it's possible (and even true; many genome
annotation pipelines are set up exactly how I describe).
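
A small sketch of the kind of breakage I mean, again in Python for illustration only and with entirely hypothetical names (`per_domain`, `grouped`, `load_rows`); it is not how the SearchIO hmmer parser is actually implemented. The method signatures stay the same, but a consumer written against one record per model/domain pair fails once the records are grouped by model.

```python
from collections import defaultdict

def per_domain(domains):
    """One record per (model, domain) pair: the shape the pipeline expects."""
    return [{"model": m, "start": s, "end": e} for m, s, e in domains]

def grouped(domains):
    """One record per model, with all of its domains collected together."""
    by_model = defaultdict(list)
    for m, s, e in domains:
        by_model[m].append((s, e))
    return [{"model": m, "domains": spans} for m, spans in by_model.items()]

def load_rows(records):
    """Hypothetical pipeline step: turns each record into one database row.

    Assumes every record carries its own 'start' and 'end'.
    """
    return [(r["model"], r["start"], r["end"]) for r in records]
```

With the same input, `load_rows(per_domain(...))` yields one row per domain, while `load_rows(grouped(...))` raises a `KeyError` because the grouped records have no `start` key. No API call changed, yet the downstream code broke.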
Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign