[Bioperl-l] hmmer3/hmmscan parser

Tue May 25 22:29:38 UTC 2010

Thanks for the contributions, Kai.

> I've seen the repo, and forked from it already to push my changes. Some
> of the folks from IRC gave me write access and Chris Fields actually
> pushed my changes.

Just saw this. Thanks for doing that, Chris.

> Most notable about the changes is probably a bit hidden by the noise,
> but I've changed the Hit->raw_score to contain the overall score, not
> the "best domain" score.

So this brings up an interesting point. At some point, we'll have to  
build out a few additional SearchIO methods to incorporate some of the  
additional information encoded in the HMMER v3 reports.  Sean talks a  
bit in the user manual about the importance of looking at both the  
full sequence and the best domain (see page 18 in the manual linked to  
on this page http://hmmer.janelia.org/#documentation).  For example,  
he mentions that one should consider the e-value of both the full  
sequence and best domain to ascertain if the query is homologous to a  
profile being considered via hmmsearch.

He also mentions that looking at the full sequence report values  
without consideration of the best domain report values can be  
misleading. I'm not saying that your approach regarding Hit->raw_score  
is wrong - proper interpretation of the results is up to the end user  
and there are benefits to looking at the full sequence (again,  
communicated on page 18) - but we might consider how to best encode  
the SearchIO methods to mitigate end user confusion and mistakes.
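
A rough sketch of what reporting both values side by side could look like.
Python is used only to keep the example self-contained; HitScores,
classify_hit and the 0.01 threshold are hypothetical names and values for
illustration, not part of SearchIO or HMMER:

```python
from dataclasses import dataclass


@dataclass
class HitScores:
    """Hypothetical container for one hit; field names are illustrative."""
    name: str
    full_evalue: float         # e-value reported for the full sequence
    best_domain_evalue: float  # e-value reported for the best single domain


def classify_hit(hit, threshold=0.01):
    """Label a hit by which of its two e-values pass the (assumed) threshold."""
    full_ok = hit.full_evalue <= threshold
    domain_ok = hit.best_domain_evalue <= threshold
    if full_ok and domain_ok:
        return "both significant"
    if full_ok:
        return "full sequence only: inspect domains"
    if domain_ok:
        return "best domain only: inspect full sequence"
    return "not significant"
```

The point is only that a caller sees both numbers and gets an explicit flag
when they disagree, rather than a single raw_score that hides the difference.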

>> Trying to integrate hmmer3 into the old hmmer searchIO module was the
>> original idea. But after talking to some of the BioPerl gurus and
>> considering the inherent differences between hmmer3 and hmmer2 (at
>> least during beta, though there are still some major output report
>> differences in the live release), we decided a separate module would
>> be ideal.
>
> Some of the folks on IRC suggested that we might want to integrate the
> hmmer.pm parser as well, modularizing this a bit and loading the correct
> parser depending on the requested format.

This might make sense, given that HMMER v3 is now live and seems to be  
adopted by researchers at an increasing rate. Since I used hmmer.pm as  
a template for hmmer3.pm, it shouldn't be too difficult to do,  
either.  I think a thorough conversation on this point is warranted as  
others I've talked to have preferred the modules to be separate.
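
For the sake of discussion, the "load the correct parser for the requested
format" idea can be sketched as a simple lookup table. This is Python for
illustration only; parse_hmmer2, parse_hmmer3, PARSERS and get_parser are
hypothetical stand-ins, not the hmmer.pm or hmmer3.pm code:

```python
def parse_hmmer2(text):
    """Placeholder for a HMMER v2 report parser (hypothetical)."""
    return {"format": "hmmer2", "lines": len(text.splitlines())}


def parse_hmmer3(text):
    """Placeholder for a HMMER v3 report parser (hypothetical)."""
    return {"format": "hmmer3", "lines": len(text.splitlines())}


# One entry per supported format; the parsers themselves stay separate.
PARSERS = {
    "hmmer2": parse_hmmer2,
    "hmmer3": parse_hmmer3,
}


def get_parser(fmt):
    """Return the parser registered for fmt, ignoring case."""
    try:
        return PARSERS[fmt.lower()]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
```

Under this layout the two parsers remain separate pieces of code behind one
entry point, which may be a middle ground between the two preferences.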

I'd be interested to hear what others have to say on this point.

>> This is an obvious statement, but I feel it's important to be clear on
>> these matters - you should feel free to make any and all contributions
>> to the development of this module as you see fit.  BioPerl has been
>> wonderful to me and I started this module to give a little back, but
>> this remains community generated software.
>
> I'm planning on adding even more tests, but the basic features for
> hmmscan parsing seem to be there. I'm currently running an extensive
> test run on real genome data, hopefully I can see the results of that in
> a couple of days.

Awesome!

> Cheers, and thanks for the help,

Likewise.
T