[Bioperl-l] hmmer3/hmmscan parser

Dave Messina David.Messina at sbc.su.se
Wed May 26 14:52:05 UTC 2010



> So this brings up an interesting point. At some point, we'll have to build out a few additional SearchIO methods to incorporate some of the additional information encoded in the HMMER v3 reports.

Would the new methods need to be added to SearchIO if they're specific to H3? (as opposed to just being in the H3 sub-class)



>  Sean talks a bit in the user manual about the importance of looking at both the full sequence and the best domain (see page 18 in the manual linked to on this page http://hmmer.janelia.org/#documentation).  For example, he mentions that one should consider the e-value of both the full sequence and best domain to ascertain if the query is homologous to a profile being considered via hmmsearch.
> 
> He also mentions that looking at the full sequence report values without consideration of the best domain report values can be misleading. I'm not saying that your approach regarding Hit->raw_score is wrong - proper interpretation of the results is up to the end user and there are benefits to looking at the full sequence (again, communicated on page 18) - but we might consider how to best encode the SearchIO methods to mitigate end user confusion and mistakes.

I think this is a great idea.

Of course it's always best for end-users to RTFM and understand the tools they're using, but it's clearly beneficial to make it easier to do the right thing.

I haven't thought it through yet, so I'm not sure how to accomplish this without breaking the SearchIO idiom. But presumably a way could be found.
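
To make the idea a bit more concrete, here is a rough, language-neutral sketch (in Python rather than Perl, and not BioPerl code). It assumes a hit object that carries both the full-sequence and the best-domain E-value, plus helpers that flag hits supported by only one of the two. The class name, the method names, and the 0.01 cutoff are all hypothetical and only there for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hmmer3Hit:
    """Hypothetical hit record; not an existing BioPerl or HMMER class."""
    name: str
    full_evalue: float         # E-value reported for the full sequence
    best_domain_evalue: float  # E-value reported for the best single domain

    def supported_by_both(self, cutoff: float = 0.01) -> bool:
        # True only when both values pass the (illustrative) cutoff.
        return self.full_evalue <= cutoff and self.best_domain_evalue <= cutoff

    def full_only(self, cutoff: float = 0.01) -> bool:
        # True when the full-sequence value passes but the best domain does not,
        # the case the manual warns can be misleading.
        return self.full_evalue <= cutoff < self.best_domain_evalue


hit = Hmmer3Hit(name="seqA", full_evalue=1e-10, best_domain_evalue=0.5)
if hit.full_only():
    print(f"{hit.name}: significant overall, but no single domain is")
```

Something along these lines would leave the existing accessors alone and put the extra checks in front of the user, but whether it fits the SearchIO idiom is exactly the open question.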



>> Some of the folks on IRC suggested that we might want to integrate the
>> hmmer.pm parser as well, modularizing this a bit and loading the correct
>> parser depending on the requested format.

> This might make sense, given that HMMER v3 is now live and seems to be adopted by researchers at an increasing rate. Since I used hmmer.pm as a template for hmmer3.pm, it shouldn't be too difficult to do, either.  I think a thorough conversation on this point is warranted as others I've talked to have preferred the modules to be separate.
> 
> I'd be interested to hear what others have to say on this point.

I did not follow the IRC discussion, so I confess I'm not totally clear on what "integrate the hmmer.pm parser" means. I'm taking it to mean combining the code that parses HMMER2 with the code that parses HMMER3.

But then "modularizing this a bit and loading the correct parser depending on the requested format" seems to contradict that assumption.

Perhaps you (or someone) could clarify a bit what the HMMER2 - HMMER3 integration would look like (and the goal of doing so)?
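
For what it's worth, one reading that would reconcile the two statements is separate parser classes behind a single entry point that picks one by format name. The sketch below (again Python, not BioPerl code) is only my guess at what was meant; the class names, the `get_parser` function, and the format strings are all made up, and the parsers are placeholders that do no real parsing.

```python
class Hmmer2Parser:
    """Placeholder standing in for the HMMER2 report parser."""

    def parse(self, text):
        return {"format": "hmmer2", "lines": text.splitlines()}


class Hmmer3Parser:
    """Placeholder standing in for the HMMER3 report parser."""

    def parse(self, text):
        return {"format": "hmmer3", "lines": text.splitlines()}


# Hypothetical registry mapping a requested format name to its parser class.
PARSERS = {"hmmer2": Hmmer2Parser, "hmmer3": Hmmer3Parser}


def get_parser(fmt):
    """Return a parser instance for the requested format (case-insensitive)."""
    try:
        return PARSERS[fmt.lower()]()
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None


report = get_parser("hmmer3").parse("line one\nline two")
print(report["format"], len(report["lines"]))
```

If that is the intent, the two parsers stay separate modules and only the selection step is shared. If the intent is instead to merge the parsing code itself, that is a different proposal.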



Dave





