[Bioperl-l] Bio::SearchIO::hmmer hsp behaviour

Sendu Bala bix at sendu.me.uk
Wed Jun 28 16:46:57 UTC 2006


Sendu Bala wrote:
[ from thread Bio::SearchIO - Accessing Model parameters (score, evalue, 
description) ]
[ concerning hmmpfam output ]
> I have another problem (or the same one as you? I can't tell...) in 
> that I can only get a single result, hit and hsp from my hmmpfam file!
> It is doing my head in, but I might be doing something wrong so will 
> look into it further before posting a bug report.

I was just doing something wrong, but...

Revision 1.27 of Bio::SearchIO::hmmer did 'Change HMMER parser to report 
a single HSP per Hit so domains with multiple alignments get separate 
Hits (more FASTA like) since they aren't really HSPs'

Strangely 1.25 (Bioperl 1.4) seems to behave like that already.

In any case, this is extremely counter-intuitive, especially given that 
next_domain is a synonym of next_hsp. I think either the synonym 
relationship should remain and hits should have multiple hsps (with only 
one hit per model), or next_domain should go off and find the hsp that is 
the next domain of the current model. But the latter would be incredibly 
broken under the current design, since that hsp would be found in a 
different hit object...

What hmmpfam does is take a database of models, which can be thought of 
as database sequences, and align each one against your query sequences. 
A model could align at multiple locations along a query sequence; each 
of these locations is called a domain of the model. A user of hmmpfam is 
model-centric (they want to know which models are on their query), and 
so wants to know all about how well a model did in one go. So you should 
be able to get the results for a model ($hit = $result->next_model), get 
overall info about it ($hit->score etc.), then get more detailed 
information about each domain of it (while ($hsp = $hit->next_domain) 
{...}). But right now you only get one domain per hit, and you have to go 
searching through all your other hits to find those with the same 
->name() as your model of interest to get the remaining domains of your 
model.
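
To make the workaround concrete, here is a rough sketch of the regrouping 
you currently have to do by hand. It is only an illustration, not how the 
parser itself works: domains_by_model is a hypothetical helper name of my 
own, and it assumes that hits for the same model share the same ->name() 
and that each hit carries its domain(s) as hsps. It only uses next_hit, 
name and next_hsp on the objects it is given.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): reunite the domains of each
# model when the parser reports them as separate hits with the same name.
# Returns the model names in first-seen order plus a hash of name => [hsps].
sub domains_by_model {
    my ($result) = @_;
    my (%domains, @order);
    while (my $hit = $result->next_hit) {
        my $name = $hit->name;
        unless (exists $domains{$name}) {
            push @order, $name;
            $domains{$name} = [];
        }
        # Drain every hsp of this hit, whether there is one or several.
        while (my $hsp = $hit->next_hsp) {
            push @{ $domains{$name} }, $hsp;
        }
    }
    return (\@order, \%domains);
}

# Usage: perl script.pl hmmpfam_output_file
# Bio::SearchIO is only loaded when a file name is actually supplied.
if (@ARGV) {
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new(-format => 'hmmer', -file => $ARGV[0]);
    while (my $result = $in->next_result) {
        my ($order, $domains) = domains_by_model($result);
        for my $name (@$order) {
            printf "%s: %d domain(s)\n", $name, scalar @{ $domains->{$name} };
            for my $hsp (@{ $domains->{$name} }) {
                printf "  query %d-%d, score %s\n",
                    $hsp->start('query'), $hsp->end('query'), $hsp->score;
            }
        }
    }
}
```

Because the helper drains every hsp of every hit, it should give the same 
grouping whether the parser reports one hsp per hit or several, so it 
would not need changing if the behaviour were altered.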

In my view this is less than ideal. What do people think? Should it be 
changed?


