[Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
Sendu Bala
bix at sendu.me.uk
Wed Jun 28 16:46:57 UTC 2006
Sendu Bala wrote:
[ from thread Bio::SearchIO - Accessing Model parameters (score, evalue,
description) ]
[ concerning hmmpfam output ]
> I have another problem (or the same one as you? I can't tell...) in
> that I can only get a single result, hit and hsp from my hmmpfam file!
> It is doing my head in, but I might be doing something wrong so will
> look into it further before posting a bug report.
I was just doing something wrong, but...
Revision 1.27 of Bio::SearchIO::hmmer did 'Change HMMER parser to report
a single HSP per Hit so domains with multiple alignments get separate
Hits (more FASTA like) since they aren't really HSPs'
Strangely, 1.25 (Bioperl 1.4) already seems to behave like that.
In any case, this is extremely counter-intuitive, especially given that
next_domain is a synonym of next_hsp. I think either the synonym
relationship should remain and hits should have multiple hsps (with only
one hit per model), or next_domain should go off and find the hsp that is
the next domain of the current model. But the latter would be incredibly
broken under the current design, since that hsp would be found in a
different hit object...
What hmmpfam does is take a database of models, which can be thought of
as database sequences, and align each one against your query sequences.
A model can align at multiple locations along a query sequence; each of
these locations is called a domain of the model. A user of hmmpfam is
model-centric (they want to know which models are on their query), and
so wants to know all about how well a model did in one go. So you should
be able to get the results for a model ($hit = $result->next_model), get
overall info about it ($hit->score etc.), then get more detailed
information about each domain of it (while ($hsp = $hit->next_domain)
{...}). But right now you only get one domain per hit, and you have to go
searching through all your other hits to find one with the same ->name()
as your model of interest to get the next domain of that model.
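For illustration, the manual search described above could be sketched
roughly as follows. This is only a sketch under assumptions:
domains_by_model and print_domains are hypothetical helper names (not
part of Bioperl), and it assumes the parser returns one hsp per hit, with
the same hit name repeated for each further domain of a model.

```perl
use strict;
use warnings;

# Hypothetical helper: gather the hsps of all hits that share a name, so
# that every domain of one model can be walked together.
# $result only needs to provide next_hit(); each hit needs name() and
# next_hsp(), as Bio::Search result and hit objects do.
sub domains_by_model {
    my ($result) = @_;
    my (%domains, @order);
    while (my $hit = $result->next_hit) {
        my $name = $hit->name;
        # Remember the order in which models were first seen.
        push @order, $name unless exists $domains{$name};
        $domains{$name} ||= [];
        while (my $hsp = $hit->next_hsp) {
            push @{ $domains{$name} }, $hsp;
        }
    }
    return (\@order, \%domains);
}

# Hypothetical usage: print every domain of every model in an hmmpfam
# report. Bio::SearchIO is loaded lazily so the helper above has no
# dependencies of its own.
sub print_domains {
    my ($file) = @_;
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new(-format => 'hmmer', -file => $file);
    while (my $result = $in->next_result) {
        my ($order, $domains) = domains_by_model($result);
        for my $name (@$order) {
            print "$name\n";
            for my $hsp (@{ $domains->{$name} }) {
                printf "  %d-%d score=%s evalue=%s\n",
                    $hsp->start('query'), $hsp->end('query'),
                    $hsp->score, $hsp->evalue;
            }
        }
    }
}
```

Having to regroup by name like this is exactly the bookkeeping that a
multi-hsp hit would make unnecessary.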
In my view this is less than ideal. What do people think? Should it be
changed?