[Bioperl-l] query/subject in HMMER report

Jason Stajich jason@cgt.mc.duke.edu
Thu, 19 Sep 2002 17:27:36 -0400 (EDT)


I'm probably overthinking this, but in an hmmpfam search we're querying
one or more sequences against a db of HMMs.  So in a Query/Hit "HSP" pair
the QUERY would be the sequence and the HIT would be the HMM?

In an hmmsearch, however, one is querying a single HMM against a db of
sequences, so the QUERY would be the HMM while the HIT would be the
pep/dna sequence from the db of sequences?
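
A minimal sketch of the convention proposed above, written in Python
rather than Perl and not taken from the SearchIO code.  The names ROLES,
roles_for and top_line_role are hypothetical and exist only for
illustration; the sketch just records which side of a Query/Hit pair is
the HMM for each program, and which role therefore ends up on the top
line of an alignment block.

```python
# Hypothetical mapping of HMMER program name to the kind of object that
# plays each role in a Query/Hit pair, as proposed in this message.
ROLES = {
    "hmmpfam": {"query": "sequence", "hit": "hmm"},
    "hmmsearch": {"query": "hmm", "hit": "sequence"},
}


def roles_for(program):
    """Return the query/hit role mapping for a HMMER program name."""
    try:
        return ROLES[program.lower()]
    except KeyError:
        raise ValueError(f"unknown HMMER program: {program!r}")


def top_line_role(program):
    """Return which role ('query' or 'hit') the top alignment line holds.

    The HMM is always on the top line of the alignment block, so this is
    simply whichever role is assigned to the HMM.
    """
    roles = roles_for(program)
    return next(role for role, kind in roles.items() if kind == "hmm")
```

Under this convention the top line of an alignment block is the hit for
hmmpfam but the query for hmmsearch, which is the asymmetry being asked
about.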

This is only counter-intuitive when you look at the alignment output from
both programs, where the HMM is always on the top line of the alignment
block.  Are people okay with the described behavior?  AFAIK we didn't
previously parse hmmsearch results in the Bio::Tools::HMMER parser, so
I'm not sure what precedent is appropriate.

In case it wasn't clear, I've implemented parsing of the hmmsearch
alignment in SearchIO, so we can now parse both reports that just list
the hits/domains and those that include the actual alignments.

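Because of the way I've implemented the alignment parsing, there is one
odd case that will cause warnings to appear: I expect every <-* marker,
which indicates the end of the alignment, to show up whole on a single
line.  In the example below it is split, with the < at the end of one
block and the -* at the start of the next.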

     AgChr6 641436    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 641482
                       ....
      AgChr6 641483   xxxxxxxxXXXXXXXXXXxxxxxxxxxxxxxxxxxxxxxxxxxxx< 641530
                   -*

      AgChr6     -    -

I'll try and track it down at some point.
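
One possible way to handle the split marker, sketched in Python and not
the approach used in the SearchIO parser: collect the relevant chunk from
each alignment block, join them, and only then look for the *-> and <-*
markers.  The function name join_alignment_chunks and its input (a list
of per-block chunk strings already extracted from the report) are
assumptions made for illustration.

```python
START_MARKER = "*->"
END_MARKER = "<-*"


def join_alignment_chunks(chunks):
    """Join per-block alignment chunks, then strip the boundary markers.

    Checking for the markers on the joined string means an end marker
    split across two blocks (for example "<" on one line and "-*" on the
    next) is still recognized.

    Returns (alignment_string, complete), where complete is True only if
    both the start and end markers were found.
    """
    joined = "".join(chunk.strip() for chunk in chunks)
    has_start = joined.startswith(START_MARKER)
    has_end = joined.endswith(END_MARKER)
    if has_start:
        joined = joined[len(START_MARKER):]
    if has_end:
        joined = joined[:-len(END_MARKER)]
    return joined, has_start and has_end
```

For instance, the chunks "*->abcDEF<" and "-*" join to the alignment
string "abcDEF" with both markers recognized, whereas checking each line
on its own would miss the end marker.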

-jason

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu