[Bioperl-l] hmmer3.pm question re query and hit coordinates
Peng Zhou
zhoupenggeni at gmail.com
Wed Jul 11 20:05:56 UTC 2012
Thanks Chris, here is the link of the filed
bug: https://redmine.open-bio.org/issues/3369
On Wednesday, July 11, 2012 2:02:46 PM UTC-5, Christopher Fields wrote:
>
> Peng,
>
> Has this been filed as a bug yet?
>
> https://redmine.open-bio.org/projects/bioperl
>
> Seems like it would be fairly easy to fix, but I want to track it just in
> case.
>
> chris
>
> On Jul 11, 2012, at 12:45 PM, Peng Zhou wrote:
>
> > Hello guys,
> >
> > Just a follow-up, it seems to me the bioperl-live version is still
> having the same problem - calling hit "query" while query sequence "hit". I
> also looked into the test script written for hmmer3
> (bioperl-live/t/SearchIO/hmmer.t), and it doesn't deal with the alignment
> part - I guess that's why this bug was not discovered.
> >
> > To be simple, here's an output of hmmsearch v3.0:
> > # hmmsearch :: search profile(s) against a sequence database
> > # HMMER 3.0 (March 2010); http://hmmer.org/
> > # Copyright (C) 2010 Howard Hughes Medical Institute.
> > # Freely distributed under the GNU General Public License (GPLv3).
> > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> -
> > # query HMM file:
> /project/youngn/zhoup/Scripts/spada/profile/21_all.hmm
> > # target sequence database:
> /project/youngn/zhoup/Data/misc3/spada/Athaliana/01_genome/12_refseq_orf.fa
>
> > # output directed to file:
> /project/youngn/zhoup/Data/misc3/spada/Athaliana/11_hmmSearchX/01_raw.txt
> > # number of worker threads: 4
> > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> -
> >
> > Query: CRP0000 [M=75]
> > Scores for complete sequences (score includes all domains):
> > --- full sequence --- --- best 1 domain --- -#dom-
> > E-value score bias E-value score bias exp N Sequence
> Description
> > ------- ------ ----- ------- ------ ----- ---- -- --------
> -----------
> > 5.5e-25 95.0 14.4 5.7e-25 95.0 10.0 1.0 1
> Chr2_540228_540404_+
> >
> > Domain annotation for each sequence (and alignments):
> > >> Chr2_540228_540404_+
> > # score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali
> to envfrom env to acc
> > --- ------ ----- --------- --------- ------- ------- -------
> ------- ------- ------- ----
> > 1 ! 95.0 10.0 3.6e-30 5.7e-25 20 74 .. 4
> 59 .] 1 59 [] 0.95
> >
> > Alignments for each domain:
> > == domain 1 score: 95.0 bits; conditional E-value: 3.6e-30
> > CRP0000 20
> tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg.rrrCfCtkpc 74
> > ++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg
> rrrC+Ct++c
> > Chr2_540228_540404_+ 4
> GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC 59
> >
> 568899***99********************************************* PP
> >
> > And here is a dump of the parsed HSP object:
> > $VAR1 = bless( {
> > 'VERBOSE' => 0,
> > 'IDENTICAL' => 0,
> > 'RANK' => 1,
> > 'STRANDED' => 'NONE',
> > 'EVALUE' => '3.6e-30',
> > 'HSP_LENGTH' => 56,
> > 'ALGORITHM' => 'HMMSEARCH'
> > 'SCORE' => '95.0',
> > 'GAP_SYMBOL' => '-',
> > 'CONSERVED' => 0,
> >
> > 'HIT_NAME' => 'Chr2_540228_540404_+',
> > 'HIT_DESC' => '',
> > 'HIT_START' => '20',
> > 'HIT_END' => '74',
> > 'HIT_LENGTH' => 56,
> > 'HIT_SEQ' =>
> 'tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg-rrrCfCtkpc',
> > 'HIT_FRAME' => 0,
> >
> > 'QUERY_NAME' => 'CRP0000',
> > 'QUERY_DESC' => undef,
> > 'QUERY_START' => '4',
> > 'QUERY_END' => '59',
> > 'QUERY_LENGTH' => '75',
> > 'QUERY_FRAME' => 0,
> > 'QUERY_SEQ' =>
> 'GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC',
> >
> > 'HOMOLOGY_SEQ' => '++gp+++eartCes+Sh+FkGpCvs
> +nCa+vC++Egf gG+crg rrrC+Ct++c',
> > }, 'Bio::Search::HSP::HMMERHSP' );
> >
> > Clearly, the "HIT_START", "HIT_END", "HIT_SEQ" should actually be
> exchanged with "QUERY_START", "QUERY_END" and "QUERY_SEQ" values.
> >
> > Thanks,
> >
> > Peng,
> >
> > On Tuesday, July 19, 2011 11:23:20 PM UTC-5, Givan, Scott A. wrote:
> > I'll try the bioperl-live version. Thanks guys.
> > Scott Givan
> > 541-740-4685
> > Sent from an iPhone (so expect typos).
> >
> > On Jul 19, 2011, at 10:34 PM, "Chris Fields" <cjfields at illinois.edu>
> wrote:
> >
> > > This might be a disconnect between the HMMER3 version in bioperl-live
> and the one in Kai's bioperl-hmmer3 repo. I believe the one in
> bioperl-live is newer. Scott, can you give that a try?
> > >
> > > chris
> > >
> > > On Jul 19, 2011, at 9:45 PM, Thomas Sharpton wrote:
> > >
> > >> Hi Scott,
> > >>
> > >> Thanks for writing. I'm on the road at the moment so I have to be
> briefer and less thorough than I'd like to be.
> > >>
> > >> What you are observing is not the intended behavior. Oddly, it's not
> what I recall obtaining in my tests on this software, though I was mostly
> interested in hmmsearch at the time and may have been sloppier than I
> should have been when it came to hmmscan.
> > >>
> > >> What version of HMMER3 you're using? There have been some small
> formatting changes in the past that might be causing a burp in the parser,
> though I'm doubting it.
> > >>
> > >> Kai Blin wrote some test scripts (found here:
> bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate
> query/hit coordinates. It might be worth giving this a shot if you haven't
> already.
> > >>
> > >> Also, if you don't mind, I'm happy to run your code on your output
> file on my end. It might help me diagnose the problem.
> > >>
> > >> Sorry this is being a thorn in your side! I've cc'ed the list in case
> anyone else has insight into this matter.
> > >>
> > >> Best,
> > >> Thomas
> > >>
> > >> On Jul 19, 2011, at 10:43 AM, Givan, Scott A. wrote:
> > >>
> > >>> Hi Thomas,
> > >>>
> > >>> I'm using modules in the bipoerl-hmmer3 git repository to parse
> hmmscan
> > >>> reports. When I parse the files and walk through the HSP's like:
> > >>>
> > >>> while (my $hit = $rslt->next_model) {
> > >>>
> > >>> while (my $domain = $hit->next_hsp) {
> > >>>
> > >>> And retrieve the "hit" coordinates like:
> > >>>
> > >>> print "hit coords: ", $domain->start('hit'), "-",
> $domain->end('hit'),
> > >>> "\n";
> > >>>
> > >>> The coordinates returned correspond to what I would call the
> "query",
> > >>> since they are for the sequence I fed to hmmscan to search the
> profile
> > >>> database. Likewise, when retrieving the query coordinates like
> > >>> $domain->start('query'), I get what I consider the "hit"
> coordinates,
> > >>> since they are for the domain profile. Is this the intended
> behavior?
> > >>>
> > >>> Thanks.
> > >>>
> > >>> scott
> > >>>
> > >>> --
> > >>> Scott A. Givan
> > >>> Associate Director
> > >>> Informatics Research Core Facility
> > >>> 240e Bond Life Sciences Center
> > >>> Research Assistant Professor
> > >>> Molecular Microbiology and Immunology
> > >>> University of Missouri, Columbia
> > >>>
> > >>> TEL 573-882-2948
> > >>> FAX 573-884-9676
> > >>> http://ircf.rnet.missouri.edu
> > >>>
> > >>>
> > >>>
> > >>
> > >> _______________________________________________
> > >> Bioperl-l mailing list
> > >> Bioperl-l at lists.open-bio.org
> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> >
> >
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
More information about the Bioperl-l
mailing list