[Bioperl-l] Bio::Search results

Thu Sep 1 13:33:40 EDT 2005

Sean-

The top of you input hmmpfam file has these four lines:

hmmpfam - search one or more sequences against HMM database HMMER 2.3.2
(Oct 2003) Copyright (C) 1992-2003 HHMI/Washington University School of
Medicine Freely distributed under the GNU General Public License (GPL)

When I copied that out of you e-mail and into emacs on my computer I got
it all as one line like this:

hmmpfam - search one or more sequences against HMM database HMMER 2.3.2
(Oct 2003) Copyright (C) 1992-2003 HHMI/Washington University School of
Medicine Freely distributed under the GNU General Public License (GPL)

The parser needs to see a line beginning with HMMER to trigger setting
an internal variable called _reporttype (among other things).  Since
this never happened with a single line header I got the same results you
did.  Your header lines look OK in the e-mail you sent me, but assuming
you have the same problem with you header lines that I did, fixing that
should lead you to your next problem...

This line of your code has an error:

next unless ($hsp->name =~ /^ig|^lrr|^fn3|^egf|^tsp|^psi/i);

HSP objects don't have names but their associated hit objects do, so
this makes your script work as intended:

next unless ($hit->name =~ /^ig|^lrr|^fn3|^egf|^tsp|^psi/i);

I wonder if the header line problems could be the result of cross
platform cutting and pasting (or just cross platform file
incompatibilities).  I noticed also that in you script that you sent the
'=' turned into '=3D'.  What computer platforms do you use in your work?

HTH

Barry

-----Original Message-----

From: Sean O'Keeffe [mailto:limericksean at gmail.com] 

Sent: Tuesday, August 30, 2005 9:28 AM

To: Barry Moore

Cc: bioperl-l at portal.open-bio.org

Subject: Re: [Bioperl-l] Bio::Search results

Hi Barry, thanks for the reply. Below is a snippet of the file (I
generated it with hmmpfam using the alignment flag set to -A 0, to
remove alignments - this shouldn't affect the parsing of the file) :

hmmpfam - search one or more sequences against HMM database HMMER 2.3.2
(Oct 2003) Copyright (C) 1992-2003 HHMI/Washington University School of
Medicine Freely distributed under the GNU General Public License (GPL)

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

HMM file:                 /usr/local/lib/pfam-tm

Sequence file:            Mus_musculus.NCBIM34.jul.pep.fa-short

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query sequence: ENSMUSP00000089702

Accession:      [none]

Description:    [none]

Scores for sequence family classification (score includes all domains):

Model    Description                                    Score    E-value
N 

-------- -----------                                    -----    -------
---

      [no hits above thresholds]

Parsed for domains:

Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value

-------- ------- ----- -----    ----- -----      -----  -------

      [no hits above thresholds]

//

Query sequence: ENSMUSP00000089701

Accession:      [none]

Description:    [none]

Scores for sequence family classification (score includes all domains):

Model    Description                                    Score    E-value
N 

-------- -----------                                    -----    -------
---

      [no hits above thresholds]

Parsed for domains:

Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value

-------- ------- ----- -----    ----- -----      -----  -------

      [no hits above thresholds]

//

Query sequence: ENSMUSP00000020094

Accession:      [none]

Description:    [none]

Scores for sequence family classification (score includes all domains):

Model    Description                                    Score    E-value
N 

-------- -----------                                    -----    -------
---

LRR_1    Leucine Rich Repeat                             55.0    3.7e-15
6

LRRNT    Leucine rich repeat N-terminal domain           30.1    1.1e-07
1

Parsed for domains:

Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value

-------- ------- ----- -----    ----- -----      -----  -------

LRRNT      1/1     117   142 ..     1    34 []    30.1  1.1e-07

LRR_1      1/6     168   191 ..     1    25 []    14.7   0.0049

LRR_1      2/6     192   210 ..     1    25 []     9.1      0.2

LRR_1      3/6     212   237 ..     1    25 []     8.4     0.26

LRR_1      4/6     238   257 ..     1    25 []    10.3      0.1

LRR_1      5/6     259   282 ..     1    25 []    10.1     0.12

LRR_1      6/6     290   314 ..     1    25 []     2.4        2

//

Query sequence: ENSMUSP00000074175

Accession:      [none]

Description:    [none]

Scores for sequence family classification (score includes all domains):

Model    Description                                    Score    E-value
N 

-------- -----------                                    -----    -------
---

CUB      CUB domain                                     232.5    1.3e-68
2

Trypsin  Trypsin                                        206.0    1.2e-60
1

Sushi    Sushi domain (SCR repeat)                       88.7    2.6e-25
2

EGF_CA   Calcium binding EGF domain                      29.4    1.8e-07
1

Parsed for domains:

Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value

-------- ------- ----- -----    ----- -----      -----  -------

CUB        1/2      16   137 ..     1   116 []    70.0  1.1e-19

EGF_CA     1/1     141   188 ..     1    55 []    29.4  1.8e-07

CUB        2/2     192   301 ..     1   116 []   162.5  1.5e-47

Sushi      1/2     308   370 ..     1    62 []    47.5  6.5e-13

Sushi      2/2     375   446 ..     1    62 []    41.2  5.1e-11

Trypsin    1/1     463   698 ..     1   259 []   206.0  1.2e-60

//

Cheers,

Sean.

On 8/30/05, Barry Moore <bmoore at genetics.utah.edu> wrote:

> Sean,

> 

> Don't see anything obviously wrong.  If you want to send your input 

> file, I'll try to recreate the problem.

> 

> Barry

> 

> -----Original Message-----

> From: bioperl-l-bounces at portal.open-bio.org

> [mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Sean 

> O'Keeffe

> Sent: Tuesday, August 30, 2005 2:55 AM

> To: bioperl-l at portal.open-bio.org

> Subject: [Bioperl-l] Bio::Search results

> 

> Hi,

> The following code snippet is something I use to extract information 

> from hmmer result files:

> 

> use Bio::SearchIO;

> 

> my $in =3D new Bio::SearchIO( -format =3D> 'hmmer',  -file =3D> 

> $ARGV[0] ); while(my $result =3D $in->next_result) {

>   print $result->query_name(), "\n",$result->query_description(),"\n";

>   while (my $hit =3D $result->next_hit) {

>     while(my $hsp =3D $hit->next_domain) {

>       next unless ($hsp->name =3D~ /^ig|^lrr|^fn3|^egf|^tsp|^psi/i);

>       print $hsp->start(),"\t",$hsp->end(),"\t",$hsp->evalue(),"\n";

>     }

>   }

> }

> 

> The input file is generated by hmmpfam and is given at the command 

> line. I use it to scan for specific domain names e.g ig, fn3 lrr etc. 

> This code works for the first loop and then ends so I get the name and

> description (no hsp values as their are none for this result):

> 

> ENSMUSP00000065602=20

> pep:novel supercontig::NT_085813:405:1510:-1 gene:ENSMUSG00000054059 

> transcript:ENSMUST00000066517

> 

> My question is why does the loop end after one instance. Incidentally 

> the outputted  name and description above are the last ones in the 

> hmmer file (maybe the file is read from the back??? - don't know if 

> this means anything). Any thoughts would be appreciated. Thanks,

> Sean.

> 

> _______________________________________________

> Bioperl-l mailing list

> Bioperl-l at portal.open-bio.org 

> http://portal.open-bio.org/mailman/listinfo/bioperl-l

>