[Bioperl-l] Bio::Search results
Barry Moore
bmoore at genetics.utah.edu
Thu Sep 1 13:33:40 EDT 2005
Sean-
The top of you input hmmpfam file has these four lines:
hmmpfam - search one or more sequences against HMM database HMMER 2.3.2
(Oct 2003) Copyright (C) 1992-2003 HHMI/Washington University School of
Medicine Freely distributed under the GNU General Public License (GPL)
When I copied that out of you e-mail and into emacs on my computer I got
it all as one line like this:
hmmpfam - search one or more sequences against HMM database HMMER 2.3.2
(Oct 2003) Copyright (C) 1992-2003 HHMI/Washington University School of
Medicine Freely distributed under the GNU General Public License (GPL)
The parser needs to see a line beginning with HMMER to trigger setting
an internal variable called _reporttype (among other things). Since
this never happened with a single line header I got the same results you
did. Your header lines look OK in the e-mail you sent me, but assuming
you have the same problem with you header lines that I did, fixing that
should lead you to your next problem...
This line of your code has an error:
next unless ($hsp->name =~ /^ig|^lrr|^fn3|^egf|^tsp|^psi/i);
HSP objects don't have names but their associated hit objects do, so
this makes your script work as intended:
next unless ($hit->name =~ /^ig|^lrr|^fn3|^egf|^tsp|^psi/i);
I wonder if the header line problems could be the result of cross
platform cutting and pasting (or just cross platform file
incompatibilities). I noticed also that in you script that you sent the
'=' turned into '=3D'. What computer platforms do you use in your work?
HTH
Barry
-----Original Message-----
From: Sean O'Keeffe [mailto:limericksean at gmail.com]
Sent: Tuesday, August 30, 2005 9:28 AM
To: Barry Moore
Cc: bioperl-l at portal.open-bio.org
Subject: Re: [Bioperl-l] Bio::Search results
Hi Barry, thanks for the reply. Below is a snippet of the file (I
generated it with hmmpfam using the alignment flag set to -A 0, to
remove alignments - this shouldn't affect the parsing of the file) :
hmmpfam - search one or more sequences against HMM database HMMER 2.3.2
(Oct 2003) Copyright (C) 1992-2003 HHMI/Washington University School of
Medicine Freely distributed under the GNU General Public License (GPL)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file: /usr/local/lib/pfam-tm
Sequence file: Mus_musculus.NCBIM34.jul.pep.fa-short
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query sequence: ENSMUSP00000089702
Accession: [none]
Description: [none]
Scores for sequence family classification (score includes all domains):
Model Description Score E-value
N
-------- ----------- ----- -------
---
[no hits above thresholds]
Parsed for domains:
Model Domain seq-f seq-t hmm-f hmm-t score E-value
-------- ------- ----- ----- ----- ----- ----- -------
[no hits above thresholds]
//
Query sequence: ENSMUSP00000089701
Accession: [none]
Description: [none]
Scores for sequence family classification (score includes all domains):
Model Description Score E-value
N
-------- ----------- ----- -------
---
[no hits above thresholds]
Parsed for domains:
Model Domain seq-f seq-t hmm-f hmm-t score E-value
-------- ------- ----- ----- ----- ----- ----- -------
[no hits above thresholds]
//
Query sequence: ENSMUSP00000020094
Accession: [none]
Description: [none]
Scores for sequence family classification (score includes all domains):
Model Description Score E-value
N
-------- ----------- ----- -------
---
LRR_1 Leucine Rich Repeat 55.0 3.7e-15
6
LRRNT Leucine rich repeat N-terminal domain 30.1 1.1e-07
1
Parsed for domains:
Model Domain seq-f seq-t hmm-f hmm-t score E-value
-------- ------- ----- ----- ----- ----- ----- -------
LRRNT 1/1 117 142 .. 1 34 [] 30.1 1.1e-07
LRR_1 1/6 168 191 .. 1 25 [] 14.7 0.0049
LRR_1 2/6 192 210 .. 1 25 [] 9.1 0.2
LRR_1 3/6 212 237 .. 1 25 [] 8.4 0.26
LRR_1 4/6 238 257 .. 1 25 [] 10.3 0.1
LRR_1 5/6 259 282 .. 1 25 [] 10.1 0.12
LRR_1 6/6 290 314 .. 1 25 [] 2.4 2
//
Query sequence: ENSMUSP00000074175
Accession: [none]
Description: [none]
Scores for sequence family classification (score includes all domains):
Model Description Score E-value
N
-------- ----------- ----- -------
---
CUB CUB domain 232.5 1.3e-68
2
Trypsin Trypsin 206.0 1.2e-60
1
Sushi Sushi domain (SCR repeat) 88.7 2.6e-25
2
EGF_CA Calcium binding EGF domain 29.4 1.8e-07
1
Parsed for domains:
Model Domain seq-f seq-t hmm-f hmm-t score E-value
-------- ------- ----- ----- ----- ----- ----- -------
CUB 1/2 16 137 .. 1 116 [] 70.0 1.1e-19
EGF_CA 1/1 141 188 .. 1 55 [] 29.4 1.8e-07
CUB 2/2 192 301 .. 1 116 [] 162.5 1.5e-47
Sushi 1/2 308 370 .. 1 62 [] 47.5 6.5e-13
Sushi 2/2 375 446 .. 1 62 [] 41.2 5.1e-11
Trypsin 1/1 463 698 .. 1 259 [] 206.0 1.2e-60
//
Cheers,
Sean.
On 8/30/05, Barry Moore <bmoore at genetics.utah.edu> wrote:
> Sean,
>
> Don't see anything obviously wrong. If you want to send your input
> file, I'll try to recreate the problem.
>
> Barry
>
> -----Original Message-----
> From: bioperl-l-bounces at portal.open-bio.org
> [mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Sean
> O'Keeffe
> Sent: Tuesday, August 30, 2005 2:55 AM
> To: bioperl-l at portal.open-bio.org
> Subject: [Bioperl-l] Bio::Search results
>
> Hi,
> The following code snippet is something I use to extract information
> from hmmer result files:
>
> use Bio::SearchIO;
>
> my $in =3D new Bio::SearchIO( -format =3D> 'hmmer', -file =3D>
> $ARGV[0] ); while(my $result =3D $in->next_result) {
> print $result->query_name(), "\n",$result->query_description(),"\n";
> while (my $hit =3D $result->next_hit) {
> while(my $hsp =3D $hit->next_domain) {
> next unless ($hsp->name =3D~ /^ig|^lrr|^fn3|^egf|^tsp|^psi/i);
> print $hsp->start(),"\t",$hsp->end(),"\t",$hsp->evalue(),"\n";
> }
> }
> }
>
> The input file is generated by hmmpfam and is given at the command
> line. I use it to scan for specific domain names e.g ig, fn3 lrr etc.
> This code works for the first loop and then ends so I get the name and
> description (no hsp values as their are none for this result):
>
> ENSMUSP00000065602=20
> pep:novel supercontig::NT_085813:405:1510:-1 gene:ENSMUSG00000054059
> transcript:ENSMUST00000066517
>
> My question is why does the loop end after one instance. Incidentally
> the outputted name and description above are the last ones in the
> hmmer file (maybe the file is read from the back??? - don't know if
> this means anything). Any thoughts would be appreciated. Thanks,
> Sean.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
More information about the Bioperl-l
mailing list