[Bioperl-l] Bio::Search results

Mon Sep 5 05:26:06 EDT 2005

Ok I am using bp 1.4 and haven't tried the new 1.5 (not sure how
stable it is yet) - I might install and test it.
Sean.

On 9/2/05, Barry Moore <bmoore at genetics.utah.edu> wrote:
> Sean,
> 
> Am I guessing correctly that you are using bioperl 1.4?  You should not
> have to have header information at the top of each result.  The problem
> you are having is that the parser can read single result or multi result
> files.  I didn't write this parser, so I won't pretend to know it well,
> but when you are only getting the last result it is parsing your multi
> result file as a single result file and everything but the last result
> is being overwritten rather than pushed to an array (I think).  I
> introduced a problem in the header when I copied the data out of out of
> your e-mail, and fixing that made it work.  I just tried it with bioperl
> 1.4, and I get the same problem again (with a good header).  Diffing the
> 1.4 and 1.5 versions shows a lot of changes.  If you're on 1.4 try
> upgrading to 1.5 or live from CVS, and see if that solves your problem.
> The two attached files are your data, and your script as I have been
> running them.  They should work as is with 1.5.
> 
> Barry
> 
> -----Original Message-----
> From: Sean O'Keeffe [mailto:limericksean at gmail.com]
> Sent: Friday, September 02, 2005 3:32 AM
> To: Barry Moore
> Cc: Bioperl List
> Subject: Re: [Bioperl-l] Bio::Search results
> 
> I have those four lines (all contained on separate lines) but it didn't
> work .
> 
> I ran "gawk '{if(/^\/\/$/) {print "//\n\nHMMER 2.3.2 (Oct 2003)"} else
> {print}}' my_file > whatever" on the pfam file to include a HMMER
> header after the // separator for each result.
> 
> It now works fine. I still don't understand why I need to include this
> header to parse the file. Although from memory don't Blast reports all
> include Blast header lines to separate next results?
> By the way I think that its the mailer that seems to be altering the
> text in these scripts (don't know where the "=3D" came from - it's not
> in my original). I'm running a standard linux (suse 9.0) using nedit
> as an editor.
> Sean.
> 
> On 9/2/05, Barry Moore <bmoore at genetics.utah.edu> wrote:
> > Sean I see that my mailer messed up the line formatting also.  What I
> > meant was you should have the top four lines like this:
> >
> > #Line 1: hmmpfam - search one or more sequences against HMM database
> > #Line 2: HMMER 2.3.2(Oct 2003)
> > #Line 3: Copyright (C) 1992-2003 HHMI/Washington University School of
> > #Line 4: Medicine Freely distributed under the GNU General Public
> > License (GPL)
> >
> > If you have that, and fix the hsp to hit error mentioned it works for
> > me.
> >
> > Barry
> >
> > -----Original Message-----
> > From: bioperl-l-bounces at portal.open-bio.org
> > [mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Barry
> Moore
> > Sent: Thursday, September 01, 2005 11:34 AM
> > To: Bioperl List
> > Subject: RE: [Bioperl-l] Bio::Search results
> >
> > Sean-
> >
> >
> >
> > The top of you input hmmpfam file has these four lines:
> >
> >
> >
> > hmmpfam - search one or more sequences against HMM database HMMER
> 2.3.2
> > (Oct 2003) Copyright (C) 1992-2003 HHMI/Washington University School
> of
> > Medicine Freely distributed under the GNU General Public License (GPL)
> >
> >
> >
> > When I copied that out of you e-mail and into emacs on my computer I
> got
> > it all as one line like this:
> >
> >
> >
> > hmmpfam - search one or more sequences against HMM database HMMER
> 2.3.2
> > (Oct 2003) Copyright (C) 1992-2003 HHMI/Washington University School
> of
> > Medicine Freely distributed under the GNU General Public License (GPL)
> >
> >
> >
> > The parser needs to see a line beginning with HMMER to trigger setting
> > an internal variable called _reporttype (among other things).  Since
> > this never happened with a single line header I got the same results
> you
> > did.  Your header lines look OK in the e-mail you sent me, but
> assuming
> > you have the same problem with you header lines that I did, fixing
> that
> > should lead you to your next problem...
> >
> >
> >
> > This line of your code has an error:
> >
> >
> >
> > next unless ($hsp->name =~ /^ig|^lrr|^fn3|^egf|^tsp|^psi/i);
> >
> >
> >
> > HSP objects don't have names but their associated hit objects do, so
> > this makes your script work as intended:
> >
> >
> >
> > next unless ($hit->name =~ /^ig|^lrr|^fn3|^egf|^tsp|^psi/i);
> >
> >
> >
> > I wonder if the header line problems could be the result of cross
> > platform cutting and pasting (or just cross platform file
> > incompatibilities).  I noticed also that in you script that you sent
> the
> > '=' turned into '=3D'.  What computer platforms do you use in your
> work?
> >
> >
> >
> > HTH
> >
> >
> >
> > Barry
> >
> >
> >
> > -----Original Message-----
> >
> > From: Sean O'Keeffe [mailto:limericksean at gmail.com]
> >
> > Sent: Tuesday, August 30, 2005 9:28 AM
> >
> > To: Barry Moore
> >
> > Cc: bioperl-l at portal.open-bio.org
> >
> > Subject: Re: [Bioperl-l] Bio::Search results
> >
> >
> >
> > Hi Barry, thanks for the reply. Below is a snippet of the file (I
> > generated it with hmmpfam using the alignment flag set to -A 0, to
> > remove alignments - this shouldn't affect the parsing of the file) :
> >
> >
> >
> > hmmpfam - search one or more sequences against HMM database HMMER
> 2.3.2
> > (Oct 2003) Copyright (C) 1992-2003 HHMI/Washington University School
> of
> > Medicine Freely distributed under the GNU General Public License (GPL)
> >
> > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> -
> >
> > HMM file:                 /usr/local/lib/pfam-tm
> >
> > Sequence file:            Mus_musculus.NCBIM34.jul.pep.fa-short
> >
> > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> >
> >
> >
> > Query sequence: ENSMUSP00000089702
> >
> > Accession:      [none]
> >
> > Description:    [none]
> >
> >
> >
> > Scores for sequence family classification (score includes all
> domains):
> >
> > Model    Description                                    Score
> E-value
> > N
> >
> > -------- -----------                                    -----
> -------
> > ---
> >
> >       [no hits above thresholds]
> >
> >
> >
> > Parsed for domains:
> >
> > Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
> >
> > -------- ------- ----- -----    ----- -----      -----  -------
> >
> >       [no hits above thresholds]
> >
> > //
> >
> >
> >
> > Query sequence: ENSMUSP00000089701
> >
> > Accession:      [none]
> >
> > Description:    [none]
> >
> >
> >
> > Scores for sequence family classification (score includes all
> domains):
> >
> > Model    Description                                    Score
> E-value
> > N
> >
> > -------- -----------                                    -----
> -------
> > ---
> >
> >       [no hits above thresholds]
> >
> >
> >
> > Parsed for domains:
> >
> > Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
> >
> > -------- ------- ----- -----    ----- -----      -----  -------
> >
> >       [no hits above thresholds]
> >
> > //
> >
> >
> >
> > Query sequence: ENSMUSP00000020094
> >
> > Accession:      [none]
> >
> > Description:    [none]
> >
> >
> >
> > Scores for sequence family classification (score includes all
> domains):
> >
> > Model    Description                                    Score
> E-value
> > N
> >
> > -------- -----------                                    -----
> -------
> > ---
> >
> > LRR_1    Leucine Rich Repeat                             55.0
> 3.7e-15
> > 6
> >
> > LRRNT    Leucine rich repeat N-terminal domain           30.1
> 1.1e-07
> > 1
> >
> >
> >
> > Parsed for domains:
> >
> > Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
> >
> > -------- ------- ----- -----    ----- -----      -----  -------
> >
> > LRRNT      1/1     117   142 ..     1    34 []    30.1  1.1e-07
> >
> > LRR_1      1/6     168   191 ..     1    25 []    14.7   0.0049
> >
> > LRR_1      2/6     192   210 ..     1    25 []     9.1      0.2
> >
> > LRR_1      3/6     212   237 ..     1    25 []     8.4     0.26
> >
> > LRR_1      4/6     238   257 ..     1    25 []    10.3      0.1
> >
> > LRR_1      5/6     259   282 ..     1    25 []    10.1     0.12
> >
> > LRR_1      6/6     290   314 ..     1    25 []     2.4        2
> >
> > //
> >
> >
> >
> > Query sequence: ENSMUSP00000074175
> >
> > Accession:      [none]
> >
> > Description:    [none]
> >
> >
> >
> > Scores for sequence family classification (score includes all
> domains):
> >
> > Model    Description                                    Score
> E-value
> > N
> >
> > -------- -----------                                    -----
> -------
> > ---
> >
> > CUB      CUB domain                                     232.5
> 1.3e-68
> > 2
> >
> > Trypsin  Trypsin                                        206.0
> 1.2e-60
> > 1
> >
> > Sushi    Sushi domain (SCR repeat)                       88.7
> 2.6e-25
> > 2
> >
> > EGF_CA   Calcium binding EGF domain                      29.4
> 1.8e-07
> > 1
> >
> >
> >
> > Parsed for domains:
> >
> > Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
> >
> > -------- ------- ----- -----    ----- -----      -----  -------
> >
> > CUB        1/2      16   137 ..     1   116 []    70.0  1.1e-19
> >
> > EGF_CA     1/1     141   188 ..     1    55 []    29.4  1.8e-07
> >
> > CUB        2/2     192   301 ..     1   116 []   162.5  1.5e-47
> >
> > Sushi      1/2     308   370 ..     1    62 []    47.5  6.5e-13
> >
> > Sushi      2/2     375   446 ..     1    62 []    41.2  5.1e-11
> >
> > Trypsin    1/1     463   698 ..     1   259 []   206.0  1.2e-60
> >
> > //
> >
> >
> >
> >
> >
> > Cheers,
> >
> > Sean.
> >
> >
> >
> >
> >
> >
> >
> > On 8/30/05, Barry Moore <bmoore at genetics.utah.edu> wrote:
> >
> > > Sean,
> >
> > >
> >
> > > Don't see anything obviously wrong.  If you want to send your input
> >
> > > file, I'll try to recreate the problem.
> >
> > >
> >
> > > Barry
> >
> > >
> >
> > > -----Original Message-----
> >
> > > From: bioperl-l-bounces at portal.open-bio.org
> >
> > > [mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Sean
> >
> > > O'Keeffe
> >
> > > Sent: Tuesday, August 30, 2005 2:55 AM
> >
> > > To: bioperl-l at portal.open-bio.org
> >
> > > Subject: [Bioperl-l] Bio::Search results
> >
> > >
> >
> > > Hi,
> >
> > > The following code snippet is something I use to extract information
> >
> > > from hmmer result files:
> >
> > >
> >
> > > use Bio::SearchIO;
> >
> > >
> >
> > > my $in =3D new Bio::SearchIO( -format =3D> 'hmmer',  -file =3D>
> >
> > > $ARGV[0] ); while(my $result =3D $in->next_result) {
> >
> > >   print $result->query_name(),
> "\n",$result->query_description(),"\n";
> >
> > >   while (my $hit =3D $result->next_hit) {
> >
> > >     while(my $hsp =3D $hit->next_domain) {
> >
> > >       next unless ($hsp->name =3D~ /^ig|^lrr|^fn3|^egf|^tsp|^psi/i);
> >
> > >       print $hsp->start(),"\t",$hsp->end(),"\t",$hsp->evalue(),"\n";
> >
> > >     }
> >
> > >   }
> >
> > > }
> >
> > >
> >
> > > The input file is generated by hmmpfam and is given at the command
> >
> > > line. I use it to scan for specific domain names e.g ig, fn3 lrr
> etc.
> >
> > > This code works for the first loop and then ends so I get the name
> and
> >
> >
> > > description (no hsp values as their are none for this result):
> >
> > >
> >
> > > ENSMUSP00000065602=20
> >
> > > pep:novel supercontig::NT_085813:405:1510:-1 gene:ENSMUSG00000054059
> >
> > > transcript:ENSMUST00000066517
> >
> > >
> >
> > > My question is why does the loop end after one instance.
> Incidentally
> >
> > > the outputted  name and description above are the last ones in the
> >
> > > hmmer file (maybe the file is read from the back??? - don't know if
> >
> > > this means anything). Any thoughts would be appreciated. Thanks,
> >
> > > Sean.
> >
> > >
> >
> > > _______________________________________________
> >
> > > Bioperl-l mailing list
> >
> > > Bioperl-l at portal.open-bio.org
> >
> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> > >
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
> 
>