[Bioperl-l] Bio::Search results

Barry Moore bmoore at genetics.utah.edu
Mon Sep 5 14:18:25 EDT 2005


Stability of 1.5 depends on what parts you're using I guess.  You can
always keep 1.4 installed, and then have a live CVS version in your home
directory.  Just use your .bashrc or .cshrc file to set PERL5LIB to the
path in your home directory.  This will put the path to the CVS version
at the top of @INC so it gets used first.  Then in testing or if you
have problems, you could shift the 1.5 path off of @INC to revert to
1.4.

Barry

-----Original Message-----
From: bioperl-l-bounces at portal.open-bio.org
[mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Sean
O'Keeffe
Sent: Monday, September 05, 2005 3:26 AM
To: Barry Moore
Cc: Bioperl List
Subject: Re: [Bioperl-l] Bio::Search results

Ok I am using bp 1.4 and haven't tried the new 1.5 (not sure how
stable it is yet) - I might install and test it.
Sean.

On 9/2/05, Barry Moore <bmoore at genetics.utah.edu> wrote:
> Sean,
> 
> Am I guessing correctly that you are using bioperl 1.4?  You should
not
> have to have header information at the top of each result.  The
problem
> you are having is that the parser can read single result or multi
result
> files.  I didn't write this parser, so I won't pretend to know it
well,
> but when you are only getting the last result it is parsing your multi
> result file as a single result file and everything but the last result
> is being overwritten rather than pushed to an array (I think).  I
> introduced a problem in the header when I copied the data out of out
of
> your e-mail, and fixing that made it work.  I just tried it with
bioperl
> 1.4, and I get the same problem again (with a good header).  Diffing
the
> 1.4 and 1.5 versions shows a lot of changes.  If you're on 1.4 try
> upgrading to 1.5 or live from CVS, and see if that solves your
problem.
> The two attached files are your data, and your script as I have been
> running them.  They should work as is with 1.5.
> 
> Barry
> 
> -----Original Message-----
> From: Sean O'Keeffe [mailto:limericksean at gmail.com]
> Sent: Friday, September 02, 2005 3:32 AM
> To: Barry Moore
> Cc: Bioperl List
> Subject: Re: [Bioperl-l] Bio::Search results
> 
> I have those four lines (all contained on separate lines) but it
didn't
> work .
> 
> I ran "gawk '{if(/^\/\/$/) {print "//\n\nHMMER 2.3.2 (Oct 2003)"} else
> {print}}' my_file > whatever" on the pfam file to include a HMMER
> header after the // separator for each result.
> 
> It now works fine. I still don't understand why I need to include this
> header to parse the file. Although from memory don't Blast reports all
> include Blast header lines to separate next results?
> By the way I think that its the mailer that seems to be altering the
> text in these scripts (don't know where the "=3D" came from - it's not
> in my original). I'm running a standard linux (suse 9.0) using nedit
> as an editor.
> Sean.
> 
> On 9/2/05, Barry Moore <bmoore at genetics.utah.edu> wrote:
> > Sean I see that my mailer messed up the line formatting also.  What
I
> > meant was you should have the top four lines like this:
> >
> > #Line 1: hmmpfam - search one or more sequences against HMM database
> > #Line 2: HMMER 2.3.2(Oct 2003)
> > #Line 3: Copyright (C) 1992-2003 HHMI/Washington University School
of
> > #Line 4: Medicine Freely distributed under the GNU General Public
> > License (GPL)
> >
> > If you have that, and fix the hsp to hit error mentioned it works
for
> > me.
> >
> > Barry
> >
> > -----Original Message-----
> > From: bioperl-l-bounces at portal.open-bio.org
> > [mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Barry
> Moore
> > Sent: Thursday, September 01, 2005 11:34 AM
> > To: Bioperl List
> > Subject: RE: [Bioperl-l] Bio::Search results
> >
> > Sean-
> >
> >
> >
> > The top of you input hmmpfam file has these four lines:
> >
> >
> >
> > hmmpfam - search one or more sequences against HMM database HMMER
> 2.3.2
> > (Oct 2003) Copyright (C) 1992-2003 HHMI/Washington University School
> of
> > Medicine Freely distributed under the GNU General Public License
(GPL)
> >
> >
> >
> > When I copied that out of you e-mail and into emacs on my computer I
> got
> > it all as one line like this:
> >
> >
> >
> > hmmpfam - search one or more sequences against HMM database HMMER
> 2.3.2
> > (Oct 2003) Copyright (C) 1992-2003 HHMI/Washington University School
> of
> > Medicine Freely distributed under the GNU General Public License
(GPL)
> >
> >
> >
> > The parser needs to see a line beginning with HMMER to trigger
setting
> > an internal variable called _reporttype (among other things).  Since
> > this never happened with a single line header I got the same results
> you
> > did.  Your header lines look OK in the e-mail you sent me, but
> assuming
> > you have the same problem with you header lines that I did, fixing
> that
> > should lead you to your next problem...
> >
> >
> >
> > This line of your code has an error:
> >
> >
> >
> > next unless ($hsp->name =~ /^ig|^lrr|^fn3|^egf|^tsp|^psi/i);
> >
> >
> >
> > HSP objects don't have names but their associated hit objects do, so
> > this makes your script work as intended:
> >
> >
> >
> > next unless ($hit->name =~ /^ig|^lrr|^fn3|^egf|^tsp|^psi/i);
> >
> >
> >
> > I wonder if the header line problems could be the result of cross
> > platform cutting and pasting (or just cross platform file
> > incompatibilities).  I noticed also that in you script that you sent
> the
> > '=' turned into '=3D'.  What computer platforms do you use in your
> work?
> >
> >
> >
> > HTH
> >
> >
> >
> > Barry
> >
> >
> >
> > -----Original Message-----
> >
> > From: Sean O'Keeffe [mailto:limericksean at gmail.com]
> >
> > Sent: Tuesday, August 30, 2005 9:28 AM
> >
> > To: Barry Moore
> >
> > Cc: bioperl-l at portal.open-bio.org
> >
> > Subject: Re: [Bioperl-l] Bio::Search results
> >
> >
> >
> > Hi Barry, thanks for the reply. Below is a snippet of the file (I
> > generated it with hmmpfam using the alignment flag set to -A 0, to
> > remove alignments - this shouldn't affect the parsing of the file) :
> >
> >
> >
> > hmmpfam - search one or more sequences against HMM database HMMER
> 2.3.2
> > (Oct 2003) Copyright (C) 1992-2003 HHMI/Washington University School
> of
> > Medicine Freely distributed under the GNU General Public License
(GPL)
> >
> > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-
> -
> >
> > HMM file:                 /usr/local/lib/pfam-tm
> >
> > Sequence file:            Mus_musculus.NCBIM34.jul.pep.fa-short
> >
> > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> >
> >
> >
> > Query sequence: ENSMUSP00000089702
> >
> > Accession:      [none]
> >
> > Description:    [none]
> >
> >
> >
> > Scores for sequence family classification (score includes all
> domains):
> >
> > Model    Description                                    Score
> E-value
> > N
> >
> > -------- -----------                                    -----
> -------
> > ---
> >
> >       [no hits above thresholds]
> >
> >
> >
> > Parsed for domains:
> >
> > Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
> >
> > -------- ------- ----- -----    ----- -----      -----  -------
> >
> >       [no hits above thresholds]
> >
> > //
> >
> >
> >
> > Query sequence: ENSMUSP00000089701
> >
> > Accession:      [none]
> >
> > Description:    [none]
> >
> >
> >
> > Scores for sequence family classification (score includes all
> domains):
> >
> > Model    Description                                    Score
> E-value
> > N
> >
> > -------- -----------                                    -----
> -------
> > ---
> >
> >       [no hits above thresholds]
> >
> >
> >
> > Parsed for domains:
> >
> > Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
> >
> > -------- ------- ----- -----    ----- -----      -----  -------
> >
> >       [no hits above thresholds]
> >
> > //
> >
> >
> >
> > Query sequence: ENSMUSP00000020094
> >
> > Accession:      [none]
> >
> > Description:    [none]
> >
> >
> >
> > Scores for sequence family classification (score includes all
> domains):
> >
> > Model    Description                                    Score
> E-value
> > N
> >
> > -------- -----------                                    -----
> -------
> > ---
> >
> > LRR_1    Leucine Rich Repeat                             55.0
> 3.7e-15
> > 6
> >
> > LRRNT    Leucine rich repeat N-terminal domain           30.1
> 1.1e-07
> > 1
> >
> >
> >
> > Parsed for domains:
> >
> > Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
> >
> > -------- ------- ----- -----    ----- -----      -----  -------
> >
> > LRRNT      1/1     117   142 ..     1    34 []    30.1  1.1e-07
> >
> > LRR_1      1/6     168   191 ..     1    25 []    14.7   0.0049
> >
> > LRR_1      2/6     192   210 ..     1    25 []     9.1      0.2
> >
> > LRR_1      3/6     212   237 ..     1    25 []     8.4     0.26
> >
> > LRR_1      4/6     238   257 ..     1    25 []    10.3      0.1
> >
> > LRR_1      5/6     259   282 ..     1    25 []    10.1     0.12
> >
> > LRR_1      6/6     290   314 ..     1    25 []     2.4        2
> >
> > //
> >
> >
> >
> > Query sequence: ENSMUSP00000074175
> >
> > Accession:      [none]
> >
> > Description:    [none]
> >
> >
> >
> > Scores for sequence family classification (score includes all
> domains):
> >
> > Model    Description                                    Score
> E-value
> > N
> >
> > -------- -----------                                    -----
> -------
> > ---
> >
> > CUB      CUB domain                                     232.5
> 1.3e-68
> > 2
> >
> > Trypsin  Trypsin                                        206.0
> 1.2e-60
> > 1
> >
> > Sushi    Sushi domain (SCR repeat)                       88.7
> 2.6e-25
> > 2
> >
> > EGF_CA   Calcium binding EGF domain                      29.4
> 1.8e-07
> > 1
> >
> >
> >
> > Parsed for domains:
> >
> > Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
> >
> > -------- ------- ----- -----    ----- -----      -----  -------
> >
> > CUB        1/2      16   137 ..     1   116 []    70.0  1.1e-19
> >
> > EGF_CA     1/1     141   188 ..     1    55 []    29.4  1.8e-07
> >
> > CUB        2/2     192   301 ..     1   116 []   162.5  1.5e-47
> >
> > Sushi      1/2     308   370 ..     1    62 []    47.5  6.5e-13
> >
> > Sushi      2/2     375   446 ..     1    62 []    41.2  5.1e-11
> >
> > Trypsin    1/1     463   698 ..     1   259 []   206.0  1.2e-60
> >
> > //
> >
> >
> >
> >
> >
> > Cheers,
> >
> > Sean.
> >
> >
> >
> >
> >
> >
> >
> > On 8/30/05, Barry Moore <bmoore at genetics.utah.edu> wrote:
> >
> > > Sean,
> >
> > >
> >
> > > Don't see anything obviously wrong.  If you want to send your
input
> >
> > > file, I'll try to recreate the problem.
> >
> > >
> >
> > > Barry
> >
> > >
> >
> > > -----Original Message-----
> >
> > > From: bioperl-l-bounces at portal.open-bio.org
> >
> > > [mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Sean
> >
> > > O'Keeffe
> >
> > > Sent: Tuesday, August 30, 2005 2:55 AM
> >
> > > To: bioperl-l at portal.open-bio.org
> >
> > > Subject: [Bioperl-l] Bio::Search results
> >
> > >
> >
> > > Hi,
> >
> > > The following code snippet is something I use to extract
information
> >
> > > from hmmer result files:
> >
> > >
> >
> > > use Bio::SearchIO;
> >
> > >
> >
> > > my $in =3D new Bio::SearchIO( -format =3D> 'hmmer',  -file =3D>
> >
> > > $ARGV[0] ); while(my $result =3D $in->next_result) {
> >
> > >   print $result->query_name(),
> "\n",$result->query_description(),"\n";
> >
> > >   while (my $hit =3D $result->next_hit) {
> >
> > >     while(my $hsp =3D $hit->next_domain) {
> >
> > >       next unless ($hsp->name =3D~
/^ig|^lrr|^fn3|^egf|^tsp|^psi/i);
> >
> > >       print
$hsp->start(),"\t",$hsp->end(),"\t",$hsp->evalue(),"\n";
> >
> > >     }
> >
> > >   }
> >
> > > }
> >
> > >
> >
> > > The input file is generated by hmmpfam and is given at the command
> >
> > > line. I use it to scan for specific domain names e.g ig, fn3 lrr
> etc.
> >
> > > This code works for the first loop and then ends so I get the name
> and
> >
> >
> > > description (no hsp values as their are none for this result):
> >
> > >
> >
> > > ENSMUSP00000065602=20
> >
> > > pep:novel supercontig::NT_085813:405:1510:-1
gene:ENSMUSG00000054059
> >
> > > transcript:ENSMUST00000066517
> >
> > >
> >
> > > My question is why does the loop end after one instance.
> Incidentally
> >
> > > the outputted  name and description above are the last ones in the
> >
> > > hmmer file (maybe the file is read from the back??? - don't know
if
> >
> > > this means anything). Any thoughts would be appreciated. Thanks,
> >
> > > Sean.
> >
> > >
> >
> > > _______________________________________________
> >
> > > Bioperl-l mailing list
> >
> > > Bioperl-l at portal.open-bio.org
> >
> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> > >
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
> 
>

_______________________________________________
Bioperl-l mailing list
Bioperl-l at portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l



More information about the Bioperl-l mailing list