[Bioperl-l] repost the problem --- Re: bl2seq hang and its performance

Mon Dec 15 21:55:01 EST 2003

Thanks a lot, Jason!  I have revised the code as you advised, and it is
really a great saving!  The program now runs in 9M of memory with the same
2.6M seq file as input.  I also noticed that the memory consumed does not
vary with the number of sequences in the input file.
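
The revised code was not posted to the list. As a rough illustration of why
memory can stay flat, here is a minimal sketch that uses plain Perl instead of
Bio::SeqIO: it reads one FASTA record at a time, writes it to a temporary
file, and pipes bl2seq on that file against itself. The helper names
(next_fasta_record, write_temp_fasta, bl2seq_args) are hypothetical, only the
-i, -j and -p flags from the thread are used, and matching on "Score =" is an
assumption about the bl2seq text output.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read one FASTA record from an open filehandle.
# Returns (header, sequence), or an empty list at end of file.
sub next_fasta_record {
    my ($fh) = @_;
    local $/ = "\n>";              # a record ends where the next ">" line starts
    my $chunk = <$fh>;
    return unless defined $chunk;
    chomp $chunk;                  # strip the trailing "\n>" separator, if present
    $chunk =~ s/\r//g;             # tolerate CRLF line endings
    $chunk =~ s/^>//;              # the first record still has its leading ">"
    my ($header, @lines) = split /\n/, $chunk;
    return ($header, join('', @lines));
}

# Write a single record to a temporary FASTA file and return its path.
sub write_temp_fasta {
    my ($header, $seq) = @_;
    my ($out, $path) = tempfile('bl2seq_XXXXXX', SUFFIX => '.fa',
                                TMPDIR => 1, UNLINK => 1);
    print {$out} ">$header\n$seq\n";
    close $out or die "close $path: $!";
    return $path;
}

# Argument list for comparing a file against itself.
sub bl2seq_args {
    my ($path, $program) = @_;
    return ('bl2seq', '-i', $path, '-j', $path, '-p', $program);
}

# Runs only when called as a script with a FASTA file argument.
if (!caller && @ARGV) {
    open my $in, '<', $ARGV[0] or die "open $ARGV[0]: $!";
    while (my ($header, $seq) = next_fasta_record($in)) {
        my $path = write_temp_fasta($header, $seq);
        open my $blast, '-|', bl2seq_args($path, 'blastp')
            or die "cannot run bl2seq: $!";
        while (my $line = <$blast>) {
            # Assumed report layout: the first "Score =" line is the first HSP.
            if ($line =~ /Score\s*=/) { print "$header: $line"; last; }
        }
        close $blast;
        unlink $path;              # only one temporary file exists at a time
    }
    close $in;
}
```

Only one record and one temporary file are alive at any moment, so memory use
should not depend on how many sequences the input holds.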

Regards
Haifeng

----- Original Message ----- 
From: "Jason Stajich" <jason at cgt.duhs.duke.edu>
To: "Liu Haifeng" <lhaifeng at dso.org.sg>
Cc: <bioperl-l at portal.open-bio.org>
Sent: 2003-12-16 9:25
Subject: Re: [Bioperl-l] repost the problem --- Re: bl2seq hang and
its performance

> bioperl sequence objects aren't particularly robust for huge sequences and
> can make you run out of memory - presumably if you run your query on the
> cmd line with the files and take bioperl out of the loop it runs fine?
>
> You may need to rethink your strategy for searching and pre-create your
> sequence files to be more IO and memory efficient.
>
> Personally I find StandAloneBlast not the best module for lots of
> searches and prepare my pipeline to be leaner when I need to.
>
> You can do this yourself in a simple script.
>
> # pseudocode
> -- create your sequence files - see Bio::Seq::LargeSeq or Bio::DB::Fasta
>    for more memory efficient ways to manipulate large sequence files.
> -- generate unique names for your subsequences, use SeqIO to create the
>    files presumably if that will work.
> -- do the bl2seq
>   use Bio::SearchIO;
>   # "..." stands for any further bl2seq options
>   open(my $bl2seqfh, "bl2seq -i $file1 -j $file2 -p blastn ... |")
>     || die($!);
>   # Bioperl 1.3.x only code
>   my $searchio = Bio::SearchIO->new(-format => 'blast',
>                                     -fh     => $bl2seqfh);
>
>   my $r = $searchio->next_result;
>   # or use Bio::Tools::BPbl2seq if you have an earlier
>   # version of the toolkit.
>
>
> This is essentially what StandAloneBlast should be doing for you, but with
> the overhead and assumptions that you are passing Bio::SeqI objects, plus
> creating the temporary files for you and cleaning them up as well.  One
> drawback/bug is that I think it will still open the files and try to create
> Bio::SeqI objects even when you are passing filenames - which may be the
> source of your problem, not sure - this may also have been fixed, I've not
> dug into the code lately.
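
The piped bl2seq call in the pseudocode above could be wrapped up roughly as
follows. This is only a sketch under stated assumptions: the helper names
(open_bl2seq_parser, first_hsp) are hypothetical, Bio::SearchIO is loaded
lazily and is assumed to parse bl2seq output as described for Bioperl 1.3.x,
and only the -i, -j and -p flags from the thread are used.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Start bl2seq on two sequence files and return a Bio::SearchIO parser
# reading from its output. No Bio::SeqI objects are built for the inputs.
sub open_bl2seq_parser {
    my ($file1, $file2, $program) = @_;
    require Bio::SearchIO;         # loaded only when a search is actually run
    open my $fh, '-|', 'bl2seq', '-i', $file1, '-j', $file2, '-p', $program
        or die "cannot run bl2seq: $!";
    return Bio::SearchIO->new(-format => 'blast', -fh => $fh);
}

# Return the first HSP of the first hit, or undef if nothing aligned.
# Works with any object offering next_result / next_hit / next_hsp.
sub first_hsp {
    my ($searchio) = @_;
    my $result = $searchio->next_result or return;
    my $hit    = $result->next_hit      or return;
    return $hit->next_hsp;
}

# Runs only when called as a script with two sequence files.
if (!caller && @ARGV == 2) {
    my $searchio = open_bl2seq_parser($ARGV[0], $ARGV[1], 'blastn');
    if (my $hsp = first_hsp($searchio)) {
        printf "first HSP: score=%s evalue=%s\n", $hsp->score, $hsp->evalue;
    }
    else {
        print "no HSP found\n";
    }
}
```

Because the filenames go straight to bl2seq, the input sequences never pass
through Bioperl; only the report is parsed.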
>
> -jason
> On Mon, 15 Dec 2003, Liu Haifeng wrote:
>
> > Can anyone help?  Really urgent!
> >
> > Haifeng Liu
> > ----- Original Message -----
> > From: "Liu Haifeng" <lhaifeng at dso.org.sg>
> > To: <bioperl-l at portal.open-bio.org>
> > Sent: December 2003, 14:49
> > Subject: bl2seq hang and its performance
> >
> >
> > > Hi all,
> > >
> > > I noticed that one of my programs written using bioperl-1.2.3 runs very
> > > slowly and consumes huge memory, and I suspected that it is due to the
> > > call of bl2seq in the program.  Thus, I wrote a small program (bl2seq
> > > sequences against themselves from a fasta file) below to see if this is
> > > true:
> > >
> > >
> > > #!/usr/bin/perl -w
> > > use strict;
> > > use Bio::SeqIO;
> > > use Bio::Tools::Blast;
> > > use Bio::Tools::Run::StandAloneBlast;
> > > use Bio::Tools::BPlite;
> > >
> > > my $infile    = shift;
> > > my $sno       = 0;
> > > my $blastalgo = "blastp";    # blastp, blastx, tblastn, tblastx
> > > my $pin = Bio::SeqIO->new('-file' => "$infile", '-format' => 'Fasta');
> > > # Compare each sequence in the fasta file against itself.
> > > while ( my $proseq = $pin->next_seq() ) {
> > >     $sno++;
> > >     print "bl2seq $sno ..............................\n";
> > >     my @params  = ('program' => $blastalgo);
> > >     my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
> > >     $factory->io->_io_cleanup();
> > >     my $report = $factory->bl2seq($proseq, $proseq);
> > >     while ( my $hsp = $report->next_feature ) {
> > >         # only need the first hsp
> > >         $report->close();
> > >         last;
> > >     }
> > >     undef $report;
> > > }
> > > print "running is over\n";
> > >
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > The program runs OK for a small fasta file.  However, when I input a
> > > fasta file of around 2.6M containing 10,000 protein sequences, the
> > > program hangs when it compares the 1782nd sequence.  I also noticed that
> > > the program has consumed 12M of memory at that time.  I searched the
> > > archive and found that similar bl2seq problems have occurred before.
> > > However, they should have been solved in the latest version.
> > >
> > > Can anyone give me some clues on how to improve the performance of
> > > calling bl2seq?  Thank you.
> > >
> > > Regards
> > > Haifeng Liu
> > >
> > >
> > >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
>