[Bioperl-l] repost the problem --- Re: bl2seq hang and
its performance
Liu Haifeng
lhaifeng at dso.org.sg
Mon Dec 15 21:55:01 EST 2003
Thanks a lot, Jason! I have revised the code as you advised, and it is
really a great saving: the program now runs in 9M of memory with the same
2.6M seq file as input. I also noticed that the memory consumed does not
vary with the number of sequences in the input file.
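
For readers of the archive, here is a minimal sketch of the leaner approach
Jason describes in the quoted message. It is not the revised program itself,
which was not posted. It assumes BioPerl 1.3.x or later (so that
Bio::SearchIO can read bl2seq output), that the NCBI bl2seq binary is on the
PATH, and that only the first HSP of each self-comparison is wanted, as in
the original test script. The printed columns are chosen only for
illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Bio::SeqIO;
use Bio::SearchIO;

my $infile = shift or die "usage: $0 seqs.fasta\n";

# Stream the input: only one sequence object is alive at a time.
my $in = Bio::SeqIO->new(-file => $infile, -format => 'fasta');

my $sno = 0;
while (my $seq = $in->next_seq) {
    $sno++;

    # Write the current sequence to its own temporary FASTA file.
    my ($tmpfh, $tmpfile) = tempfile(SUFFIX => '.fa', UNLINK => 1);
    my $out = Bio::SeqIO->new(-fh => $tmpfh, -format => 'fasta');
    $out->write_seq($seq);
    $out->close;    # flush to disk before bl2seq reads the file

    # Run bl2seq directly and read its report through a pipe
    # (assumption: bl2seq is installed and on the PATH).
    open(my $blfh, '-|', 'bl2seq',
         '-i', $tmpfile, '-j', $tmpfile, '-p', 'blastp')
        or die "cannot run bl2seq: $!";

    my $searchio = Bio::SearchIO->new(-format => 'blast', -fh => $blfh);

    # Only the first HSP of the first hit is needed.
    if (my $result = $searchio->next_result) {
        if (my $hit = $result->next_hit) {
            if (my $hsp = $hit->next_hsp) {
                printf "%d\t%s\t%s\n", $sno, $seq->display_id, $hsp->evalue;
            }
        }
    }

    close($blfh);
    unlink $tmpfile;    # do not let temporary files pile up
}
print "running is over\n";
```

Only one sequence, one temporary file and one report are held at any time,
which would be consistent with memory use staying flat as the input grows.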
Regards
Haifeng
----- Original Message -----
From: "Jason Stajich" <jason at cgt.duhs.duke.edu>
To: "Liu Haifeng" <lhaifeng at dso.org.sg>
Cc: <bioperl-l at portal.open-bio.org>
Sent: 2003-12-16 9:25
Subject: Re: [Bioperl-l] repost the problem --- Re: bl2seq hang and
its performance
> bioperl sequence objects aren't particularly robust for huge sequences and
> can make you run out of memory - presumably if you run your query on the
> command line with the files and take bioperl out of the loop, it runs fine?
>
> You may need to rethink your strategy for searching and pre-create your
> sequence files to be more IO and memory efficient.
>
> Personally I find StandAloneBlast not the best module for lots of
> searches and prepare my pipeline to be leaner when I need to.
>
> You can do this yourself in a simple script.
>
> # pseudocode
> -- create your sequence files - see Bio::Seq::LargeSeq or Bio::DB::Fasta
> for more memory efficient ways to manipulate large sequence files.
> -- generate unique names for your subsequences, use SeqIO to create the
> files presumably if that will work.
> -- do the bl2seq
> my $bl2seqfh;
> open($bl2seqfh, "bl2seq -i $file1 -j $file2 -p blastn ... |")
> || die($!);
> # Bioperl 1.3.x only code
> my $searchio = Bio::SearchIO->new(-format => 'blast',
>                                   -fh     => $bl2seqfh);
>
> my $r = $searchio->next_result;
> # or use Bio::Tools::BPbl2seq if you have an earlier
> # version of the toolkit.
>
>
> This is essentially what StandAloneBlast should be doing for you, but with
> the overhead and assumptions that you are passing Bio::SeqI objects and
> creating the temporary files for you, and cleaning them up as well. One
> drawback/bug is I think it will still open and try to create Bio::SeqI
> objects even when you are passing filenames - which may be the source of your
> problem, not sure - this may also have been fixed, I've not dug into the
> code lately.
>
> -jason
> On Mon, 15 Dec 2003, Liu Haifeng wrote:
>
> > Can anyone help? It is really urgent!
> >
> > Haifeng Liu
> > ----- Original Message -----
> > From: "Liu Haifeng" <lhaifeng at dso.org.sg>
> > To: <bioperl-l at portal.open-bio.org>
> > Sent: 2003-12-?2 14:49
> > Subject: bl2seq hang and its performance
> >
> >
> > > Hi all,
> > >
> > > I noticed that one of my programs, written using bioperl-1.2.3, runs very
> > > slowly and consumes a huge amount of memory, and I suspect that this is
> > > due to the call to bl2seq in the program. Thus, I wrote the small program
> > > below (it runs bl2seq on each sequence in a fasta file against itself) to
> > > see if this is true:
> > >
> > >
> > > #!/usr/bin/perl -w
> > > use strict;
> > > use Bio::SeqIO;
> > > use Bio::Tools::Blast;
> > > use Bio::Tools::Run::StandAloneBlast;
> > > use Bio::Tools::BPlite;
> > >
> > > my $infile = shift;
> > > my $sno = 0;
> > > my $blastalgo = "blastp"; # blastp, blastx, tblastn, tblastx
> > > my $pin = Bio::SeqIO->new('-file' => "$infile", '-format' => 'Fasta');
> > > # run bl2seq on each sequence against itself
> > > while ( my $proseq = $pin->next_seq() ) {
> > >     $sno++;
> > >     print "bl2seq $sno ..............................\n";
> > >     my @params = ('program' => $blastalgo);
> > >     my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
> > >     $factory->io->_io_cleanup();
> > >     my $report = $factory->bl2seq($proseq, $proseq);
> > >     while ( my $hsp = $report->next_feature ) {
> > >         # only need the first hsp, so close the report and leave the loop
> > >         $report->close();
> > >         last;
> > >     }
> > >     undef $report;
> > > }
> > > print "running is over\n";
> > >
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > The program runs OK for a small fasta file. However, when I input a
> > > fasta file of around 2.6M containing 10,000 protein sequences, the
> > > program hangs when it compares the 1782nd sequence. I also noticed that
> > > the program had consumed 12M of memory at that time. I searched the
> > > archive and found that similar bl2seq problems have occurred before.
> > > However, they should have been solved in the latest version.
> > >
> > > Can anyone give me some clues on how to improve the performance of
> > > calling bl2seq? Thank you.
> > >
> > > Regards
> > > Haifeng Liu
> > >
> > >
> > >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
>