[Bioperl-l] Memory Leak in Bio::SearchIO

Christopher Fields cjfields at uiuc.edu
Tue May 16 00:14:28 UTC 2006


---- Original message ----
>Date: Mon, 15 May 2006 15:40:15 -0400
>From: "Clarke, Wayne" <ClarkeW at agr.gc.ca>  
>Subject: [Bioperl-l] Memory Leak in Bio::SearchIO  
>To: <bioperl-l at lists.open-bio.org>
>
>Hey everyone, 
>
>I have been developing some code to download and parse BLAST reports
>from a remote server using SOAP::Lite as well as insert the results into
>a MySQL database. The problem I am having is that my program seems to be
>taking up a huge amount of RAM. For a single job of 10000 queries it
>can consume as much as a couple hundred MB within an hour.

If you're parsing 10000 queries (10000 different BLAST reports, right?) then it's
not necessarily a memory leak so much as object creation.  Each report
generates hit objects, which in turn generate HSP objects.  I think Jason
recommends using the tabular output option (-m8 or -m9) for huge reports, as
it cuts down considerably on this.  If you are cycling through each report it
shouldn't be as much of a problem unless your BLAST reports are really huge.
Have you tried parsing a single report to see if the problem persists?
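
To illustrate why tabular output is lighter, here is a minimal sketch that
reads -m8/-m9 style lines one at a time in plain Perl, without building any
hit or HSP objects.  It assumes the standard 12 tab-separated columns; the
names parse_m8_line and each_top_hit are hypothetical helpers for this
example, not part of Bioperl.

```perl
use strict;
use warnings;

# Names for the 12 standard columns of -m8/-m9 tabular output.
my @FIELDS = qw(query subject pct_identity aln_length mismatches gap_opens
                q_start q_end s_start s_end evalue bit_score);

# Hypothetical helper: turn one tabular line into a hash ref.
# Returns undef for comment lines (-m9), blank lines, or malformed rows.
sub parse_m8_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^\s*(?:#|$)/;
    my @cols = split /\t/, $line;
    return unless @cols == @FIELDS;
    my %hit;
    @hit{@FIELDS} = @cols;
    return \%hit;
}

# Hypothetical helper: call $callback on the first $max hits of each query.
# Only one line is held in memory at a time; assumes the hits for a given
# query are contiguous in the input.
sub each_top_hit {
    my ($fh, $max, $callback) = @_;
    my ($current, $count) = ('', 0);
    while (my $line = <$fh>) {
        my $hit = parse_m8_line($line) or next;
        if ($hit->{query} ne $current) {
            ($current, $count) = ($hit->{query}, 0);
        }
        next if ++$count > $max;
        $callback->($hit);
    }
    return;
}
```

The filehandle can be an in-memory one opened on a scalar, as in your code
(open $FH, "<", \$result), and the callback is where a database insert would
go.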

Now, if you are using Bioperl 1.5.1 with BLAST 2.2.13 or newer, you'll likely run
into a problem with an infinite loop caused by a change in NCBI's text
output.  You can try updating Bioperl from CVS in either case to see if that helps.
Tabular output and XML output, AFAIK, are the same regardless of version;
this bug only affected text output of BLAST reports.

> I realize
>that a lot of work is being done but this seems like way too much. This
>leads me to the subject of my post. I think I may have traced the source
>of the memory leak to Bio::SearchIO. I have used Devel::Size to track
>the size of my variables and done other debugging steps and have had no
>luck with resolving this very frustrating problem. My code is as
>follows:
>
>
> my $result = $connector->getQueryResult($query_id);
> my $FH;
> open $FH, "<", \$result;
> my $searchio = new Bio::SearchIO(-format => "blast",
>                                  -fh     => $FH);
> while (my $o_blast = $searchio->next_result()) {
>     my $clone_id = $o_blast->query_name();
>     my $statement = $bdbi->form_push_SQL($o_blast, $clone_id, 5);
>
>This is just the leading and trailing code surrounding the use of
>Bio::SearchIO, since there is quite a lot. I am mostly just wondering if
>anyone has ever had problems with SearchIO and its memory usage. I
>looked at the source code for it but am afraid it is out of my league.
>Any help/suggestions/questions would be great. Thanks