[Bioperl-l] Memory not sufficient when storing human chromosom 1 in BioSQL
Chris Fields
cjfields at uiuc.edu
Fri Jul 4 15:31:05 UTC 2008
On Jul 4, 2008, at 5:10 AM, Sendu Bala wrote:
> [CC:ing Gabrielle who had an identical problem]
>
> Chris Fields wrote:
>> On Jul 3, 2008, at 6:48 AM, Andreas Dräger wrote:
>>> Recently I successfully installed the latest versions of
>>> BioPerl and BioSQL on my computer, which has 2 GB RAM. Both work
>>> fine, but when I try to insert the GenBank file of human
>>> chromosome 1, which I downloaded from the NCBI website
>>> (ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/CHR_01/hs_ref_chr1.gbk.gz),
>>> I receive the error message 'Out of memory'. This takes about
>>> one hour. My question is how I can insert large GenBank files into
>>> my BioSQL database using BioPerl. I do not know what to do. Thank
>>> you for your help!!!
>>
>> Have you tried just loading the sequence into memory using
>> Bio::SeqIO? The problem may be the size of the file itself.
>
> Just looping through:
> perl -MBio::SeqIO -e '$i = Bio::SeqIO->new(-file =>
> "hs_ref_chr1.gbk"); while ($seq = $i->next_seq) { $ac =
> $seq->accession; }'
>
> This gave me a variable memory usage, typically around 360MB,
> peaking up to 980MB before dropping back down again. Seems a little
> high to me, but it doesn't seem to be a memory leak?
>
>
> Keeping every seq object in memory:
> perl -MBio::SeqIO -e '$i = Bio::SeqIO->new(-file =>
> "hs_ref_chr1.gbk"); @seqs; while ($seq = $i->next_seq) { push(@seqs,
> $seq); }'
>
> This used up to 810MB. I didn't notice any peakiness, but it may
> have been there.
>
> SeqIO by itself shouldn't be causing any out of memory errors on 2
> and 4GB machines.
>
>
> What does bioperl-db do as it enters sequences into the db? How does
> it currently deal with species information?

Are the 'latest versions of bioperl/bioperl-db' Andreas indicated
above the latest versions from Subversion, or 1.5.2? I can't recall
whether 1.5.2 shipped with the memory issue fixes re: Bio::Species (I
think it did, but maybe Sendu knows better than I). Some more info
from Andreas would also help, such as OS, RDBMS, etc.

Could this be a combination of RDBMS, bioperl-db, and bioperl memory
issues? Of course that would depend on how the local MySQL/Pg/Oracle
is set up, but if the memory peaks at 980MB (or 810MB for all
sequences) for Bio::SeqIO alone, I could see how bioperl-db and the
RDBMS may add quite a bit more to that.
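
To make a replication attempt comparable across machines, something like
the rough sketch below could log memory use per record while streaming
the file through Bio::SeqIO. It is only a sketch under assumptions: it
reads /proc/self/status, so it works on Linux only; rss_kb and
profile_file are made-up helper names, not part of BioPerl; and memory is
sampled once after each record, so short-lived peaks during parsing may be
missed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;

# Resident set size of this process in kB, read from /proc (Linux only).
# Returns undef on systems without /proc/self/status.
sub rss_kb {
    open(my $fh, '<', '/proc/self/status') or return;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return;
}

# Stream a GenBank file one record at a time and print memory use
# after each record. Hypothetical helper, not a BioPerl function.
sub profile_file {
    my ($path) = @_;
    require Bio::SeqIO;    # loaded lazily so rss_kb() works without BioPerl
    my $in = Bio::SeqIO->new(-file => $path, -format => 'genbank');
    my ($count, $peak) = (0, 0);
    while (my $seq = $in->next_seq) {
        $count++;
        my $rss = rss_kb() // 0;
        $peak = $rss if $rss > $peak;
        printf "%d\t%s\t%d bp\t%d kB\n",
            $count, $seq->accession_number, $seq->length, $rss;
    }
    print "records: $count, highest sampled RSS: $peak kB\n";
    return $count;
}

# Run only when a file name is given on the command line.
profile_file($ARGV[0]) if @ARGV;
```

Saved under any name (say profile_seqio.pl, a placeholder) and run as
"perl profile_seqio.pl hs_ref_chr1.gbk", it prints one line per record.
Comparing those numbers with the memory seen during a bioperl-db load
should hint at how much the loader and the RDBMS add on top of SeqIO.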

If anyone has a local bioperl-db set up we should try replicating
this. Speaking of which, does anyone know if we have set up bioperl-db
testing on dev (or wherever it was to be hosted)? This was discussed
at one point.

chris