[Bioperl-l] Memory not sufficient when storing human chromosom 1 in BioSQL
Sendu Bala
bix at sendu.me.uk
Fri Jul 4 10:10:53 UTC 2008
[CC:ing Gabrielle who had an identical problem]
Chris Fields wrote:
> On Jul 3, 2008, at 6:48 AM, Andreas Dräger wrote:
>> Recently I successfully installed the latest versions of BioPerl
>> and BioSQL on my computer, which has 2 GB of RAM. Both work fine, but
>> when I try to insert the GenBank file of human chromosome 1,
>> which I downloaded from the NCBI website
>> (ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/CHR_01/hs_ref_chr1.gbk.gz),
>> I receive the error message 'Out of memory' after about an hour.
>> My question is: how can I insert large GenBank files into my BioSQL
>> database using BioPerl? I do not know what to do. Thank you for your
>> help!
>
> Have you tried just loading the sequence into memory using Bio::SeqIO?
> The problem may be the size of the file itself.
Just looping through:
perl -MBio::SeqIO -e '$i = Bio::SeqIO->new(-file => "hs_ref_chr1.gbk");
while ($seq = $i->next_seq) { $ac = $seq->accession; }'
This gave me variable memory usage, typically around 360MB and peaking
at up to 980MB before dropping back down again. That seems a little high
to me, but it doesn't look like a memory leak?
Keeping every seq object in memory:
perl -MBio::SeqIO -e '$i = Bio::SeqIO->new(-file => "hs_ref_chr1.gbk");
my @seqs; while ($seq = $i->next_seq) { push(@seqs, $seq); }'
This used up to 810MB. I didn't notice any peakiness, but it may have
been there.
SeqIO by itself shouldn't be causing any out of memory errors on 2 and
4GB machines.
What does bioperl-db do as it enters sequences into the db? How does it
currently deal with species information?
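For comparison, the usual way to load sequences through bioperl-db is the bundled load_seqdatabase.pl script; the core of what it does is roughly the loop below. This is only a sketch: the connection parameters (host, database name, user, password) are placeholders, and it assumes a working BioSQL schema and the Bio::DB::BioDB adaptor layer from bioperl-db.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::DB::BioDB;    # from the bioperl-db distribution

# Connection parameters are placeholders -- adjust for your setup.
my $db = Bio::DB::BioDB->new(
    -database => 'biosql',
    -host     => 'localhost',
    -dbname   => 'biosql',
    -driver   => 'mysql',
    -user     => 'me',
    -pass     => 'secret',
);

my $in = Bio::SeqIO->new(-file => 'hs_ref_chr1.gbk', -format => 'genbank');

while (my $seq = $in->next_seq) {
    # create_persistent() wraps the seq object in a persistence adaptor;
    # store() writes it (with its features, annotation, and species) to
    # the database.
    my $pseq = $db->create_persistent($seq);
    $pseq->store;
    $pseq = undef;    # drop the reference so Perl can reclaim the memory
}
```

If memory grows while this loop runs even though each $pseq is released, the leak would point at the adaptor/caching layer rather than at SeqIO parsing.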