[Bioperl-l] SeqIO out of memory

Hilmar Lapp hlapp at gnf.org
Fri Feb 28 15:58:55 EST 2003


Brian,

it's not the size of the whole file, but the size of some of its
entries. RefSeq actually contains some huge sequences, ranging from
mitochondrial and plastid genomes to large human contigs. If you want
those along with the others as Bio::Seq objects, you need a very decent
amount of memory; if you don't want them, use the SeqBuilder interface
to deny them. See the documentation in Bio::Seq::SeqBuilder; there are
examples of how to use it. Presently, the genbank parser is the only
SeqIO parser supporting a SeqBuilder - because I had exactly your
problem a couple of months ago.
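
As a rough sketch of what that could look like for your FASTA
conversion (untested against your file, and modelled on the synopsis in
the Bio::Seq::SeqBuilder documentation): the 1,000,000 bp cutoff, the
helper names short_enough and genbank_to_fasta, and the script name are
made up for illustration. It also writes through the fasta SeqIO
writer, so the header is the display_id plus description rather than
the accession number your one-liner printed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical cutoff (an assumption, not from this thread): skip any
# entry whose reported length exceeds this many residues.
my $MAX_LENGTH = 1_000_000;

# Object condition for the SeqBuilder. It receives a hash ref of the
# parameters parsed so far; returning false makes the parser skip the entry.
sub short_enough {
    my $h = shift;
    return 0 if defined $h->{'-length'} && $h->{'-length'} > $MAX_LENGTH;
    return 1;
}

# Convert a GenBank file to FASTA, building only the slots FASTA needs.
sub genbank_to_fasta {
    my ($infile, $outfile) = @_;
    my $in  = Bio::SeqIO->new(-file => $infile,     -format => 'genbank');
    my $out = Bio::SeqIO->new(-file => ">$outfile", -format => 'fasta');

    my $builder = $in->sequence_builder();
    $builder->want_none();                                   # start from nothing
    $builder->add_wanted_slot('display_id', 'desc', 'seq');  # keep only these
    $builder->add_object_condition(\&short_enough);          # deny huge entries

    my $count = 0;
    while (my $seq = $in->next_seq) {
        $out->write_seq($seq);
        $count++;
    }
    return $count;
}

# Run only when given two arguments, e.g.:
#   perl gb2fa.pl rscu.gbff rscu.fa     (gb2fa.pl is a hypothetical name)
if (@ARGV == 2) {
    my $n = genbank_to_fasta(@ARGV);
    print STDERR "wrote $n entries\n";
}
```

Skipping features and annotation via want_none() should also cut
memory use for the entries you do keep, since those objects never get
built.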

	-hilmar

On Friday, February 28, 2003, at 01:06  PM, Brian Osborne wrote:

> Bioperl-l,
> Check out this one-liner, where the input file is rscu.gbff, a
> Genbank-formatted file with 111,220 entries. The fasta file that's made
> contains only 42,451 entries. Is "Out of memory" the expected result
> for an input file this size?
>
> ~/data/refseq>perl -e 'use Bio::SeqIO; $in =
> Bio::SeqIO->new(-file=>"rscu.gbff",
> -format=>"genbank"); open MYOUT,">rscu.fa"; while ( $seq =
> $in->next_seq ){ print
>  MYOUT ">" . $seq->accession_number . "\n" . $seq->seq . "\n"; }'
>
> Out of memory during "large" request for 33558528 bytes, total sbrk() is
> 382283776 bytes at /usr/lib/perl5/site_perl/5.8.0/Bio/Seq/RichSeq.pm line 114,
> <GEN0> line 6433958.
>
> Brian O.
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


