[Bioperl-l] SeqIO out of memory
Hilmar Lapp
hlapp at gnf.org
Fri Feb 28 15:58:55 EST 2003
Brian,
it's not the size of the whole file, but the size of some of its
entries. RefSeq actually contains some huge sequences, ranging from
mitochondrial and plastid genomes to large human contigs. If you want
those along with the others as Bio::Seq objects, you need a substantial
amount of memory; if you don't want them, use the SeqBuilder interface
to deny them. See the documentation in Bio::Seq::SeqBuilder; there are
examples of how to use it. Presently, the genbank parser is the only
SeqIO parser supporting a SeqBuilder - because I had exactly your
problem a couple of months ago.
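
A minimal sketch of what denying the large entries might look like,
modeled on the examples in the Bio::Seq::SeqBuilder documentation. The
1,000,000 bp cutoff and the names small_enough and genbank_to_fasta are
made up for illustration, and the sketch assumes the parser hands the
entry length to the condition under the '-length' key, as in the
documented example. Treat it as a starting point rather than the
definitive way to do it.

```perl
use strict;
use warnings;

# Hypothetical cutoff: entries longer than this are never built.
my $MAX_LENGTH = 1_000_000;

# Object condition: called with a hash ref of the values parsed so far.
# The '-length' key follows the example in the Bio::Seq::SeqBuilder POD.
sub small_enough {
    my $h = shift;
    return 1 unless defined $h->{'-length'};    # no length known: keep
    return $h->{'-length'} <= $MAX_LENGTH ? 1 : 0;
}

sub genbank_to_fasta {
    my ($infile, $outfile) = @_;
    require Bio::SeqIO;    # BioPerl is loaded here, on first use

    my $in = Bio::SeqIO->new(-file => $infile, -format => 'genbank');

    # Ask the genbank parser for its builder and register the condition.
    my $builder = $in->sequence_builder();
    $builder->add_object_condition(\&small_enough);

    open my $out, '>', $outfile or die "Cannot write $outfile: $!";
    while (my $seq = $in->next_seq) {
        print {$out} '>', $seq->accession_number, "\n", $seq->seq, "\n";
    }
    close $out or die "Cannot close $outfile: $!";
}

# Usage: perl script.pl rscu.gbff rscu.fa
genbank_to_fasta(@ARGV) if @ARGV == 2;
```

If only the accession and the sequence are needed, the documentation
also describes want_none() and add_wanted_slot() for trimming what the
parser collects for each entry.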
-hilmar
On Friday, February 28, 2003, at 01:06 PM, Brian Osborne wrote:
> Bioperl-l,
> Check out this one-liner, where the input file is rscu.gbff, a
> Genbank-formatted file with 111,220 entries. The fasta file that's made
> contains only 42,451 entries. Is "Out of memory" the expected result
> for an
> input file this size?
>
> ~/data/refseq>perl -e 'use Bio::SeqIO; $in =
> Bio::SeqIO->new(-file=>"rscu.gbff",
> -format=>"genbank"); open MYOUT,">rscu.fa"; while ( $seq =
> $in->next_seq ){ print
> MYOUT ">" . $seq->accession_number . "\n" . $seq->seq . "\n"; }'
>
> Out of memory during "large" request for 33558528 bytes, total sbrk() is
> 382283776 bytes at /usr/lib/perl5/site_perl/5.8.0/Bio/Seq/RichSeq.pm line 114,
> <GEN0> line 6433958.
>
> Brian O.
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------