[Bioperl-l] SeqIO out of memory
Brian Osborne
brian_osborne at cognia.com
Mon Mar 3 07:44:10 EST 2003
Ewan and Hilmar,
Yes, I found the offending entry a few minutes after I posted, thanks Jason.
Arabidopsis chromosome 1! According to the documentation it shouldn't be
there - expect the unexpected at LocusLink.
Thanks again,
Brian O.
-----Original Message-----
From: bioperl-l-bounces at bioperl.org [mailto:bioperl-l-bounces at bioperl.org]
On Behalf Of Ewan Birney
Sent: Saturday, March 01, 2003 7:30 AM
To: Hilmar Lapp
Cc: Brian Osborne; Bioperl
Subject: Re: [Bioperl-l] SeqIO out of memory
On Fri, 28 Feb 2003, Hilmar Lapp wrote:
> Brian,
>
> it's not the size of the whole file, but the size of some of its
> entries. RefSeq actually contains some huge sequences, ranging from
> mitochondrial and plastid genomes to large human contigs. If you want
> those along with others as Bio::Seq objects you need a very decent
> amount of memory; if you don't want them, use the SeqBuilder interface
> to deny them. See the documentation in Bio::Seq::SeqBuilder, there are
> examples on how to use it. Presently, the genbank parser is the only
> SeqIO parser supporting a SeqBuilder - because I had exactly your
> problem a couple of months ago.
>
Aha. This is a better explanation. ;)
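
For readers of the archive, here is a minimal sketch of what denying the huge entries through the SeqBuilder might look like. The 1 Mb cutoff and the sub names small_enough and genbank_to_fasta are illustrative choices, not from this thread. sequence_builder() and add_object_condition() follow the synopsis in the Bio::Seq::SeqBuilder documentation, as does the '-length' key; whether that key is filled in before the condition runs depends on the parser, so treat this as a starting point rather than a tested recipe.

```perl
#!/usr/bin/env perl
# Sketch: GenBank -> FASTA, skipping very large entries via the SeqBuilder.
use strict;
use warnings;

# Assumed cutoff (not from the thread): skip entries longer than 1 Mb.
my $MAX_LENGTH = 1_000_000;

# Condition callback for the builder. It receives a hash ref of the values
# the parser has collected so far; the '-length' key is the one used in the
# Bio::Seq::SeqBuilder documentation example.
sub small_enough {
    my ($h) = @_;
    my $len = $h->{'-length'};
    return 1 unless defined $len;    # no length available: keep the entry
    return $len <= $MAX_LENGTH ? 1 : 0;
}

sub genbank_to_fasta {
    my ($infile, $outfile) = @_;
    require Bio::SeqIO;              # BioPerl is only needed when converting

    my $in = Bio::SeqIO->new(-file => $infile, -format => 'genbank');

    # Ask the parser's builder to skip entries that fail the condition,
    # so no Bio::Seq object is built for them.
    my $builder = $in->sequence_builder();
    $builder->add_object_condition(\&small_enough);

    open my $out, '>', $outfile or die "Cannot write $outfile: $!";
    while (my $seq = $in->next_seq) {
        print {$out} '>', $seq->accession_number, "\n", $seq->seq, "\n";
    }
    close $out or die "Cannot close $outfile: $!";
}

# Usage: perl gb2fa.pl rscu.gbff rscu.fa   (the script name is hypothetical)
genbank_to_fasta(@ARGV) if @ARGV == 2;
```

Entries rejected by the condition are skipped by the parser, so the loop only ever sees the ones that pass. Raise or lower the cutoff to suit the memory you have.
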
> -hilmar
>
> On Friday, February 28, 2003, at 01:06 PM, Brian Osborne wrote:
>
> > Bioperl-l,
> > Check out this one-liner, where the input file is rscu.gbff, a
> > Genbank-formatted file with 111,220 entries. The fasta file that's made
> > contains only 42,451 entries. Is "Out of memory" the expected result
> > for an input file this size?
> >
> > ~/data/refseq>perl -e 'use Bio::SeqIO; $in =
> > Bio::SeqIO->new(-file=>"rscu.gbff", -format=>"genbank");
> > open MYOUT,">rscu.fa"; while ( $seq = $in->next_seq ){ print
> > MYOUT ">" . $seq->accession_number . "\n" . $seq->seq . "\n"; }'
> >
> > Out of memory during "large" request for 33558528 bytes, total sbrk()
> > is 382283776 bytes at /usr/lib/perl5/site_perl/5.8.0/Bio/Seq/RichSeq.pm
> > line 114, <GEN0> line 6433958.
> >
> > Brian O.
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-l
> >
> --
> -------------------------------------------------------------
> Hilmar Lapp email: lapp at gnf.org
> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
> -------------------------------------------------------------