[Bioperl-l] dealing with large files
Jason Stajich
jason at bioperl.org
Thu Dec 20 07:13:55 UTC 2007
It gets buffered via the OS -- Bio::Root::IO calls next_line
iteratively, but eventually the whole sequence object will get put
into RAM as it is built up.
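Concretely, the usual pattern looks something like this (an untested
sketch; the file name is just a placeholder) -- only the record that
next_seq has just returned is held as a full object:

  use Bio::SeqIO;

  my $in = Bio::SeqIO->new(-file   => 'big_dataset.embl',   # placeholder path
                           -format => 'EMBL');
  while ( my $seq = $in->next_seq ) {
      # each record is parsed line by line, but the finished Bio::Seq
      # object (sequence string plus all its features) lives in RAM here
      printf "%s\t%d bp\n", $seq->display_id, $seq->length;
      # $seq is freed once nothing refers to it any more
  }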
zcat or bzcat can also be used for gzipped and bzip2-compressed files
respectively; I like to use this where I want to keep the disk space
footprint down.
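For example, only the constructor line changes (again a sketch; the
file name is a placeholder), and the read loop stays the same:

  use Bio::SeqIO;

  # zcat (or bzcat for .bz2) decompresses on the fly and BioPerl reads
  # from the pipe, so the uncompressed data never touches the disk
  my $in = Bio::SeqIO->new(-file   => 'zcat big_dataset.embl.gz |',
                           -format => 'EMBL');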
Because we usually treat data input as coming from a stream, ignoring
whether or not it is backed by a file, we would need a more flexible
structure to really handle this, although I'd argue the data really
belongs in a database when it is too big for memory.
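For plain sequence data an indexed flat file already gets you most of
the way there -- e.g. Bio::DB::Fasta (a sketch; the file name is a
placeholder), which pulls only the slice you ask for into memory:

  use Bio::DB::Fasta;

  # builds (or reuses) an index next to the FASTA file
  my $db    = Bio::DB::Fasta->new('genome.fa');           # placeholder file
  my $slice = $db->seq('chr1', 10_000 => 20_000);         # just this substring
  my $seq   = $db->get_Seq_by_id('chr1');                 # lazy sequence object

For feature-heavy data something like BioSQL is the more natural fit,
but that is a bigger setup step.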
More compact Feature/Location objects would probably also help here.
I would not be surprised if the memory requirement has more to do with
the number of features than the length of the sequence - human
chromosome 1 fits into memory just fine on most machines with 2GB of
RAM. But it would require someone taking an interest in some
re-architecting here.
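A quick, unscientific way to see where a given file sits is to count
features per record while streaming (sketch; placeholder file name):

  use Bio::SeqIO;

  my $in = Bio::SeqIO->new(-file   => 'big_dataset.embl',
                           -format => 'EMBL');
  while ( my $seq = $in->next_seq ) {
      my @feats = $seq->get_SeqFeatures;    # top-level features on this record
      printf "%s\t%d bp\t%d features\n",
             $seq->display_id, $seq->length, scalar @feats;
  }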
-jason
On Dec 19, 2007, at 9:59 PM, Michael Thon wrote:
>
> On Dec 18, 2007, at 7:04 PM, Stefano Ghignone wrote:
>
>> my $in = Bio::SeqIO->new(-file => "/bin/gunzip -c $infile |",
>>                          -format => 'EMBL');
>
> This is just for the sake of curiosity, since you already found a
> solution to your problem, but I wonder how perl will handle a file
> opened this way. Will it try to suck the whole thing into RAM in
> one go?
>
> Mike