[Bioperl-l] dealing with large files
Stefano Ghignone
ste.ghi at libero.it
Thu Dec 20 13:57:54 UTC 2007
I was wondering whether, when working with such big files, it would be better to first index the database and then query it, formatting the sequences as one wants...
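
Something like the following is what I had in mind -- just a minimal sketch
using Bio::Index::EMBL (the file names and the entry ID are placeholders,
and as far as I know the Bio::Index modules need the flat file uncompressed,
since they record byte offsets into it):

  use strict;
  use warnings;
  use Bio::Index::EMBL;
  use Bio::SeqIO;

  # Build (or reopen) an on-disk index; only IDs and byte offsets are
  # stored, so the flat file itself never has to fit in memory.
  my $inx = Bio::Index::EMBL->new(-filename   => 'mydb.embl.idx',
                                  -write_flag => 1);
  $inx->make_index('mydb.embl');

  # Later, pull out single entries by ID and reformat them as needed.
  my $seq = $inx->fetch('SOME_ID');   # returns a Bio::Seq object
  my $out = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'fasta');
  $out->write_seq($seq) if $seq;
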
> It gets buffered via the OS -- Bio::Root::IO calls next_line
> iteratively, but eventually the whole sequence object will get put
> into RAM as it is built up.
> zcat or bzcat can also be used for gzipped and bzipped files
> respectively; I like to use this where I want to keep the disk space
> footprint down.
>
> Because we usually treat data input as coming from a stream, ignoring
> whether it is in a file or not, we would need a more flexible structure
> to really handle this, although I'd argue the data really belongs in a
> database when it is too big for memory.
> More compact Feature/Location objects would probably also help here.
> I would not be surprised if the memory requirement has more to do
> with the number of features than the length of the sequence - human
> chromosome 1 can fit into memory just fine on most machines with 2GB
> of RAM.
>
> But it would require someone taking an interest in some
> re-architecting here.
>
> -jason
>
> On Dec 19, 2007, at 9:59 PM, Michael Thon wrote:
>
> >
> > On Dec 18, 2007, at 7:04 PM, Stefano Ghignone wrote:
> >
> >> my $in = Bio::SeqIO->new(-file   => "/bin/gunzip -c $infile |",
> >>                          -format => 'EMBL');
> >
> > This is just for the sake of curiosity, since you already found a
> > solution to your problem, but I wonder how Perl will handle a file
> > opened this way. Will it try to suck the whole thing into RAM in
> > one go?
> >
> > Mike
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
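
Following up on the piping and streaming points above, a rough sketch of
that approach might look like this (the bzipped file name is just a
placeholder). Opening the decompressor as a pipe means Perl reads its
output line by line, and only the current sequence object is held in
memory on each pass through the loop:

  use strict;
  use warnings;
  use Bio::SeqIO;

  my $infile = 'mydb.embl.bz2';    # placeholder name

  # bzcat writes decompressed EMBL text to the pipe; Bio::SeqIO reads
  # it as a stream, one record at a time.
  my $in  = Bio::SeqIO->new(-file   => "bzcat $infile |",
                            -format => 'EMBL');
  my $out = Bio::SeqIO->new(-fh     => \*STDOUT,
                            -format => 'fasta');

  while (my $seq = $in->next_seq) {
      # only this record is fully built in RAM at this point
      $out->write_seq($seq);
  }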