[BioRuby] [GSoC][NeXML and RDF API] Code Review.
Anurag Priyam
anurag08priyam at gmail.com
Sun Jun 27 08:49:37 UTC 2010
On Sun, Jun 27, 2010 at 2:13 PM, Pjotr Prins <pjotr.public14 at thebird.nl> wrote:
> On Sun, Jun 27, 2010 at 04:45:43PM +0900, Naohisa Goto wrote:
> > Hi,
> >
> > I think the ability to handle large data and the choice of whether
> > or not to load all data into memory at once are essentially
> > independent. Not loading everything into memory does not guarantee
> > the ability to handle large data, due to the disk I/O bottleneck and
> > memory management overhead.
>
> Well, it depends on what you plan to do with that data :). I think you
> are saying that streaming data may not be efficient, for example for
> treating alignments. That could be true. However, I think the default
> strategy should be non-memory bound, if possible. Throughout BioRuby
> the strategy is the opposite, at the moment. For example, by default
> FASTA files are loaded in RAM. Same for BLAST XML. I regularly have
> files that exceed RAM and work around these limitations. I don't think
> this should be the *default* strategy.
>
> I prefer the Unix way of using pipes. Only use memory when it is
> available.
>
> With new code we should design for big data. If it is done from the
> start, it takes no real effort.
>
> > I think it is currently OK to depend on memory. The price of memory
> > is gradually going down, and I think buying a machine with huge
> > memory could be a solution for handling large data.
>
> We cannot all afford big machines. It would hamper many
> groups/students. RAM is getting cheaper, but data is growing faster.
>
> Anurag, what is the size of RAM you have access to?
>
>
3 GB. The biggest sample file I am working with is 500 lines (characters.xml
in the examples); working with it has hardly any effect on memory usage.
Where can I get a bigger one? I can test the memory consumption with a large
enough file and report back.
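A minimal sketch of the two strategies being compared, in plain Ruby
(standard library only; this is not the BioRuby or NeXML parser API, and
the method names are hypothetical). Streaming keeps roughly one line in
memory at a time, while slurping loads the whole file first, so memory use
for the second grows with the file size:

```ruby
require 'tempfile'

# Hypothetical helper: count FASTA records by streaming the file.
# File.foreach yields one line at a time, so memory stays bounded
# by the longest line rather than by the file size.
def count_records_streaming(path)
  count = 0
  File.foreach(path) { |line| count += 1 if line.start_with?('>') }
  count
end

# Hypothetical helper: count FASTA records after slurping the file.
# File.read loads the entire file into a single String in RAM first.
def count_records_slurped(path)
  File.read(path).each_line.count { |line| line.start_with?('>') }
end
```

Running both on a synthetic file of increasing size, while watching the
process's resident memory, would be one rough way to compare the approaches.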
--
Anurag Priyam,
2nd Year Undergraduate,
Department of Mechanical Engineering,
IIT Kharagpur.
+91-9775550642