[BioRuby] [GSoC][NeXML and RDF API] Code Review.

Pjotr Prins pjotr.public14 at thebird.nl
Fri Jun 25 07:42:13 UTC 2010


I think this needs to be answered by Rutger. Are we going to face
NeXML files in the future that can easily outrun memory?

Pj.

On Fri, Jun 25, 2010 at 01:04:21PM +0530, Anurag Priyam wrote:
> > How much time would it cost you to stream the data - and what does it
> > mean with regard to changing the API? I guess, in general, NeXML
> > files won't be that large, so it may not be that important (Rutger)?
> >
> > Pj.
> >
> >
> I mean switching the parsing implementation from "parsing at the start" to
> streaming, not changing the API. Using the Reader API instead of the DOM API
> would simply make that switch easier. Even if we do not switch, the Reader
> API is more memory efficient than the DOM API.
> 
> Btw, I am not in favour of the switch. You cannot move backwards in the
> document that way: I cannot fetch a tree by id if the cursor is already
> past that tree. Doing both nexml.each_characters and nexml.each_trees is
> impossible with pure streaming; I would have to stream one while caching
> the other. Otus and otu have a one-to-many relation with trees, characters,
> and rows. An API call such as otus.trees, otus.characters, or otu.sequences
> would be impossible (not that I have added those calls yet). Imo, NeXML is
> non-linear and not meant to be streamed. Besides, other NeXML
> implementations also parse the file at the start.
> 
> -- 
> Anurag Priyam,
> 2nd Year Undergraduate,
> Department of Mechanical Engineering,
> IIT Kharagpur.
> +91-9775550642
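
The trade-off discussed in the quoted message can be sketched in Ruby. This
is an illustration only, under several assumptions: it uses REXML as a
stand-in for the Reader and DOM APIs mentioned in the thread, the sample
document is a heavily simplified, hypothetical NeXML-like fragment, and
IdListener, stream_ids and trees_for_otus are made-up names that are not part
of the BioRuby NeXML code.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical, simplified NeXML-like fragment (not a valid NeXML file).
NEXML_SAMPLE = <<~XML
  <nexml>
    <otus id="taxa1">
      <otu id="t1"/>
      <otu id="t2"/>
    </otus>
    <trees id="trees1" otus="taxa1">
      <tree id="tree1"/>
      <tree id="tree2"/>
    </trees>
    <characters id="chars1" otus="taxa1"/>
  </nexml>
XML

# Streaming: each element is seen once, in document order. Nothing is kept
# unless the listener stores it, so there is no way to "go back" to a tree.
class IdListener
  include REXML::StreamListener
  attr_reader :seen

  def initialize
    @seen = []
  end

  def tag_start(name, attrs)
    id = attrs.to_h['id']
    @seen << [name, id] if id
  end
end

# Returns [element name, id] pairs in the order the stream delivers them.
def stream_ids(xml)
  listener = IdListener.new
  REXML::Document.parse_stream(xml, listener)
  listener.seen
end

# DOM: the whole document is in memory, so a cross reference such as
# "all trees that belong to this otus block" can be resolved at any time.
def trees_for_otus(doc, otus_id)
  REXML::XPath.match(doc, "//trees[@otus='#{otus_id}']/tree")
              .map { |tree| tree.attributes['id'] }
end

doc = REXML::Document.new(NEXML_SAMPLE)
puts stream_ids(NEXML_SAMPLE).inspect
puts trees_for_otus(doc, 'taxa1').inspect
```

With the streaming listener, a call in the style of otus.trees would need
either a second pass over the file or a cache of everything seen so far,
which is the cost weighed in the message above.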


