[BioRuby] [GSoC][NeXML and RDF API] Code Review.

Pjotr Prins pjotr.public14 at thebird.nl
Sun Jun 27 06:47:31 UTC 2010


Thanks Rutger and Hilmar,

Anurag, let's not load everything in memory.
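
To make that concrete, here is a minimal sketch of an event-based pass over
a NeXML file. It uses REXML's stream listener rather than the Reader API
discussed below, purely because REXML ships with Ruby. The names
NexmlSummary and summarize_nexml are made up for illustration and are not
part of the BioRuby NeXML code. The point is only that memory use depends on
what the listener chooses to keep, not on the size of the file.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener (not BioRuby API): counts <otu> elements and
# records <tree> ids without building a DOM for the whole document.
class NexmlSummary
  include REXML::StreamListener

  attr_reader :otu_count, :tree_ids

  def initialize
    @otu_count = 0
    @tree_ids = []
  end

  # Called once per opening tag; attrs is a Hash of attribute name => value.
  def tag_start(name, attrs)
    # Strip any namespace prefix such as "nex:" before matching.
    case name.sub(/\A.*:/, '')
    when 'otu'  then @otu_count += 1
    when 'tree' then @tree_ids << attrs['id']
    end
  end
end

# Accepts a String or an IO; only the listener's own state is retained.
def summarize_nexml(source)
  listener = NexmlSummary.new
  REXML::Document.parse_stream(source, listener)
  listener
end
```

Something like File.open('trees.xml') { |f| summarize_nexml(f) } (file name
hypothetical) would then read the document incrementally. Random access,
such as fetching a tree by id after the cursor has passed it, would still
need an index or a second pass, which is the trade-off Anurag describes
below.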

Pj.

On Sat, Jun 26, 2010 at 05:30:19PM -0700, Hilmar Lapp wrote:
> Our ability to reconstruct trees of hundreds, thousands, and even tens  
> of thousands of characters has improved dramatically over the past  
> couple of years, and is increasingly often the goal of an analysis.  
> Genome-scale alignments also aren't so rare anymore.
>
> Aside from analysis, NeXML files can be produced by a database, and  
> hence could hold large taxonomies, or the tree of life.
>
> NeXML is an emerging standard. If implementations can't cope with the
> large-scale data that are becoming increasingly common, it'll have a
> hard time getting uptake.
>
> 	-hilmar
>
> On Jun 25, 2010, at 12:42 AM, Pjotr Prins wrote:
>
>> I think this needs to be answered by Rutger. Are we going to face
>> NeXML files in the future that can easily outrun memory?
>>
>> Pj.
>>
>> On Fri, Jun 25, 2010 at 01:04:21PM +0530, Anurag Priyam wrote:
>>>> How much time would it cost you to stream the data - and what does it
>>>> mean with regard to changing the API? I guess, in general, NeXML
>>>> files won't be that large, so it may not be that important (Rutger)?
>>>>
>>>> Pj.
>>>>
>>>>
>>> I mean switching the parsing implementation from "parsing at the start"
>>> to streaming, not changing the API. Using the Reader API instead of the
>>> DOM API would simply make that switch easier. Even if we do not switch,
>>> the Reader API offers a more memory-efficient solution than the DOM API.
>>>
>>> Btw, I am not in favour of the switch. You cannot move backwards in the
>>> document that way: I cannot fetch a tree by id if the cursor is already
>>> ahead of that tree. Doing both nexml.each_characters and nexml.each_trees
>>> is impossible with pure streaming; I would have to stream one while
>>> caching the other. Otus and otu have a one-to-many relation with trees,
>>> characters, and rows. An API call of the type otus.trees, otus.characters,
>>> or otu.sequences would be impossible (not that I have already added such
>>> a call). Imo, NeXML is non-linear and not meant to be streamed. Besides,
>>> other NeXML implementations also parse the file at the start.
>>>
>>> -- 
>>> Anurag Priyam,
>>> 2nd Year Undergraduate,
>>> Department of Mechanical Engineering,
>>> IIT Kharagpur.
>>> +91-9775550642
>
> -- 
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
>
>
>
>


