[BioRuby] GFF3

Tomoaki NISHIYAMA tomoakin at kenroku.kanazawa-u.ac.jp
Wed Aug 18 08:21:24 UTC 2010


Hi,

Here is what the trans-spliced gene rps12 looks like in its genomic
context:
http://www.ncbi.nlm.nih.gov/nuccore/7525012?report=graph&v=60000:170000

> In that case I can store the seekpos of every
> gene/location and use disk access instead.


It should be safe if you scan the data and store, for every gene, the
positions in the GFF file of its first and last records.
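
To make the idea concrete, here is a rough sketch of such a scan. It is
not BioRuby code; the method names are made up for illustration. It
assumes that features carry ID= and/or Parent= attributes in column 9,
that a parent line comes before its children, and it follows only the
first parent when several are listed.

```ruby
# Build { gene_id => [first_offset, last_offset] } from a GFF3 stream.
# first_offset is where the gene's first record starts; last_offset is
# the position just past its last record.
# Assumptions (hypothetical sketch): parents precede their children, and
# only the first Parent= value is followed.
def build_gene_index(io)
  root_of = {} # feature ID => ID of the top-level feature it belongs to
  index   = {} # top-level ID => [first_offset, last_offset]
  loop do
    pos  = io.pos
    line = io.gets or break
    break if line.start_with?('##FASTA')            # sequence section follows
    next if line.start_with?('#') || line.strip.empty?
    attrs  = line.chomp.split("\t")[8].to_s
    id     = attrs[/(?:\A|;)ID=([^;]+)/, 1]
    parent = attrs[/(?:\A|;)Parent=([^;,]+)/, 1]
    root   = parent ? (root_of[parent] || parent) : id
    next unless root
    root_of[id] = root if id
    range = (index[root] ||= [pos, pos])
    range[1] = io.pos                                # extend span to this record
  end
  index
end

# Read back the raw lines between the first and last record of one gene.
def read_gene_span(io, index, gene_id)
  first, last = index.fetch(gene_id)
  io.seek(first)
  io.read(last - first).lines
end
```

With a file opened in binary mode, e.g. File.open(path, 'rb'), the index
is built once and read_gene_span then needs only a seek and a read. For a
trans-spliced gene such as rps12 the span can also contain records of
other genes lying in between, so the returned lines still have to be
filtered.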

> We can not assume that memory expansion keeps up with data load.
> It is fine as an 'optimization', but we should not take it for granted.


The number of genes in a genome does not grow much.
So memory becomes a problem only if you are dealing
with multiple genomes or finer-grained features.

Saving memory is another kind of optimization.
It is good if we can get by with less memory.
I just don't care much as long as the problem fits in the memory I can use
and runs in a reasonable time.

> I avoid RDB (assuming you mean RDBMS, and not the Rwanda Development
> Board), until BioRuby comes with an RDBMS that can be used in a
> transparent fashion. You can not assume every user has an RDBMS
> readily available.

Oh, I meant a relational database. It is for flexibility.
It's just easier for me to use an RDBMS than to think of a new way
to do without it. So that just reflects my way of working.

If you always query by gene name, then a gene-name-to-seekpos
index will be sufficient.
But then I would rather store the parsed data objects in
PStore than parse the GFF file again.
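
As a rough illustration of the PStore approach, assuming a made-up
GeneRecord struct in place of whatever object the parser really
produces (save_genes and load_gene are hypothetical helper names too):

```ruby
require 'pstore'

# Hypothetical stand-in for a parsed GFF record; not a BioRuby class.
GeneRecord = Struct.new(:id, :seqid, :start, :stop, :strand)

# Store each parsed gene under its ID in a PStore file.
def save_genes(path, genes)
  store = PStore.new(path)
  store.transaction do
    genes.each { |gene| store[gene.id] = gene }
  end
end

# Fetch one gene by ID; returns nil if the ID is not stored.
def load_gene(path, gene_id)
  store = PStore.new(path)
  store.transaction(true) { store[gene_id] } # true = read-only transaction
end
```

Note that PStore keeps everything in one marshalled file and loads that
file as a whole in a transaction, so this saves re-parsing the GFF
rather than memory.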
-- 
Tomoaki NISHIYAMA

Advanced Science Research Center,
Kanazawa University,
13-1 Takara-machi,
Kanazawa, 920-0934, Japan



