[Bioperl-l] bioperl based database infrastructure for directed graphs
Chris Fields
cjfields at uiuc.edu
Wed Jan 9 15:00:38 UTC 2008
On Jan 9, 2008, at 7:59 AM, Sendu Bala wrote:
> Robson Francisco de Souza wrote:
>> Before starting, I would like to know if the BioSQL and Chado
>> schemata have accelerators for querying intervals among billions of
>> features and feature relationships (some examples using these
>> databases would also help, if they show that these databases are
>> efficient for such tasks).
>> If these or other databases are not as suitable as
>> Bio::DB::SeqFeature
>> for feature retrieval based on interval overlap and attributes,
>
> I'm using Bio::DB::SeqFeature for that purpose, but just a warning:
> I found that with millions of features it made a db that was too
> large in terms of disc space and too slow in terms of query time. I
> had to hack out its storage of feature objects in the db, instead
> generating feature objects on request from the stored attributes.
> Doing this turned out to be faster than simply unfreezing certain
> kinds of feature objects!
Would these be Bio::SeqFeature::Annotated objects? If so, I bet Storable
is storing the OntologyStore object information along with each feature
(which argues for refactoring the FeatureIO/Bio::SeqFeature::Annotated
code in 1.7). Not sure what can be done about that beyond your hack,
though it might be worth exploring whether Bio::DB::SeqFeature::Store
could make storing the object instance optional.
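
A rough illustration of why freezing whole objects can balloon the
database, written in Python with pickle standing in for Perl's Storable
(both serialize everything reachable from the object). The OntologyStore
and Feature classes below are hypothetical stand-ins, not the BioPerl
classes, and the sizes are only indicative:

```python
import pickle


class OntologyStore:
    """Hypothetical stand-in for a large shared object referenced by every feature."""

    def __init__(self, n_terms=5000):
        self.terms = {f"SO:{i:07d}": f"term_{i}" for i in range(n_terms)}


class Feature:
    """Hypothetical feature that keeps a reference to the shared store."""

    def __init__(self, seq_id, start, end, type_id, store):
        self.seq_id = seq_id
        self.start = start
        self.end = end
        self.type_id = type_id
        self.store = store  # serialized along with the feature if frozen whole

    def to_row(self):
        # Only the plain attributes, without the shared store.
        return (self.seq_id, self.start, self.end, self.type_id)

    @classmethod
    def from_row(cls, row, store):
        # Rebuild the object on request and reattach the shared store.
        return cls(*row, store)


store = OntologyStore()
feat = Feature("chr1", 100, 200, "SO:0000704", store)

frozen_whole = pickle.dumps(feat)          # drags the whole store along
frozen_row = pickle.dumps(feat.to_row())   # attributes only

rebuilt = Feature.from_row(pickle.loads(frozen_row), store)

print("whole object:", len(frozen_whole), "bytes")
print("attributes only:", len(frozen_row), "bytes")
```

The rebuilt feature reattaches the shared store at load time, which is
roughly the shape of the workaround described above. Whether
Bio::SeqFeature::Annotated really drags the ontology store along is
still a guess on my part.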
> (I also had to hack in support for retrieval by source, a patch that
> Lincoln hasn't gotten back to me about yet.)
>
> While I can't answer your main questions, I wish you good luck with
> your project and request that you keep us posted with what you
> achieve.
You can always try Lincoln on the GBrowse list as well. I would say
go ahead and commit the patch if it isn't a big deal.
chris