[Biojava-l] unified consolidated annotation system

Kenny Yu kyu at biodiscovery.com
Wed Apr 30 10:45:28 EDT 2003


Both BioSQL and BioDAS are concerned with retrieving annotation on the basis of known features. They are not optimized for large-scale cross-system queries or for data mining on the basis of annotations.

I am interested in consolidating annotations from various sources (such as GenBank and UniGene) into a unified form on which I can run queries like "find features that are linked to DNA repair function in human but not in mouse" and "find features whose molecular functions overlap with those of Accession Number nnnn". Similar mechanisms may exist in LION's SRS (http://www.lionbioscience.com/solutions/products/srs/relational).

What I am contemplating is essentially a data warehouse for annotations, and I'll borrow data warehouse and OLAP techniques in the design. The design is a hybrid of relational, nested-relational, and multidimensional database models. The feature entity (or gene) and the annotation are the dimensions. Annotation values can be text as well as binary objects such as pathway diagrams.

I intend to make my work open-source and would call it UCAsql. Currently the schema design, with some documentation and a Java API, is available to anyone upon request by email to me at kyu at biodiscovery.com.
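To make the two example queries concrete, here is a minimal Python sketch over a toy in-memory table. It is not the UCAsql schema or its Java API: the row layout (feature, organism, annotation type, value), the function names, and the gene names are all made up for illustration, and a real warehouse would run these as database queries rather than set operations in memory.

```python
from collections import defaultdict

# Hypothetical consolidated annotation rows:
# (feature, organism, annotation_type, value). All data is invented.
FACTS = [
    ("geneA", "human", "function", "DNA repair"),
    ("geneA", "mouse", "function", "DNA repair"),
    ("geneB", "human", "function", "DNA repair"),
    ("geneB", "mouse", "function", "cell cycle"),
    ("geneC", "human", "function", "cell cycle"),
    ("geneC", "human", "function", "DNA repair"),
    ("geneD", "human", "function", "apoptosis"),
]


def features_with(facts, organism, value, annotation_type="function"):
    """Return the set of features carrying `value` in the given organism."""
    return {
        feature
        for feature, org, a_type, a_value in facts
        if org == organism and a_type == annotation_type and a_value == value
    }


def linked_in_one_not_other(facts, value, present_in, absent_in):
    """Features annotated with `value` in one organism but not in another."""
    return features_with(facts, present_in, value) - features_with(
        facts, absent_in, value
    )


def overlapping_functions(facts, accession, annotation_type="function"):
    """Map every other feature to the values it shares with `accession`."""
    values_by_feature = defaultdict(set)
    for feature, _org, a_type, a_value in facts:
        if a_type == annotation_type:
            values_by_feature[feature].add(a_value)

    target = values_by_feature.get(accession, set())
    overlaps = {}
    for feature, values in values_by_feature.items():
        shared = values & target
        if feature != accession and shared:
            overlaps[feature] = shared
    return overlaps


if __name__ == "__main__":
    print(sorted(linked_in_one_not_other(FACTS, "DNA repair", "human", "mouse")))
    print(overlapping_functions(FACTS, "geneB"))
```

The first query is a set difference between two slices of the annotation dimension, and the second is a set intersection per feature. Those are the kinds of operations I would want the warehouse to answer directly across sources.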


