[BioSQL-l] PhyloDB module updates

Thu Feb 14 03:11:34 UTC 2008

Hi all,

I am proposing the following schema updates to the PhyloDB module.  
Though I have committed these already, PhyloDB is pre-alpha and still  
quite heavily evolving, so changes can be made relatively easily.

   o Added tree_dbxref and node_dbxref tables to allow storing
     identifiers and cross-references for trees and nodes.

Columns:

node_id   'The node to which the database cross-reference is being  
assigned.'

dbxref_id 'The database cross-reference being assigned to the node.'

term_id   'The type of the database cross-reference as a controlled  
vocabulary or ontology term. The type of a node identifier should be  
primary identifier.'

The columns of tree_dbxref are analogous, with tree_id in place of  
node_id.
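
To make the node_dbxref description a bit more concrete, here is a minimal sketch using Python's sqlite3 module with an in-memory database. The column types, the composite primary key, the cut-down node, dbxref and term tables, and the sample rows ('ExampleDB', 'N0001') are all assumptions for illustration; only the three column names and their meaning come from the descriptions above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Cut-down stand-ins for the referenced tables (assumed layout).
CREATE TABLE node   (node_id   INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE dbxref (dbxref_id INTEGER PRIMARY KEY, dbname TEXT, accession TEXT);
CREATE TABLE term   (term_id   INTEGER PRIMARY KEY, name TEXT);

-- Association table with the three columns described above.
-- Types and the primary key are assumptions.
CREATE TABLE node_dbxref (
    node_id   INTEGER NOT NULL REFERENCES node (node_id),
    dbxref_id INTEGER NOT NULL REFERENCES dbxref (dbxref_id),
    term_id   INTEGER NOT NULL REFERENCES term (term_id),
    PRIMARY KEY (node_id, dbxref_id, term_id)
);
""")

# Hypothetical sample data: one node with one primary identifier.
conn.execute("INSERT INTO node VALUES (1, 'example node')")
conn.execute("INSERT INTO dbxref VALUES (1, 'ExampleDB', 'N0001')")
conn.execute("INSERT INTO term VALUES (1, 'primary identifier')")
conn.execute("INSERT INTO node_dbxref VALUES (1, 1, 1)")

# Resolve the cross-references of all nodes, with their type.
node_xrefs = conn.execute("""
    SELECT n.label, d.dbname, d.accession, t.name
    FROM node_dbxref nx
    JOIN node   n ON n.node_id   = nx.node_id
    JOIN dbxref d ON d.dbxref_id = nx.dbxref_id
    JOIN term   t ON t.term_id   = nx.term_id
""").fetchall()
print(node_xrefs)
```

A tree_dbxref table would presumably look the same with tree_id in place of node_id.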

   o Renamed edge_attribute_value and node_attribute_value to use
     'qualifier' instead of 'attribute', for the sake of consistency
     with the core schema (though attribute sounds like the better
     name).

   o Added tree_qualifier_value table to capture metadata for trees.

Columns:

tree_id 'The tree with which the metadata is being associated.'

term_id 'The name of the metadata element as a term from a controlled  
vocabulary (or ontology).'

value   'The value of the metadata element.'

rank    'The index of the metadata value if there is more than one  
value for the same metadata element. If there is only one value, this  
may be left at the default of zero.'
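
As an illustration of how rank is meant to work, the following is a rough sketch in Python with sqlite3. The column types, the uniqueness constraint, the reduced tree and term tables, and the sample metadata are assumptions; the column names and the default of zero for rank are taken from the descriptions above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Cut-down stand-ins for the referenced tables (assumed layout).
CREATE TABLE tree (tree_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE term (term_id INTEGER PRIMARY KEY, name TEXT);

-- Types and the UNIQUE constraint are assumptions.
CREATE TABLE tree_qualifier_value (
    tree_id INTEGER NOT NULL REFERENCES tree (tree_id),
    term_id INTEGER NOT NULL REFERENCES term (term_id),
    value   TEXT,
    rank    INTEGER NOT NULL DEFAULT 0,
    UNIQUE (tree_id, term_id, rank)
);
""")

conn.execute("INSERT INTO tree VALUES (1, 'example tree')")
conn.executemany("INSERT INTO term VALUES (?, ?)",
                 [(1, 'method'), (2, 'author')])

# A single value for 'method': rank is left at its default of zero.
conn.execute(
    "INSERT INTO tree_qualifier_value (tree_id, term_id, value) "
    "VALUES (1, 1, 'maximum likelihood')")

# Two values for 'author': rank distinguishes and orders them.
conn.executemany(
    "INSERT INTO tree_qualifier_value (tree_id, term_id, value, rank) "
    "VALUES (1, 2, ?, ?)",
    [('A. Author', 0), ('B. Author', 1)])

tree_metadata = conn.execute("""
    SELECT t.name, q.value, q.rank
    FROM tree_qualifier_value q
    JOIN term t ON t.term_id = q.term_id
    WHERE q.tree_id = 1
    ORDER BY t.name, q.rank
""").fetchall()
print(tree_metadata)
```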

   o Replaced the 1-n relationships (foreign keys) between bioentry
     and node, and between taxon and node, with n-n relationships
     (association tables). If the alignment is a concatenation of
     molecular data, there may be more than one sequence per node,
     and these may not necessarily be from the same taxon (e.g.,
     they might be from subspecies).

Columns:

node_id  'The node to which the taxon is being linked.'

taxon_id 'The taxon being linked to the node.'

rank     'The index of this taxon within the list of taxa being  
linked to the node, if the order is significant. Typically, this will  
be used to represent the position of the respective sequence within  
the concatenated alignment, or the partition index.'
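
A small sketch of the taxon association in Python with sqlite3 may help here. The table name node_taxon is an assumption (the table is not named above), as are the column types, the constraint, and the placeholder taxon names; the three columns and the role of rank follow the descriptions above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Cut-down stand-ins for the referenced tables (assumed layout).
CREATE TABLE node  (node_id  INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, name TEXT);

-- Hypothetical name for the node-to-taxon association table.
CREATE TABLE node_taxon (
    node_id  INTEGER NOT NULL REFERENCES node (node_id),
    taxon_id INTEGER NOT NULL REFERENCES taxon (taxon_id),
    rank     INTEGER NOT NULL DEFAULT 0,
    UNIQUE (node_id, taxon_id, rank)
);
""")

conn.execute("INSERT INTO node VALUES (1, 'tip A')")
conn.executemany("INSERT INTO taxon VALUES (?, ?)",
                 [(10, 'subspecies one'), (11, 'subspecies two')])

# One node linked to two taxa; rank records the position of the
# respective sequence within the concatenated alignment.
# Rows are inserted out of order on purpose.
conn.executemany("INSERT INTO node_taxon VALUES (?, ?, ?)",
                 [(1, 11, 1), (1, 10, 0)])

taxa_for_node = [row[0] for row in conn.execute("""
    SELECT t.name
    FROM node_taxon nt
    JOIN taxon t ON t.taxon_id = nt.taxon_id
    WHERE nt.node_id = 1
    ORDER BY nt.rank
""")]
print(taxa_for_node)
```

The association between node and bioentry would presumably follow the same pattern.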

   o Added tree_root table to allow storing multiple roots (e.g., as
     resulting from Bayesian analysis). A phylogenetic analysis might
     suggest several alternative root nodes, possibly with
     probabilities.

Columns:

tree_id 'The tree for which the referenced node is a root node.'

node_id 'The node that is a root for the referenced tree.'

is_alternate 'True if the root node is the preferential (most likely)  
root node of the tree, and false otherwise.'

significance 'The significance (such as likelihood, or posterior  
probability) with which the node is the root node. This only has  
meaning if the method used for reconstructing the tree calculates  
this value.'
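
For illustration, a tree with two candidate roots could be stored and queried roughly as follows (Python with sqlite3). The column types, the primary key, the reduced tree and node tables, and the significance values are made up for the example; is_alternate is populated exactly as the column comment above describes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Cut-down stand-ins for the referenced tables (assumed layout).
CREATE TABLE tree (tree_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE node (node_id INTEGER PRIMARY KEY, label TEXT);

-- Types and the primary key are assumptions.
CREATE TABLE tree_root (
    tree_id      INTEGER NOT NULL REFERENCES tree (tree_id),
    node_id      INTEGER NOT NULL REFERENCES node (node_id),
    is_alternate INTEGER NOT NULL DEFAULT 0,  -- boolean stored as 0/1
    significance REAL,                        -- NULL if not calculated
    PRIMARY KEY (tree_id, node_id)
);
""")

conn.execute("INSERT INTO tree VALUES (1, 'example tree')")
conn.executemany("INSERT INTO node VALUES (?, ?)",
                 [(1, 'candidate root 1'), (2, 'candidate root 2')])

# Two alternative roots with made-up posterior probabilities.
# The flag follows the column comment quoted above.
conn.executemany("INSERT INTO tree_root VALUES (?, ?, ?, ?)",
                 [(1, 1, 0, 0.35), (1, 2, 1, 0.65)])

# Pick the most strongly supported root of the tree.
best_root = conn.execute("""
    SELECT node_id, significance
    FROM tree_root
    WHERE tree_id = 1
    ORDER BY significance DESC
    LIMIT 1
""").fetchone()
print(best_root)
```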

Also, as an aside, none of these changes are yet reflected (or  
supported) in any of the scripts that have been written against the  
schema. Once these changes are accepted I'll start working on that  
though, and I'll also write a migration script.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================