[BioSQL-l] PhyloDB module updates
Hilmar Lapp
hlapp at gmx.net
Thu Feb 14 03:11:34 UTC 2008
Hi all,
I am proposing the following schema updates to the PhyloDB module.
Although I have committed these already, PhyloDB is pre-alpha and still
evolving quite heavily, so changes can be made relatively easily.
o Added tree_dbxref and node_dbxref tables to allow storing
identifiers and cross-references for trees and nodes.
Columns:
node_id 'The node to which the database cross-reference is being
assigned.'
dbxref_id 'The database cross-reference being assigned to the node.'
term_id 'The type of the database cross-reference as a controlled
vocabulary or ontology term. The type of a node identifier should be
primary identifier.'
The columns of tree_dbxref are analogous, with tree_id in place of
node_id.
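
To make the node_dbxref columns concrete, here is a minimal sketch using
Python's sqlite3 module. It is not the committed DDL: the node, dbxref and
term tables are simplified stand-ins for the real ones, the NOT NULL and
primary key constraints are assumptions, and the 'ExampleDB' row is
made-up data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-ins for the real node, dbxref and term tables.
CREATE TABLE node (node_id INTEGER PRIMARY KEY);
CREATE TABLE dbxref (dbxref_id INTEGER PRIMARY KEY, dbname TEXT, accession TEXT);
CREATE TABLE term (term_id INTEGER PRIMARY KEY, name TEXT);

-- Columns as described above; the constraints are assumptions.
CREATE TABLE node_dbxref (
    node_id   INTEGER NOT NULL REFERENCES node (node_id),
    dbxref_id INTEGER NOT NULL REFERENCES dbxref (dbxref_id),
    term_id   INTEGER NOT NULL REFERENCES term (term_id),
    PRIMARY KEY (node_id, dbxref_id, term_id)
);
""")

# Made-up example data: one node with one primary identifier.
conn.execute("INSERT INTO node VALUES (1)")
conn.execute("INSERT INTO dbxref VALUES (1, 'ExampleDB', 'N0001')")
conn.execute("INSERT INTO term VALUES (1, 'primary identifier')")
conn.execute("INSERT INTO node_dbxref VALUES (1, 1, 1)")

# Look up the primary identifier(s) of node 1.
rows = conn.execute("""
    SELECT d.dbname, d.accession
    FROM node_dbxref nd
    JOIN dbxref d ON d.dbxref_id = nd.dbxref_id
    JOIN term t ON t.term_id = nd.term_id
    WHERE nd.node_id = ? AND t.name = 'primary identifier'
""", (1,)).fetchall()
```

A tree_dbxref table would presumably look the same with tree_id in place
of node_id.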
o Renamed edge_attribute_value and node_attribute_value to use
'qualifier' instead of 'attribute', for the sake of consistency
with the core schema (though attribute sounds like the better
name).
o Added tree_qualifier_value table to capture metadata for trees.
Columns:
tree_id 'The tree with which the metadata is being associated.'
term_id 'The name of the metadata element as a term from a controlled
vocabulary (or ontology).'
value 'The value of the metadata element.'
rank 'The index of the metadata value if there is more than one
value for the same metadata element. If there is only one value, this
may be left at the default of zero.'
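
As an illustration of how rank is meant to work, here is a minimal sqlite3
sketch. The tree and term tables are simplified stand-ins, the UNIQUE
constraint is an assumption, and the term names and values are made-up
data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-ins for the real tree and term tables.
CREATE TABLE tree (tree_id INTEGER PRIMARY KEY);
CREATE TABLE term (term_id INTEGER PRIMARY KEY, name TEXT);

-- Columns as described above; the UNIQUE constraint is an assumption.
CREATE TABLE tree_qualifier_value (
    tree_id INTEGER NOT NULL REFERENCES tree (tree_id),
    term_id INTEGER NOT NULL REFERENCES term (term_id),
    value   TEXT,
    rank    INTEGER NOT NULL DEFAULT 0,
    UNIQUE (tree_id, term_id, rank)
);
""")

conn.execute("INSERT INTO tree VALUES (1)")
conn.executemany("INSERT INTO term VALUES (?, ?)", [(1, "method"), (2, "author")])

# Only one value for 'method': rank is omitted and defaults to zero.
conn.execute(
    "INSERT INTO tree_qualifier_value (tree_id, term_id, value) "
    "VALUES (1, 1, 'maximum likelihood')"
)
# Two values for 'author': rank distinguishes and orders them.
conn.executemany(
    "INSERT INTO tree_qualifier_value VALUES (1, 2, ?, ?)",
    [("A. Author", 0), ("B. Author", 1)],
)

qualifiers = conn.execute("""
    SELECT t.name, q.value, q.rank
    FROM tree_qualifier_value q
    JOIN term t ON t.term_id = q.term_id
    WHERE q.tree_id = 1
    ORDER BY t.name, q.rank
""").fetchall()
```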
o Replaced 1-n relationships (foreign keys) between bioentry and
node and taxon and node, respectively, with n-n relationships
(association tables). If the tree was built from a concatenated
alignment of molecular data, a node may correspond to more than one
sequence, and these may not necessarily be from the same taxon
(e.g., they might be from subspecies).
Columns:
node_id 'The node to which the taxon is being linked.'
taxon_id 'The taxon being linked to the node.'
rank 'The index of this taxon within the list of taxa being
linked to the node, if the order is significant. Typically, this will
be used to represent the position of the respective sequence within
the concatenated alignment, or the partition index.'
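
A minimal sqlite3 sketch of the node-to-taxon association follows. The
table name node_taxon is an assumption (the columns above do not name the
table), the node and taxon tables are simplified stand-ins, and the taxon
names are made-up data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-ins for the real node and taxon tables.
CREATE TABLE node (node_id INTEGER PRIMARY KEY);
CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, name TEXT);

-- Association table; its name and the UNIQUE constraint are assumptions.
CREATE TABLE node_taxon (
    node_id  INTEGER NOT NULL REFERENCES node (node_id),
    taxon_id INTEGER NOT NULL REFERENCES taxon (taxon_id),
    rank     INTEGER NOT NULL DEFAULT 0,
    UNIQUE (node_id, taxon_id, rank)
);
""")

conn.execute("INSERT INTO node VALUES (1)")
conn.executemany(
    "INSERT INTO taxon VALUES (?, ?)",
    [(100, "subspecies A"), (101, "subspecies B")],
)
# One node linked to two taxa; rank records the partition index
# within the concatenated alignment.
conn.executemany(
    "INSERT INTO node_taxon VALUES (1, ?, ?)",
    [(101, 1), (100, 0)],
)

# Taxa for node 1 in partition order.
taxa = conn.execute("""
    SELECT nt.rank, x.name
    FROM node_taxon nt
    JOIN taxon x ON x.taxon_id = nt.taxon_id
    WHERE nt.node_id = ?
    ORDER BY nt.rank
""", (1,)).fetchall()
```

The association between node and bioentry would presumably follow the same
pattern.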
o Added tree_root table to allow storing of multiple roots (e.g., as
resulting from Bayesian analysis). A phylogenetic analysis might
suggest several alternative root nodes, possibly with associated
probabilities.
Columns:
tree_id 'The tree for which the referenced node is a root node.'
node_id 'The node that is a root for the referenced tree.'
is_alternate 'True if the root node is the preferential (most likely)
root node of the tree, and false otherwise.'
significance 'The significance (such as likelihood, or posterior
probability) with which the node is the root node. This only has
meaning if the method used for reconstructing the tree calculates
this value.'
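
To show how several candidate roots could be stored and compared, here is
a minimal sqlite3 sketch. The tree and node tables are simplified
stand-ins, the UNIQUE constraint is an assumption, and the significance
values are made-up posterior probabilities. The sketch leaves is_alternate
unset and orders the roots by significance only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-ins for the real tree and node tables.
CREATE TABLE tree (tree_id INTEGER PRIMARY KEY);
CREATE TABLE node (
    node_id INTEGER PRIMARY KEY,
    tree_id INTEGER NOT NULL REFERENCES tree (tree_id)
);

-- Columns as described above; the UNIQUE constraint is an assumption.
CREATE TABLE tree_root (
    tree_id      INTEGER NOT NULL REFERENCES tree (tree_id),
    node_id      INTEGER NOT NULL REFERENCES node (node_id),
    is_alternate BOOLEAN,
    significance REAL,
    UNIQUE (tree_id, node_id)
);
""")

conn.execute("INSERT INTO tree VALUES (1)")
conn.executemany("INSERT INTO node VALUES (?, 1)", [(10,), (11,), (12,)])

# Three candidate roots for tree 1 with made-up probabilities.
conn.executemany(
    "INSERT INTO tree_root (tree_id, node_id, significance) VALUES (1, ?, ?)",
    [(11, 0.25), (10, 0.70), (12, 0.05)],
)

# All candidate roots of tree 1, most significant first.
roots = conn.execute("""
    SELECT node_id, significance
    FROM tree_root
    WHERE tree_id = ?
    ORDER BY significance DESC
""", (1,)).fetchall()
best_root = roots[0][0]
```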
Also, as an aside, none of these changes are yet reflected (or
supported) in any of the scripts that have been written against the
schema. Once these changes are accepted I'll start working on that
though, and I'll also write a migration script.
-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================