[Open-bio-l] Fwd: [BioSQL-l] Taxa
Hilmar Lapp
hlapp@gnf.org
Fri, 30 Aug 2002 15:05:36 -0700
No there's still no tree. I actually came up with a very similar
taxon tree for our local biosql build, a table with a FK to itself
as the FK to the parent node. I kept bioentry-taxon as a simple FK
constraint though, no n-n association.
The problem with the tree reaches farther: SQL essentially is bad at
traversing trees. In Oracle you have CONNECT BY which traverses in
one direction (would still be good enough here), but is otherwise
quite restrictive as to when you can use it (no joins, no views,
etc). I don't know what MySQL and Postgres have to offer in this
regard (I haven't seen anything in this regard in MySQL; Postgres
may have something through user implemented types or so). Some
people have come up with some tricks, but those require maintenance
of the tree nodes in the schema upon every insert/delete (may still
be reasonable for Taxon).
So the question is do you want to query the tree in SQL or in
software. For the latter you wouldn't have to provide a lot of
support in the schema I'd think, that's rather a matter of what the
object allows you to do. Since we're going to have limited graph
support pretty soon (Ontologies), one could expand that to have
TaxonNodes, but that's really independent of the adaptors, too.
-hilmar
Begin forwarded message:
> From: Matthew Pocock <matthew_pocock@yahoo.co.uk>
> Date: Fri Aug 30, 2002 12:31:18 PM US/Pacific
> To: Hilmar Lapp <hlapp@gnf.org>
> Cc: "OBDA (E-mail)" <open-bio-l@open-bio.org>, Biosql <biosql-
> l@open-bio.org>
> Subject: Re: [Open-bio-l] Fwd: [BioSQL-l] Taxa
>
> Hi Hilmar,
>
> Unique key constraints are cool. I'm all for that. The current taxa
> scheima seems a shame to me. I may be out of sync with the way
> things are done now, but in feb/march there was no taxa tree. There
> was no way to walk up and down the organism classification
> hierachy. That made it difficult to ask questions like "find
> accessions for all species that are cousins of rat".
>
> I see that in some cases all you want is a (normalized?) table of
> full organism strings. In other cases you want the full tree (one
> entity per node). Is this a case where BioSQL should just put it's
> hands up and say that there is a sequence/feature scheima &
> adapter, a range of possible taxa scheimas & adapters, and a
> service (or adapter) that bridges the two - getTaxaForSeqID and
> getSeqIDForTaxa?
>
> Matthew
>
> ps Taxa, Taxon - yeah, I got this wrong for the BioJava code. +1
> for changing plural to singular
>
> pps I have mySQL schemas for taxon support if you are interested
> but it's not rocket science - one table for taxon instance,
> properties & pairent pointer, and one table for taxon_bioentry
> associations
>
> Hilmar Lapp wrote:
>> Hmm. I could take zero response as consent to all changes
>> proposed -- would be great actually. To make double sure those who
>> might have wanted to see this proposal did see it I'm resending
>> and cross-posting it ... Shout now if you're concerned or don't
>> complain later.
>> Also, if this is the first time you see this and it is important
>> to you, you're probably not on the biosql mailing list although
>> you should be. Please subscribe (see footer for list info) since
>> this is the last time I'm going to cross-post.
>> -hilmar
>> Begin forwarded message:
>>> From: "Hilmar Lapp" <hlapp@gnf.org>
>>> Date: Wed Aug 28, 2002 02:23:41 PM US/Pacific
>>> To: "Biosql" <biosql-l@open-bio.org>
>>> Subject: [BioSQL-l] Taxa
>>>
>>> I propose to make the following changes to tables Taxa and
>>> Bioentry_Taxa, ranked by decreasing priority (still, I'd like to
>>> make all of these changes).
>>>
>>> 1) Introduce UK constraints. This makes it much safer (and
>>> potentially faster due to amenability to caching) to find records).
>>> My suggestions for UK candidates are common_name, full_binomial (see
>>> 2.), and ncbi_taxa_id. (There are no UKs yet.)
>>>
>>> 2) Introduce an additional column full_binomial ('full' in the
>>> bioperl Bio::Species sense: with ssp if applicable). I think this is
>>> going to be searched against quite frequently, and I just don't
>>> think it's very practical to always extract this out of full_lineage.
>>>
>>> 3) Collapse Taxa and Bioentry_Taxa into one and add Taxa_Id as a
>>> nullable FK to Bioentry. Having this additional association table
>>> doesn't add any functionality we'd like to use, but instead can only
>>> decrease performance and enforceability. Also, it necessitates
>>> otherwise pointless code in the adaptors.
>>>
>>> 4) For consistency, rename Taxa and all Taxa related columns to
>>> Taxon (Taxa is plural, all other table names are singular).
>>>
>>> What do people think? Any outcries? If I make these changes,
>>> adaptors in different projects will have to be changed (I'll change
>>> the ones involved in Bioperl-db).
>>>
>>> -hilmar
>>> -- -------------------------------------------------------------
>>> Hilmar Lapp email: lapp at gnf.org
>>> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
>>> -------------------------------------------------------------
>>>
>>>
>>> _______________________________________________
>>> BioSQL-l mailing list
>>> BioSQL-l@open-bio.org
>>> http://open-bio.org/mailman/listinfo/biosql-l
>>>
>> -- -------------------------------------------------------------
>> Hilmar Lapp email: lapp at gnf.org
>> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
>> -------------------------------------------------------------
>> _______________________________________________
>> Open-Bio-l mailing list
>> Open-Bio-l@open-bio.org
>> http://open-bio.org/mailman/listinfo/open-bio-l
>
>
> -- BioJava Consulting LTD - Support and training for BioJava
> http://www.biojava.co.uk
>
> __________________________________________________
> Do You Yahoo!?
> Everything you'll ever need on one web page
> from News and Sport to Email and Music Charts
> http://uk.my.yahoo.com
>
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------