[Open-bio-l] Fwd: [BioSQL-l] Taxa

Hilmar Lapp hlapp@gnf.org
Fri, 30 Aug 2002 15:05:36 -0700


No there's still no tree. I actually came up with a very similar 
taxon tree for our local biosql build, a table with a FK to itself 
as the FK to the parent node. I kept bioentry-taxon as a simple FK 
constraint though, no n-n association.

The problem with the tree reaches farther: SQL essentially is bad at 
traversing trees. In Oracle you have CONNECT BY which traverses in 
one direction (would still be good enough here), but is otherwise 
quite restrictive as to when you can use it (no joins, no views, 
etc). I don't know what MySQL and Postgres have to offer in this 
regard (I haven't seen anything in this regard in MySQL; Postgres 
may have something through user implemented types or so). Some 
people have come up with some tricks, but those require maintenance 
of the tree nodes in the schema upon every insert/delete (may still 
be reasonable for Taxon).

So the question is do you want to query the tree in SQL or in 
software. For the latter you wouldn't have to provide a lot of 
support in the schema I'd think, that's rather a matter of what the 
object allows you to do. Since we're going to have limited graph 
support pretty soon (Ontologies), one could expand that to have 
TaxonNodes, but that's really independent of the adaptors, too.

	-hilmar

Begin forwarded message:

> From: Matthew Pocock <matthew_pocock@yahoo.co.uk>
> Date: Fri Aug 30, 2002  12:31:18  PM US/Pacific
> To: Hilmar Lapp <hlapp@gnf.org>
> Cc: "OBDA (E-mail)" <open-bio-l@open-bio.org>, Biosql <biosql-
> l@open-bio.org>
> Subject: Re: [Open-bio-l] Fwd: [BioSQL-l] Taxa
>
> Hi Hilmar,
>
> Unique key constraints are cool. I'm all for that. The current taxa 
> scheima seems a shame to me. I may be out of sync with the way 
> things are done now, but in feb/march there was no taxa tree. There 
> was no way to walk up and down the organism classification 
> hierachy. That made it difficult to ask questions like "find 
> accessions for all species that are cousins of rat".
>
> I see that in some cases all you want is a (normalized?) table of 
> full organism strings. In other cases you want the full tree (one 
> entity per node). Is this a case where BioSQL should just put it's 
> hands up and say that there is a sequence/feature scheima & 
> adapter, a range of possible taxa scheimas & adapters, and a 
> service (or adapter) that bridges the two - getTaxaForSeqID and 
> getSeqIDForTaxa?
>
> Matthew
>
> ps Taxa, Taxon - yeah, I got this wrong for the BioJava code. +1 
> for changing plural to singular
>
> pps I have mySQL schemas for taxon support if you are interested 
> but it's not rocket science - one table for taxon instance, 
> properties & pairent pointer, and one table for taxon_bioentry 
> associations
>
> Hilmar Lapp wrote:
>> Hmm. I could take zero response as consent to all changes 
>> proposed -- would be great actually. To make double sure those who 
>> might have wanted to see this proposal did see it I'm resending 
>> and cross-posting it ... Shout now if you're concerned or don't 
>> complain later.
>> Also, if this is the first time you see this and it is important 
>> to you, you're probably not on the biosql mailing list although 
>> you should be. Please subscribe (see footer for list info) since 
>> this is the last time I'm going to cross-post.
>>     -hilmar
>> Begin forwarded message:
>>> From: "Hilmar Lapp" <hlapp@gnf.org>
>>> Date: Wed Aug 28, 2002  02:23:41  PM US/Pacific
>>> To: "Biosql" <biosql-l@open-bio.org>
>>> Subject: [BioSQL-l] Taxa
>>>
>>> I propose to make the following changes to tables Taxa and
>>> Bioentry_Taxa, ranked by decreasing priority (still, I'd like to
>>> make all of these changes).
>>>
>>> 1) Introduce UK constraints. This makes it much safer (and
>>> potentially faster due to amenability to caching) to find records).
>>> My suggestions for UK candidates are common_name, full_binomial (see
>>> 2.), and ncbi_taxa_id. (There are no UKs yet.)
>>>
>>> 2) Introduce an additional column full_binomial ('full' in the
>>> bioperl Bio::Species sense: with ssp if applicable). I think this is
>>> going to be searched against quite frequently, and I just don't
>>> think it's very practical to always extract this out of full_lineage.
>>>
>>> 3) Collapse Taxa and Bioentry_Taxa into one and add Taxa_Id as a
>>> nullable FK to Bioentry. Having this additional association table
>>> doesn't add any functionality we'd like to use, but instead can only
>>> decrease performance and enforceability. Also, it necessitates
>>> otherwise pointless code in the adaptors.
>>>
>>> 4) For consistency, rename Taxa and all Taxa related columns to
>>> Taxon (Taxa is plural, all other table names are singular).
>>>
>>> What do people think? Any outcries? If I make these changes,
>>> adaptors in different projects will have to be changed (I'll change
>>> the ones involved in Bioperl-db).
>>>
>>>     -hilmar
>>> -- -------------------------------------------------------------
>>> Hilmar Lapp                            email: lapp at gnf.org
>>> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
>>> -------------------------------------------------------------
>>>
>>>
>>> _______________________________________________
>>> BioSQL-l mailing list
>>> BioSQL-l@open-bio.org
>>> http://open-bio.org/mailman/listinfo/biosql-l
>>>
>> -- -------------------------------------------------------------
>> Hilmar Lapp                            email: lapp at gnf.org
>> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
>> -------------------------------------------------------------
>> _______________________________________________
>> Open-Bio-l mailing list
>> Open-Bio-l@open-bio.org
>> http://open-bio.org/mailman/listinfo/open-bio-l
>
>
> -- BioJava Consulting LTD - Support and training for BioJava
> http://www.biojava.co.uk
>
> __________________________________________________
> Do You Yahoo!?
> Everything you'll ever need on one web page
> from News and Sport to Email and Music Charts
> http://uk.my.yahoo.com
>
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------