[BioSQL-l] The taxon table

Yves Bastide Yves.Bastide at irisa.fr
Fri Jun 27 17:37:42 EDT 2003


Hilmar Lapp wrote:
> 
> On Thursday, June 26, 2003, at 02:25  AM, Yves Bastide wrote:
> 
>> Hilmar or Aaron, care to explain how they work? (I would have just 
>> experimented, but I already don't understand the examples that you, 
>> Hilmar, posted on June 1st)
>>
> 
> Hm - what did I post on June 1?

An extract of the table.  I had assumed that left_value and right_value 
were related to the taxon id.

> 
> left_value and right_value implement the nested set solution as proposed 
> by Joe Celko. Google for Joe Celko Nested Set and you'll find something. 
> Also, there is his well-regarded book 'SQL for Smarties' where it's 
> described. Aaron also wrote that up applied to taxon trees in his 
> O'ReillyNet article at
> 
>     http://www.oreillynet.com/pub/a/network/2002/11/27/bioconf.html
> 
> Hope this helps,

Yep, thanks.  Jeez, and I've been saying I knew relational databases all 
these years :-)
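
For anyone else who was puzzled: below is a minimal sketch of the nested 
set idea, using a made-up toy tree rather than real NCBI data. The 
function names and the tree are assumptions for illustration only, not 
BioSQL code. The point is that left_value and right_value come from a 
depth-first walk counter and have nothing to do with the taxon id.

```python
# Toy illustration of the nested set model (hypothetical names, not BioSQL code).

def assign_nested_set(children, root):
    """Walk the tree depth-first and return {node: (left_value, right_value)}.

    A counter is incremented once when a node is entered (left_value)
    and once when it is left (right_value).
    """
    values = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        left = counter
        for child in children.get(node, []):
            visit(child)
        counter += 1
        values[node] = (left, counter)

    visit(root)
    return values


def descendants(values, node):
    """All nodes whose interval lies strictly inside the interval of `node`.

    In SQL this corresponds to a range condition on left_value/right_value
    instead of a recursive walk over parent ids.
    """
    left, right = values[node]
    return sorted(n for n, (l, r) in values.items() if left < l and r < right)


# Made-up example tree: parent -> list of children.
toy_tree = {
    "root": ["Bacteria", "Eukaryota"],
    "Eukaryota": ["Metazoa", "Fungi"],
}

toy_values = assign_nested_set(toy_tree, "root")
```

With these values, a leaf always has right_value = left_value + 1, and 
the whole subtree of a node can be fetched with a single range 
comparison.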

The link to Aaron's article would be a nice addition; something like the 
attached patch.

A few more things:

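1. If taxon_id is set manually (as in load_ncbi_taxonomy.pl), the 
taxon_pk_seq sequence must be updated too; otherwise later inserts that 
rely on the nextval() default can collide with existing ids.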

2. On my PostgreSQL installation, --chunksize=40000 and --chunksize=0 
give the same performance (15 min on a 2 GHz P4).  Perhaps because I set 
max_fsm_relations to 1000 instead of 100?

3. I've hacked together a crude load_tax.py script. It's 
PostgreSQL-specific (using COPY FROM), meant to be used on empty taxon* 
tables, and runs in about 3-4 min.  Is anyone interested?
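
To make points 1 and 3 concrete, here is a rough sketch of the two 
pieces involved. It is not the load_tax.py script itself; the helper 
names are hypothetical, and it only builds strings so that it can be 
tried without a database. It assumes PostgreSQL's default text format 
for COPY (tab-separated fields, \N for NULL) and the taxon_pk_seq 
sequence from the schema.

```python
# Hypothetical helpers, for illustration only (not the actual load_tax.py).

def copy_line(row):
    """Format one row for PostgreSQL's COPY ... FROM text format.

    Fields are tab-separated, None becomes \\N, and backslash, tab,
    newline and carriage return inside a value are backslash-escaped.
    """
    def field(value):
        if value is None:
            return r"\N"
        return (
            str(value)
            .replace("\\", "\\\\")
            .replace("\t", "\\t")
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )

    return "\t".join(field(v) for v in row) + "\n"


def setval_statement(sequence="taxon_pk_seq", table="taxon", column="taxon_id"):
    """Build a statement that moves the sequence to the largest existing key.

    After setval(seq, n), the next nextval(seq) returns n + 1, so ids
    generated by the column default no longer clash with ids that were
    set by hand. COALESCE covers the empty-table case.
    """
    return (
        f"SELECT setval('{sequence}', "
        f"(SELECT COALESCE(MAX({column}), 1) FROM {table}));"
    )
```

The lines from copy_line() would be streamed to the server with COPY 
FROM (for instance through a driver's copy support, if yours has one), 
and the statement from setval_statement() run once after the load.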

> 
>     -hilmar
> 

Regards,

yves
-------------- next part --------------
Index: biosqldb-pg.sql
===================================================================
RCS file: /home/repository/biosql/biosql-schema/sql/biosqldb-pg.sql,v
retrieving revision 1.26
diff -u -p -r1.26 biosqldb-pg.sql
--- biosqldb-pg.sql	2003/06/05 03:15:14	1.26
+++ biosqldb-pg.sql	2003/06/27 12:36:24
@@ -35,7 +35,10 @@ CREATE INDEX db_auth on biodatabase ( au
 -- an optional extra line, as many flat file formats do not have the NCBI id 
 -- 
 -- no organelle/sub species 
--- corresponds to the node table of the NCBI taxonomy databaase 
+-- corresponds to the node table of the NCBI taxonomy database 
+-- left_value, right_value implement a nested sets model;
+-- see http://www.oreillynet.com/pub/a/network/2002/11/27/bioconf.html
+-- or Joe Celko's 'SQL for smarties' for more information.
 CREATE SEQUENCE taxon_pk_seq;
 CREATE TABLE taxon ( 
 	 taxon_id INTEGER DEFAULT nextval ( 'taxon_pk_seq' ) NOT NULL , 

