[BioSQL-l] left_value and right_value in taxon table
aaron.j.mackey at gsk.com
Tue Apr 8 15:58:56 UTC 2008
I believe that the first thing the load_ncbi_taxonomy.pl script does is to
wipe out everything already in the table. So your incremental update
strategy (with deferred left/right calculation) won't work.
Depending on the type of update you're making (e.g. you only add one new
terminal taxonomic node, having no children), incremental updates are
pretty fast, computationally speaking (no tree traversal is required). I
can't recite the statements off the top of my head, but Joe Celko's "SQL
for Smarties" book has the necessary code. In a nutshell: as long as the
overall topology of the tree remains unchanged, you increment by 2 the
left/right values of each node "to the right" of the new node you've
inserted, but it's a tiny bit more complicated than that.
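
The leaf-insertion case described above can be sketched roughly as
follows. This is only an illustration, not the code from Celko's book and
not what load_ncbi_taxonomy.pl does. It assumes a simplified, in-memory
taxon table with just the taxon_id, parent_taxon_id, left_value and
right_value columns, and add_leaf is a hypothetical helper name.

```python
import sqlite3


def add_leaf(conn, parent_id, new_id):
    """Insert new_id as the last child of parent_id (hypothetical helper).

    Only handles a new terminal node with no children of its own.
    """
    (parent_right,) = conn.execute(
        "SELECT right_value FROM taxon WHERE taxon_id = ?", (parent_id,)
    ).fetchone()
    # Open a gap of width 2 at the parent's right edge: the parent, its
    # ancestors, and every node "to the right" get their values shifted.
    conn.execute(
        "UPDATE taxon SET right_value = right_value + 2 WHERE right_value >= ?",
        (parent_right,),
    )
    conn.execute(
        "UPDATE taxon SET left_value = left_value + 2 WHERE left_value > ?",
        (parent_right,),
    )
    # The new leaf occupies the gap.
    conn.execute(
        "INSERT INTO taxon (taxon_id, parent_taxon_id, left_value, right_value)"
        " VALUES (?, ?, ?, ?)",
        (new_id, parent_id, parent_right, parent_right + 1),
    )


# Toy tree (assumed data): root 1 with two leaf children, 2 and 3.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, parent_taxon_id INTEGER,"
    " left_value INTEGER, right_value INTEGER)"
)
conn.executemany(
    "INSERT INTO taxon VALUES (?, ?, ?, ?)",
    [(1, None, 1, 6), (2, 1, 2, 3), (3, 1, 4, 5)],
)

add_leaf(conn, parent_id=2, new_id=4)
rows = {
    tid: (left, right)
    for tid, left, right in conn.execute(
        "SELECT taxon_id, left_value, right_value FROM taxon"
    )
}
```

After the call, node 4 sits inside node 2's interval and node 3 has been
shifted right by 2. If the real schema enforces uniqueness on left_value
or right_value, the bulk updates may hit transient collisions on some
databases and would need extra care; the sketch ignores that.
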
-Aaron
biosql-l-bounces at lists.open-bio.org wrote on 04/08/2008 11:24:41 AM:
> > > Dear all,
> > >
> > > I hope that I am not the 100th person asking the following
> > > questions:
> > > 1) what are left and right values in the taxon table for?
> > >
> >
> > They hold the nested set values. Nested sets are an enumeration
> > algorithm described in Joe Celko's SQL for Smarties books, and Aaron
> > Mackey gives a good introduction here:
> >
> > http://www.oreillynet.com/pub/a/network/2002/11/27/bioconf.html
> >
> > (This is in the schema DDL file, though obviously should be
> > documented better. Good candidate for an FAQ, I suppose.)
>
> That link does a good job of explaining the idea.
>
> > > 2) How are they computed?
> >
> > load_ncbi_taxonomy.pl recomputes them automatically after each
> > update. It's a simple recursive depth-first graph traversal algorithm.
>
> I have the impression the recomputation is slow, and also moderately
> complex. This is fine for a weekly (or even daily) update which runs
> the load_ncbi_taxonomy.pl script.
>
> We (Biopython) are interested in incremental updates triggered when a
> new sequence is added to the database with a novel taxon id. Eric is
> looking at downloading the missing taxon data and updating the
> taxon/taxon_name tables "on the fly", transparently to the user.
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2475 (Biopython bug)
>
> Hilmar, am I right in thinking the following: Suppose when loading a
> new sequence into the database with a novel NCBI taxon, we record a
> new minimal taxon/taxon_name entry (without the lineage, a single
> taxon entry with null left/right entries). If the user then runs
> load_ncbi_taxonomy.pl, assuming the NCBI's online database contains
> the new taxon, will this update nicely? i.e. When the new sequence is
> retrieved from the database, its full lineage will be available.
>
> Thanks
>
> Peter
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>