[BioSQL-l] alternative taxonomic hierarchies in BioSQL?

Bánk Beszteri Bank.Beszteri at awi.de
Fri Dec 5 10:23:20 UTC 2008


Hi Hilmar & Peter,

so it looks like PhyloDB is the way to explore this further. I somehow 
missed that point (I had limited myself to thinking within the BioSQL 
core tables).
Thanks for pointing me in this direction, and for all the ideas and insights!

Bank

Hilmar Lapp wrote:
>
> On Dec 4, 2008, at 12:06 PM, Peter wrote:
>
>> On Thu, Dec 4, 2008 at 4:28 PM, Bánk Beszteri <Bank.Beszteri at awi.de> 
>> wrote:
>>>
>>> Dear BioSQLers,
>>>
>>> do I understand correctly that the current BioSQL schema allows only a
>>> single taxonomy per database?
>>
>> Not quite.  If you ignore the fact that the taxon table's external
>> taxonomy ID is explicitly labelled as the ncbi_taxon_id, you could
>> store any taxonomy in the taxon and taxon_name tables.  You could even
>> have multiple independent taxonomies in these tables.
>
> Right. Though it's certainly ugly to call something an ncbi_taxon_id 
> when really it is an ITIS ID, for example.
>
> Aside from that, the load_ncbi_taxonomy.pl script that comes with 
> BioSQL can't really deal with other taxonomies being stored in the 
> taxon tables, too. First, it will consider all nodes that it can't 
> find in NCBI (by ID) as having been obsoleted and will delete them, 
> and even if it somehow failed to do that, it would fail to compute the 
> nested set enumeration for all other taxonomies.
>
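For readers unfamiliar with the term, here is a minimal sketch of a nested set enumeration over a forest that holds two independent taxonomies. It is not the algorithm used by load_ncbi_taxonomy.pl; the node IDs, the function names, and the choice to number all roots in a single pass are assumptions made for illustration. In BioSQL the resulting pair would correspond to the left_value and right_value columns of the taxon table.

```python
from collections import defaultdict


def nested_set(parent_of):
    """Assign (left, right) values to every node of a forest.

    parent_of maps a node ID to its parent ID; None marks a root.
    Several roots are allowed, so several taxonomies can share one numbering.
    """
    children = defaultdict(list)
    roots = []
    for node, parent in parent_of.items():
        if parent is None:
            roots.append(node)
        else:
            children[parent].append(node)

    bounds = {}
    counter = 0
    for root in sorted(roots):
        # Iterative depth-first walk: each node is visited once on the way
        # down (sets left) and once on the way back up (sets right).
        stack = [(root, False)]
        while stack:
            node, leaving = stack.pop()
            counter += 1
            if leaving:
                bounds[node] = (bounds[node][0], counter)
            else:
                bounds[node] = (counter, None)
                stack.append((node, True))
                for child in sorted(children[node], reverse=True):
                    stack.append((child, False))
    return bounds


def is_descendant(bounds, node, ancestor):
    """True if node lies strictly inside ancestor's (left, right) interval."""
    n_left, n_right = bounds[node]
    a_left, a_right = bounds[ancestor]
    return a_left < n_left and n_right < a_right


# Hypothetical IDs: two unrelated trees, prefixed by their source taxonomy.
example = {
    "ncbi:1": None, "ncbi:2": "ncbi:1", "ncbi:3": "ncbi:1",
    "itis:1": None, "itis:2": "itis:1",
}
bounds = nested_set(example)
```

With such a numbering, "all descendants of X" becomes a simple range query, which is why a loader that only enumerates the NCBI tree would leave nodes from any other taxonomy without usable values.
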
> Changing that would basically require namespacing taxon nodes. Though 
> it's an option, it has increasingly struck me as a duplication of what 
> the PhyloDB module provides already (see other comments below), so I 
> am actually less and less in favor of it.
>
> I think the appropriate way to look at the taxon tables is as the 
> reference taxonomy for bioentries (and so calling the identifier 
> ncbi_taxon_id is still bad as it prescribes the NCBI taxonomy as the 
> reference). In this context:
>
>> However, each bioentry can only point to one taxon entry (and thus
>> belongs to only one taxonomy), which is a big limitation.
>
> This is well motivated in biological applications and current object 
> models. I'm not sure about the other Bio* toolkits, but BioPerl for 
> example doesn't support multiple species objects for a sequence.
>
>> It would be useful to have a bioentry point to multiple taxon entries
>> (and thus multiple taxonomies, e.g. NCBI and ITIS), which might
>> require some sort of link table between the taxon and bioentry tables.
>
> Note that the PhyloDB module supports this. Nodes in a tree (or 
> taxonomy) can be associated with one or more bioentries (and, in fact, 
> reference taxon nodes).
>
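The link table Peter suggests can be sketched as follows, using Python's sqlite3 module and heavily simplified tables. This is not part of the BioSQL schema, where bioentry carries a single taxon_id column. The bioentry_taxon table, the namespace and external_id columns, and all IDs and accessions below are hypothetical. As noted above, the PhyloDB module already provides this kind of many-to-many association, so this only illustrates the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    accession   TEXT NOT NULL
);
CREATE TABLE taxon (
    taxon_id    INTEGER PRIMARY KEY,
    namespace   TEXT NOT NULL,   -- hypothetical: which taxonomy the node is from
    external_id TEXT NOT NULL,   -- hypothetical: generic stand-in for ncbi_taxon_id
    UNIQUE (namespace, external_id)
);
CREATE TABLE bioentry_taxon (    -- hypothetical link table
    bioentry_id INTEGER NOT NULL REFERENCES bioentry (bioentry_id),
    taxon_id    INTEGER NOT NULL REFERENCES taxon (taxon_id),
    PRIMARY KEY (bioentry_id, taxon_id)
);
""")

# Placeholder data: one sequence entry linked to a node in each taxonomy.
conn.execute("INSERT INTO bioentry (bioentry_id, accession) VALUES (1, 'ACC0001')")
conn.executemany(
    "INSERT INTO taxon (taxon_id, namespace, external_id) VALUES (?, ?, ?)",
    [(1, "NCBI", "ncbi-0001"), (2, "ITIS", "itis-0001")],
)
conn.executemany(
    "INSERT INTO bioentry_taxon (bioentry_id, taxon_id) VALUES (?, ?)",
    [(1, 1), (1, 2)],
)


def taxa_for(conn, accession):
    """Return (namespace, external_id) pairs linked to one bioentry."""
    return conn.execute(
        """
        SELECT t.namespace, t.external_id
        FROM bioentry b
        JOIN bioentry_taxon bt ON bt.bioentry_id = b.bioentry_id
        JOIN taxon t ON t.taxon_id = bt.taxon_id
        WHERE b.accession = ?
        ORDER BY t.namespace
        """,
        (accession,),
    ).fetchall()


print(taxa_for(conn, "ACC0001"))
```
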
>> [...]
>>> When looking into the tables taxon and taxon_name, it looks like neither
>>> taxa nor their neighborhood relationships can belong to different
>>> taxonomies.
>>> Is this correct, or am I missing something?
>>
>> True - but why would you want to interlink taxon entries like that?
>
> There may be use cases for this. For example, to relate taxa that are 
> named differently in two taxonomies but are really synonymous. Or 
> one taxonomy containing a synonym that the other doesn't.
>
> Not your molecular sequence database/analysis type of thing, sure. But 
> still legitimate.
>
>>
>>
>>> If this is so: are there any plans to add such a feature in the future? I
>>> think (besides that I could use it) it could probably be useful for others
>>> as well (to have the possibility to e.g. have an ITIS taxonomy
>
> Note that the svn / main trunk version of BioSQL has a script 
> load_itis_taxonomy.pl. It loads it into the PhyloDB module, though. 
> ITIS isn't a single tree but actually 5; there is no common root. So 
> it ends up as 5 trees in the PhyloDB tables.
>
>>> or just a user's own private taxonomy parallel to NCBI taxonomy in a single
>>> BioSQL DB).
>
> Yeah; I've been wanting to write a general taxonomy loader, or more 
> precisely a loader that utilizes Bio::TreeIO for reading. Just haven't 
> had the time to do that. (Need another hackathon :-)
>
>> [...] I think the issue has been raised before on the mailing list, and IIRC
>> it was agreed that there was room for improvement.  Maybe this is
>> something for BioSQL v1.1.x?
>
> Fixing the ncbi_taxon_id column name definitely. As for letting the 
> taxon tables duplicate the same capabilities as the PhyloDB tables, 
> I'm not sure that that's the best route to go.
>
>     -hilmar
>



