[BioSQL-l] parent_taxon_id of a root node

Peter biopython at maubp.freeserve.co.uk
Fri Nov 14 20:48:02 UTC 2008


On Fri, Oct 3, 2008, I wrote:
>
> Hello all,
>
> I was puzzled to find the BioSQL script load_ncbi_taxonomy.pl will set
> the parent_taxon_id of the NCBI root node in the taxon table to point
> to itself.  I would have expected this to be NULL indicating no
> parent.  If someone is using the database directly, extracting a
> lineage could trigger an infinite loop.  Can anyone explain the
> rationale here?
>
> Note that when Biopython adds entries to the taxon table, it uses NULL
> for a root node.  When retrieving sequences from a BioSQL database,
> Biopython does cope with a root node with a NULL parent or a
> self-parent - would it be safe to assume BioPerl and Java can also cope
> with both situations?
>
> Thanks,
>
> Peter
>

Hi again,

I thought I'd raise this question again (as I didn't see any response
last time), as I've just been bitten by the self-parent taxon problem
this afternoon.  This was for a simple web front end to part of a
BioSQL database using SQLAlchemy in Python - but that's not important.

I was using a simple loop to build up lineages, which was working fine
until I ran load_ncbi_taxonomy.pl and suddenly my program seemed to
just time out.  I'd forgotten about the self-parent root nodes used by
load_ncbi_taxonomy.pl which had triggered an infinite loop.
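
To make the failure mode concrete, here is a minimal sketch (not my
actual SQLAlchemy code) of a lineage loop that copes with either
convention.  The dict "parents" is a hypothetical stand-in for the
taxon table, mapping taxon_id to parent_taxon_id, and the function
name is made up for illustration:

```python
def lineage(parents, taxon_id):
    """Return the taxon ids from the root down to taxon_id.

    `parents` is a hypothetical dict mapping taxon_id -> parent_taxon_id,
    standing in for rows of the taxon table (an assumption for this sketch).
    """
    path = []
    seen = set()
    current = taxon_id
    # Stop on a NULL (None) parent, and also on any id already visited.
    # The second check is what catches a self-parent root node.
    while current is not None and current not in seen:
        seen.add(current)
        path.append(current)
        current = parents.get(current)
    path.reverse()
    return path


# Root with a NULL parent (what Biopython writes).
null_root = {1: None, 2: 1, 3: 2}
# Root pointing at itself (what load_ncbi_taxonomy.pl writes).
self_root = {1: 1, 2: 1, 3: 2}
```

Without the "seen" check, the second table would keep the loop
spinning on the root forever, which is what happened to me.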

I hit another (less serious) problem stemming from these self-parent
root nodes when I wanted to generate a list of sub-lineages (child
entries), essentially:

SELECT * FROM taxon WHERE parent_taxon_id=12345;

When calling this on a root node, I had to modify this to explicitly
exclude itself from the children:

SELECT * FROM taxon WHERE parent_taxon_id=12345 AND taxon_id<>12345;
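
As a rough sketch, the same query can be wrapped in a helper that is
safe to call on any node, root or not.  This uses an in-memory SQLite
database with a cut-down taxon table (just the two columns that matter
here, which is a simplification and not the real BioSQL schema), and
child_taxa is a name invented for this example:

```python
import sqlite3


def child_taxa(conn, taxon_id):
    """Return the taxon_ids of the direct children of taxon_id.

    The extra taxon_id <> ? condition drops a self-parent root from its
    own list of children; it is harmless for every other node.
    """
    cur = conn.execute(
        "SELECT taxon_id FROM taxon "
        "WHERE parent_taxon_id = ? AND taxon_id <> ? "
        "ORDER BY taxon_id",
        (taxon_id, taxon_id),
    )
    return [row[0] for row in cur]


# Simplified stand-in for the taxon table (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, parent_taxon_id INTEGER)"
)
# Taxon 1 is a self-parent root, as left by load_ncbi_taxonomy.pl.
conn.executemany(
    "INSERT INTO taxon VALUES (?, ?)", [(1, 1), (2, 1), (3, 1), (4, 2)]
)
```

If the root had a NULL parent instead, the extra condition would
simply never exclude anything.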

So to repeat my earlier question, is there a reason why
parent_taxon_id isn't just NULL for root nodes?  Was this a deliberate
design choice?  If not, I think this could be regarded as a
bug in load_ncbi_taxonomy.pl.

Thanks

Peter


