[BioSQL-l] parent_taxon_id of a root node

Hilmar Lapp hlapp at gmx.net
Sat Nov 15 18:34:45 UTC 2008


Sorry Peter - it looks like this slipped my attention (Oct was crazy).  
Thanks for raising it again. I agree with you, this looks like a bug.  
Would you mind filing it?

It's possible that this has tacitly been assumed as policy, and hence led  
some people to identify the root node by checking whether parent_taxon_id  
equals taxon_id, but that surely sounds like the wrong way of doing it, so it  
deserves fixing.

	-hilmar

On Nov 14, 2008, at 3:48 PM, Peter wrote:

> On Fri, Oct 3, 2008, I wrote:
>>
>> Hello all,
>>
>> I was puzzled to find the BioSQL script load_ncbi_taxonomy.pl will set
>> the parent_taxon_id of the NCBI root node in the taxon table to point
>> to itself.  I would have expected this to be NULL indicating no
>> parent.  If someone is using the database directly, extracting a
>> lineage could trigger an infinite loop.  Can anyone explain the
>> rationale here?
>>
>> Note that when Biopython adds entries to the taxon table, it uses NULL
>> for a root node.  When retrieving sequences from a BioSQL database,
>> Biopython does cope with a root node with a NULL parent or a
>> self-parent - would it be safe to assume BioPerl and Java can also cope
>> with both situations?
>>
>> Thanks,
>>
>> Peter
>>
>
> Hi again,
>
> I thought I'd raise this question again (as I didn't see any response
> last time), as I've just been bitten by the self-parent taxon problem
> this afternoon.  This was for a simple web front end to part of a
> BioSQL database using SQLAlchemy in Python - but that's not important.
>
> I was using a simple loop to build up lineages, which was working fine
> until I ran load_ncbi_taxonomy.pl and suddenly my program seemed to
> just time out.  I'd forgotten about the self-parent root nodes used by
> load_ncbi_taxonomy.pl which had triggered an infinite loop.
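
For illustration, a lineage walk that tolerates both conventions (NULL
parent or self-parent root) might look like the Python sketch below. The
function name `lineage` and the `parent_of` mapping (taxon_id ->
parent_taxon_id, with None standing in for SQL NULL) are hypothetical and
not part of BioSQL or Biopython; in practice the mapping would be filled
from the taxon table.

```python
def lineage(taxon_id, parent_of):
    """Return the taxon ids from the root down to taxon_id.

    parent_of is a hypothetical dict mapping taxon_id -> parent_taxon_id,
    with None representing a NULL parent.
    """
    path = []
    seen = set()  # ids already visited, so any cycle ends the walk
    current = taxon_id
    # Stop on a NULL parent, or when an id repeats (self-parent root
    # or any other cycle), instead of looping forever.
    while current is not None and current not in seen:
        seen.add(current)
        path.append(current)
        current = parent_of.get(current)
    path.reverse()  # collected leaf-to-root; return root-to-leaf
    return path


# Root stored as its own parent, as described for load_ncbi_taxonomy.pl
self_parent = {1: 1, 2: 1, 3: 2}
# Root stored with a NULL parent
null_parent = {1: None, 2: 1, 3: 2}

print(lineage(3, self_parent))
print(lineage(3, null_parent))
```

Both calls should give the same root-to-leaf path, since the visited set
ends the walk as soon as the root points back at itself.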
>
> I hit another (less serious) problem stemming from these self-parent
> root nodes when I wanted to generate a list of sub-lineages (child
> entries), essentially:
>
> SELECT * FROM taxon WHERE parent_taxon_id=12345;
>
> When calling this on a root node, I had to modify this to explicitly
> exclude itself from the children:
>
> SELECT * FROM taxon WHERE parent_taxon_id=12345 AND taxon_id<>12345;
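
As a rough sketch of the same query in parameterised form, here it is run
against an in-memory SQLite database from Python. The two-column taxon
table is a deliberately reduced, hypothetical stand-in for the real BioSQL
table, and the `children` and `demo` helpers are illustrative names only.

```python
import sqlite3


def children(conn, taxon_id):
    """Return the child taxon ids of taxon_id, excluding the node itself."""
    rows = conn.execute(
        "SELECT taxon_id FROM taxon "
        "WHERE parent_taxon_id = ? AND taxon_id <> ? "
        "ORDER BY taxon_id",
        (taxon_id, taxon_id),
    ).fetchall()
    return [row[0] for row in rows]


def demo():
    conn = sqlite3.connect(":memory:")
    # Reduced stand-in for the BioSQL taxon table (assumption: only the
    # two columns needed for this query are modelled here).
    conn.execute(
        "CREATE TABLE taxon ("
        "taxon_id INTEGER PRIMARY KEY, parent_taxon_id INTEGER)"
    )
    # Taxon 1 is a self-parent root; 2 and 3 are its children; 4 is under 2.
    conn.executemany(
        "INSERT INTO taxon VALUES (?, ?)",
        [(1, 1), (2, 1), (3, 1), (4, 2)],
    )
    result = children(conn, 1)
    conn.close()
    return result


print(demo())
```

With a NULL-parent root the extra `taxon_id <> ?` condition is harmless,
so the same query works under either convention.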
>
> So to repeat my earlier question, is there a reason why
> parent_taxon_id isn't just NULL for root nodes?  Was this a deliberate
> design choice - because if not, I think this could be regarded as a
> bug in load_ncbi_taxonomy.pl.
>
> Thanks
>
> Peter
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================





