[Biojava-l] Issue with SimpleNCBITaxon class

Deepak Sheoran sheoran143 at gmail.com
Fri Apr 16 18:43:59 UTC 2010

In my experience, we should use taxon_id here, because it is the
unique key within a local instance of BioSQL.
ncbi_taxon_id should be used only for mapping, so that a person can
map a local taxon_id to an ncbi_taxon_id; otherwise it defeats the
purpose of having taxon_id as the primary key of the taxon table. I
think the main goal when BioSQL was designed was to keep it
independent of any outside organization such as GenBank or NCBI,
while still letting us map a number (ncbi_taxon_id) assigned by a
known authority to a local number (taxon_id).
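
As a rough illustration of that mapping, here is a minimal Python
sketch using an in-memory SQLite database. The table is a cut-down
stand-in for the BioSQL taxon table (only the three columns discussed
here), the local taxon_id values 1-3 are made up, and the helper
local_taxon_id() is a hypothetical name, not part of any Bio* library.

```python
import sqlite3

# Simplified stand-in for the BioSQL taxon table (assumption: only the
# three columns under discussion are modelled).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE taxon (
        taxon_id        INTEGER PRIMARY KEY,  -- local surrogate key
        ncbi_taxon_id   INTEGER UNIQUE,       -- identifier assigned by NCBI
        parent_taxon_id INTEGER               -- refers to the local taxon_id
    )
    """
)

# Local ids (1, 2, 3) are invented for this example; the NCBI ids are
# Hominidae (9604), Homo (9605) and Homo sapiens (9606).
conn.executemany(
    "INSERT INTO taxon VALUES (?, ?, ?)",
    [(1, 9604, None), (2, 9605, 1), (3, 9606, 2)],
)


def local_taxon_id(conn, ncbi_taxon_id):
    """Map an NCBI taxon id to the local primary key, or None if absent."""
    row = conn.execute(
        "SELECT taxon_id FROM taxon WHERE ncbi_taxon_id = ?",
        (ncbi_taxon_id,),
    ).fetchone()
    return row[0] if row else None
```

Here ncbi_taxon_id is consulted only once, at the boundary, to find
the local row; everything after that would use taxon_id.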

Deepak Sheoran

On 4/15/2010 12:54 PM, Peter wrote:
> Hi,
> I've CC'd this to the BioSQL mailing list for cross project
> discussion.
> On Mon, Apr 12, 2010 at 7:57 AM, Richard Holland wrote:
>> Thanks Deepak.
>> I've had a look at the code and I believe it's due to the
>> different ways in which BioJava and BioPerl load the
>> taxon table.
>> BioJava sets the ncbi_taxon_id and parent_taxon_id
>> columns based on the values from the NCBI taxonomy
>> file. The taxon_id column in BioJava is a meaningless
>> auto-generated value that is never used.
>> BioPerl however is generating taxon_id values and
>> linking them by setting parent_taxon_id to the
>> generated value. The parent value from the NCBI
>> taxonomy file is therefore replaced with the BioPerl
>> generated parent ID, meaning that instead of linking
>> from parent_taxon_id to ncbi_taxon_id as per BioJava,
>> the link is to taxon_id instead. (I'm basing this
>> comment on looking at load_ncbi_taxonomy.pl from
>> the BioSQL archives.)
> Note that old versions of load_ncbi_taxonomy.pl
> (which is part of BioSQL, not part of BioPerl) would
> set taxon_id equal to ncbi_taxon_id, see:
> http://bugzilla.open-bio.org/show_bug.cgi?id=2470
> This may help explain the confusion.
>> I believe if you load the taxonomy table using BioJava,
>> you should see BioJava giving correct behaviour.
>> Likewise if you load it using BioPerl, BioPerl will
>> behave correctly. But if you load with one then query
>> with the other, you'll get incorrect results.
>> This sounds like a case for discussion on both lists -
>> a matter of standardisation between the two projects.
>> Not quickly/easily solvable for now.
> It's not just two projects (BioPerl & BioJava) (grin).
> It's at least five projects (BioSQL itself plus BioRuby
> and Biopython).
> I'm not sure about BioRuby's implementation, but
> currently I think BioJava is the odd one out - BioPerl,
> Biopython, and BioSQL's load_ncbi_taxonomy.pl
> all make entries in parent_taxon_id reference the
> automatically generated taxon_id (please correct
> me if I am wrong).
> My personal view is that bioperl-db is the reference
> implementation and should be followed in the event
> of any ambiguity within BioSQL. In this particular
> case, there is actually a BioSQL script to check
> against too (load_ncbi_taxonomy.pl).
> Hopefully Hilmar can give us an official verdict...
> Peter
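
To make the mismatch discussed in the quoted thread concrete, here is
a small Python sketch of the two parent_taxon_id conventions. It is
only a model of what Richard and Peter describe, not code from BioJava
or bioperl-db: the function lineage() and the row lists perl_style and
java_style are hypothetical, and the local taxon_id values 101-103 are
invented. It shows that walking up the tree works when the reader
assumes the same convention the loader used, and stops after the first
node when it does not.

```python
def lineage(rows, start_ncbi_id, parent_refers_to):
    """Walk from a taxon to the root, returning NCBI ids along the way.

    rows             -- list of dicts with taxon_id, ncbi_taxon_id,
                        parent_taxon_id
    parent_refers_to -- column the reader assumes parent_taxon_id points
                        at: "taxon_id" or "ncbi_taxon_id"
    """
    by_parent_key = {row[parent_refers_to]: row for row in rows}
    by_ncbi = {row["ncbi_taxon_id"]: row for row in rows}

    node = by_ncbi.get(start_ncbi_id)
    path, seen = [], set()
    while node is not None and node["taxon_id"] not in seen:
        seen.add(node["taxon_id"])  # guard against accidental cycles
        path.append(node["ncbi_taxon_id"])
        parent = node["parent_taxon_id"]
        node = by_parent_key.get(parent) if parent is not None else None
    return path


# Convention described for load_ncbi_taxonomy.pl, BioPerl and Biopython:
# parent_taxon_id holds the parent's local taxon_id.
perl_style = [
    {"taxon_id": 101, "ncbi_taxon_id": 9604, "parent_taxon_id": None},
    {"taxon_id": 102, "ncbi_taxon_id": 9605, "parent_taxon_id": 101},
    {"taxon_id": 103, "ncbi_taxon_id": 9606, "parent_taxon_id": 102},
]

# Convention described for BioJava: parent_taxon_id holds the parent's
# ncbi_taxon_id, and taxon_id is an unused auto-generated value.
java_style = [
    {"taxon_id": 101, "ncbi_taxon_id": 9604, "parent_taxon_id": None},
    {"taxon_id": 102, "ncbi_taxon_id": 9605, "parent_taxon_id": 9604},
    {"taxon_id": 103, "ncbi_taxon_id": 9606, "parent_taxon_id": 9605},
]

# Matching loader and reader: the full lineage is recovered.
print(lineage(perl_style, 9606, "taxon_id"))       # [9606, 9605, 9604]
print(lineage(java_style, 9606, "ncbi_taxon_id"))  # [9606, 9605, 9604]

# Mismatched loader and reader: the walk stops at the starting taxon.
print(lineage(perl_style, 9606, "ncbi_taxon_id"))  # [9606]
print(lineage(java_style, 9606, "taxon_id"))       # [9606]
```

In a real database the mismatched case could be worse than a
truncated lineage: if a local taxon_id happens to equal some unrelated
ncbi_taxon_id, the walk would silently follow the wrong parent.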
