[BioPython] Concerning the update of the BioSQL taxon table

Tue Mar 25 14:54:57 UTC 2008

Dear all,

I moved to BioPython 1.45 and created a fresh BioSQL 1.0.0 database. Everything went smoothly except for one important point: the table 'taxon' defines the column ncbi_taxon_id with a unique index.

Currently, however, when a BioSeq is created, all of its lineage records are inserted as found in the GenBank data.

At insertion there is no problem: ncbi_taxon_id is set to NULL for every lineage record except the species one, so no duplicate keys are found. However, when I run my script to update the 'taxon' table, some ranks (such as family, class or order) are shared between species, so the same ncbi_taxon_id ends up being assigned to more than one row. I then get a 'duplicate entry on key 2' SQL error.
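To make the failure concrete, here is a minimal sketch that reproduces it with Python's sqlite3 module rather than my real database. The table is a simplified stand-in for 'taxon' (in BioSQL the names live in a separate table), and the function name fill_family_ids is only illustrative. The error text from SQLite differs from the one quoted above, but the cause is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-in for the BioSQL 'taxon' table (assumption: the real
# schema has more columns and stores names in a separate table).
conn.execute("""
    CREATE TABLE taxon (
        taxon_id      INTEGER PRIMARY KEY,
        ncbi_taxon_id INTEGER UNIQUE,
        node_rank     TEXT,
        name          TEXT
    )
""")

# Two species loaded one after the other. Each load inserts its own copy
# of the shared family row, with ncbi_taxon_id left as NULL.
rows = [
    (9606, "species", "Homo sapiens"),
    (None, "family", "Hominidae"),
    (9598, "species", "Pan troglodytes"),
    (None, "family", "Hominidae"),
]
conn.executemany(
    "INSERT INTO taxon (ncbi_taxon_id, node_rank, name) VALUES (?, ?, ?)",
    rows,
)


def fill_family_ids(conn):
    """Naive update step: give every 'Hominidae' row its NCBI id (9604).

    Returns the error message if the UNIQUE index rejects the update,
    otherwise None.
    """
    try:
        conn.execute(
            "UPDATE taxon SET ncbi_taxon_id = 9604 WHERE name = 'Hominidae'"
        )
    except sqlite3.IntegrityError as exc:
        return str(exc)
    return None


error = fill_family_ids(conn)
print(error)
```

The inserts succeed because a UNIQUE index accepts several NULL values; only the later update, which gives both family rows the same id, is rejected.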

Previously I did not have this problem because ncbi_taxon_id was not covered by a UNIQUE index. Is this new in BioSQL 1.0.0?

If the answer is yes, then I guess the reason is to avoid repeating all the ranks for each species and instead to define each of them only once. I understand that solution, but then the way BioPython INSERTs a new BioSeq is incompatible with this behavior.

Thus I wonder if we should:
a) remove the UNIQUE index on ncbi_taxon_id
or
b) rewrite the management of the 'taxon' table in BioPython so that records are added only for ranks that are not yet present, with a 'clever' parent linkage (but then what about the left and right value fields?).
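
As a rough illustration of what option b) could look like, here is a sketch using sqlite3 and a simplified 'taxon' table. Everything in it is an assumption for discussion, not existing BioPython code: the function get_or_create_lineage, the 'name' column on the table, the shortened lineages, and the choice to match an existing record by name and parent. It leaves the left and right value fields untouched, so that question stays open.

```python
import sqlite3


def get_or_create_lineage(conn, lineage):
    """Insert only the taxa that are missing; return the last taxon_id.

    `lineage` is ordered from root to species and holds
    (name, node_rank, ncbi_taxon_id) tuples. Hypothetical helper.
    """
    parent_id = None
    for name, node_rank, ncbi_taxon_id in lineage:
        # Reuse an existing record with the same name under the same
        # parent. 'IS ?' also matches a NULL parent (SQLite syntax).
        row = conn.execute(
            "SELECT taxon_id FROM taxon "
            "WHERE name = ? AND parent_taxon_id IS ?",
            (name, parent_id),
        ).fetchone()
        if row is None:
            cur = conn.execute(
                "INSERT INTO taxon "
                "(ncbi_taxon_id, parent_taxon_id, node_rank, name) "
                "VALUES (?, ?, ?, ?)",
                (ncbi_taxon_id, parent_id, node_rank, name),
            )
            parent_id = cur.lastrowid
        else:
            parent_id = row[0]
    return parent_id


conn = sqlite3.connect(":memory:")
# Simplified stand-in for the BioSQL 'taxon' table (assumption).
conn.execute("""
    CREATE TABLE taxon (
        taxon_id        INTEGER PRIMARY KEY,
        ncbi_taxon_id   INTEGER UNIQUE,
        parent_taxon_id INTEGER REFERENCES taxon (taxon_id),
        node_rank       TEXT,
        name            TEXT NOT NULL
    )
""")

# Shortened lineages for illustration; only the species carries an id,
# as in the GenBank data.
human = [
    ("Primates", "order", None),
    ("Hominidae", "family", None),
    ("Homo sapiens", "species", 9606),
]
chimp = [
    ("Primates", "order", None),
    ("Hominidae", "family", None),
    ("Pan troglodytes", "species", 9598),
]

human_id = get_or_create_lineage(conn, human)
chimp_id = get_or_create_lineage(conn, chimp)
taxon_count = conn.execute("SELECT COUNT(*) FROM taxon").fetchone()[0]
```

With this approach the shared order and family rows exist only once, so a later update of their ncbi_taxon_id touches a single row each and cannot collide with the UNIQUE index.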

Please let me know,

Eric
