[Biojava-l] Issue with SimpleNCBITaxon class

Deepak Sheoran sheoran143 at gmail.com
Sun Apr 11 22:48:00 UTC 2010


If we don't want to change the current code in BioJava and still want to 
fix this bug, I have found a way:
1) We can change the Hibernate mapping file "Taxon.hbm.xml" and replace 
the line
<property name="parentNCBITaxID" column="parent_taxon_id"/>
     with
<property name="parentNCBITaxID" formula="(select tax.ncbi_taxon_id from taxon tax where tax.taxon_id = parent_taxon_id)"/>

With this change to the Hibernate mapping I get the correct lineage for 
ncbi_taxon_id = 11876 (Avian sarcoma virus), which is
              Viruses; Retro-transcribing viruses; Retroviridae; 
Orthoretrovirinae; Alpharetrovirus; unclassified Alpharetrovirus.

2) A possible problem is the taxonomy loader class, which wants to 
insert a value for the parent taxon ID into the taxon table. I think 
that won't be possible if we make this change to the Hibernate config 
file, because a property mapped with a formula is derived and read-only.
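
To make the difference between the two lookups concrete, here is a minimal 
standalone Java sketch. It is not BioJava code: the class TaxonLineageSketch, 
the Taxon holder and the lineage() method are names made up for illustration, 
and the rows are copied from the table quoted further down in this thread. 
It walks parent_taxon_id either against taxon_id (what the formula above 
effectively does) or against ncbi_taxon_id (what the current mapping does).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in BioJava.
class TaxonLineageSketch {

    /** One taxon row joined with its scientific name. */
    static final class Taxon {
        final int taxonId;
        final int ncbiTaxonId;
        final int parentTaxonId;
        final String name;

        Taxon(int taxonId, int ncbiTaxonId, int parentTaxonId, String name) {
            this.taxonId = taxonId;
            this.ncbiTaxonId = ncbiTaxonId;
            this.parentTaxonId = parentTaxonId;
            this.name = name;
        }
    }

    // Rows taken from the table quoted in this thread.
    static final List<Taxon> ROWS = List.of(
        new Taxon(2901, 3609, 276240, "Rhamnus"),
        new Taxon(3610, 4403, 3609, "Platanus occidentalis"),
        new Taxon(29052, 48579, 4403, "Suillus placidus"),
        new Taxon(114412, 143975, 48579, "Diadasia australis"),
        new Taxon(143976, 176516, 143975, "Arnicastrum guerrerense"),
        new Taxon(30680, 50447, 176516, "Labiduridae"),
        new Taxon(254757, 301952, 50447, "Oreostemma alpigenum var. haydenii"),
        new Taxon(9394, 11632, 17394, "Retroviridae"),
        new Taxon(277861, 327045, 9394, "Orthoretrovirinae"),
        new Taxon(122448, 153057, 277861, "Alpharetrovirus"),
        new Taxon(301952, 353825, 122448, "unclassified Alpharetrovirus"),
        new Taxon(9584, 11876, 301952, "Avian sarcoma virus"));

    /**
     * Returns the ancestor names of the given NCBI taxon, root first.
     * If parentIsTaxonId is true, parent_taxon_id is matched against
     * taxon_id; otherwise it is matched against ncbi_taxon_id.
     */
    static List<String> lineage(int ncbiTaxonId, boolean parentIsTaxonId) {
        Map<Integer, Taxon> byTaxonId = new HashMap<>();
        Map<Integer, Taxon> byNcbiId = new HashMap<>();
        for (Taxon t : ROWS) {
            byTaxonId.put(t.taxonId, t);
            byNcbiId.put(t.ncbiTaxonId, t);
        }
        Map<Integer, Taxon> parentIndex = parentIsTaxonId ? byTaxonId : byNcbiId;

        List<String> names = new ArrayList<>();
        Taxon current = byNcbiId.get(ncbiTaxonId);
        if (current == null) {
            return names;
        }
        // Walk upwards until the parent is not among the known rows.
        Taxon parent = parentIndex.get(current.parentTaxonId);
        while (parent != null) {
            names.add(0, parent.name);
            parent = parentIndex.get(parent.parentTaxonId);
        }
        return names;
    }

    public static void main(String[] args) {
        // parent_taxon_id matched against taxon_id
        System.out.println(String.join("; ", lineage(11876, true)));
        // parent_taxon_id matched against ncbi_taxon_id
        System.out.println(String.join("; ", lineage(11876, false)));
    }
}
```

On these rows the first call gives the Retroviridae lineage and the second 
gives the Rhamnus ... Oreostemma chain, i.e. the same two results reported 
for BioPerl and BioJava below. The walk stops at Retroviridae only because 
its parent row (taxon_id 17394) is not among the rows quoted in this thread.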

Deepak Sheoran


On 4/11/2010 4:08 PM, Deepak Sheoran wrote:
> I am using the same tables with the BioJava and BioPerl taxon programs, 
> and the output I get is below:
>
> *Biojava:*
> For example, for ncbi_taxon_id = 11876 (Avian sarcoma virus), the 
> lineage I get is
>             Rhamnus; Platanus occidentalis; Suillus placidus; Diadasia 
> australis; Arnicastrum guerrerense; Labiduridae; Oreostemma alpigenum 
> var. haydenii.
>
> BioJava's process of finding names: 
> 11876==>301952==>50447==>176516==>143975==>48579==>4403==>3609==>276240
> (wrong way of doing things)
>
> *Bioperl:*
> For example, for ncbi_taxon_id = 11876 (Avian sarcoma virus), the 
> lineage I get is
>           Retroviridae; Orthoretrovirinae; Alpharetrovirus; 
> unclassified  Alpharetrovirus.
>
> BioPerl's process of finding names: 
> 11876==>353825==>153057==>327045==>11632   (right way of doing things)
>
> Hint: BioJava searches the ncbi_taxon_id column with the value from 
> parent_taxon_id, whereas BioPerl searches the taxon_id column with the 
> value from parent_taxon_id.
>
> *Taxon and taxon_name table content relevant to this discussion:*
>
> taxon_id  ncbi_taxon_id  parent_taxon_id  node_rank  name                                name_class
> 2901      3609           276240           genus      Rhamnus                             scientific name
> 3610      4403           3609             species    Platanus occidentalis               scientific name
> 29052     48579          4403             species    Suillus placidus                    scientific name
> 114412    143975         48579            species    Diadasia australis                  scientific name
> 143976    176516         143975           species    Arnicastrum guerrerense             scientific name
> 30680     50447          176516           family     Labiduridae                         scientific name
> 254757    301952         50447            varietas   Oreostemma alpigenum var. haydenii  scientific name
> 9394      11632          17394            family     Retroviridae                        scientific name
> 277861    327045         9394             subfamily  Orthoretrovirinae                   scientific name
> 122448    153057         277861           genus      Alpharetrovirus                     scientific name
> 301952    353825         122448           no rank    unclassified Alpharetrovirus        scientific name
> 9584      11876          301952           species    Avian sarcoma virus                 scientific name
>
>
> Thanks
> Deepak
>
> On 4/11/2010 2:53 PM, Richard Holland wrote:
>> I'm sorry but I don't understand your example. Could you provide a real example of correct values for each column from a sample taxon entry in NCBI, plus an example of what BioJava is doing wrong? (i.e. give a sample record to use as reference, then point out the correct value of parent_taxon_id, and point out what value BioJava is using instead).
>>
>> thanks,
>> Richard
>>
>> On 11 Apr 2010, at 20:16, Deepak Sheoran wrote:
>>
>>    
>>> Hi,
>>>
>>> There is a very fundamental issue in the SimpleNCBITaxon class because of which it produces the wrong taxonomy hierarchy. I am explaining what I have found; let me know what you guys think of it, and let me suggest how to fix it.
>>>
>>> 1) The columns in the taxon table are (taxon_id, ncbi_taxon_id, parent_taxon_id, nodeRank, geneticCode, mitoGeneticCode, leftValue, rightValue).
>>> 2) In the class SimpleNCBITaxon we assume that "parent_taxon_id" holds the parent's ncbi_taxon_id for the current ncbi_taxon_id, but that is not true. The value stored in "parent_taxon_id" is the "taxon_id" of the parent row, and it is that row's ncbi_taxon_id which is the parent NCBI taxon ID of the current entry.
>>>
>>> <property name="NCBITaxID" column="ncbi_taxon_id" node="@NCBITaxId"/>
>>> <property name="nodeRank" column="node_rank"/>
>>> <property name="geneticCode" column="genetic_code"/>
>>> <property name="mitoGeneticCode" column="mito_genetic_code"/>
>>> <property name="leftValue" column="left_value"/>
>>> <property name="rightValue" column="right_value"/>
>>> <property name="parentNCBITaxID" column="parent_taxon_id"/>       ----- this is not correct: the column parent_taxon_id stores the taxon_id of the row that holds the parent ncbi_taxon_id for the current entry
>>>
>>> Thanks
>>> Deepak Sheoran
>>>
>>>
>>>      
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E:holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>
>>    
>



