[Biojava-l] Issue with SimpleNCBITaxon class

Deepak Sheoran sheoran143 at gmail.com
Sun Apr 11 21:08:22 UTC 2010


I am using the same tables with both the Biojava and the Bioperl taxon
code, and the output I get is below:

*Biojava:*
For example, for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage
I get is
             Rhamnus; Platanus occidentalis; Suillus placidus; Diadasia 
australis; Arnicastrum guerrerense; Labiduridae; Oreostemma alpigenum 
var. haydenii.

Biojava's process of finding names:
11876==>301952==>50447==>176516==>143975==>48579==>4403==>3609==>276240
(the wrong way of doing things)

*Bioperl:*
For example, for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage
I get is
           Retroviridae; Orthoretrovirinae; Alpharetrovirus; 
unclassified  Alpharetrovirus.

Bioperl's process of finding names:
11876==>353825==>153057==>327045==>11632   (the right way of doing things)

Hint: Biojava searches the ncbi_taxon_id column for the value taken from
parent_taxon_id, whereas Bioperl searches the taxon_id column for the
value taken from parent_taxon_id.
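
To make the difference concrete, here is a minimal Python sketch of the two lookups. It is not Biojava or Bioperl code: the rows are copied from the table in this message, and the names ROWS, BY_TAXON_ID, BY_NCBI_ID and lineage() are made up for illustration. The only thing that changes between the two calls is which column the parent_taxon_id value is matched against.

```python
# Rows copied from the table in this message:
# (taxon_id, ncbi_taxon_id, parent_taxon_id, scientific name)
ROWS = [
    (2901, 3609, 276240, "Rhamnus"),
    (3610, 4403, 3609, "Platanus occidentalis"),
    (29052, 48579, 4403, "Suillus placidus"),
    (114412, 143975, 48579, "Diadasia australis"),
    (143976, 176516, 143975, "Arnicastrum guerrerense"),
    (30680, 50447, 176516, "Labiduridae"),
    (254757, 301952, 50447, "Oreostemma alpigenum var. haydenii"),
    (9394, 11632, 17394, "Retroviridae"),
    (277861, 327045, 9394, "Orthoretrovirinae"),
    (122448, 153057, 277861, "Alpharetrovirus"),
    (301952, 353825, 122448, "unclassified Alpharetrovirus"),
    (9584, 11876, 301952, "Avian sarcoma virus"),
]

# Two hypothetical indexes over the same rows.
BY_TAXON_ID = {row[0]: row for row in ROWS}  # keyed on the surrogate key
BY_NCBI_ID = {row[1]: row for row in ROWS}   # keyed on the NCBI identifier


def lineage(ncbi_taxon_id, parent_index):
    """Walk up the tree, resolving each parent_taxon_id through parent_index.

    Returns the ancestor names from the most distant ancestor present in
    ROWS down to the immediate parent.
    """
    names = []
    row = BY_NCBI_ID[ncbi_taxon_id]
    parent = parent_index.get(row[2])
    while parent is not None:
        names.append(parent[3])
        parent = parent_index.get(parent[2])
    names.reverse()
    return names


# Matching parent_taxon_id against ncbi_taxon_id (what Biojava appears to do).
wrong = lineage(11876, BY_NCBI_ID)
# Matching parent_taxon_id against taxon_id (what Bioperl does).
right = lineage(11876, BY_TAXON_ID)

print("; ".join(wrong))
print("; ".join(right))
```

With the rows above, the first call walks through the plant and insect entries and the second one stays inside Retroviridae, which is the same pair of lineages I reported for the two toolkits.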

*Taxon and taxon_name table contents relevant to this discussion:*

taxon_id  ncbi_taxon_id  parent_taxon_id  node_rank  name                                name_class
2901      3609           276240           genus      Rhamnus                             scientific name
3610      4403           3609             species    Platanus occidentalis               scientific name
29052     48579          4403             species    Suillus placidus                    scientific name
114412    143975         48579            species    Diadasia australis                  scientific name
143976    176516         143975           species    Arnicastrum guerrerense             scientific name
30680     50447          176516           family     Labiduridae                         scientific name
254757    301952         50447            varietas   Oreostemma alpigenum var. haydenii  scientific name
9394      11632          17394            family     Retroviridae                        scientific name
277861    327045         9394             subfamily  Orthoretrovirinae                   scientific name
122448    153057         277861           genus      Alpharetrovirus                     scientific name
301952    353825         122448           no rank    unclassified Alpharetrovirus        scientific name
9584      11876          301952           species    Avian sarcoma virus                 scientific name
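
The same point can be checked in SQL. The sketch below uses Python's sqlite3 module with an in-memory database and a cut-down version of the taxon and taxon_name tables (only the columns shown above; the real tables have more). The helper parent_via() and its column argument are hypothetical, introduced only to switch the join condition between the two interpretations of parent_taxon_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon (
    taxon_id        INTEGER PRIMARY KEY,
    ncbi_taxon_id   INTEGER UNIQUE,
    parent_taxon_id INTEGER,
    node_rank       TEXT
);
CREATE TABLE taxon_name (
    taxon_id   INTEGER,
    name       TEXT,
    name_class TEXT
);
""")

# A subset of the rows from the table in this message.
rows = [
    (254757, 301952, 50447, "varietas", "Oreostemma alpigenum var. haydenii"),
    (9394, 11632, 17394, "family", "Retroviridae"),
    (277861, 327045, 9394, "subfamily", "Orthoretrovirinae"),
    (122448, 153057, 277861, "genus", "Alpharetrovirus"),
    (301952, 353825, 122448, "no rank", "unclassified Alpharetrovirus"),
    (9584, 11876, 301952, "species", "Avian sarcoma virus"),
]
for taxon_id, ncbi_id, parent_id, rank, name in rows:
    conn.execute("INSERT INTO taxon VALUES (?, ?, ?, ?)",
                 (taxon_id, ncbi_id, parent_id, rank))
    conn.execute("INSERT INTO taxon_name VALUES (?, ?, 'scientific name')",
                 (taxon_id, name))


def parent_via(column, ncbi_taxon_id):
    """Return (ncbi_taxon_id, name) of the parent row, joining
    child.parent_taxon_id against the given column of the parent row."""
    if column not in ("taxon_id", "ncbi_taxon_id"):
        raise ValueError("unexpected join column: " + column)
    sql = f"""
        SELECT parent.ncbi_taxon_id, n.name
        FROM taxon AS child
        JOIN taxon AS parent ON parent.{column} = child.parent_taxon_id
        JOIN taxon_name AS n ON n.taxon_id = parent.taxon_id
        WHERE child.ncbi_taxon_id = ?
          AND n.name_class = 'scientific name'
    """
    return conn.execute(sql, (ncbi_taxon_id,)).fetchone()


# Join on the surrogate key: the parent I expect.
correct_parent = parent_via("taxon_id", 11876)
# Join on the NCBI identifier: an unrelated plant entry.
mistaken_parent = parent_via("ncbi_taxon_id", 11876)

print(correct_parent)
print(mistaken_parent)
```

The value 301952 happens to exist both as a taxon_id and as an ncbi_taxon_id, so the mistaken join does not fail; it silently returns a different organism.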


Thanks
Deepak

On 4/11/2010 2:53 PM, Richard Holland wrote:
> I'm sorry but I don't understand your example. Could you provide a real example of correct values for each column from a sample taxon entry in NCBI, plus an example of what BioJava is doing wrong? (i.e. give a sample record to use as reference, then point out the correct value of parent_taxon_id, and point out what value BioJava is using instead).
>
> thanks,
> Richard
>
> On 11 Apr 2010, at 20:16, Deepak Sheoran wrote:
>
>    
>> Hi,
>>
>> There is a very fundamental issue in the SimpleNCBITaxon class, because of which it is producing the wrong taxonomy hierarchy. I am explaining what I have found; let me know what you guys think of it, and let me suggest how to fix it.
>>
>> 1) The columns in the taxon table are (taxon_id, ncbi_taxon_id, parent_taxon_id, node_rank, genetic_code, mito_genetic_code, left_value, right_value).
>> 2) In the class SimpleNCBITaxon we assume that "parent_taxon_id" holds the parent's ncbi_taxon_id for the current ncbi_taxon_id value, but that is not true. The value that "parent_taxon_id" actually holds is the "taxon_id" of the row whose ncbi_taxon_id is the parent's NCBI taxon ID.
>>
>> <property name="NCBITaxID" column="ncbi_taxon_id" node="@NCBITaxId"/>
>> <property name="nodeRank" column="node_rank"/>
>> <property name="geneticCode" column="genetic_code"/>
>> <property name="mitoGeneticCode" column="mito_genetic_code"/>
>> <property name="leftValue" column="left_value"/>
>> <property name="rightValue" column="right_value"/>
>> <property name="parentNCBITaxID" column="parent_taxon_id"/>       ----- this is not correct: the column parent_taxon_id stores the taxon_id of the row that holds the parent's ncbi_taxon_id for the current entry
>>
>> Thanks
>> Deepak Sheoran
>>
>>
>>      
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>    



