[Biojava-l] A bug in Class "org.biojavax.bio.seq.io.GenbankFormat"

Martin Jones martin.jones at ed.ac.uk
Fri Apr 2 11:23:21 UTC 2010


You can also get the hierarchy directly from the NCBI taxonomy dump...
this is in Groovy but gives you the idea:

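// TreeNode is assumed to be your own class with taxid and rank properties
HashMap<Integer, TreeNode> taxid2node = [:]
HashMap<Integer, Integer> child2parent = [:]

// nodes.dmp fields are separated by "\t|\t": tax_id, parent tax_id, rank, ...
def nodePattern = ~/^(\d+)\t\|\t(\d+)\t\|\t(.+?)\t\|/

def count = 0
new File("/home/martin/nodes.dmp").eachLine { line ->
    count++
    def matcher = (line =~ nodePattern)
    // find() anchors on the first three fields only and ignores the rest of
    // the line, so the lazy rank group stops at the first "\t|"
    if (matcher.find()) {
        Integer myId = matcher[0][1].toInteger()
        Integer parentId = matcher[0][2].toInteger()
        String myRank = matcher[0][3]

        def node = new TreeNode(taxid: myId, rank: myRank)
        taxid2node[(myId)] = node
        child2parent[(myId)] = parentId
    }
}
// do something with the hash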
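
As one way of "doing something with the hash", here is a rough Java sketch that walks a child-to-parent map up to the root to recover the lineage of a taxon. The class and method names (TaxonomyLineage, parseParents, lineage) are made up for illustration and are not part of BioJava, and the input lines in main are toy data in the nodes.dmp layout, not real NCBI IDs. It assumes well-formed lines and relies on the root node listing itself as its own parent, as it does in nodes.dmp.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class, not part of BioJava.
final class TaxonomyLineage {

    /** Reads the first two fields (tax_id, parent tax_id) of each nodes.dmp-style line. */
    static Map<Integer, Integer> parseParents(Iterable<String> lines) {
        Map<Integer, Integer> child2parent = new HashMap<>();
        for (String line : lines) {
            // Fields are delimited by TAB, '|', TAB; the last field ends with TAB, '|'.
            String[] fields = line.split("\t\\|\t?");
            if (fields.length < 2) {
                continue; // blank or truncated line
            }
            child2parent.put(Integer.valueOf(fields[0].trim()),
                             Integer.valueOf(fields[1].trim()));
        }
        return child2parent;
    }

    /** Returns the tax IDs from the root down to taxId. */
    static List<Integer> lineage(int taxId, Map<Integer, Integer> child2parent) {
        List<Integer> path = new ArrayList<>();
        Set<Integer> seen = new HashSet<>(); // guards against cycles in bad input
        Integer current = taxId;
        while (current != null && seen.add(current)) {
            path.add(current);
            Integer parent = child2parent.get(current);
            if (parent == null || parent.equals(current)) {
                break; // unknown parent, or the root pointing at itself
            }
            current = parent;
        }
        Collections.reverse(path); // collected leaf-to-root, reported root-to-leaf
        return path;
    }

    public static void main(String[] args) {
        // Toy lines in nodes.dmp layout; IDs and ranks are invented for the example.
        List<String> lines = Arrays.asList(
                "1\t|\t1\t|\tno rank\t|",
                "10\t|\t1\t|\tgenus\t|",
                "20\t|\t10\t|\tspecies\t|");
        Map<Integer, Integer> child2parent = parseParents(lines);
        System.out.println(lineage(20, child2parent)); // prints [1, 10, 20]
    }
}
```

To turn the IDs into names you would also need to load names.dmp from the same dump; that part is left out here.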



-Martin



On 2 April 2010 08:38, Richard Holland <holland at eaglegenomics.com> wrote:
> The parsers don't load the hierarchy from Genbank because it is redundant information separately available from NCBI taxonomy. Also it tends to be buggy and can differ between Genbank files for the same organism.
>
> If you want the hierarchy, you need to be using BioJava in conjunction with BioSQL and load the NCBI taxonomy into your BioSQL instance ( http://www.biojava.org/wiki/BioJava:BioJavaXDocs#NCBI_Taxonomy_data ), from where BioJava can then retrieve it using the sample code you show in your email.
>
> thanks,
> Richard
>
> On 2 Apr 2010, at 04:02, Huijie Qiao wrote:
>
>> version 1.7.1
>>
>> line 361
>> else if (sectionKey.equals(SOURCE_TAG)) {
>>      // ignore - can get all this from the first feature
>>
>> Actually, the content of the SOURCE_TAG section and the first feature differ
>> in some GenBank files.
>>
>> For example, the example file in
>> http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/efetchseq_help.html
>> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=5&rettype=gb
>>
>> The Source TAG is
>> SOURCE      Bos taurus (cattle)
>>  ORGANISM  Bos taurus
>>            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
>>            Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
>>            Pecora; Bovidae; Bovinae; Bos.
>>
>> and the first feature tag is
>> FEATURES             Location/Qualifiers
>>     source          1..1136
>>                     /organism="Bos taurus"
>>                     /mol_type="mRNA"
>>                     /db_xref="taxon:9913"
>>                     /clone="pBB2I"
>>                     /tissue_type="liver"
>>
>> I can't get the hierarchy info with the following code:
>> NCBITaxon taxon = seq.getTaxon();
>> System.out.println(taxon.getNameHierarchy()); // output is "."
>> _______________________________________________
>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>



