[Biojava-l] A bug in Class "org.biojavax.bio.seq.io.GenbankFormat"

Richard Holland holland at eaglegenomics.com
Fri Apr 2 07:38:44 UTC 2010


The parsers don't load the hierarchy from Genbank because it is redundant information, available separately from the NCBI taxonomy. It also tends to be buggy and can differ between Genbank files for the same organism.

If you want the hierarchy, you need to use BioJava in conjunction with BioSQL and load the NCBI taxonomy into your BioSQL instance ( http://www.biojava.org/wiki/BioJava:BioJavaXDocs#NCBI_Taxonomy_data ), from where BioJava can then retrieve it using the sample code you show in your email.
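
If a BioSQL instance is not an option and all that is needed is the lineage text exactly as printed in one file, a rough alternative is to read it straight from the ORGANISM block of the flat file. The sketch below is plain Java and does not use BioJava; the class name LineageSketch and the method parseLineage are hypothetical, and it assumes the organism name fits on the single ORGANISM line. The result is subject to the caveat above: it can differ between files for the same organism.

```java
import java.util.ArrayList;
import java.util.List;

public class LineageSketch {

    /** Returns the lineage names listed under ORGANISM in a GenBank flat-file record. */
    static List<String> parseLineage(String record) {
        List<String> names = new ArrayList<>();
        StringBuilder lineage = new StringBuilder();
        boolean inOrganism = false;
        for (String line : record.split("\\r?\\n")) {
            if (line.trim().startsWith("ORGANISM")) {
                // Assumption: the organism name occupies only this line, so skip it.
                inOrganism = true;
                continue;
            }
            if (inOrganism) {
                // Lineage lines are indented; a line starting in column 1 opens a new section.
                if (line.isEmpty() || !Character.isWhitespace(line.charAt(0))) {
                    break;
                }
                lineage.append(line.trim()).append(' ');
            }
        }
        String text = lineage.toString().trim();
        if (text.endsWith(".")) {
            text = text.substring(0, text.length() - 1); // drop the terminating period
        }
        for (String name : text.split(";")) {
            if (!name.trim().isEmpty()) {
                names.add(name.trim());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String record = String.join("\n",
            "SOURCE      Bos taurus (cattle)",
            "  ORGANISM  Bos taurus",
            "            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;",
            "            Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;",
            "            Pecora; Bovidae; Bovinae; Bos.",
            "FEATURES             Location/Qualifiers");
        System.out.println(String.join(" > ", parseLineage(record)));
    }
}
```

This only recovers names, not NCBI taxon IDs or ranks, so it is not a substitute for loading the taxonomy into BioSQL.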

thanks,
Richard

On 2 Apr 2010, at 04:02, Huijie Qiao wrote:

> version 1.7.1
> 
> line 361
> else if (sectionKey.equals(SOURCE_TAG)) {
>      // ignore - can get all this from the first feature
> 
> Actually, the content of the SOURCE_TAG section and the first feature
> differ in some gb files.
> 
> For example, the example file in
> http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/efetchseq_help.html
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=5&rettype=gb
> 
> The Source TAG is
> SOURCE      Bos taurus (cattle)
>  ORGANISM  Bos taurus
>            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
>            Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
>            Pecora; Bovidae; Bovinae; Bos.
> 
> and the first feature tag is
> FEATURES             Location/Qualifiers
>     source          1..1136
>                     /organism="Bos taurus"
>                     /mol_type="mRNA"
>                     /db_xref="taxon:9913"
>                     /clone="pBB2I"
>                     /tissue_type="liver"
> 
> I can't get the hierarchy info with the following code:
> NCBITaxon taxon = seq.getTaxon();
> System.out.println(taxon.getNameHierarchy()); // output is "."
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/




