[Biojava-l] Parsing Genbank/EMBL/XML Sequences from binary NCBI ASN.1 daily update files

Seth Johnson johnson.biotech at gmail.com
Mon Jun 5 14:22:57 UTC 2006


I apologize again for not posting the stacktrace. Here it is:
==========================
org.biojava.bio.BioException: Could not read sequence
        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:112)
        at exonhit.parsers.GenBankParser.main(GenBankParser.java:347)
Caused by: java.lang.NullPointerException
        at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.addFeatureProperty(SimpleRichSequenceBuilder.java:356)
        at org.biojavax.bio.seq.io.INSDseqFormat$INSDseqHandler.endElement(INSDseqFormat.java:853)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:633)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1241)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(XMLDocumentFragmentScannerImpl.java:1685)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:368)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:834)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:148)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1242)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:375)
        at org.biojavax.utils.XMLTools.readXMLChunk(XMLTools.java:97)
        at org.biojavax.bio.seq.io.INSDseqFormat.readRichSequence(INSDseqFormat.java:246)
        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109)
        ... 1 more
Java Result: -1
============================
Here's the XML that causes that exception (taken out of a bigger file
of several hundred sequences):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<INSDSeq>
    <INSDSeq_locus>DQ485973</INSDSeq_locus>
    <INSDSeq_length>1356</INSDSeq_length>
    <INSDSeq_moltype>DNA</INSDSeq_moltype>
    <INSDSeq_topology>linear</INSDSeq_topology>
    <INSDSeq_division>ENV</INSDSeq_division>
    <INSDSeq_update-date>08-MAY-2006</INSDSeq_update-date>
    <INSDSeq_create-date>08-MAY-2006</INSDSeq_create-date>
    <INSDSeq_definition>Uncultured Mollicutes bacterium clone P7 16S
ribosomal RNA gene, partial sequence</INSDSeq_definition>
    <INSDSeq_primary-accession>DQ485973</INSDSeq_primary-accession>
    <INSDSeq_accession-version>DQ485973.1</INSDSeq_accession-version>
    <INSDSeq_other-seqids>
      <INSDSeqid>gb|DQ485973.1|</INSDSeqid>
      <INSDSeqid>gi|94482885</INSDSeqid>
    </INSDSeq_other-seqids>
    <INSDSeq_keywords>
      <INSDKeyword>ENV</INSDKeyword>
    </INSDSeq_keywords>
    <INSDSeq_source>uncultured Mollicutes bacterium</INSDSeq_source>
    <INSDSeq_organism>uncultured Mollicutes bacterium</INSDSeq_organism>
    <INSDSeq_taxonomy>Bacteria; Firmicutes; Mollicutes; environmental
samples</INSDSeq_taxonomy>
    <INSDSeq_references>
      <INSDReference>
        <INSDReference_reference>1 (bases 1 to 1356)</INSDReference_reference>
        <INSDReference_position>1..1356</INSDReference_position>
        <INSDReference_authors>
          <INSDAuthor>Kostanjsek,R.</INSDAuthor>
          <INSDAuthor>Strus,J.</INSDAuthor>
          <INSDAuthor>Avgustin,G.</INSDAuthor>
        </INSDReference_authors>
        <INSDReference_title>A novel lineage of Mollicutes associated
with the hindgut wall of the terrestrial isopod Porcellio scaber
(Crustacea: Isopoda)</INSDReference_title>
        <INSDReference_journal>Unpublished</INSDReference_journal>
      </INSDReference>
      <INSDReference>
        <INSDReference_reference>2 (bases 1 to 1356)</INSDReference_reference>
        <INSDReference_position>1..1356</INSDReference_position>
        <INSDReference_authors>
          <INSDAuthor>Kostanjsek,R.</INSDAuthor>
          <INSDAuthor>Strus,J.</INSDAuthor>
          <INSDAuthor>Avgustin,G.</INSDAuthor>
        </INSDReference_authors>
        <INSDReference_title>Direct Submission</INSDReference_title>
        <INSDReference_journal>Submitted (07-APR-2006) Department of
Biology, Biotechnical Faculty, University of Ljubljana, Vecna Pot 111,
Ljubljana 1000, Slovenia</INSDReference_journal>
      </INSDReference>
    </INSDSeq_references>
    <INSDSeq_feature-table>
      <INSDFeature>
        <INSDFeature_key>source</INSDFeature_key>
        <INSDFeature_location>1..1356</INSDFeature_location>
        <INSDFeature_intervals>
          <INSDInterval>
            <INSDInterval_from>1</INSDInterval_from>
            <INSDInterval_to>1356</INSDInterval_to>
            <INSDInterval_accession>DQ485973.1</INSDInterval_accession>
          </INSDInterval>
        </INSDFeature_intervals>
        <INSDFeature_quals>
          <INSDQualifier>
            <INSDQualifier_name>organism</INSDQualifier_name>
            <INSDQualifier_value>uncultured Mollicutes
bacterium</INSDQualifier_value>
          </INSDQualifier>
          <INSDQualifier>
            <INSDQualifier_name>mol_type</INSDQualifier_name>
            <INSDQualifier_value>genomic DNA</INSDQualifier_value>
          </INSDQualifier>
          <INSDQualifier>
            <INSDQualifier_name>isolation_source</INSDQualifier_name>
            <INSDQualifier_value>isopod gut</INSDQualifier_value>
          </INSDQualifier>
          <INSDQualifier>
            <INSDQualifier_name>specific_host</INSDQualifier_name>
            <INSDQualifier_value>Porcellio scaber</INSDQualifier_value>
          </INSDQualifier>
          <INSDQualifier>
            <INSDQualifier_name>db_xref</INSDQualifier_name>
            <INSDQualifier_value>taxon:220137</INSDQualifier_value>
          </INSDQualifier>
          <INSDQualifier>
            <INSDQualifier_name>clone</INSDQualifier_name>
            <INSDQualifier_value>P7</INSDQualifier_value>
          </INSDQualifier>
          <INSDQualifier>
            <INSDQualifier_name>environmental_sample</INSDQualifier_name>
          </INSDQualifier>
        </INSDFeature_quals>
      </INSDFeature>
      <INSDFeature>
        <INSDFeature_key>rRNA</INSDFeature_key>
        <INSDFeature_location>&lt;1..&gt;1356</INSDFeature_location>
        <INSDFeature_intervals>
          <INSDInterval>
            <INSDInterval_from>1</INSDInterval_from>
            <INSDInterval_to>1356</INSDInterval_to>
            <INSDInterval_accession>DQ485973.1</INSDInterval_accession>
          </INSDInterval>
        </INSDFeature_intervals>
        <INSDFeature_partial5 value="true"/>
        <INSDFeature_partial3 value="true"/>
        <INSDFeature_quals>
          <INSDQualifier>
            <INSDQualifier_name>product</INSDQualifier_name>
            <INSDQualifier_value>16S ribosomal RNA</INSDQualifier_value>
          </INSDQualifier>
        </INSDFeature_quals>
      </INSDFeature>
    </INSDSeq_feature-table>
    <INSDSeq_sequence>AACGCTGGCGGCATGCCTAATACATGCAAGTCGAACGAACTGCCCCTGAACTAAAAGAAGTGCTTGCACGGAAGTTAGGGACGGAATTTGCAGTTAGTGGCGAACGGGTGAGTAACACGTGGGTAACCTACCATAGAGATTGGGATAACTGTTGGAAACGACAGCTAAAACCGAATAAGATTAATTCTACAAAGAGGAATAATTTAAATAGGCGTTTGCCTAGCTTTATGATGGGCCCGCGGTGCATTAGCTAGTTGGTGAGGTAAAGGCTCACCAAGGCGACGATGCATAGCCGGACTGAGAGGTTGAACGGCCACATTGGGACTGAGACACGGCCCAGACAACTACGGTTGGCAGCAGTAGGGAATTTTTCGCAATGGACGAAAGTCTGACGGAGCAATGCCGCGTGAGTGAAGACGGTTTTCGGATTGTAAAACTCTGTTGTGTGGGGGGAACACCTATATGAGAGGAATTGCTCATTAATTGACGCCACCACACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCGAGCGTTTTCCGGAATTATTGGGCGTAAAGAGCGTGTAGGCGGGTATGAATAAGTCTGGTGTGAAATCTAAGTGGCTCAACCACTTAAATTGCATTGGAAACTGCCAAACTAGAATACGGAGGGGTAAGTGGAATTCCATGTGTAGCGGTGGAATGCGTAGATATATGGAGGGACACCAATGGCGAAGGCAGCTTAATGGACCCGAGATTGACGCTGAGACGCGAAAGCTTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTTAAACGATGAGTGCTAGGTATTGGATTAATTTCAGTGCCCGGAGTTAACGCATTAAGCCCTCCGCCTGAGGAGTACGGTCGCAAGGCTGAAACTCAAAGGAATTGACGGGGACCCGCACAAGTGGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCAAAACTTGACATCCCCTGCGAAGCTATAGAAGTATAGTGGAGGTTATCAGGGTGACAGATGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTAGGTTAAGTCCTGCAACGAGCGCAACCCCTGTCTGCAGTTGCTACCATTAAGTTGAGGACTCTGCAGAGACTGCTAGTGTAAGCTAGAGGAAGGTGGGGATGACGTCAAATCATCATGCCTCTTACGTTTTGGGCTACACACGTGCTACAATGGCTGATACAAAGGGCTGCGAACTCGCGAGAGTAAGCGAATCCCAAAAAGTCAGTCTAAGTTCGGATTGAAGTTCTGCAACTCGACTTTCATGAAGTCGGAATGCNCTAGTAATACG</INSDSeq_sequence>
  </INSDSeq>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
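
In case it helps narrow this down: the record above has one qualifier (environmental_sample) with a name but no INSDQualifier_value element. My guess, which I have not confirmed, is that this is what trips addFeatureProperty. Below is a minimal sketch that uses only the JDK's SAX parser (no BioJava classes) to list such value-less qualifiers in an INSDSeq document. The class and method names are made up for illustration and are not part of any library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only (not BioJava code).
public class ValuelessQualifierCheck {

    /** Returns the names of INSDQualifier elements that have no INSDQualifier_value child. */
    public static List<String> findValuelessQualifiers(String xml) throws Exception {
        final List<String> missing = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private String name;
            private boolean sawValue;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                text.setLength(0); // start collecting text for this element
                if (qName.equals("INSDQualifier")) {
                    name = null;
                    sawValue = false;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (qName.equals("INSDQualifier_name")) {
                    name = text.toString().trim();
                } else if (qName.equals("INSDQualifier_value")) {
                    sawValue = true;
                } else if (qName.equals("INSDQualifier") && !sawValue) {
                    missing.add(name); // qualifier closed without a value element
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return missing;
    }

    public static void main(String[] args) throws Exception {
        // Two qualifiers cut down from the record above: one with a value, one without.
        String xml = "<INSDFeature_quals>"
                + "<INSDQualifier><INSDQualifier_name>clone</INSDQualifier_name>"
                + "<INSDQualifier_value>P7</INSDQualifier_value></INSDQualifier>"
                + "<INSDQualifier><INSDQualifier_name>environmental_sample</INSDQualifier_name>"
                + "</INSDQualifier></INSDFeature_quals>";
        System.out.println(findValuelessQualifiers(xml)); // prints [environmental_sample]
    }
}
```

Running the same check over the full several-hundred-sequence file should show whether every failing record contains a value-less qualifier, or whether something else is also involved.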

On 6/5/06, Richard Holland <richard.holland at ebi.ac.uk> wrote:
> This one should be fixed in CVS now. Typo on my behalf - I put in code
> to make it work with both 87+ and pre-87 versions of EMBL, then got the
> regexes the wrong way round!!
>
> Could you send the full stacktrace for the INSDseq format problem you're
> having? (The one where you say you've tracked it down to the qualifier
> value being missing). I can't see anything wrong there, so I need the
> stacktrace in order to know which exact sequence of events is throwing
> the exception.
>
> cheers,
> Richard
>
>
> On Fri, 2006-06-02 at 13:04 -0400, Seth Johnson wrote:
> > Hi Richard,
> >
> > I made sure I have the latest source code from CVS compiled
> > (EMBLFormat.java & GenbankFormat.java are from 05/24/06).  I'm happy
> > to report that the GenBank issue is solved!
> > As far as EMBL parsing, I apologize for not providing the stack dump
> > for ISSUE #1.  Here's the dump of the exception:
> > --------------------------------------------------------
> > org.biojava.bio.BioException: Could not read sequence
> >         at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:112)
> >         at exonhit.parsers.GenBankParser.main(GenBankParser.java:359)
> > Caused by: java.lang.NumberFormatException: null
> >         at java.lang.Integer.parseInt(Integer.java:415)
> >         at java.lang.Integer.parseInt(Integer.java:497)
> >         at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:299)
> >         at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109)
> >         ... 1 more
> > Java Result: -1
> > -------------------------------------------------------
> > Here, again, is the code that I'm using to parse:
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >         BufferedReader gbBR = null;
> >         try {
> >             gbBR = new BufferedReader(new FileReader("C:\\Download\\ASN2BSML\\seth_06_02.emb"));
> >         } catch (FileNotFoundException fnfe) {
> >             fnfe.printStackTrace();
> >             System.exit(-1);
> >         }
> >         Namespace gbNspace = (Namespace) RichObjectFactory.getObject(SimpleNamespace.class, new Object[]{"gbSpace"});
> >         RichSequenceIterator gbSeqs = RichSequence.IOTools.readEMBLDNA(gbBR, gbNspace);
> >         while (gbSeqs.hasNext()) {
> >             try {
> >                 RichSequence rs = gbSeqs.nextRichSequence();
> >                 NCBITaxon myTaxon = rs.getTaxon();
> >             } catch (BioException be) {
> >                 be.printStackTrace();
> >                 System.exit(-1);
> >             }
> >         }
> > ~~~~~~~~~~~~~~~~~~~~~~~~~
> > And here's the EMBL file that I'm trying to parse:
> > +++++++++++++++++++++++++
> > ID   DQ472184  standard; DNA; INV; 546 BP.
> > XX
> > AC   DQ472184;
> > XX
> > SV   DQ472184.1
> > DT   15-MAY-2006
> > XX
> > DE   Trypanosoma cruzi strain CL Brener actin-related protein 3 (ARC21) gene,
> > DE   complete cds.
> > XX
> > KW   .
> > XX
> > OS   Trypanosoma cruzi strain CL Brener
> > OC   Eukaryota; Euglenozoa; Kinetoplastida; Trypanosomatidae; Trypanosoma;
> > OC   Schizotrypanum.
> > XX
> > RN   [1]
> > RP   1-546
> > RA   De Melo L.D.B.;
> > RT   "Actin of Trypanosoma cruzi: ubiquitous actin-binding proteins";
> > RL   Unpublished.
> > XX
> > RN   [2]
> > RP   1-546
> > RA   De Melo L.D.B.;
> > RT   ;
> > RL   Submitted (03-APR-2006) to the EMBL/GenBank/DDBJ databases.
> > RL   Instituto de Biofisica Carlos Chagas Filho, Universidade Federal do Rio
> > RL   de Janeiro, Cidade Universitaria, CCS, Bl.G, Sl.G157, Rio de Janeiro, RJ
> > RL   21949-900, Brazil
> > XX
> > FH   Key             Location/Qualifiers
> > FH
> > FT   source          1..546
> > FT                   /organism="Trypanosoma cruzi strain CL Brener"
> > FT                   /mol_type="genomic DNA"
> > FT                   /strain="CL Brener"
> > FT                   /db_xref="taxon:353153"
> > FT   gene            <1..>546
> > FT                   /gene="ARC21"
> > FT                   /note="TcARC21"
> > FT   mRNA            <1..>546
> > FT                   /gene="ARC21"
> > FT                   /product="actin-related protein 3"
> > FT   CDS             1..546
> > FT                   /gene="ARC21"
> > FT                   /note="actin-binding protein; ARPC3 21 kDa; putative
> > FT                   member of Arp2/3 complex"
> > FT                   /codon_start=1
> > FT                   /product="actin-related protein 3"
> > FT                   /protein_id="ABF13401.1"
> > FT                   /db_xref="GI:93360014"
> > FT                   /translation="MHSRWNGYEESSLLGCGVYPLRRTSRLTPPGPAPRMDEMIEEG
> > FT                   EEEPQDIVDEAFYFFKPHMFFRNFPIKGAGDRVILYLTMYLHECLKKIVQLKREEAH
> > FT                   SVLLNYATMPFASPGEKDFPFNAFFPAGNEEEQEKWREYAKQLRLEANARLIEKVFL
> > FT                   FPEKDGTGNKFWMAFAKRPFLASS"
> >      atgcacagca ggtggaatgg gtatgaagaa agtagtcttt tgggctgcgg tgtttatccg        60
> >      cttcgccgca cgtcacggct cactccaccc ggccctgcac cgcggatgga tgaaatgatt       120
> >      gaggagggcg aagaggagcc acaagacatt gttgacgagg cattttactt ttttaagccc       180
> >      cacatgtttt ttcgtaattt tcccattaag ggtgctggtg atcgtgtcat tctgtacttg       240
> >      acgatgtacc ttcatgagtg tttgaagaaa attgtccagt tgaagcgtga agaggcccat       300
> >      tctgtgcttc ttaactacgc tacgatgccg tttgcatcac caggggaaaa ggactttccg       360
> >      tttaacgcgt ttttccctgc tgggaatgag gaggaacaag aaaaatggcg agagtatgca       420
> >      aaacagcttc gattggaggc caacgcacgt ctcattgaga aggtttttct ttttccagag       480
> >      aaggacggca ccggaaacaa gttctggatg gcgtttgcga agaggccttt cttggcttct       540
> >      agttag                                                                  546
> > //
> > ID   DQ472185  standard; DNA; INV; 543 BP.
> > XX
> > AC   DQ472185;
> > XX
> > SV   DQ472185.1
> > DT   15-MAY-2006
> > XX
> > DE   Trypanosoma cruzi strain CL Brener actin-related protein 4 (ARC20) gene,
> > DE   complete cds.
> > XX
> > KW   .
> > XX
> > OS   Trypanosoma cruzi strain CL Brener
> > OC   Eukaryota; Euglenozoa; Kinetoplastida; Trypanosomatidae; Trypanosoma;
> > OC   Schizotrypanum.
> > XX
> > RN   [1]
> > RP   1-543
> > RA   De Melo L.D.B.;
> > RT   "Actin of Trypanosoma cruzi: ubiquitous actin-binding proteins";
> > RL   Unpublished.
> > XX
> > RN   [2]
> > RP   1-543
> > RA   De Melo L.D.B.;
> > RT   ;
> > RL   Submitted (03-APR-2006) to the EMBL/GenBank/DDBJ databases.
> > RL   Instituto de Biofisica Carlos Chagas Filho, Universidade Federal do Rio
> > RL   de Janeiro, Cidade Universitaria, CCS, Bl.G, Sl.G157, Rio de Janeiro, RJ
> > RL   21949-900, Brazil
> > XX
> > FH   Key             Location/Qualifiers
> > FH
> > FT   source          1..543
> > FT                   /organism="Trypanosoma cruzi strain CL Brener"
> > FT                   /mol_type="genomic DNA"
> > FT                   /strain="CL Brener"
> > FT                   /db_xref="taxon:353153"
> > FT   gene            <1..>543
> > FT                   /gene="ARC20"
> > FT                   /note="TcARC20"
> > FT   mRNA            <1..>543
> > FT                   /gene="ARC20"
> > FT                   /product="actin-related protein 4"
> > FT   CDS             1..543
> > FT                   /gene="ARC20"
> > FT                   /note="actin-binding protein; ARPC4 20 kDa; putative
> > FT                   member of Arp2/3 complex"
> > FT                   /codon_start=1
> > FT                   /product="actin-related protein 4"
> > FT                   /protein_id="ABF13402.1"
> > FT                   /db_xref="GI:93360016"
> > FT                   /translation="MATAYLPYYDCIKCTLHAALCIGNYPSCTVERHNKPEVEVADH
> > FT                   LENNGEIKVQDFLLNPIRIVRSEQESCLIEPSINSTRISVSFLKSDAIAEIIARKYV
> > FT                   GFLAQRAKQFHILRKKPIPGYDISFLISHEEVETMHRNRIIQFIITFLMDIDADIAA
> > FT                   MKLNVNQRARRAAMEFFLALNFT"
> >      atggcaaccg cctatttgcc ttactacgac tgcatcaagt gcacgttgca cgcggctttg        60
> >      tgcatcggga attatccttc atgtaccgtg gagcgtcata ataaaccaga agttgaggtt       120
> >      gcagaccatc tggagaataa tggtgaaata aaagtacaag atttccttct taaccccata       180
> >      cgcattgtgc gttcagaaca ggaaagttgt cttattgaac ctagtataaa cagcacacgc       240
> >      atatctgtat cgtttctcaa gagcgacgct attgcagaga ttattgcccg aaagtacgtt       300
> >      ggatttttag ctcagcgagc caaacagttt cacatcttga gaaaaaagcc tattccggga       360
> >      tatgatataa gttttttgat ttctcacgag gaagtagaaa caatgcatag gaataggatt       420
> >      attcaattta taattacttt cttgatggat attgatgctg acattgctgc aatgaagttg       480
> >      aatgtgaatc aacgtgcacg tcgagcagcg atggaattct ttcttgcatt gaatttcaca       540
> >      tga                                                                     543
> > //
> > +++++++++++++++++++++++++++++++++
> >
> > It looks to me like there's some kind of problem with parsing the
> > sequence version number. I even tried the sequence from the test
> > directory (AY069118.em), with the same outcome.
> >
> > Regards,
> >
> > Seth
> >
> > On 6/2/06, Richard Holland <richard.holland at ebi.ac.uk> wrote:
> > > Hi Seth.
> > >
> > > Your second point, about the authors string not being read correctly in
> > > Genbank format, has been fixed (or should have been if I got the code
> > > right!). Could you check the latest version of biojava-live out of CVS
> > > and give it another go? Basically the parser did not recognise the
> > > CONSRTM tag, as it is not mentioned in the sample record provided by
> > > NCBI, which is what I based the parser on.
> > >
> > > I've set it up now so that it reads the CONSRTM tag, but the value is
> > > merged with the authors tag with (consortium) appended. There will still
> > > be problems if the consortium value has commas in it - not sure how to
> > > fix this yet.
> > >
> > > Your first point is harder to solve because you did not provide a
> > > complete stack trace for the exceptions you are getting. The complete
> > > stack trace would enable me to identify exactly where things are going
> > > wrong and give me a better chance of fixing them. Could you send the
> > > stack trace, and I'll see what I can do.
> > >
> > > cheers,
> > > Richard
> > >
> > >
> > > On Thu, 2006-06-01 at 18:03 -0400, Seth Johnson wrote:
> > > > Hi All,
> > > >
> > > > I'm a newbie to the whole BioJava(X) API and was hoping to get some
> > > > clarification on several issues that I'm having.
> > > > I am developing a parser that would take as input "NCBI Incremental
> > > > ASN.1 Sequence Updates to Genbank" files (
> > > > ftp://ftp.ncbi.nih.gov/ncbi-asn1/daily-nc ) , gunzip them, and use the
> > > > ASN2GB converter (
> > > > ftp://ftp.ncbi.nih.gov/asn1-converters/by_program/asn2gb ) to convert
> > > > resulting sequences to a format parsable by BioJava(X) (
> > > > http://www.penguin-soft.com/penguin/man/1/asn2gb.html ). This is where
> > > > my problems start.
> > > >
> > > > ISSUE #1:
> > > > I've tried to parse all of the formats that ASN2GB outputs ( GenBank
> > > > (default) , EMBL, nucleotide GBSet (XML), nucleotide INSDSet (XML),
> > > > tiny seq (XML) ) using either BioJava or BioJavaX API.  Only GenBank
> > > > format is recognized by the
> > > > "RichSequence.IOTools.readGenbankDNA(inBuf,gbNspace)" function with
> > > > some exceptions that I'll describe in issue #2.  This is the code that
> > > > I'm using to parse, for example, the EMBL output:
> > > >
> > > > BufferedReader inBuf = new BufferedReader(new FileReader("embl_output.emb"));
> > > > Namespace gbNspace = (Namespace) RichObjectFactory.getObject(SimpleNamespace.class, new Object[]{"gbSpace"});
> > > > RichSequenceIterator gbSeqs = RichSequence.IOTools.readEMBLDNA(inBuf, gbNspace);
> > > > while (gbSeqs.hasNext()) {
> > > >     try {
> > > >         RichSequence rs = gbSeqs.nextRichSequence();
> > > >         // Further processing of the RichSequence object goes here
> > > >     } catch (BioException be) {
> > > >         be.printStackTrace();
> > > >     }
> > > > }
> > > >
> > > > The multi-sequence EMBL file looks like this:
> > > > ---------------------------------------------------------------------------------
> > > > ID   DQ472184  standard; DNA; INV; 546 BP.
> > > > XX
> > > > AC   DQ472184;
> > > > XX
> > > > SV   DQ472184.1
> > > > DT   15-MAY-2006
> > > > XX
> > > > DE   Trypanosoma cruzi strain CL Brener actin-related protein 3 (ARC21) gene,
> > > > DE   complete cds.
> > > > XX
> > > > KW   .
> > > > XX
> > > > OS   Trypanosoma cruzi strain CL Brener
> > > > OC   Eukaryota; Euglenozoa; Kinetoplastida; Trypanosomatidae; Trypanosoma;
> > > > OC   Schizotrypanum.
> > > > XX
> > > > RN   [1]
> > > > RP   1-546
> > > > RA   De Melo L.D.B.;
> > > > RT   "Actin of Trypanosoma cruzi: ubiquitous actin-binding proteins";
> > > > RL   Unpublished.
> > > > XX
> > > > RN   [2]
> > > > RP   1-546
> > > > RA   De Melo L.D.B.;
> > > > RT   ;
> > > > RL   Submitted (03-APR-2006) to the EMBL/GenBank/DDBJ databases.
> > > > RL   Instituto de Biofisica Carlos Chagas Filho, Universidade Federal do Rio
> > > > RL   de Janeiro, Cidade Universitaria, CCS, Bl.G, Sl.G157, Rio de Janeiro, RJ
> > > > RL   21949-900, Brazil
> > > > XX
> > > > FH   Key             Location/Qualifiers
> > > > FH
> > > > FT   source          1..546
> > > > FT                   /organism="Trypanosoma cruzi strain CL Brener"
> > > > FT                   /mol_type="genomic DNA"
> > > > FT                   /strain="CL Brener"
> > > > FT                   /db_xref="taxon:353153"
> > > > FT   gene            <1..>546
> > > > FT                   /gene="ARC21"
> > > > FT                   /note="TcARC21"
> > > > FT   mRNA            <1..>546
> > > > FT                   /gene="ARC21"
> > > > FT                   /product="actin-related protein 3"
> > > > FT   CDS             1..546
> > > > FT                   /gene="ARC21"
> > > > FT                   /note="actin-binding protein; ARPC3 21 kDa; putative
> > > > FT                   member of Arp2/3 complex"
> > > > FT                   /codon_start=1
> > > > FT                   /product="actin-related protein 3"
> > > > FT                   /protein_id="ABF13401.1"
> > > > FT                   /db_xref="GI:93360014"
> > > > FT                   /translation="MHSRWNGYEESSLLGCGVYPLRRTSRLTPPGPAPRMDEMIEEG
> > > > FT                   EEEPQDIVDEAFYFFKPHMFFRNFPIKGAGDRVILYLTMYLHECLKKIVQLKREEAH
> > > > FT                   SVLLNYATMPFASPGEKDFPFNAFFPAGNEEEQEKWREYAKQLRLEANARLIEKVFL
> > > > FT                   FPEKDGTGNKFWMAFAKRPFLASS"
> > > >      atgcacagca ggtggaatgg gtatgaagaa agtagtcttt tgggctgcgg tgtttatccg        60
> > > >      cttcgccgca cgtcacggct cactccaccc ggccctgcac cgcggatgga tgaaatgatt       120
> > > >      gaggagggcg aagaggagcc acaagacatt gttgacgagg cattttactt ttttaagccc       180
> > > >      cacatgtttt ttcgtaattt tcccattaag ggtgctggtg atcgtgtcat tctgtacttg       240
> > > >      acgatgtacc ttcatgagtg tttgaagaaa attgtccagt tgaagcgtga agaggcccat       300
> > > >      tctgtgcttc ttaactacgc tacgatgccg tttgcatcac caggggaaaa ggactttccg       360
> > > >      tttaacgcgt ttttccctgc tgggaatgag gaggaacaag aaaaatggcg agagtatgca       420
> > > >      aaacagcttc gattggaggc caacgcacgt ctcattgaga aggtttttct ttttccagag       480
> > > >      aaggacggca ccggaaacaa gttctggatg gcgtttgcga agaggccttt cttggcttct       540
> > > >      agttag                                                                  546
> > > > //
> > > > ID   DQ472185  standard; DNA; INV; 543 BP.
> > > > XX
> > > > AC   DQ472185;
> > > > XX
> > > > SV   DQ472185.1
> > > > DT   15-MAY-2006
> > > > XX
> > > > DE   Trypanosoma cruzi strain CL Brener actin-related protein 4 (ARC20) gene,
> > > > DE   complete cds.
> > > > XX
> > > > KW   .
> > > > XX
> > > > OS   Trypanosoma cruzi strain CL Brener
> > > > OC   Eukaryota; Euglenozoa; Kinetoplastida; Trypanosomatidae; Trypanosoma;
> > > > OC   Schizotrypanum.
> > > > XX
> > > > RN   [1]
> > > > RP   1-543
> > > > RA   De Melo L.D.B.;
> > > > RT   "Actin of Trypanosoma cruzi: ubiquitous actin-binding proteins";
> > > > RL   Unpublished.
> > > > XX
> > > > RN   [2]
> > > > RP   1-543
> > > > RA   De Melo L.D.B.;
> > > > RT   ;
> > > > RL   Submitted (03-APR-2006) to the EMBL/GenBank/DDBJ databases.
> > > > RL   Instituto de Biofisica Carlos Chagas Filho, Universidade Federal do Rio
> > > > RL   de Janeiro, Cidade Universitaria, CCS, Bl.G, Sl.G157, Rio de Janeiro, RJ
> > > > RL   21949-900, Brazil
> > > > XX
> > > > FH   Key             Location/Qualifiers
> > > > FH
> > > > FT   source          1..543
> > > > FT                   /organism="Trypanosoma cruzi strain CL Brener"
> > > > FT                   /mol_type="genomic DNA"
> > > > FT                   /strain="CL Brener"
> > > > FT                   /db_xref="taxon:353153"
> > > > FT   gene            <1..>543
> > > > FT                   /gene="ARC20"
> > > > FT                   /note="TcARC20"
> > > > FT   mRNA            <1..>543
> > > > FT                   /gene="ARC20"
> > > > FT                   /product="actin-related protein 4"
> > > > FT   CDS             1..543
> > > > FT                   /gene="ARC20"
> > > > FT                   /note="actin-binding protein; ARPC4 20 kDa; putative
> > > > FT                   member of Arp2/3 complex"
> > > > FT                   /codon_start=1
> > > > FT                   /product="actin-related protein 4"
> > > > FT                   /protein_id="ABF13402.1"
> > > > FT                   /db_xref="GI:93360016"
> > > > FT                   /translation="MATAYLPYYDCIKCTLHAALCIGNYPSCTVERHNKPEVEVADH
> > > > FT                   LENNGEIKVQDFLLNPIRIVRSEQESCLIEPSINSTRISVSFLKSDAIAEIIARKYV
> > > > FT                   GFLAQRAKQFHILRKKPIPGYDISFLISHEEVETMHRNRIIQFIITFLMDIDADIAA
> > > > FT                   MKLNVNQRARRAAMEFFLALNFT"
> > > >      atggcaaccg cctatttgcc ttactacgac tgcatcaagt gcacgttgca cgcggctttg        60
> > > >      tgcatcggga attatccttc atgtaccgtg gagcgtcata ataaaccaga agttgaggtt       120
> > > >      gcagaccatc tggagaataa tggtgaaata aaagtacaag atttccttct taaccccata       180
> > > >      cgcattgtgc gttcagaaca ggaaagttgt cttattgaac ctagtataaa cagcacacgc       240
> > > >      atatctgtat cgtttctcaa gagcgacgct attgcagaga ttattgcccg aaagtacgtt       300
> > > >      ggatttttag ctcagcgagc caaacagttt cacatcttga gaaaaaagcc tattccggga       360
> > > >      tatgatataa gttttttgat ttctcacgag gaagtagaaa caatgcatag gaataggatt       420
> > > >      attcaattta taattacttt cttgatggat attgatgctg acattgctgc aatgaagttg       480
> > > >      aatgtgaatc aacgtgcacg tcgagcagcg atggaattct ttcttgcatt gaatttcaca       540
> > > >      tga                                                                     543
> > > > //
> > > > -----------------------------------------------------------------------
> > > > I get the exception message "Could not read sequence".  The same
> > > > thing happens if I use the readINSDSetDNA reader instead of the
> > > > readEMBLDNA one with the following INSDSet file (beginning of the file):
> > > >
> > > > <?xml version="1.0"?>
> > > > <!DOCTYPE INSDSeq PUBLIC "-//NCBI//INSD INSDSeq/EN" "INSD_INSDSeq.dtd">
> > > > <INSDSeq>
> > > >   <INSDSeq_locus>DQ022078</INSDSeq_locus>
> > > >   <INSDSeq_length>16729</INSDSeq_length>
> > > >   <INSDSeq_moltype>DNA</INSDSeq_moltype>
> > > >   <INSDSeq_topology>linear</INSDSeq_topology>
> > > >   <INSDSeq_division>ENV</INSDSeq_division>
> > > >   <INSDSeq_update-date>15-MAY-2006</INSDSeq_update-date>
> > > >   <INSDSeq_create-date>15-MAY-2006</INSDSeq_create-date>
> > > >   <INSDSeq_definition>Uncultured bacterium WWRS-2005 putative
> > > > aminoglycoside phosphotransferase (a3.001), putative oxidoreductase
> > > > (a3.002), putative oxidoreductase (a3.003), putative beta-lactamase
> > > > class C (estA3), putative permease (a3.005), putative transmembrane
> > > > signal peptide (a3.006), thiol-disulfide isomerase (a3.007), histone
> > > > acetyltransferase HPA2 (a3.008), putative enzyme (a3.009), putative
> > > > asparaginase (a3.010), hypothetical protein (a3.011), hypothetical
> > > > protein (a3.012), putative membrane protease subunit (a3.013),
> > > > putative haloalkane dehalogenase (a3.014), putative transcriptional
> > > > regulator (a3.015), putative peptidyl-dipeptidase Dcp (a3.016), and
> > > > hypothetical protein (a3.017) genes, complete cds</INSDSeq_definition>
> > > >   <INSDSeq_primary-accession>DQ022078</INSDSeq_primary-accession>
> > > >   <INSDSeq_other-seqids>
> > > >     <INSDSeqid>gb|DQ022078.1|</INSDSeqid>
> > > >     <INSDSeqid>gi|71842722</INSDSeqid>
> > > >   </INSDSeq_other-seqids>
> > > >   <INSDSeq_keywords>
> > > >     <INSDKeyword>ENV</INSDKeyword>
> > > >   </INSDSeq_keywords>
> > > >   <INSDSeq_references>
> > > >     <INSDReference>
> > > >       <INSDReference_reference>?</INSDReference_reference>
> > > >       <INSDReference_position>1..16729</INSDReference_position>
> > > >       <INSDReference_authors>
> > > >         <INSDAuthor>Schmeisser,C.</INSDAuthor>
> > > >         <INSDAuthor>Elend,C.</INSDAuthor>
> > > >         <INSDAuthor>Streit,W.R.</INSDAuthor>
> > > >       </INSDReference_authors>
> > > >       <INSDReference_title>Isolation and biochemical characterization
> > > > of two novel metagenome derived esterases</INSDReference_title>
> > > >       <INSDReference_journal>Appl. Environ. Microbiol. 0:0-0
> > > > (2006)</INSDReference_journal>
> > > >     </INSDReference>
> > > >     <INSDReference>
> > > >       <INSDReference_reference>?</INSDReference_reference>
> > > >       <INSDReference_position>1..16729</INSDReference_position>
> > > >       <INSDReference_authors>
> > > >         <INSDAuthor>Schmeisser,C.</INSDAuthor>
> > > >         <INSDAuthor>Elend,C.</INSDAuthor>
> > > >         <INSDAuthor>Streit,W.R.</INSDAuthor>
> > > >       </INSDReference_authors>
> > > >       <INSDReference_journal>Submitted (29-APR-2005) to the
> > > > EMBL/GenBank/DDBJ databases. Molekulare Enzymtechnologie, University
> > > > Duisburg-Essen, Lotharstrasse 1, Duisburg D-47057,
> > > > Germany</INSDReference_journal>
> > > >     </INSDReference>
> > > >   </INSDSeq_references>
> > > >
> > > > So my question is whether ASN2GB produces output that is
> > > > incompatible with the BioJava parsers, whether there is a problem
> > > > with the sequences themselves, or whether most of the parsers have
> > > > problems. Could it be that I'm using the API incorrectly for the
> > > > above formats? The GenBank parser works as advertised, with the
> > > > exceptions described below:
> > > >
> > > > ISSUE #2:
> > > > When I try to parse GenBank files using the following code:
> > > >
> > > > BufferedReader inBuf = new BufferedReader(new FileReader("genbank_output.gb"));
> > > > Namespace gbNspace = (Namespace) RichObjectFactory.getObject(SimpleNamespace.class, new Object[]{"gbSpace"});
> > > > RichSequenceIterator gbSeqs = RichSequence.IOTools.readGenbankDNA(inBuf, gbNspace);
> > > > while (gbSeqs.hasNext()) {
> > > >     try {
> > > >         RichSequence rs = gbSeqs.nextRichSequence();
> > > >         // Further processing of the RichSequence object goes here
> > > >     } catch (BioException be) {
> > > >         be.printStackTrace();
> > > >     }
> > > > }
> > > >
> > > > Genbank file in question:
> > > >
> > > > LOCUS       BC074905                 838 bp    mRNA    linear   PRI 15-APR-2006
> > > > DEFINITION  Homo sapiens kallikrein 14, mRNA (cDNA clone MGC:104038
> > > >             IMAGE:30915482), complete cds.
> > > > ACCESSION   BC074905
> > > > VERSION     BC074905.2  GI:50959825
> > > > KEYWORDS    MGC.
> > > > SOURCE      Homo sapiens (human)
> > > >   ORGANISM  Homo sapiens
> > > >             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
> > > >             Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
> > > >             Catarrhini; Hominidae; Homo.
> > > > REFERENCE   1  (bases 1 to 838)
> > > >   AUTHORS   Strausberg,R.L., Feingold,E.A., Grouse,L.H., Derge,J.G.,
> > > >             Klausner,R.D., Collins,F.S., Wagner,L., Shenmen,C.M., Schuler,G.D.,
> > > >             Altschul,S.F., Zeeberg,B., Buetow,K.H., Schaefer,C.F., Bhat,N.K.,
> > > >             Hopkins,R.F., Jordan,H., Moore,T., Max,S.I., Wang,J., Hsieh,F.,
> > > >             Diatchenko,L., Marusina,K., Farmer,A.A., Rubin,G.M., Hong,L.,
> > > >             Stapleton,M., Soares,M.B., Bonaldo,M.F., Casavant,T.L.,
> > > >             Scheetz,T.E., Brownstein,M.J., Usdin,T.B., Toshiyuki,S.,
> > > >             Carninci,P., Prange,C., Raha,S.S., Loquellano,N.A., Peters,G.J.,
> > > >             Abramson,R.D., Mullahy,S.J., Bosak,S.A., McEwan,P.J.,
> > > >             McKernan,K.J., Malek,J.A., Gunaratne,P.H., Richards,S.,
> > > >             Worley,K.C., Hale,S., Garcia,A.M., Gay,L.J., Hulyk,S.W.,
> > > >             Villalon,D.K., Muzny,D.M., Sodergren,E.J., Lu,X., Gibbs,R.A.,
> > > >             Fahey,J., Helton,E., Ketteman,M., Madan,A., Rodrigues,S.,
> > > >             Sanchez,A., Whiting,M., Madan,A., Young,A.C., Shevchenko,Y.,
> > > >             Bouffard,G.G., Blakesley,R.W., Touchman,J.W., Green,E.D.,
> > > >             Dickson,M.C., Rodriguez,A.C., Grimwood,J., Schmutz,J., Myers,R.M.,
> > > >             Butterfield,Y.S., Krzywinski,M.I., Skalska,U., Smailus,D.E.,
> > > >             Schnerch,A., Schein,J.E., Jones,S.J. and Marra,M.A.
> > > >   CONSRTM   Mammalian Gene Collection Program Team
> > > >   TITLE     Generation and initial analysis of more than 15,000 full-length
> > > >             human and mouse cDNA sequences
> > > >   JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 99 (26), 16899-16903 (2002)
> > > >    PUBMED   12477932
> > > > REFERENCE   2  (bases 1 to 838)
> > > >   CONSRTM   NIH MGC Project
> > > >   TITLE     Direct Submission
> > > >   JOURNAL   Submitted (25-JUN-2004) National Institutes of Health, Mammalian
> > > >             Gene Collection (MGC), Bethesda, MD 20892-2590, USA
> > > >   REMARK    NIH-MGC Project URL: http://mgc.nci.nih.gov
> > > > COMMENT     On Aug 4, 2004 this sequence version replaced gi:49901832.
> > > >             Contact: MGC help desk
> > > >             Email: cgapbs-r at mail.nih.gov
> > > >             Tissue Procurement: Genome Sequence Centre, British Columbia Cancer
> > > >             Center
> > > >             cDNA Library Preparation: British Columbia Cancer Research Center
> > > >             cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
> > > >             DNA Sequencing by: Genome Sequence Centre,
> > > >             BC Cancer Agency, Vancouver, BC, Canada
> > > >             info at bcgsc.bc.ca
> > > >             Martin Hirst, Thomas Zeng, Ryan Morin, Michelle Moksa, Johnson
> > > >             Pang, Diana Mah, Jing Wang, Kieth Fichter, Eric Chuah, Allen
> > > >             Delaney, Rob Kirkpatrick, Agnes Baross, Sarah Barber, Mabel
> > > >             Brown-John, Steve S. Chand, William Chow, Ryan Babakaiff, Dave
> > > >             Wong, Corey Matsuo, Jaclyn Beland, Susan Gibson, Luis delRio, Ruth
> > > >             Featherstone, Malachi Griffith, Obi Griffith, Ran Guin, Nancy Liao,
> > > >             Kim MacDonald,  Mike R. Mayo, Josh Moran, Diana Palmquist, JR
> > > >             Santos, Duane Smailus, Jeff Stott, Miranda Tsai, George Yang,
> > > >             Jacquie Schein, Asim Siddiqui,Steven Jones, Rob Holt, Marco Marra.
> > > >
> > > >             Clone distribution: MGC clone distribution information can be found
> > > >             through the I.M.A.G.E. Consortium/LLNL at: http://image.llnl.gov
> > > >             Series: IRBU Plate: 4 Row: C Column: 3.
> > > >
> > > >             Differences found between this sequence and the human reference
> > > >             genome (build 36) are described in misc_difference features below.
> > > > FEATURES             Location/Qualifiers
> > > >      source          1..838
> > > >                      /organism="Homo sapiens"
> > > >                      /mol_type="mRNA"
> > > >                      /db_xref="taxon:9606"
> > > >                      /clone="MGC:104038 IMAGE:30915482"
> > > >                      /tissue_type="Lung, PCR rescued clones"
> > > >                      /clone_lib="NIH_MGC_273"
> > > >                      /lab_host="DH10B"
> > > >                      /note="Vector: pCR4 Topo TA with reversed insert"
> > > >      gene            1..838
> > > >                      /gene="KLK14"
> > > >                      /note="synonym: KLK-L6"
> > > >                      /db_xref="GeneID:43847"
> > > >                      /db_xref="HGNC:6362"
> > > >                      /db_xref="IMGT/GENE-DB:6362"
> > > >                      /db_xref="MIM:606135"
> > > >      CDS             49..804
> > > >                      /gene="KLK14"
> > > >                      /codon_start=1
> > > >                      /product="KLK14 protein"
> > > >                      /protein_id="AAH74905.1"
> > > >                      /db_xref="GI:50959826"
> > > >                      /db_xref="GeneID:43847"
> > > >                      /db_xref="HGNC:6362"
> > > >                      /db_xref="IMGT/GENE-DB:6362"
> > > >                      /db_xref="MIM:606135"
> > > >                      /translation="MFLLLTALQVLAIAMTRSQEDENKIIGGYTCTRSSQPWQAALLA
> > > >                      GPRRRFLCGGALLSGQWVITAAHCGRPILQVALGKHNLRRWEATQQVLRVVRQVTHPN
> > > >                      YNSRTHDNDLMLLQLQQPARIGRAVRPIEVTQACASPGTSCRVSGWGTISSPIARYPA
> > > >                      SLQCVNINISPDEVCQKAYPRTITPGMVCAGVPQGGKDSCQGDSGGPLVCRGQLQGLV
> > > >                      SWGMERCALPGYPGVYTNLCKYRSWIEETMRDK"
> > > >      misc_difference 98
> > > >                      /gene="KLK14"
> > > >                      /note="'G' in cDNA is 'A' in the human genome; amino acid
> > > >                      difference: 'R' in cDNA, 'Q' in the human genome."
> > > >      misc_difference 133
> > > >                      /gene="KLK14"
> > > >                      /note="'T' in cDNA is 'C' in the human genome; amino acid
> > > >                      difference: 'Y' in cDNA, 'H' in the human genome."
> > > > ORIGIN
> > > >         1 atgtccctga gggtcttggg ctctgggacc tggccctcag cccctaaaat gttcctcctg
> > > >        61 ctgacagcac ttcaagtcct ggctatagcc atgacacgga gccaagagga tgagaacaag
> > > >       121 ataattggtg gctatacgtg cacccggagc tcccagccgt ggcaggcggc cctgctggcg
> > > >       181 ggtcccaggc gccgcttcct ctgcggaggc gccctgcttt caggccagtg ggtcatcact
> > > >       241 gctgctcact gcggccgccc gatccttcag gttgccctgg gcaagcacaa cctgaggagg
> > > >       301 tgggaggcca cccagcaggt gctgcgcgtg gttcgtcagg tgacgcaccc caactacaac
> > > >       361 tcccggaccc acgacaacga cctcatgctg ctgcagctac agcagcccgc acggatcggg
> > > >       421 agggcagtca ggcccattga ggtcacccag gcctgtgcca gccccgggac ctcctgccga
> > > >       481 gtgtcaggct ggggaactat atccagcccc atcgccaggt accccgcctc tctgcaatgc
> > > >       541 gtgaacatca acatctcccc ggatgaggtg tgccagaagg cctatcctag aaccatcacg
> > > >       601 cctggcatgg tctgtgcagg agttccccag ggcgggaagg actcttgtca gggtgactct
> > > >       661 gggggacccc tggtgtgcag aggacagctc cagggcctcg tgtcttgggg aatggagcgc
> > > >       721 tgcgccctgc ctggctaccc cggtgtctac accaacctgt gcaagtacag aagctggatt
> > > >       781 gaggaaacga tgcgggacaa atgatggtct tcacggtggg atggacctcg tcagctgc
> > > > //
> > > >
> > > > I get the following exception:
> > > >
> > > > java.lang.IllegalArgumentException: Authors string cannot be null
> > > > org.biojava.bio.BioException: Could not read sequence
> > > >         at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:112)
> > > >         at exonhit.parsers.GenBankParser.getSequences(GenBankParser.java:107)
> > > >         at exonhit.parsers.GenBankParser.runGBparser(GenBankParser.java:258)
> > > >         at exonhit.parsers.GenBankParser.main(GenBankParser.java:341)
> > > > Caused by: java.lang.IllegalArgumentException: Authors string cannot be null
> > > >         at org.biojavax.DocRefAuthor$Tools.parseAuthorString(DocRefAuthor.java:76)
> > > >         at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:356)
> > > >         at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109)
> > > >
> > > > -----------------------------------------------------------------------
> > > >
> > > > I'm trying to work out what the problem with this particular
> > > > sequence could be. It looks to me as though the AUTHORS portion is
> > > > not being parsed correctly. Any ideas would be greatly appreciated!
> > > >
> > > --
> > > Richard Holland (BioMart Team)
> > > EMBL-EBI
> > > Wellcome Trust Genome Campus
> > > Hinxton
> > > Cambridge CB10 1SD
> > > UNITED KINGDOM
> > > Tel: +44-(0)1223-494416
> > >

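One observation on the ISSUE #2 trace quoted above: REFERENCE 2 in
BC074905 has CONSRTM, TITLE, JOURNAL and REMARK lines but no AUTHORS
line, so the string handed to DocRefAuthor.Tools.parseAuthorString may
simply be null for that reference. That is a guess from the record and
the trace, not something confirmed against the BioJava source. Below is
a minimal sketch for spotting such references in a flat file before
parsing. It uses only the JDK; ReferenceAuthorCheck and
referencesWithoutAuthors are hypothetical names (not part of BioJava),
and it assumes well-formed "REFERENCE   <n>" lines with sub-keywords
indented by two spaces.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of BioJava. */
final class ReferenceAuthorCheck {

    /** Returns the numbers of the REFERENCE blocks that contain no AUTHORS line. */
    static List<Integer> referencesWithoutAuthors(List<String> lines) {
        List<Integer> missing = new ArrayList<>();
        int current = -1;           // number of the block being scanned, -1 if none
        boolean hasAuthors = false;
        for (String line : lines) {
            if (line.startsWith("REFERENCE")) {
                // A new REFERENCE closes the previous block, if any.
                if (current != -1 && !hasAuthors) {
                    missing.add(current);
                }
                // Assumption: the reference number is the second token.
                current = Integer.parseInt(line.trim().split("\\s+")[1]);
                hasAuthors = false;
            } else if (current != -1) {
                if (line.startsWith("  AUTHORS")) {
                    hasAuthors = true;
                } else if (!line.isEmpty() && !Character.isWhitespace(line.charAt(0))) {
                    // Any other top-level keyword (COMMENT, FEATURES, ...) closes the block.
                    if (!hasAuthors) {
                        missing.add(current);
                    }
                    current = -1;
                }
            }
        }
        // The file may end while a REFERENCE block is still open.
        if (current != -1 && !hasAuthors) {
            missing.add(current);
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ReferenceAuthorCheck <genbank-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println("REFERENCE blocks without AUTHORS: "
                + referencesWithoutAuthors(lines));
    }
}
```

For the BC074905 record as quoted (without the quote prefixes) I would
expect this to flag reference 2. Note that reference numbers restart in
each record, so on a multi-record file the same number can be listed
more than once.
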

-- 
Best Regards,


Seth Johnson
Senior Bioinformatics Associate

Ph: (202) 470-0900
Fx: (775) 251-0358