[Biojava-l] Problem parsing biojava xml file

Mark Schreiber markjschreiber at gmail.com
Thu Jul 17 12:44:08 UTC 2008


Hi -

In the past I have seen this when there are invisible metacharacters
in the stream or file before the XML proper starts. This can happen
with some Unicode encodings, for example when the file begins with a
byte order mark. Try trimming the String before parsing.
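
A minimal sketch of that idea, assuming the file is UTF-8 and that
nothing meaningful precedes the first '<'. Note that String.trim() only
strips characters up to U+0020, so it would leave a byte order mark
(U+FEFF) in place; the sketch instead drops everything before the first
'<'. The class and method names are hypothetical and not part of BioJava.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.xml.sax.InputSource;

// Hypothetical helper, not a BioJava class.
final class XmlPrologCleaner {

    /**
     * Drops everything before the first '<', such as a byte order mark
     * or stray whitespace, so the XML declaration is the first thing
     * the parser sees.
     */
    static String stripLeadingJunk(String xml) {
        int start = xml.indexOf('<');
        if (start < 0) {
            throw new IllegalArgumentException(
                    "no '<' found; input does not look like XML");
        }
        return xml.substring(start);
    }

    /**
     * Reads the whole file as UTF-8 (an assumption about the encoding)
     * and wraps the cleaned text in an InputSource for a SAX parser.
     */
    static InputSource cleanedSource(String filename) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(filename));
        String raw = new String(bytes, StandardCharsets.UTF_8);
        return new InputSource(new StringReader(stripLeadingJunk(raw)));
    }
}
```

With something like this, the quoted code could call
parser.parse(XmlPrologCleaner.cleanedSource(filename)) instead of
wrapping the FileInputStream directly.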

- Mark

On Thu, Jun 5, 2008 at 2:16 AM, benn <benn at mpi-cbg.de> wrote:
> Hello,
>
>        Sorry to pepper the board with questions!  I am working on BLAST
> parsing and have the standard output for BLAST working fine with JUnit
> tests.  I am now attempting to recreate this for files in XML format coming
> from BLAST (blastp); however, I get a SAXException saying that content is
> not allowed before the prolog.  I thought some invisible characters might
> be causing it to throw a wobbly, but I cannot see any.  Has anyone else
> come across this problem?  For completeness I have attached the BLAST file,
> and the code to parse it is below:
>
> <code>
>  private List<SeqSimilaritySearchResult> parseBlast(String filename)
>          throws IOException, SAXException, BioException {
>
>      InputStream is = new FileInputStream(
>              "src/test/resources/blast/standardoutput.blastp");
>
>      BlastXMLParserFacade parser = new BlastXMLParserFacade();
>      SeqSimilarityAdapter adapter = new SeqSimilarityAdapter();
>      parser.setContentHandler(adapter);
>      List<SeqSimilaritySearchResult> results =
>              new ArrayList<SeqSimilaritySearchResult>();
>
>      SearchContentHandler builder = new BlastLikeSearchBuilder(results,
>              new DummySequenceDB("queries"),
>              new DummySequenceDBInstallation());
>
>      adapter.setSearchContentHandler(builder);
>
>      parser.parse(new InputSource(is));
>      return results;
>  }
> </code>
>
> Cheers,
>
> Neil
>
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
>


