[Biojava-l] Problem while using BlastXMLParserFacade

Mark Schreiber markjschreiber at gmail.com
Fri Jun 6 02:34:21 UTC 2008


Hi -

I have seen this problem twice. Once it was due to 'invisible' white
space before the XML declaration. If you call trim() on the String that
you read from the file you can get rid of it. You may not even be able
to see it, as some of these characters are non-printing. The other
problem arises if your file was produced and saved on Linux/Unix and
parsed on Windows. Problems can occur due to the different line-ending
conventions (line feed versus carriage return plus line feed). While
Java knows which convention belongs to which operating system, it will
assume the file you are working on came from your own operating system.
If this is the problem you can solve it with the Unix utility dos2unix
or unix2dos, depending on which way you are going.
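
The two suggestions above can be sketched in plain Java. This is only an
illustration, not BioJava code: the class name XmlPrologCleaner and its
methods are made up for the example, and it assumes the file is UTF-8
encoded. Note that String.trim() only strips characters up to U+0020, so
a byte-order mark (U+FEFF), one common non-printing culprit, has to be
removed explicitly.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.xml.sax.InputSource;

// Hypothetical helper, not part of BioJava.
final class XmlPrologCleaner {

    /** Strips leading/trailing whitespace and a leading byte-order mark. */
    static String clean(String xml) {
        String s = xml.trim();
        // trim() leaves U+FEFF in place, so drop it by hand and trim again.
        if (s.startsWith("\uFEFF")) {
            s = s.substring(1).trim();
        }
        return s;
    }

    /** Converts CRLF and lone CR line endings to LF (roughly what dos2unix does). */
    static String normalizeLineEndings(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    /** Reads a file (assumed UTF-8) and wraps the cleaned text for a SAX parser. */
    static InputSource toInputSource(String path) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(path));
        String raw = new String(bytes, StandardCharsets.UTF_8);
        return new InputSource(new StringReader(normalizeLineEndings(clean(raw))));
    }
}
```

With the parser from the code quoted below, the call would then be
something like parser.parse(XmlPrologCleaner.toInputSource(filename)).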

- Mark

On Thu, Jun 5, 2008 at 7:49 PM, benn <benn at mpi-cbg.de> wrote:
>
> Hello,
>
>       Sorry to pepper the board with questions!  I am working on BLAST parsing and have parsing of the standard BLAST output working fine with JUnit tests.  I am now attempting to do the same for XML output from BLAST (blastp), but I get a SAXException saying that content is not allowed before the prolog.  I thought some invisible characters might be causing it to throw a wobbly, but I cannot see any.  Has anyone else come across this problem?  For completeness, the file can be downloaded at: http://idisk-srv1.mpi-cbg.de/~benn/xmloutput.xml (the mailing list server would not let me attach the file to the email), and the code which parses it is below:
>
> <code>
>  private List<SeqSimilaritySearchResult> parseBlast(String filename)
>         throws IOException, SAXException, BioException {
>
>     // Note: the 'filename' parameter is not used here; the path is hard-coded.
>     InputStream is = new FileInputStream(
>             "src/test/resources/blast/standardoutput.blastp");
>
>     BlastXMLParserFacade parser = new BlastXMLParserFacade();
>     SeqSimilarityAdapter adapter = new SeqSimilarityAdapter();
>     parser.setContentHandler(adapter);
>     List<SeqSimilaritySearchResult> results = new ArrayList<SeqSimilaritySearchResult>();
>
>     SearchContentHandler builder = new BlastLikeSearchBuilder(results,
>             new DummySequenceDB("queries"),
>             new DummySequenceDBInstallation());
>
>     adapter.setSearchContentHandler(builder);
>
>     parser.parse(new InputSource(is));
>     return results;
>  }
> </code>
>
> Cheers,
>
> Neil
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l


