[Biojava-l] A problem parsing Blast XML output (blastN vs. blastP)

Benoit VARVENNE varvenne at genoway.com
Fri Nov 24 17:04:01 UTC 2006


Oops, sorry, I hadn't noticed that. Here is the translation (home-made...):

"org.xml.sax.SAXParseException: An XML declaration can only begin with an
entity"

The problem is that my two files (attached this time) don't seem to differ.
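
One thing that can be checked quickly: the XML specification only allows an XML declaration (<?xml ...?>) at the very start of a document, so a parser will complain if it meets one anywhere else. Assuming that is what is happening here (only a guess, since it has not been checked against the attached files), the rough sketch below lists every line that contains "<?xml ". The class and method names are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of BioJava.
class XmlDeclScanner {

    /** Returns the 1-based numbers of all lines that contain the text "<?xml ". */
    static List<Integer> declarationLines(Reader in) throws IOException {
        List<Integer> hits = new ArrayList<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        int lineNo = 0;
        while ((line = reader.readLine()) != null) {
            lineNo++;
            if (line.contains("<?xml ")) {
                hits.add(lineNo);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the BLAST XML file to inspect
        try (Reader in = new FileReader(args[0])) {
            System.out.println("XML declarations found on lines: " + declarationLines(in));
        }
    }
}
```

More than one hit, or a single hit that is not on line 1, would suggest that the file is not one XML document (for example, several reports written one after another into the same file), which would be a difference between the two files that is easy to miss by eye.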

Can you recommend an XML validation tool?
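
In the meantime, a well-formedness check can be done with the SAX parser that ships with Java. The sketch below is a minimal version under some assumptions: the class and method names are invented for this example, it only checks well-formedness (it does not validate against the NCBI DTD), and it deliberately substitutes an empty DTD so that a DOCTYPE pointing to a file that is not available locally does not cause an unrelated error.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of BioJava.
class WellFormedCheck {

    /** Returns null if the input parses cleanly, otherwise "line:column: message". */
    static String firstError(InputSource source) throws Exception {
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Assumption: the external DTD is not needed for this check,
                // so hand the parser an empty one instead of fetching it.
                return new InputSource(new StringReader(""));
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
            return null;
        } catch (SAXParseException e) {
            return e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path of the BLAST XML file to check
        try (InputStream in = new FileInputStream(args[0])) {
            String error = firstError(new InputSource(in));
            System.out.println(error == null ? "well-formed" : error);
        }
    }
}
```

The line and column in the reported error should show where in the BlastP file the parser gives up, which is usually more informative than the message alone.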

Cheers,

Benoît 

On 24/11/06 17:56, "Richard Holland" <holland at ebi.ac.uk> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Unfortunately the exception you are getting is in French and sadly my 1
> year of French lessons in school failed to make any impact on my ability
> to understand it. :)
> 
> But, my guess would be that something is wrong with the BlastP output,
> as the exception is an XML parser one and not a BioJava one. It's most
> likely that the BlastP report is not valid XML. Try running it through
> an XML validation tool to check.
> 
> cheers,
> Richard
> 
> Benoit VARVENNE wrote:
>> Hello,
>> 
>> I'm parsing BLAST results with BioJava 1.5 and a BlastXMLParserFacade, using
>> the code at the end of this mail.
>> 
>> With a blastN query this works without any trouble.
>> However, when I do exactly the same thing with a BlastP query, I get the
>> exception quoted at the end of this mail.
>> 
>> I've checked, and the two input files (blastn/blastp) seem to have the same
>> structure (except that one is for proteins, so the data are different).
>> Both files are attached in case you want to look at them.
>> 
>> Can someone help me? I don't understand why it works in one case and not in
>> the other.
>> 
>> Thanks a lot,
>> Cheers,
>> 
>> Benoît.
>> 
>> 
>> -------------------
>> My code :
>> -----
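>> import java.io.FileInputStream;
>> import java.io.InputStream;
>> import java.util.ArrayList;
>> import java.util.List;
>> import org.biojava.bio.program.sax.blastxml.BlastXMLParserFacade;
>> import org.biojava.bio.program.ssbind.BlastLikeSearchBuilder;
>> import org.biojava.bio.program.ssbind.SeqSimilarityAdapter;
>> import org.biojava.bio.search.SearchContentHandler;
>> import org.biojava.bio.seq.db.DummySequenceDB;
>> import org.biojava.bio.seq.db.DummySequenceDBInstallation;
>> import org.xml.sax.InputSource;
>> 
>> // blastFile is the XML file, output of my blast
>> InputStream is = new FileInputStream(blastFile);
>> 
>> // make the BLAST XML parser
>> BlastXMLParserFacade parser = new BlastXMLParserFacade();
>> // make the SAX event adapter that will pass events to a handler
>> SeqSimilarityAdapter adapter = new SeqSimilarityAdapter();
>> 
>> // set the parser's SAX event adapter
>> parser.setContentHandler(adapter);
>> 
>> // the list to hold the SeqSimilaritySearchResults
>> List results = new ArrayList();
>> 
>> // create the SearchContentHandler that will build
>> // SeqSimilaritySearchResults in the results list
>> SearchContentHandler builder = new BlastLikeSearchBuilder(results,
>>     new DummySequenceDB("queries"), new DummySequenceDBInstallation());
>> 
>> // register builder with adapter
>> adapter.setSearchContentHandler(builder);
>> 
>> parser.parse(new InputSource(is)); // the exception is thrown here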
>> 
>> -------------------
>> 
>> 
>> -------------------
>> The exception :
>> ----
>> org.xml.sax.SAXParseException: An XML declaration can only begin with an entity
>>         at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3376)
>>         at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3364)
>>         at org.apache.crimson.parser.Parser2.maybePI(Parser2.java:1140)
>>         at org.apache.crimson.parser.Parser2.maybeMisc(Parser2.java:1266)
>>         at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:671)
>>         at org.apache.crimson.parser.Parser2.parse(Parser2.java:337)
>>         at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:448)
>>         at org.biojava.bio.program.sax.blastxml.BlastXMLParserFacade.parse(BlastXMLParserFacade.java:180)
>> 
>> 
>> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
> 
> iD8DBQFFZyRI4C5LeMEKA/QRAmQoAJoD1RjbKgqOsiRVW1rPBrYDcaAObgCglHYI
> 48o/exGfv1xSSnLzMMjPKxo=
> =tRNi
> -----END PGP SIGNATURE-----
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: BlastN_XML_ouput.txt
Type: application/octet-stream
Size: 2959941 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biojava-l/attachments/20061124/6fff7c36/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BlastP_XML_output.txt
Type: application/octet-stream
Size: 1961596 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biojava-l/attachments/20061124/6fff7c36/attachment-0005.obj>

