[Biojava-l] blast SAX parser

David Huen smh1008 at cus.cam.ac.uk
Thu Jun 26 08:40:52 EDT 2003


On Thu, 26 Jun 2003, Russell Smithies wrote:

I normally use a bash script with sed for this, but I think it's still
worth having.

Mind if I stick it into the javadocs somewhere within the blastXML
package?

Regards,
David

> Some people may call this cheating but I wrote a simple utility
> pre-processor for blast XML to convert it into something a basic SAX parser
> can read  :-)
> -----------------------------------------------------------
> import java.io.*;
> 
> public class XMLPreProcessor{
>   /**
>    * A simple utility method to create a new XML file containing data
>    * converted from the default blast -m7 XML format into something that
>    * can be easily read by a standard SAX parser.
>    *
>    * @param inFileName name of file in default blast -m7 format
>    * @param outfileName name of output file converted to SAX-parser
>    *        compliant XML
>    * @author Russell Smithies
>    */
>   public void process(String inFileName, String outfileName){
>     try{
>       BufferedReader in = new BufferedReader(new FileReader(inFileName));
>       BufferedWriter out = new BufferedWriter(new FileWriter(outfileName));
>       StringBuffer sb = null;
>       //print XML version header
>       out.write(in.readLine());
>       out.newLine();
>       //readLine() returns null at end of file; ready() is not an EOF test
>       String line;
>       while((line = in.readLine()) != null){
>         //preserve single line comments containing DTD stuff
>         if(line.indexOf("<!") >= 0){
>           out.write(line);
>           out.newLine();
>           //XML header type node
>         } else if(line.indexOf(">") == line.length() - 1){
>           out.write(line);
>           out.newLine();
>           //prune crap out of other lines
>         } else{
>           sb = new StringBuffer(line);
>           sb.replace(sb.indexOf(">"), sb.indexOf(">") + 1, "=\"");
>           sb.delete(sb.lastIndexOf("<"), sb.length() - 1);
>           sb.insert(sb.length() - 1, "\"/");
>           sb.replace(sb.indexOf("_"), sb.indexOf("_") + 1, " ");
>           out.write(sb.toString());
>           out.newLine();
>         }
>       }
>       out.flush();
>       out.close();
>       in.close();
>     } catch(IOException ex){
>       ex.printStackTrace();
>     }
>   }
> }
> --------------------------------------------------------------------------
> 
> It produces nice-looking XML, but it's probably not worth adding to biojava.
> 
> 
> Russell
> 
> 
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
> 
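To make the effect of the string edits in the quoted process() method concrete, here is a minimal sketch of the same per-line rewrite as a standalone helper. The class name LineRewriteSketch and the method rewrite are hypothetical and not part of the posted code; the sketch assumes one element per line with no trailing whitespace, and it passes through any line that does not look like <Tag_name>text</Tag_name> instead of editing it.

```java
class LineRewriteSketch {
    // Hypothetical helper: turns "<Tag_name>text</Tag_name>" into
    // "<Tag name="text"/>", mirroring the edits in the quoted process().
    static String rewrite(String line) {
        int gt = line.indexOf('>');          // end of the opening tag
        int lastLt = line.lastIndexOf('<');  // start of the closing tag
        int us = line.indexOf('_');          // first underscore in the tag name

        // Pass through DTD/comment lines, container tags such as "<Hit_hsps>",
        // and anything else that lacks the tag/text/tag shape.
        if (line.indexOf("<!") >= 0 || gt < 0 || gt == line.length() - 1
                || lastLt <= gt || us < 0 || us > gt) {
            return line;
        }

        String indentAndTag = line.substring(0, gt);   // e.g. "  <Hit_num"
        String text = line.substring(gt + 1, lastLt);  // e.g. "1"

        // The first underscore becomes a space, so the part after it turns
        // into an attribute name and the element text into its value.
        return indentAndTag.replaceFirst("_", " ") + "=\"" + text + "\"/>";
    }

    public static void main(String[] args) {
        System.out.println(rewrite("  <Hit_num>1</Hit_num>")); // prints:   <Hit num="1"/>
        System.out.println(rewrite("<Hit_hsps>"));             // prints: <Hit_hsps>
    }
}
```

Note that the element text is copied into the attribute value as-is, so this relies on the input already having any quote characters escaped.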

David Huen, Ph.D.              Email: smh1008 at cus.cam.ac.uk
Dept. of Genetics              Fax  : +44 1223 333992
University of Cambridge        Phone: +44 1223 766748/333982
Cambridge, CB2 3EH
U.K.
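
For anyone wondering what "something a basic SAX parser can read" buys you: once the values sit in attributes, a handler only needs startElement() and never has to buffer characters() callbacks. The following is a rough sketch using the standard JAXP SAX API. The class ConvertedBlastSaxSketch, the method readAttributes, and the sample XML string (including the made-up hit id) are illustrative assumptions, not taken from the posted code or from real BLAST output.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ConvertedBlastSaxSketch {
    // Hypothetical helper: returns "Element.attribute=value" for every
    // attribute the SAX parser reports in the given XML text.
    static List<String> readAttributes(String xml) {
        final List<String> seen = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // All data arrives here; no characters() handling is needed.
                for (int i = 0; i < atts.getLength(); i++) {
                    seen.add(qName + "." + atts.getQName(i) + "=" + atts.getValue(i));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse converted XML", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        // Made-up fragment in the shape the pre-processor would produce.
        String xml = "<Hit>\n"
                + "  <Hit num=\"1\"/>\n"
                + "  <Hit id=\"gi|12345\"/>\n"
                + "</Hit>\n";
        for (String entry : readAttributes(xml)) {
            System.out.println(entry);
        }
    }
}
```

The sample deliberately omits the DOCTYPE line so the parser does not try to fetch a DTD; a real converted file keeps that line, so an EntityResolver may be needed when working offline.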


