[Biojava-l] blast SAX parser

Russell Smithies russell.smithies at xtra.co.nz
Thu Jun 26 12:07:46 EDT 2003


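Some people may call this cheating, but I wrote a simple pre-processor that
converts blast -m7 XML into something a basic SAX parser can read :-)
Each one-line leaf element such as <Hit_num>1</Hit_num> is rewritten as an
empty element with one attribute, <Hit num="1"/>, so the value arrives in
startElement() instead of having to be collected from characters() calls.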
-----------------------------------------------------------
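import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class XMLPreProcessor {
  /**
   * A simple utility method to create a new XML file containing data
   * converted from the default blast -m7 XML format into something that
   * can be easily read by a standard SAX parser. Each one-line leaf element,
   * e.g. <Hit_num>1</Hit_num>, becomes an empty element with a single
   * attribute, e.g. <Hit num="1"/>.
   *
   * @param inFileName name of file in default blast -m7 format
   * @param outFileName name of output file converted to SAX-parser-compliant XML
   * @author Russell Smithies
   */
  public void process(String inFileName, String outFileName) {
    // try-with-resources closes both files even if an exception is thrown
    try (BufferedReader in = new BufferedReader(new FileReader(inFileName));
         BufferedWriter out = new BufferedWriter(new FileWriter(outFileName))) {
      //print XML version header
      String header = in.readLine();
      if (header == null) {
        return; // empty input file
      }
      out.write(header);
      out.newLine();
      String line;
      // readLine() returning null is the reliable end-of-file test;
      // ready() only reports whether the next read would block
      while ((line = in.readLine()) != null) {
        String trimmed = line.trim();
        //preserve single line comments containing DTD stuff
        if (line.indexOf("<!") >= 0) {
          out.write(line);
          //tag-only line such as <Hit> or </Hit>: copy unchanged
        } else if (trimmed.indexOf('>') == trimmed.length() - 1) {
          out.write(line);
          //leaf line: <Prefix_suffix>value</Prefix_suffix> -> <Prefix suffix="value"/>
        } else {
          out.write(convertLeaf(line));
        }
        out.newLine();
      }
    } catch (IOException ex) {
      ex.printStackTrace();
    }
  }

  private static String convertLeaf(String line) {
    int open = line.indexOf('<');
    int gt = line.indexOf('>');
    int close = line.lastIndexOf("</");
    int underscore = line.indexOf('_', open);
    // leave the line alone if it is not a one-line element with an underscore in its name
    if (open < 0 || gt < 0 || close < gt || underscore < 0 || underscore > gt) {
      return line;
    }
    // a literal double quote in text content would end the attribute early
    String value = line.substring(gt + 1, close).replace("\"", "&quot;");
    return line.substring(0, underscore) + " "
        + line.substring(underscore + 1, gt) + "=\"" + value + "\"/>";
  }
}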
--------------------------------------------------------------------------
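The leaf-line rewrite is easiest to follow on a concrete input. Below is a
small sketch (LeafRewriteDemo is a hypothetical name) that performs the same
transformation as four StringBuffer edits and shows the intermediate string
after each one. Like the pre-processor, it assumes the whole element sits on
one line and that its name contains an underscore.

```java
public class LeafRewriteDemo {
  // Turns <Prefix_suffix>value</Prefix_suffix> into <Prefix suffix="value"/>.
  public static String rewrite(String line) {
    StringBuffer sb = new StringBuffer(line);
    // 1. first '>' becomes ="              <Hit_num="1</Hit_num>
    sb.replace(sb.indexOf(">"), sb.indexOf(">") + 1, "=\"");
    // 2. drop the closing tag, keep its '>'  <Hit_num="1>
    sb.delete(sb.lastIndexOf("<"), sb.length() - 1);
    // 3. close the value, make it empty     <Hit_num="1"/>
    sb.insert(sb.length() - 1, "\"/");
    // 4. first '_' splits element name from attribute name  <Hit num="1"/>
    sb.replace(sb.indexOf("_"), sb.indexOf("_") + 1, " ");
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(rewrite("<Hit_num>1</Hit_num>"));               // prints <Hit num="1"/>
    System.out.println(rewrite("<Hsp_bit-score>65.1</Hsp_bit-score>")); // prints <Hsp bit-score="65.1"/>
  }
}
```

Step 4 only touches the first underscore, so an underscore inside the value
(e.g. in a sequence ID) is left alone as long as the element name has one.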

It produces nice-looking XML, but it's probably not worth adding to BioJava.
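As an illustration of reading the converted output, here is a minimal
sketch of a handler (HitDefCollector is a hypothetical name) that collects
hit descriptions with the JDK's javax.xml.parsers SAX API. It assumes the
converted file contains lines like <Hit def="..."/>, which is what the
rewrite produces from <Hit_def> elements. The converted file still declares
the NCBI DOCTYPE but no longer follows that DTD, so it must be read with a
non-validating parser (the JDK default). The load-external-dtd feature is a
Xerces feature that the JDK's built-in parser recognises; it is set here on
the assumption that the DTD file may not be available next to the output.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class HitDefCollector extends DefaultHandler {
  // hit descriptions in document order
  private final List<String> hitDefs = new ArrayList<>();

  @Override
  public void startElement(String uri, String localName, String qName, Attributes atts) {
    // <Hit_def>x</Hit_def> now arrives as <Hit def="x"/>, so the value is
    // available here without buffering characters() calls
    if ("Hit".equals(qName)) {
      String def = atts.getValue("def");
      if (def != null) {
        hitDefs.add(def);
      }
    }
  }

  public static List<String> parse(InputSource source) throws Exception {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    // don't try to fetch the external NCBI DTD named in the DOCTYPE
    factory.setFeature(
        "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
    SAXParser parser = factory.newSAXParser();
    HitDefCollector handler = new HitDefCollector();
    parser.parse(source, handler);
    return handler.hitDefs;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<?xml version=\"1.0\"?>\n"
        + "<Hit>\n"
        + "  <Hit num=\"1\"/>\n"
        + "  <Hit def=\"example sequence A\"/>\n"
        + "</Hit>\n";
    System.out.println(parse(new InputSource(new StringReader(xml))));
    // prints [example sequence A]
  }
}
```

Because each value is an attribute, the handler needs no text buffering. The
trade-off is that container elements such as <Hit> and the new empty
<Hit .../> elements share a name, so a handler has to tell them apart by
their attributes.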


Russell



