[Biojava-l] blast SAX parser
David Huen
smh1008 at cus.cam.ac.uk
Thu Jun 26 08:40:52 EDT 2003
On Thu, 26 Jun 2003, Russell Smithies wrote:
I normally use a bash script with sed for this, but I think it's still
worth having.
Mind if I stick it into the javadocs somewhere within the blastXML
package?
Regards,
David
> Some people may call this cheating but I wrote a simple utility
> pre-processor for blast XML to convert it into something a basic SAX parser
> can read :-)
> -----------------------------------------------------------
> import java.io.*;
>
> public class XMLPreProcessor {
>     /**
>      * A simple utility method to create a new XML file containing data
>      * converted from the default blast -m7 XML format into something that
>      * can be easily read by a standard SAX parser.
>      *
>      * @param inFileName name of file in default blast -m7 format
>      * @param outfileName name of output file converted to SAX-parser
>      *        compliant XML
>      * @author Russell Smithies
>      */
>     public void process(String inFileName, String outfileName) {
>         // try-with-resources closes both files even if an exception is thrown
>         try (BufferedReader in = new BufferedReader(new FileReader(inFileName));
>              BufferedWriter out = new BufferedWriter(new FileWriter(outfileName))) {
>             String line = in.readLine();
>             if (line == null) {
>                 return; // empty input, nothing to convert
>             }
>             // print XML version header
>             out.write(line);
>             out.newLine();
>             while ((line = in.readLine()) != null) {
>                 if (line.indexOf("<!") >= 0) {
>                     // preserve single line comments containing DTD stuff
>                     out.write(line);
>                 } else if (line.indexOf(">") == line.length() - 1
>                         || line.indexOf(">") < 0 || line.indexOf("_") < 0) {
>                     // XML header type node; lines with no '>' or no '_' cannot
>                     // be converted, so they are passed through unchanged too
>                     out.write(line);
>                 } else {
>                     // turn <Tag_name>value</Tag_name> into <Tag name="value"/>
>                     StringBuilder sb = new StringBuilder(line);
>                     sb.replace(sb.indexOf(">"), sb.indexOf(">") + 1, "=\"");
>                     sb.delete(sb.lastIndexOf("<"), sb.length() - 1);
>                     sb.insert(sb.length() - 1, "\"/");
>                     sb.replace(sb.indexOf("_"), sb.indexOf("_") + 1, " ");
>                     out.write(sb.toString());
>                 }
>                 out.newLine();
>             }
>         } catch (IOException ex) {
>             ex.printStackTrace();
>         }
>     }
> }
> --------------------------------------------------------------------------
>
> It produces nice-looking XML, but it's probably not worth adding to BioJava.
>
>
> Russell
>
>
>
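To make the effect of the quoted pre-processor concrete, here is a minimal sketch that applies the same four string edits as the else branch of process() to a single line. The class name LineConversionDemo, the helper convert(), and the sample lines are made up for illustration and are not part of the original post. It assumes the whole element sits on one line and that the tag name contains an underscore.

```java
public class LineConversionDemo {
    // Hypothetical helper: the same edits the quoted else-branch performs,
    // turning <Tag_name>value</Tag_name> into <Tag name="value"/>.
    static String convert(String line) {
        StringBuilder sb = new StringBuilder(line);
        // first '>' (end of the opening tag) becomes ="
        sb.replace(sb.indexOf(">"), sb.indexOf(">") + 1, "=\"");
        // drop the closing tag, keeping only its final '>'
        sb.delete(sb.lastIndexOf("<"), sb.length() - 1);
        // close the attribute value and make the element self-closing
        sb.insert(sb.length() - 1, "\"/");
        // first '_' (inside the tag name) becomes a space
        sb.replace(sb.indexOf("_"), sb.indexOf("_") + 1, " ");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative input lines, not taken from a real BLAST report
        System.out.println(convert("  <Hit_num>1</Hit_num>"));
        System.out.println(convert("<Hit_def>some protein</Hit_def>"));
    }
}
```

Note that a value containing a double quote would break the generated attribute, so real input may need escaping before this step.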
David Huen, Ph.D. Email: smh1008 at cus.cam.ac.uk
Dept. of Genetics Fax : +44 1223 333992
University of Cambridge Phone: +44 1223 766748/333982
Cambridge, CB2 3EH
U.K.
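As a rough illustration of why attribute-style output is convenient for "a basic SAX parser", the sketch below reads such XML with the standard JAXP SAX API and collects every attribute it sees in startElement(). The class name ConvertedBlastReader, the method readAttributes(), and the sample document are assumptions made for this example. It is not BioJava code and not the blastXML package's parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ConvertedBlastReader {
    // Hypothetical helper: returns "element.attribute=value" for each attribute found
    static List<String> readAttributes(String xml) throws Exception {
        final List<String> found = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // each converted line carries its value as a single attribute
                for (int i = 0; i < atts.getLength(); i++) {
                    found.add(qName + "." + atts.getQName(i) + "=" + atts.getValue(i));
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return found;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample in the converted style, not real BLAST output
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<Hit>\n"
                + "  <Hit num=\"1\"/>\n"
                + "  <Hit id=\"example_id\"/>\n"
                + "</Hit>\n";
        for (String entry : readAttributes(xml)) {
            System.out.println(entry);
        }
    }
}
```

With this layout no characters() buffering is needed, since every value arrives complete in the startElement() callback.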