[Biojava-l] (no subject)

Bruno Aranda - Dev bruno_dev at ebiointel.com
Wed Jul 21 02:52:39 EDT 2004


Hi Alexandre,

To parse the ClustalW results I use a SequenceAlignmentSAXParser and a
custom implementation of DefaultHandler which I call
'SequenceCollectionContentHandler'.

The code for the custom DefaultHandler class is:


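import java.util.Map;

import org.biojava.bio.seq.DNATools;
import org.biojava.bio.seq.ProteinTools;
import org.biojava.bio.seq.RNATools;
import org.biojava.bio.seq.Sequence;
import org.biojava.bio.symbol.Alphabet;
import org.biojava.bio.symbol.IllegalSymbolException;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public final class SequenceCollectionContentHandler extends DefaultHandler {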

    private final Map sequenceMap;
    private final Alphabet alphabet;

    private String currentSeqName;
    private String currentSeq;

    /**
     * Creates a new <code>SequenceCollectionContentHandler</code> instance.
     *
     * @param map
     *            The map to be filled with sequences
     * @param alphabet
     *            The alphabet to be used
     */
    public SequenceCollectionContentHandler(Map map, Alphabet alphabet) {
        this.sequenceMap = map;
        this.alphabet = alphabet;
    }

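    // Called at the start of each element. A new Sequence element resets
    // the text collected so far.
    public final void startElement(String namespaceURI, String localName,
            String qName, Attributes atts) {

        if ("Sequence".equals(localName)) {
            this.currentSeq = "";
            startCurrentSequence(atts);
        }
    }

    /*
     * (non-Javadoc)
     *
     * @see org.xml.sax.ContentHandler#characters(char[], int, int)
     */
    public final void characters(char[] ch, int start, int length)
            throws SAXException {
        // SAX may deliver the text of one element in several chunks, so
        // append to the current sequence instead of overwriting it.
        String content = new String(ch, start, length);
        this.currentSeq = (this.currentSeq == null) ? content
                : this.currentSeq + content;
    }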

    /*
     * (non-Javadoc)
     *
     * @see org.xml.sax.ContentHandler#endElement(java.lang.String,
     *      java.lang.String, java.lang.String)
     */
    public final void endElement(String uri, String localName, String qName)
            throws SAXException {
        if (localName.equals("Sequence")) {
            endCurrentSequence();
        }

    }

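    private void startCurrentSequence(Attributes atts) {
        // getLocalName(0) returns null when the element has no attributes,
        // so compare with the constant first to avoid a NullPointerException.
        String attName = atts.getLocalName(0);
        if ("sequenceName".equals(attName)) {
            this.currentSeqName = atts.getValue(0);
        }
    }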

    private void endCurrentSequence() {
        if (this.alphabet.equals(DNATools.getDNA())) {
            try {
                Sequence seq = DNATools.createDNASequence(currentSeq,
                        currentSeqName);
                this.sequenceMap.put(currentSeqName, seq);
            } catch (IllegalSymbolException e) {
                System.err.println(this.getClass()
                        + " - IllegalSymbolException: " + e.getMessage());
            }

        } else if (this.alphabet.equals(RNATools.getRNA())) {
            try {
                Sequence seq = RNATools.createRNASequence(currentSeq,
                        currentSeqName);
                this.sequenceMap.put(currentSeqName, seq);
            } catch (IllegalSymbolException e) {
                System.err.println(this.getClass()
                        + " - IllegalSymbolException: " + e.getMessage());
            }
        } else if (this.alphabet.equals(ProteinTools.getAlphabet())) {
            try {
                Sequence seq = ProteinTools.createProteinSequence(currentSeq,
                        currentSeqName);
                this.sequenceMap.put(currentSeqName, seq);
            } catch (IllegalSymbolException e) {
                System.err.println(this.getClass()
                        + " - IllegalSymbolException: " + e.getMessage());
            }
        }
    }

}
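
If you want to try the handler pattern on its own, without BioJava on the
classpath, here is a minimal sketch that needs only the JDK. It is not the
class above: it stores plain strings instead of Sequence objects, and it
feeds the handler from a small XML string. The wrapper element name
"Alignment", the class name SaxCollectSketch and the helper collect() are
made up for the illustration; only the "Sequence" element and its
"sequenceName" attribute are taken from the handler above.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxCollectSketch {

    // Collects name -> text for every <Sequence sequenceName="..."> element.
    static final class Collector extends DefaultHandler {
        final Map<String, String> sequences = new LinkedHashMap<>();
        private String name;
        private StringBuilder text; // non-null only while inside <Sequence>

        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes atts) {
            if ("Sequence".equals(localName)) {
                name = atts.getValue("sequenceName");
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may split one text node into several calls,
            // so append rather than assign.
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("Sequence".equals(localName)) {
                sequences.put(name, text.toString());
                text = null;
            }
        }
    }

    // Hypothetical helper: parses the XML string and returns the filled map.
    static Map<String, String> collect(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Needed so that localName is populated, as the handler expects.
        factory.setNamespaceAware(true);
        Collector handler = new Collector();
        factory.newSAXParser().parse(
                new InputSource(new StringReader(xml)), handler);
        return handler.sequences;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Alignment>"
                + "<Sequence sequenceName=\"seq1\">ACGT-A</Sequence>"
                + "<Sequence sequenceName=\"seq2\">AC-TTA</Sequence>"
                + "</Alignment>";
        System.out.println(collect(xml));
    }
}
```

The map filled here plays the same role as the sequenceMap in the class
above; the BioJava version only adds the conversion of each string into a
Sequence for the chosen alphabet.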


Then, the code to use the SequenceAlignmentSAXParser and the handler could
be:

		// copy and paste from here

		// Path to the .aln output file produced by ClustalW
		File alnFile = new File("/your/aln/file");
		// Alphabet to be used (e.g. DNATools.getDNA())
		Alphabet alphabet = ...;

		// This map will be filled with the sequences from the alignment
		Map seqMap = new HashMap();

		SequenceAlignmentSAXParser parser = new SequenceAlignmentSAXParser();

		ContentHandler handler = new SequenceCollectionContentHandler(
				seqMap, alphabet);
		try {
			// Open the alignment file declared above
			BufferedReader contents = new BufferedReader(new InputStreamReader(
					new FileInputStream(alnFile)));

			parser.setContentHandler(handler);
			parser.parse(new InputSource(contents));

		} catch (FileNotFoundException fnfe) {
			System.out.println(fnfe.getMessage());
			System.out.println("Couldn't open file");
		} catch (IOException ioe) {
			ioe.printStackTrace();
		} catch (SAXException se) {
			System.err.println(se.getMessage());
			se.printStackTrace();
		}

		// Finally I create the alignment object using the Map
		Alignment alignment = new SimpleAlignment(seqMap);


		// end of copy


So you end up with an Alignment instance that contains all the sequences in
the alignment. I know there are better approaches, but this one works for
me... If you have any questions, don't hesitate to ask again!

Cheers,

Bruno


