[Biojava-l] GenBank Parser Exception

Ron Kuhn rkuhn@Cellomics.com
Wed, 21 Nov 2001 15:39:27 -0500


I have another fix, for an exception I got when parsing GenBank sequences
that give a strand on the LOCUS line (e.g. AF343912). BioJava assumes that
the strand and the molecule type (when both are given) are separate tokens.
They are not: GenBank writes them as a single token, such as "ss-DNA". Here
is the fix for the processHeaderLine method in GenbankFormat.java:

Replace the whole if (line.startsWith(GenbankFormat.LOCUS_TAG)) block with the following:

if (line.startsWith(GenbankFormat.LOCUS_TAG))
{
    // The LOCUS line is a special case because it contains the locus
    // name, size, strand, molecule type, topology, GenBank division,
    // and the date of last modification. The fields are read from fixed
    // columns (0-based substring offsets) instead of being split on
    // whitespace, because the strand and molecule type can share one
    // token (e.g. "ss-DNA").
    if (line.length() < 73)
        throw new ParseException("LOCUS line too short [" + line + "]");

    saveSeqAnno2(GenbankFormat.LOCUS_TAG, line.substring(12, 22));
    saveSeqAnno2(GenbankFormat.SIZE_TAG, line.substring(22, 29));
    saveSeqAnno2(GenbankFormat.STRAND_NUMBER_TAG, line.substring(33, 35));
    saveSeqAnno2(GenbankFormat.TYPE_TAG, line.substring(36, 41));
    saveSeqAnno2(GenbankFormat.CIRCULAR_TAG, line.substring(42, 52));
    saveSeqAnno2(GenbankFormat.DIVISION_TAG, line.substring(52, 55));
    saveSeqAnno2(GenbankFormat.DATE_TAG, line.substring(62, 73));
}
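For anyone who wants to check the column offsets outside BioJava, here is a small stand-alone sketch (not BioJava code). The class LocusColumns, its makeLine/parse methods, the plain string keys (stand-ins for the GenbankFormat tag constants) and the TEST001 sample line are all hypothetical; the sample line is simply laid out in the columns implied by the substring offsets above. It shows that splitting on whitespace yields "ss-DNA" as one token, while the fixed columns separate strand and type and still line up when the strand is blank.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-alone check of the fixed-column LOCUS parsing. */
class LocusColumns {

    // Builds a hypothetical 73-character LOCUS line using the column
    // layout implied by the substring offsets in the fix.
    static String makeLine(String strand, String type) {
        return String.format("%-5s%7s%-10s%7d %s %-3s%-4s  %-10s%-3s%7s%-11s",
                "LOCUS", "", "TEST001", 1234, "bp", strand, type,
                "linear", "VRL", "", "01-JAN-2001");
    }

    // Same offsets as the patch; keys stand in for the GenbankFormat tags.
    static Map<String, String> parse(String line) {
        if (line.length() < 73) {
            throw new IllegalArgumentException("LOCUS line too short [" + line + "]");
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("LOCUS", line.substring(12, 22).trim());
        fields.put("SIZE", line.substring(22, 29).trim());
        fields.put("STRAND_NUMBER", line.substring(33, 35).trim());
        fields.put("TYPE", line.substring(36, 41).trim());
        fields.put("CIRCULAR", line.substring(42, 52).trim());
        fields.put("DIVISION", line.substring(52, 55).trim());
        fields.put("DATE", line.substring(62, 73).trim());
        return fields;
    }

    public static void main(String[] args) {
        String line = makeLine("ss-", "DNA");
        // Whitespace splitting keeps strand and type together as "ss-DNA".
        System.out.println(Arrays.toString(line.split("\\s+")));
        // Fixed columns give STRAND_NUMBER=ss and TYPE=DNA separately.
        System.out.println(parse(line));
        // With no strand, the strand columns are blank and the rest still line up.
        System.out.println(parse(makeLine("", "DNA")));
    }
}
```

A token-based parser has to guess how many tokens each field occupies; reading fixed columns avoids that guess, at the cost of depending on the exact LOCUS layout.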

And add the supporting method:
    /**
     * Saves a header tag and its value. Any pending tag is stored first
     * by calling saveSeqAnno(). Values that are empty after trimming are
     * ignored, so blank fixed-width LOCUS fields are not recorded.
     *
     * @param tag The tag to save
     * @param value The raw value of the tag; surrounding whitespace is stripped
     * @throws ParseException Thrown when an error occurs parsing the file
     */
    private void saveSeqAnno2(String tag, String value)
        throws ParseException
    {
        value = value.trim();   // strip surrounding whitespace
        if (value.length() > 0) {
            this.saveSeqAnno();
            headerTag = tag;
            headerTagText = new StringBuffer(value);
        }
    }
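To make the bookkeeping explicit, here is a hypothetical model of how saveSeqAnno2 interacts with saveSeqAnno(). The body of saveSeqAnno() is not in this message, so the sketch assumes it stores the pending headerTag/headerTagText pair and clears it; the class HeaderAccumulator and its saved list are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the header bookkeeping around saveSeqAnno2.
 * Assumes saveSeqAnno() stores the pending tag/value pair and clears it.
 */
class HeaderAccumulator {
    private String headerTag;
    private StringBuffer headerTagText;
    final List<String[]> saved = new ArrayList<>();

    // Assumed behaviour of saveSeqAnno(): store the pending pair, if any.
    void saveSeqAnno() {
        if (headerTag != null) {
            saved.add(new String[] {headerTag, headerTagText.toString()});
            headerTag = null;
            headerTagText = null;
        }
    }

    // Same logic as saveSeqAnno2: trim, skip blanks, store previous, stage new.
    void saveSeqAnno2(String tag, String value) {
        value = value.trim();
        if (value.length() > 0) {
            saveSeqAnno();
            headerTag = tag;
            headerTagText = new StringBuffer(value);
        }
    }

    public static void main(String[] args) {
        HeaderAccumulator acc = new HeaderAccumulator();
        acc.saveSeqAnno2("LOCUS", " TEST001   ");
        acc.saveSeqAnno2("STRAND_NUMBER", "  ");   // blank column: skipped
        acc.saveSeqAnno2("TYPE", "DNA  ");
        acc.saveSeqAnno();                         // store the last pending pair
        for (String[] pair : acc.saved) {
            System.out.println(pair[0] + " = " + pair[1]);
        }
    }
}
```

Because the blank strand column never becomes the pending tag, it cannot overwrite or be stored in place of a real value; the last field stays pending until the next saveSeqAnno() call.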

Ron Kuhn