[Biojava-l] Problems parsing in RNA sequence in genbank format

Lachlan Coin lc1 at sanger.ac.uk
Fri Apr 4 14:02:00 EST 2003


Hi,

I am parsing RNA sequence data.  One of the positions has an 'R', which
is the IUPAC ambiguity symbol for A or G.  However, the RNA alphabet does
not accept 'R' as a token for any symbol (I did a quick test, and the
token it reports for the A-or-G ambiguity symbol is 'N').  So the GenBank
reader falls over when it gets to this position.

Any suggestions on how to handle this?  Can we modify the symbol
tokenization for RNA to cope with this case?
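
In the meantime, one workaround might be to clean up the sequence string
before it reaches the parser.  Below is a rough sketch of that idea in
plain Java.  It does not use the BioJava API: the class IupacRna and its
methods expand and maskUnsupported are hypothetical names, and the set of
tokens the RNA tokenization accepts is an assumption passed in by the
caller.  The code table itself is the standard IUPAC nucleotide
nomenclature.

```java
/** Hypothetical helper, not part of BioJava: handles IUPAC RNA codes. */
final class IupacRna {
    private IupacRna() {}

    /** Returns the bases an IUPAC RNA code stands for, e.g. 'R' gives "AG". */
    static String expand(char code) {
        switch (Character.toUpperCase(code)) {
            case 'A': return "A";
            case 'C': return "C";
            case 'G': return "G";
            case 'U': return "U";
            case 'R': return "AG";   // purine
            case 'Y': return "CU";   // pyrimidine
            case 'S': return "CG";
            case 'W': return "AU";
            case 'K': return "GU";
            case 'M': return "AC";
            case 'B': return "CGU";  // not A
            case 'D': return "AGU";  // not C
            case 'H': return "ACU";  // not G
            case 'V': return "ACG";  // not U
            case 'N': return "ACGU"; // any base
            default:
                throw new IllegalArgumentException(
                        "Not an IUPAC RNA code: " + code);
        }
    }

    /**
     * Replaces every IUPAC code that is not listed in 'supported'
     * (lower-case tokens the parser is assumed to accept) with 'n'.
     * Characters that are not IUPAC codes at all still raise an error.
     */
    static String maskUnsupported(String seq, String supported) {
        StringBuilder out = new StringBuilder(seq.length());
        for (char c : seq.toCharArray()) {
            expand(c); // validation only: throws on non-IUPAC characters
            boolean known = supported.indexOf(Character.toLowerCase(c)) >= 0;
            out.append(known ? c : 'n');
        }
        return out.toString();
    }
}
```

For example, IupacRna.maskUnsupported("acgurn", "acgun") turns the 'r'
into 'n' and leaves the rest alone.  This loses information (A-or-G
becomes any base), so it is a stopgap and not a substitute for having
the tokenization recognise 'R' directly.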

Thanks a lot,

Lachlan


