[Biojava-l] How to create a SymbolList with a String that contains illegal Char

VERHOEF Frans verhoeff2 at gis.a-star.edu.sg
Mon Dec 8 21:25:55 EST 2003


Hi Tao,

Am I right that you want to read in GenBank data? If so, you might
want to take a look at this page of BioJava in Anger:
http://www.biojava.org/docs/bj_in_anger/ReadingGES.htm

It describes how to read sequence data from GenBank files.
I hope this helps.
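
If it is unclear where the 'u' and the 'l' come from, it may help to
list every unexpected character in the raw string before building a
SymbolList. The following is only a sketch in plain Java. It makes no
BioJava calls, and the class name, the method name, and the allowed
set "acgtn" are made up for illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Plain Java sketch, not part of BioJava: counts the characters in a raw
// sequence string that fall outside a caller-supplied set of allowed ones.
final class IllegalCharReport {

    // Hypothetical helper. `allowed` is an assumption supplied by the
    // caller, e.g. "acgtn"; it is not taken from any BioJava alphabet.
    static Map<Character, Integer> findUnexpected(String seq, String allowed) {
        Map<Character, Integer> counts = new TreeMap<>();
        String allowedLower = allowed.toLowerCase(Locale.ROOT);
        for (char c : seq.toLowerCase(Locale.ROOT).toCharArray()) {
            if (Character.isWhitespace(c)) {
                continue; // line breaks and spaces are ignored, not reported
            }
            if (allowedLower.indexOf(c) < 0) {
                counts.merge(c, 1, Integer::sum); // tally each unexpected char
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Toy input, not real GenBank data.
        System.out.println(findUnexpected("acgtu acgl\nNNac", "acgtn"));
    }
}
```

If the report shows letters that are not nucleotide codes at all, the
string may contain something other than sequence data, and reading the
file with the GenBank reader from the page above is probably the
safer route.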

Regards

Frans 


> -----Original Message-----
> From: biojava-l-bounces at portal.open-bio.org
> [mailto:biojava-l-bounces at portal.open-bio.org] On Behalf Of Tao Xu
> Sent: Tuesday, December 09, 2003 10:02 AM
> To: biojava-l at biojava.org
> Subject: [Biojava-l] How to create a SymbolList with a String that
> contains illegal Char
> 
> Hi there,
> 
> Does anyone know how to create a SymbolList from a String that
> contains illegal symbols?
> 
> I encountered an IllegalSymbolException when I tried to retrieve
> sequences from a sequence database. The sequence that gave me the
> trouble was a RefSeq sequence, accession number NT_039621, Mus
> musculus chromosome 15 genomic contig. I first used
> DNATools.createDNA(String dna) and got an IllegalSymbolException that
> indicated there was at least one 'u' in the sequence. I then used
> NucleotideTools.createNucleotide(String nucleotide); this time the 'u'
> did not cause any problem, but I still got an
> IllegalSymbolException that indicated there was an 'l' in the sequence.
> 
> I am afraid there must be lots of illegal symbols in GenBank's
> sequences, so I am wondering if there is a way to create an
> error-tolerant SymbolList object. If not, I am afraid I will have to
> create an Alphabet object containing Symbols that cover every char in
> Java, use this Alphabet to create a CharacterTokenization with the
> CharacterTokenization(Alphabet alpha, boolean caseSensitive)
> constructor, and then pass the resulting CharacterTokenization to
> SimpleSymbolList(SymbolTokenization st, String seqString) to
> get a SimpleSymbolList object. I guess there must be a better way in
> BioJava to do this. Your help is highly appreciated.
> 
> If I have to create an Alphabet that covers every char in Java, how
> can I do it? I originally thought that merging the NUCLEOTIDE and
> PROTEIN Alphabets into a new Alphabet would cover all the
> Symbols in GenBank sequences, but I noticed there is no method to
> merge two Alphabets in AlphabetManager. Is there a way to merge two
> Alphabets? If not, it is probably worth implementing one. It would be
> useful not only for handling illegal symbols that exist in the
> databases, but also for other applications, such as using
> non-standard symbols to generate a blastable MSBlast database.
> 
> Thanks a lot for your help.
> 
> Regards,
> 
> Tao