[Biojava-l] single letter code from protein ambiguities?

Mon May 19 09:35:54 UTC 2008

Thank you for pointing me in the right direction:

##################################
// Translate DNA containing ambiguous bases (n) into protein
SymbolList symL = DNATools.createDNA("atnatggnnatg");
SymbolList symL2 = DNATools.toProtein(symL);
System.out.println(symL2.seqString());

// Print the name of each (possibly ambiguous) protein symbol
for (Iterator i = symL2.iterator(); i.hasNext();) {
    Symbol sym = (Symbol) i.next();
    System.out.println(sym.getName() + "#");
}

// Decompose each symbol into the symbols it matches
SymbolTokenization toke = symL2.getAlphabet().getTokenization("token");
for (Iterator i = symL2.iterator(); i.hasNext();) {
    Symbol sym = (Symbol) i.next();
    Alphabet arg = sym.getMatches();
    for (Iterator i2 = ((FiniteAlphabet) arg).iterator(); i2.hasNext();) {
        Symbol sym2 = (Symbol) i2.next();
        System.out.println("toke: " + toke.tokenizeSymbol(sym2));
        System.out.println("name: " + sym2.getName());
    }
    System.out.println("\n");
}
##################################

This will print out the single letter
code: System.out.println("toke: " + toke.tokenizeSymbol(sym2));

This will print out the three letter code:
System.out.println("name: " + sym2.getName());

Do you think it is worthwhile to put this sample code in the wiki?

Thanks,
Daniel

"Mark Schreiber"
<markjschreiber at gmail.com> wrote:

> Hi -
>
> Yes, this is absolutely possible. If biojava can create an unambiguous
> amino acid from an ambiguous codon it will. If the possible amino acids
> are a choice of 2 or more, an ambiguity symbol (BasisSymbol) is created
> that contains those amino acids.
>
> Note that if you turn any ambiguous amino acid into a String then you
> will just get an X, so you need to decompose it into its underlying
> AtomicSymbols.
>
> See http://biojava.org/wiki/BioJava:Cookbook:Alphabets:Ambiguous for
> some idea (except in your case you need to do the reverse).
>
> This would make another nice example for the cookbook so when you get
> some demo code working it would be good if you could put it up on the
> wiki.
>
> - Mark
>
> On Wed, May 7, 2008 at 6:31 PM, community at struck.lu
> <community at struck.lu> wrote:
> > Hi,
> >
> > I am just beginning to use biojava and I have a question concerning
> > the parsing of protein sequences containing ambiguities:
> >
> > Is it possible to get all the possible amino acids at each position
> > of the protein sequence with a single letter code instead of the
> > three letter code?
> >
> > Suppose I would translate a DNA sequence containing an "N", so the
> > protein translation would also contain ambiguities:
> >
> > SymbolList symL = DNATools.createDNA("atnatg");
> > SymbolList symL2 = DNATools.toProtein(symL);
> > Iterator symIt = symL2.iterator();
> > System.out.println(symL2.seqString());
> >
> > OUTPUT:
> > XM
> >
> > Symbol hm;
> > while (symIt.hasNext()) {
> >     hm = (Symbol) symIt.next();
> >     System.out.println(hm.getName());
> > }
> >
> > OUTPUT:
> > [MET ILE]
> > MET
> >
> > Would it be possible to output:
> > MI
> > M
> >
> > Regards,
> > Daniel Struck
> >
> > _________________________________________________________
> > Mail sent using root eSolutions Webmailer - www.root.lu
> > _______________________________________________
> > Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
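
The idea discussed in this thread (an ambiguous codon translates to a set of
candidate amino acids, each of which can be printed as a single letter) can
also be illustrated without BioJava. What follows is only a minimal plain-Java
sketch, not BioJava code: the class and method names (AmbiguousTranslation,
possibleAminoAcids, translate) are hypothetical, only the bases a, c, g, t and
n are handled, and the standard genetic code is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Illustrative stand-in only: plain Java, no BioJava classes are used.
final class AmbiguousTranslation {

    // Standard genetic code in t, c, a, g order:
    // index = 16 * first base + 4 * second base + third base.
    private static final String BASES = "tcag";
    private static final String AMINO_ACIDS =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    // Concrete bases a DNA character may stand for (assumption: only a, c, g, t, n).
    private static String expand(char base) {
        char b = Character.toLowerCase(base);
        if (b == 'n') {
            return BASES;
        }
        if (BASES.indexOf(b) >= 0) {
            return String.valueOf(b);
        }
        throw new IllegalArgumentException("Unsupported base: " + base);
    }

    // All single-letter amino acids one (possibly ambiguous) codon can encode, sorted.
    static TreeSet<Character> possibleAminoAcids(String codon) {
        TreeSet<Character> result = new TreeSet<>();
        for (char b1 : expand(codon.charAt(0)).toCharArray()) {
            for (char b2 : expand(codon.charAt(1)).toCharArray()) {
                for (char b3 : expand(codon.charAt(2)).toCharArray()) {
                    int index = 16 * BASES.indexOf(b1)
                              + 4 * BASES.indexOf(b2)
                              + BASES.indexOf(b3);
                    result.add(AMINO_ACIDS.charAt(index));
                }
            }
        }
        return result;
    }

    // One string per codon, e.g. "IM" for a position that may be Ile or Met.
    static List<String> translate(String dna) {
        if (dna.length() % 3 != 0) {
            throw new IllegalArgumentException("Length must be a multiple of 3");
        }
        List<String> positions = new ArrayList<>();
        for (int i = 0; i < dna.length(); i += 3) {
            StringBuilder letters = new StringBuilder();
            for (char aminoAcid : possibleAminoAcids(dna.substring(i, i + 3))) {
                letters.append(aminoAcid);
            }
            positions.add(letters.toString());
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(translate("atnatggnnatg"));
    }
}
```

For the sequence "atnatggnnatg" this sketch prints [IM, M, ADEGV, M], i.e. one
group of single-letter codes per protein position.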