[Biojava-l] single letter code from protein ambiguities?
community at struck.lu
Mon May 19 09:35:54 UTC 2008
Thank you for pointing me in the right direction:

##################################
// Imports needed (BioJava 1.x)
import java.util.Iterator;
import org.biojava.bio.seq.DNATools;
import org.biojava.bio.seq.io.SymbolTokenization;
import org.biojava.bio.symbol.*;

SymbolList symL = DNATools.createDNA("atnatggnnatg");
SymbolList symL2 = DNATools.toProtein(symL);
System.out.println(symL2.seqString());

// Print the name of each symbol; ambiguous positions print as e.g. [MET ILE]
for (Iterator i = symL2.iterator(); i.hasNext();) {
    Symbol sym = (Symbol) i.next();
    System.out.println(sym.getName() + "#");
}

// Decompose each symbol into the symbols it matches and print both codes
SymbolTokenization toke = symL2.getAlphabet().getTokenization("token");
for (Iterator i = symL2.iterator(); i.hasNext();) {
    Symbol sym = (Symbol) i.next();
    Alphabet arg = sym.getMatches();
    for (Iterator i2 = ((FiniteAlphabet) arg).iterator(); i2.hasNext();) {
        Symbol sym2 = (Symbol) i2.next();
        System.out.println("toke: " + toke.tokenizeSymbol(sym2));
        System.out.println("name: " + sym2.getName());
    }
    System.out.println("\n");
}
##################################

This will print out the single letter code:
System.out.println("toke: " + toke.tokenizeSymbol(sym2));

This will print out the three letter code:
System.out.println("name: " + sym2.getName());

Do you think it is worthwhile to put this sample code in the wiki?

Thanks,
Daniel

"Mark Schreiber" <markjschreiber at gmail.com> wrote:

> Hi -
>
> Yes, this is absolutely possible. If biojava can create an unambiguous
> amino acid from an ambiguous codon it will. If the possible amino acids
> are a choice of 2 or more, an ambiguity symbol (BasisSymbol) is created
> that contains those amino acids.
>
> Note that if you turn any ambiguous amino acid into a String then you
> will just get an X, so you need to decompose it into its underlying
> AtomicSymbols.
>
> See http://biojava.org/wiki/BioJava:Cookbook:Alphabets:Ambiguous for
> some idea (except in your case you need to do the reverse).
>
> This would make another nice example for the cookbook, so when you get
> some demo code working it would be good if you could put it up on the
> wiki.
>
> - Mark
>
> On Wed, May 7, 2008 at 6:31 PM, community at struck.lu
> <community at struck.lu> wrote:
> > Hi,
> >
> > I am just beginning to use biojava and I have a question concerning
> > the parsing of protein sequences containing ambiguities: Is it
> > possible to get all the possible amino acids at each position of the
> > protein sequence with a single letter code instead of the three
> > letter code?
> >
> > Suppose I would translate a DNA sequence containing an "N", so the
> > protein translation would also contain ambiguities:
> >
> > SymbolList symL = DNATools.createDNA("atnatg");
> > SymbolList symL2 = DNATools.toProtein(symL);
> > Iterator symIt = symL2.iterator();
> > System.out.println(symL2.seqString());
> >
> > OUTPUT:
> > XM
> >
> > Symbol hm;
> > while (symIt.hasNext()) {
> >     hm = (Symbol) symIt.next();
> >     System.out.println(hm.getName());
> > }
> >
> > OUTPUT:
> > [MET ILE]
> > MET
> >
> > Would it be possible to output:
> > MI
> > M
> >
> > Regards,
> > Daniel Struck
> >
> > _________________________________________________________
> > Mail sent using root eSolutions Webmailer - www.root.lu
> > _______________________________________________
> > Biojava-l mailing list - Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
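
For readers without BioJava at hand, the idea Mark describes in the reply (an ambiguous codon stands for the set of amino acids that its concrete codons encode) can be sketched in plain Java. This is not BioJava code and says nothing about how BioJava implements it: the class and method names are made up for illustration, the codon table is the standard genetic code written out as a 64-letter string, and ambiguity codes are simply expanded by brute force.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Illustrative only: a hypothetical class, not part of BioJava.
final class AmbiguousCodonSketch {

    // Standard genetic code; bases are ordered T, C, A, G at each codon position.
    private static final String BASES = "TCAG";
    private static final String AMINO_ACIDS =
            "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    // Expand one IUPAC nucleotide code into the concrete bases it stands for.
    static String expand(char code) {
        switch (code) {
            case 'A': return "A";
            case 'C': return "C";
            case 'G': return "G";
            case 'T': case 'U': return "T";
            case 'R': return "AG";
            case 'Y': return "CT";
            case 'S': return "CG";
            case 'W': return "AT";
            case 'K': return "GT";
            case 'M': return "AC";
            case 'B': return "CGT";
            case 'D': return "AGT";
            case 'H': return "ACT";
            case 'V': return "ACG";
            case 'N': return "ACGT";
            default:
                throw new IllegalArgumentException("Not an IUPAC nucleotide code: " + code);
        }
    }

    // Single-letter codes of every amino acid an (upper-case) codon can encode, sorted.
    static String aminoAcidsFor(String codon) {
        TreeSet<Character> residues = new TreeSet<>();
        for (char b1 : expand(codon.charAt(0)).toCharArray()) {
            for (char b2 : expand(codon.charAt(1)).toCharArray()) {
                for (char b3 : expand(codon.charAt(2)).toCharArray()) {
                    int index = 16 * BASES.indexOf(b1) + 4 * BASES.indexOf(b2) + BASES.indexOf(b3);
                    residues.add(AMINO_ACIDS.charAt(index));
                }
            }
        }
        StringBuilder letters = new StringBuilder();
        for (char residue : residues) {
            letters.append(residue);
        }
        return letters.toString();
    }

    // One entry per codon, each holding the possible single-letter codes.
    static List<String> translate(String dna) {
        String seq = dna.toUpperCase(Locale.ROOT);
        if (seq.length() % 3 != 0) {
            throw new IllegalArgumentException("Sequence length must be a multiple of 3");
        }
        List<String> positions = new ArrayList<>();
        for (int i = 0; i < seq.length(); i += 3) {
            positions.add(aminoAcidsFor(seq.substring(i, i + 3)));
        }
        return positions;
    }

    public static void main(String[] args) {
        // Same input as the BioJava sample in this message.
        System.out.println(translate("atnatggnnatg")); // prints [IM, M, ADEGV, M]
    }
}
```

Each list entry corresponds to one protein position, so an unambiguous codon yields a single letter and an ambiguous one yields every candidate, which is the output shape asked for in the original question.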