[Biojava-dev] [Bug 2854] New: Selection of protein alphabet is hardcoded in ProteinTools class
Mark Schreiber
markjschreiber at gmail.com
Thu Jun 11 01:30:00 UTC 2009
This raises an interesting point for the development of biojava3.
Do we actually need separate protein alphabets? I can't remember
the reason these are separate. Is there a good argument for this?
- Mark
On Thu, Jun 11, 2009 at 5:59 AM, <bugzilla-daemon at portal.open-bio.org> wrote:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2854
>
> Summary: Selection of protein alphabet is hardcoded in
> ProteinTools class
> Product: BioJava
> Version: live (CVS source)
> Platform: All
> OS/Version: All
> Status: NEW
> Severity: normal
> Priority: P2
> Component: seq
> AssignedTo: biojava-dev at biojava.org
> ReportedBy: mdharsee at ocbn.ca
>
>
> In our application we are calling createProtein() in class
> org.biojava.bio.seq.ProteinTools to generate SymbolList objects to encapsulate
> peptide sequences that are composed of the 20 common amino acid symbols, as
> well as the 'X' ambiguity symbol.
>
> However createProtein() forces the selection of the PROTEIN-TERM alphabet from
> AlphabetManager.xml, through the call to 'getTAlphabet()' as copied below:
>
> public static SymbolList createProtein(String theProtein)
>         throws IllegalSymbolException
> {
>     SymbolTokenization p = null;
>     try {
>         p = getTAlphabet().getTokenization("token");
>     } catch (BioException e) {
>         throw new BioError("Something has gone badly wrong with Protein", e);
>     }
>     return new SimpleSymbolList(p, theProtein);
> }
>
> This selection should be made based on the symbol content of the input
> sequence(s) rather than being hardcoded. The PROTEIN-TERM alphabet should be
> selected only if the input data contains the symbol 'TER' (terminus) or some
> ambiguity symbol that covers the PROTEIN-TERM alphabet. Otherwise the simpler
> PROTEIN alphabet should be selected.
>
> On a related note, the PROTEIN alphabet defined in AlphabetManager.xml consists
> of 22 residues and includes the less commonly found 'SEC' (selenocysteine, U)
> and 'PYR' (pyroglutamic acid, O). However, many applications only require the
> common 20-symbol alphabet that excludes the latter two residues. It would be
> useful to include a new alphabet in AlphabetManager.xml that defines the
> simpler 20-symbol set of common amino acids. Perhaps this point should be a
> feature request?
>
> Cheers,
> Moyez
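
The selection rule proposed in the report (use PROTEIN-TERM only when the
input actually contains a terminus symbol, otherwise PROTEIN) could be
sketched roughly as below. This is an illustration only, not BioJava code:
the class ProteinAlphabetChooser and the method chooseAlphabetName are
hypothetical, the one-letter tokens (including '*' for the terminus) are
assumptions rather than values read from AlphabetManager.xml, and ambiguity
symbols other than 'X' are simply rejected.

```java
import java.util.Locale;

// Hypothetical helper; it does not depend on BioJava and only returns a name.
final class ProteinAlphabetChooser {
    // Assumed one-letter tokens, not loaded from AlphabetManager.xml.
    static final String COMMON_20 = "ACDEFGHIKLMNPQRSTVWY";
    static final String RARE = "UO";        // the two extra PROTEIN residues
    static final char AMBIGUITY = 'X';      // ambiguity symbol from the report
    static final char TERMINATION = '*';    // assumed token for 'TER'

    // Returns "PROTEIN-TERM" only if a terminus token is present.
    static String chooseAlphabetName(String sequence) {
        boolean sawTermination = false;
        for (char c : sequence.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c == TERMINATION) {
                sawTermination = true;
            } else if (COMMON_20.indexOf(c) < 0
                    && RARE.indexOf(c) < 0
                    && c != AMBIGUITY) {
                // This sketch knows no other tokens, so it refuses them.
                throw new IllegalArgumentException("Unrecognised symbol: " + c);
            }
        }
        return sawTermination ? "PROTEIN-TERM" : "PROTEIN";
    }

    public static void main(String[] args) {
        System.out.println(chooseAlphabetName("MKVLAX"));
        System.out.println(chooseAlphabetName("MKVLA*"));
    }
}
```

In BioJava itself the returned name could presumably be handed to
AlphabetManager.alphabetForName(), or mapped onto ProteinTools.getAlphabet()
and ProteinTools.getTAlphabet(), before asking for the "token" tokenization.
Whether an 'X' parsed against PROTEIN should be treated as covering the
terminus is left open here.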