[Biopython-dev] RNA Alphabet: request for comments

Wed Jun 16 09:03:37 UTC 2010

Hi Peter,

> Why do you need the _set_sequence method? Why not just put that
> small piece of code inside the __init__ method?

_set_sequence will contain a small parser that handles modifications
for which the one-letter abbreviations do not suffice. For example, a
sequence could be

"CCC022UCCC"

(22U is a 5-hydroxyuridine).

This would be parsed into a list of RNAAlphabetEntries:
['C','C','C','22U','C','C','C']

So the code will grow a little, but the basic idea stays the same.
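
To make the parsing step concrete, here is a minimal sketch in Python. It is
not the code from the commit. It assumes a hypothetical convention that
happens to reproduce the example above: a "0" in the string marks the start
of a three-character modification code. The function name
parse_modified_rna and its parameters are made up for illustration, and
plain strings stand in for RNAAlphabetEntries.

```python
def parse_modified_rna(sequence, marker="0", code_length=3):
    """Split a sequence string into a list of alphabet entries.

    Assumption (not taken from the actual commit): the character
    `marker` introduces a modification code of `code_length` characters;
    every other character is an ordinary one-letter entry.
    """
    entries = []
    i = 0
    while i < len(sequence):
        if sequence[i] == marker:
            # Read the multi-character code that follows the marker.
            code = sequence[i + 1:i + 1 + code_length]
            if len(code) < code_length:
                raise ValueError(
                    "truncated modification code at position %d" % i)
            entries.append(code)
            i += 1 + code_length
        else:
            # Ordinary one-letter nucleotide.
            entries.append(sequence[i])
            i += 1
    return entries


print(parse_modified_rna("CCC022UCCC"))
# ['C', 'C', 'C', '22U', 'C', 'C', 'C']
```

Whatever the real rule turns out to be, keeping it in one small function
like this means __init__ stays short while the parser grows.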

If someone wants a one-letter representation, it could be "CCCxCCC", but
this is degenerate, because 'x' stands for several different modifications.
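
A short Python sketch of why the one-letter form loses information. The
helper to_one_letter is a hypothetical name, plain strings stand in for the
alphabet entries, and "99Z" is an invented placeholder code used only to
have a second modification to compare against.

```python
def to_one_letter(entries, placeholder="x"):
    """Collapse every multi-character entry to a single placeholder."""
    return "".join(e if len(e) == 1 else placeholder for e in entries)


with_22u = ['C', 'C', 'C', '22U', 'C', 'C', 'C']
with_other = ['C', 'C', 'C', '99Z', 'C', 'C', 'C']  # "99Z" is made up

print(to_one_letter(with_22u))    # CCCxCCC
print(to_one_letter(with_other))  # CCCxCCC
```

Both lists give the same string, so the original modification cannot be
recovered from the one-letter form.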

Best Regards,
   Kristian

>>> Why not create a Seq subclass instead of your class
>>> ModifiedRNAString(str)?
>>
>> This turned out to be a lot simpler. Worked right away. New commit at:
>>
>> http://github.com/krother/biopython/commit/b0a6071f2b08a4f9bfee33a8d675c0e21b60ba70
>>
>> more comments welcome.
>
> Why do you need the _set_sequence method? Why not just put that
> small piece of code inside the __init__ method?
>
>> Next steps from my side would be:
>>
>> 1) add all modifications to the Alphabet.
>> 2) add some RNA-specific methods.
>> 3) add more tests.
>> 4) sync with latest master branch.
>> 5) request code merge.
>>
>> Best regards,
>>     Kristian
>
> If this works out we should look at doing a Protein 3-letter code version
> for use with PDB sequences (I'm thinking about the modified amino acids).
>
> Peter