[Bioperl-l] RNA fold

Tue Dec 9 12:12:26 EST 2003

On Tue, 9 Dec 2003, Chris Fields wrote:

> I think that you can use parenthetical formats for pseudoknot-like
> structures (improperly nested Watson-Crick helices).  The idea is that
> () would represent secondary structure, and other brackets {}[] would
> represent higher-order structures, like so:
>
>      Helix         Pseudoknot
> ______________       _______________
> |            |       |             |
> (((((....))))).[[[...((((..]]]..))))
>                 |___________|
>

Ah, if you're thinking of a generic format then different brackets are
going to get you into trouble - Sean Eddy's INFERNAL suite (think
HMMER for RNAs) already uses different brackets to mark up different
layers of nesting, like:

..[[[[..<<<<<..>>>>>....<<<..>>>..]]]]..

There is an informal standard to incorporate pseudoknot info into the
bracket notation using letters for non-nested base pairs:

<<<<<<<.<<<...AAAA..>>>>>>>>>>..aaaa......

The upper-case letters base-pair with the matching lower-case letters.
This seems like a really bad idea, but given that you're parsing the
vast majority of the structure with brackets, and the most complicated
known nested pseudoknot (in the alpha operon leader) only involves
letters A, B and C, it's not so bad.  Also, this provides a natural
separation for the algorithms that can only deal with nested
interactions (SCFGs and the like) from those that can use everything.
For what it's worth, this is how we mark up such non-nested things in
the Rfam database.
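
To make the letter convention concrete, here is a rough Python sketch
of how such a string might be turned into base pairs.  It is not taken
from Rfam or INFERNAL code; the function name parse_structure is made
up for illustration, and it assumes that each letter pairs stack-wise
(the first "A" with the last "a"), just as brackets do.  Bracket pairs
and letter pairs are returned separately, which mirrors the split
between nested-only algorithms and those that can use everything.

```python
def parse_structure(structure):
    """Split a bracket/letter structure string into base pairs.

    Returns (nested, knotted): two sorted lists of 0-based (i, j)
    index pairs.  `nested` comes from bracket characters, `knotted`
    from upper/lower-case letter pairs.  Assumption: every symbol
    type pairs stack-wise (innermost open with first close).
    """
    closers = {")": "(", ">": "<", "]": "[", "}": "{"}
    openers = set(closers.values())
    stacks = {}            # one stack of open positions per symbol
    nested, knotted = [], []

    for pos, ch in enumerate(structure):
        if ch in openers or ch.isupper():
            stacks.setdefault(ch, []).append(pos)
        elif ch in closers or ch.islower():
            # A close bracket maps to its opener, "a" maps to "A".
            key = closers.get(ch, ch.upper())
            if not stacks.get(key):
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pair = (stacks[key].pop(), pos)
            (nested if ch in closers else knotted).append(pair)
        # Anything else (".", "_", "-", ...) is treated as unpaired.

    unclosed = [key for key, stack in stacks.items() if stack]
    if unclosed:
        raise ValueError(f"unclosed symbols: {unclosed}")
    return sorted(nested), sorted(knotted)


if __name__ == "__main__":
    example = "<<<<<<<.<<<...AAAA..>>>>>>>>>>..aaaa......"
    nested, knotted = parse_structure(example)
    print("nested pairs:    ", nested)
    print("pseudoknot pairs:", knotted)
```

A nested-only method (an SCFG, say) could take just the first list and
ignore the second, while anything that handles pseudoknots can use
both.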

> Of course this is where the problem lies, b/c all structures in this
> format are constrained to simple 1:1 base associations, such as simple
> Watson-Crick base pairs or noncanonical base pairs (A-G, G-U, etc).
> Some higher order structures, like triple-helices (A:U:U) and quaternary
> helices (G:G:G:G) can't be accounted for.  Also, the parenthetical
> syntax gets a bit confusing for very large sequences (16s rRNA, for
> instance).
>

Yep - tough in a single line.  We've also been thinking about how to
mark these up in alignments of RNAs in Rfam, but haven't reached a
decision.  You might think of things which aren't 1:1 as tertiary
interactions, and therefore separable from the secondary structure
which the bracket notation is designed to cope with.

> I think that the format all really depends on the program and the
> particular use.
<snip>
> After all this babbling, I do think that RNAML is the way to go with
> this.

These two seem contradictory to me :)

I don't know much about RNAML but I get the impression it's trying to
solve all RNA sequence/markup/annotation issues in one go.  Depending
on your point of view this is either a great idea or a very bad one.
I haven't decided yet :)

Sam

--------------------------------------------------------------------
Sam Griffiths-Jones                              sgj at sanger.ac.uk
http://www.sanger.ac.uk/Users/sgj                +44 (0)1223 834244

Wisdom #8002: Always try to do things in chronological order;
it's less confusing that way.
--------------------------------------------------------------------