[Bioperl-l] RNA folding

Tue Feb 6 16:45:36 UTC 2007

On Feb 6, 2007, at 5:52 AM, Caroline Johnston wrote:

> Hello,
>
> I've just joined the list - I'm a Bioinformatics PhD student at Essex
> University doing transcriptomics-related things. Mainly microarray
> analysis and more recently looking at RNA structure prediction.
>
> I was thinking about having a go at writing a bioperl-run wrapper around
> some of the structure prediction stuff, but according to the wiki this is
> being done already (at least for the Vienna tools). I spoke to Albert
> Vilella at the EBI the other day and he said Chris Fields was the man to
> speak to. So could he (or anyone) let me know what the current state of
> RNA structure prediction tools in bioperl is?
>
> Cheers,
> Cass xx

Actually, the only RNA tool wrappers I have written so far are for ERPIN,
RNAMotif, and Infernal (the only one in bioperl-run CVS at this time
is RNAMotif). I am planning to write wrappers for Vienna, UNAFold,
and a few others but haven't really started on them. Here's where
I'm at right now...

I am writing up a new set of AnnotationI classes that describe data
positionally (Meta), which I hope will help deal with this stuff.
These would be similar in nature to Heikki's Bio::Seq::Meta classes:

http://bioperl.org/pipermail/bioperl-l/2006-December/024414.html

I would use a regular Bio::SeqI and store the structural data and
anything else (such as energy calculations) as Annotation objects in
an AnnotationCollection, and then write up a series of SeqIO modules
to get data into and out of the designated structure formats, like
UNAFold ct, RNAML, and so on. Each sequence would then be capable of
holding more than one structural Annotation (so it could represent
different folding pathways, alternative RNA folds, and so on).

At this point I represent the data as an array of hashes where
$array[0] is nt 1 and the hash keys indicate the type of interaction,
the base interacted with, etc. The text representation would by
default be the simple Eddy WUSS (Rfam-like) format, which is capable
of representing some complex data (pseudoknots, for instance), is
compact, and is documented (in the Infernal manual). Tags will
probably switch to more ontologically relevant terms (probably from
RNAML or RNA Ontology), but in general it is something like this:

[
  { 'interaction' => 'WC', 'base' => 24 },
  { 'interaction' => 'WC', 'base' => 23 },
  { 'interaction' => 'SS' },
  # ...
]

In this implementation every seq position would have some kind of  
interaction designation, though that's open for debate as it could  
just be simple text or undef for single-stranded regions.
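
To make the array-of-hashes layout a bit more concrete, here is a minimal sketch of how such a structure might be turned into text. It is only an illustration under some assumptions: the helper name to_dot_bracket is made up, 'base' is taken to be the 1-based position of the pairing partner, and the output is plain dot-bracket notation rather than full WUSS (which distinguishes several kinds of single-stranded region and nesting level). It uses core Perl only and no BioPerl classes.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): render the per-position
# array as a dot-bracket string. Assumes 'base' holds the 1-based
# position of the pairing partner and that there are no pseudoknots.
sub to_dot_bracket {
    my ($structure) = @_;
    my $text = '';
    for my $i (0 .. $#{$structure}) {
        my $pos  = $structure->[$i];
        my $type = $pos->{interaction} // 'SS';   # undef treated as single-stranded
        my $base = $pos->{base};
        if ($type eq 'WC' && defined $base) {
            # A downstream partner opens a pair; an upstream partner closes it.
            $text .= $base > $i + 1 ? '(' : ')';
        }
        else {
            $text .= '.';
        }
    }
    return $text;
}

# Toy 8-nt hairpin: nt 1-2 pair with nt 8-7, nt 3-6 form the loop.
my @structure = (
    { interaction => 'WC', base => 8 },
    { interaction => 'WC', base => 7 },
    { interaction => 'SS' },
    { interaction => 'SS' },
    { interaction => 'SS' },
    { interaction => 'SS' },
    { interaction => 'WC', base => 2 },
    { interaction => 'WC', base => 1 },
);

print to_dot_bracket(\@structure), "\n";   # prints ((....))
```

Because undef is treated the same as 'SS' here, the sketch works whether or not single-stranded positions carry an explicit designation.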

This is also scalable based on the complexity of the data: if one
wanted to add tertiary/quaternary interactions, location, base
modifications, remote sequence interactions, etc., extra key/value
pairs could be used. Conversely, if one only wanted secondary
structure (for drawing RNA structures, for example), then only that
data would be parsed.
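
As a rough sketch of that scaling idea, the snippet below adds a couple of extra keys to some positions and then pulls out just the secondary-structure part. The key names 'modification' and 'tertiary' and the helper secondary_only are hypothetical placeholders, not agreed tags or BioPerl API; the real tags would presumably come from RNAML or RNA Ontology as mentioned above.

```perl
use strict;
use warnings;

# Hypothetical richer records: 'modification' and 'tertiary' are
# illustrative key names only.
my @structure = (
    { interaction => 'WC', base => 4, modification => 'm6A' },
    { interaction => 'SS', tertiary => { type => 'base_triple', base => 9 } },
    { interaction => 'SS' },
    { interaction => 'WC', base => 1 },
);

# Hypothetical helper: copy only the keys a secondary-structure
# drawing tool would need, leaving the original records untouched.
sub secondary_only {
    my ($structure) = @_;
    my @out;
    for my $pos (@{$structure}) {
        my %slim;
        for my $key (qw(interaction base)) {
            $slim{$key} = $pos->{$key} if exists $pos->{$key};
        }
        push @out, \%slim;
    }
    return \@out;
}

my $secondary = secondary_only(\@structure);

# One line per nucleotide: position, interaction type, partner (or '-').
for my $i (0 .. $#{$secondary}) {
    my $pos = $secondary->[$i];
    printf "%d\t%s\t%s\n", $i + 1, $pos->{interaction}, $pos->{base} // '-';
}
```

The extra keys stay on the full records, so a consumer that understands them can still use them while simpler consumers ignore them.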

If you (or anyone listening) have any suggestions I would greatly  
appreciate them.

chris