[Bioperl-l] RNA folding
Chris Fields
cjfields at uiuc.edu
Tue Feb 6 16:45:36 UTC 2007
On Feb 6, 2007, at 5:52 AM, Caroline Johnston wrote:
> Hello,
>
> I've just joined the list - I'm a Bioinformatics PhD student at Essex
> University doing transcriptomics-related things. Mainly microarray
> analysis and more recently looking at RNA structure prediction.
>
> I was thinking about having a go at writing a bioperl-run wrapper around
> some of the structure prediction stuff, but according to the wiki this is
> being done already (at least for the Vienna tools). I spoke to Albert
> Vilella at the EBI the other day and he said Chris Fields was the man to
> speak to. So could he (or anyone) let me know what the current state of
> RNA structure prediction tools in bioperl is?
>
> Cheers,
> Cass xx
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
Actually, the only RNA tool wrappers I have written so far are for ERPIN,
RNAMotif, and Infernal (the only one in bioperl-run CVS at this time
is RNAMotif). I am planning to write wrappers for Vienna, UNAFold,
and a few others but haven't really started on them. Here's where
I'm at right now...
I am writing up a new set of AnnotationI classes which positionally
describe data (Meta) which I hope will help deal with this stuff.
These would be similar in nature to Heikki's Bio::Seq::Meta classes:
http://bioperl.org/pipermail/bioperl-l/2006-December/024414.html
I would use a regular Bio::SeqI and store the structural data and
anything else (such as energy calculations, etc.) as Annotation
objects in an AnnotationCollection, and then write up a series of
SeqIO modules to get data into/out of the designated structure
formats, like UNAFold ct, RNAML, and so on. Each sequence would then
be capable of holding more than one structural Annotation (i.e. it could
represent different folding pathways, alternative RNA folds, and so on).
At this point I represent the data as an array of hashes where
$array[0] is nt 1 and the hash keys indicate the type of interaction,
the base interacted with, etc. The text representation would by
default be the simple Eddy WUSS (Rfam-like) format, which is capable
of representing some complex data (pseudoknots, for instance), is
compact, and is documented (in the Infernal manual). Tags will
probably switch to more ontologically relevant terms (probably from
RNAML or RNA Ontology), but in general it is something like this:
[
  {'interaction' => 'WC',
   'base'        => 24},
  {'interaction' => 'WC',
   'base'        => 23},
  {'interaction' => 'SS'},
  ...
]
In this implementation every seq position would have some kind of
interaction designation, though that's open for debate as it could
just be simple text or undef for single-stranded regions.
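To make the layout concrete, here is a rough sketch of how a bracket-style
structure string could be turned into that array of hashes. The function
name structure_to_interactions is made up for illustration and is not part
of BioPerl. It only understands nested <>, (), [] and {} pairs and treats
every other character as single-stranded, so it covers a small subset of
WUSS (no pseudoknot letters). Every pair is labelled 'WC' purely to match
the example tags; bracket notation alone cannot say whether a pair is
really Watson-Crick.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl API): convert a bracket-style structure
# string into one hash per nucleotide. 'base' holds the 1-based position of
# the pairing partner; unpaired positions get 'SS' and no 'base' key.
sub structure_to_interactions {
    my ($structure) = @_;
    my %open_for = ('>' => '<', ')' => '(', ']' => '[', '}' => '{');
    my %is_open  = map { $_ => 1 } values %open_for;
    my (@interactions, @stack);

    my @chars = split //, $structure;
    for my $i (0 .. $#chars) {
        my $c = $chars[$i];
        if ($is_open{$c}) {
            # Remember the opening bracket; the partner is filled in later.
            push @stack, [$c, $i];
            $interactions[$i] = { interaction => 'WC' };
        }
        elsif (my $want = $open_for{$c}) {
            my $top = pop @stack
                or die "Unmatched '$c' at position " . ($i + 1) . "\n";
            my ($open, $j) = @$top;
            die "Mismatched '$open' and '$c' at position " . ($i + 1) . "\n"
                unless $open eq $want;
            $interactions[$j]{base} = $i + 1;
            $interactions[$i] = { interaction => 'WC', base => $j + 1 };
        }
        else {
            # Assumption: anything that is not a bracket is single-stranded.
            $interactions[$i] = { interaction => 'SS' };
        }
    }
    die "Unclosed bracket at position " . ($stack[-1][1] + 1) . "\n" if @stack;
    return \@interactions;
}

# Example: a three-base-pair hairpin with a three-nucleotide loop.
my $records = structure_to_interactions('<<<___>>>');
printf "nt %d: %s%s\n", $_ + 1, $records->[$_]{interaction},
    defined $records->[$_]{base} ? " with nt $records->[$_]{base}" : ''
    for 0 .. $#$records;
```

Leaving 'base' out for single-stranded positions is one of the options
mentioned above; storing undef for those positions instead would need only
a change to the final else branch.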
This is also scalable based on complexity of the data: if one wanted
to add tertiary/quaternary interactions, location, base modifications,
remote sequence interactions, etc., extra key/value pairs could be
used. Conversely, if one only wanted secondary structure (for drawing
RNA structures, for example), then only that data would be parsed.
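As a rough illustration of both directions, the sketch below adds extra
keys to a few records and then strips them back down to secondary structure
only. The key names 'modification' and 'tertiary', their values, and the
function secondary_only are all invented for the example; the real tags
would come from whatever vocabulary ends up being used.

```perl
use strict;
use warnings;

# Hypothetical records for a five-nucleotide toy structure. The
# 'modification' and 'tertiary' keys are illustrative names only.
my @interactions = (
    { interaction => 'WC', base => 5, modification => 'm5C' },
    { interaction => 'WC', base => 4,
      tertiary => { type => 'base-triple', base => 3 } },
    { interaction => 'SS' },
    { interaction => 'WC', base => 2 },
    { interaction => 'WC', base => 1 },
);

# Return copies of the records holding only the secondary-structure keys,
# leaving the original records untouched.
sub secondary_only {
    my ($records) = @_;
    my @slim;
    for my $rec (@$records) {
        my %copy;
        for my $key (qw(interaction base)) {
            $copy{$key} = $rec->{$key} if exists $rec->{$key};
        }
        push @slim, \%copy;
    }
    return \@slim;
}

my $secondary = secondary_only(\@interactions);
print join(',', sort keys %{ $secondary->[0] }), "\n";
```

A drawing tool would only ever look at the slimmed-down copies, while code
that cares about modifications or tertiary contacts reads the extra keys
from the full records.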
If you (or anyone listening) have any suggestions I would greatly
appreciate them.
chris