[Bioperl-l] RNA fold

Sat Jul 31 11:42:15 EDT 2004

On Jul 30, 2004, at 3:48 PM, Michael Janis wrote:

> Hello,
>
> First time posting...
>
> I'd like to re-open a discussion thread that was started Fri Dec 5, 
> 2003 by Vesselin Baev concerning the existence / need for theoretical 
> RNA fold output parsers (such as RNAfold, mfold, and the like).  Chris 
> Fields posted intentions to work on such a project, and the thread
> morphed into considerations of ct output (which I currently use in my
> own database for structural information) vs. bracket notation output
> vs. RNAML for storage and interpretation of structural data.
>
> The licensing for mfold is restrictive in that it cannot be 
> re-distributed freely.

Somewhat true.  Basically, you need to agree to a license when using 
the software (like most software, even freeware).  The main difference 
is that the license needs to be signed by the end-user (usually the PI 
or the institution).  One could always use the web interface for most 
analyses, but for the (relatively few) who want to modify some of the 
parameters, the licensed program is available.  You can actually 
download it from the web now (the link is found here: 
http://www.bioinfo.rpi.edu/~zukerm/rna/mfold-3.1.html). The other
alternative is the Vienna Package, which comes with a Perl
interface.

> However, the extensive mfold sub-optimal fold lists are an important 
> consideration when probing hypothetical folds (especially since it's 
> really guesswork to assign parameters such as temperature and ion 
> content).

I disagree.  The mfold parameters are based on real-world 
experimentation to determine conditions for folding based on different 
temperatures and ionic conditions.  Biochemically and biologically 
speaking, the temperature and ionic range for a particular fold can be 
extrapolated from other studies (such as optimum growth temp, in vivo 
ionic conditions, etc) to determine approximate folds (key word being 
approximate as mfold doesn't predict pseudoknots or tertiary 
interactions).  For instance, E. coli grows best at 37 deg. C, and the 
detailed biochemical makeup of the cell has been determined (including 
ionic concentrations in vivo).  If you were doing something like RNA 
interference, then learning these conditions is very important.  In 
essence, there's no "guesswork" involved; just a bit of research.

> mfold gives .ct output like other programs, which can be easily
> converted on the fly to any bracket notation you like (I personally 
> store covariance information in my extended bracket notation using 
> lots of canonical and non-canonical specific characters).  However, 
> bracket notation, as pointed out, is great for inline GFF db tables 
> (such as the '$feat->add_tag_value('secondary_structure',$str);' 
> suggestion from Jason Stajich) but really does not carry forward all 
> covariance information.  .ct output is just the opposite in terms of 
> GFF db format - not exactly inline, but a wealth of structural and 
> primary sequence information is retained in this format.

The problem with RNA structure notation right now is that each program
uses its own format.  It would be great to have one standard notation
for all of them, which is what RNAML is about.
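For reference, the .ct-to-bracket conversion discussed above can be sketched roughly as follows.  This is a minimal Python illustration, not BioPerl code.  It assumes the usual .ct layout (a header line starting with the sequence length, then one row per base with the pairing partner in the fifth column, 0 meaning unpaired) and a pseudoknot-free structure.  The function name and the toy hairpin are made up for the example.

```python
def ct_to_dot_bracket(ct_text):
    """Convert a single-structure .ct record to (sequence, dot-bracket).

    Assumes the fifth column of each row is the 1-based index of the
    pairing partner (0 = unpaired) and that there are no pseudoknots.
    """
    lines = [ln for ln in ct_text.splitlines() if ln.strip()]
    n = int(lines[0].split()[0])  # header: length, then free-text title
    seq, struct = [], []
    for ln in lines[1:n + 1]:
        fields = ln.split()
        i, base, partner = int(fields[0]), fields[1], int(fields[4])
        seq.append(base)
        if partner == 0:
            struct.append(".")      # unpaired base
        elif partner > i:
            struct.append("(")      # 5' side of a pair
        else:
            struct.append(")")      # 3' side of a pair
    return "".join(seq), "".join(struct)


# Toy hairpin, invented for illustration only.
EXAMPLE_CT = """\
8 dG = -1.0 hypothetical hairpin
1 G 0 2 8 1
2 G 1 3 7 2
3 C 2 4 0 3
4 A 3 5 0 4
5 A 4 6 0 5
6 A 5 7 0 6
7 C 6 8 2 7
8 C 7 0 1 8
"""

if __name__ == "__main__":
    print(ct_to_dot_bracket(EXAMPLE_CT))
```

Note that the bracket string keeps only the pairing pattern; anything else carried by the .ct rows (or by an extended notation) would need extra characters or a separate field.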

> So the question is, what work has been done in this area?  My
> expertise breaks down when I try to incorporate my .ct db
> tables with my GFF-built database.  In other words, I lose the ability
> to utilize bioperl tools to query and analyze this data since I have
> deviated so far from the standard Bio::DB::GFF database format.

I think Jason (or somebody else?) had mentioned that one could store a 
tag for a file location containing the information.  The file could 
then be opened and parsed.
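As a rough idea of what "opened and parsed" could look like for a file holding several suboptimal folds, here is a small Python sketch.  It assumes the structures are simply concatenated in one .ct file, each with its own header line giving the length, which is my understanding of how multi-structure .ct files are laid out.  The function name, the dictionary keys, and the toy data are all invented for the example.

```python
def split_ct_records(ct_text):
    """Split concatenated .ct structures into a list of records.

    Each record is a dict with the header title, the sequence length,
    and a {base_index: partner_index} map of paired positions.
    """
    lines = [ln for ln in ct_text.splitlines() if ln.strip()]
    records = []
    pos = 0
    while pos < len(lines):
        header = lines[pos].split(None, 1)
        n = int(header[0])                       # number of bases
        title = header[1].strip() if len(header) > 1 else ""
        rows = [ln.split() for ln in lines[pos + 1:pos + 1 + n]]
        if len(rows) != n:
            raise ValueError("truncated .ct record")
        # column 5 is the pairing partner; 0 means unpaired
        pairs = {int(r[0]): int(r[4]) for r in rows if int(r[4]) != 0}
        records.append({"title": title, "length": n, "pairs": pairs})
        pos += n + 1                             # skip to next header
    return records


# Two toy folds of the same 6-mer, invented for illustration only.
TWO_FOLDS = """\
6 dG = -0.5 fold_1
1 G 0 2 6 1
2 A 1 3 0 2
3 A 2 4 0 3
4 A 3 5 0 4
5 A 4 6 0 5
6 C 5 0 1 6
6 dG = 0.0 fold_2
1 G 0 2 0 1
2 A 1 3 0 2
3 A 2 4 0 3
4 A 3 5 0 4
5 A 4 6 0 5
6 C 5 0 0 6
"""

if __name__ == "__main__":
    for rec in split_ct_records(TWO_FOLDS):
        print(rec["title"], rec["pairs"])
```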

> I'd like to work to create a parser for .ct output that fits well into 
> a bp scheme (like Seq::Meta etc.), and while RNAML seems overly 
> complicated for my needs, it would be nice to have a common data 
> definition that supersedes all others in information content, thus
> allowing a SeqIO like converter to load / dump data from such a master 
> data definition (with warnings where appropriate).  Before I begin, 
> however, I would like to know if any further work has been done in 
> this area.

I haven't worked on it in a while b/c of benchwork taking first 
priority.  However, I plan on returning to it at some point, starting 
with an RNAmotif parser.

You might want to check bioperl-run.  I believe there are some modules
for the Vienna programs (RNAfold, etc.) and the Pise mfold interface by
Catherine Letondal.  As mentioned above, the Vienna package also has a
Perl interface (though not affiliated with Bioperl).

> Likewise feedback from others much better at bioperl than myself: 
> suggestions for storing such lengthy .ct definitions within the GFF 
> framework, where each potential fold may have suboptimal folds
> grouped together, each with their own .ct data.

I would say store a tag in the GFF framework that points to the file
containing the structural information, rather than storing this very
complex data directly.  I can't see GFF holding it in the current form
w/o making the format much more (unnecessarily) complicated.
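To make the file-location idea concrete, here is a minimal Python sketch of adding such a tag to a feature line and reading it back.  It assumes GFF2-style attributes (tag "value" pairs separated by " ; " in the ninth column).  The tag name structure_file, the helper names, and the example feature are all hypothetical, not anything defined by Bio::DB::GFF.

```python
import re

TAG = "structure_file"  # hypothetical tag name, not a GFF standard


def add_structure_tag(gff_line, path):
    """Return the GFF line with a structure-file tag appended."""
    cols = gff_line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated GFF columns")
    pair = f'{TAG} "{path}"'
    # append to existing attributes, or start the attribute column
    cols[8] = f"{cols[8]} ; {pair}" if cols[8].strip() else pair
    return "\t".join(cols)


def structure_path(gff_line):
    """Return the stored file path, or None if the tag is absent."""
    cols = gff_line.rstrip("\n").split("\t")
    match = re.search(rf'\b{TAG} "([^"]*)"', cols[8])
    return match.group(1) if match else None


if __name__ == "__main__":
    # Example feature, invented for illustration only.
    line = 'chrI\tmfold\tncRNA\t100\t180\t.\t+\t.\tncRNA "snR10"'
    tagged = add_structure_tag(line, "folds/snR10.ct")
    print(tagged)
    print(structure_path(tagged))
```

The GFF record stays one line per feature, and the .ct data (including any suboptimal folds) lives in the referenced file.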

> Apologies for the train of thought style of email.
>
> Yours,
>
> Michael Janis
> -- 
>
>
> Michael Janis, UCLA Biochemistry Graduate Student
> Every message PGP signed.
>
> "The major difference between a thing that might go wrong and a thing 
> that cannot possibly go wrong is that when a thing that cannot 
> possibly go wrong goes wrong, it usually turns out to be impossible to 
> get at or repair."
> -Douglas Adams
>
Chris Fields
Postdoctoral Researcher - Dept. of Biochemistry
Laboratory of Dr. Robert Switzer
University of Illinois at Urbana-Champaign