[Bioperl-l] RNA fold
Chris Fields
cjfields at uiuc.edu
Sat Jul 31 11:42:15 EDT 2004
On Jul 30, 2004, at 3:48 PM, Michael Janis wrote:
> Hello,
>
> First time posting...
>
> I'd like to re-open a discussion thread that was started Fri Dec 5,
> 2003 by Vesselin Baev concerning the existence / need for theoretical
> RNA fold output parsers (such as RNAfold, mfold, and the like). Chris
> Fields posted intentions to work on such a project, and the thread
> morphed into considerations of ct output (which I currently use in my
> own database for structural information) vs. bracket notation output
> vs. RNAML for storage and interpretation of structural data.
>
> The licensing for mfold is restrictive in that it cannot be
> re-distributed freely.
Somewhat true. Basically, you need to agree to a license when using
the software (like most software, even freeware). The main difference
is that the license needs to be signed by the end-user (usually the PI
or the institution). One could always use the web interface for most
analyses, but for the (relatively few) who want to modify some of the
parameters, the licensed program is available. You can actually
download it from the web now (the link is found here:
http://www.bioinfo.rpi.edu/~zukerm/rna/mfold-3.1.html). The other
alternative is using the Vienna RNA Package, which comes with a Perl
interface.
> However, the extensive mfold sub-optimal fold lists are an important
> consideration when probing hypothetical folds (especially since it's
> really guesswork to assign parameters such as temperature and ion
> content).
I disagree. The mfold parameters are based on real-world
experimentation to determine conditions for folding based on different
temperatures and ionic conditions. Biochemically and biologically
speaking, the temperature and ionic range for a particular fold can be
extrapolated from other studies (such as optimum growth temp, in vivo
ionic conditions, etc) to determine approximate folds (key word being
approximate as mfold doesn't predict pseudoknots or tertiary
interactions). For instance, E. coli grows best at 37 deg. C, and the
detailed biochemical makeup of the cell has been determined (including
ionic concentrations in vivo). If you were doing something like RNA
interference, then learning these conditions is very important. In
essence, there's no "guesswork" involved; just a bit of research.
> mfold gives .ct output like other programs, which can be easily
> converted on the fly to any bracket notation you like (I personally
> store covariance information in my extended bracket notation using
> lots of canonical and non-canonical specific characters). However,
> bracket notation, as pointed out, is great for inline GFF db tables
> (such as the '$feat->add_tag_value('secondary_structure',$str);'
> suggestion from Jason Stajich) but really does not carry forward all
> covariance information. .ct output is just the opposite in terms of
> GFF db format - not exactly inline, but a wealth of structural and
> primary sequence information is retained in this format.
The problem with RNA structure notation right now is that each program
uses a different format. It would be great to have a single standard
notation for all of these, which is what RNAML is about.
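As a rough illustration of the .ct to bracket conversion mentioned
above, here is a minimal sketch in plain Python (not BioPerl). It
assumes the common mfold-style .ct layout: a header line starting with
the base count, then one line per base with the pairing partner in the
fifth column (0 for unpaired). The function names, the example record,
and its dG value are hypothetical, and the conversion assumes a
pseudoknot-free structure.

```python
def parse_ct(text):
    """Parse a single-structure .ct record into (sequence, partners).

    Assumes columns: index, base, index-1, index+1, partner, numbering.
    partners[i] is the 1-based partner of base i+1, or 0 if unpaired.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n = int(lines[0].split()[0])  # header: base count, then free-text title
    seq = []
    partners = [0] * n
    for ln in lines[1:n + 1]:
        fields = ln.split()
        idx, base, partner = int(fields[0]), fields[1], int(fields[4])
        seq.append(base)
        partners[idx - 1] = partner
    return "".join(seq), partners


def ct_to_dot_bracket(partners):
    """Convert a partner list to dot-bracket notation (no pseudoknots)."""
    out = []
    for i, j in enumerate(partners, start=1):
        if j == 0:
            out.append(".")   # unpaired base
        elif j > i:
            out.append("(")   # opens a pair closed downstream
        else:
            out.append(")")   # closes a pair opened upstream
    return "".join(out)


# Hypothetical toy hairpin, not real mfold output.
EXAMPLE_CT = """\
9 dG = -1.0 example_hairpin
1 G 0 2 9 1
2 G 1 3 8 2
3 G 2 4 7 3
4 A 3 5 0 4
5 A 4 6 0 5
6 A 5 7 0 6
7 C 6 8 3 7
8 C 7 9 2 8
9 C 8 0 1 9
"""

sequence, partners = parse_ct(EXAMPLE_CT)
print(sequence)
print(ct_to_dot_bracket(partners))
```

The plain bracket string keeps only the pairing pattern, which is why
it fits in a single tag value but drops whatever extra annotation an
extended notation or the full .ct table would carry.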
> So the question is, what work has been done in this area? My
> expertise breaks down when I try to incorporate my .ct db
> tables with my GFF - built dbase. In other words, I lose the ability
> to utilize bioperl tools to query and analyze this data since I have
> deviated so far from the standard Bio::DB::GFF dbase format.
I think Jason (or somebody else?) had mentioned that one could store a
tag for a file location containing the information. The file could
then be opened and parsed.
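A minimal sketch of that idea, in plain Python rather than BioPerl:
read the attribute column of a GFF2-style line and pull out a tag that
holds the path to the structure file. The tag name "structure_file",
the feature line, and the path are all hypothetical. The parser is
deliberately simple and assumes no semicolons inside quoted values.

```python
def gff2_attributes(gff_line):
    """Return the ninth-column attributes of a GFF2-style line as a dict.

    Assumes 'tag value ; tag "value"' syntax and that values contain
    no semicolons (a simplification for illustration).
    """
    cols = gff_line.rstrip("\n").split("\t")
    if len(cols) < 9:
        return {}
    attrs = {}
    for chunk in cols[8].split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        tag, _, value = chunk.partition(" ")
        attrs[tag] = value.strip().strip('"')
    return attrs


# Hypothetical feature line; only the last column matters here.
EXAMPLE_LINE = (
    "chrI\tmfold\tRNA_fold\t1000\t1075\t.\t+\t.\t"
    'RNA trnaX ; structure_file "folds/trnaX.ct"'
)

attrs = gff2_attributes(EXAMPLE_LINE)
ct_path = attrs.get("structure_file")
print(ct_path)
```

Once the path is recovered, the file can be opened and handed to
whatever .ct parser is in use, so the GFF table itself stays small.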
> I'd like to work to create a parser for .ct output that fits well into
> a bp scheme (like Seq::Meta etc.), and while RNAML seems overly
> complicated for my needs, it would be nice to have a common data
> definition that supersedes all others in information content, thus
> allowing a SeqIO like converter to load / dump data from such a master
> data definition (with warnings where appropriate). Before I begin,
> however, I would like to know if any further work has been done in
> this area.
I haven't worked on it in a while b/c of benchwork taking first
priority. However, I plan on returning to it at some point, starting
with an RNAmotif parser.
You might want to check bioperl-run. I believe there are some modules
for the Vienna programs (RNAfold, etc.) and the Pise mfold interface by
Catherine Letondal. As mentioned above, the Vienna package also has a
Perl interface (though it is not affiliated with Bioperl).
> Likewise feedback from others much better at bioperl than myself:
> suggestions for storing such lengthy .ct definitions within the GFF
> framework, where each potential fold may have suboptimal folds
> grouped together, each with their own .ct data.
I would say store tags in the GFF framework for the file location
containing structural information to get around storing this very
complex data. I can't see GFF storing very complex information in the
current form w/o making the format much more (unnecessarily)
complicated.
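To illustrate how one external file could hold a whole group of
suboptimal folds behind a single GFF tag, here is a hedged Python
sketch that splits a concatenated .ct file into separate records. It
assumes each record starts with a header line whose first field is the
base count, followed by exactly that many base lines with the partner
in the fifth column. The function name and the toy records are
hypothetical, not real predictions.

```python
def split_ct_records(text):
    """Split concatenated .ct records into a list of (title, partners).

    Each header's first field gives the number of base lines that follow.
    partners holds the 1-based pairing partner per base, 0 if unpaired.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    records = []
    pos = 0
    while pos < len(lines):
        header = lines[pos].split(None, 1)
        n = int(header[0])
        title = header[1].strip() if len(header) > 1 else ""
        body = lines[pos + 1:pos + 1 + n]
        partners = [int(ln.split()[4]) for ln in body]
        records.append((title, partners))
        pos += n + 1  # skip the header plus its n base lines
    return records


# Two toy records for the same 4-base sequence (illustrative only).
TOY_CT = """\
4 dG = -0.5 toy_fold
1 G 0 2 4 1
2 A 1 3 0 2
3 A 2 4 0 3
4 C 3 0 1 4
4 dG = 0.0 toy_fold
1 G 0 2 0 1
2 A 1 3 0 2
3 A 2 4 0 3
4 C 3 0 0 4
"""

folds = split_ct_records(TOY_CT)
for title, partners in folds:
    print(title, partners)
```

With this layout the GFF feature needs only the file location, and the
ordering of folds within the file preserves their grouping.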
> Apologies for the train of thought style of email.
>
> Yours,
>
> Michael Janis
> --
>
>
> Michael Janis, UCLA Biochemistry Graduate Student
> Every message PGP signed.
>
> "The major difference between a thing that might go wrong and a thing
> that cannot possibly go wrong is that when a thing that cannot
> possibly go wrong goes wrong, it usually turns out to be impossible to
> get at or repair."
> -Douglas Adams
>
Chris Fields
Postdoctoral Researcher - Dept. of Biochemistry
Laboratory of Dr. Robert Switzer
University of Illinois at Urbana-Champaign