[Bioperl-l] RNA fold
Michael Janis
mjanis at chem.ucla.edu
Fri Jul 30 16:48:02 EDT 2004
Hello,
First time posting...
I'd like to re-open a discussion thread started on Fri Dec 5, 2003 by Vesselin Baev concerning the existence of, and need for, parsers for theoretical RNA fold output (from programs such as RNAfold, mfold, and the like). Chris Fields posted intentions to work on such a project, and the thread morphed into a consideration of .ct output (which I currently use in my own database for structural information) vs. bracket notation output vs. RNAML for storage and interpretation of structural data.
The licensing for mfold is restrictive in that it cannot be freely redistributed. However, mfold's extensive lists of suboptimal folds are an important consideration when probing hypothetical folds (especially since assigning parameters such as temperature and ion content is really guesswork). Like other programs, mfold gives .ct output, which can easily be converted on the fly to any bracket notation you like (I personally store covariance information in my extended bracket notation, using many characters specific to canonical and non-canonical pairings). As was pointed out in that thread, bracket notation is great for inline GFF db tables (such as the '$feat->add_tag_value('secondary_structure',$str);' suggestion from Jason Stajich), but it does not carry forward all covariance information. .ct output is just the opposite in terms of GFF db format: not exactly inline, but it retains a wealth of structural and primary sequence information.
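To make the .ct-to-bracket conversion concrete, here is a minimal sketch. It is written in Python purely for illustration and is not bioperl code. It assumes the usual .ct layout: a header line that starts with the number of bases, followed by one line per base whose first column is the base index and whose fifth column is the index of the pairing partner (0 if unpaired). The names ct_to_dot_bracket and HAIRPIN_CT are hypothetical, and the toy hairpin is made-up data, not mfold output. Pseudoknots are not handled, since plain bracket notation cannot represent them.

```python
def ct_to_dot_bracket(ct_text):
    """Convert a single-structure .ct record to plain dot-bracket notation.

    Assumes: first non-blank line is the header (base count first), and each
    following line has the base index in column 1 and the partner index in
    column 5 (0 means unpaired). Pseudoknots are not supported.
    """
    lines = [ln for ln in ct_text.splitlines() if ln.strip()]
    n_bases = int(lines[0].split()[0])

    symbols = []
    for ln in lines[1:n_bases + 1]:
        fields = ln.split()
        i, partner = int(fields[0]), int(fields[4])
        if partner == 0:
            symbols.append(".")   # unpaired base
        elif partner > i:
            symbols.append("(")   # 5' side of a pair
        else:
            symbols.append(")")   # 3' side of a pair
    return "".join(symbols)


# Toy 9-nt hairpin (illustrative data only).
HAIRPIN_CT = """
9 dG = -2.0 hairpin
1 G 0 2 9 1
2 G 1 3 8 2
3 G 2 4 7 3
4 A 3 5 0 4
5 A 4 6 0 5
6 A 5 7 0 6
7 C 6 8 3 7
8 C 7 9 2 8
9 C 8 0 1 9
"""

print(ct_to_dot_bracket(HAIRPIN_CT))  # (((...)))
```

The resulting string is the kind of value that could be handed to an add_tag_value call like the one quoted above. The per-pair detail that only the .ct table carries is lost at this step.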
So the question is: what work has been done in this area? My expertise breaks down when I try to integrate my .ct db tables with my GFF-built database. In other words, I lose the ability to use bioperl tools to query and analyze this data, since I have deviated so far from the standard Bio::DB::GFF database format. I'd like to work on a parser for .ct output that fits well into a bp scheme (like Seq::Meta etc.). While RNAML seems overly complicated for my needs, it would be nice to have a common data definition that supersedes all others in information content, thus allowing a SeqIO-like converter to load / dump data from such a master data definition (with warnings where appropriate). Before I begin, however, I would like to know whether any further work has been done in this area. I would likewise welcome feedback from others much better at bioperl than I am, in particular suggestions for storing such lengthy .ct definitions within the GFF framework, where each potential fold may have suboptimal folds grouped together, each with its own .ct data.
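As a starting point for discussion, here is a rough sketch of what such a parser might produce: one record per fold, with the suboptimal folds of a sequence kept together in a list. It is in Python for illustration only and is not a proposed bioperl API. The names Fold, parse_ct and TWO_FOLDS_CT are hypothetical. It assumes that consecutive structures are simply concatenated in one .ct file, that each header line begins with the base count, and that the energy appears in the header as "dG = x" or "ENERGY = x". The sample data is invented.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed header styles: "dG = -2.0" or "ENERGY = -2.0".
ENERGY_RE = re.compile(r"(?:dG|ENERGY)\s*=\s*(-?\d+(?:\.\d+)?)")


@dataclass
class Fold:
    title: str
    energy: Optional[float]        # None if no energy found in the header
    sequence: str
    pairs: List[Tuple[int, int]]   # 1-based (i, j) with i < j


def parse_ct(text: str) -> List[Fold]:
    """Parse a .ct file holding one or more concatenated structures."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    folds = []
    pos = 0
    while pos < len(lines):
        header = lines[pos].split(None, 1)
        n_bases = int(header[0])
        title = header[1].strip() if len(header) > 1 else ""
        match = ENERGY_RE.search(title)
        energy = float(match.group(1)) if match else None

        bases, pairs = [], []
        for ln in lines[pos + 1:pos + 1 + n_bases]:
            fields = ln.split()
            i, partner = int(fields[0]), int(fields[4])
            bases.append(fields[1])
            if partner > i:        # record each pair once, from its 5' side
                pairs.append((i, partner))
        folds.append(Fold(title, energy, "".join(bases), pairs))
        pos += n_bases + 1         # jump to the next header line
    return folds


# Two invented folds of the same 9-nt sequence.
TWO_FOLDS_CT = """
9 dG = -2.0 hairpin
1 G 0 2 9 1
2 G 1 3 8 2
3 G 2 4 7 3
4 A 3 5 0 4
5 A 4 6 0 5
6 A 5 7 0 6
7 C 6 8 3 7
8 C 7 9 2 8
9 C 8 0 1 9
9 dG = -1.1 hairpin
1 G 0 2 9 1
2 G 1 3 8 2
3 G 2 4 0 3
4 A 3 5 0 4
5 A 4 6 0 5
6 A 5 7 0 6
7 C 6 8 0 7
8 C 7 9 2 8
9 C 8 0 1 9
"""

folds = parse_ct(TWO_FOLDS_CT)
for fold in folds:
    print(fold.energy, fold.sequence, fold.pairs)
```

A list of (i, j) pairs keeps all of the pairing information from the .ct table in compact form. Whether that, or something richer, is the right unit to hang off a GFF feature is exactly the open question.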
Apologies for the train-of-thought style of this email.
Yours,
Michael Janis
--
Michael Janis, UCLA Biochemistry Graduate Student
Every message PGP signed.
"The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong, it usually turns out to be impossible to get at or repair."
-Douglas Adams