Bioperl: MSF file manipulation

Mon, 6 Mar 2000 10:06:50 -0000

I'm writing some scripts for analysis of covariance in aligned RNA sequences
and have struggled with two aspects of handling GCG MSF format files.  Am I
missing something obvious?

1. GCG v.10 MSF format files appear to be incompatible with the read_MSF
method because the standard format contains sequence numbering at the
beginning and end of every aligned block.  I got round this by hacking
SimpleAlign.pm to ignore lines that start with spaces and digits, but I
realise this will not work for many alignments.

2. Many alignments are DNA rather than RNA (some are mixed!).  I could
change my matching criteria (e.g. expand the allowed matches so that GT eq
GU), but it would be better to edit the sequences 'in place'.  I can do this
by extracting each sequence object in turn, deleting it, converting it to an
RNA string (with $seq->Dna_to_Rna), creating a new sequence object from that
string and inserting it into a new alignment, but it's a bit of a mess.
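
To make the two workarounds concrete, here is a minimal text-level sketch in
plain Perl.  It is not the actual SimpleAlign.pm change described in point 1
and it makes no bioperl calls: is_numbering_line, dna_to_rna_string and the
sample block are hypothetical names and data made up for illustration.  The
filter is deliberately strict, skipping only lines made up entirely of
whitespace and digits, and the T to U conversion touches the residues only,
so sequences that are already RNA pass through unchanged.

```perl
use strict;

# Hypothetical helpers for illustration; neither is part of bioperl.

# True for a line holding only whitespace and digits, such as the
# numbering line that GCG v.10 writes with each aligned block.
sub is_numbering_line {
    my ($line) = @_;
    return $line =~ /^\s*\d+(?:\s+\d+)*\s*$/ ? 1 : 0;
}

# Returns an RNA copy of a sequence string: T/t become U/u, while gaps
# and all other symbols are left alone.
sub dna_to_rna_string {
    my ($str) = @_;
    $str =~ tr/Tt/Uu/;
    return $str;
}

# Made-up alignment block: a numbering line, one DNA and one RNA sequence.
my @block = (
    "          1                   20",
    "seq1      GGCTAGCTAG CTAGC.AGCT",
    "seq2      GGCUAGCUAG CUAGCUAGCU",
);

my @converted;
foreach my $line (@block) {
    next if is_numbering_line($line);
    # Split the name from the residues so that a T in the name survives.
    my ($name, $gap, $residues) = $line =~ /^(\S+)(\s+)(.*)$/ or next;
    push @converted, $name . $gap . dna_to_rna_string($residues);
}

print "$_\n" for @converted;
```

A filter like this would only be safe on the alignment section that follows
the "//" divider; applied to the MSF header it would also rewrite the Name:
lines.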

Since there's more than one way to do it, I'm assuming I've chosen the most
long-winded and circuitous route and that there's a bioperler out there
who's solved these trivia before - any offers?  For the record, I'm using
bioperl 0.05.1 on an NT box with ActiveState build 522 of Perl (5.005_03).

Thanks
David

-- --
David J. Evans          |   mailto:David.Evans@vir.gla.ac.uk
Institute of Virology   |    http://www.polio.vir.gla.ac.uk/
University of Glasgow   |        Tel/Fax +44 (0)141 330 6249
Church Street           |
Glasgow                 |                Mobile 07940 592768
G11 5JR                 |

=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
====================================================================