[Bioperl-l] Simple question on reading a .msf file

Brian Osborne brian_osborne at cognia.com
Sun Dec 5 14:12:52 EST 2004


Try map_chars() rather than gap_chars, see if that works. It's described in
the Bio::AlignIO::msf POD.

Brian O.

-----Original Message-----
From: bioperl-l-bounces at portal.open-bio.org
[mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of Terry Jones
Sent: Sunday, December 05, 2004 1:31 PM
To: bioperl-l at bioperl.org
Subject: [Bioperl-l] Simple question on reading a .msf file

Hi. I'm new to bioperl (though not to perl itself). I have what I
think should be a simple task, and yet after reading the FAQ and
various POD and web pages, I can't make it work. I'm probably doing
something fundamentally wrong.

I've been sent a file with a .msf suffix. The start of the file is at
the bottom of this email. A bit of googling leads me to think I have a
multiple sequence alignment file of type msf, as produced by a tool
like clustalw.

I want to extract the sequences, do various things to them, write them
out as fasta, etc. After some reading and experimenting, it seems that
to read the file I should do

  my $in = Bio::AlignIO->new(-file => $file, -format => 'msf');

This works, but it seems there is a problem because a subsequent call
to $in->next_aln() results in an exception:

	-------------------- WARNING ---------------------
	MSG: seq doesn't validate, mismatch is 1

	------------- EXCEPTION  -------------
	MSG: Attempting to set the sequence to [TTCC
	(some lines of DNA bases deleted here)
	ACGGTTGG~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~] which
does not look healthy
	STACK Bio::PrimarySeq::seq /Library/Perl/5.8.1/Bio/PrimarySeq.pm:264
	STACK Bio::PrimarySeq::new /Library/Perl/5.8.1/Bio/PrimarySeq.pm:214
	STACK Bio::LocatableSeq::new /Library/Perl/5.8.1/Bio/LocatableSeq.pm:100
	STACK Bio::AlignIO::msf::next_aln
	STACK toplevel ./simple.pl:9

So it seems to me that I'm going about this the wrong way. Other .msf
files I've seen, from other sources, also contain ~ characters, and I
can't read these as above either.

Is there something obviously wrong in what I'm doing?

One thing that makes me feel I'm partly on the right track is that if
I simply remove all the ~ characters from the .msf file, then next_aln
works and I can call $s = $aln->each_seq() on that and then $s->seq()
on that, and get out sequences. But this is surely wrong, as deleting
all the ~ characters changes the sequences.

I've tried


but this fails, telling me

  Can't locate object method "gap_char" via package "Bio::AlignIO::msf" at
./simple.pl line 11.

gap_char is defined in Bio::SimpleAlign, but when I try using
Bio::SimpleAlign instead of Bio::AlignIO, the method next_aln() is no
longer available. So that's not right either.

Can someone tell me what I'm doing wrong here?


The start of my .msf file:

PileUp of: *.ha1

 Symbol comparison table: GenRunData:pileupdna.cmp  CompCheck: 3341

                   GapWeight: 5
             GapLengthWeight: 1

 file.msf  MSF: 1106  Type: N  November 12, 2003 11:38  Check: 9322 ..

 Name: jjtll       Len:  1032  Check: 2l50  Weight:  1.00
 Name: ltkaa       Len:  1032  Check: 1129  Weight:  1.00
Bioperl-l mailing list
Bioperl-l at portal.open-bio.org

More information about the Bioperl-l mailing list