[Bioperl-l] Simple question on reading a .msf file
Brian Osborne
brian_osborne at cognia.com
Sun Dec 5 14:12:52 EST 2004
Terry,
Try map_chars() rather than gap_chars, see if that works. It's described in
the Bio::AlignIO::msf POD.
Brian O.
-----Original Message-----
From: bioperl-l-bounces at portal.open-bio.org
[mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of Terry Jones
Sent: Sunday, December 05, 2004 1:31 PM
To: bioperl-l at bioperl.org
Subject: [Bioperl-l] Simple question on reading a .msf file
Hi. I'm new to bioperl (though not to perl itself). I have what I
think should be a simple task, and yet after reading the FAQ and
various POD and web pages, I can't make it work. I'm probably doing
something fundamentally wrong.
I've been sent a file with a .msf suffix. The start of the file is at
the bottom of this email. A bit of googling leads me to think I have a
multiple sequence alignment file of type msf, as produced by a tool
like clustalw.
I want to extract the sequences, do various things to them, write them
out as fasta, etc. After some reading and experimenting, it seems that
to read the file I should do
my $in = Bio::AlignIO->new(-file => $file, -format => 'msf');
This works, but it seems there is a problem because a subsequent call
to $in->next_aln() results in an exception:
-------------------- WARNING ---------------------
MSG: seq doesn't validate, mismatch is 1
---------------------------------------------------
------------- EXCEPTION -------------
MSG: Attempting to set the sequence to [TTCC
(some lines of DNA bases deleted here)
ACGGTTGG~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~] which
does not look healthy
STACK Bio::PrimarySeq::seq /Library/Perl/5.8.1/Bio/PrimarySeq.pm:264
STACK Bio::PrimarySeq::new /Library/Perl/5.8.1/Bio/PrimarySeq.pm:214
STACK Bio::LocatableSeq::new /Library/Perl/5.8.1/Bio/LocatableSeq.pm:100
STACK Bio::AlignIO::msf::next_aln
/Library/Perl/5.8.1/Bio/AlignIO/msf.pm:131
STACK toplevel ./simple.pl:9
So it seems to me that I'm going about this the wrong way. Other .msf
files I've seen, from other sources, also contain ~ characters, and I
can't read these as above either.
Is there something obviously wrong in what I'm doing?
One thing that makes me feel I'm partly on the right track is that if
I simply remove all the ~ characters from the .msf file, then next_aln
works and I can call $s = $aln->each_seq() on that and then $s->seq()
on that, and get out sequences. But this is surely wrong, as deleting
all the ~ characters changes the sequences.
I've tried
$in->gap_char('~');
but this fails, telling me
Can't locate object method "gap_char" via package "Bio::AlignIO::msf" at
./simple.pl line 11.
gap_char is defined in Bio::SimpleAlign, but when I try using
Bio::SimpleAlign instead of Bio::AlignIO, the method next_aln() is no
longer available. So that's not right either.
Can someone tell me what I'm doing wrong here?
Regards,
Terry.
--------------------------------------------------------------------
The start of my .msf file:
!!NA_MULTIPLE_ALIGNMENT 1.0
PileUp of: *.ha1
Symbol comparison table: GenRunData:pileupdna.cmp CompCheck: 3341
GapWeight: 5
GapLengthWeight: 1
file.msf MSF: 1106 Type: N November 12, 2003 11:38 Check: 9322 ..
Name: jjtll Len: 1032 Check: 2l50 Weight: 1.00
Name: ltkaa Len: 1032 Check: 1129 Weight: 1.00
_______________________________________________
Bioperl-l mailing list
Bioperl-l at portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l
More information about the Bioperl-l
mailing list