[Bioperl-l] GCG MSF format alignments

Peter Schattner schattner@alum.mit.edu
Fri, 09 Mar 2001 14:14:19 -0800


David

First of all, thanks for your report. It's only by getting user
feedback that we can shake out all the bugs!

"David J. Evans" wrote:
> My recent attempts to use AlignIO (in v.0.7.0) have choked on GCG format MSF
> files - these have the header info, a // and the aligned sequences, but
> also have a number line.
> I've looked at msf.pm in /AlignIO and it looks as though it all starts going
> wrong at about line 102:
> if( ! exists $hash{$name} ) {
> which throws an exception ... changing 'throw' to 'warn' and adding a 'next'
> on the next line gets round this, and it creates the SimpleAlign object correctly.

I'm not an expert in MSF format and just set up the parsing to match the
test files we had. As far as I can tell your fix should work fine, and
I'll make the change.
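For anyone hitting the same problem before the fix is released, here is a
minimal stand-alone sketch of the idea (it is not the actual msf.pm code):
a body line whose first word is not a sequence name from the header is
skipped with a warning instead of causing an exception. The helper name
parse_msf_body and the extra check that quietly skips GCG number lines are
assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): collect residues for each
# sequence named in the MSF header from the lines after the '//'.
sub parse_msf_body {
    my ($names, @lines) = @_;
    my %hash = map { $_ => '' } @$names;
    for my $line (@lines) {
        next if $line =~ /^\s*$/;                    # blank separator line
        next if $line =~ /^\s*\d+(?:\s+\d+)?\s*$/;   # GCG number line (assumed layout)
        my ($name, @blocks) = split ' ', $line;
        if ( ! exists $hash{$name} ) {
            # warn and skip, rather than throwing an exception
            warn "Skipping line with unknown sequence name '$name'\n";
            next;
        }
        $hash{$name} .= join '', @blocks;            # residues arrive in blocks of 10
    }
    return \%hash;
}

my @body = (
    '            1                                                   50',
    'seq1   ACDEF GHIKL',
    'seq2   ACD~~ ~~IKL',
);
my $res = parse_msf_body([qw(seq1 seq2)], @body);
print "$res->{seq2}\n";   # prints ACD~~~~IKL
```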

> Regarding the ~ characters ... my primitive understanding suggests this
> throws an exception in Bio::PrimarySeq::seq with a message that '[some
> sequence ~~~] does not look healthy'.

I've had this problem too in other contexts. Bio::PrimarySeq is
currently rather strict about what it allows in a sequence. I would
prefer to see it issue a warning rather than die when it comes across a
bad character (what do you think about that, Ewan? Hilmar? Jason?).
Otherwise, we would at least need to change the place in the
SimpleAlign documentation which says it is possible to read in such
unusual characters and change them later with map_chars.
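Until PrimarySeq is relaxed, one workaround is to translate the gap
characters yourself before the string ever reaches Bio::PrimarySeq. The
sketch below uses a hypothetical helper, lenient_seq (not a BioPerl
method). It assumes that GCG's '.' and '~' gap characters should both
become '-', and the set of characters it accepts without a warning is an
assumption, not the exact set Bio::PrimarySeq checks. Note that it warns
rather than dies, which is the behaviour suggested above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): map GCG gap characters to '-'
# and warn, rather than die, if anything else unusual is left over.
sub lenient_seq {
    my ($str) = @_;
    (my $clean = $str) =~ tr/~./--/;     # '~' and '.' both become '-'
    if ( $clean =~ /[^A-Za-z\-\*\?]/ ) { # assumed set of acceptable characters
        warn "Sequence '$str' contains unexpected characters\n";
    }
    return $clean;
}

print lenient_seq('ACD~~~~IKL'), "\n";   # prints ACD----IKL
```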

> 
> Finally, I get an error when I try to generate MSF format output from my
> SimpleAlign object, which looks like this:
> Can't locate object method "GCG_checksum" via package "Bio::LocatableSeq" at
> D:/Computing/DJEcustomLib/bioperl-0.07.0/Bio/SimpleAlign.pm line 1594,
> <GEN0> line 119.

I suspect you used the "deprecated" write_msf routine from
SimpleAlign.pm. This routine has a bug which I will fix (as well as
marking the routine as deprecated). In the meantime, if you write your
file with the preferred AlignIO syntax, everything should be fine, e.g.:
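use Bio::AlignIO;
use Bio::Root::IO;
# $aln is the Bio::SimpleAlign object you read in earlier
my $strout = Bio::AlignIO->new(-file   => ">" . Bio::Root::IO->catfile("t", "testout.msf"),
                               -format => 'msf');
my $status = $strout->write_aln($aln);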
(For more detail check out the description of AlignIO syntax in
bptutorial.pl - a shameless plug ;-)
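For reference, the GCG_checksum method named in your error message
computes the standard GCG sequence checksum. Here is a minimal sketch,
assuming the readseq-style definition: uppercase each residue, weight it
by its position with the weight cycling 1..57, sum, and take the result
modulo 10000. gcg_checksum is a plain stand-alone function, not the
BioPerl method. It checksums the string exactly as given, so whether gap
characters should be counted is an assumption left to the caller.

```perl
use strict;
use warnings;

# Hypothetical stand-alone function, not the BioPerl method itself.
# Position weights cycle 1..57 along the sequence; result is modulo 10000.
sub gcg_checksum {
    my ($seq) = @_;
    my ($index, $checksum) = (0, 0);
    for my $char ( split //, uc($seq) ) {
        $index++;
        $checksum += $index * ord($char);
        $index = 0 if $index == 57;      # restart the weight cycle
    }
    return $checksum % 10000;
}

print gcg_checksum('ac'), "\n";   # prints 199 (1*65 + 2*67)
```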

Cheers

Peter