[Bioperl-l] [Bioperl-guts-l] Notification: incoming/1065 (fwd)

Jason Stajich jason@cgt.mc.duke.edu
Thu, 17 Jan 2002 08:38:08 -0500 (EST)


Steve -

Interestingly, I have just been working on the phylip module to convert my
clustalw alignments to phylip. I added support for non-interleaved
sequences, which works just fine with the PHYLIP programs.  I will work
on making the interleaved version actually work correctly.
I also fixed it so that nexus and phylip output do not include the /N-N in
the format line.

If you can manage checking out the code from CVS, you can get the changes
now; otherwise they will be available in a 1.0pre release in the coming weeks.

In the meantime I would also suggest upgrading to the 0.9.3 bioperl
developer release although it does not fix the remote BLAST bug you
reported.

Here is the little perl script I use regularly to convert my formats:
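#!/usr/bin/perl -w
# Reads an alignment on STDIN and writes it on STDOUT; the two optional
# arguments are the input and output format names.
use strict;
use Bio::AlignIO;

# Default to clustalw input and nexus output.
my $format  = shift @ARGV || 'clustalw';
my $oformat = shift @ARGV || 'nexus';

# No -file or -fh is given, so the stream reads from STDIN.
my $in = Bio::AlignIO->new(-format => $format);

my @params = (-verbose => 0,
              -format  => $oformat);
# Ask for non-interleaved output when writing phylip.
if( $oformat =~ /phylip/i ) {
    push @params, ('-interleaved' => 0);
}

# Likewise, the output stream writes to STDOUT.
my $out = Bio::AlignIO->new( @params );

while( my $aln = $in->next_aln ) {
    $out->write_aln($aln);
}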

-jason

-- 
Jason Stajich
Duke University
jason@cgt.mc.duke.edu

---------- Forwarded message ----------
Date: Thu, 17 Jan 2002 07:48:59 -0500
From: bioperl-bugs@bioperl.org
To: bioperl-guts-l@bioperl.org
Subject: [Bioperl-guts-l] Notification: incoming/1065

JitterBug notification

new message incoming/1065

Message summary for PR#1065
	From: cann0010@tc.umn.edu
	Subject: Incorrect phylip output format
	Date: Thu, 17 Jan 2002 07:48:59 -0500
	0 replies 	0 followups

====> ORIGINAL MESSAGE FOLLOWS <====

Full_Name: Steve Cannon
Module: phylip.pm
Version: 0.9.0
PerlVer: 5.6.0
OS: Mac OS X 10.1.2
Submission from: ecannon.dsl.visi.com (208.42.18.252)


I have noticed three problems in AlignIO alignment format conversions.

First, phylip.pm is placing three line returns between sequence blocks.
Felsenstein's programs in the Phylip suite can't deal with this -- they require
two returns between blocks (that is, one blank line rather than two; illustrated
below).

Second (just an annoyance), when converting from, say, fasta to phylip format,
any dashes in the fasta-format alignment generate STDIO warnings -- one warning
per sequence (annoying, since any decent alignment will have gaps, usually
indicated by dashes). Typical warning:

-------------------- WARNING ---------------------
MSG: In sequence MtTC36450 residue count gives value 64.
Overriding value [65] with value 64 for Bio::LocatableSeq::end().
---------------------------------------------------

Third (just an annoyance), some garbage is inserted into the phylip-formatted
sequence names, in the form of truncated "start-end position" numbers. For
example, if the original sequence name has the 7 characters 'ABCDEFG', three
characters indicating the start position of the sequence will be added to the
name, bringing the name to the allowed 10-character phylip name length:
'ABCDEFG/1-'. This added information is never useful in the 10-character names,
and will usually have to be subsequently stripped out.
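
Until the module itself is fixed, the first and third problems can be cleaned
up after the fact. What follows is only a rough sketch of such a clean-up, not
anything that ships with Bioperl: tidy_phylip_lines is a made-up helper name,
and it assumes that a trailing "/digits-digits" fragment (possibly truncated)
in the first 10 columns is always the unwanted position suffix and never part
of a real sequence name. It also collapses any run of blank lines to a single
blank line.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bioperl): clean up phylip text lines.
# - collapses runs of blank lines into one blank line
# - strips a trailing "/start-end" fragment from the 10-column name field
sub tidy_phylip_lines {
    my @lines = @_;
    my @out;
    my $prev_blank = 0;
    for my $line (@lines) {
        chomp(my $l = $line);
        if ($l =~ /^\s*$/) {
            # Keep only the first blank line of a run.
            push @out, '' unless $prev_blank;
            $prev_blank = 1;
            next;
        }
        $prev_blank = 0;
        # Name lines start in column 1; the header and continuation
        # lines start with a space and are left alone.
        if (length($l) >= 10 && $l =~ /^\S/) {
            my $name = substr($l, 0, 10);
            my $rest = substr($l, 10);
            # Assumption: "/", optional digits, optional "-", optional
            # digits at the end of the name field is the position suffix.
            $name =~ s{/\d*-?\d*\s*$}{};
            $l = sprintf('%-10s', $name) . $rest;
        }
        push @out, $l;
    }
    return @out;
}

# Small demonstration on made-up lines.
my @sample = (
    " 2 12",
    "ABCDEFG/1-     acgtacgtac gt",
    "MtTC36450/     acgtac-tac gt",
    "",
    "",
);
print "$_\n" for tidy_phylip_lines(@sample);
```

To use it as a filter on real output, pass the lines read from a filehandle
(for example <STDIN>) instead of the sample list.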

Example output from AlignIO.pm / phylip.pm :

 8 65
H122_HMM/1     tyvklatlav fmltqflivq tknveagqcp ragracsqae snacgdieec
MtNP212753     tyvklatlav fmltqflivq tknveegqcp fagrvcsqye snacgdseec
MtTC36450/     tyvklvtlav fmlttflive tmniearlcp tagtacsqrr gnscggie-c
MtTC42530/     tyvklatlav fmlttflivq tknveagecp svgrgctqll lnpcgnilec
MtTC29400/     tyvklatlav fmltqflivq iknveagqca rvgmrcsral pnpcgdivtc
MtTC30424/     ---------- ---------- ---iearecp sfgtvcsilr snscgniiey
MtTC41409/     tyvklailav lhltiflifq tknveaascp nvgavcspfe tkpcgnvkdc
MtTC28522/     tyvklitlal flvttllmfq tknveaefcs svgsfcspfn tnpcgylgnc


               icvsegshyd ggick
               icvsewshyd ggick
               icvsqgypyd ggick
               icvsrwi-yg gsicq
               rcvh--lhlv gstc-
               iciphwih-- ggick
               rclpwglff- -gtc-
               rcvpy--yly ggtce


... would be better (notice line returns) as:

 8 65
H122_HMM       tyvklatlav fmltqflivq tknveagqcp ragracsqae snacgdieec
MtNP212753     tyvklatlav fmltqflivq tknveegqcp fagrvcsqye snacgdseec
MtTC36450      tyvklvtlav fmlttflive tmniearlcp tagtacsqrr gnscggie-c
MtTC42530      tyvklatlav fmlttflivq tknveagecp svgrgctqll lnpcgnilec
MtTC29400      tyvklatlav fmltqflivq iknveagqca rvgmrcsral pnpcgdivtc
MtTC30424      ---------- ---------- ---iearecp sfgtvcsilr snscgniiey
MtTC41409      tyvklailav lhltiflifq tknveaascp nvgavcspfe tkpcgnvkdc
MtTC28522      tyvklitlal flvttllmfq tknveaefcs svgsfcspfn tnpcgylgnc

               icvsegshyd ggick
               icvsewshyd ggick
               icvsqgypyd ggick
               icvsrwi-yg gsicq
               rcvh--lhlv gstc-
               iciphwih-- ggick
               rclpwglff- -gtc-
               rcvpy--yly ggtce


