[Bioperl-l] [Bioperl-guts-l] Notification: incoming/1065 (fwd)
Jason Stajich
jason@cgt.mc.duke.edu
Thu, 17 Jan 2002 08:38:08 -0500 (EST)
Steve -
Interestingly, I have just been working on the phylip module to convert my
clustalw alignments to phylip. I added support for noninterleaved
sequences, which works just fine with the PHYLIP programs. I will work
on making the interleaved version actually work correctly.
I also fixed it so that nexus and phylip output do not include the /N-N in
the format line.
If you can manage checking out the code from CVS, you can get the changes
now; otherwise they will be available in a 1.0pre release in the coming weeks.
In the meantime I would also suggest upgrading to the 0.9.3 bioperl
developer release, although it does not fix the remote BLAST bug you
reported.
Here is the little perl script I use regularly to convert my formats:
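#!/usr/bin/perl -w
use strict;
use Bio::AlignIO;

# Input and output formats come from the command line, with defaults.
my $format  = shift @ARGV || 'clustalw';
my $oformat = shift @ARGV || 'nexus';

# Read alignments from STDIN and write the converted ones to STDOUT.
my $in = Bio::AlignIO->new(-format => $format,
                           -fh     => \*STDIN);
my @params = (-verbose => 0,
              -format  => $oformat,
              -fh      => \*STDOUT);
# Ask the phylip writer for noninterleaved output.
if( $oformat =~ /phylip/i ) {
    push @params, ('-interleaved' => 0);
}
my $out = Bio::AlignIO->new( @params );

# Convert every alignment found in the input.
while( my $aln = $in->next_aln ) {
    $out->write_aln($aln);
}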
-jason
--
Jason Stajich
Duke University
jason@cgt.mc.duke.edu
---------- Forwarded message ----------
Date: Thu, 17 Jan 2002 07:48:59 -0500
From: bioperl-bugs@bioperl.org
To: bioperl-guts-l@bioperl.org
Subject: [Bioperl-guts-l] Notification: incoming/1065
JitterBug notification
new message incoming/1065
Message summary for PR#1065
From: cann0010@tc.umn.edu
Subject: Incorrect phylip output format
Date: Thu, 17 Jan 2002 07:48:59 -0500
0 replies 0 followups
====> ORIGINAL MESSAGE FOLLOWS <====
>From cann0010@tc.umn.edu Thu Jan 17 07:48:59 2002
From: cann0010@tc.umn.edu
To: bioperl-bugs@bioperl.org
Subject: Incorrect phylip output format
Full_Name: Steve Cannon
Module: phylip.pm
Version: 0.9.0
PerlVer: 5.6.0
OS: Mac OS X 10.1.2
Submission from: ecannon.dsl.visi.com (208.42.18.252)
I have noticed three problems in AlignIO alignment format conversions.
First, phylip.pm is placing three line returns between sequence blocks.
Felsenstein's programs in the Phylip suite can't deal with this -- they require
two returns between blocks (that is, one blank line rather than two; illustrated
below).
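Until a corrected phylip.pm is installed, the extra blank line could be squeezed out of the output after the fact. What follows is only a sketch in plain Perl, not the fix made in the module itself; squeeze_blank_lines is a hypothetical helper name, and it assumes that any run of two or more blank lines should become exactly one:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collapse every run of two or more blank lines into a single blank line.
# Lines that hold only spaces or tabs are treated as blank.
sub squeeze_blank_lines {
    my ($text) = @_;
    $text =~ s/\n(?:[ \t]*\n){2,}/\n\n/g;
    return $text;
}

# Two blank lines between blocks, as in the reported output.
my $sample = "MtTC28522 tyvklitlal\n\n\nrcvpy--yly ggtce\n";

# Prints the two lines separated by one blank line.
print squeeze_blank_lines($sample);
```

Text that already has a single blank line between blocks passes through unchanged, so running the filter on correct output does no harm.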
Second (just an annoyance), when converting from, say, fasta to phylip format,
any dashes in the fasta-format alignment generate STDIO warnings -- one warning
per sequence (annoying, since any decent alignment will have gaps, usually
indicated by dashes). Typical warning:
-------------------- WARNING ---------------------
MSG: In sequence MtTC36450 residue count gives value 64.
Overriding value [65] with value 64 for Bio::LocatableSeq::end().
---------------------------------------------------
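The numbers in this warning are consistent with counting residues without the gap characters: the MtTC36450 row in the example output spans 65 alignment columns but contains one dash. The sketch below only illustrates that arithmetic and is not the Bio::LocatableSeq code; residue_count is a hypothetical helper, and it assumes gaps are written as '-' or '.':

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the residues in one aligned row, ignoring gap characters.
# Assumption: gaps are '-' or '.', and whitespace is layout only.
sub residue_count {
    my ($aligned) = @_;
    (my $residues = $aligned) =~ s/[-.\s]//g;
    return length $residues;
}

# The MtTC36450 row from the example output in this report.
my $row = 'tyvklvtlav fmlttflive tmniearlcp tagtacsqrr gnscggie-c'
        . 'icvsqgypyd ggick';

# Alignment columns are everything except the layout whitespace.
(my $columns = $row) =~ s/\s//g;

# Prints "columns=65 residues=64".
printf "columns=%d residues=%d\n", length($columns), residue_count($row);
```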
Third (just an annoyance), some garbage is inserted into the phylip-formatted
sequence names, in the form of truncated "start-end position" numbers. For
example, if the original sequence name has the 7 characters 'ABCDEFG', three
characters indicating the start position of the sequence will be added to the
name, bringing the name to the allowed 10-character phylip name length:
'ABCDEFG/1-'. This added information is never useful in the 10-character names,
and will usually have to be subsequently stripped out.
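For illustration, the name handling asked for here could look roughly like the sketch below. It is plain Perl, not the code in phylip.pm; phylip_name is a hypothetical helper, and it assumes the suffix always has the form "/start-end" at the end of the identifier:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a 10-character PHYLIP name from a sequence identifier.
sub phylip_name {
    my ($id) = @_;
    (my $name = $id) =~ s{/\d+-\d+$}{};   # drop a trailing "/start-end"
    return sprintf '%-10.10s', $name;     # pad or truncate to 10 characters
}

# Prints each name in brackets so the padding is visible.
for my $id ('ABCDEFG/1-65', 'MtNP212753/1-65') {
    printf "[%s]\n", phylip_name($id);
}
```

With these two identifiers the result is 'ABCDEFG' followed by three spaces, and 'MtNP212753' unchanged.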
Example output from AlignIO.pm / phylip.pm:
8 65
H122_HMM/1 tyvklatlav fmltqflivq tknveagqcp ragracsqae snacgdieec
MtNP212753 tyvklatlav fmltqflivq tknveegqcp fagrvcsqye snacgdseec
MtTC36450/ tyvklvtlav fmlttflive tmniearlcp tagtacsqrr gnscggie-c
MtTC42530/ tyvklatlav fmlttflivq tknveagecp svgrgctqll lnpcgnilec
MtTC29400/ tyvklatlav fmltqflivq iknveagqca rvgmrcsral pnpcgdivtc
MtTC30424/ ---------- ---------- ---iearecp sfgtvcsilr snscgniiey
MtTC41409/ tyvklailav lhltiflifq tknveaascp nvgavcspfe tkpcgnvkdc
MtTC28522/ tyvklitlal flvttllmfq tknveaefcs svgsfcspfn tnpcgylgnc


icvsegshyd ggick
icvsewshyd ggick
icvsqgypyd ggick
icvsrwi-yg gsicq
rcvh--lhlv gstc-
iciphwih-- ggick
rclpwglff- -gtc-
rcvpy--yly ggtce
... would be better (notice line returns) as:
8 65
H122_HMM   tyvklatlav fmltqflivq tknveagqcp ragracsqae snacgdieec
MtNP212753 tyvklatlav fmltqflivq tknveegqcp fagrvcsqye snacgdseec
MtTC36450  tyvklvtlav fmlttflive tmniearlcp tagtacsqrr gnscggie-c
MtTC42530  tyvklatlav fmlttflivq tknveagecp svgrgctqll lnpcgnilec
MtTC29400  tyvklatlav fmltqflivq iknveagqca rvgmrcsral pnpcgdivtc
MtTC30424  ---------- ---------- ---iearecp sfgtvcsilr snscgniiey
MtTC41409  tyvklailav lhltiflifq tknveaascp nvgavcspfe tkpcgnvkdc
MtTC28522  tyvklitlal flvttllmfq tknveaefcs svgsfcspfn tnpcgylgnc

icvsegshyd ggick
icvsewshyd ggick
icvsqgypyd ggick
icvsrwi-yg gsicq
rcvh--lhlv gstc-
iciphwih-- ggick
rclpwglff- -gtc-
rcvpy--yly ggtce