[Bioperl-l] How to change a fasta format alignment into clustalw format?
Fields, Christopher J
cjfields at illinois.edu
Wed Sep 12 13:37:46 UTC 2012
The script below worked fine for me with the latest bioperl-live. Are you using an older version?
chris
[cjfields at pyrimidine-laptop clustalw]$ cat convert.pl
#!/usr/bin/env perl
use Modern::Perl;
use Bio::AlignIO;
my $in = Bio::AlignIO->new(-file   => shift,
                           -format => 'fasta');
my $out = Bio::AlignIO->new(-format => 'clustalw');
while (my $aln = $in->next_aln) {
    $out->write_aln($aln);
}
[cjfields at pyrimidine-laptop clustalw]$ cat test.fa
>SPOG_04578#scry
MESRMTNSVRIRSITKKDVSVVFQFI2IELADFEDARDQVEATEESLLHAFGFT-
>SOCG_01498#soct
----MTNSVRVRPITNKDISTVIQFI2IELADFEEARDQVEATEESLLNVFGFNE
>SPAC1002.07c#spom
-----MGSVRIRSVIKEDLPTVYQFI2KELAEFEKCEDQVEATIPNLEVAFGFID
>SJAG_03288#sjap
--MTNKTTAVVRRLKREDCPVVLQFI2KELAEYQKEPQQVEATVEKLEKAFGFVE
[cjfields at pyrimidine-laptop clustalw]$ perl convert.pl test.fa
CLUSTAL W (1.81) multiple sequence alignment


SPOG_04578#scry/1-54     MESRMTNSVRIRSITKKDVSVVFQFI2IELADFEDARDQVEATEESLLHAFGFT-
SOCG_01498#soct/1-51     ----MTNSVRVRPITNKDISTVIQFI2IELADFEEARDQVEATEESLLNVFGFNE
SPAC1002.07c#spom/1-50   -----MGSVRIRSVIKEDLPTVYQFI2KELAEFEKCEDQVEATIPNLEVAFGFID
SJAG_03288#sjap/1-53     --MTNKTTAVVRRLKREDCPVVLQFI2KELAEYQKEPQQVEATVEKLEKAFGFVE
                                :. :* : .:* ..* **** ***:::.  :*****  .*  .***
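
If the same script dies with "seq doesn't validate" on your machine, comparing BioPerl versions is a reasonable first step. A minimal sketch for printing the installed version, assuming Bio::Root::Version (which ships with the BioPerl core distribution) can be found on your @INC:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Assumption: BioPerl is installed and Bio::Root::Version is loadable.
use Bio::Root::Version;

# The distribution-wide version number lives in this package variable.
print "BioPerl version: $Bio::Root::Version::VERSION\n";

# %INC records the path each module was loaded from, which helps when
# more than one BioPerl installation is present on the system.
print "Loaded from: $INC{'Bio/Root/Version.pm'}\n";
```

Run it with the same perl that runs convert.pl, so that both scripts see the same library path.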
On Sep 12, 2012, at 7:28 AM, Tao Zhu <taozhu at mail.bnu.edu.cn> wrote:
> Hello, everyone
>
> I have a multiple protein sequence alignment in FASTA format:
>
> >SPOG_04578#scry
> MESRMTNSVRIRSITKKDVSVVFQFI2IELADFEDARDQVEATEESLLHAFGFT-
> >SOCG_01498#soct
> ----MTNSVRVRPITNKDISTVIQFI2IELADFEEARDQVEATEESLLNVFGFNE
> >SPAC1002.07c#spom
> -----MGSVRIRSVIKEDLPTVYQFI2KELAEFEKCEDQVEATIPNLEVAFGFID
> >SJAG_03288#sjap
> --MTNKTTAVVRRLKREDCPVVLQFI2KELAEYQKEPQQVEATVEKLEKAFGFVE
>
> I want to convert it to CLUSTALW format. It should have been easy:
>
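> use Bio::AlignIO;
>
> my $in = shift;    # path to the input FASTA alignment
> my $out = shift;   # path for the CLUSTALW output
> my $alignio = Bio::AlignIO->new(-file => $in, -format => 'fasta');
> my $writeio = Bio::AlignIO->new(-file => ">$out", -format => 'clustalw');
> # copy every alignment from the input to the output
> while ( my $align_obj = $alignio->next_aln ) {
>     $writeio->write_aln($align_obj);
> }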
>
> That looks OK. However, it doesn't work: it fails with "seq doesn't validate".
>
> The alignment does contain the character "2". I put each "2" there on
> purpose to mark a position where a phase-2 intron occurs. I would like to
> keep these markers in the CLUSTALW output. Is there any way to do this?
>
> --
> Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
> 100875, China
> Email: tzhu at mail.bnu.edu.cn