[Bioperl-l] How to change a fasta format alignment into clustalw format?

Tao Zhu taozhu at mail.bnu.edu.cn
Wed Sep 12 12:28:31 UTC 2012


Hello, everyone

I have a multiple protein sequence alignment in FASTA format:

>SPOG_04578#scry
MESRMTNSVRIRSITKKDVSVVFQFI2IELADFEDARDQVEATEESLLHAFGFT-
>SOCG_01498#soct
----MTNSVRVRPITNKDISTVIQFI2IELADFEEARDQVEATEESLLNVFGFNE
>SPAC1002.07c#spom
-----MGSVRIRSVIKEDLPTVYQFI2KELAEFEKCEDQVEATIPNLEVAFGFID
>SJAG_03288#sjap
--MTNKTTAVVRRLKREDCPVVLQFI2KELAEYQKEPQQVEATVEKLEKAFGFVE

I want to convert it to CLUSTALW format. It should have been easy:

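use strict;
use warnings;
use Bio::AlignIO;

my $in  = shift;    # input alignment (FASTA)
my $out = shift;    # output alignment (CLUSTALW)
my $alignio = Bio::AlignIO->new(-file => $in,     -format => 'fasta');
my $writeio = Bio::AlignIO->new(-file => ">$out", -format => 'clustalw');
# Copy every alignment from the input file to the output file
while ( my $align_obj = $alignio->next_aln ) {
    $writeio->write_aln($align_obj);
}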

That looks OK. However, it doesn't work: it fails with the message
"seq doesn't validate".

The cause is the character "2" in the alignment. I inserted these "2"
markers intentionally to show that a phase-2 intron exists at that
position. I would like to keep these markers in the CLUSTALW output.
Is there any way to do this?
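
One possible workaround, given only as a minimal sketch, is to bypass
Bio::AlignIO (and with it the sequence validation) and write a
CLUSTAL-style block layout by hand. The helper names parse_fasta and
to_clustal are hypothetical and not part of BioPerl. The header text, the
60-column block width and the name padding are assumptions, and the
conservation line that ClustalW normally prints under each block is left
out. Whether a downstream program accepts the digit or the missing
conservation line is not guaranteed by this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl): parse FASTA text into an ordered list
# of [name, sequence] pairs. No residue validation is done, so markers
# such as "2" pass through unchanged.
sub parse_fasta {
    my ($text) = @_;
    my @records;
    for my $line (split /\r?\n/, $text) {
        next if $line =~ /^\s*$/;
        if ($line =~ /^>(\S+)/) {
            push @records, [$1, ''];
        }
        elsif (@records) {
            $line =~ s/\s+//g;
            $records[-1][1] .= $line;
        }
    }
    return @records;
}

# Hypothetical helper (not BioPerl): render the records as CLUSTAL-style
# blocks of $width columns. The conservation line is omitted.
sub to_clustal {
    my ($records, $width) = @_;
    $width ||= 60;    # assumed default block width
    die "no sequences\n" unless @$records;
    my $length = length $records->[0][1];
    for my $rec (@$records) {
        die "sequence $rec->[0] has a different length\n"
            if length($rec->[1]) != $length;
    }
    # Pad every name to the longest name plus a few spaces (assumption).
    my ($name_width) = sort { $b <=> $a } map { length $_->[0] } @$records;
    my $out = "CLUSTAL W multiple sequence alignment\n\n";
    for (my $pos = 0; $pos < $length; $pos += $width) {
        $out .= "\n";
        for my $rec (@$records) {
            $out .= sprintf "%-*s %s\n", $name_width + 5, $rec->[0],
                substr($rec->[1], $pos, $width);
        }
    }
    return $out;
}

# Command-line use: perl script.pl in.fasta out.aln
if (@ARGV == 2) {
    my ($in, $out) = @ARGV;
    open my $ifh, '<', $in or die "cannot read $in: $!\n";
    my $text = do { local $/; <$ifh> };
    close $ifh;
    open my $ofh, '>', $out or die "cannot write $out: $!\n";
    print {$ofh} to_clustal([ parse_fasta($text) ]);
    close $ofh;
}
```

The script takes the same two arguments as the one above: the input FASTA
file and the output file. Because nothing inspects the residues, the "2"
markers end up in the same columns in the output.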

-- 
Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
100875, China
Email: tzhu at mail.bnu.edu.cn



