[Bioperl-l] Re: Question: single out fasta format seq from .aln(Clustalw) file

Jason Stajich jason@cgt.mc.duke.edu
Sun, 5 Jan 2003 08:58:50 -0500 (EST)


With or without the gap characters? ('-') One file per sequence?

Use Bio::AlignIO to read in the file(s), the each_seq() method in
Bio::SimpleAlign to iterate through the sequences in the alignment, and
Bio::SeqIO to write the sequences out.  If you want to get rid of the
gap characters, apply a simple s/-//g substitution to the sequence
string (see the seq() method in Bio::PrimarySeq).
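
A minimal sketch of that approach, assuming BioPerl is installed and the
.aln files are passed on the command line.  The split_alignment name, the
one-file-per-sequence layout, and the <id>.fasta naming are assumptions
made for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::AlignIO;
use Bio::SeqIO;
use Bio::Seq;
use File::Spec;

# Hypothetical helper: write each sequence of a ClustalW alignment to its
# own FASTA file in $out_dir. Gaps are removed when $strip_gaps is true.
sub split_alignment {
    my ($aln_file, $out_dir, $strip_gaps) = @_;
    my @written;
    my $in = Bio::AlignIO->new(-file => $aln_file, -format => 'clustalw');
    while (my $aln = $in->next_aln) {
        for my $aligned ($aln->each_seq) {
            my $id  = $aligned->display_id;
            my $str = $aligned->seq;
            $str =~ s/-//g if $strip_gaps;

            # Build a plain sequence object so the output does not depend
            # on the alignment coordinates of the original entry.
            my $seq = Bio::Seq->new(-display_id => $id, -seq => $str);

            # Assumption: ids are unique within a file; characters that are
            # awkward in file names are replaced with '_'.
            (my $safe = $id) =~ s/[^\w.-]/_/g;
            my $path = File::Spec->catfile($out_dir, "$safe.fasta");

            my $out = Bio::SeqIO->new(-file => ">$path", -format => 'fasta');
            $out->write_seq($seq);
            $out->close;
            push @written, $path;
        }
    }
    return @written;
}

# Usage: perl split_aln.pl file1.aln file2.aln ...
for my $file (@ARGV) {
    print "$_\n" for split_alignment($file, '.', 1);
}
```

Pass a false value as the third argument to keep the gaps.  With many
input files that share sequence ids, writing everything to one directory
would overwrite earlier output, so a per-alignment output directory is
probably safer.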

It is even easier if you just want to convert your clustalw alignment to
a fasta file and keep the gaps.  Read the alignment in with Bio::AlignIO
using the 'clustalw' format, then write it back out with a second
Bio::AlignIO object that uses the 'fasta' output format as the driver.
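
As a rough sketch of that conversion, again assuming BioPerl is installed;
the aln_to_fasta name and the rule for deriving the output file name are
hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::AlignIO;

# Hypothetical helper: rewrite a ClustalW alignment as aligned FASTA,
# keeping the gap characters. Returns the number of alignments written.
sub aln_to_fasta {
    my ($aln_file, $fasta_file) = @_;
    my $in  = Bio::AlignIO->new(-file => $aln_file,      -format => 'clustalw');
    my $out = Bio::AlignIO->new(-file => ">$fasta_file", -format => 'fasta');
    my $count = 0;
    while (my $aln = $in->next_aln) {
        $out->write_aln($aln);
        $count++;
    }
    $out->close;
    return $count;
}

# Usage: perl aln2fasta.pl file1.aln file2.aln ...
# Each input foo.aln is written to foo.fasta next to it.
for my $file (@ARGV) {
    (my $base = $file) =~ s/\.aln$//;
    aln_to_fasta($file, "$base.fasta");
}
```

Depending on the BioPerl version, the FASTA headers written this way may
carry a /start-end suffix after each id; calling set_displayname_flat() on
the alignment before write_aln() is one way to drop it.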

-jason

On Sat, 4 Jan 2003, Jinhua Wang wrote:

> Hi, Jason,
>
> I have hundreds of sequence alignment files in Clustalw output format
> (.aln). I am wondering if there is an easy way to recover the sequences
> in each alignment as separate fasta format sequence files?
>
> Thanks,
> Jinhua
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu