[Bioperl-l] Re: Question: single out fasta format seq from .aln(Clustalw) file
Jason Stajich
jason@cgt.mc.duke.edu
Sun, 5 Jan 2003 08:58:50 -0500 (EST)
With or without the gap characters ('-')? One file per sequence?
Use Bio::AlignIO to read in the file(s) and the each_seq() method in
Bio::SimpleAlign to iterate through the sequences in the alignment and
Bio::SeqIO for outputting the sequences. If you wanted to be rid of the
gap characters you would just need a simple s/\-//g regexp applied to the
sequence string (see the seq() method in Bio::PrimarySeq).
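Something along these lines should do it. This is an untested sketch, not
a finished script: it assumes one output file per sequence, that the gap
character is '-', and that BioPerl is installed. The output file naming
and the filename clean-up are my own placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::AlignIO;
use Bio::SeqIO;
use Bio::Seq;

# Hypothetical usage: perl aln2seqs.pl file1.aln file2.aln ...
for my $alnfile (@ARGV) {
    my $in = Bio::AlignIO->new(-file => $alnfile, -format => 'clustalw');

    while (my $aln = $in->next_aln) {
        for my $seq ($aln->each_seq) {
            # Copy the aligned string and strip the gap characters.
            my $str = $seq->seq;
            $str =~ s/\-//g;

            # Placeholder naming scheme: <alignment file>.<sequence id>.fa
            # Characters that are awkward in file names become '_'.
            my $id = $seq->display_id;
            (my $safe = $id) =~ s/[^\w.\-]/_/g;

            my $out = Bio::SeqIO->new(-file   => ">$alnfile.$safe.fa",
                                      -format => 'fasta');

            # Write a fresh, ungapped sequence object with the same id.
            $out->write_seq(Bio::Seq->new(-display_id => $id,
                                          -seq        => $str));
        }
    }
}
```

To keep the gaps, leave out the s/\-//g line. To get one fasta file per
alignment instead, open the Bio::SeqIO output once per input file, outside
the each_seq() loop.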
It is even easier if you just want to convert your clustalw alignment to a
fasta file and keep the gaps. Use the Bio::AlignIO factory to read the
alignment in clustalw format, then write it back out with 'fasta' as the
output format driver.
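A minimal sketch of that conversion, again untested and assuming BioPerl is
installed. The ".fasta" suffix on the output name is just a placeholder.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::AlignIO;

# Hypothetical usage: perl aln2fasta.pl file1.aln file2.aln ...
for my $alnfile (@ARGV) {
    my $in  = Bio::AlignIO->new(-file   => $alnfile,
                                -format => 'clustalw');
    my $out = Bio::AlignIO->new(-file   => ">$alnfile.fasta",
                                -format => 'fasta');

    # Each alignment is written back out as gapped fasta.
    while (my $aln = $in->next_aln) {
        $out->write_aln($aln);
    }
}
```

Because this goes through Bio::AlignIO on both sides, the gap characters
are kept and the sequences stay aligned in the output.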
-jason
On Sat, 4 Jan 2003, Jinhua Wang wrote:
> Hi, Jason,
>
> I have hundreds of sequence alignment files in Clustalw output format,
> .aln, I am wondering if there is any easy way to recover those sequences
> in the alignment in separate fasta format sequence file?
>
> Thanks,
> Jinhua
>
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu