[Bioperl-l] remove gapped colums
Iain Wallace
iain.wallace at ucd.ie
Wed May 12 06:20:22 EDT 2004
Hi,
I use the following function called filtering() to remove columns from
an alignment which are all gap characters.
It probably isn't the best way to do it but it works for me. It takes in
a file which contains the original alignment, and outputs the file which
has all of the columns removed. e.g.
filtering('profile1.aln', '>out.aln');
Hope it makes sense
Iain
sub filtering(){
#function to eliminate columns that just contain gaps.....
#pass the filename containing the profile to be cleaned,and the output
#filename.
#filtering('profile1.aln', '>out.aln');
use Bio::AlignIO;
$withgaps=$_[0];
$withoutgaps=$_[1];
my $in = new Bio::AlignIO ( -file => $withgaps, -format => 'clustalw' );
my $out = new Bio::AlignIO ( -file => $withoutgaps, -format
=>'clustalw');
my $aln = $in->next_aln();
# if you need to work on the columns, create a list containing all
#columns as
# strings
my @aln_cols = ();
foreach my $seq ( $aln->each_alphabetically() ) {
my $colnr = 0;
foreach my $chr ( split("", $seq->seq()) ) {
$aln_cols[$colnr] .= $chr;
$colnr++;
}
}
# then do the work:
# we want to eliminate all the columns containing gaps
# 1/ we create a list containing all the columns without any gap
my $gapchar = $aln->gap_char();
my @no_gap_cols = ();
foreach my $col ( @aln_cols ) {
my $columnlength = length($col);
next if $col =~ /\Q$gapchar\E{$columnlength}/;
push @no_gap_cols, $col;
}
# 2/ we modify the sequences in the alignment
# 2a/ we reconstruct the sequence strings
my @seq_strs = ();
foreach my $col ( @no_gap_cols ) {
my $colnr = 0;
foreach my $chr ( split"", $col ) {
$seq_strs[$colnr] .= $chr;
$colnr++; }}
# 2b/ we replace the old sequences strings with the new ones
foreach my $seq ( $aln->each_alphabetically() ) {
$seq->seq(shift @seq_strs);}
$out->write_aln($aln);
}
On Tue, 2004-05-11 at 16:04, luisa pugliese wrote:
> Hi,
> I am new at bioperl and I am working on alignments. I am splitting a multiple alignment into several pairwise alignments and I would like to remove the columns in which there are gaps in all the sequences. Does anybody knows if a method exists doing this or a method finding the columns in which there are gaps in all the sequences similar gap_line that find the gaps present in any columns?
> Thank you
> Luisa Pugliese
> =============================
> Luisa Pugliese, Ph.D.
> luisa.pugliese at safan-bioinformatics.it
> S.A.F.AN. BIOINFORMATICS
> Corso Tazzoli 215/13 -10137 Torino
> tel +39 011 3026230
> cell. +39 333 6130644
>
> ______________________________________________________________________
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
More information about the Bioperl-l
mailing list