[Bioperl-l] remove gapped colums

Iain Wallace iain.wallace at ucd.ie
Wed May 12 06:20:22 EDT 2004


Hi, 

I use the following function called filtering() to remove columns from
an alignment which are all gap characters.

It probably isn't the best way to do it but it works for me. It takes in
a file which contains the original alignment, and outputs the file which
has all of the columns removed. e.g. 
	filtering('profile1.aln', '>out.aln');

Hope it makes sense

Iain 


sub filtering(){

#function to eliminate columns that just contain gaps.....
#pass the filename containing the profile to be cleaned,and the output
#filename.
#filtering('profile1.aln', '>out.aln');
use Bio::AlignIO;

$withgaps=$_[0];
$withoutgaps=$_[1];

my $in = new Bio::AlignIO ( -file => $withgaps, -format => 'clustalw' );
my $out = new Bio::AlignIO ( -file => $withoutgaps, -format
=>'clustalw');

my $aln = $in->next_aln();

# if you need to work on the columns, create a list containing all
#columns as
# strings

my @aln_cols = ();

foreach my $seq ( $aln->each_alphabetically() ) {
  my $colnr = 0;
  foreach my $chr ( split("", $seq->seq()) ) {
    $aln_cols[$colnr] .= $chr;
    $colnr++;
  }
}


# then do the work:
# we want to eliminate all the columns containing gaps
# 1/ we create a list containing all the columns without any gap

my $gapchar = $aln->gap_char();
my @no_gap_cols = ();
foreach my $col ( @aln_cols ) {
 my $columnlength = length($col);
 next if $col =~ /\Q$gapchar\E{$columnlength}/;
 push @no_gap_cols, $col;
}

# 2/  we modify the sequences in the alignment
# 2a/ we reconstruct the sequence strings

my @seq_strs = ();
foreach my $col ( @no_gap_cols ) {
   my $colnr = 0;
   foreach my $chr ( split"", $col ) {
     $seq_strs[$colnr] .= $chr;
     $colnr++;    }}

# 2b/ we replace the old sequences strings with the new ones
foreach my $seq ( $aln->each_alphabetically() ) {
    $seq->seq(shift @seq_strs);}

$out->write_aln($aln);
}



On Tue, 2004-05-11 at 16:04, luisa pugliese wrote:
> Hi,
>    I am new at bioperl and I am working on alignments. I am splitting a multiple alignment into several pairwise alignments and I would like to remove the columns in which there are gaps in all the sequences. Does anybody knows if a method exists doing this or a method finding the columns in which there are gaps in all the sequences similar gap_line that find the gaps present in any columns?
> Thank you 
> Luisa Pugliese
> =============================
> Luisa Pugliese, Ph.D.
> luisa.pugliese at safan-bioinformatics.it
> S.A.F.AN. BIOINFORMATICS
> Corso Tazzoli 215/13 -10137 Torino
> tel +39 011 3026230
> cell. +39 333 6130644
> 
> ______________________________________________________________________
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l



More information about the Bioperl-l mailing list