[Bioperl-l] get the sequence of a column in a multiple alignment

Wed Feb 14 15:29:02 UTC 2007

There is a slice method on the alignment object:

  $mini_aln = $aln->slice(20,30);  # get a block of columns

 Title     : slice
 Usage     : $aln2 = $aln->slice(20,30)
 Function  : Creates a slice from the alignment inclusive of start and
             end columns, and the first column in the alignment is denoted 1.
             Sequences with no residues in the slice are excluded from the
             new alignment and a warning is printed. Slice beyond the length of
             the sequence does not do padding.
 Returns   : A Bio::SimpleAlign object
 Args      : Positive integer for start column, positive integer for end column,
             optional boolean which if true will keep gap-only columns in the
             newly created slice. Example:

             $aln2 = $aln->slice(20,30,1)

but I don't know how well it performs with a large number of sequences :)
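
One caveat if you call slice one column at a time: going by the doc above, sequences with no residues in the slice are excluded, so $aln->slice($i,$i) would presumably drop any sequence that has a gap at column $i. Here is a minimal sketch of another approach, assuming all you need is the column strings: read each aligned sequence string once and pick out characters with substr, instead of creating a new object per residue. Note that columns_from_strings is a hypothetical helper, not a BioPerl method, and the literal rows stand in for what map { $_->seq } $aln->each_seq would give you.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): given aligned sequence strings
# of equal length, return a hash mapping 1-based column number => column string.
sub columns_from_strings {
    my @rows = @_;
    return () unless @rows;
    my $length = length $rows[0];
    my %column;
    for my $row (@rows) {
        die "aligned sequences must all have the same length\n"
            unless length($row) == $length;
        # Append this row's character to each column in turn.
        for my $i (1 .. $length) {
            $column{$i} .= substr($row, $i - 1, 1);
        }
    }
    return %column;
}

# With BioPerl, the rows would come from the alignment, e.g.:
#   my @rows = map { $_->seq } $aln->each_seq;
# Literal strings are used here so the sketch runs without BioPerl.
my @rows = ('MK-LV', 'MKALV', 'MR-LI');
my %column = columns_from_strings(@rows);
print "$_\t$column{$_}\n" for sort { $a <=> $b } keys %column;
```

Gaps are kept as characters in the column strings, and the hash key is the column number, so both pieces of information from the original loop are preserved.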


On 2/14/07, Mathieu Rouard <mrouard at gmail.com> wrote:
> Dear all,
>
> I am starting to use the bioperl API to parse multiple alignments, and I am
> wondering what the most efficient way is to extract every column from an
> alignment (all the AA at position 1, position 2, etc.). I quickly
> implemented this simple code, but it becomes quite slow as the length of
> the sequences increases.
>
> use Bio::AlignIO;
>
> # $inputfilename is assumed to be set earlier in the script
> my $stream = Bio::AlignIO->new(-file   => $inputfilename,
>                                -format => 'stockholm');
>
> my $aln = $stream->next_aln();
>
> my $length = $aln->length();
> my %column;   # column number => string of residues in that column
>
> for (my $i = 1; $i <= $length; $i++) {
>     my $aa = '';
>     foreach my $seq ($aln->each_seq()) {
>         my $obj = $seq->trunc($i, $i);
>         $aa .= $obj->seq;
>     }
>     # need to track the column number and the sequence of the column
>     $column{$i} = $aa;
> }
>
> Would you have any other suggestion?
>
> thanks
> Mathieu