[Bioperl-l] counting gaps in a column of alignment
Francisco J. Ossandón
fossandonc at hotmail.com
Thu Jul 25 15:50:21 UTC 2013
The best and easiest way to count characters in Perl is using the
"transliterate" function, writing the characters in the search-list and
using an empty replace-list
(http://perldoc.perl.org/perlop.html#Quote-and-Quote-like-Operators), so it
searches for a list of characters but dont modify the string. The tr///
function always returns the number of characters found in the string:
###
my $string = 'ACGTN---actgn';
my $count_A = ($string =~ tr/Aa//); # <-- Note that by searching both
uppercase and lowercase characters, the search is made case insensitive
my $count_C = ($string =~ tr/Cc//);
my $count_T = ($string =~ tr/Tt//);
my $count_G = ($string =~ tr/Gg//);
my $count_N = ($string =~ tr/Nn//);
my $count_gap = ($string =~ tr/-//);
print "$count_A\t$count_C\t$count_T\t$count_G\t$count_N\t$count_gap\n";
exit;
###
Cheers,
Francisco J. Ossandon
-----Mensaje original-----
De: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] En nombre de sk sweety
Enviado el: jueves, 25 de julio de 2013 2:17
CC: Bioperl-l at lists.open-bio.org
Asunto: Re: [Bioperl-l] counting gaps in a column of alignment
Thanks for the reply.
I did try other regex like,
$col eq $gap_char
$col =~ /-/
$col =~ /\-/
$col = "-"
Nothing seems to recognize the '-' gap character. The program returns a
warning like this,
"MSG: Slice [5-5] of sequence [3/1-6] contains no residues. Sequence
excluded from the new alignment".
and doesn't seem to count the gaps in the column.
No worries, that was a typo while getting to the mail list ..have got the
correct one:)
On Thu, Jul 25, 2013 at 3:54 PM, Alexey Morozov <alexeymorozov1991 at gmail.com
> wrote:
> Why would you use a regexp (which by the way I cannot understand: what
> are \Q and \E?) and not simply
>
> elsif($col eq $gap_char)
>
> $col is always one char long, so KISS.
> Also I hope I'll not sound like your school teacher picking all minor
> bugs, but you have just \ and not \t in header printing. And %count
> doubles your five $count_foo variables.
>
>
>
> 2013/7/25 subha kalyanamoorthy <sksweety24 at gmail.com>
>
> > Hi there,
> >
> > I am a new bioperl user. I made a script to count the nucleotide
> > composition in each column of the alignment. I am able to get the
> > count
> for
> > the nucleotides, but not for the gap characters from the following
> program.
> > I would greatly appreciate any suggestions to correct this program.
> >
> > Thanks.
> >
> > #***************************My
> > Program**********************************
> >
> > #!/bin/perl -w
> > use strict;
> > use warnings;
> >
> > use List::Util 'max';
> > use Bio::SimpleAlign;
> > use Bio::Align::AlignI;
> > use Bio::AlignIO;
> > use Bio::SeqIO;
> >
> > my $in= Bio::AlignIO->new( -file => "seq.fst", -format => "fasta");
> >
> > my $align = $in->next_aln();
> >
> > print "column\tA's\tT's\C's\G's\n";
> >
> > for (my $i = 1; $i <= $align->length; $i++) {
> >
> > my %count;
> >
> > my $seqs = $align->slice($i,$i);
> >
> > my $gap_char = $seqs->gap_char();
> >
> > my $count_A=0;
> > my $count_C=0;
> > my $count_T=0;
> > my $count_G=0;
> > my $count_N=0;
> > my $count_gap=0;
> >
> > foreach my $seq ($seqs->each_seq) {
> >
> > my $col=$seq->seq;
> >
> > if ($col eq 'A'){
> > $count_A++;
> > }elsif ($col eq 'C'){
> > $count_C++;
> > }elsif ($col eq 'T'){
> > $count_T++;
> > }elsif ($col eq 'G'){
> > $count_G++;
> > }elsif ( $col eq 'N'){
> > $count_N++;
> > }elsif ($col =~ m/^\Q$gap_char\E$/){
> > $count_gap++;
> > }
> > $count{$seq->seq} += 1;
> > }
> >
> > print"$i\t$count_A\t$count_T\t$count_C\t$count_G\n";
> >
> > }
> >
> > #*******************************************************************
> > ****` _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
>
>
> --
> Alexey Morozov,
> LIN SB RAS, bioinformatics group.
> Irkutsk, Russia.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l
More information about the Bioperl-l
mailing list