[Bioperl-l] Error when calling remove_gaps

Jason Stajich jason.stajich at duke.edu
Tue Feb 15 18:25:30 EST 2005


I really hate the way _guess_alphabet works.  It completely falls over
with all-X or empty sequences.  It should not throw when it
encounters a problem like this.

I have overridden it and validate_seq in many of my scripts.
Try putting this at the top of your script before your code, but after 
the 'use' statements.

sub Bio::PrimarySeq::_guess_alphabet {
    my ($self) = @_;
    my $type;

    my $str = $self->seq();
    # Remove characters that clearly denote ambiguity
    $str =~ s/[-.?x]//gi;

    my $total = CORE::length($str);
    if( $total == 0 ) {
        $self->warn("Got a sequence with no letters in it, ".
                    "cannot guess alphabet [$str]");
        return 'dna'; # just make dna the default for now
    }

    my $u = ($str =~ tr/Uu//);
    # The assumption here is that most sequences comprised mainly of
    # ATGC, with some N, will be 'dna' despite the fact that N could
    # also be Asparagine
    my $atgc = ($str =~ tr/ATGCNatgcn//);

    if( ($atgc / $total) > 0.85 ) {
        $type = 'dna';
    } elsif( (($atgc + $u) / $total) > 0.85 ) {
        $type = 'rna';
    } else {
        $type = 'protein';
    }

    $self->alphabet($type);
    return $type;
}
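
To see what the ratio test above does without needing BioPerl installed,
here is a rough standalone sketch of the same logic as a plain function.
The name guess_alphabet_sketch is made up for illustration and is not
part of BioPerl; the 0.85 cutoff and the 'dna' fallback for an empty
string are taken from the override above.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): the same ratio test
# as the override, applied to a plain string.
sub guess_alphabet_sketch {
    my ($str) = @_;
    $str = '' unless defined $str;

    # Drop gap and ambiguity characters before counting
    $str =~ s/[-.?x]//gi;

    my $total = length $str;
    return 'dna' if $total == 0;    # nothing left to count: fall back to dna

    my $u    = ($str =~ tr/Uu//);           # count of U
    my $atgc = ($str =~ tr/ATGCNatgcn//);   # count of A, T, G, C, N

    return 'dna' if $atgc / $total > 0.85;
    return 'rna' if ($atgc + $u) / $total > 0.85;
    return 'protein';
}

for my $s ('ACGTNNACGT', 'ACGUACGU', 'MKVLAAGIVG', 'XXXX', '----') {
    printf "%-12s => %s\n", $s, guess_alphabet_sketch($s);
}
```

The last two inputs are the troublesome cases: once the X and gap
characters are stripped nothing is left, so the function returns the
'dna' default instead of dying.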


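On why the override has to come after the 'use' statements: a sub
declared with a fully qualified name replaces whatever the package
already defined, so the module has to be loaded first or its own
definition would overwrite yours.  A minimal sketch of the mechanism,
using a made-up package Demo::Seq as a stand-in (it is not a BioPerl
module):

```perl
use strict;
use warnings;

# Hypothetical stand-in for a library class; Demo::Seq is not BioPerl.
package Demo::Seq;
sub new   { my ($class, %args) = @_; return bless { %args }, $class }
sub guess { die "cannot guess alphabet\n" }    # library default: throws

package main;

{
    # Silence the "Subroutine redefined" warning for this block only
    no warnings 'redefine';

    # The fully qualified name installs the replacement in Demo::Seq
    sub Demo::Seq::guess { return 'dna' }
}

my $seq = Demo::Seq->new(seq => '');
print $seq->guess, "\n";    # calls the replacement, so no exception
```

In a real script the package would come from a 'use' line instead of
being defined in the same file, but the replacement works the same way.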

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Feb 16, 2005, at 3:32 AM, michael watson ((IAH-C)) wrote:

> Hi
>
> I'm using bioperl-1.4 on Linux.  I get the following error after
> calling remove_gaps on an alignment I have read in using AlignIO.
> The alignment is in fasta format, and some sequences contain "N"'s
> and "-"'s as gap characters, but some sequences do not include any,
> including the first sequence.  This problem occurs when I call:
>
> $al->remove_gaps("-")
>
> ------------- EXCEPTION  -------------
> MSG: Got a sequence with no letters in - cannot guess alphabet []
> STACK Bio::PrimarySeq::_guess_alphabet
> /usr/local/bioperl-1.4/Bio/PrimarySeq.pm:839
> STACK Bio::PrimarySeq::seq /usr/local/bioperl-1.4/Bio/PrimarySeq.pm:280
> STACK Bio::SimpleAlign::_remove_col
> /usr/local/bioperl-1.4/Bio/SimpleAlign.pm:959
> STACK Bio::SimpleAlign::remove_gaps
> /usr/local/bioperl-1.4/Bio/SimpleAlign.pm:922
> STACK toplevel create_blastable.pl:14
>
> --------------------------------------
>
> Any ideas?
>
> Thanks in advance
>
> Mick
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>