[Bioperl-l] Alphabet guessing

Chervitz, Steve Steve_Chervitz at affymetrix.com
Tue Oct 18 14:16:42 EDT 2005


Back in the Bio::Root::Object days, one could decide to take one's fate in
one's own hands and have all throw() calls converted to warn() using
$object->strict(-2), see

http://doc.bioperl.org/bioperl-live/Bio/Root/Object.html#POD18

But you can't do this with current bioperl objects which are based on
Bio::Root::Root which lacks strict(). I suppose we left it out of Root.pm to
ensure it wouldn't fall into the wrong hands.

What do people think about reviving strict() for those 'damn the torpedoes'
situations, where you don't want to be interrupted by any unforeseen
exception and you're willing to assume the risk for any consequences?
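
For the sake of discussion, here is a minimal sketch of what such a switch could look like. It assumes a hypothetical My::Root class written only for illustration; it is not the actual Bio::Root::Object or Bio::Root::Root code, and the exact meaning of the -2 threshold here is an assumption based on the strict(-2) usage mentioned above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a root class; NOT the real Bio::Root::Root.
package My::Root;

sub new {
    my ($class, %args) = @_;
    my $self = bless { strict => 0 }, $class;
    $self->strict($args{-strict}) if defined $args{-strict};
    return $self;
}

# Get/set the strictness level. In this sketch, -2 or lower means
# "report would-be exceptions as warnings and keep going".
sub strict {
    my ($self, $level) = @_;
    $self->{strict} = $level if defined $level;
    return $self->{strict};
}

# Die by default; downgrade to a warning when strictness is -2 or lower.
sub throw {
    my ($self, $msg) = @_;
    if ($self->strict <= -2) {
        warn "WARNING (exception suppressed): $msg\n";
        return;
    }
    die "EXCEPTION: $msg\n";
}

package main;

# Lenient object: throw() only warns, so execution continues.
my $obj = My::Root->new(-strict => -2);
$obj->throw("cannot guess alphabet");
print "still running\n";

# Default object: throw() dies, so the caller has to trap it with eval.
my $default = My::Root->new;
eval { $default->throw("cannot guess alphabet"); 1 }
    or print "caught: $@";
```

Note that in the lenient case whatever code follows the throw() call keeps running with possibly invalid state, which is exactly the risk the caller would be signing up for.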

Perhaps bioperl strict() could be responsive to the 'use strict' pragma so
that bioperl could become more strict when people turn on perl strictness
(as well they should most of the time). Of course it wouldn't be advertised
in the general docs, but only in the POD.

Steve

> From: Jason Stajich <jason.stajich at duke.edu>
> Date: Tue, 18 Oct 2005 08:06:41 -0400
> To: Dmitri Bichko <dbichko at aveopharma.com>
> Cc: <bioperl-l at portal.open-bio.org>
> Subject: Re: [Bioperl-l] Alphabet guessing
> 
> From the Bio::SeqIO documentation:
> 
> -alphabet
> 
> Sets the alphabet ('dna', 'rna', or 'protein'). When the alphabet is
> set then Bioperl will not attempt to guess what the alphabet is. This
> may be important because Bioperl does not always guess correctly.
> 
> 
> You can pre-specify the alphabet:
> 
> $seqio = Bio::SeqIO->new(-format   => 'fasta',
>                          -file     => "fifteen_million_sequence_file.fa",
>                          -alphabet => 'dna');
> 
> -jason
> On Oct 18, 2005, at 3:49 AM, Dmitri Bichko wrote:
> 
>> Hi,
>> 
>> Is being unable to guess the sequence alphabet really an unrecoverable
>> error?  I'm referring to this bit in PrimarySeq.pm:
>> 
>>   my $str = $self->seq();
>>   $str =~ s/[-.?x]//gi;
>>   my $total = CORE::length($str);
>>   if( $total == 0 ) {
>>     $self->throw("Got a sequence with no letters in it ".
>>       "cannot guess alphabet [$str]");
>>   }
>> 
>> Problem is that if you happen on a seq that's all X's, you get a fatal
>> exception, which can be very annoying when you are in the middle of a
>> 15 million sequence fasta stream (where you don't care about, nor even
>> expect the alphabet type; and the docs suggest that you can't
>> necessarily recover after catching exceptions).
>> 
>> Might not something along these lines make more sense:
>> 
>>   if( $total == 0 ) {
>>     $self->warn("Got a sequence with no letters in it, ".
>>       "assuming 'dna' alphabet.");
>>     $self->alphabet('dna');
>>     return 'dna';
>>   }
>> 
>> Or should the seqio factories catch the guessing exceptions?
>> 
>> Thanks,
>> Dmitri
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>> 
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 
> 


