[Bioperl-l] SeqIO fails on masked sequences

Marc Logghe Marc.Logghe at devgen.com
Thu Dec 16 04:27:00 EST 2004

Hi Wes,

> > Guess you can do it by setting the alphabet explicitely:
> > $seq_in->alphabet('dna'); # or 'rna' or 'protein'
> Sorry, that does not work.  I tried this and got the same error:

Yes, something odd is going on. You can set the alphabet this way, but Bio::SeqIO::fasta does not take it into account: even when it is set, it gets reset to undef as soon as a sequence is found.
During object creation the alphabet is guessed anyway, and in your case it ends up as protein because of the X's. Had the masking used N's, it would have ended up as dna.
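
To illustrate why X's and N's lead to different guesses, here is a rough sketch in plain Perl. It is not BioPerl's actual code: the guess_alphabet function, the 85% threshold and the two example sequences are assumptions made up for illustration. The idea is simply that a sequence made up mostly of A, C, G, T and N is called dna, and anything else is called protein.

```perl
use strict;
use warnings;

# Rough composition-based guess, NOT BioPerl's actual implementation.
# Assumption: a sequence is called 'dna' when more than 85% of its
# letters are A, C, G, T or N; otherwise it is called 'protein'.
sub guess_alphabet {
    my ($str) = @_;
    $str =~ s/[-.?]//g;                     # drop gap-like characters
    my $total = length $str;
    return unless $total;                   # nothing left to guess from
    my $nuc = ($str =~ tr/ACGTNacgtn//);    # count nucleotide-like letters
    return $nuc / $total > 0.85 ? 'dna' : 'protein';
}

my $x_masked = 'ACGTXXXXXXXXACGT';   # hypothetical X-masked sequence
my $n_masked = 'ACGTNNNNNNNNACGT';   # same region masked with N instead

printf "%s => %s\n", $_, guess_alphabet($_) for $x_masked, $n_masked;
```

With these two inputs the X-masked string falls below the threshold and is reported as protein, while the N-masked one is reported as dna.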

> > Indirectly, you can do it also by setting the alphabet for the
> > factory object and passing the factory object with the
> > Bio::SeqIO constructor.
> Would you provide an example?

I think that suggestion did not make sense, sorry about that.

On the other hand, I was not able to reproduce the error you describe. I got no errors at all; the only effect was that the alphabet was reset to 'protein'. Initially I did get a similar error, but that was because $format had not been set yet and I was not running under the strict pragma.

The script I used to find that out:

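use strict;
use warnings;
use Bio::SeqIO;
use Data::Dumper;

my $format = 'fasta';

# Read FASTA records from the DATA section at the end of this script.
my $seq_in  = Bio::SeqIO->new(-format=>$format, -fh => \*DATA);
# Output stream (not used below; kept for writing the sequence back out).
my $seq_out = Bio::SeqIO->new(-format=>$format, -fh => \*STDOUT);
my $seq = $seq_in->next_seq;

# Dump the sequence object to inspect the alphabet that was guessed.
print Data::Dumper->Dump([$seq],['seq']);

# The record below is a hypothetical X-masked example, for illustration only.
__DATA__
>masked_example
ACGTACGTXXXXXXXXXXXXXXXXACGTACGT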
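
A possible workaround, sketched under assumptions: since the alphabet set on the stream is ignored, you could set it on each sequence object after next_seq has returned it. The alphabet method on the sequence object accepts 'dna', 'rna' or 'protein'. The record name and sequence below are made up, and I have not verified that this avoids the error you reported.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical X-masked record, held in a string for a self-contained example.
my $fasta = ">masked_example\nACGTACGTXXXXXXXXXXXXXXXXACGTACGT\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory filehandle: $!";

my $seq_in = Bio::SeqIO->new(-format => 'fasta', -fh => $fh);

my @seqs;
while (my $seq = $seq_in->next_seq) {
    # Alphabet as guessed when the sequence object was created.
    print $seq->display_id, " guessed as ", $seq->alphabet, "\n";

    # Override the guess on the sequence object itself.
    $seq->alphabet('dna');
    print $seq->display_id, " now set to ", $seq->alphabet, "\n";

    push @seqs, $seq;
}
```

Whether downstream code then accepts the X characters as DNA is a separate question that depends on what you do with the sequence afterwards.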


I am afraid I cannot be of more help here.