[Bioperl-l] Bio::SeqIO, genbank -> fasta, protein only?

Sat Oct 14 15:25:05 UTC 2006

Brian Osborne wrote:
> Yes, people use the -alphabet parameter. If you set it to something then
> Bioperl will not try to determine whether the sequence is protein, rna, or
> dna and this is particularly useful when the sequence contains characters
> that Bioperl would object to (sequences with distasteful characters can be
> created by various applications, for example, or you might introduce some
> weird character for some reason). Setting the -alphabet would also speed up
> Bioperl a bit, for the same reason.

Huh. That's what I assumed when I stumbled onto the -alphabet parameter. So I thought this would read the protein sequences out of my GenBank file and write a FASTA file for me:

my $seq_in  = Bio::SeqIO->new(
   -file     => "<$file",
   -format   => "genbank",
   -alphabet => "protein"  # No effect?
);
my $seq_out = Bio::SeqIO->new(
   -file     => ">$outfile",
   -format   => "fasta",
   -alphabet => "protein"  # No effect?
);
while (my $inseq = $seq_in->next_seq) {
   $inseq->molecule("protein");    # No effect?
   $seq_out->write_seq($inseq);
}

It didn't. Would it be a Good Thing if it did what I was expecting? (Like I said, I rolled my own, but I'm always looking for ways to enhance BioPerl that other people might find useful... Someday I will contribute something useful, by golly. -grin-)

(Background: I'm doing protein BLASTs from GenBank files. To make formatdb happy I have to have FASTA files full of the protein sequences.)
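
For the archives, here is a minimal sketch of one way to get the proteins out. It is not necessarily the hand-rolled version mentioned above. It assumes the GenBank records are nucleotide entries whose CDS features carry a /translation qualifier, since that is where the protein sequences live in such a file; -alphabet only labels a sequence and does not pull anything out of the features. The file names taken from @ARGV and the "_cds" fallback ID are placeholders of my own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Seq;

# Placeholder file names (assumptions, not from the post above).
my ($file, $outfile) = @ARGV;
die "usage: $0 in.gbk out.fasta\n"
    unless defined $file && defined $outfile;

my $seq_in  = Bio::SeqIO->new(-file => "<$file",    -format => 'genbank');
my $seq_out = Bio::SeqIO->new(-file => ">$outfile", -format => 'fasta');

while (my $inseq = $seq_in->next_seq) {
    my $n = 0;
    for my $feat ($inseq->get_SeqFeatures) {
        # Only CDS features can carry a protein translation.
        next unless $feat->primary_tag eq 'CDS';
        # Some CDS features (pseudogenes, for example) have no /translation.
        next unless $feat->has_tag('translation');
        $n++;

        my ($aa) = $feat->get_tag_values('translation');

        # Prefer /protein_id as the FASTA ID; otherwise fall back to a
        # made-up ID built from the record name and a counter.
        my ($id) = $feat->has_tag('protein_id')
                 ? $feat->get_tag_values('protein_id')
                 : ($inseq->display_id . "_cds$n");

        my $prot = Bio::Seq->new(
            -display_id => $id,
            -seq        => $aa,
            -alphabet   => 'protein',
        );
        $seq_out->write_seq($prot);
    }
}
```

Run it as something like "perl gb2protfasta.pl in.gbk out.fasta" (the script name is made up), then point formatdb at the resulting file.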

j