[Bioperl-l] Bio::SeqIO, genbank -> fasta, protein only?
Jay Hannah
jay at jays.net
Sat Oct 14 15:25:05 UTC 2006
Brian Osborne wrote:
> Yes, people use the -alphabet parameter. If you set it to something then
> Bioperl will not try to determine whether the sequence is protein, rna, or
> dna and this is particularly useful when the sequence contains characters
> that Bioperl would object to (sequences with distasteful characters can be
> created by various applications, for example, or you might introduce some
> weird character for some reason). Setting the -alphabet would also speed up
> Bioperl a bit, for the same reason.
Huh. That's what I assumed when I stumbled into the -alphabet parameter. So I thought this would read the protein sequences out of my genbank file and write a fasta file for me:
my $seq_in = Bio::SeqIO->new(
-file => "<$file",
-format => "genbank",
-alphabet => "protein" # No effect?
);
my $seq_out = Bio::SeqIO->new(
-file => ">$outfile",
-format => "fasta",
-alphabet => "protein" # No effect?
);
while (my $inseq = $seq_in->next_seq) {
$inseq->molecule("protein"); # No effect?
$seq_out->write_seq($inseq);
}
It didn't. Would it be a Good Thing if it did what I was expecting? (Like I said I rolled my own, but I'm always looking for ways to enhance BioPerl that other people might find useful... Someday I will contribute something useful, by golly. -grin-)
(Background: I'm doing protein BLASTs from genbank files. To make formatdb happy I have to have fasta files full of the protein sequences.)
j
More information about the Bioperl-l
mailing list