[Bioperl-l] Bio::SeqIO, genbank -> fasta, protein only?
Kevin Brown
Kevin.M.Brown at asu.edu
Mon Oct 16 16:14:51 UTC 2006
> > Yes, people use the -alphabet parameter. If you set it to
> something then
> > Bioperl will not try to determine whether the sequence is
> protein, rna, or
> > dna and this is particularly useful when the sequence
> contains characters
> > that Bioperl would object to (sequences with distasteful
> characters can be
> > created by various applications, for example, or you might
> introduce some
> > weird character for some reason). Setting the -alphabet
> would also speed up
> > Bioperl a bit, for the same reason.
>
> Huh. That's what I assumed when I stumbled into the -alphabet
> parameter. So I thought this would read the protein sequences
> out of my genbank file and write a fasta file for me:
>
> my $seq_in = Bio::SeqIO->new(
> -file => "<$file",
> -format => "genbank",
> -alphabet => "protein" # No effect?
> );
> my $seq_out = Bio::SeqIO->new(
> -file => ">$outfile",
> -format => "fasta",
> -alphabet => "protein" # No effect?
> );
> while (my $inseq = $seq_in->next_seq) {
> $inseq->molecule("protein"); # No effect?
> $seq_out->write_seq($inseq);
> }
>
> It didn't. Would it be a Good Thing if it did what I was
> expecting? (Like I said I rolled my own, but I'm always
> looking for ways to enhance BioPerl that other people might
> find useful... Someday I will contribute something useful, by
> golly. -grin-)
>
> (Background: I'm doing protein BLASTs from genbank files. To
> make formatdb happy I have to have fasta files full of the
> protein sequences.)
This might work for your needs (CDS to protein FASTA).
my $seq_in = Bio::SeqIO->new(
-file => "<$file",
-format => "genbank",
);
open my $seq_out, ">$outfile";
while (my $inseq = $seq_in->next_seq) {
print $seq_out ">". $inseq->display_id(). "\n";
print $seq_out $inseq->translate() ."\n";
}
More information about the Bioperl-l
mailing list