[Bioperl-l] Bio::SeqIO, genbank -> fasta, protein only?

Fri Oct 13 18:56:38 UTC 2006

Brian Osborne wrote:
> You're looking for the "translation" string in the CDS section, yes? You
> need to delve a bit into features, the CDS is considered to be a feature of
> the main or parent nucleotide sequence and the translation is part of CDS
> feature:
> 
> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Features_from_Genbank

Yes. Thanks. I "rolled my own" -- I'm now doing this:

while (my $inseq = $seq_in->next_seq) {
   my @features = $inseq->get_SeqFeatures();
   foreach my $feat ( @features ) {
      next unless ($feat->primary_tag eq "CDS");
      my @db_xrefs = $feat->annotation->get_Annotations("db_xref");
      @db_xrefs = grep { /^GI:/ } @db_xrefs;
      die "Panic! More than one GI: db_xref?"     if (@db_xrefs > 1);
      die "Panic! No GI: db_xref?"            unless (@db_xrefs == 1);
      my $gi = $db_xrefs[0];
      $gi =~ s/^GI://;
      my @translations = $feat->annotation->get_Annotations("translation");
      die "Panic! More than one translation?" if (@translations > 1);
      my @protein_ids = $feat->annotation->get_Annotations("protein_id");
      die "Panic! More than one protein_id?"  if (@protein_ids > 1);
      my @product = $feat->annotation->get_Annotations("product");
      die "Panic! More than one product?"  if (@product > 1);
      print ">gi|$gi|gb|$protein_ids[0]|";
      print $inseq->id . " $product[0]\n";
      print "$translations[0]\n";
   }
}

To generate a homebrew fasta file for a protein BLAST.

I just thought that -alphabet and molecule() would do that stuff for me? What else would "protein" mean in those? Does anyone use -alphabet and/or molecule()? For what? How? Again, here's what I'm talking about:

==========
my $seq_out_protein = Bio::SeqIO->new(
   -file => ">out",
   -format => 'fasta',
   -alphabet => 'protein'    # No effect?
);
while (my $inseq = $seq_in->next_seq) {
   $inseq->molecule("protein");    # No effect?
==========

Thanks,

j