[Bioperl-l] Bio::SeqIO, genbank -> fasta, protein only?
Jay Hannah
jay at jays.net
Fri Oct 13 18:56:38 UTC 2006
Brian Osborne wrote:
> You're looking for the "translation" string in the CDS section, yes? You
> need to delve a bit into features, the CDS is considered to be a feature of
> the main or parent nucleotide sequence and the translation is part of CDS
> feature:
>
> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Features_from_Genbank
Yes. Thanks. I "rolled my own" -- I'm now doing this:
while (my $inseq = $seq_in->next_seq) {
my @features = $inseq->get_SeqFeatures();
foreach my $feat ( @features ) {
next unless ($feat->primary_tag eq "CDS");
my @db_xrefs = $feat->annotation->get_Annotations("db_xref");
@db_xrefs = grep { /^GI:/ } @db_xrefs;
die "Panic! More than one GI: db_xref?" if (@db_xrefs > 1);
die "Panic! No GI: db_xref?" unless (@db_xrefs == 1);
my $gi = $db_xrefs[0];
$gi =~ s/^GI://;
my @translations = $feat->annotation->get_Annotations("translation");
die "Panic! More than one translation?" if (@translations > 1);
my @protein_ids = $feat->annotation->get_Annotations("protein_id");
die "Panic! More than one protein_id?" if (@protein_ids > 1);
my @product = $feat->annotation->get_Annotations("product");
die "Panic! More than one product?" if (@product > 1);
print ">gi|$gi|gb|$protein_ids[0]|";
print $inseq->id . " $product[0]\n";
print "$translations[0]\n";
}
}
To generate a homebrew fasta file for a protein BLAST.
I just thought that -alphabet and molecule() would do that stuff for me? What else would "protein" mean in those? Does anyone use -alphabet and/or molecule()? For what? How? Again, here's what I'm talking about:
==========
my $seq_out_protein = Bio::SeqIO->new(
-file => ">out",
-format => 'fasta',
-alphabet => 'protein' # No effect?
);
while (my $inseq = $seq_in->next_seq) {
$inseq->molecule("protein"); # No effect?
==========
Thanks,
j
More information about the Bioperl-l
mailing list