[Bioperl-l] retrieving coding sequences from swissprot protein accessions

Michael Bradley mebradley at chem.ufl.edu
Tue Jun 1 11:43:56 EDT 2004


Hello all,

I would like to get the coding sequence for a protein, given its 
Swiss-Prot accession. I have done this with GenBank files in the past 
using the following code. Does anyone know how to do the same with Swiss-Prot?

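use strict;
use warnings;
use Bio::DB::GenPept;
use Bio::DB::GenBank;
use Bio::Factory::FTLocationFactory;
use Bio::SeqFeature::Generic;

# protein accession (or GI number), taken from the command line
my $protein_gi = shift @ARGV or die "Usage: $0 <protein accession>\n";

my $gp = Bio::DB::GenPept->new;
my $gb = Bio::DB::GenBank->new;
my $loc_factory = Bio::Factory::FTLocationFactory->new;

my $prot_stream = $gp->get_Stream_by_acc($protein_gi);
while ( my $prot_seq = $prot_stream->next_seq() ) {
	my $found_cds = 0;
	foreach my $feat ( $prot_seq->get_SeqFeatures ) {
		# only CDS features that carry a coded_by tag are of interest
		next unless $feat->primary_tag eq 'CDS';
		next unless $feat->has_tag('coded_by');
		$found_cds = 1;
		# example: 'coded_by="U05729.1:1..122"'
		my @coded_by = $feat->get_tag_values('coded_by');
		# split on the first colon only; this handles the simple
		# "accession:location" form, not join(...) or complement(...)
		my ($nuc_acc,$loc_str) = split /:/, $coded_by[0], 2;
		my $nuc_obj = $gb->get_Seq_by_acc($nuc_acc);
		# create Bio::Location object from a string
		my $loc_object = $loc_factory->from_string($loc_str);
		# create a Feature object by using a Location
		my $feat_obj = Bio::SeqFeature::Generic->new(-location => $loc_object);
		# associate the Feature object with the nucleotide Seq object
		$nuc_obj->add_SeqFeature($feat_obj);
		my $cds_obj = $feat_obj->spliced_seq;
		print "CDS sequence is ", $cds_obj->seq, "\n\n";
	}
	# report once per protein, not once per non-CDS feature
	print "No CDS for ", $prot_seq->id, "\n\n" unless $found_cds;
}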

Thanks,

Michael Bradley


