[Bioperl-l] Getting nucleotide seq from protein accession
Hilmar Lapp
hlapp at gmx.net
Mon Jun 27 12:17:49 EDT 2005
Features aren't getting their annotation bundles populated by the
genbank/genpept parsers. Instead, the feature table tags end up as
tag/value pairs, i.e., use $feature->get_all_tags() followed by
get_tag_values() for each tag returned.
Alternatively, you can look at the features tag and annotation bundle
as a single bundle using Bio::SeqFeature::AnnotationAdaptor:
use Bio::SeqFeature::AnnotationAdaptor;
# ... code to obtain feature $feat here
my $ac = Bio::SeqFeature::AnnotationAdaptor->new(-feature => $feat);
# ... use $ac as you would any annotation collection
Hth,
-hilmar
On Jun 27, 2005, at 6:47 AM, Kat Hull wrote:
> Hi Stefan,
> I'm still having problems!
> I get some features from the sequence object but can't get past this
> part in the code:
>
> $ac=$feat->annotation(); # This is not returning anything.
>
> Could you look at the code to tell me where its going wrong?
>
> Thanks again,
> #________________________________________
> #!/biol/programs/perl580/bin/perl
>
> use Bio::Annotation::Collection;
> use Bio::DB::GenPept;
> $gb = new Bio::DB::GenPept;
>
> my $seqio =
> $gb->get_Stream_by_id(['13474692']);
> my $ac;
> my $c;
>
> while( my $seq = $seqio->next_seq ) {
> print "seq is ", $seq->display_id, "\n";
> my @f=$seq->get_all_SeqFeatures; #This gives you the
> annotation of the retrieved sequence object
>
> foreach my $feat (@f) {
> ++$c;
> print "Feature ",$feat->primary_tag," starts ",$feat->start,"
> ends ",
> $feat->end," strand ",$feat->strand,"\n";
>
> # features retain link to underlying sequence object
> #print "Feature sequence is ",$feat->seq->seq(),"\n";
>
> my $t = $feat->feature_count;
> print "Feature count is $t\n";
> print "count is $c\n";
>
> $ac=$feat->annotation(); # PROBLEM SEEMS TO BE HERE
>
> my $blah=$ac->get_all_annotation_keys();
> print "GETS TO HERE WITH NO KEYS $blah\n";
>
> foreach $key ( $ac->get_all_annotation_keys() ) {
> print "DOESN'T GET TO HERE\n";
> @values = $ac->get_Annotations($key);
> foreach $value ( @values ) {
> # value is an Bio::AnnotationI, and defines a "as_text"
> method
> print "Annotation ",$key," stringified value
> ",$value->as_text,"\n";
>
> # also defined hash_tree method, which allows data
> orientated
> # access into this object
> $hash = $value->hash_tree();
> }
> }
>
> # commented out for now
> # next unless ($ac->get_Annotations('coded_by'));
> # my @coded=$ac->get_Annotations('coded_by');
> # foreach my $location (@coded) {
> # print $location->value, " is the location that codes this
> protein\n";
> # }
> }
> }
>
>
>
>
>
> Stefan Kirov wrote:
>
>> my $seqio = $gb->get_Stream_by_id(['13474692']);
>> while( my $seq = $seqio->next_seq ) {
>> print "seq is ", $seq->display_id, "\n";
>> my $ann=$seq->annotation; #This gives you the annotation of the
>> retrieved sequence object
>> foreach my $dblink ($ann->get_Annotations('DBLink')) {
>> if ($dblink->database =~/refseq/i) {
>> print $database->primary_id, " is the mRNA accession
>> number\n";
>> }
>> }
>> }
>> However, the gene you are looking at is not associated with any NM_
>> sequence, but rather comes from NC_. Therefore the above will not
>> work for you. You will have to descend through the sequence features
>> and find teh feature that says 'coded_by':
>> use Bio::DB::GenPept;
>> my $gb=new Bio::DB::GenPept;
>> my $seqio = $gb->get_Stream_by_id(['13474692']);
>> while( my $seq = $seqio->next_seq ) {
>> print "seq is ", $seq->display_id, "\n";
>> my @f=$seq->get_SeqFeatures; #This gives you the annotation of
>> the retrieved sequence object
>> foreach my $feat (@f) {
>> my $ann=$feat->annotation;
>> next unless ($ann->get_Annotations('coded_by'));
>> my @coded=$ann->get_Annotations('coded_by');
>> foreach my $location (@coded) {
>> print $location->value, " is the location that codes this
>> protein\n";
>> }
>> }
>> }
>> No guarantees the code is typo free :-)
>> Stefan
>>
>> Kat Hull wrote:
>>
>>> Hi Stefan,
>>> Thanks for your advice but i'm still struggling! I have used
>>> Bio::DB::GenPept to get the protein accession number given the
>>> protein gi number. However, I don't understand how
>>> Bio::Annotation::DBLink works. Does it fetch the url of a link on
>>> the web-site? Basically, if I could use this (or something else)
>>> to get the url of the CDS link for my protein of interest, I can get
>>> the corresponding nucleotide accession from this, as it is encoded
>>> in the url.
>>> Do you know how to use this module? Is this what you were
>>> suggesting I try yesterday (I didn't really understand what you were
>>> getting at).
>>> Many thanks,
>>>
>>> Kat
>>>
>>> ps. Here's where i'm at so far:
>>>
>>> use Bio::Annotation::DBLink;
>>> use Bio::DB::GenBank;
>>> use Bio::DB::GenPept;
>>> $gb = new Bio::DB::GenPept;
>>>
>>> # given the gi number, this returns the accession
>>> my $seqio = $gb->get_Stream_by_id(['13474692']);
>>> while( my $seq = $seqio->next_seq ) {
>>> print "seq is ", $seq->display_id, "\n";
>>> }
>>> # not sure what i'm doing here
>>> $link2 = new Bio::Annotation::DBLink();
>>> $link2->database('dbSNP');
>>> $link2->primary_id('2367');
>>>
>>>
>>>
>>>
>>>
>>> Stefan Kirov wrote:
>>>
>>>> Kat,
>>>> If you are familiar with Bioperl it is kind of easy-
>>>> look at Bio::DB::GenPept (I suppose you use GenPept/GenBank?) on
>>>> how to get the protein record
>>>> Go through the dblinks and find the appropriate accession number
>>>> (where the database method returns GenBank).
>>>> Then retrieve this accession number(s) through Bio::DB::GenBank. If
>>>> you are not familiar with Bioperl- read the docs for
>>>> Bio::DB::GenPept, Bio::DB::GenBank, Bio::Annotation and
>>>> Bio::Annotation::DBLink).
>>>> Hope this helps,
>>>> Stefan
>>>>
>>>> Kat Hull wrote:
>>>>
>>>>> Hi there,
>>>>> I was wondering whether anyone has a solution to my problem. I
>>>>> have a list of protein assession numbers and want to retrieve the
>>>>> corresponding nucleotide sequences automatically. I thought it
>>>>> would be possible to do this by changing the NCBI url, but this
>>>>> doesn't seem to be the case.
>>>>> Is there a bio-perl module that can do this?
>>>>>
>>>>> Kind regards,
>>>>> Kat
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at portal.open-bio.org
>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Stefan
>>>
>>>
>>>
>>>
>>>
>>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------
More information about the Bioperl-l
mailing list