[Bioperl-l] Getting nucleotide seq from protein accession

Mon Jun 27 12:17:49 EDT 2005

Features aren't getting their annotation bundles populated by the 
genbank/genpept parsers. Instead, the feature table tags end up as 
tag/value pairs, i.e., use $feature->get_all_tags() followed by 
get_tag_values() for each tag returned.

Alternatively, you can look at the features tag and annotation bundle 
as a single bundle using Bio::SeqFeature::AnnotationAdaptor:

use  Bio::SeqFeature::AnnotationAdaptor;
# ... code to obtain feature $feat here
my $ac = Bio::SeqFeature::AnnotationAdaptor->new(-feature => $feat);
# ... use $ac as you would any annotation collection

Hth,

	-hilmar

On Jun 27, 2005, at 6:47 AM, Kat Hull wrote:

> Hi Stefan,
> I'm still having problems!
> I get some features from the sequence object but can't get past this 
> part in the code:
>
>  $ac=$feat->annotation();   # This is not returning anything.
>
> Could you look at the code to tell me where its going wrong?
>
> Thanks again,
> #________________________________________
> #!/biol/programs/perl580/bin/perl
>
> use Bio::Annotation::Collection;
> use Bio::DB::GenPept;
> $gb = new Bio::DB::GenPept;
>
>                                    my $seqio = 
> $gb->get_Stream_by_id(['13474692']);
> my $ac;
> my $c;
>
>  while( my $seq = $seqio->next_seq ) {
>      print "seq is  ", $seq->display_id, "\n";
>      my @f=$seq->get_all_SeqFeatures;     #This gives you the 
> annotation of the retrieved sequence object
>
>      foreach my $feat (@f) {
>        ++$c;
>       print "Feature ",$feat->primary_tag," starts ",$feat->start," 
> ends ",
>       $feat->end," strand ",$feat->strand,"\n";
>
>       # features retain link to underlying sequence object
>       #print "Feature sequence is ",$feat->seq->seq(),"\n";
>
>        my $t = $feat->feature_count;
>        print "Feature count is $t\n";
>        print "count is $c\n";
>
>        $ac=$feat->annotation();  #  PROBLEM SEEMS TO BE HERE
>
>        my $blah=$ac->get_all_annotation_keys();
>        print "GETS TO HERE WITH NO KEYS $blah\n";
>
>           foreach $key ( $ac->get_all_annotation_keys() ) {
>                print "DOESN'T GET TO HERE\n";
>               @values = $ac->get_Annotations($key);
>               foreach $value ( @values ) {
>                # value is an Bio::AnnotationI, and defines a "as_text" 
> method
>                print "Annotation ",$key," stringified value 
> ",$value->as_text,"\n";
>
>                # also defined hash_tree method, which allows data 
> orientated
>                # access into this object
>                $hash = $value->hash_tree();
>               }
>           }
>
> # commented out for now
>   #    next unless ($ac->get_Annotations('coded_by'));
>    #   my @coded=$ac->get_Annotations('coded_by');
>     #  foreach my $location (@coded) {
>      #   print $location->value, " is the location that codes this 
> protein\n";
>     #  }
>      }
>  }
>
>
>
>
>
> Stefan Kirov wrote:
>
>>   my $seqio = $gb->get_Stream_by_id(['13474692']);
>>   while( my $seq = $seqio->next_seq ) {
>>       print "seq is  ", $seq->display_id, "\n";
>>       my $ann=$seq->annotation; #This gives you the annotation of the 
>> retrieved sequence object
>>      foreach my $dblink ($ann->get_Annotations('DBLink')) {
>>          if ($dblink->database =~/refseq/i) {
>>             print $database->primary_id, " is the mRNA accession 
>> number\n";
>>          }
>>       }
>>   }
>> However, the gene you are looking at is not associated with any NM_ 
>> sequence, but rather comes from NC_. Therefore the above will not 
>> work for you. You will have to descend through the sequence features 
>> and find teh feature that says 'coded_by':
>> use Bio::DB::GenPept;
>> my $gb=new Bio::DB::GenPept;
>>  my $seqio = $gb->get_Stream_by_id(['13474692']);
>>   while( my $seq = $seqio->next_seq ) {
>>       print "seq is  ", $seq->display_id, "\n";
>>       my @f=$seq->get_SeqFeatures; #This gives you the annotation of 
>> the retrieved sequence object
>>      foreach my $feat (@f) {
>>        my $ann=$feat->annotation;
>>        next unless ($ann->get_Annotations('coded_by'));
>>        my @coded=$ann->get_Annotations('coded_by');
>>        foreach my $location (@coded) {
>>          print $location->value, " is the location that codes this 
>> protein\n";
>>        }
>>       }
>>   }
>> No guarantees the code is typo free :-)
>> Stefan
>>
>> Kat Hull wrote:
>>
>>> Hi Stefan,
>>> Thanks for your advice but i'm still struggling!   I have used 
>>> Bio::DB::GenPept to get the protein accession number given the 
>>> protein gi number.  However, I don't understand how 
>>> Bio::Annotation::DBLink works.  Does it fetch the url of a link on 
>>> the web-site?   Basically, if I could use this (or something else) 
>>> to get the url of the CDS link for my protein of interest, I can get 
>>> the corresponding nucleotide accession from this, as it is encoded 
>>> in the url.
>>> Do you know how to use this module?  Is this what you were 
>>> suggesting I try yesterday (I didn't really understand what you were 
>>> getting at).
>>> Many thanks,
>>>
>>> Kat
>>>
>>> ps. Here's where i'm at so far:
>>>
>>> use Bio::Annotation::DBLink;
>>> use Bio::DB::GenBank;
>>> use Bio::DB::GenPept;
>>>   $gb = new Bio::DB::GenPept;
>>>
>>>    # given the gi number, this returns the accession
>>>    my $seqio = $gb->get_Stream_by_id(['13474692']);
>>>    while( my $seq = $seqio->next_seq ) {
>>>        print "seq is  ", $seq->display_id, "\n";
>>>    }
>>>     # not sure what i'm doing here
>>>   $link2 = new Bio::Annotation::DBLink();
>>>   $link2->database('dbSNP');
>>>   $link2->primary_id('2367');
>>>
>>>
>>>
>>>
>>>
>>> Stefan Kirov wrote:
>>>
>>>> Kat,
>>>> If you are familiar with Bioperl it is kind of easy-
>>>> look at Bio::DB::GenPept (I suppose you use GenPept/GenBank?) on 
>>>> how to get the protein record
>>>> Go through the dblinks and find the appropriate accession number 
>>>> (where the database method returns  GenBank).
>>>> Then retrieve this accession number(s) through Bio::DB::GenBank. If 
>>>> you are not familiar with Bioperl- read the docs for 
>>>> Bio::DB::GenPept, Bio::DB::GenBank, Bio::Annotation and 
>>>> Bio::Annotation::DBLink).
>>>> Hope this helps,
>>>> Stefan
>>>>
>>>> Kat Hull wrote:
>>>>
>>>>> Hi there,
>>>>> I was wondering whether anyone has a solution to my problem. I 
>>>>> have a list of protein assession numbers and want to retrieve the 
>>>>> corresponding nucleotide sequences automatically.  I thought it 
>>>>> would be possible to do this by changing the NCBI url, but this 
>>>>> doesn't seem to be the case.
>>>>> Is there a bio-perl module that can do this?
>>>>>
>>>>> Kind regards,
>>>>> Kat
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at portal.open-bio.org
>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Stefan
>>>
>>>
>>>
>>>
>>>
>>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------