[Bioperl-l] Getting nucleotide seq from protein accession

Kat Hull khh103 at york.ac.uk
Mon Jun 27 06:47:57 EDT 2005


Hi Stefan,
I'm still having problems!
I get some features from the sequence object but can't get past this 
part in the code:

  $ac=$feat->annotation();   # This is not returning anything.

Could you look at the code to tell me where its going wrong?

Thanks again,
#________________________________________
#!/biol/programs/perl580/bin/perl
                                                                                              

use Bio::Annotation::Collection;
use Bio::DB::GenPept;
$gb = new Bio::DB::GenPept;
                                                                                              

                                    
my $seqio = $gb->get_Stream_by_id(['13474692']);
my $ac;
my $c;
                                                                                              

  while( my $seq = $seqio->next_seq ) {
      print "seq is  ", $seq->display_id, "\n";
      my @f=$seq->get_all_SeqFeatures;     #This gives you the 
annotation of the retrieved sequence object
                                                                                              

      foreach my $feat (@f) {
        ++$c;
       print "Feature ",$feat->primary_tag," starts ",$feat->start," ends ",
       $feat->end," strand ",$feat->strand,"\n";
                                                                                              

       # features retain link to underlying sequence object
       #print "Feature sequence is ",$feat->seq->seq(),"\n";
                                                                                              

        my $t = $feat->feature_count;
        print "Feature count is $t\n";
        print "count is $c\n";
                                                                                              

        $ac=$feat->annotation();  #  PROBLEM SEEMS TO BE HERE
                                                                                              

        my $blah=$ac->get_all_annotation_keys();
        print "GETS TO HERE WITH NO KEYS $blah\n";
                                                                                              

           foreach $key ( $ac->get_all_annotation_keys() ) {
                print "DOESN'T GET TO HERE\n";
               @values = $ac->get_Annotations($key);
               foreach $value ( @values ) {
                # value is an Bio::AnnotationI, and defines a "as_text" 
method
                print "Annotation ",$key," stringified value 
",$value->as_text,"\n";
                                                                                              

                # also defined hash_tree method, which allows data 
orientated
                # access into this object
                $hash = $value->hash_tree();
               }
           }
                                                                                              

# commented out for now          
                                                                                  

   #    next unless ($ac->get_Annotations('coded_by'));
    #   my @coded=$ac->get_Annotations('coded_by');
     #  foreach my $location (@coded) {
      #   print $location->value, " is the location that codes this 
protein\n";
     #  }
      }
  }





Stefan Kirov wrote:

>   my $seqio = $gb->get_Stream_by_id(['13474692']);
>   while( my $seq = $seqio->next_seq ) {
>       print "seq is  ", $seq->display_id, "\n";
>       my $ann=$seq->annotation; #This gives you the annotation of the 
> retrieved sequence object
>      foreach my $dblink ($ann->get_Annotations('DBLink')) {
>          if ($dblink->database =~/refseq/i) {
>             print $database->primary_id, " is the mRNA accession 
> number\n";
>          }
>       }
>   }
> However, the gene you are looking at is not associated with any NM_ 
> sequence, but rather comes from NC_. Therefore the above will not work 
> for you. You will have to descend through the sequence features and 
> find teh feature that says 'coded_by':
> use Bio::DB::GenPept;
> my $gb=new Bio::DB::GenPept;
>  my $seqio = $gb->get_Stream_by_id(['13474692']);
>   while( my $seq = $seqio->next_seq ) {
>       print "seq is  ", $seq->display_id, "\n";
>       my @f=$seq->get_SeqFeatures; #This gives you the annotation of 
> the retrieved sequence object
>      foreach my $feat (@f) {
>        my $ann=$feat->annotation;
>        next unless ($ann->get_Annotations('coded_by'));
>        my @coded=$ann->get_Annotations('coded_by');
>        foreach my $location (@coded) {
>          print $location->value, " is the location that codes this 
> protein\n";
>        }
>       }
>   }
> No guarantees the code is typo free :-)
> Stefan
>
> Kat Hull wrote:
>
>> Hi Stefan,
>> Thanks for your advice but i'm still struggling!   I have used 
>> Bio::DB::GenPept to get the protein accession number given the 
>> protein gi number.  However, I don't understand how 
>> Bio::Annotation::DBLink works.  Does it fetch the url of a link on 
>> the web-site?   Basically, if I could use this (or something else) to 
>> get the url of the CDS link for my protein of interest, I can get the 
>> corresponding nucleotide accession from this, as it is encoded in the 
>> url.
>> Do you know how to use this module?  Is this what you were suggesting 
>> I try yesterday (I didn't really understand what you were getting at).
>> Many thanks,
>>
>> Kat
>>
>> ps. Here's where i'm at so far:
>>                                                                                                                                                                                    
>>
>> use Bio::Annotation::DBLink;
>> use Bio::DB::GenBank;
>> use Bio::DB::GenPept;
>>   $gb = new Bio::DB::GenPept;
>>                                                                                                                                                                                                                                                                                                                                                                                                                                                  
>>
>>    # given the gi number, this returns the accession
>>    my $seqio = $gb->get_Stream_by_id(['13474692']);
>>    while( my $seq = $seqio->next_seq ) {
>>        print "seq is  ", $seq->display_id, "\n";
>>    }
>>     # not sure what i'm doing here 
>>                                                                                                                                                                                                                                                                                                                                                               
>>
>>   $link2 = new Bio::Annotation::DBLink();
>>   $link2->database('dbSNP');
>>   $link2->primary_id('2367');
>>
>>
>>
>>
>>
>> Stefan Kirov wrote:
>>
>>> Kat,
>>> If you are familiar with Bioperl it is kind of easy-
>>> look at Bio::DB::GenPept (I suppose you use GenPept/GenBank?) on how 
>>> to get the protein record
>>> Go through the dblinks and find the appropriate accession number 
>>> (where the database method returns  GenBank).
>>> Then retrieve this accession number(s) through Bio::DB::GenBank. If 
>>> you are not familiar with Bioperl- read the docs for 
>>> Bio::DB::GenPept, Bio::DB::GenBank, Bio::Annotation and 
>>> Bio::Annotation::DBLink).
>>> Hope this helps,
>>> Stefan
>>>
>>> Kat Hull wrote:
>>>
>>>> Hi there,
>>>> I was wondering whether anyone has a solution to my problem. I have 
>>>> a list of protein assession numbers and want to retrieve the 
>>>> corresponding nucleotide sequences automatically.  I thought it 
>>>> would be possible to do this by changing the NCBI url, but this 
>>>> doesn't seem to be the case.
>>>> Is there a bio-perl module that can do this?
>>>>
>>>> Kind regards,
>>>> Kat
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at portal.open-bio.org
>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>>
>>>
>>>
>>> Stefan
>>
>>
>>
>>
>>
>




More information about the Bioperl-l mailing list