[Bioperl-l] Getting nucleotide seq from protein accession

Stefan Kirov skirov at utk.edu
Mon Jun 27 11:56:02 EDT 2005


Kat,
In my experience works OK:
60 mendel /home/sao> perl test.pl
seq is  NP_106261
Feature source starts 1 ends 243 strand 1
Feature count is 0
count is 1
GETS TO HERE WITH NO KEYS 3
DOESN'T GET TO HERE
Annotation db_xref stringified value Value: taxon:266835
DOESN'T GET TO HERE
Annotation strain stringified value Value: MAFF303099
DOESN'T GET TO HERE
Annotation organism stringified value Value: Mesorhizobium loti MAFF303099
Feature Protein starts 1 ends 243 strand 1
Feature count is 0
count is 2
GETS TO HERE WITH NO KEYS 1
DOESN'T GET TO HERE
Annotation product stringified value Value: transcriptional regulator
Feature CDS starts 1 ends 243 strand 1
Feature count is 0
count is 3
GETS TO HERE WITH NO KEYS 4
DOESN'T GET TO HERE
Annotation locus_tag stringified value Value: mlr5637
DOESN'T GET TO HERE
Annotation db_xref stringified value Value: GeneID:1228922
DOESN'T GET TO HERE
Annotation coded_by stringified value Value: NC_002678.2:4529413..4530144
DOESN'T GET TO HERE
Annotation transl_table stringified value Value: 11

What you need is
Annotation coded_by stringified value Value: NC_002678.2:4529413..4530144
What bioperl version are you using?

Stefan

Kat Hull wrote:

> Hi Stefan,
> I'm still having problems!
> I get some features from the sequence object but can't get past this 
> part in the code:
>
>  $ac=$feat->annotation();   # This is not returning anything.
>
> Could you look at the code to tell me where its going wrong?
>
> Thanks again,
> #________________________________________
> #!/biol/programs/perl580/bin/perl
>                                                                                              
>
> use Bio::Annotation::Collection;
> use Bio::DB::GenPept;
> $gb = new Bio::DB::GenPept;
>                                                                                              
>
>                                    my $seqio = 
> $gb->get_Stream_by_id(['13474692']);
> my $ac;
> my $c;
>                                                                                              
>
>  while( my $seq = $seqio->next_seq ) {
>      print "seq is  ", $seq->display_id, "\n";
>      my @f=$seq->get_all_SeqFeatures;     #This gives you the 
> annotation of the retrieved sequence object
>                                                                                              
>
>      foreach my $feat (@f) {
>        ++$c;
>       print "Feature ",$feat->primary_tag," starts ",$feat->start," 
> ends ",
>       $feat->end," strand ",$feat->strand,"\n";
>                                                                                              
>
>       # features retain link to underlying sequence object
>       #print "Feature sequence is ",$feat->seq->seq(),"\n";
>                                                                                              
>
>        my $t = $feat->feature_count;
>        print "Feature count is $t\n";
>        print "count is $c\n";
>                                                                                              
>
>        $ac=$feat->annotation();  #  PROBLEM SEEMS TO BE HERE
>                                                                                              
>
>        my $blah=$ac->get_all_annotation_keys();
>        print "GETS TO HERE WITH NO KEYS $blah\n";
>                                                                                              
>
>           foreach $key ( $ac->get_all_annotation_keys() ) {
>                print "DOESN'T GET TO HERE\n";
>               @values = $ac->get_Annotations($key);
>               foreach $value ( @values ) {
>                # value is an Bio::AnnotationI, and defines a "as_text" 
> method
>                print "Annotation ",$key," stringified value 
> ",$value->as_text,"\n";
>                                                                                              
>
>                # also defined hash_tree method, which allows data 
> orientated
>                # access into this object
>                $hash = $value->hash_tree();
>               }
>           }
>                                                                                              
>
> # commented out for now          
>                                                                                  
>
>   #    next unless ($ac->get_Annotations('coded_by'));
>    #   my @coded=$ac->get_Annotations('coded_by');
>     #  foreach my $location (@coded) {
>      #   print $location->value, " is the location that codes this 
> protein\n";
>     #  }
>      }
>  }
>
>
>
>
>
> Stefan Kirov wrote:
>
>>   my $seqio = $gb->get_Stream_by_id(['13474692']);
>>   while( my $seq = $seqio->next_seq ) {
>>       print "seq is  ", $seq->display_id, "\n";
>>       my $ann=$seq->annotation; #This gives you the annotation of the 
>> retrieved sequence object
>>      foreach my $dblink ($ann->get_Annotations('DBLink')) {
>>          if ($dblink->database =~/refseq/i) {
>>             print $database->primary_id, " is the mRNA accession 
>> number\n";
>>          }
>>       }
>>   }
>> However, the gene you are looking at is not associated with any NM_ 
>> sequence, but rather comes from NC_. Therefore the above will not 
>> work for you. You will have to descend through the sequence features 
>> and find teh feature that says 'coded_by':
>> use Bio::DB::GenPept;
>> my $gb=new Bio::DB::GenPept;
>>  my $seqio = $gb->get_Stream_by_id(['13474692']);
>>   while( my $seq = $seqio->next_seq ) {
>>       print "seq is  ", $seq->display_id, "\n";
>>       my @f=$seq->get_SeqFeatures; #This gives you the annotation of 
>> the retrieved sequence object
>>      foreach my $feat (@f) {
>>        my $ann=$feat->annotation;
>>        next unless ($ann->get_Annotations('coded_by'));
>>        my @coded=$ann->get_Annotations('coded_by');
>>        foreach my $location (@coded) {
>>          print $location->value, " is the location that codes this 
>> protein\n";
>>        }
>>       }
>>   }
>> No guarantees the code is typo free :-)
>> Stefan
>>
>> Kat Hull wrote:
>>
>>> Hi Stefan,
>>> Thanks for your advice but i'm still struggling!   I have used 
>>> Bio::DB::GenPept to get the protein accession number given the 
>>> protein gi number.  However, I don't understand how 
>>> Bio::Annotation::DBLink works.  Does it fetch the url of a link on 
>>> the web-site?   Basically, if I could use this (or something else) 
>>> to get the url of the CDS link for my protein of interest, I can get 
>>> the corresponding nucleotide accession from this, as it is encoded 
>>> in the url.
>>> Do you know how to use this module?  Is this what you were 
>>> suggesting I try yesterday (I didn't really understand what you were 
>>> getting at).
>>> Many thanks,
>>>
>>> Kat
>>>
>>> ps. Here's where i'm at so far:
>>>                                                                                                                                                                                    
>>>
>>> use Bio::Annotation::DBLink;
>>> use Bio::DB::GenBank;
>>> use Bio::DB::GenPept;
>>>   $gb = new Bio::DB::GenPept;
>>>                                                                                                                                                                                                                                                                                                                                                                                                                                                  
>>>
>>>    # given the gi number, this returns the accession
>>>    my $seqio = $gb->get_Stream_by_id(['13474692']);
>>>    while( my $seq = $seqio->next_seq ) {
>>>        print "seq is  ", $seq->display_id, "\n";
>>>    }
>>>     # not sure what i'm doing here 
>>>                                                                                                                                                                                                                                                                                                                                                               
>>>
>>>   $link2 = new Bio::Annotation::DBLink();
>>>   $link2->database('dbSNP');
>>>   $link2->primary_id('2367');
>>>
>>>
>>>
>>>
>>>
>>> Stefan Kirov wrote:
>>>
>>>> Kat,
>>>> If you are familiar with Bioperl it is kind of easy-
>>>> look at Bio::DB::GenPept (I suppose you use GenPept/GenBank?) on 
>>>> how to get the protein record
>>>> Go through the dblinks and find the appropriate accession number 
>>>> (where the database method returns  GenBank).
>>>> Then retrieve this accession number(s) through Bio::DB::GenBank. If 
>>>> you are not familiar with Bioperl- read the docs for 
>>>> Bio::DB::GenPept, Bio::DB::GenBank, Bio::Annotation and 
>>>> Bio::Annotation::DBLink).
>>>> Hope this helps,
>>>> Stefan
>>>>
>>>> Kat Hull wrote:
>>>>
>>>>> Hi there,
>>>>> I was wondering whether anyone has a solution to my problem. I 
>>>>> have a list of protein assession numbers and want to retrieve the 
>>>>> corresponding nucleotide sequences automatically.  I thought it 
>>>>> would be possible to do this by changing the NCBI url, but this 
>>>>> doesn't seem to be the case.
>>>>> Is there a bio-perl module that can do this?
>>>>>
>>>>> Kind regards,
>>>>> Kat
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at portal.open-bio.org
>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Stefan
>>>
>>>
>>>
>>>
>>>
>>>
>>
>
>

-- 
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164
USA
tel +865 576 5120
fax +865-576-5332
e-mail: skirov at utk.edu
sao at ornl.gov

"And the wars go on with brainwashed pride
For the love of God and our human rights
And all these things are swept aside"



More information about the Bioperl-l mailing list