[Bioperl-l] Getting nucleotide seq from protein accession

Stefan Kirov skirov at utk.edu
Fri Jun 24 12:06:54 EDT 2005


   my $seqio = $gb->get_Stream_by_id(['13474692']);
   while( my $seq = $seqio->next_seq ) {
       print "seq is  ", $seq->display_id, "\n";
       my $ann=$seq->annotation; #This gives you the annotation of the 
retrieved sequence object
      foreach my $dblink ($ann->get_Annotations('DBLink')) {
          if ($dblink->database =~/refseq/i) {
             print $database->primary_id, " is the mRNA accession number\n";
          }
       }
   }
However, the gene you are looking at is not associated with any NM_ 
sequence, but rather comes from NC_. Therefore the above will not work 
for you. You will have to descend through the sequence features and find 
teh feature that says 'coded_by':
use Bio::DB::GenPept;
my $gb=new Bio::DB::GenPept;
  my $seqio = $gb->get_Stream_by_id(['13474692']);
   while( my $seq = $seqio->next_seq ) {
       print "seq is  ", $seq->display_id, "\n";
       my @f=$seq->get_SeqFeatures; #This gives you the annotation of 
the retrieved sequence object
      foreach my $feat (@f) {
        my $ann=$feat->annotation;
        next unless ($ann->get_Annotations('coded_by'));
        my @coded=$ann->get_Annotations('coded_by');
        foreach my $location (@coded) {
          print $location->value, " is the location that codes this 
protein\n";
        }
       }
   }
No guarantees the code is typo free :-)
Stefan

Kat Hull wrote:

> Hi Stefan,
> Thanks for your advice but i'm still struggling!   I have used 
> Bio::DB::GenPept to get the protein accession number given the protein 
> gi number.  However, I don't understand how Bio::Annotation::DBLink 
> works.  Does it fetch the url of a link on the web-site?   Basically, 
> if I could use this (or something else) to get the url of the CDS link 
> for my protein of interest, I can get the corresponding nucleotide 
> accession from this, as it is encoded in the url.
> Do you know how to use this module?  Is this what you were suggesting 
> I try yesterday (I didn't really understand what you were getting at).
> Many thanks,
>
> Kat
>
> ps. Here's where i'm at so far:
>                                                                                                                                                                                    
>
> use Bio::Annotation::DBLink;
> use Bio::DB::GenBank;
> use Bio::DB::GenPept;
>   $gb = new Bio::DB::GenPept;
>                                                                                                                                                                                                                                                                                                                                                                                                                                                  
>
>    # given the gi number, this returns the accession
>    my $seqio = $gb->get_Stream_by_id(['13474692']);
>    while( my $seq = $seqio->next_seq ) {
>        print "seq is  ", $seq->display_id, "\n";
>    }
>     # not sure what i'm doing here 
>                                                                                                                                                                                                                                                                                                                                                               
>
>   $link2 = new Bio::Annotation::DBLink();
>   $link2->database('dbSNP');
>   $link2->primary_id('2367');
>
>
>
>
>
> Stefan Kirov wrote:
>
>> Kat,
>> If you are familiar with Bioperl it is kind of easy-
>> look at Bio::DB::GenPept (I suppose you use GenPept/GenBank?) on how 
>> to get the protein record
>> Go through the dblinks and find the appropriate accession number 
>> (where the database method returns  GenBank).
>> Then retrieve this accession number(s) through Bio::DB::GenBank. If 
>> you are not familiar with Bioperl- read the docs for 
>> Bio::DB::GenPept, Bio::DB::GenBank, Bio::Annotation and 
>> Bio::Annotation::DBLink).
>> Hope this helps,
>> Stefan
>>
>> Kat Hull wrote:
>>
>>> Hi there,
>>> I was wondering whether anyone has a solution to my problem. I have 
>>> a list of protein assession numbers and want to retrieve the 
>>> corresponding nucleotide sequences automatically.  I thought it 
>>> would be possible to do this by changing the NCBI url, but this 
>>> doesn't seem to be the case.
>>> Is there a bio-perl module that can do this?
>>>
>>> Kind regards,
>>> Kat
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at portal.open-bio.org
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>>
>>
>> Stefan
>
>
>
>

-- 
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164
USA
tel +865 576 5120
fax +865-576-5332
e-mail: skirov at utk.edu
sao at ornl.gov

"And the wars go on with brainwashed pride
For the love of God and our human rights
And all these things are swept aside"



More information about the Bioperl-l mailing list