[Bioperl-l] Getting nucleotide seq from protein accession
Kat Hull
khh103 at york.ac.uk
Mon Jun 27 06:47:57 EDT 2005
Hi Stefan,
I'm still having problems!
I get some features from the sequence object but can't get past this
part in the code:
$ac=$feat->annotation(); # This is not returning anything.
Could you look at the code to tell me where its going wrong?
Thanks again,
#________________________________________
#!/biol/programs/perl580/bin/perl
use Bio::Annotation::Collection;
use Bio::DB::GenPept;
$gb = new Bio::DB::GenPept;
my $seqio = $gb->get_Stream_by_id(['13474692']);
my $ac;
my $c;
while( my $seq = $seqio->next_seq ) {
print "seq is ", $seq->display_id, "\n";
my @f=$seq->get_all_SeqFeatures; #This gives you the
annotation of the retrieved sequence object
foreach my $feat (@f) {
++$c;
print "Feature ",$feat->primary_tag," starts ",$feat->start," ends ",
$feat->end," strand ",$feat->strand,"\n";
# features retain link to underlying sequence object
#print "Feature sequence is ",$feat->seq->seq(),"\n";
my $t = $feat->feature_count;
print "Feature count is $t\n";
print "count is $c\n";
$ac=$feat->annotation(); # PROBLEM SEEMS TO BE HERE
my $blah=$ac->get_all_annotation_keys();
print "GETS TO HERE WITH NO KEYS $blah\n";
foreach $key ( $ac->get_all_annotation_keys() ) {
print "DOESN'T GET TO HERE\n";
@values = $ac->get_Annotations($key);
foreach $value ( @values ) {
# value is an Bio::AnnotationI, and defines a "as_text"
method
print "Annotation ",$key," stringified value
",$value->as_text,"\n";
# also defined hash_tree method, which allows data
orientated
# access into this object
$hash = $value->hash_tree();
}
}
# commented out for now
# next unless ($ac->get_Annotations('coded_by'));
# my @coded=$ac->get_Annotations('coded_by');
# foreach my $location (@coded) {
# print $location->value, " is the location that codes this
protein\n";
# }
}
}
Stefan Kirov wrote:
> my $seqio = $gb->get_Stream_by_id(['13474692']);
> while( my $seq = $seqio->next_seq ) {
> print "seq is ", $seq->display_id, "\n";
> my $ann=$seq->annotation; #This gives you the annotation of the
> retrieved sequence object
> foreach my $dblink ($ann->get_Annotations('DBLink')) {
> if ($dblink->database =~/refseq/i) {
> print $database->primary_id, " is the mRNA accession
> number\n";
> }
> }
> }
> However, the gene you are looking at is not associated with any NM_
> sequence, but rather comes from NC_. Therefore the above will not work
> for you. You will have to descend through the sequence features and
> find teh feature that says 'coded_by':
> use Bio::DB::GenPept;
> my $gb=new Bio::DB::GenPept;
> my $seqio = $gb->get_Stream_by_id(['13474692']);
> while( my $seq = $seqio->next_seq ) {
> print "seq is ", $seq->display_id, "\n";
> my @f=$seq->get_SeqFeatures; #This gives you the annotation of
> the retrieved sequence object
> foreach my $feat (@f) {
> my $ann=$feat->annotation;
> next unless ($ann->get_Annotations('coded_by'));
> my @coded=$ann->get_Annotations('coded_by');
> foreach my $location (@coded) {
> print $location->value, " is the location that codes this
> protein\n";
> }
> }
> }
> No guarantees the code is typo free :-)
> Stefan
>
> Kat Hull wrote:
>
>> Hi Stefan,
>> Thanks for your advice but i'm still struggling! I have used
>> Bio::DB::GenPept to get the protein accession number given the
>> protein gi number. However, I don't understand how
>> Bio::Annotation::DBLink works. Does it fetch the url of a link on
>> the web-site? Basically, if I could use this (or something else) to
>> get the url of the CDS link for my protein of interest, I can get the
>> corresponding nucleotide accession from this, as it is encoded in the
>> url.
>> Do you know how to use this module? Is this what you were suggesting
>> I try yesterday (I didn't really understand what you were getting at).
>> Many thanks,
>>
>> Kat
>>
>> ps. Here's where i'm at so far:
>>
>>
>> use Bio::Annotation::DBLink;
>> use Bio::DB::GenBank;
>> use Bio::DB::GenPept;
>> $gb = new Bio::DB::GenPept;
>>
>>
>> # given the gi number, this returns the accession
>> my $seqio = $gb->get_Stream_by_id(['13474692']);
>> while( my $seq = $seqio->next_seq ) {
>> print "seq is ", $seq->display_id, "\n";
>> }
>> # not sure what i'm doing here
>>
>>
>> $link2 = new Bio::Annotation::DBLink();
>> $link2->database('dbSNP');
>> $link2->primary_id('2367');
>>
>>
>>
>>
>>
>> Stefan Kirov wrote:
>>
>>> Kat,
>>> If you are familiar with Bioperl it is kind of easy-
>>> look at Bio::DB::GenPept (I suppose you use GenPept/GenBank?) on how
>>> to get the protein record
>>> Go through the dblinks and find the appropriate accession number
>>> (where the database method returns GenBank).
>>> Then retrieve this accession number(s) through Bio::DB::GenBank. If
>>> you are not familiar with Bioperl- read the docs for
>>> Bio::DB::GenPept, Bio::DB::GenBank, Bio::Annotation and
>>> Bio::Annotation::DBLink).
>>> Hope this helps,
>>> Stefan
>>>
>>> Kat Hull wrote:
>>>
>>>> Hi there,
>>>> I was wondering whether anyone has a solution to my problem. I have
>>>> a list of protein assession numbers and want to retrieve the
>>>> corresponding nucleotide sequences automatically. I thought it
>>>> would be possible to do this by changing the NCBI url, but this
>>>> doesn't seem to be the case.
>>>> Is there a bio-perl module that can do this?
>>>>
>>>> Kind regards,
>>>> Kat
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at portal.open-bio.org
>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>>
>>>
>>>
>>> Stefan
>>
>>
>>
>>
>>
>
More information about the Bioperl-l
mailing list