[Bioperl-l] Using Utilities to retrieve multiple coding sequences for identical (WP_) protein sequences

Fields, Christopher J cjfields at illinois.edu
Fri May 1 02:54:24 UTC 2015


Warren, Peter, 

The code below works for me. You should be able to grab the IDs from the link sets it returns by iterating through them (this example returns 3 different link sets, so you may want to check whether you only need a specific subset).

I’m guessing the Biopython version just lacked the ‘db’ setting?  Or does it default to ‘nuccore’?

-----------------------------------
use Bio::DB::EUtilities;

my $eutil = Bio::DB::EUtilities->new(-eutil     => 'elink',
                                     -dbfrom    => 'protein',
                                     -db        => 'nuccore',
                                     -id        => '446211235', # WP_000289090.1
                                     -email     => 'cjfields at illinois.edu');

$eutil->print_all;
-----------------------------------
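
To pull the IDs out rather than just printing everything, something along these lines should work. This is only a sketch: it assumes the next_LinkSet, get_link_name and get_ids accessors that Bio::DB::EUtilities exposes for elink results, the email address is a placeholder, and I have not checked which link names come back for this particular record.

```perl
use strict;
use warnings;
use Bio::DB::EUtilities;

# Same elink query as above; the email address is a placeholder, use your own.
my $eutil = Bio::DB::EUtilities->new(-eutil  => 'elink',
                                     -dbfrom => 'protein',
                                     -db     => 'nuccore',
                                     -id     => '446211235', # WP_000289090.1
                                     -email  => 'you@example.org');

# Collect the linked nuccore IDs, keyed by the name of each link set.
my %ids_by_link;
while (my $linkset = $eutil->next_LinkSet) {
    push @{ $ids_by_link{ $linkset->get_link_name } }, $linkset->get_ids;
}

# Report each link set so you can decide which subset you actually need.
for my $name (sort keys %ids_by_link) {
    my @ids = @{ $ids_by_link{$name} };
    printf "%s: %d linked IDs\n", $name, scalar @ids;
    print "  $_\n" for @ids;
}
```

Once you know which link name carries the records you want, you can keep just that entry of the hash and pass its IDs on to efetch.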

chris

> On Apr 30, 2015, at 7:51 PM, Fields, Christopher J <cjfields at illinois.edu> wrote:
> 
> According to that post, Ivan wasn’t able to access the data via elink (see question at bottom); any idea whether he received an answer?  I’ll have to look at whether this is possible via Bio::DB::EUtilities.
> 
> chris
> 
>> On Apr 27, 2015, at 3:52 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> 
>> Good point. Ivan Erill asked about this on the Biopython list late
>> last year - presumably the same solution would apply there too?:
>> 
>> http://lists.open-bio.org/pipermail/biopython/2014-October/015438.html
>> 
>> See also:
>> ftp://ftp.ncbi.nlm.nih.gov/refseq/release/announcements/WP-proteins-06.10.2013.pdf
>> 
>> Peter
>> 
>> On Mon, Apr 27, 2015 at 6:35 PM, Warren Gallin <wgallin at ualberta.ca> wrote:
>>> With the advent of the WP_ accession series in RefSeq, there is no longer a direct link between a single protein sequence and its encoding nucleotide sequence.
>>> 
>>> It is possible to find the multiple individual nucleotide records encoding the identical protein sequences on the Web interface through the “Identical Proteins” link, which generates a list of all of the coding sequences for the identical protein sequence.
>>> 
>>> Is there any way to work through these linkages using Bio::DB::EUtilities?
>>> 
>>> Thanks,
>>> 
>>> Warren Gallin
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at mailman.open-bio.org
>>> http://mailman.open-bio.org/mailman/listinfo/bioperl-l



