[Bioperl-l] How to set "complexity" param using EUtilities
Phillip San Miguel
pmiguel at purdue.edu
Wed Mar 24 13:49:55 UTC 2010
Just a little FYI that might help someone using GenBank efetch (here
with bioperl EUtilities) and, contrary to expectation, retrieving a
bunch of accessions (or GIs) when a single accession is what is
wanted. The trick is to change the "complexity" parameter from its
apparent default of "0" to "1".
Actually, this parameter might be worth adding to the HOWTO because it
causes the EUtilities efetch to perform similarly to a normal Entrez
search, which, to me, would be the expected behavior.
Details below.
Some accessions/GIs appear to be embedded in bundles of related
sequences. Here is an example:
gi|158819346|gb|EU011641.1|
If I search Entrez Nucleotide
http://www.ncbi.nlm.nih.gov/sites/entrez?db=nuccore&itool=toolbar
with either "158819346" (the GI) or "EU011641.1", I get a single
record for "Pachysolen tannophilus strain NRRL Y-2460 26S ribosomal RNA
gene, partial sequence". This is what I want.
If I use the following code derived from the Eutils HOWTO:
use strict;
use warnings;
use Bio::DB::EUtilities;
use Bio::SeqIO;

# The single accession wanted
my @ids = ('gb|EU011641.1|');

# No -complexity given, so efetch uses its default
my $factory = Bio::DB::EUtilities->new(
    -eutil   => 'efetch',
    -db      => 'nucleotide',
    -rettype => 'genbank',
    -id      => \@ids);

# Write the raw GenBank response to a file
my $file = "test.gb";
$factory->get_Response(-file => $file);
I get a bundle of accessions: EU011584-EU011663.
Same result using the GI number instead.
From reading:
http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/efetchseq_help.html#seqparam
it looks like I would get what I want were I to set the efetch
"complexity" parameter to "1".
But how do I set that parameter? Below is how I did it. Not the most
efficient path, but did not take that long to traverse...
The HOWTO does not mention it. I usually look to the Deobfuscator:
http://bioperl.org/cgi-bin/deob_interface.cgi
to help me when I want some documentation for a method. But this is a
parameter not a class. What class sets this parameter? Not sure. So I
googled:
complexity eutil site:bioperl.org
The top-ranked hit is actually to the deprecated 1.5.2 version of
EUtilities. But the 2nd hit is to the (auto-generated?) email posted
to the bioperl-guts email list by Chris Fields upon his commit of the
new EUtilities overhaul:
http://bioperl.org/pipermail/bioperl-guts-l/2007-May/025717.html
From there it looks like the obvious way to set the parameter, passing
it to the constructor like any other, should work. And indeed:
use strict;
use warnings;
use Bio::DB::EUtilities;
use Bio::SeqIO;

# The single accession wanted
my @ids = ('gb|EU011641.1|');

my $factory = Bio::DB::EUtilities->new(
    -eutil      => 'efetch',
    -db         => 'nucleotide',
    -rettype    => 'genbank',
    -complexity => 1,    # return only the record for the requested ID
    -id         => \@ids);

# Write the raw GenBank response to a file
my $file = "test.gb";
$factory->get_Response(-file => $file);
works!
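A quick way to check which behavior you got is to list the accessions in the output file. The following is only a minimal sketch in plain Perl, assuming the file is an ordinary GenBank flat file in which every record carries an ACCESSION line starting in column 1. The subroutine name list_accessions is my own invention, not part of bioperl; Bio::SeqIO could be used to walk the records instead.

```perl
use strict;
use warnings;

# Return the primary accession of every record in a GenBank flat file.
# Assumption: each record has one ACCESSION line starting in column 1.
# list_accessions is a hypothetical helper, not a bioperl method.
sub list_accessions {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    my @accessions;
    while (my $line = <$fh>) {
        # The first token after the ACCESSION keyword is the primary accession
        push @accessions, $1 if $line =~ /^ACCESSION\s+(\S+)/;
    }
    close $fh;
    return @accessions;
}

# Usage against the file written by efetch, if it exists
if (-e 'test.gb') {
    my @found = list_accessions('test.gb');
    print scalar(@found), " record(s): @found\n";
}
```

With -complexity => 1 this should report a single record; without it, the whole bundle should show up.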
It is also a good idea to add the -email parameter so that GenBank might
chastise me via email, rather than banning my IP, if I try to send more than
100 requests in a series outside of the acceptable 9PM-5AM Eastern Time hours.
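For anyone wondering what these parameters amount to on the wire, here is a rough sketch that builds the equivalent raw efetch URL in plain Perl. This is my assumption of what the request looks like, not a dump of what Bio::DB::EUtilities actually sends. The helper name efetch_url, the "gb" rettype spelling, the bare accession in place of 'gb|EU011641.1|' and the example address are all placeholders of mine.

```perl
use strict;
use warnings;

# Base URL of the NCBI efetch service
my $EFETCH_BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Build an efetch URL from key/value parameters.
# efetch_url is a hypothetical helper, not part of bioperl.
sub efetch_url {
    my (%params) = @_;
    my @pairs;
    for my $key (sort keys %params) {
        my $value = $params{$key};
        # Percent-encode everything except unreserved URL characters
        $value =~ s/([^A-Za-z0-9\-_.~])/sprintf("%%%02X", ord($1))/ge;
        push @pairs, "$key=$value";
    }
    return $EFETCH_BASE . '?' . join('&', @pairs);
}

# Example values; the email address is a placeholder
my $url = efetch_url(
    db         => 'nucleotide',
    id         => 'EU011641.1',
    rettype    => 'gb',
    retmode    => 'text',
    complexity => 1,
    email      => 'you@example.org',
);
print "$url\n";
```

Pasting the printed URL into a browser is a handy way to compare complexity=0 and complexity=1 by hand.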
Phillip