[Bioperl-l] How to set "complexity" param using EUtilities
Phillip San Miguel
pmiguel at purdue.edu
Wed Mar 24 14:59:50 UTC 2010
Sorry, I got that backwards in the first paragraph below. The default is apparently "0"; to get
Entrez-like behavior you want "complexity" set to "1".
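BioPerl builds the HTTP request for you, so nothing below is needed to use -complexity. As an illustration only, here is a minimal sketch of roughly what an efetch request with complexity=1 looks like on the wire. It uses only core Perl. The endpoint URL (efetch.fcgi), the "gb" rettype name, the complexity value table (as I recall it from NCBI's efetch documentation) and the helper names percent_encode and efetch_url are all my assumptions, not taken from the HOWTO or from Bio::DB::EUtilities itself.

```perl
use strict;
use warnings;

# complexity values as listed in NCBI's efetch documentation for
# sequence databases (assumption: verify against the current docs).
my %COMPLEXITY = (
    0 => 'entire blob',
    1 => 'bioseq',
    2 => 'minimal bioseq-set',
    3 => 'minimal nuc-prot',
    4 => 'minimal pub-set',
);

# Hypothetical helper: percent-encode everything except the RFC 3986
# unreserved characters (so '|' becomes '%7C').
sub percent_encode {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Hypothetical helper: build an efetch URL by hand, with parameters
# in sorted order so the result is deterministic.
sub efetch_url {
    my (%params) = @_;
    my $base  = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';
    my $query = join '&',
        map { percent_encode($_) . '=' . percent_encode($params{$_}) }
        sort keys %params;
    return "$base?$query";
}

my $complexity = 1;    # "bioseq": just the requested record, not the bundle
my $url = efetch_url(
    db         => 'nucleotide',
    id         => 'gb|EU011641.1|',
    rettype    => 'gb',
    complexity => $complexity,
);
print "complexity=$complexity ($COMPLEXITY{$complexity})\n";
print "$url\n";
```

The second line printed is https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?complexity=1&db=nucleotide&id=gb%7CEU011641.1%7C&rettype=gb. Dropping the complexity pair (i.e. using the default) would correspond to the bundled EU011584-EU011663 result described below.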
Phillip
Phillip San Miguel wrote:
> Just a little FYI that might help someone who uses GenBank efetch (here
> via BioPerl's EUtilities) and, contrary to expectation, retrieves a
> whole bundle of accessions (or GIs) when only a single accession is
> wanted. The trick is to change the "complexity" parameter from its
> apparent default of "1" to "0".
>
> Actually, this parameter might be worth adding to the HOWTO, because it
> makes EUtilities efetch behave like a normal Entrez search, which, to
> me, is the expected behavior.
>
> Details below.
>
> Some accessions/GIs appear to be embedded in bundles of related
> sequences. Here is an example:
>
> gi|158819346|gb|EU011641.1|
>
>
> If I search Entrez Nucleotide
>
> http://www.ncbi.nlm.nih.gov/sites/entrez?db=nuccore&itool=toolbar
>
> with either "158819346" (the GI) or "EU011641.1", I get a single
> record for "Pachysolen tannophilus strain NRRL Y-2460 26S ribosomal
> RNA gene, partial sequence". This is what I want.
>
> If I use the following code derived from the Eutils HOWTO:
>
> use Bio::DB::EUtilities;
> use Bio::SeqIO;
> my @ids;
> my $id = 'gb|EU011641.1|';
> push @ids, $id;
> my $factory = Bio::DB::EUtilities->new(
>     -eutil   => 'efetch',
>     -db      => 'nucleotide',
>     -rettype => 'genbank',
>     -id      => \@ids);
>
> my $file = "test.gb";
> $factory->get_Response(-file => $file);
>
> I get a bundle of accessions: EU011584-EU011663.
> Same result using the GI number instead.
>
> From reading:
>
> http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/efetchseq_help.html#seqparam
>
>
> it looks like I would get what I want were I to set the efetch
> "complexity" parameter to "1".
>
> But how do I set that parameter? Below is how I figured it out. Not the
> most efficient path, but it did not take long to traverse...
>
> The HOWTO does not mention it. I usually turn to the Deobfuscator:
>
> http://bioperl.org/cgi-bin/deob_interface.cgi
>
> when I want documentation for a method. But this is a parameter, not a
> method. Which class handles it? I was not sure, so I googled:
>
> complexity eutil site:bioperl.org
>
> The top-ranked hit is actually the deprecated 1.5.2 version of
> EUtilities. But the second hit is the (auto-generated?) message posted
> to the bioperl-guts mailing list by Chris Fields when he committed the
> new EUtilities overhaul:
>
> http://bioperl.org/pipermail/bioperl-guts-l/2007-May/025717.html
>
>
> From this it looks like the obvious approach, passing the parameter to
> the constructor, should work. And indeed:
>
>
> use Bio::DB::EUtilities;
> use Bio::SeqIO;
> my @ids;
> my $id = 'gb|EU011641.1|';
> push @ids, $id;
> my $factory = Bio::DB::EUtilities->new(
>     -eutil      => 'efetch',
>     -db         => 'nucleotide',
>     -rettype    => 'genbank',
>     -complexity => 1,
>     -id         => \@ids);
>
> my $file = "test.gb";
> $factory->get_Response(-file => $file);
>
> works!
>
> It is also a good idea to add the -email parameter, so that NCBI can
> chastise me by email, rather than banning my IP address, if I send a
> series of more than 100 requests outside the permitted 9 PM-5 AM
> Eastern Time window.
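If a script may send long series of requests, a small guard can check the window before starting. The following is a minimal sketch in core Perl. It encodes only the two rules stated above: more than 100 requests, and 9 PM to 5 AM Eastern. NCBI's usage rules may have changed since this was written, so check their current guidelines. The names is_offpeak, ok_to_send and current_hour_et are hypothetical. The time-zone lookup assumes a POSIX system with the tz database installed. The -email address itself would still go to the Bio::DB::EUtilities constructor, as described above.

```perl
use strict;
use warnings;
use POSIX ();

# Thresholds as stated in the post (assumption: may be out of date).
use constant MAX_PEAK_BATCH => 100;   # larger series should run off-peak
use constant OFFPEAK_START  => 21;    # 9 PM Eastern
use constant OFFPEAK_END    => 5;     # 5 AM Eastern

# True if $hour (0-23, Eastern Time) falls in the 9 PM-5 AM window.
sub is_offpeak {
    my ($hour) = @_;
    return $hour >= OFFPEAK_START || $hour < OFFPEAK_END;
}

# True if a series of $n requests may be sent at $hour Eastern Time.
sub ok_to_send {
    my ($n, $hour) = @_;
    return $n <= MAX_PEAK_BATCH || is_offpeak($hour);
}

# Current hour in US Eastern Time. Assumes a POSIX system with the tz
# database; without it, localtime silently falls back to UTC.
sub current_hour_et {
    my $saved = $ENV{TZ};
    $ENV{TZ} = 'America/New_York';
    POSIX::tzset();
    my $hour = (localtime)[2];
    # Restore the caller's time zone setting.
    if (defined $saved) { $ENV{TZ} = $saved } else { delete $ENV{TZ} }
    POSIX::tzset();
    return $hour;
}

my $n_ids = 250;    # hypothetical batch size
my $hour  = current_hour_et();
if (ok_to_send($n_ids, $hour)) {
    print "OK to send $n_ids requests now (hour $hour ET)\n";
} else {
    print "Wait for 9 PM-5 AM ET before sending $n_ids requests\n";
}
```

Passing the hour in as an argument keeps the rule itself testable independently of the clock.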
>
> Phillip
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>