[Bioperl-l] EUtilities term handling

Chris Fields cjfields at uiuc.edu
Thu Oct 5 14:51:28 UTC 2006

On Oct 5, 2006, at 9:08 AM, Sendu Bala wrote:

>>> This is actually a general question and not limited to EUtilities. As
>>> I see it, EUtilities lets you do queries in Bioperl that you can do on
>>> a website. The question is, should a Bioperl module always work with
>>> queries that the website it is a front-end to works with?
>> I think yes, but stick to this definition.
>> Using your example, if you input 'BRCA2+9606[taxid]' on the Entrez
>> website it will actually not work. Hence, it should be no surprise that
>> it doesn't work either using Bio::DB::EUtilities.
> On the contrary, I find it a surprise because EUtilities is an interface
> to NCBI's eutils, not the entrez website.

It uses NCBI's CGI interface for eutils, not the SOAP interface.   
Very different.  I have considered using the NCBI SOAP-based  
interface, but the web services are still somewhat incomplete, unlike  
the CGI interface.

> If I had previously read instructions on using eutils:
> http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=coursework.section.constructing-urls
> I might (do) expect that I /should/ use + in my term.

You are looking at part of the naked URL on that page.  Here's what  
that page says:

"When constructing URLs for the eUtils, please use lowercase  
characters for all parameters except &WebEnv. There is no required  
order for the URL parameters in an eUtils URL, and null values or  
inappropriate parameters are ignored. Avoid placing spaces in the  
URLs, particularly in queries. If a space is required, use a plus  
sign (+) instead of a space:

     * Incorrect: &id=352, 25125, 234, ...
     * Correct: &id=352,25125,234,...
     * Incorrect: &term=biomol mrna[properties] AND mouse[organism]
     * Correct: &term=biomol+mrna[properties]+AND+mouse[organism]

Other special characters, such as the # symbol used in referring to a  
query key on the History server, should be represented by their URL  
encodings (%23 for #)."

I use URI for building the URL with the parameters.  URI specifically  
encodes all of this for you, so spaces convert to '+' and '+'  
converts to %2B.
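
To make the encoding point concrete, here is a minimal sketch. It is in
Python rather than Perl and uses the standard urllib.parse module as a
stand-in for URI, so it illustrates form-style encoding in general and is
not the Bio::DB::EUtilities code. The helper name build_query is made up
for the example.

```python
from urllib.parse import urlencode, parse_qs

def build_query(term: str) -> str:
    """Encode a search term as a form-style query string (hypothetical helper)."""
    return urlencode({"term": term})

with_space = build_query("abc 123")  # the space is encoded as '+'
with_plus = build_query("abc+123")   # the literal '+' is encoded as '%2B'

print(with_space)  # term=abc+123
print(with_plus)   # term=abc%2B123

# Decoding on the receiving side recovers two different strings, so a '+'
# typed by the user arrives as a literal plus sign, not as a space.
print(parse_qs(with_space)["term"][0])  # abc 123
print(parse_qs(with_plus)["term"][0])   # abc+123
```

So a term written with '+' in place of spaces is sent as a different
string from the same term written with spaces.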

>> Aside from that, one of the advantages of having the service wrapped in
>> Bioperl is in fact that you can have it accept a wider variety of
>> parameters than the actual service would allow you to have, e.g.,
>> arrays, hashes, or whatever seems appropriate.
> I was going to suggest that terms be supplied as an array, leaving
> Bioperl code to decide how to 'AND' all the terms (elements in the
> array) together. It would also further force the user not to think of
> how eutils normally works, but to only consider the Bioperl instructions
> on how to form a query. But I'm not sure of the value of all that.

Why do we need to intuit what the user is thinking at any particular  
time?  How would I know that someone actually wanted to search using  
the literal string 'abc+123' as opposed to 'abc 123'?

I see value in your last suggestion but I think a class or set of  
classes would be best suited for that:

MySQL Query     |  in                      out   | MySQL Query
Entrez Query    |-----> Generic Query class----->| Entrez Query
SRS Query       |                                | SRS Query
ad infinitum...

The generic query object could then be used in DB searches as an  
option besides using a raw string.  Though it would get tricky with  
SQL's complexity...
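
A rough sketch of the "out" half of that diagram, again in Python and
purely illustrative: the GenericQuery class, its method names, and the
default column name are all hypothetical, not an existing Bioperl API. It
only handles terms ANDed together, which is the simple case; the "in"
direction (parsing a backend-specific query) is not attempted.

```python
from dataclasses import dataclass, field

@dataclass
class GenericQuery:
    """Hypothetical backend-neutral query: (field, value) pairs ANDed together."""
    terms: list = field(default_factory=list)

    def add(self, value, field_name=None):
        """Append a term, optionally restricted to a named field."""
        self.terms.append((field_name, value))
        return self  # allow chaining

    def to_entrez(self):
        """Render as an Entrez-style term string, e.g. 'value[field]'."""
        parts = [f"{v}[{f}]" if f else v for f, v in self.terms]
        return " AND ".join(parts)

    def to_sql_where(self, default_column="text"):
        """Render as a parameterized SQL WHERE fragment plus its bind values."""
        clauses = [f"{f or default_column} = ?" for f, _ in self.terms]
        params = [v for _, v in self.terms]
        return " AND ".join(clauses), params

# The example terms come from earlier in this thread.
q = GenericQuery().add("BRCA2").add("9606", "taxid")
print(q.to_entrez())     # BRCA2 AND 9606[taxid]
print(q.to_sql_where())  # ('text = ? AND taxid = ?', ['BRCA2', '9606'])
```

The caller never writes a separator or an encoding by hand; each backend
decides how to join and escape the terms.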

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
