[Bioperl-l] Entrez queries

Lincoln Stein lstein@cshl.org
Wed, 19 Jun 2002 09:06:41 -0400


Hi,

I got a very nice note from Jim Ostell explaining how the eutils allow you to 
perform Entrez queries.  It is quite simple.  You use the ESearch utility to 
perform the query and return the number of hits and a key for later 
retrieval.  You then use the key as an argument to EFetch to get the actual 
hits.
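A minimal Python sketch of the two-step search-then-fetch flow described above. It assumes the present-day E-utilities endpoint (eutils.ncbi.nlm.nih.gov) and the documented ESearch/EFetch parameters usehistory, WebEnv, query_key, retstart and retmax. The helper names (esearch_url, parse_esearch, efetch_urls) and the abridged sample reply are hypothetical. The sketch only builds URLs and parses a reply, so it makes no network requests; it is not the Biofetch code.

```python
from typing import Iterator, Tuple
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumption: present-day E-utilities base URL (the 2002 note links an older help page).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Hypothetical, abridged ESearch reply used only to exercise the parser.
SAMPLE = """<eSearchResult>
  <Count>1234</Count><RetMax>0</RetMax><RetStart>0</RetStart>
  <QueryKey>1</QueryKey>
  <WebEnv>EXAMPLE_WEBENV</WebEnv>
  <IdList></IdList>
</eSearchResult>"""


def esearch_url(db: str, term: str) -> str:
    """Build an ESearch URL asking NCBI to keep the hit list on its server."""
    # retmax=0: return only the count and the history key, not the IDs.
    params = {"db": db, "term": term, "usehistory": "y", "retmax": 0}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def parse_esearch(xml_text: str) -> Tuple[int, str, str]:
    """Return (hit count, WebEnv, QueryKey) from an ESearch XML reply."""
    root = ET.fromstring(xml_text)
    count = int(root.findtext("Count", default="0"))
    web_env = root.findtext("WebEnv", default="")
    query_key = root.findtext("QueryKey", default="")
    return count, web_env, query_key


def efetch_urls(db: str, web_env: str, query_key: str, count: int,
                batch: int = 500, rettype: str = "fasta") -> Iterator[str]:
    """Yield EFetch URLs that page through the stored hit list in batches."""
    for start in range(0, count, batch):
        params = {
            "db": db,
            "WebEnv": web_env,      # server-side history session
            "query_key": query_key,  # which stored search to fetch from
            "retstart": start,
            "retmax": batch,
            "rettype": rettype,
            "retmode": "text",
        }
        yield f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    print(esearch_url("nucleotide", "Oryza sativa[organism]"))
    count, web_env, key = parse_esearch(SAMPLE)
    # Inspect the hit count before deciding to download anything.
    print(count)
    for url in efetch_urls("nucleotide", web_env, key, count):
        print(url)
```

Paging with retstart/retmax keeps each EFetch request bounded, which matters when the count turns out to be large.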

I can code this into Biofetch if no one has a burning desire to do it 
themselves.  Given my schedule it will probably happen Saturday.

Lincoln

> But I have not heard anything from you (again). However, I do have a
> response which perhaps you would consider, or post if you think it is
> sufficient.
>
> Basically, we have purposely separated searching from fetching. This is
> because very different tools with very different demands perform those two
> functions. When you want only one of them, we do not want to have to invoke
> the other.
>
> That said, we are well aware that it is convenient to do the search and
> retrieve the result as fetched data with minimal effort. However, we would
> like people to check the search result before downloading, say, a million
> records.
>
> What we did was provide a URL argument called "usehistory", which tells
> the NCBI server to store the hit list and return a key to it. You can then
> use that key to fetch the sequences. We put together a little sample Perl
> code to do this; the actual calls are just a couple of lines. You can see
> it at the bottom of the documentation for the E-utilities at
> http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
>
> Does that fill the bill?
>
>    Jim
>
> > From: Lincoln Stein <lstein@cshl.org>
> > To: Jim Ostell <ostell@ncbi.nlm.nih.gov>
> > Subject: Re: Chinese Rice Genome
> > Date: Tue, 18 Jun 2002 13:44:35 -0400
>
> > Hi Jim,
> >
> > I reported it to the Beijing group directly and later indirectly via
> > intermediaries after they did not respond to me.  I had hoped that they
> > would fix it themselves.
> >
> > Please do not mention me when you discuss this with the Beijing group.  I
> > do not want to come across as a tattle-tale.
> >
> > Lincoln
> >
> > On Monday 17 June 2002 08:32 am, you wrote:
> > > Thanks for the information Lincoln. We will follow up on this. Did you
> > > report this to someone here and we didn't take action to correct it?
> > > This is a serious problem with the usability of this data and we
> > > certainly don't want to leave it out there if this is indeed the case.
> > >
> > > We did check the sequence they sent us, and there are no XXX in it. We
> > > released what they sent.
> > >
> > >   Jim
> > >
> > > > From: Lincoln Stein <lstein@cshl.org>
> > > > To: Jim Ostell <ostell@ncbi.nlm.nih.gov>
> > > > Subject: Re: Chinese Rice Genome
> > > > Date: Fri, 14 Jun 2002 17:20:21 -0400
> > >
> > > > Hi Jim,
> > > >
> > > > Yes, by comparing the GenBank entries with the entries downloadable
> > > > from the Beijing site, we've found that they spliced out the
> > > > repeat-masked areas.  Areas that are shown as XXXX after their repeat
> > > > masking are simply deleted in the GenBank entries.
> > > >
> > > > I don't think that GenBank did this; I think the Chinese group did
> > > > it in the submission.
> > > >
> > > > The total length of the Beijing sequence entries is much less than
> > > > the length published in their paper.
> > > >
> > > > Hope this is helpful,
> > > >
> > > > Lincoln
> > > >
> > > > On Friday 14 June 2002 09:58, Jim Ostell wrote:
> > > > > Hi Lincoln,
> > > > >
> > > > > I recently heard a comment about the Chinese rice genome submission
> > > > > to GenBank that was attributed to you. I didn't fully understand
> > > > > it, so I hope you could give me more information.
> > > > >
> > > > > What was said to me was that you felt the Chinese rice genome
> > > > > sequence was not particularly useful because regions of it had been
> > > > > masked with XXXX, that the masking was left in when it was
> > > > > submitted to GenBank, and that we converted the masked regions to
> > > > > gaps and collapsed the sequence.
> > > > >
> > > > > Does this sound familiar? Could you tell me any more details?
> > > > >
> > > > >    thanks,
> > > > >    Jim
> > > > >
> > > > > _/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
> > > > >
> > > > >     James Ostell, Ph.D.
> > > > >     Chief, Information Engineering Branch
> > > > >     National Center For Biotechnology Information
> > > > >     Bldg 38A, NIH
> > > > >     8600 Rockville Pike
> > > > >     Bethesda, MD 20894
> > > > >     USA
> > > > >     301-435-5978
> > > > >     301-480-9241  FAX
> > > > >     ostell@ncbi.nlm.nih.gov
> > > >
> > > > --
> > > > ======================================================================
> > > > Lincoln D. Stein                         Cold Spring Harbor Laboratory
> > > > lstein@cshl.org                                 Cold Spring Harbor, NY
> > > > ======================================================================