[Open-bio-l] Re: network USA
Jason Stajich
jason@cgt.mc.duke.edu
Wed, 17 Apr 2002 20:21:12 -0400 (EDT)
We've tried to piece together a standard for sequence access called OBDA
(obda.open-bio.org) at the Open Bio Hackathons earlier this month. Things
of course haven't progressed a whole lot since we were all able to sit in
the same room. However, we made a lot of noise about the NCBI not
playing nicely with respect to providing simple HTTP access to data, unlike
the EBI http://www.ebi.ac.uk/cgi-bin/dbfetch system, which lets you provide
an accession, format, and database in the URL.
As part of this we've designed a bio-registry project which would list
database resources that essentially comply with a standard of sorts. Like
I said, things have
been a bit slow, but I'd like to see EMBOSS utilize this standard too if
possible. Happy to discuss more details on our mailing list -
open-bio-l@open-bio.org if you are interested.
http://open-bio.org/mailman/listinfo/open-bio-l
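
To make the dbfetch comparison concrete, here is a minimal sketch of how
such a URL could be assembled from a database, accession, and format. The
query parameter names (db, id, format), the helper name dbfetch_url, and
the example accession are assumptions for illustration; check the dbfetch
documentation for the exact interface.

```python
from urllib.parse import urlencode

# Base URL as given in this message.
DBFETCH_BASE = "http://www.ebi.ac.uk/cgi-bin/dbfetch"


def dbfetch_url(db, accession, fmt="fasta"):
    """Build a dbfetch-style URL (hypothetical helper).

    The parameter names db, id and format are assumed here;
    they are not spelled out in this thread.
    """
    query = urlencode({"db": db, "id": accession, "format": fmt})
    return f"{DBFETCH_BASE}?{query}"


# Example accession chosen purely for illustration.
print(dbfetch_url("embl", "J00231"))
```

The point is that everything needed to fetch a record fits in three short
fields, which is what a registry entry would have to describe.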
-jason
On Wed, 17 Apr 2002, David Mathog wrote:
> Today I finally realized that the NCBI's PmFetch cgi
>
> http://www.ncbi.nlm.nih.gov:80/entrez/utils/pmfetch_help.html
>
> can be used to retrieve data via gi using a "simple" URL like this:
>
> wget -O dmwhite.genbank \
> 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=10873&report=gen&mode=text'
>
> Unfortunately it seems not to be able to retrieve by either accession
> number or locus name - I'm still waiting to hear if there is some other
> NCBI interface for that.
>
> Which is a long way of coming around to considering how a USA could be
> used to retrieve remote sequences without exposing end users to truly
> hideous constructs. The semantics of accessing arbitrary network
> databases are probably much too complex to include in the USA but one
> can imagine burying these details under new types of "database" entries
> in the defaults file. Something like this:
>
> DB gigenbank [
> method: remoteurlbyid
> comment: "GENBANK at NCBI by gi number"
> format: -
> dir: -
> file: -
> type: N
> #optional
> target: 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=$ID&report=gen&mode=text'
> filter: 'wget -O - $target'
> ]
>
> Which would then allow something like this to work transparently:
>
> % seqret gigenbank:10873
>
> The USA already has the "program" option but I think in a situation like
> this it's much too complex to actually use. How many users are going to
> be able to successfully negotiate this:
>
> % seqret -sequence=fasta::"wget -O -
> 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=10873&report=fasta&mode=text'
> |" -filter
>
> Anyway, what I'm proposing is that the database definition be extended
> slightly to allow remote access methods. This would be particularly
> helpful for people running EMBOSS on their own PCs or Macs, who tend not
> to have large local databases installed.
>
> Regards,
>
> David Mathog
> mathog@caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
>
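
The lookup that the proposed gigenbank entry implies can be sketched
roughly as follows: split the USA into database name and entry ID, then
substitute the ID into the target template. This is only an illustration
of the idea, not EMBOSS code; the "remoteurlbyid" method is David's
proposal, and the function names below are hypothetical.

```python
from string import Template
from urllib.parse import quote

# Target template copied from the proposed "gigenbank" database entry.
TARGET = ("http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi"
          "?db=Nucleotide&id=$ID&report=gen&mode=text")


def parse_usa(usa):
    """Split a 'dbname:entry' USA into its two parts (hypothetical helper)."""
    db, _, entry = usa.partition(":")
    return db, entry


def expand_target(template, identifier):
    """Replace $ID in the template with the URL-quoted identifier."""
    return Template(template).substitute(ID=quote(identifier, safe=""))


db, entry = parse_usa("gigenbank:10873")
print(db, expand_target(TARGET, entry))
```

The resulting URL would then be handed to whatever the "filter" line
names (wget in the example), so the user only ever types gigenbank:10873.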
--
Jason Stajich
Duke University
jason@cgt.mc.duke.edu