[Open-bio-l] Re: network USA

Jason Stajich jason@cgt.mc.duke.edu
Wed, 17 Apr 2002 20:21:12 -0400 (EDT)


We've tried to piece together a standard for sequence access called OBDA
(obda.open-bio.org) at the Open Bio Hackathons earlier this month. Of
course, things haven't progressed much since we were last all in the same
room. However, we made a lot of noise about NCBI's failure to play nicely
with respect to providing simple HTTP access to data, unlike the EBI
http://www.ebi.ac.uk/cgi-bin/dbfetch system, which lets you provide an
accession, format, and database in the URL.
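For comparison, a dbfetch request carries everything it needs in the query
string. Below is a minimal sketch in sh of building such a URL. It assumes the
CGI accepts db, id, format and style parameters; those parameter names, and
the accession J00231 used in the example, are assumptions for illustration
and are not stated anywhere in this thread.

```shell
# Minimal sketch: build an EBI dbfetch URL from a database name, an
# accession and an output format.
# ASSUMPTION: the CGI accepts db, id, format and style parameters;
# check the dbfetch help page before relying on these names.
DBFETCH_BASE='http://www.ebi.ac.uk/cgi-bin/dbfetch'

dbfetch_url() {
    # $1 = database (e.g. embl), $2 = accession, $3 = format (e.g. fasta)
    # No URL-encoding is done; plain accession numbers do not need it.
    printf '%s?db=%s&id=%s&format=%s&style=raw\n' "$DBFETCH_BASE" "$1" "$2" "$3"
}

# Example (needs network access):
#   wget -O - "$(dbfetch_url embl J00231 fasta)"
```

The point is that the whole request fits in one URL template, which is what
makes a simple, registry-style description of a remote database possible.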

As part of this, we've designed a bio-registry project, which would list
database resources that essentially comply with a standard of sorts. Like I
said, things have been a bit slow, but I'd like to see EMBOSS use this
standard too, if possible. I'm happy to discuss more details on our mailing
list, open-bio-l@open-bio.org, if you are interested.

http://open-bio.org/mailman/listinfo/open-bio-l


-jason

On Wed, 17 Apr 2002, David Mathog wrote:

> Today I finally realized that NCBI's PmFetch CGI
>
>   http://www.ncbi.nlm.nih.gov:80/entrez/utils/pmfetch_help.html
>
> can be used to retrieve data via gi using a "simple" URL like this:
>
> wget -O dmwhite.genbank \
> 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=10873&report=gen&mode=text'
>
> Unfortunately it does not seem to be able to retrieve by either accession
> number or locus name; I'm still waiting to hear whether there is some
> other NCBI interface for that.
>
> All of which is a long way of getting around to how a USA (Uniform
> Sequence Address) could be used to retrieve remote sequences without
> exposing end users to truly hideous constructs. The semantics of accessing
> arbitrary network databases are probably far too complex to include in
> the USA itself, but one can imagine burying these details under new types
> of "database" entries in the defaults file, something like this:
>
> DB gigenbank [
>   method: remoteurlbyid
>   comment: "GENBANK at NCBI by gi number"
>   format: -
>   dir: -
>   file: -
>   type: N
> #optional
>   target: 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=$ID&report=gen&mode=text'
>   filter: 'wget -O - $target'
> ]
>
> That would then allow a command like this to work transparently:
>
> % seqret gigenbank:10873
>
> The USA already has the "program" option, but in a situation like this I
> think it's much too complex to actually use. How many users are going to
> be able to negotiate this successfully?
>
> % seqret -sequence=fasta::"wget -O - 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=10873&report=fasta&mode=text' |" -filter
>
> Anyway, what I'm proposing is that the database definition be extended
> slightly to allow remote access methods. This would be particularly
> helpful for people running EMBOSS on their own PCs or Macs, who tend not
> to have large local databases installed.
>
> Regards,
>
> David Mathog
> mathog@caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
>
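The proposed remoteurlbyid method could be prototyped outside EMBOSS as a
small sh wrapper. This is a hypothetical sketch, not EMBOSS code: the
function names are invented, lookup_target hard-codes the gigenbank entry in
place of a real defaults-file parser, and resolve_usa only handles the plain
dbname:id form of a USA. It splits the USA, substitutes the identifier for
$ID in the target URL, and then runs the equivalent of the
'wget -O - $target' filter.

```shell
# Hypothetical prototype of a "remoteurlbyid" database method.
# lookup_target stands in for reading "DB ... [ target: ... ]" entries
# from the EMBOSS defaults file; only gigenbank is known here.
lookup_target() {
    case $1 in
        gigenbank)
            printf '%s\n' 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=$ID&report=gen&mode=text'
            ;;
        *)
            return 1
            ;;
    esac
}

# Replace the first literal $ID in a URL template with an identifier.
expand_target() {
    template=$1
    id=$2
    case $template in
        *\$ID*)
            pre=${template%%\$ID*}   # text before the first $ID
            post=${template#*\$ID}   # text after the first $ID
            printf '%s%s%s\n' "$pre" "$id" "$post"
            ;;
        *)
            printf '%s\n' "$template"
            ;;
    esac
}

# Turn a USA of the plain form dbname:id into a URL.
resolve_usa() {
    usa=$1
    case $usa in
        *:*) ;;
        *) echo "resolve_usa: expected dbname:id, got '$usa'" >&2; return 1 ;;
    esac
    db=${usa%%:*}
    id=${usa#*:}
    template=$(lookup_target "$db") || {
        echo "resolve_usa: unknown database '$db'" >&2
        return 1
    }
    expand_target "$template" "$id"
}

# Equivalent of the proposed filter 'wget -O - $target'.
fetch_usa() {
    target=$(resolve_usa "$1") || return 1
    wget -q -O - "$target"
}
```

With network access, "fetch_usa gigenbank:10873 > dmwhite.genbank" requests
the same pmfetch URL as David's wget example. Keeping the URL as a template
means another remote database is just another lookup entry (in the proposal,
another DB block) rather than new code.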

-- 
Jason Stajich
Duke University
jason@cgt.mc.duke.edu