network USA
David Mathog
mathog at mendel.bio.caltech.edu
Wed Apr 17 19:05:47 UTC 2002
Today I finally realized that NCBI's PmFetch CGI
http://www.ncbi.nlm.nih.gov:80/entrez/utils/pmfetch_help.html
can be used to retrieve data via gi using a "simple" URL like this:
wget -O dmwhite.genbank \
'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=10873&report=gen&mode=text'
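For anyone who would rather build that URL in a script than type it, here
is a minimal Python sketch. The base URL and the four query parameters are
the ones in the wget example above; the helper name pmfetch_url and its
defaults are made up for illustration. It only assembles the string and
does not contact NCBI.

```python
from urllib.parse import urlencode

# Base URL as given in the wget example; everything else here is illustrative.
PMFETCH_BASE = "http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi"


def pmfetch_url(gi, report="gen", db="Nucleotide"):
    """Build a PmFetch URL for one gi number (string building only, no network access)."""
    # urlencode keeps the insertion order of the dict and escapes the values.
    query = urlencode({"db": db, "id": gi, "report": report, "mode": "text"})
    return f"{PMFETCH_BASE}?{query}"


if __name__ == "__main__":
    print(pmfetch_url(10873))
```

Passing report="fasta" gives the FASTA form of the same request.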
Unfortunately it does not seem to be able to retrieve by either accession
number or locus name. I'm still waiting to hear if there is some other
NCBI interface for that.
Which is a long way of coming around to considering how a USA could be
used to retrieve remote sequences without exposing end users to truly
hideous constructs. The semantics of accessing arbitrary network
databases are probably much too complex to include in the USA, but one
can imagine burying these details under new types of "database" entries
in the defaults file. Something like this:
DB gigenbank [
method: remoteurlbyid
comment: "GENBANK at NCBI by gi number"
format: -
dir: -
file: -
type: N
#optional
target: 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=$ID&report=gen&mode=text'
filter: 'wget -O - $target'
]
Which would then allow something like this to work transparently:
% seqret gigenbank:10873
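To make the intended expansion concrete, here is a rough Python sketch of
what a "remoteurlbyid" method might do with such an entry. None of this
exists in EMBOSS: the dictionary layout, the resolve() function and the
numeric check on the id are all assumptions of mine. The idea is just that
$ID is substituted into the target, and the result is substituted for
$target in the filter.

```python
import shlex
from string import Template

# Hypothetical in-memory form of the proposed "gigenbank" entry.
GIGENBANK = {
    "method": "remoteurlbyid",
    "target": "http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi"
              "?db=Nucleotide&id=$ID&report=gen&mode=text",
    "filter": "wget -O - $target",
}

DATABASES = {"gigenbank": GIGENBANK}


def resolve(usa):
    """Turn 'dbname:id' into the argument list of the command that would fetch the entry."""
    dbname, entry_id = usa.split(":", 1)
    db = DATABASES[dbname]
    # Assumption: gi numbers are purely numeric, so reject anything else.
    if not entry_id.isdigit():
        raise ValueError(f"not a gi number: {entry_id!r}")
    url = Template(db["target"]).substitute(ID=entry_id)
    # Split the filter into words first, then fill in $target, so the URL
    # stays a single argument and never passes through a shell.
    return [Template(word).substitute(target=url)
            for word in shlex.split(db["filter"])]


if __name__ == "__main__":
    print(resolve("gigenbank:10873"))
```

The returned list could be handed to something like subprocess.run() and
the output read as the sequence, which is all the user-visible command
above would need.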
The USA already has the "program" option but I think in a situation like
this it's much too complex to actually use. How many users are going to
be able to successfully negotiate this:
% seqret -sequence=fasta::"wget -O -
'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=10873&report=fasta&mode=text'
|" -filter
Anyway, what I'm proposing is that the database definition be extended
slightly to allow remote access methods. This would be particularly
helpful for people running EMBOSS on their own PCs or Macs, who tend not
to have large local databases installed.
Regards,
David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech