[EMBOSS] accessing remote databases

Hamish McWilliam hpm at ebi.ac.uk
Thu Feb 13 17:01:29 UTC 2014


Hi Richard,

[Pushing the thread back over to the list so other interested parties
can participate]

> I was wondering if you could provide a new emboss.default file that defines
> access to UniProt and Swiss-Prot (or at least the code to insert in my
> current file). Does the latest version of EMBOSS come with this by
> default? I am running 6.3.1 with the Ubuntu 12.04 OS.

Data server support was added in EMBOSS 6.4.0, along with the associated
default server definitions. These include a number of servers which
provide access to UniProtKB or UniProtKB/SwissProt. Newer EMBOSS
versions are available in more recent Ubuntu versions, and the next
Ubuntu LTS will likely provide EMBOSS 6.6.0.

For older versions of EMBOSS without server support, you can find a set
of EMBOSS database definitions for databases available via dbfetch at:

http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/emboss4.databases

For very old versions (i.e. before EMBOSS 4.0.0), which do not
support the 'dbfetch' access method, you could use the 'url'-based
equivalents described at:

http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/emboss1.databases

These dbfetch pages provide definitions for all the sequence databases
available via dbfetch (see
http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/dbfetch.databases) including
EMBL-Bank, EMBLCDS, UniParc, UniProtKB, and the UniRef databases.
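To show roughly what a dbfetch-based definition requests, here is a minimal Python sketch that builds a dbfetch URL for a single entry. It assumes dbfetch's `db`, `id`, `format` and `style` query parameters and the base URL `http://www.ebi.ac.uk/Tools/dbfetch/dbfetch`; the helper names are hypothetical, and the exact request EMBOSS sends may differ.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed dbfetch endpoint (same host and path as the definition pages above).
DBFETCH_BASE = "http://www.ebi.ac.uk/Tools/dbfetch/dbfetch"


def dbfetch_url(db: str, entry_id: str, fmt: str = "default", style: str = "raw") -> str:
    """Build a dbfetch request URL for one entry (hypothetical helper)."""
    # urlencode keeps the dict's insertion order and escapes unsafe characters.
    query = urlencode({"db": db, "id": entry_id, "format": fmt, "style": style})
    return f"{DBFETCH_BASE}?{query}"


def fetch_entry(db: str, entry_id: str) -> str:
    """Download one entry as plain text (needs network access)."""
    with urlopen(dbfetch_url(db, entry_id)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URL; no network request is made here.
    print(dbfetch_url("embl", "L12345"))
```

Keeping URL construction separate from the download makes it easy to check the request before sending it.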

For direct access to UniProtKB data from http://www.uniprot.org/ you can
use something like:

# UniProtKB
DB uniprot [
        type: P
        comment: "UniProtKB (UniProt.org)"
        method: url
        format: swiss
        url: "http://www.uniprot.org/uniprot/%s.txt"
        fields: "id acc"
]

to get basic entry look-up by entry name or accession.
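The `url` access method in the definition above substitutes the query identifier for `%s` in the URL template. A minimal Python sketch of that substitution, plus fetching the result, might look like this; `entry_url` and `fetch_swiss_entry` are hypothetical names, and the template is copied from the definition above (UniProt's current services may redirect it or use a different host).

```python
from urllib.parse import quote
from urllib.request import urlopen

# URL template from the EMBOSS definition above; %s is replaced by the query ID.
UNIPROT_TEMPLATE = "http://www.uniprot.org/uniprot/%s.txt"


def entry_url(entry_id: str, template: str = UNIPROT_TEMPLATE) -> str:
    """Expand the URL template for one entry name or accession."""
    # quote() escapes any characters that are not safe in a URL path segment.
    return template % quote(entry_id, safe="")


def fetch_swiss_entry(entry_id: str) -> str:
    """Return the Swiss-Prot-format text for one entry (needs network access)."""
    with urlopen(entry_url(entry_id)) as response:
        return response.read().decode("utf-8")
```

Entry names (e.g. HBA_HUMAN) and accessions (e.g. P69905) expand the same way, matching the `id` and `acc` fields declared in the definition.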

For more complex search options see the UniProt.org Web Service
documentation: http://www.uniprot.org/faq/28

> In the old setup I had a script (not written by myself!) that allowed me 
> to pull taxonomies from uniprot/swissprot accessions.

Depending on how the script was implemented and exactly what it did, one
of the options detailed above may provide a suitable data source
replacement. Alternatively, many of the public SRS servers provide
UniProtKB, so you could just switch to one of them.
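If the old script mainly extracted organism information, one possible replacement is to parse the taxonomy lines of each Swiss-Prot-format entry returned by any of the sources above. This Python sketch assumes the standard UniProtKB flat-file line codes (`OS` organism species, `OC` organism classification, `OX` taxonomy cross-reference such as `NCBI_TaxID=9606;`). `parse_taxonomy` is a hypothetical helper, not part of EMBOSS, and is not a reconstruction of the original script.

```python
from typing import List, Tuple


def parse_taxonomy(entry_text: str) -> Tuple[str, List[str], List[str]]:
    """Return (species, lineage, NCBI taxon IDs) from one UniProtKB flat-file entry."""
    species_parts: List[str] = []
    lineage_parts: List[str] = []
    taxon_ids: List[str] = []
    for line in entry_text.splitlines():
        # Flat-file lines start with a two-letter code; the data begins at column 6.
        code, body = line[:2], line[5:].strip()
        if code == "OS":
            species_parts.append(body)
        elif code == "OC":
            lineage_parts.append(body)
        elif code == "OX":
            # e.g. "NCBI_TaxID=9606;", possibly followed by an evidence tag.
            for item in body.split(";"):
                item = item.strip()
                if item.startswith("NCBI_TaxID="):
                    taxon_ids.append(item.split("=", 1)[1].split()[0])
    # OS and OC values may wrap over several lines and end with a full stop.
    species = " ".join(species_parts).rstrip(".")
    lineage_text = " ".join(lineage_parts).rstrip(".")
    lineage = [taxon.strip() for taxon in lineage_text.split(";") if taxon.strip()]
    return species, lineage, taxon_ids
```

The function takes plain entry text, so it works the same whether the entry came from dbfetch, an SRS server, or the `uniprot` definition above.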

All the best,

Hamish

> Thanks!
> 
> Richard Rothery
> 
> On 14-02-13 07:46 AM, Hamish McWilliam wrote:
>> Hi Iddo,
>>
>>> Since the SRS server at EBI was retired, I am looking for other remote
>>> databases to access via EMBOSS. The DKFZ server seems to do a mostly good
>>> job (although slow from where I'm at):
>>>
>>> http://www.dkfz.de/menu/cgi-bin/srs7.1.3.1/wgetz
>>>
>>> However, I was wondering how to access GenBank via EMBOSS (through any
>>> protocol). What would be the entry in .embossrc?
>>>
>>> Also, are there SRS servers I can use in N. America that would hopefully be
>>> faster?
>> For details of public SRS servers, see the "Public SRS Installations" at:
>>
>> http://bioblog.instem.com/download/srs-parser-and-software-downloads/public-srs-installations/
>>
>> Current versions of EMBOSS come with a number of data sources configured
>> which are accessed via the data server support. You can see details of
>> the configured servers using the showserver command:
>>
>> http://emboss.sourceforge.net/apps/release/6.6/emboss/apps/showserver.html
>>
>> And you can access entries via these services by using a slightly
>> extended USA (Uniform Sequence Address) that specifies the server as
>> well as the database, for example:
>>
>> entret -stdout -auto dbfetch:embl:L12345
>>
>> to get EMBL-Bank data from dbfetch, or to get the same entry from NCBI
>> Entrez:
>>
>> entret -stdout -auto entrez:nucleotide:L12345
>>
>> Since NCBI's GenBank is part of the INSDC (http://www.insdc.org/), the
>> data in GenBank is also available in ENA EMBL-Bank and DDBJ. So you
>> could use the existing server definitions containing EMBL-Bank or DDBJ.
>>
>> Alternatively you can define your own (see
>> http://emboss.open-bio.org/html/adm/ch04s01.html) to access GenBank via
>> NCBI's E-Utilities (http://eutils.ncbi.nlm.nih.gov/), for example:
>>
>> # NCBI GenBank+RefSeq via NCBI Entrez
>> DB nucleotide [
>>     type: nucleotide
>>     method: entrez
>>     format: genbank
>> ]
>>
>> Since NCBI have also recently released command-line clients for their
>> E-Utilities Web Services
>> (http://www.ncbi.nlm.nih.gov/news/02-06-2014-entrez-direct-released/)
>> another option would be to use these directly or wrap them as EMBOSS
>> database definitions for your commonest queries.
>>
>> All the best,
>>
>> Hamish
> 
> 
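To make the GenBank options in the quoted reply more concrete, here is a minimal Python sketch that builds (a) the `entret` command line for a server-qualified USA like `entrez:nucleotide:L12345` and (b) an NCBI E-utilities `efetch` request for the same GenBank record, roughly the kind of request an `entrez` access method has to make. The helper names are hypothetical; running `entret` assumes EMBOSS 6.4.0 or later on the `PATH`, and `db`, `id`, `rettype=gb` and `retmode=text` are NCBI's efetch parameters rather than anything EMBOSS exposes.

```python
import subprocess
from typing import List
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint.
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def entret_command(server: str, db: str, entry_id: str) -> List[str]:
    """Build an entret call for a server-qualified USA (server:database:id)."""
    return ["entret", "-stdout", "-auto", f"{server}:{db}:{entry_id}"]


def run_entret(server: str, db: str, entry_id: str) -> str:
    """Run entret and return the entry text (requires EMBOSS with server support)."""
    result = subprocess.run(entret_command(server, db, entry_id),
                            capture_output=True, text=True, check=True)
    return result.stdout


def efetch_genbank_url(accession: str) -> str:
    """Build an efetch URL that returns a GenBank flat-file record as text."""
    query = urlencode({"db": "nucleotide", "id": accession,
                       "rettype": "gb", "retmode": "text"})
    return f"{EUTILS_EFETCH}?{query}"
```

Swapping the `server` argument (e.g. `dbfetch` with database `embl`) fetches the same INSDC record from a different provider without changing the rest of the code.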


-- 
============================================================
Mr Hamish McWilliam,
Web Production,
European Bioinformatics Institute (EMBL-EBI),
European Molecular Biology Laboratory,
Wellcome Trust Genome Campus,
Hinxton, Cambridge, CB10 1SD
United Kingdom

URL: http://www.ebi.ac.uk/
============================================================



