[EMBOSS] Batch retrieval of taxonomy/species names using entret.....

Peter Rice pmr at ebi.ac.uk
Tue Oct 31 18:53:00 UTC 2006

Hi Richard,

Richard Rothery wrote:
> I am interested in using entret to retrieve single field entries from
> swissprot or sptrembl. Specifically, I would like to feed entret a list
> of accessions and have it return a file with the species names and/or
> taxonomies. I intend to use this information to compare with my
> phylogeny analyses of clustalw alignments.

entret returns the full text of each entry; EMBOSS does not parse it into fields.

We could try to extract specific fields but it is not easy to define them for 
all formats.

You can do this with SRS. Try the EBI server for example:

Go to the library page

Select UniProtKB/SwissProt (or UniProtKB/TrEMBL)

Select "standard query form"

Enter your query in the top part (e.g. accession number)

In the "create a view" section click the "list" button to get the original 
lines. Select anything taxonomic from the pull-down list (control-click to 
select more than one).

Press "search".

Refine your query. Once you are happy with the results, the URL shown at the 
top can be used to retrieve the data.

Failing that, you could just parse out the ID and O* lines (OS, OC and so on) 
from the entret output using a simple Perl script.
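
As a rough illustration of that last option, here is a minimal sketch in Python 
rather than Perl. It assumes the entret output is in SwissProt/UniProt flat-file 
format, with a two-letter line code, the data starting at column 6, and "//" 
ending each entry. The function name parse_taxonomy is made up for this example 
and is not part of EMBOSS.

```python
def parse_taxonomy(lines):
    """Yield (entry_id, species, taxonomy) for each flat-file entry.

    Assumes SwissProt-style lines: a two-letter code, then data from
    column 6 onwards, with '//' terminating each entry.
    """
    entry_id, os_parts, oc_parts = None, [], []
    for line in lines:
        line = line.rstrip("\n")
        code, value = line[:2], line[5:].strip()
        if code == "ID":
            # The entry name is the first token on the ID line.
            fields = value.split()
            entry_id = fields[0] if fields else None
        elif code == "OS":
            # Species name; may continue over several OS lines.
            os_parts.append(value)
        elif code == "OC":
            # Taxonomic classification; usually spans several OC lines.
            oc_parts.append(value)
        elif code == "//":
            # End of entry: emit what was collected and reset.
            if entry_id is not None:
                species = " ".join(os_parts).rstrip(".")
                taxonomy = " ".join(oc_parts).rstrip(".")
                yield entry_id, species, taxonomy
            entry_id, os_parts, oc_parts = None, [], []


# Example use, assuming the entret output was saved to a file
# (the file name entries.txt is hypothetical):
#
#     with open("entries.txt") as handle:
#         for entry_id, species, taxonomy in parse_taxonomy(handle):
#             print(entry_id, species, taxonomy, sep="\t")
```

This prints one tab-separated line per entry, which should be easy to join 
against the sequence names in a clustalw alignment. Other O* lines (OX, OG) 
could be collected in the same way by adding further branches.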

Hope that helps,


