[EMBOSS] getting organism from accession

Tiwari, Bela btiwari at ceh.ac.uk
Fri Jun 18 12:07:09 UTC 2010


Dear all, 

I have a set of accession numbers and I want to retrieve the organism that each sequence is associated with, i.e. the content of the OS line in an EMBL file. I don't need the taxonomic ID, and I don't need to start traversing taxonomy trees. I want to do this by accessing remote databases (via SRS, as configured in my emboss.defaults file), rather than indexing databases locally. So the output I want would be a text mapping like:

accession : species    

where species is taken from the OS line of a database entry.
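
As an illustration of the mapping described above, here is a minimal Python sketch. It assumes the full entry text is already available in EMBL flat-file format (how it is fetched is outside the sketch), that the first accession on the AC line is the one wanted, and that OS lines spread over several lines can simply be joined with spaces. The function name accession_to_species is hypothetical, not part of EMBOSS.

```python
def accession_to_species(embl_text):
    """Map primary accession -> organism name from EMBL flat-file text.

    Assumption: each entry has an AC line and one or more OS lines,
    and entries are terminated by a line starting with '//'.
    """
    mapping = {}
    accession = None
    os_parts = []

    def flush():
        # Store the entry collected so far, if it had an accession.
        if accession is not None:
            mapping[accession] = " ".join(os_parts)

    for line in embl_text.splitlines():
        code = line[:2]
        if code == "AC" and accession is None:
            # AC lines hold semicolon-separated accessions; take the first.
            accession = line[2:].strip().split(";")[0].strip()
        elif code == "OS":
            os_parts.append(line[2:].strip())
        elif code == "//":
            flush()
            accession, os_parts = None, []

    flush()  # handle a final entry that lacks a '//' terminator
    return mapping


if __name__ == "__main__":
    sample = (
        "ID   XXXX; SV 1; linear; mRNA; STD; PLN; 100 BP.\n"
        "AC   XXXX;\n"
        "OS   Whateverus thingus\n"
        "//\n"
    )
    for acc, species in accession_to_species(sample).items():
        print(f"{acc} : {species}")
```

The sample entry is made up for the demonstration; real entries carry many more line types, which the sketch ignores.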

The closest I have come using EMBOSS is to get a GFF output file containing the feature information, using a command along the lines of:

seqret -feature embl:XXXX -oufo2 myfeat.txt

(embl is a database I can search using srs as configured in my emboss.defaults file.) The first non-hashed line in the file myfeat.txt contains the term 

organism="Whateverus thingus"

so I could parse that out. However, this file still contains a lot of extra (unwanted) information and requires parsing.
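
For reference, the parsing step mentioned above could be sketched in Python as follows. This is only a sketch under assumptions: it takes the feature file contents as a string, skips lines starting with '#', and returns the first organism= value it finds. The exact layout of the feature line is assumed rather than taken from EMBOSS documentation, and the function name organism_from_features is hypothetical.

```python
import re

# Matches organism="Some name" or organism=Some name, stopping at a
# quote, semicolon, tab or newline. The attribute layout is an assumption.
ORGANISM_RE = re.compile(r'organism="?([^";\t\n]+)"?')


def organism_from_features(gff_text):
    """Return the first organism value in non-comment lines, or None."""
    for line in gff_text.splitlines():
        if line.startswith("#") or not line.strip():
            continue  # skip hashed header lines and blank lines
        match = ORGANISM_RE.search(line)
        if match:
            return match.group(1).strip()
    return None


if __name__ == "__main__":
    # Made-up feature line for demonstration purposes only.
    sample = (
        "##gff-version 3\n"
        'XXXX\tEMBL\tsource\t1\t100\t.\t+\t.\torganism="Whateverus thingus"\n'
    )
    print("XXXX :", organism_from_features(sample))
```

Run over one feature file per accession, this would give the "accession : species" lines directly, at the cost of fetching the full feature table each time.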

Does anyone know if I'm missing something obvious in Emboss that I could use for this?

(I have tried the BioPerl route to get this info from the NCBI, and apart from being unwieldy, it returns the wrong organism for the type of identifier I have. No, I haven't spent time tracking down the problem. Frankly, I'd rather resolve it using EMBOSS and/or SRS calls.)

If there isn't anything that will do the job in Emboss at the moment, is there any chance I can put in a development request for an extra flag for seqret, or an extra utility tool that might accomplish this task?

cheers,

Bela

*************************
Dr. Bela Tiwari
Lead Bioinformatician
NERC Environmental
Bioinformatics Centre
http://nebc.nerc.ac.uk
tel: 01491 69 2705

Centre for Ecology and Hydrology
Maclean Bldg, Benson Lane
Crowmarsh Gifford
Wallingford, England
OX10 8BB
*************************