[EMBOSS] getting organism from accession

Tiwari, Bela btiwari at ceh.ac.uk
Fri Jun 18 15:05:07 UTC 2010


Thanks Richard. The script looks handy and, at a minimum, may help me search for my identifiers in more than one database.

cheers,

Bela


*************************
Dr. Bela Tiwari
Lead Bioinformatician
NERC Environmental
Bioinformatics Centre
http://nebc.nerc.ac.uk
tel: 01491 69 2705

Centre for Ecology and Hydrology
Maclean Bldg, Benson Lane
Crowmarsh Gifford
Wallingford, England
OX10 8BB
*************************
________________________________________
From: Richard Rothery [rrothery at ualberta.ca]
Sent: 18 June 2010 15:54
To: Tiwari, Bela
Cc: emboss at lists.open-bio.org
Subject: Re: [EMBOSS] getting organism from accession

I have a perl script written by Craig Knox of the University of Alberta
Bioinformatics help desk that does this. I have attached it FYI. It is
slow, but gets the job done.

Output can be fed to gnumeric etc.

Richard




On Fri, 2010-06-18 at 13:07 +0100, Tiwari, Bela wrote:
> Dear all,
>
> I have a set of accession numbers and I want to retrieve the organism that the sequence is associated with - i.e. the content of the OS line in an embl file.  I don't need the taxonomic id, and I don't need to start traversing taxonomy trees. I want to do this by accessing remote databases (via srs, as configured in my emboss.defaults file), rather than indexing databases locally.  So the output I want would be a text mapping like:
>
> accession : species
>
> where species is taken from the OS line of a database entry.
>
> The closest I've come using Emboss is to get the gff output file containing feature information, using a command along the lines of:
>
> seqret -feature embl:XXXX -oufo2 myfeat.txt
>
> (embl is a database I can search using srs as configured in my emboss.defaults file.) The first non-hashed line in the file myfeat.txt contains the term
>
> organism="Whateverus thingus"
>
> so I could parse that out. However, this file still contains a lot of extra (unwanted) information and requires parsing.
>
> Does anyone know if I'm missing something obvious in Emboss that I could use for this?
>
> (I have tried the BioPerl route to get this info from the NCBI, and apart from being unwieldy, I'm managing to get the wrong organism returned for the type of identifier I have. No, I haven't spent time tracking down the problem - frankly, I'd rather resolve it using Emboss and/or srs calls.)
>
> If there isn't anything that will do the job in Emboss at the moment, is there any chance I can put in a development request for an extra flag for seqret, or an extra utility tool that might accomplish this task?
>
> cheers,
>
> Bela
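
A minimal Python sketch of the "accession : species" mapping asked for above, for anyone who already has the full entries saved as EMBL-format text. It is not the Perl script attached to the reply, and it does not fetch anything from a remote database: it only reads the AC and OS lines of each entry. The function name organisms_from_embl and the sample entry (accession AB000001, organism "Whateverus thingus") are made up for illustration, and the sketch assumes that the first accession on the first AC line is the one wanted.

```python
def organisms_from_embl(text):
    """Return {accession: organism} for EMBL flat-file text.

    Assumes each entry ends with '//' and that the first accession
    on the first AC line is the primary one.
    """
    mapping = {}
    accession = None
    os_parts = []
    for line in text.splitlines():
        code = line[:2]
        if code == "AC" and accession is None:
            # AC lines look like 'AC   X00001; X00002;'
            accession = line[2:].split(";")[0].strip()
        elif code == "OS":
            # An entry can have more than one OS line; join them.
            os_parts.append(line[2:].strip())
        elif code == "//":
            if accession is not None:
                mapping[accession] = " ".join(os_parts)
            accession, os_parts = None, []
    # Flush a final entry that lacks the '//' terminator.
    if accession is not None:
        mapping[accession] = " ".join(os_parts)
    return mapping


# Made-up, abbreviated entry used only to exercise the function.
SAMPLE = """\
ID   AB000001; SV 1; linear; genomic DNA; STD; UNC; 10 BP.
XX
AC   AB000001;
XX
OS   Whateverus thingus
OC   unclassified.
//
"""

if __name__ == "__main__":
    for acc, organism in organisms_from_embl(SAMPLE).items():
        print(f"{acc} : {organism}")
```

Reading line codes rather than parsing feature tables keeps the output to exactly the OS content; the result can be written out as tab- or colon-separated text for a spreadsheet.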

