[EMBOSS] How to find protein sequences in a given genome using CDS information
Peter Rice
pmr at ebi.ac.uk
Wed Feb 4 09:21:37 UTC 2009
Nermin Celik wrote:
> Hi,
>
> I have the CDS section of a feature table and a genome of an organism.
> Which EMBOSS program will allow me to extract the coding regions defined
> in the CDS file from the genome and then translate them to protein
> sequences?
>
> Example of CDS file:
> FT CDS 166..231
> FT /systematic_id="ROD00001"
> FT CDS 313..2775
> FT /systematic_id="ROD00011"
> FT CDS 2778..3707
Ah, that highlights something we have been meaning to fix.
We have the application coderet which, in principle, reads a sequence and
its feature table and does exactly what you want.
Unfortunately the original author of coderet took a shortcut: it reads a
complete sequence database entry and parses the feature table embedded in
that entry, rather than reading feature data the way other EMBOSS
applications do. Not good.
However, what you can do is convert your genomic sequence and feature table
into an EMBL entry:
seqret -feature genomic.fasta -ufo embl::feature.table -osformat embl embl.entry
coderet embl.entry
GenBank entries also work in coderet.
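To make clear what the extract-and-translate step amounts to, here is a minimal Python sketch. It is not how coderet is implemented. It assumes forward-strand CDS lines of the simple form "FT CDS start..end" (no complement() or join() locations), 1-based inclusive coordinates, and the standard genetic code. The function names parse_cds_ranges, translate and cds_proteins are made up for this illustration.

```python
import re
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumption: only simple forward-strand locations such as "FT CDS 166..231".
CDS_LINE = re.compile(r"^FT\s+CDS\s+(\d+)\.\.(\d+)\s*$")


def parse_cds_ranges(feature_text):
    """Return (start, end) pairs, 1-based inclusive, for each simple CDS line."""
    ranges = []
    for line in feature_text.splitlines():
        match = CDS_LINE.match(line.strip())
        if match:
            ranges.append((int(match.group(1)), int(match.group(2))))
    return ranges


def translate(dna):
    """Translate DNA in frame 1; unknown codons become 'X', stops become '*'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def cds_proteins(genome, feature_text):
    """Slice each CDS out of the genome string and translate it."""
    return [translate(genome[start - 1:end])
            for start, end in parse_cds_ranges(feature_text)]
```

For a real genome, with reverse-strand or multi-exon features, the seqret plus coderet route above is the safer choice, since this sketch ignores those cases entirely.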
We will be working on coderet to fix this and read feature data normally.
Any other suggestions for improvements are welcome.
regards,
Peter Rice