[EMBOSS] How to find protein sequences in a given genome using CDS information
Rodrigo Lopez
rls at ebi.ac.uk
Wed Feb 4 09:32:15 UTC 2009
Hi Nermin,
To complement Guy's reply: you could also use the EMBLCDS database,
which contains all CDSs in EMBL-Bank (soon to be called ENA, the
European Nucleotide Archive). It is available via the EBI's ftp server
at pub/databases/embl/cds. The identifiers in this database correspond
to the protein_id qualifier in the EMBL-Bank Feature Table, which maps
each CDS to its corresponding protein translation. These in turn can be
identified in UniProtKB. Please see the README.txt file at:
ftp.ebi.ac.uk/pub/databases/embl/cds/README.txt
for further details.
Further to the above, and depending on the proteome in question, you
could have a look at the integr8 directory on the ftp server as well:
ftp.ebi.ac.uk/pub/databases/integr8
There you will find the proteomes of more than 1600 organisms, mainly
bacteria and archaea, but also human, rat, mouse, etc.
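
As for the original question (cutting the CDS ranges out of the genome
and translating them), the idea can be illustrated with a minimal sketch
in plain Python. This is not an EMBOSS program, and all function names
below are made up for illustration. It assumes the genome is held as one
plain string, that each CDS location fits on a single "FT CDS" line, that
locations are simple ranges, complement(a..b) or join(...), and that the
standard genetic code applies. Anything more complex (multi-line
locations, complements nested inside a join) would need a real feature
table parser.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# the organism uses the standard amino acid assignments).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

RANGE = re.compile(r"(\d+)\.\.(\d+)")


def parse_cds_locations(ft_text):
    """Collect the location string from every 'FT CDS <location>' line."""
    locations = []
    for line in ft_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "FT" and fields[1] == "CDS":
            locations.append(fields[2])
    return locations


def extract_cds(genome, location):
    """Cut out a CDS using 1-based inclusive coordinates."""
    parts = [genome[int(s) - 1:int(e)] for s, e in RANGE.findall(location)]
    seq = "".join(parts).upper()
    if location.startswith("complement"):
        # Reverse-complement the whole (possibly joined) region.
        seq = seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    return seq


def translate(cds):
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X for ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # Toy genome and feature table, invented for this example.
    genome = "CCATGGCCTAAGGTCATTTCAT"
    ft_text = (
        "FT CDS 3..11\n"
        'FT /systematic_id="TOY00001"\n'
        "FT CDS complement(14..22)\n"
    )
    for loc in parse_cds_locations(ft_text):
        print(loc, translate(extract_cds(genome, loc)))
    # 3..11 MA
    # complement(14..22) MK
```

The qualifier lines (/systematic_id and so on) are simply skipped here;
to label each protein you would also have to carry the qualifier that
follows each CDS line along with its location.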
R:)
Nermin Celik wrote:
> Hi,
>
> I have the CDS section of a feature table and a genome of an organism.
> Which EMBOSS program will allow me to extract the coding regions defined
> in the CDS file from the genome and then translate them to protein
> sequences?
>
> Example of CDS file:
> FT CDS 166..231
> FT /systematic_id="ROD00001"
> FT CDS 313..2775
> FT /systematic_id="ROD00011"
> FT CDS 2778..3707
>
> Thank you.
> Nermin
>
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss