[EMBOSS] How to find protein sequences in a given genome using CDS information

Magdy Alabady malabady at gmail.com
Wed Feb 4 12:27:24 UTC 2009


Hi,
How about using fastacmd? If your genome sequence is formatted as a BLAST
database with formatdb (part of the NCBI BLAST package), you can use fastacmd
to retrieve any region you want:

fastacmd -d (your formatted genome db) -s (sequence name) -L (start),(stop)
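
fastacmd only returns the nucleotide region, so the translation step still has to be done separately. As a rough illustration of the whole task from the original question (read the CDS lines, cut the regions out of the genome, translate them), here is a minimal Python sketch. It assumes simple forward-strand locations of the form start..stop, as in the example CDS file, and the standard genetic code; complement(...) and join(...) locations are skipped. The function names are made up for this example and are not part of EMBOSS or BLAST.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption:
# the organism may need a different translation table).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Matches only simple locations such as "FT   CDS             166..231".
CDS_LINE = re.compile(r"^FT\s+CDS\s+(\d+)\.\.(\d+)\s*$")


def parse_cds_ranges(feature_text):
    """Return (start, stop) pairs, 1-based and inclusive, from FT CDS lines."""
    ranges = []
    for line in feature_text.splitlines():
        match = CDS_LINE.match(line)
        if match:
            ranges.append((int(match.group(1)), int(match.group(2))))
    return ranges


def translate(dna):
    """Translate DNA codon by codon; unknown codons become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


def cds_proteins(genome, feature_text):
    """Return (start, stop, protein) for each simple CDS location."""
    return [
        (start, stop, translate(genome[start - 1:stop]))
        for start, stop in parse_cds_ranges(feature_text)
    ]
```

For example, cds_proteins(genome, open("cds.tab").read()) would give one tuple per CDS, where genome is the plain genome sequence as a string and cds.tab is a hypothetical file holding the FT lines.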



On Wed, Feb 4, 2009 at 3:32 AM, Rodrigo Lopez <rls at ebi.ac.uk> wrote:

> Hi Nermin,
>
> To complement Guy's reply: You could also use the EMBLCDS database, which
> contains all CDSs in EMBL-Bank (soon to be called ENA = European
> Nucleotide Archive). It is available via the EBI's ftp server at
> pub/databases/embl/cds. The identifiers in this database correspond to the
> protein_id qualifier in the EMBL-Bank Feature Table, which maps each CDS to
> its corresponding protein translation. These in turn can be identified in
> UniProtKB. Please see the README.txt file at:
>
> ftp.ebi.ac.uk/pub/databases/embl/cds/README.txt
>
> for further details.
>
> Further to the above, and depending on the proteome in question, you could
> have a look at the integr8 directory on the ftp server as well:
>
> ftp.ebi.ac.uk/pub/databases/integr8
>
> In here you will find the proteomes of more than 1600 organisms, mainly
> bacteria and archaea, but also human, rat, mouse, etc.
>
> R:)
>
>
>
> Nermin Celik wrote:
>
>> Hi,
>>
>> I have the CDS section of a feature table and a genome of an organism.
>> Which EMBOSS program will allow me to extract the coding regions defined
>> in the CDS file from the genome and then translate them to protein
>> sequences?
>>
>> Example of CDS file:
>> FT   CDS             166..231
>> FT                   /systematic_id="ROD00001"
>> FT   CDS             313..2775
>> FT                   /systematic_id="ROD00011"
>> FT   CDS             2778..3707
>>
>> Thank you.
>> Nermin
>>
>> _______________________________________________
>> EMBOSS mailing list
>> EMBOSS at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/emboss
>>



-- 
--------------------------------------------------------
Magdy S. Alabady, PhD
Energy Bioscience Institute (EBI)
Institute for Genome Biology (IGB)
University of Illinois At Urbana-Champaign, Illinois
------------------------------------------------------
Imagination is more important than knowledge. For knowledge is limited,
whereas imagination embraces the entire world, stimulating progress, giving
birth to evolution.. .....Albert Einstein
-------------------------------------------------------------


