[EMBOSS] Re: protein sequence format question

David.Bauer at SCHERING.DE David.Bauer at SCHERING.DE
Wed Sep 6 05:45:59 UTC 2006


Hi,

The file you are trying to use is a MySQL dump from the BioMart database,
so it is not in a format that EMBOSS can read.
UniProt is, however, also available in other formats.
Please have a look at this directory:
ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/

There you will find UniProt in FASTA and EMBL-like flat-file (.dat.gz)
format, both of which can be used with EMBOSS.
You can also index these files with the EMBOSS tools dbxfasta or dbxflat
so that you can efficiently retrieve individual sequences from the database.
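
Once a FASTA file from that directory has been downloaded and unpacked,
the tryptic digest can be scripted. The following is only a minimal
sketch: the file names are placeholders, and it assumes that the input
and the report file can be given as the first and second positional
parameters of digest and that entry 1 of its -menu list is trypsin, as
in the EMBOSS releases I know. Please check "digest -help" on your
installation before relying on it.

```python
import shutil
import subprocess


def digest_command(fasta_path, report_path, enzyme_menu=1):
    """Build the argument list for an EMBOSS digest run.

    Assumptions (verify with `digest -help`): the input sequences and the
    report file are positional parameters, and -menu 1 selects trypsin.
    -auto is the general EMBOSS qualifier that suppresses prompts.
    """
    return [
        "digest",
        str(fasta_path),
        str(report_path),
        "-menu", str(enzyme_menu),
        "-auto",
    ]


def run_digest(fasta_path, report_path, enzyme_menu=1):
    """Run digest if it is on the PATH; raise if EMBOSS is not installed."""
    if shutil.which("digest") is None:
        raise FileNotFoundError("EMBOSS 'digest' was not found on the PATH")
    subprocess.run(digest_command(fasta_path, report_path, enzyme_menu),
                   check=True)
```

For example, run_digest("uniprot_sprot.fasta", "uniprot_sprot.digest")
would write one cleavage report covering every sequence in the
(placeholder) input file.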

More information about the sequence formats supported by EMBOSS can
be found on the EMBOSS documentation pages:
http://emboss.sourceforge.net/docs/themes/SequenceFormats.html
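
If you would rather keep the BioMart table, a tab-delimited file can in
principle be rewritten as FASTA, which EMBOSS does read. The sketch
below is hedged accordingly: I have not checked the column layout of
that dump, so the column positions (identifier in the first column,
sequence in the last) are assumptions you would need to adjust, and the
function name is only illustrative.

```python
import gzip


def table_to_fasta(table_path, fasta_path, id_col=0, seq_col=-1, width=60):
    """Write one FASTA record per row of a gzipped tab-delimited table.

    id_col and seq_col are assumptions about the table layout; adjust them
    after inspecting a few rows. Returns the number of records written.
    """
    written = 0
    with gzip.open(table_path, "rt") as table, open(fasta_path, "w") as fasta:
        for line in table:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed rows
            identifier = fields[id_col].strip()
            sequence = fields[seq_col].strip()
            # MySQL tab-delimited dumps write NULL values as \N
            if not identifier or not sequence or sequence == "\\N":
                continue
            fasta.write(f">{identifier}\n")
            # Wrap the sequence into fixed-width lines
            for start in range(0, len(sequence), width):
                fasta.write(sequence[start:start + width] + "\n")
            written += 1
    return written
```

The resulting file can then be passed to EMBOSS programs like any other
FASTA file, although downloading the ready-made FASTA release is the
less error-prone route.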


HTH,
David.

emboss-bounces at lists.open-bio.org wrote on 06/09/2006 01:54:24:

> Hi,
>
>       I am trying to use the DIGEST function in EMBOSS for a tryptic
> digest of protein sequences. I downloaded the sequence file from the
> following link:
> ftp://ftp.ebi.ac.uk/pub/databases/biomart/current/uniprot_mart_17/uniprot_sequence__sequence__main.txt.gz
>
>      It is a tab-delimited flat file which includes all protein
> sequences. It seems that it is not in any of the formats EMBOSS
> supports. I wonder whether it is still possible to use the DIGEST
> function?
