[EMBOSS] Output from seqret in FASTA format

pmr at ebi.ac.uk
Fri Jan 19 15:54:41 UTC 2007


Hi Jesper,

> I've got dbxflat to index the SwissProt database, but I'd like to have the
> output formatted with the USA as the FASTA ID.
>
> Currently:
>
> seqret UNIPROT:Q12345
> Reads and writes (returns) sequences
> output sequence(s) [ies3_yeast.fasta]:
>
>>IES3_YEAST Q12345 Ino eighty subunit 3.
> MKFEDLLATNKQVQFAHAATQHYKSVKTPDFLEKDPHHKKFHNADGLNQQGSSTPSTATD
> ANAASTASTHTNTTTFKRHIVAVDDISKMNYEMIKNSPGNVITNANQDEIDISTLKTRLY
> KDNLYAMNDNFLQAVNDQIVTLNAAEQDQETEDPDLSDDEKIDILTKIQENLLEEYQKLS
> QKERKWFILKELLLDANVELDLFSNRGRKASHPIAFGAVAIPTNVNANSLAFNRTKRRKI
> NKNGLLENIL
>
> ...but I'd like:
>
>>UNIPROT:Q12345 Ino eighty subunit 3.
> MKFEDLLATNKQVQFAHAATQHYKSVKTPDFLEKDPHHKKFHNADGLNQQGSSTPSTATD
> ANAASTASTHTNTTTFKRHIVAVDDISKMNYEMIKNSPGNVITNANQDEIDISTLKTRLY
> KDNLYAMNDNFLQAVNDQIVTLNAAEQDQETEDPDLSDDEKIDILTKIQENLLEEYQKLS
> QKERKWFILKELLLDANVELDLFSNRGRKASHPIAFGAVAIPTNVNANSLAFNRTKRRKI
> NKNGLLENIL
>
> Is that possible?

Tricky to do. Q12345 is not the sequence ID; it is only the accession
number. There are ways to rewrite UniProt as a FASTA-format file and index
it with dbxfasta, but that loses the rest of the information in the entries.

A simple perl script to rearrange the ID lines is your easiest solution.
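As a rough sketch of that script, here is the same idea in Python rather than Perl. It assumes the header layout shown above, i.e. seqret writes `>ID ACCESSION description`, and promotes the accession to `DBNAME:ACCESSION`. The script name `fix_ids.py`, the function `rewrite_header` and its `dbname` parameter are all hypothetical, not part of EMBOSS.

```python
import sys


def rewrite_header(line: str, dbname: str = "UNIPROT") -> str:
    """Turn '>ID ACC description' into '>DBNAME:ACC description'.

    Assumption: the accession is the second word of the FASTA header,
    as in '>IES3_YEAST Q12345 Ino eighty subunit 3.'.
    Sequence lines and headers without an accession pass through unchanged.
    """
    if not line.startswith(">"):
        return line  # sequence line, leave as-is
    fields = line[1:].rstrip("\n").split(None, 2)
    if len(fields) < 2:
        return line  # no accession to promote
    acc = fields[1]
    desc = fields[2] if len(fields) > 2 else ""
    new = f">{dbname}:{acc}"
    if desc:
        new += " " + desc
    return new + "\n"


def main(paths: list[str]) -> None:
    # Rewrite each FASTA file named on the command line to standard output.
    for path in paths:
        with open(path) as handle:
            for line in handle:
                sys.stdout.write(rewrite_header(line))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage, assuming the default output file name from the example above: run `seqret UNIPROT:Q12345 -auto` to write `ies3_yeast.fasta`, then `python fix_ids.py ies3_yeast.fasta > q12345.fasta`. Splitting on at most two spaces keeps the free-text description intact.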

Alternatively, you could invent a new EMBOSS output format that uses the
database name and accession to create the ID. But EMBOSS would still want
to write to a file called "ies3_yeast.*" because it uses the ID to make up
the default filename.

If you insist, you can try:

seqret UNIPROT:Q12345 -sid Q12345 -osdbname UNIPROT

which gives me the result you expect with the current developers' code. (I
am away from the office today, and there have been changes to the way
database names are propagated to the output, so release 4.0.0 may behave
slightly differently.)

Hope that helps

Peter
