[EMBOSS] Output from seqret in fasta format

Fri Jan 19 16:19:44 UTC 2007

Peter Rice wrote:

> "JK (Jesper Agerbo Krogh)" <JK at novozymes.com>, <pmr at ebi.ac.uk>

I'm with Peter on this one.  There are far too many possible formats
for FASTA comment lines for any software to support all of them.
This kind of command-line reformatting is exactly the sort of task my
'extract' program was written to handle (having faced the same
task myself more times than I can count).  Example:

% cat >foo.pfa <<EOD
>IES3_YEAST Q12345 Ino eighty subunit 3.
MKFEDLLATNKQVQFAHAATQHYKSVKTPDFLEKDPHHKKFHNADGLNQQGSSTPSTATD
ANAASTASTHTNTTTFKRHIVAVDDISKMNYEMIKNSPGNVITNANQDEIDISTLKTRLY
KDNLYAMNDNFLQAVNDQIVTLNAAEQDQETEDPDLSDDEKIDILTKIQENLLEEYQKLS
QKERKWFILKELLLDANVELDLFSNRGRKASHPIAFGAVAIPTNVNANSLAFNRTKRRKI
NKNGLLENIL
EOD
% cat foo.pfa | extract -if '>' -mt -cols 'UNIPROT:[2,]'
UNIPROT:Q12345 Ino eighty subunit 3.
MKFEDLLATNKQVQFAHAATQHYKSVKTPDFLEKDPHHKKFHNADGLNQQGSSTPSTATD
ANAASTASTHTNTTTFKRHIVAVDDISKMNYEMIKNSPGNVITNANQDEIDISTLKTRLY
KDNLYAMNDNFLQAVNDQIVTLNAAEQDQETEDPDLSDDEKIDILTKIQENLLEEYQKLS
QKERKWFILKELLLDANVELDLFSNRGRKASHPIAFGAVAIPTNVNANSLAFNRTKRRKI
NKNGLLENIL

So you can process the whole thing in a pipe or in two stages through
a temporary file. Your choice.
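
For anyone who does not have extract installed, roughly the same header
rewrite can be sketched in a few lines of Python. This is only an
illustration of the transformation shown in the transcript above, not a
reimplementation of extract or its options: the function name
rewrite_headers and its parameters are made up for this example, and it
assumes header fields are separated by whitespace (runs of spaces are
collapsed to one).

```python
def rewrite_headers(lines, prefix="UNIPROT:", first_field=2):
    """Rewrite FASTA header lines; pass sequence lines through unchanged.

    Hypothetical helper: for each line starting with '>', keep the
    whitespace-separated fields from first_field (1-based) to the end
    and put prefix in front of them. The leading '>' belongs to field 1,
    so it is dropped along with the original ID, matching the output
    shown in the transcript.
    """
    for line in lines:
        text = line.rstrip("\n")
        if text.startswith(">"):
            fields = text.split()
            yield prefix + " ".join(fields[first_field - 1:])
        else:
            yield text


# Small demonstration using the header from the example above.
sample = [
    ">IES3_YEAST Q12345 Ino eighty subunit 3.\n",
    "MKFEDLLATNKQVQFAHAATQHYKSVKTPDFLEKDPHHKKFHNADGLNQQGSSTPSTATD\n",
]
for out in rewrite_headers(sample):
    print(out)
```

The first line printed is "UNIPROT:Q12345 Ino eighty subunit 3." and the
sequence line follows unchanged. To use it as a filter in a pipe, pass
sys.stdin to rewrite_headers instead of the sample list.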

Extract is part of drm_tools (which have nothing to do with
"digital rights management"; DRM were my initials long before the
abbreviation took on its current common meaning), available here:

ftp://saf.bio.caltech.edu/pub/software/linux_or_unix_tools/drm_tools.tar.gz

The man page is here:

  http://saf.caltech.edu/saf_manuals/extract.html

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech