pmr at ebi.ac.uk
Tue Dec 18 09:23:18 UTC 2007
Bernd Web wrote:
> I'd like to run iep on a sequence and use either pir or osformat gifasta.
> The following gives an error (using emboss 5.0.0 on Debian):
> iep -filter -osformat gifasta -sequence seq.txt
> This returns "Died: Unknown qualifier -osformat"
-osformat is for sequence outputs, and iep has no sequence outputs.
iep writes a plain text file as output and has no special output options,
but we will add more information (accession and description) in a
future release ... and to other plain text output files too.
> iep -filter -sformat pir seq.txt or iep -sformat pir -sequence seq.txt
> also give an error:
> "Died: iep terminated: Bad value for '-sequence' with -auto defined"
> (with or without the sequence flag)
> However, iep -sformat fasta seq.txt works. What am I doing wrong?
It appears your sequence can be read in fasta format but not in pir
format. PIR format requires special characters after the first '>'
(a two-character type code and a semicolon, e.g. '>P1;').
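
To see quickly whether a file has any chance of being read as pir, one can
check the first line for that prefix. The following is only a rough sketch in
Python; the name looks_like_pir and the regular expression are illustrative
assumptions, not the test EMBOSS itself applies.

```python
import re

# Assumed shape of a PIR/NBRF header: '>' + two-character type code + ';' + ID
PIR_HEADER = re.compile(r"^>[A-Z0-9]{2};\S+")

def looks_like_pir(first_line):
    """Return True if the line starts like a PIR header, e.g. '>P1;SOMEID'."""
    return PIR_HEADER.match(first_line) is not None
```

A plain FASTA definition line (no type code, no semicolon) fails this check,
which is consistent with -sformat fasta working while -sformat pir does not.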
> My FastA definition line is e.g.
> The IEP report would be more useful if it contained the ENSG number
> instead of "protein coding" or the entire definition line.
Not a nice format. NCBI made up a lot of FASTA file identifiers with '|'
characters and we try to follow their rules. That causes us to ignore
the first part (it should be a database name) and read the ID from the end.
You could reformat the FASTA files (e.g. with a perl script) to remove
the '|' characters and leave something useful as the plain ID (perhaps
ENSG00000205090_1 in this case) and the rest as description.
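
The reformatting suggested above could look roughly like this, sketched here
in Python rather than perl. It assumes the field you want as the ID is the
first '|'-separated field of the definition line; the function names and the
example header are made up for illustration, so adjust them to your real files.

```python
def reformat_header(line):
    """Turn '>a|b|c' into '>a b c'; leave headers without '|' unchanged."""
    text = line.rstrip("\n")
    if "|" not in text:
        return text
    fields = [f.strip() for f in text[1:].split("|")]
    fields = [f for f in fields if f]  # drop empty fields such as a trailing '|'
    if not fields:
        return text
    # Assumption: the first field is the ID, the rest becomes the description
    seq_id, description = fields[0], " ".join(fields[1:])
    return ">" + seq_id + (" " + description if description else "")

def reformat_fasta(lines):
    """Yield the lines of a FASTA file with every definition line rewritten."""
    for line in lines:
        if line.startswith(">"):
            yield reformat_header(line)
        else:
            yield line.rstrip("\n")

# Hypothetical input; the real definition line will differ
example = [">ENSG00000205090_1|protein coding|hypothetical description\n",
           "MKTAYIAKQR\n"]
for out in reformat_fasta(example):
    print(out)
```

The rewritten file has no '|' characters left in the definition lines, so the
first word should be taken as the ID and the remainder as the description.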
Hope that helps,