[EMBOSS] seqret problem?

Tue Jul 20 10:42:43 UTC 2004

> -----Original Message-----
> From: Zhiqiang Ye [mailto:yezq at mail.cbi.pku.edu.cn] 
> Sent: 20 July 2004 11:11
> To: emboss
> Subject: [EMBOSS] seqret problem?
> 
> 
> hi all
>       I find that if there is a semicolon in the description 
> line, seqret works wrong.
> 
> [yezq at pro]$ more test 
> >P1;Z1BPC2
> MELTSTRKKANAITSSILNR IAIRGQRKVA DALGINESQI SRWKGDFIPK MGMLLAVLEW
> GVEDEELAEL AKKVAHLLTK EKPQDCGNSF EA
>
> Is there anyone think that this is a bug?

Yes, I'd say it's a bug, but possibly not in the form you presented it.
In your example the problem is that seqret is guessing the format of
your sequence as NBRF.  Your sequence is formatted in a way which could
plausibly be NBRF, so that guess on its own is acceptable.  What I think
is a bug, though, is this:

$ cat test.seq
>P1;Z1BPC2
MELTSTRKKANAITSSILNR IAIRGQRKVA DALGINESQI SRWKGDFIPK MGMLLAVLEW
GVEDEELAEL AKKVAHLLTK EKPQDCGNSF EA

$ seqret fasta::test.seq fasta::stdout
Reads and writes (returns) sequences
Error: Unable to read sequence 'fasta::test.seq'
Died: seqret terminated: Bad value for '-sequence' and no prompt

Having looked at a few definitions of the FastA format, I can't find one
which says that your sequence doesn't conform to the standard.  The
semicolon can be used in FastA as a comment delimiter, so it could be
argued that it would be correct to drop the Z1BPC2 completely from the
output, but the file should still parse OK.
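
To make the two readings of that header concrete, here is a minimal
Python sketch.  It is not how seqret handles headers; the function name
split_header is hypothetical, and treating everything after the first
semicolon as a comment is only the lenient interpretation described
above.

```python
def split_header(header_line):
    """Split '>NAME;COMMENT' into (name, comment).

    Assumption: the first semicolon starts a comment. This is one
    possible reading of the header, not a rule taken from EMBOSS.
    """
    if not header_line.startswith(">"):
        raise ValueError("FastA header must start with '>'")
    text = header_line[1:].strip()
    # partition() returns the whole text as 'name' when there is no ';'
    name, _, comment = text.partition(";")
    return name.strip(), comment.strip()


print(split_header(">P1;Z1BPC2"))  # ('P1', 'Z1BPC2')
```

Under this reading the sequence would be written out as P1, whereas a
reader which ignores the semicolon would keep P1;Z1BPC2 as the name.
Either way the entry is parseable.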

The problem seems to be that seqret is interpreting your sequence as the
PIR subvariant of the FastA format, and its subsequent tests on the file
then fail.  (This is strange, since the test FastA sequence I just got
from PIR didn't have a semicolon in the description line.)

Since there doesn't seem to be a USA syntax which lets you specify which
kind of FastA format you're using, I'd say this is a bug.  If seqret
guesses PIR and the PIR parse subsequently fails, it should go back and
read the file as "dumb" FastA instead.
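
The fallback I have in mind could be sketched like this in Python.  This
is an illustration only, not EMBOSS code: the function names are
hypothetical, and the NBRF rules used here (a two-character code and
semicolon in the header, a title line, then a sequence ending in '*')
are a simplification of the real format.

```python
import re

# Assumed NBRF/PIR header shape: '>' + two-character code + ';' + ID
NBRF_HEADER = re.compile(r"^>([A-Z][A-Z0-9]);(\S+)")


def parse_nbrf(lines):
    """Strict parse: header, one title line, sequence ending in '*'."""
    match = NBRF_HEADER.match(lines[0]) if lines else None
    if match is None or len(lines) < 3:
        raise ValueError("not an NBRF entry")
    # lines[1] is the title; the sequence starts on the third line
    body = "".join("".join(line.split()) for line in lines[2:])
    if not body.endswith("*"):
        raise ValueError("NBRF sequence is missing its '*' terminator")
    return match.group(2), body[:-1]


def parse_plain_fasta(lines):
    """'Dumb' FastA: first word of the header is the name, rest is sequence."""
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FastA entry")
    words = lines[0][1:].split()
    name = words[0] if words else ""
    sequence = "".join("".join(line.split()) for line in lines[1:])
    return name, sequence


def read_fasta_like(text):
    """Try the NBRF guess first; if it fails, fall back to plain FastA."""
    lines = [line for line in text.splitlines() if line.strip()]
    try:
        name, sequence = parse_nbrf(lines)
        return "nbrf", name, sequence
    except ValueError:
        name, sequence = parse_plain_fasta(lines)
        return "fasta", name, sequence
```

With the test.seq contents above, the NBRF attempt fails because there
is no '*' terminator, and the fallback returns the name P1;Z1BPC2 with
all 92 residues, instead of refusing to read the file.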

Also, it's worth noting that whilst seqret can't read FastA files in
this format, it's more than happy to write them like that:

$ cat test3.seq
LOCUS       P1;Z1BPC2
BASE COUNT       10 a      1 c      6 g      4 t     71 others
ORIGIN
       1  MELTSTRKKA NAITSSILNR IAIRGQRKVA DALGINESQI SRWKGDFIPK MGMLLAVLEW
      61  GVEDEELAEL AKKVAHLLTK EKPQDCGNSF EA
//

$ seqret genbank::test3.seq fasta::test4.seq
Reads and writes (returns) sequences

$ seqret fasta::test4.seq fasta::stdout
Reads and writes (returns) sequences
Error: Unable to read sequence 'fasta::test4.seq'
Died: seqret terminated: Bad value for '-sequence' and no prompt

$ cat test4.seq
>P1;Z1BPC2
MELTSTRKKANAITSSILNRIAIRGQRKVADALGINESQISRWKGDFIPKMGMLLAVLEW
GVEDEELAELAKKVAHLLTKEKPQDCGNSFEA