[EMBOSS] Bug in fuzznuc output?

Mon Mar 15 16:51:05 UTC 2004

Hi!

I'm searching for a nucleic acid pattern in a set of
sequences. The sequences are in a multiple fasta file, and
I'm searching for the pattern in both strands.

While testing my searches and adjusting the number of
mismatches that I would allow, I noticed that in the
standard output format, if the pattern matches in the
reverse strand, then Start > End (i.e. the position in the
sequence where the pattern begins is given with respect to
the plus strand).
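
To make that convention concrete, here is a minimal Python
sketch of how I read it. It assumes the Start and End values
have already been parsed into integers; the function names
(strand_of, normalize) and the coordinates in the usage
lines are made up for illustration and are not part of
EMBOSS or of any real fuzznuc output.

```python
def strand_of(start: int, end: int) -> str:
    """Infer the strand from the coordinates alone.

    Assumption: Start > End marks a reverse-strand match,
    as observed in the standard report format.
    """
    return "-" if start > end else "+"


def normalize(start: int, end: int) -> tuple[int, int, str]:
    """Return (low, high, strand) with low <= high, keeping the strand."""
    strand = strand_of(start, end)
    low, high = sorted((start, end))
    return low, high, strand


# Hypothetical coordinates, for illustration only.
forward_hit = normalize(5, 12)
reverse_hit = normalize(120, 101)
```

This kind of inference only works while the report keeps
Start > End for reverse-strand matches; once the coordinates
are reordered, the strand cannot be recovered from them.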

Now, for my work I need the output in an easy-to-parse
format. But if I choose tab-delimited output (-rformat
excel), then the Start and End positions are always shown
with Start < End. Thus, there is no way to tell which
matches are on the reverse strand.

Is this an already known issue?

While we are at it, here are a few suggestions. In the
standard output format (seqtable), there is a header
which includes the program, rundate, report format, etc. It
seems to me that this header is there to summarize global
info about the fuzznuc run.

Then, for each sequence analyzed the report shows the
pattern, the number of mismatches allowed, the number of
hits, and whether or not the complementary strand has been
searched. To me, this is unnecessary repetition.

For example: in all cases we have looked for the same
pattern, allowing the same number of mismatches and asking
to search the reverse strand as well. So why not put these
runtime values in the header, and avoid the repetition?

Also, why not report whether the match was on the forward or
reverse strand _for each sequence_? (I mean, to make it
clearer, rather than leaving the user to notice that
Start > End.)

Thanks in advance,

Fernan

PS: BTW, I'm running EMBOSS-2.6.0 on FreeBSD-4.9

-- 
F e r n a n   A g u e r o
http://genoma.unsam.edu.ar/~fernan