[EMBOSS] Gap cost restrictions for needle/water/stretcher?

Mon Aug 24 11:52:48 UTC 2009

Frank Förster wrote:
> Hi,
> 
> I have only one question about the allowed gap costs in several
> programs. I am using needle, water and stretcher, for example.
> 
> There are some restrictions on the gap costs I have to use:
> 
> 1) needle: float from 0-100 for gapopen and 0-10 for gapextend
> 2) water: float from 0.000-10.000 for gapopen and 0.000-10.000 for
> gapextend
> 3) stretcher: positive integer
> 
> What is the meaning of these restrictions? I think you use an integer
> value for stretcher (I did not check the source code) and floats for
> needle/water.

Stretcher and matcher were imported code that used integer values for
speed. Our matrix files use integer values so we can use integers or
floats as gap penalty values.

> But why the restriction for water to three decimal places?

There is no restriction to 3 decimal places. We only use 3 decimal
places when writing out the values.

> But more interesting, why the restriction to 0-100/0-10 for needle/water?

We set limits for needle and water with the first release of EMBOSS and
nobody has asked for a higher value.

Zero is useful for some cases, either to not penalise the number of gaps
(for example a large number of single base gaps in a single nucleotide
read) or to not penalise the gap length (genomic sequence aligned to
mRNA/cDNA).
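
To make the two zero cases concrete, here is a minimal Python sketch. It
assumes the common affine form, where a gap of length n costs
gapopen + gapextend * (n - 1). The exact convention used inside each
EMBOSS program is not stated in this thread, and the function names are
made up for illustration.

```python
def affine_gap_cost(length, gap_open, gap_extend):
    """Cost of one gap under an assumed affine model:
    gap_open for the first position, gap_extend for each further one."""
    if length < 0:
        raise ValueError("gap length must not be negative")
    if length == 0:
        return 0.0
    return gap_open + gap_extend * (length - 1)


def total_gap_cost(gap_lengths, gap_open, gap_extend):
    """Sum the assumed affine cost over every gap in an alignment."""
    return sum(affine_gap_cost(n, gap_open, gap_extend) for n in gap_lengths)


# Ten single-base gaps, as in a noisy single nucleotide read.
# With gap_open = 0 the number of gaps is not penalised.
many_short_gaps = [1] * 10
cost_open_zero = total_gap_cost(many_short_gaps, 0.0, 0.5)
cost_open_ten = total_gap_cost(many_short_gaps, 10.0, 0.5)

# One 1000-base gap, as for an intron when aligning genomic sequence to cDNA.
# With gap_extend = 0 the gap length is not penalised.
cost_extend_zero = affine_gap_cost(1000, 10.0, 0.0)
cost_extend_half = affine_gap_cost(1000, 10.0, 0.5)
```

Under this assumed model the ten short gaps cost nothing when gapopen is
zero, and the long gap costs only gapopen when gapextend is zero.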

The upper limits are enough for the cases we have seen.

More interesting is why we have no upper limit for stretcher and
matcher. We should be consistent. These were third-party applications
(from Bill Pearson's fasta2 package) that we imported.

Does anyone object to setting the same gap penalty limits for all
applications?
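
As a rough illustration of what shared limits would mean, here is a
minimal Python sketch of one range check applied to every program. It
uses the needle limits quoted above (0-100 for gapopen, 0-10 for
gapextend) as an assumption. The table and function names are made up
and this is not how EMBOSS itself validates its options.

```python
# Hypothetical shared limits, taken from the needle values quoted in
# this thread. The limits that would actually be chosen are still open.
GAP_LIMITS = {
    "gapopen": (0.0, 100.0),
    "gapextend": (0.0, 10.0),
}


def check_gap_penalty(name, value):
    """Return the value as a float if it lies inside the shared limits,
    otherwise raise ValueError."""
    low, high = GAP_LIMITS[name]
    if not (low <= value <= high):
        raise ValueError(
            f"{name} must be between {low} and {high}, got {value}"
        )
    return float(value)


# Integer input (stretcher/matcher style) and float input
# (needle/water style) go through the same check.
integer_style = check_gap_penalty("gapopen", 12)
float_style = check_gap_penalty("gapextend", 0.5)
```

A value such as gapopen 150, which stretcher accepts today, would be
rejected under this check.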

Can anyone think of a use case that needs a larger maximum value?

We can add applications to suggest gap penalties for each matrix file
... or store default values in the files. Is this useful?

regards,

Peter Rice