From ableasby at hgmp.mrc.ac.uk  Wed Jul 13 10:38:06 2005
From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby)
Date: Wed, 13 Jul 2005 15:38:06 +0100 (BST)
Subject: [emboss-announce] New email lists ready
Message-ID: <200507131438.j6DEc6n0027708@bromine.hgmp.mrc.ac.uk>

The new email addresses for the EMBOSS lists are now set up and ready
(excluding any teething problems). They are:

   emboss at emboss.open-bio.org
   emboss-dev at emboss.open-bio.org
   emboss-bug at emboss.open-bio.org
   emboss-submit at emboss.open-bio.org

You can access the archives, subscribe/unsubscribe and alter
the way email is sent to you (e.g. digests) by visiting:

  http://emboss.open-bio.org/mailman/listinfo/emboss
  http://emboss.open-bio.org/mailman/listinfo/emboss-dev
  http://emboss.open-bio.org/mailman/listinfo/emboss-announce
  http://emboss.open-bio.org/mailman/listinfo/emboss-bug

The new FTP server is at:

  ftp://emboss.open-bio.org/pub/EMBOSS


Alan

From ableasby at hgmp.mrc.ac.uk  Thu Jul 14 19:44:05 2005
From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby)
Date: Fri, 15 Jul 2005 00:44:05 +0100 (BST)
Subject: [emboss-announce] EMBOSS 3.0.0 released
Message-ID: <200507142344.j6ENi5Sd002353@bromine.hgmp.mrc.ac.uk>

EMBOSS 3.0.0 is now available for download from:

   ftp://emboss.open-bio.org/pub/EMBOSS/

   and, until the 27th July, from:
   ftp://ftp.rfcgr.mrc.ac.uk/pub/EMBOSS/

The following text details some of the changes from the previous
release.

Alan


EMBOSS main package:

New database indexing programs dbxflat, dbxfasta and dbxgcg. A
dbxblast program will be added if we can extract data from the new
BLAST formatdb output. These programs allow indexing of files
larger than 2 GB.
N.B.: Indexes will be created faster if they are written through a
      different disc controller than that used to read the database
      being indexed. If that is not possible then reading from and
      writing to different hard drives on the same controller is
      recommended. Note that each index can be created independently
      of the others, e.g. you can create keyword and description
      indexes after you've created the ID and ACC indexes.

To support these programs, the emboss.default and .embossrc files can
include "resource" definitions. See the documentation of these
programs for more information. "resource" definitions are intended to
define anything other than environment variables and databases.

In the emboss.default and .embossrc files the same name can be used
for variables, databases, and resources (we now store them in separate
tables). In previous versions a single table was used and name clashes
could occur. This becomes an issue with the increasing use of resource
definitions.

Sequence sets in ACD have a new attribute "aligned" that reports
whether the sequences are aligned (reading a multiple alignment in for
visualisation) or not (reading a set of sequences into memory for
further processing - perhaps for alignment).

Sequence formats have been reviewed. "experiment" format is that used
by the Staden package. "staden" and "gcg" formats now parse out
comments from anywhere in the sequence. "nexus" and "nexusnon" formats
now correctly report protein sequence datatypes. "nbrf" or "pir"
format data can now be read from an SRSWWW server (for technical
reasons, SRS servers are unable to exactly reproduce NBRF/PIR
format). "clustal" output no longer writes in blocks of 10.  "Phylip3"
output is now renamed "phylipnon" for compatibility with other
non-interleaved output format names. The "phylip3" name remains valid
for back-compatibility. The header record for phylipnon format has
been changed to that accepted by phylip 3.6 (no YF on the header line,
number of sequences specified). Sequence format information on the web
has been updated to reflect these changes.

Codon usage tables can be read in these formats (-format qualifier):
  "emboss",    "EMBOSS codon usage file",
        "All numbers read, #comments for extras"
  "cut",       "EMBOSS codon usage file",
        "Same as EMBOSS, output default format is 'cut'"
  "gcg",       "GCG codon usage file",
        "All numbers read, #comments for extras"
  "cutg",      "CUTG codon usage file",
        "All numbers (cutgaa) read or fraction calculated, extras added"
  "cutgaa",    "CUTG codon usage file with aminoacids",
        "Cutg with all numbers"
  "spsum",     "CUTG species summary file",
        "Number only, species and CDSs in header"
  "cherry",    "Mike Cherry codonusage database file",
        "GCG format with species and CDSs in header"
  "transterm", "TransTerm database file",
        "GCG format with no extras"
  "codehop",   "FHCRC codehop program codon usage file",
        "Freq only, extras at end"
  "staden",    "Staden package codon usage file with percentages",
        "Freq or number only, no extras"
  "numstaden", "Staden package codon usage file with numbers",
        "Number only, no extras. Can be read as 'staden'"

Any of these formats should be readable by default. Some files are
"readable" in more than one format (staden and numstaden for example
can both be read as "staden"). The extra names are used so we can
reuse them as output format names.

For output of codon usage tables, the same formats are available
(-oformat qualifier).

A new application codcopy (not codret because coderet is already an
EMBOSS program name) will convert from one format to another in the
same way as seqret converts sequence formats.

Coderet reports the number of CDS, mRNA and translation sequences.

Correction to sequence numbering for reversed nucleotide sequences in
alignments. Correction to sequence alignment functions returning
slightly suboptimal alignments.

The entrails program reports codon usage formats. The description of
report formats in the entrails output is improved. Entrails is built by "make
check" and is provided so that developers of wrappers can obtain all
EMBOSS internal details needed, for example all ACD datatypes and
input/output format names and descriptions.

Sequence types are explicitly set in cons, sixpack and backtranseq as
some output formats failed to recognise them as protein.

EMBASSY packages:

MYEMBOSS is a new EMBASSY package for developing your own code.

Installation requires recent versions of GNU packages autoconf,
automake and libtool.

To install, you must first build the configure and make files with
these commands:

aclocal -I m4

autoconf

automake -a
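
The three commands above could be collected into a small shell
function, sketched here. The function name bootstrap_myemboss and its
directory argument are invented for illustration; only the three
commands themselves come from the instructions above.

```shell
# Hypothetical helper (name and argument are illustrative only).
# Runs the three bootstrap commands in order inside the given
# directory and stops at the first one that fails.
bootstrap_myemboss() {
    dir="${1:-.}"
    (
        # The subshell keeps the directory change from leaking
        # into the calling shell.
        cd "$dir" || exit 1
        aclocal -I m4 || exit 1
        autoconf      || exit 1
        automake -a   || exit 1
    )
}
```

Call it with the path to your unpacked myemboss directory, or with no
argument to use the current directory.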

When you add your own programs, do so by adding source files in
myemboss/source and ACD files in myemboss/emboss_acd and add these
filenames to the Makefile.am files in each directory. There are
"myseq" and "mytest" examples provided to guide you.

There is no need to modify configure or Makefile files - these will be
automatically updated.

To allow MYEMBOSS to be installed by one user, and linked to an EMBOSS
installation maintained for the site by someone else, new variables
are added to locate the ACD files for any EMBASSY package. If myemboss
is not installed in the same place as EMBOSS, define
EMBOSS_MYEMBOSSROOT as the location of the myemboss installed ACD
files or the myemboss/emboss_acd source directory. This requires that
EMBASSY programs call the embInitP function with the name of the
package ("myemboss"). For ACD utilities such as acdvalid or acdc to
work, as these use the EMBOSS embInit call, another variable
EMBOSS_ACDUTILROOT must be defined, pointing to the same directory.
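
As a rough illustration of the settings described above, a user with a
private MYEMBOSS build might define both variables in a Bourne-style
shell as follows. The directory shown is a made-up placeholder, not a
path that EMBOSS defines; substitute the location of your own
installed myemboss ACD files or your myemboss/emboss_acd source
directory.

```shell
# Hypothetical location of the myemboss ACD files (placeholder path;
# replace with your own installed ACD directory or emboss_acd source
# directory).
MYEMBOSS_ACD_DIR="$HOME/myemboss/emboss_acd"

# Read by EMBASSY programs that call embInitP with the package
# name "myemboss".
EMBOSS_MYEMBOSSROOT="$MYEMBOSS_ACD_DIR"

# Read by ACD utilities such as acdvalid and acdc, which use the
# plain embInit call; it points to the same directory.
EMBOSS_ACDUTILROOT="$MYEMBOSS_ACD_DIR"

export EMBOSS_MYEMBOSSROOT EMBOSS_ACDUTILROOT
```

Placing these lines in a shell startup file would make the settings
available to every new session.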

PHYLIP is a beta release port of PHYLIP 3.6b. We welcome comments on
the EMBOSS interface to the programs. Program names are prefixed by
'f' to avoid clashes with the old PHYLIP EMBASSY package. We still
need to work on adding new tree input and output formats, and updating
the code to PHYLIP 3.63 (December 2004). We are also considering
splitting more of the programs to simplify the ACD interface. In this
release seqboot and treedist are already split. seqboot is split by
input type into seqboot, restboot, discboot and freqboot. Treedist is
split by the number of input files into treedist and
treedistpair. Acdvalid objects to the dependencies in other programs,
for example the method used by fdnadist.

The DOMAINATRIX package of earlier releases has been extended and
replaced by 5 EMBASSY packages described below (32 applications in
total).  These tools were developed as part of a research project and
are distinct from other EMBOSS apps in being intended mostly for
computational biologists rather than biologist end-users.

STRUCTURE

The STRUCTURE package is used for parsing the PDB database and
generating secondary databases of coordinate and derived data.  The
tools have the following scope: (i) For parsing PDB files and writing
clean coordinate files (CCF files) that "clean-up" many PDB
inconsistencies.  For example, residue numbers give the correct index
into the biological sequence.  (ii) To generate CCF files for whole
PDB files or individual domains from the SCOP and CATH databases.
(iii) To augment CCF files with residue solvent accessibility and
secondary structure data.  (iv) To generate contact files (CON files)
of intra-chain and inter-chain residue-residue contact data. (v) To
generate CON files of residue-ligand contact data. (vi) Miscellaneous
file handling, e.g. dictionary of heterogen groups.

DOMAINATRIX

The DOMAINATRIX package is used for handling the SCOP and CATH
databases of protein domain classification, the parsable files of
which can be inconvenient, e.g. for comparative studies, extending and
processing.  The tools have the following scope: (i) For parsing raw
SCOP and CATH parsable files and writing domain classification files
(DCF files) with a single, simple and extensible format. (ii) To add
sequence records to a DCF file. (iii) To remove low resolution
domains.  (iv) To flexibly calculate and remove redundancy.  (v)
Primitive tools for secondary structure element mapping to domains in
a DCF file.

DOMALIGN

The DOMALIGN package is used for generating alignments for families of
domains, especially across large datasets, e.g. the whole of SCOP.
The tools have the following scope: (i) For identifying representative
structures for different nodes in the SCOP and CATH hierarchies.  (ii)
For generating annotated, structure-based sequence alignments for
these nodes.  (iii) For extending these domain alignment files (DAF
files) with sequences of unknown structure. (iv) All-versus-all global
sequence alignment.

DOMSEARCH 

The DOMSEARCH package is used for deriving extended sequence families,
especially from large structural datasets such as the whole of SCOP.
The tools have the following scope: (i) To generate domain hits files
(DHF files) of sequence relatives to an alignment or other
sequences. (ii) To remove fragmentary sequences from a DHF file.
(iii) To flexibly calculate and remove redundancy.  (iv) To remove
hits of ambiguous classification and collate sequences into
families.

SIGNATURE

The SIGNATURE package is used for generating, scanning and evaluating
sparse signatures and other predictive elements for protein sequence
characterisation.  The tools have the following scope: (i) To generate
sparse signatures for protein families from alignments and residue
contact data.  (ii) Generate other types of discriminator (e.g. HMMs)
from alignments. (iii) Generate ligand-binding signatures from
residue-ligand contacts.  (iv) Generate domain hits files (DHF files)
and ligand hits files (LHF files) of hits (sequences) from signature
scans. (v) Interpretation and display of signature performance by
using ROC analysis.


Where data, files etc are mentioned above or in the application
documentation, data structures and functions for manipulating such are
usually provided in the AJAX and NUCLEUS C programming libraries.  For
example, there are objects for handling protein atoms, residues,
chains, for SCOP and CATH domains and so on.

From pmr at ebi.ac.uk  Fri Jul 22 11:00:01 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 22 Jul 2005 16:00:01 +0100
Subject: [emboss-announce] EMBOSS in August
Message-ID: <42E109F1.9070604@ebi.ac.uk>

We know it is close to the end of July, and we have not said what is happening 
to the EMBOSS team. We do have a solution, but it is not yet officially confirmed.

The Rosalind Franklin Centre for Genomic Research will close at the end of 
next week. The EMBOSS project will move to the European Bioinformatics 
Institute from August 1st. Development and support will continue as before.

The EMBOSS homepage will remain at http://emboss.sourceforge.net/

The FTP server (to download EMBOSS releases and updates) has moved to 
ftp://emboss.open-bio.org/pub/EMBOSS/

The EMBOSS anonymous CVS server will remain at cvs.open-bio.org hosted by the 
Open Bio Foundation, who will also continue to host the developers' CVS server.

The EMBOSS mailing lists have been moved to the Open Bio Foundation, so the 
addresses are now:

To contact the EMBOSS team:

emboss-bug at emboss.open-bio.org Bug reports and support requests
emboss-submit at emboss.open-bio.org Code submissions

Lists users/developers can subscribe to:

emboss at emboss.open-bio.org Users mailing list
emboss-dev at emboss.open-bio.org Developers mailing list
emboss-announce at emboss.open-bio.org New release announcements list

There are obvious gaps in these details ... more news as soon as we have 
confirmation.

regards,

Peter Rice, Alan Bleasby and the EMBOSS team.

