EMBOSS 1.10.0

ableasby at hgmp.mrc.ac.uk
Sun Feb 18 16:07:32 UTC 2001

This release contains several new applications, some of which are
still under active development. We hope to make some of the data files
referred to in these notes available on our ftp server soon.


   Matrix/scaffold attachment regions (MARs/SARs) are genomic elements
   thought to delineate the structural and functional organisation of the
   eukaryotic genome. Originally, MARs and SARs were identified through
   their ability to bind to the nuclear matrix or scaffold. Binding
   cannot be assigned to a unique sequence element but is dispersed over
   a region of several hundred base pairs. These elements are found
   flanking a gene or a small cluster of genes and are often located in
   the vicinity of cis-regulatory sequences. This has led to the
   suggestion that they contribute to higher-order regulation of
   transcription by defining the boundaries of independently controlled
   chromatin domains. There is indirect evidence to support this notion.
   In transgenic experiments, MARs/SARs dampen position effects by
   shielding the transgene from the effects of the chromatin structure at
   the site of integration. Furthermore, they may act as boundary
   elements for enhancers, restricting their long-range effect to
   promoters located in the same chromatin domain.

   marscan finds a bipartite sequence element that is unique to a large
   group of eukaryotic MARs/SARs. This MAR/SAR recognition signature
   (MRS) comprises two individual sequence elements that are <200 bp
   apart and may be aligned on positioned nucleosomes in MARs. The MRS
   can be used to correctly predict the position of MARs/SARs in plants
   and animals based on genomic DNA sequence information alone.
   Experimental evidence from the analysis of >300 kb of sequence data
   from several eukaryotic organisms shows that wherever a MRS is
   observed in the DNA sequence, the corresponding genomic fragment is a
   biochemically identifiable SAR.

   The MRS is a bipartite sequence element that consists of two
   individual sequences, one of 8 bp (AATAAYAA) and one of 16 bp
   (AWWRTAANNWWGNNNC), lying within 200 bp of each other. One mismatch
   is allowed in the 16 bp pattern. Each pattern may occur on either
   strand of the DNA, independently of the other.

   Not all SARs contain a MRS. Analysis of >300 kb of genomic sequence
   from a variety of eukaryotic organisms shows that the MRS faithfully
   predicts 80% of MARs and SARs, suggesting that at least one other type
   of MAR/SAR may exist which does not contain a MRS.
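
The MRS definition above can be illustrated with a short C sketch. It
is not the marscan source: the function names are hypothetical, only
upper-case A/C/G/T input is handled, and the 200 bp limit is assumed to
be the distance between the start positions of the two elements (the
text does not say how the distance is measured). The other strand is
searched by comparing each pattern with the reverse complement of every
window, so the two elements may lie on different strands.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of an MRS search; not the marscan source. */
#define MRS_8       "AATAAYAA"          /* must match exactly */
#define MRS_16      "AWWRTAANNWWGNNNC"  /* one mismatch allowed */
#define MRS_MAXDIST 200                 /* assumed: distance between starts */

/* Does base b (A, C, G or T) satisfy the IUPAC code c used in the patterns? */
static int iupac_match(char c, char b)
{
    switch (c) {
    case 'R': return b == 'A' || b == 'G';
    case 'Y': return b == 'C' || b == 'T';
    case 'W': return b == 'A' || b == 'T';
    case 'N': return 1;
    default:  return c == b;            /* plain A, C, G, T */
    }
}

static char complement(char b)
{
    switch (b) {
    case 'A': return 'T';
    case 'T': return 'A';
    case 'C': return 'G';
    case 'G': return 'C';
    default:  return 'N';
    }
}

/* Count mismatches of pat against seq[pos .. pos+len-1]; if reverse is
   non-zero, compare against the reverse complement of that window. */
static int mismatches(const char *seq, size_t pos, const char *pat,
                      size_t len, int reverse)
{
    int mm = 0;
    for (size_t k = 0; k < len; k++) {
        char b = reverse ? complement(seq[pos + len - 1 - k]) : seq[pos + k];
        if (!iupac_match(pat[k], b))
            mm++;
    }
    return mm;
}

/* Store in out the start positions where pat matches on either strand
   with at most maxmm mismatches; returns the number of hits. */
static size_t find_hits(const char *seq, const char *pat, int maxmm,
                        size_t *out)
{
    size_t n = strlen(seq), len = strlen(pat), count = 0;
    for (size_t i = 0; i + len <= n; i++)
        if (mismatches(seq, i, pat, len, 0) <= maxmm ||
            mismatches(seq, i, pat, len, 1) <= maxmm)
            out[count++] = i;
    return count;
}

/* Count (8 bp hit, 16 bp hit) pairs whose start positions are at most
   MRS_MAXDIST apart. Returns -1 if memory cannot be allocated. */
long mrs_count(const char *seq)
{
    assert(seq != NULL);
    size_t n = strlen(seq);
    size_t *h8 = malloc((n + 1) * sizeof *h8);
    size_t *h16 = malloc((n + 1) * sizeof *h16);
    if (h8 == NULL || h16 == NULL) {
        free(h8);
        free(h16);
        return -1;
    }
    size_t n8 = find_hits(seq, MRS_8, 0, h8);
    size_t n16 = find_hits(seq, MRS_16, 1, h16);
    long pairs = 0;
    for (size_t i = 0; i < n8; i++)
        for (size_t j = 0; j < n16; j++) {
            size_t d = h8[i] > h16[j] ? h8[i] - h16[j] : h16[j] - h8[i];
            if (d <= MRS_MAXDIST)
                pairs++;
        }
    free(h8);
    free(h16);
    return pairs;
}
```

A real scanner would report positions and strands, merge overlapping
pairs, and replace the quadratic pairing loop with a sliding window;
the sketch only shows how the two patterns, the mismatch allowance and
the distance limit combine.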


scope parses the scop classification file available at
http://scop.mrc-lmb.cam.ac.uk/scop/search.cgi?dir=lin and writes the
scop classification to an embl-like format file.  This file
(Escop.dat) should be placed in the emboss/data directory.


nrscope parses the embl-like format scop classification file generated
by the EMBOSS application scope and writes, in the same format, a file
of non-redundant domains. The format of these files is explained in
the scope documentation. The current version of nrscope removes
redundancy within each scop family, i.e. the entries retained for any
one family are non-redundant with respect to each other.
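
The post does not say how nrscope measures redundancy. Purely as an
illustration of "non-redundant within a family", the sketch below
assumes a precomputed pairwise sequence-identity matrix and a greedy
rule: an entry is kept unless an earlier kept entry of the same family
reaches the identity threshold. The Domain type, nr_by_family and the
threshold are hypothetical and are not taken from nrscope.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical domain record; not an nrscope data structure. */
typedef struct {
    const char *family;   /* SCOP family, e.g. the sccs string "a.1.1.1" */
    const char *id;       /* domain identifier */
} Domain;

/* Greedy redundancy removal within each family (an assumed rule).
   ident is an n x n row-major matrix of pairwise sequence identities
   (0.0 to 1.0). Entry i is kept unless an earlier kept entry of the same
   family has identity >= threshold with it. keep[i] is set to 1 (kept)
   or 0 (dropped); the number of kept entries is returned. */
size_t nr_by_family(const Domain *d, size_t n, const double *ident,
                    double threshold, int *keep)
{
    assert(d != NULL && ident != NULL && keep != NULL);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        keep[i] = 1;
        for (size_t j = 0; j < i; j++) {
            if (keep[j] && strcmp(d[i].family, d[j].family) == 0 &&
                ident[i * n + j] >= threshold) {
                keep[i] = 0;     /* redundant with an entry already kept */
                break;
            }
        }
        if (keep[i])
            kept++;
    }
    return kept;
}
```

Because the scan is greedy, the result depends on input order: listing
preferred representatives first makes them the ones retained.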


domainer parses an embl-like format scop classification file generated
by the EMBOSS application scope or nrscope, together with the clean
protein coordinate files generated by the coorde application (not
currently in EMBOSS; email Jon Ison, jison at hgmp.mrc.ac.uk). For
each domain in the scop classification file, it writes clean domain
coordinate files in embl-like and pdb formats. Each of these files
contains the coordinates for a single scop domain.
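
To illustrate the pdb-format half of that output, the following sketch
writes only the atoms that fall inside a domain's segments, using the
fixed-column ATOM record layout of the PDB format (serial in columns
7-11, atom name 13-16, chain 22, coordinates 31-54, element 77-78). The
Atom and Segment types and both functions are hypothetical, not
domainer's code, and they ignore alternate locations, insertion codes
and HETATM records.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical atom record, as it might be read from a coordinate file. */
typedef struct {
    int serial;
    const char *name;     /* 4-character PDB atom name field, e.g. " CA " */
    const char *resname;  /* e.g. "ALA" */
    char chain;
    int resnum;
    double x, y, z, occ, bfac;
    const char *element;  /* e.g. "C" */
} Atom;

/* Hypothetical domain segment: chain plus inclusive residue range. */
typedef struct {
    char chain;
    int start, end;
} Segment;

/* Format one 78-column ATOM record into buf (at least 79 bytes).
   Returns the number of characters produced, as snprintf does. */
int format_atom(char *buf, size_t size, const Atom *a)
{
    assert(buf != NULL && a != NULL);
    return snprintf(buf, size,
        "ATOM  %5d %-4s %3s %c%4d    %8.3f%8.3f%8.3f%6.2f%6.2f          %2s",
        a->serial, a->name, a->resname, a->chain, a->resnum,
        a->x, a->y, a->z, a->occ, a->bfac, a->element);
}

/* Write an ATOM line for every atom lying in any segment of the domain,
   followed by END; returns the number of ATOM lines written. */
size_t write_domain(FILE *out, const Atom *atoms, size_t natoms,
                    const Segment *segs, size_t nsegs)
{
    char line[81];
    size_t written = 0;
    for (size_t i = 0; i < natoms; i++) {
        for (size_t s = 0; s < nsegs; s++) {
            if (atoms[i].chain == segs[s].chain &&
                atoms[i].resnum >= segs[s].start &&
                atoms[i].resnum <= segs[s].end) {
                format_atom(line, sizeof line, &atoms[i]);
                fprintf(out, "%s\n", line);
                written++;
                break;           /* an atom is written at most once */
            }
        }
    }
    fputs("END\n", out);
    return written;
}
```

A domain made of several discontinuous segments is simply passed as
several Segment entries, which is how a single SCOP domain spanning
more than one region could be written to one file.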

STAMPS (under development)

stamps parses an embl-like format scop classification file generated
by the EMBOSS applications scope or nrscope, and calls stamp to
generate structural alignments for each SCOP family.  It is still
under active development. You have to "make stamp" in the
applications directory to create "stamps".

Developers Notes

1. Most C datatypes used in the libraries have changed. This is a
   prelude to true 64-bit operation. Notably, ints are now "ajint"s
   and longs are now "ajlong"s. Depending on the hardware, an ajint
   may be the same size as an ajlong; however, an ajlong should be
   used wherever a 64-bit integer might be needed.

2. The function ajFmtScanS has been added. It can be regarded as the
   EMBOSS version of the C function sscanf and operates similarly. It
   has several extensions; in particular, %S is used for dynamically
   allocated string objects (AjPStr). This function makes reading data
   files considerably easier, and many applications will be rewritten
   to use it rather than relying on tokenisation.
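
The exact prototype of ajFmtScanS is not given here, so the sketch
below uses the standard C sscanf it is modelled on, to show the
single-call style of parsing that replaces tokenisation. The Record
type and parse_record are hypothetical, and because sscanf has no
equivalent of the dynamically allocated %S conversion, a fixed-size
buffer with a %31s width limit stands in for it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record read from one data-file line "<name> <count> <score>". */
typedef struct {
    char name[32];
    int count;
    double score;
} Record;

/* Parse the whole line in one call instead of splitting it into tokens.
   Returns 1 if all three fields were converted, 0 otherwise. */
int parse_record(const char *line, Record *rec)
{
    assert(line != NULL && rec != NULL);
    /* %31s leaves room for the terminating NUL in rec->name[32]. */
    return sscanf(line, "%31s %d %lf",
                  rec->name, &rec->count, &rec->score) == 3;
}
```

With ajFmtScanS, the name field would presumably be read with %S into
an AjPStr instead, removing the fixed buffer and its width limit.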

As usual I've probably forgotten to mention some things and my colleagues
will no doubt correct any oversights.

