[Biojava-l] genes and things

Matthew Pocock mrp@sanger.ac.uk
Tue, 26 Sep 2000 17:17:46 +0100


Dear all,

At the end of this month I will be moving part-time to ensembl
(http://www.ensembl.org). This means that I will end up having to think
of genomes as containing genes! So far, the support for genes in BioJava
is lacking. You can use StrandedFeature with a given type tag and a
well-defined set of children to represent one, but we have no agreed
structure.

How do you all want to represent genes? My approach would be to use the
interfaces below, but this may be overkill, or may miss important
biological possibilities. All comments & flames gratefully accepted.

Matthew

extend Feature with:
  /*
   * Generate a template object that could be used to create a feature
   * that is the same as this one. This permits features to be cloned
   * into other contexts e.g. from one database to another, without
   * breaking
   * the encapsulation.
   */
  Template makeTemplate();
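
The template idea above could look roughly like the following. This is only a sketch: FeatureTemplate, SketchFeature and SketchStore are hypothetical stand-ins invented for illustration, not existing BioJava classes, and the int-based coordinates and strand are assumptions. The point is that the template carries the feature's state but no reference to its owner, so another container can build its own copy.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical plain-data description of a feature (not a BioJava class). */
final class FeatureTemplate {
    final String type;
    final int start;   // assumed 1-based, inclusive
    final int end;     // assumed 1-based, inclusive
    final int strand;  // assumed +1 or -1

    FeatureTemplate(String type, int start, int end, int strand) {
        this.type = type;
        this.start = start;
        this.end = end;
        this.strand = strand;
    }
}

/** Hypothetical feature that knows its owner and can describe itself. */
final class SketchFeature {
    private final SketchStore owner;
    private final FeatureTemplate data;

    SketchFeature(SketchStore owner, FeatureTemplate data) {
        this.owner = owner;
        this.data = data;
    }

    SketchStore getOwner() { return owner; }
    String getType() { return data.type; }
    int getStart() { return data.start; }
    int getEnd() { return data.end; }
    int getStrand() { return data.strand; }

    /** Returns a fresh template holding this feature's state but no owner. */
    FeatureTemplate makeTemplate() {
        return new FeatureTemplate(data.type, data.start, data.end, data.strand);
    }
}

/** Hypothetical container standing in for a sequence or a database. */
final class SketchStore {
    private final List<SketchFeature> features = new ArrayList<>();

    /** Builds a new feature owned by this store from the given template. */
    SketchFeature createFeature(FeatureTemplate template) {
        SketchFeature feature = new SketchFeature(this, template);
        features.add(feature);
        return feature;
    }

    int countFeatures() { return features.size(); }
}
```

Copying a feature from one store to another is then `target.createFeature(source.makeTemplate())`, and neither store needs to know how the other holds its features.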

/**
 * A gene. This will contain zero-or-more transcript features,
 * and may contain other things (e.g. promoter elements). It also
 * maintains a list of all exons known to exist in this gene.
 */
public interface Gene extends StrandedFeature {
  /**
   * Retrieve the set of exons in this gene. These will be Exon objects.
   * Only exons in this set are legal for use by an mRNA arising from
   * this gene.
   */
  public Set getExons();
}
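
The rule that an mRNA may only use exons from the gene's set could be checked along these lines. This is a sketch under assumptions: ExonKey and GeneExonCheck are hypothetical names, and exons are identified purely by start and end coordinates, which the proposal does not specify.

```java
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** Hypothetical exon identity: two exons are equal if their coordinates match. */
final class ExonKey {
    final int start;
    final int end;

    ExonKey(int start, int end) {
        this.start = start;
        this.end = end;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof ExonKey)) {
            return false;
        }
        ExonKey that = (ExonKey) other;
        return start == that.start && end == that.end;
    }

    @Override
    public int hashCode() {
        return Objects.hash(start, end);
    }
}

final class GeneExonCheck {
    /** True if every exon used by the variant is in the gene's exon set. */
    static boolean usesOnlyGeneExons(Set<ExonKey> geneExons, List<ExonKey> variantExons) {
        return geneExons.containsAll(variantExons);
    }
}
```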

/**
 * A transcript represents a region of a gene that is transcribed. It
 * will normally be contiguous, and its strand will be identical to the
 * strand of the gene (except in odd circumstances). E.g. it is a
 * region from where polymerase attaches to where it drops off.
 * <P>
 * Each transcript will have one SpliceVariant for each possible
 * mRNA it can be turned into. A single transcript may be spliced
 * in multiple ways, some of which may be exon-identical to how
 * other transcripts are spliced.
 */
public interface Transcript extends StrandedFeature {
}

/**
 * A possible splicing pattern for a transcript. This should contain
 * exons from the gene as features, to indicate which regions to
 * splice in, and which to splice out. The location of the splice
 * variant is from the beginning of the first exon to the end of the
 * last one. It is possible that you don't know the transcript
 * coordinates, but you do know the SpliceVariant produced, in which
 * case we should either have a dummy parent transcript, or add it
 * direct to the gene.
 */
public interface SpliceVariant extends StrandedFeature {
  /**
   * Retrieve an mRNA sequence made by splicing together this splice
   * variant.
   * <P>
   * The returned sequence will contain TranslatedRegion features to
   * indicate which bits are translated.
   */
  public Sequence getSplicedSequence();
}
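
To make the splicing step concrete, here is one way getSplicedSequence() might behave, written against plain strings rather than BioJava Sequence objects. ExonSpan and SpliceSketch are hypothetical names; the 1-based inclusive coordinates, the +1/-1 strand convention and the DNA-alphabet output are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical exon location on the forward strand, 1-based and inclusive. */
final class ExonSpan {
    final int start;
    final int end;

    ExonSpan(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

final class SpliceSketch {
    /** Joins the exons in genomic order; reverse-complements for strand -1. */
    static String splice(String genomic, List<ExonSpan> exons, int strand) {
        List<ExonSpan> sorted = new ArrayList<>(exons);
        sorted.sort(Comparator.comparingInt((ExonSpan e) -> e.start));
        StringBuilder joined = new StringBuilder();
        for (ExonSpan exon : sorted) {
            // Convert 1-based inclusive coordinates to a half-open Java range.
            joined.append(genomic, exon.start - 1, exon.end);
        }
        return strand < 0 ? reverseComplement(joined.toString()) : joined.toString();
    }

    /** The variant spans from the start of the first exon to the end of the last. */
    static int[] variantLocation(List<ExonSpan> exons) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (ExonSpan exon : exons) {
            min = Math.min(min, exon.start);
            max = Math.max(max, exon.end);
        }
        return new int[] {min, max};
    }

    /** Reverse complement of upper-case DNA; unknown symbols become N. */
    static String reverseComplement(String dna) {
        StringBuilder out = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            switch (dna.charAt(i)) {
                case 'A': out.append('T'); break;
                case 'C': out.append('G'); break;
                case 'G': out.append('C'); break;
                case 'T': out.append('A'); break;
                default:  out.append('N'); break;
            }
        }
        return out.toString();
    }
}
```

Joining the exons in ascending order and then reverse-complementing the whole string gives the exons in transcription order for a reverse-strand gene, so the two strands do not need separate code paths.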

/**
 * A region of mRNA that is translated into protein. This should in
 * most cases have a contiguous location.
 */
public interface TranslatedRegion {
  /**
   * Retrieve the translation for this region of the mRNA.
   */
  public Sequence getTranslation();
}
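
As a rough illustration of what getTranslation() could return for a contiguous region, the sketch below translates a stretch of an mRNA string using the standard genetic code. TranslationSketch is a hypothetical name, the coordinates are assumed to be 1-based and inclusive on the spliced sequence, and stopping at the first stop codon is a choice made for this example rather than anything the proposal specifies.

```java
final class TranslationSketch {
    // Base order used to index the codon table below.
    private static final String BASES = "TCAG";

    // Standard genetic code, one letter per codon in TCAG order; '*' is stop.
    private static final String AMINO_ACIDS =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    /**
     * Translates positions start..end (1-based, inclusive) of the mRNA.
     * Accepts U or T; stops at the first stop codon; unknown codons become X.
     */
    static String translate(String mrna, int start, int end) {
        String region = mrna.substring(start - 1, end).toUpperCase().replace('U', 'T');
        StringBuilder protein = new StringBuilder();
        for (int i = 0; i + 3 <= region.length(); i += 3) {
            int first = BASES.indexOf(region.charAt(i));
            int second = BASES.indexOf(region.charAt(i + 1));
            int third = BASES.indexOf(region.charAt(i + 2));
            if (first < 0 || second < 0 || third < 0) {
                protein.append('X');
                continue;
            }
            char aminoAcid = AMINO_ACIDS.charAt(16 * first + 4 * second + third);
            if (aminoAcid == '*') {
                break;
            }
            protein.append(aminoAcid);
        }
        return protein.toString();
    }
}
```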