[Biojava-dev] Contributing chromatogram support to BioJava
Rhett Sutphin
rhett-sutphin at uiowa.edu
Mon Mar 10 09:01:43 EST 2003
Hi Matthew,
Thanks for the quick reply. I still have some questions.
On Monday, March 10, 2003, at 05:22 AM, Matthew Pocock wrote:
> Back in the good old days, we made pretty much everything public. Then
> we realised that was bad. Unfortunately, the really old packages have
> not been totally spring-cleaned for cruftily exposed API. Implementing
> symbols properly is hard, which is why we attempt to provide all the
> tools for creating your own without writing new classes. Hey ho.
I'm guessing from this that the reason you want to keep some things
package-level is to avoid them being "published API" and thereby avoid
being required to keep their interfaces stable. That could very well
be a good reason. On the other hand, making the Simple*Symbol classes
public and defining their APIs could make implementing symbols a lot
easier. For instance, subclassing from SimpleBasisSymbol I was able to
create a functioning BasisSymbol by creating a pair of alphabets and
then using them to fill in the SimpleBasisSymbol#symbols and
SimpleBasisSymbol#matches fields.
BTW, I think that the tools for creating and using Alphabets and
Symbols are well thought out and nicely documented. I just think that
they aren't sufficient for my needs in this case, as I'll explain in a
moment.
> Ok - so you want an alphabet that contains symbols that are a DNA
> nucleotide and an integer. You can do that with some variant of the
> following:
> <useful Alphabet creation/use examples snipped>
I did do this, but I did it in the context of defining an Alphabet for
this new type of BasisSymbol called BaseCall. The reason why I did
this instead of just defining the Alphabet and using getSymbol (as you
suggest) is twofold:
1) BaseCalls need to be annotatable (upon creation). SCFs, for
instance, contain seven quality values associated with each call. The
most natural way (to me) to associate those values with each base call
is through an Annotation. Is there another way that would be better?
2) I wanted to provide a way to get at the two halves of each base call
by name. That is, instead of doing:
    BasisSymbol basecall = (BasisSymbol) chromat.getBaseCalls().get(3);
    Symbol callDNA = (Symbol) basecall.getSymbols().get(0);
    int callOffset = ((IntegerAlphabet.IntegerSymbol)
            basecall.getSymbols().get(1)).intValue();
You could just do:
    BaseCall basecall = (BaseCall) chromat.getBaseCalls().get(3);
    Symbol callDNA = basecall.getNucleotide();
    int callOffset = basecall.getOffset();
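
Point 1 can be illustrated without any BioJava types. The following is only a sketch: AnnotatedCall, AnnotatedCallDemo, and the property names are hypothetical stand-ins, and a plain Map takes the place of an Annotation. It shows quality values being attached when the call is created and then exposed read-only.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a base call that receives its annotation at construction time.
final class AnnotatedCall {
    private final char nucleotide;
    private final int offset;
    private final Map<String, Integer> qualities;

    AnnotatedCall(char nucleotide, int offset, Map<String, Integer> qualities) {
        this.nucleotide = nucleotide;
        this.offset = offset;
        // Defensive copy, wrapped so callers cannot modify it afterwards.
        this.qualities = Collections.unmodifiableMap(new LinkedHashMap<>(qualities));
    }

    char getNucleotide() { return nucleotide; }
    int getOffset() { return offset; }
    Map<String, Integer> getQualities() { return qualities; }
}

final class AnnotatedCallDemo {
    private AnnotatedCallDemo() {}

    static AnnotatedCall example() {
        Map<String, Integer> q = new LinkedHashMap<>();
        q.put("quality-a", 3);   // hypothetical property names and values
        q.put("quality-g", 40);
        return new AnnotatedCall('g', 17, q);
    }

    public static void main(String[] args) {
        AnnotatedCall call = example();
        System.out.println(call.getNucleotide() + "@" + call.getOffset()
            + " " + call.getQualities());
    }
}
```

Passing the values to the constructor means a call never exists without its quality data, which is the property I am after with "annotatable upon creation".
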
The problem I am most trying to avoid is requiring users of the class
to know that the first subsymbol of a base call is the nucleotide and
the second is the peak offset. It seems like that information should
be abstracted away. Since you suggested that subclassing is not the
way to go, I thought of an alternative. I could define a class called
ChromatogramTools and give it methods like these:
    public static int getBaseCallOffset(Symbol basecall)
        throws IllegalSymbolException;
    public static Symbol getBaseCallNucleotide(Symbol basecall)
        throws IllegalSymbolException;
Which would turn the example above into:
    Symbol basecall = chromat.getBaseCalls().get(3);
    try {
        Symbol callDNA = ChromatogramTools.getBaseCallNucleotide(basecall);
        int callOffset = ChromatogramTools.getBaseCallOffset(basecall);
    } catch (IllegalSymbolException ise) {
        throw new BioError(ise,
            "Can't happen unless there is a problem with the chromatogram implementation");
    }
The thing I don't like about the alternative method is that those
"tools" methods will have to throw IllegalSymbolExceptions since the
basecall parameter's type is just Symbol (and so might not be a member
of the base call alphabet). Therefore you have to wrap every
invocation of them in a try block, even though (with a well-behaved
Chromatogram implementation) you are guaranteed the exception won't be
thrown.
The basic OO way to get around this is to have a strictly defined type
for the parameter. That way the run-time IllegalSymbolException
becomes a compile-time error instead.
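
The trade-off can be shown with a minimal, self-contained sketch. Everything here is a hypothetical stand-in rather than BioJava API: BadSymbolException plays the role of IllegalSymbolException, PairSymbol the role of a generic symbol whose parts are known only by position, TypedBaseCall the role of BaseCall, and ToolsSketch the role of ChromatogramTools.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in for a checked exception such as IllegalSymbolException.
class BadSymbolException extends Exception {
    BadSymbolException(String message) { super(message); }
}

// Stand-in for a generic symbol that exposes its parts only by position.
class PairSymbol {
    private final List<Object> parts;
    PairSymbol(Object first, Object second) { this.parts = Arrays.asList(first, second); }
    List<Object> getParts() { return parts; }
}

// Narrow type: the constructor fixes the part types, so accessors need no checked exception.
final class TypedBaseCall extends PairSymbol {
    TypedBaseCall(char nucleotide, int offset) {
        super(Character.valueOf(nucleotide), Integer.valueOf(offset));
        if (offset < 0) throw new IllegalArgumentException("Peak offset must be >= 0");
    }
    char getNucleotide() { return (Character) getParts().get(0); }
    int getOffset() { return (Integer) getParts().get(1); }
}

// Tools-style helper: it accepts the broad type, so it has to check at run time.
final class ToolsSketch {
    private ToolsSketch() {}
    static int getBaseCallOffset(PairSymbol symbol) throws BadSymbolException {
        Object second = symbol.getParts().get(1);
        if (!(second instanceof Integer)) {
            throw new BadSymbolException("Not a base call: second part is not an offset");
        }
        return (Integer) second;
    }
}

final class TypedAccessDemo {
    private TypedAccessDemo() {}

    // Tools style: the caller must handle the checked exception at every call site.
    static int offsetViaTools(PairSymbol symbol) {
        try {
            return ToolsSketch.getBaseCallOffset(symbol);
        } catch (BadSymbolException e) {
            throw new IllegalStateException("Unexpected symbol", e);
        }
    }

    // Typed style: passing anything other than a TypedBaseCall does not compile.
    static int offsetViaType(TypedBaseCall call) {
        return call.getOffset();
    }

    public static void main(String[] args) {
        TypedBaseCall call = new TypedBaseCall('g', 42);
        System.out.println(offsetViaTools(call) + " " + offsetViaType(call));
    }
}
```

Both paths return the same offset for a valid call. The difference is that offsetViaTools needs a try block every time, while offsetViaType rejects a wrong argument before the program runs.
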
So it seems to me that the best way to handle this is a
BasisSymbol-implementing class for BaseCalls. It is the only way I see
to handle these two issues. Do you have another suggestion?
Rhett
BTW: I've attached the code for BaseCall in case my prose argument
above wasn't clear.
-------------- next part --------------
/*
 *                    BioJava development code
 *
 * This code may be freely distributed and modified under the
 * terms of the GNU Lesser General Public Licence.  This should
 * be distributed with the code.  If you do not have a copy,
 * see:
 *
 *      http://www.gnu.org/copyleft/lesser.html
 *
 * Copyright for this code is held jointly by the individual
 * authors.  These should be listed in @author doc comments.
 *
 * For more information on the BioJava project and its aims,
 * or to join the biojava-l mailing list, visit the home page
 * at:
 *
 *      http://www.biojava.org/
 *
 */
/***** PRE-RELEASE VERSION *****/
package org.biojava.bio.chromatogram;
import org.biojava.bio.Annotation;
import org.biojava.bio.symbol.Symbol;
import org.biojava.bio.symbol.BasisSymbol;
import org.biojava.bio.symbol.SimpleBasisSymbol;
import org.biojava.bio.symbol.SimpleSymbolList;
import org.biojava.bio.symbol.Alphabet;
import org.biojava.bio.symbol.FiniteAlphabet;
import org.biojava.bio.symbol.IntegerAlphabet;
import org.biojava.bio.symbol.AlphabetManager;
import org.biojava.bio.symbol.SingletonAlphabet;
import org.biojava.bio.symbol.Edit;
import org.biojava.bio.symbol.IllegalSymbolException;
import org.biojava.bio.symbol.IllegalAlphabetException;
import org.biojava.bio.seq.DNATools;
import org.biojava.utils.ListTools;
import org.biojava.utils.ChangeVetoException;
import java.util.List;
/**
 * A Symbol representing a called base in the context of a chromatogram. The
 * alphabet for these symbols is a cross-product of DNA and the positive
 * integers.
 *
 * @see Chromatogram
 * @author Rhett Sutphin (<a href="http://genome.uiowa.edu/">UI CBCB</a>)
 */
public class BaseCall extends SimpleBasisSymbol {
    private static final IntegerAlphabet PEAK_OFFSET_ALPHABET = IntegerAlphabet.getInstance();
    private static final Alphabet BASE_CALL_ALPHABET =
        AlphabetManager.getCrossProductAlphabet(
            new ListTools.Doublet(
                DNATools.getDNA(),
                PEAK_OFFSET_ALPHABET
            )
        );

    public BaseCall(int peakOffset, Symbol call, Annotation annotation)
    throws IllegalSymbolException, IllegalArgumentException {
        super(annotation);
        try {
            DNATools.dnaToken(call);
        } catch (IllegalSymbolException ise) {
            throw new IllegalSymbolException(
                "The symbol " + call.getName() + " (" + call.getClass().getName()
                + ") is not in the DNA alphabet");
        }
        if (peakOffset < 0) {
            throw new IllegalArgumentException("Peak offset must be >= 0");
        }
        this.symbols = new ListTools.Doublet(call, PEAK_OFFSET_ALPHABET.getSymbol(peakOffset));
        // this is probably not as efficient as it could be
        // it might help if there was something like SingletonAlphabet.getInstance(AtomicSymbol)
        // (or a similar method on AlphabetManager)
        this.matches = AlphabetManager.getCrossProductAlphabet(
            new ListTools.Doublet(
                call.getMatches(),
                new SingletonAlphabet(PEAK_OFFSET_ALPHABET.getSymbol(peakOffset))
            )
        );
    }

    public BaseCall(int peakOffset, Symbol call)
    throws IllegalSymbolException, IllegalArgumentException {
        this(peakOffset, call, Annotation.EMPTY_ANNOTATION);
    }

    /**
     * Returns the offset of the peak in the chromatogram trace which is the
     * source of this base call.
     * @return the offset
     */
    public int getOffset() {
        return ((IntegerAlphabet.IntegerSymbol) symbols.get(1)).intValue();
    }

    /**
     * Returns the base (DNA nucleotide) that was called at this offset.
     * @return a Symbol from the DNA alphabet
     */
    public Symbol getNucleotide() {
        return (Symbol) symbols.get(0);
    }

    /**
     * Determines whether this BaseCall exactly matches another.  To be equal,
     * the two BaseCalls must have the same called nucleotide, the same trace
     * peak offset and the same annotation.  All these things are compared with
     * their <code>equals</code> methods, so this is a fairly strict standard
     * for equality.
     */
    public boolean equals(Object other) {
        if (other == this) return true;
        if (!(other instanceof BaseCall)) return false;
        BaseCall obc = (BaseCall) other;
        return obc.symbols.get(0).equals(this.symbols.get(0))
            && obc.symbols.get(1).equals(this.symbols.get(1))
            && obc.getAnnotation().equals(this.getAnnotation());
    }

    public static Alphabet getBaseCallAlphabet() {
        return BASE_CALL_ALPHABET;
    }

    /**
     * An immutable {@link org.biojava.bio.symbol.SymbolList} for {@link BaseCall}s.
     * You can get an editable copy by calling {@link SimpleSymbolList#subList},
     * which is inherited from the editable {@link SimpleSymbolList}.
     */
    public static class ImmutableList extends SimpleSymbolList {
        /**
         * Creates a new ImmutableList from the given list of base calls.
         * @param baseCalls the {@link BaseCall}s to include in the new list
         * @throws IllegalSymbolException when any of the symbols in the list
         *         aren't in the base call alphabet
         */
        public ImmutableList(List baseCalls) throws IllegalSymbolException {
            super(getBaseCallAlphabet(), baseCalls);
        }

        /**
         * Overridden to enforce immutability.
         * @throws ChangeVetoException always
         */
        public void edit(Edit edit)
        throws IndexOutOfBoundsException, IllegalAlphabetException, ChangeVetoException {
            throw new ChangeVetoException("Immutable list");
        }
    }
}