[Biojava-dev] The future of BioJava

Andy Yates ayates at ebi.ac.uk
Fri Sep 21 09:20:18 UTC 2007


> 
> Finally, I think SymbolLists (or whatever they get called) should
> implement more of the methods found in String to make them look more
> like Strings.  Ideally we should think about implementing some of the
> methods that Groovy likes to use for operator overloading. If we do
> this it would be possible to concatenate two sequences in Groovy by
> doing this (I may have the syntax wrong).
> 
> Seq3 = Seq1 + Seq2

Yup, that seems about right. It's one of the nice things about Groovy that 
you can overload the operators and create something which approaches an 
in-language DSL (can't really call it a true DSL since it's constrained 
by the Groovy language). But anyway, you can start mucking around with 
the operators to get things like:

fasta = new Fasta('id','AAAAAA')
fasta_output = new FastaWriter('some_location');
fasta_output << fasta

Assuming that the Fasta class would represent a Fasta record & the 
FastaWriter is just that; you can begin to write some very nice & tight 
code which just looks nice to use :).
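
On the Java side this mostly comes down to method names: Groovy resolves 
`a + b` to `a.plus(b)` and `a << b` to `a.leftShift(b)`. A minimal sketch 
of what the two classes might look like follows. Fasta and FastaWriter are 
hypothetical names, not existing BioJava classes, and I've assumed the 
writer wraps a java.io.Writer rather than taking a location string.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical Fasta record: an id plus a sequence string.
final class Fasta {
    private final String id;
    private final String sequence;

    Fasta(String id, String sequence) {
        this.id = id;
        this.sequence = sequence;
    }

    String getId() { return id; }
    String getSequence() { return sequence; }

    // Groovy maps `fasta1 + fasta2` onto this method.
    // Joining the ids with "+" is an arbitrary choice for the sketch.
    Fasta plus(Fasta other) {
        return new Fasta(id + "+" + other.id, sequence + other.sequence);
    }
}

// Hypothetical writer: assumed to wrap any java.io.Writer.
final class FastaWriter {
    private final Writer out;

    FastaWriter(Writer out) {
        this.out = out;
    }

    // Groovy maps `writer << fasta` onto this method.
    FastaWriter leftShift(Fasta fasta) {
        try {
            out.write(">" + fasta.getId() + "\n" + fasta.getSequence() + "\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return this; // returning this allows chaining: writer << a << b
    }
}
```

Plain Java callers would just use plus() and leftShift() directly, so 
nothing is lost for people who don't use Groovy.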

> 
> The other issue with SymbolLists is that they are not intuitive to
> construct because they are not so bean like. This is not just a
> problem for newbies but also a major hindrance to the use of JEE,
> Spring, JAXB and other important frameworks. It should be possible to
> do this:
> 
> SymbolList sl = new SymbolList();
> sl.setName("AB123456");
> sl.setSequence(seqString);

Yup I'll agree with that.
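
For reference, "bean like" here means roughly: a public no-arg 
constructor, get/set pairs for each property, and Serializable. A minimal 
sketch, using a hypothetical SequenceBean class (the name and the two 
properties are assumptions taken from the snippet above, not an existing 
BioJava type):

```java
import java.io.Serializable;

// Hypothetical bean-style sequence holder; would be declared public
// in its own file so frameworks can instantiate it reflectively.
class SequenceBean implements Serializable {
    private static final long serialVersionUID = 1L;

    private String name;
    private String sequence;

    // Frameworks such as Spring or JAXB need a no-arg constructor.
    public SequenceBean() { }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public String getSequence() { return sequence; }
    public void setSequence(String sequence) { this.sequence = sequence; }
}
```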

> 
> The final hindrance to the use of JEE is serialization. If we keep
> Symbols flyweight (singleton) we need to make this bullet proof from
> the start. It is also practically impossible to make something a bean
> and make it a Singleton, so some careful thought is required.  If we keep
> symbols behind the scenes they may not need to be so bean like.

I think we may need a bit of both. I would suggest something like a 
Symbol interface, with the collections of symbols that back onto it 
actually being enums e.g.

public interface Symbol {
	String toString();
}

public enum DNA implements Symbol, java.io.Serializable {
	A,
	C,
	G,
	T;

	public String toString() {
		return this.name().toLowerCase();
	}

	private Object readResolve () throws java.io.ObjectStreamException {
		DNA symbol = null;
		for(DNA dna: values()) {
			if(dna.toString().equals(this.toString())) {
				symbol = dna;
				break;
			}
		}
		return symbol;
	}
}

The readResolve needs to go in here to make sure this is bullet proof 
to serialization. Otherwise we end up in a situation where you can 
serialize an enum, deserialize it & then end up with a deserialized 
enum that is not equal (using ==) to the statically available enum.
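
That said, the == behaviour is easy to check with a round trip through a 
byte array, and it's worth doing rather than assuming. As far as I read the 
Java Object Serialization Specification, enum constants are written by name 
and resolved via Enum.valueOf on the way back in, and a readResolve 
declared on an enum type is ignored, so the check below should pass for a 
plain enum. The problem described above is the one you hit with a 
hand-rolled constant class. A minimal sketch, where Base and 
EnumRoundTrip are hypothetical names used only for this test:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical enum with no serialization hooks at all.
enum Base { A, C, G, T }

final class EnumRoundTrip {
    // Serializes a value to bytes and reads it straight back.
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // True when the deserialized constant is the same instance.
    static boolean keepsIdentity(Base base) {
        return roundTrip(base) == base;
    }
}
```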

From what I've done previously, enums are a very nice way of 
working with static constants. However, they are very hard to extend, so 
they're fine for known constants like DNA (don't think we're going to 
stumble onto a new nucleotide), but the Symbol interface does mean that 
people can extend the symbol concept if need be.
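
To illustrate the extension point: since code is written against Symbol 
rather than against DNA, somebody can add their own enum alongside it. A 
minimal sketch, where AmbiguousDNA and the Symbols helper are hypothetical 
names invented for the example:

```java
import java.util.List;
import java.util.Locale;

interface Symbol {
    String toString();
}

enum DNA implements Symbol {
    A, C, G, T;

    @Override
    public String toString() {
        return name().toLowerCase(Locale.ROOT);
    }
}

// Hypothetical user-supplied extension: a few ambiguity codes.
enum AmbiguousDNA implements Symbol {
    N, R, Y;

    @Override
    public String toString() {
        return name().toLowerCase(Locale.ROOT);
    }
}

final class Symbols {
    // Works on any mix of Symbol implementations.
    static String join(List<? extends Symbol> symbols) {
        StringBuilder sb = new StringBuilder();
        for (Symbol s : symbols) {
            sb.append(s.toString());
        }
        return sb.toString();
    }
}
```

The enum itself stays closed, but a list of Symbol can happily hold 
constants from both.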


