[Biojava-l] GappedSymbol lists and Features

Dudgeon, Tim TDudgeon@OSIP.com
Fri, 1 Mar 2002 10:10:21 -0000


I've been doing some work on the multiple alignment stuff in biojava. I've
found that the existing org.biojava.bio.seq.io.MSFAlignmentFormat class
doesn't really seem to work, and I have been re-engineering it so that it
works and is more generic. I'll contribute this work when it is complete.

My question relates to how alignments should work. Currently
org.biojava.bio.symbol.Alignment is composed of
org.biojava.bio.symbol.SymbolList objects, which seems very reasonable. To
specify an alignment with gaps, you would presumably use an
org.biojava.bio.symbol.GappedSymbolList, which is a SymbolList that allows
gaps to be inserted and deleted. This GappedSymbolList seems to wrap another
SymbolList (without gaps) that contains the underlying sequence (a
SymbolList could be generated that contained gaps, but as the sequence is
immutable you wouldn't be able to add or delete gaps, hence the need for the
GappedSymbolList).
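
To make the wrapping idea concrete, here is a minimal, self-contained
sketch. It is not the BioJava implementation: the class GappedView and its
methods (addGapInView, removeGap, symbolAt, source) are hypothetical names
chosen for illustration, a plain String stands in for the immutable
SymbolList, and positions are assumed to be 1-based.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a gapped view; NOT the BioJava GappedSymbolList API. */
final class GappedView {
    // The underlying residues are never modified.
    private final String source;
    // One entry per view column: the 1-based source index, or 0 for a gap.
    private final List<Integer> columns = new ArrayList<>();

    GappedView(String source) {
        this.source = source;
        for (int i = 1; i <= source.length(); i++) {
            columns.add(i);
        }
    }

    /** Inserts a single gap so that it occupies 1-based view position pos. */
    void addGapInView(int pos) {
        columns.add(pos - 1, 0);
    }

    /** Removes the gap at 1-based view position pos; fails if it is a residue. */
    void removeGap(int pos) {
        if (columns.get(pos - 1) != 0) {
            throw new IllegalArgumentException("No gap at view position " + pos);
        }
        columns.remove(pos - 1);
    }

    int length() {
        return columns.size();
    }

    /** Returns the residue at the view position, or '-' for a gap. */
    char symbolAt(int pos) {
        int src = columns.get(pos - 1);
        return src == 0 ? '-' : source.charAt(src - 1);
    }

    /** The wrapped, ungapped sequence. */
    String source() {
        return source;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (int pos = 1; pos <= length(); pos++) {
            sb.append(symbolAt(pos));
        }
        return sb.toString();
    }
}

class GappedViewDemo {
    public static void main(String[] args) {
        GappedView view = new GappedView("MKTAY");
        view.addGapInView(3);
        System.out.println(view);          // MK-TAY
        view.removeGap(3);
        System.out.println(view);          // MKTAY
        System.out.println(view.source()); // MKTAY (never changed)
    }
}
```

Only the column bookkeeping changes when gaps are added or removed; the
wrapped residues stay untouched, which is why an immutable sequence can
still take part in an editable alignment.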

Now to my point. I want to be able to use an alignment that contains
sequences that can have features. I want to be able to add a feature (e.g. a
PROSITE motif) to the underlying sequence, but be able to access it both
through the underlying coordinates of the protein and the coordinates within
the alignment, i.e. through both the GappedSymbolList and the SymbolList it
wraps.

To do this I would need to create an org.biojava.bio.symbol.Alignment
that contains a GappedSymbolList wrapping a SimpleSequence to which
Features can be attached. However, I see no way of getting hold of the
SimpleSequence from the GappedSymbolList that wraps it. Are there any
solutions here?
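
As an illustration of what is being asked for, the sketch below shows a
wrapper that exposes the sequence it wraps and converts feature coordinates
between the two systems. Everything here is an assumption made for the
example: Feature, AnnotatedSequence, GappedWrapper, getSource, sourceToView,
viewToSource and featureInView are hypothetical names, not existing BioJava
classes or methods, and positions are assumed to be 1-based.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical feature with 1-based, inclusive coordinates on the ungapped sequence. */
final class Feature {
    final String type;
    final int start;
    final int end;

    Feature(String type, int start, int end) {
        this.type = type;
        this.start = start;
        this.end = end;
    }
}

/** Hypothetical stand-in for a sequence that can carry features. */
final class AnnotatedSequence {
    final String residues;
    final List<Feature> features = new ArrayList<>();

    AnnotatedSequence(String residues) {
        this.residues = residues;
    }
}

/** Hypothetical gapped wrapper that keeps a handle on the sequence it wraps. */
final class GappedWrapper {
    private final AnnotatedSequence source;
    // viewOfSource[i] is the 1-based alignment column of source residue i + 1.
    private final int[] viewOfSource;

    /** Builds the mapping from one aligned row, where '-' marks a gap. */
    GappedWrapper(AnnotatedSequence source, String alignedRow) {
        this.source = source;
        this.viewOfSource = new int[source.residues.length()];
        int src = 0;
        for (int col = 0; col < alignedRow.length(); col++) {
            char c = alignedRow.charAt(col);
            if (c == '-') {
                continue;
            }
            if (src >= viewOfSource.length || c != source.residues.charAt(src)) {
                throw new IllegalArgumentException("Aligned row does not match source");
            }
            viewOfSource[src++] = col + 1;
        }
        if (src != viewOfSource.length) {
            throw new IllegalArgumentException("Aligned row is shorter than source");
        }
    }

    /** The accessor the question asks for: the wrapped, feature-bearing sequence. */
    AnnotatedSequence getSource() {
        return source;
    }

    int sourceToView(int srcPos) {
        return viewOfSource[srcPos - 1];
    }

    /** Returns the 1-based source position, or -1 if the column is a gap. */
    int viewToSource(int viewPos) {
        int idx = Arrays.binarySearch(viewOfSource, viewPos);
        return idx >= 0 ? idx + 1 : -1;
    }

    /** Projects a feature's start and end into alignment columns. */
    int[] featureInView(Feature f) {
        return new int[] { sourceToView(f.start), sourceToView(f.end) };
    }
}

class GappedWrapperDemo {
    public static void main(String[] args) {
        AnnotatedSequence seq = new AnnotatedSequence("MKTAYIAK");
        Feature motif = new Feature("motif", 3, 6);
        seq.features.add(motif);

        GappedWrapper row = new GappedWrapper(seq, "MK--TAYI-AK");
        int[] cols = row.featureInView(motif);
        System.out.println(motif.start + ".." + motif.end
                + " -> " + cols[0] + ".." + cols[1]); // 3..6 -> 5..8
    }
}
```

With an accessor of this kind, features stay attached to the ungapped
sequence only, and their alignment coordinates are derived on demand, so
they remain correct if the gaps change. Note that a projected feature can
span gap columns when a gap falls between its start and end.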

many thanks

Tim



--------------------------------------------
Dr. Tim Dudgeon
OSI Pharmaceuticals, Watlington Road, Oxford, OX4 6LT, UK
Tel: +44 (01865) 871 244
email: tdudgeon@osip.com