[Biojava-l] circular sequences

Mark Schreiber mark_s@sanger.otago.ac.nz
Mon, 29 Jan 2001 09:29:52 +1300 (NZDT)


On Sun, 28 Jan 2001, Thomas Down wrote:

> On Sun, Jan 28, 2001 at 09:40:54PM +1300, Mark Schreiber wrote:
> > On Fri, 26 Jan 2001, Thomas Down wrote:
> > 
> > > 
> > > Well, there's no problem doing a CircularSymbolList which overrides
> > > subList and subStr (I'd be tempted to write a class which gives
> > > a circularized view onto any underlying SymbolList, rather than
> > > subclassing a specific implementation).  Point to debate: should
> > > symbolAt(1001) for a 1000-symbol circular sequence return the
> > > value of symbolAt(1), or is this an error?
> > >
> > 
> > It depends. In some ways it would be nice if iterators etc could just
> > carry on around the sequence although at some point it would get a bit
> > stupid unless a signal is given to signify the end. Might just be best to
> > throw an exception and let the implementing program decide what to do
> > about it. On the other hand you could probably use the standard symbol
> > list in this way. 6 of one ....
> > 
> > My gut feeling is that we should allow indexing of residues greater than
> > the length of the sequence and if need be less than one. Zero in this
> > instance should be an invalid argument.
> 
> Yeah, that sounds about right.  /me still wishes sequences
> were indexed from zero though (the one thing I still miss from
> my pre-biojava sequence library).
> 
> > > Circular Sequence objects are slightly more of a pain, since I
> > > might want a Feature running from, say, 900 - 100.  Not sure
> > > what the best way to handle this is -- it's not a case recognized
> > > by our current Location objects.
> > 
> > Maybe make a subclass of StrandedFeature, since only DNA can be circular.
> > Can anyone see a reason why not?
> 
> Circularity is an issue that's orthogonal to the current system
> of feature types.  Certainly, there's nothing to say that all
> features on DNA are StrandedFeatures (a CpG island, for instance,
> is fairly clearly not stranded, at least in an idealized world).  Also,
> we'll probably want to use all the other feature types on circular
> sequences (Exon, Transcript, whatever).
>

Good point.
 
> My vote goes for Matthew's solution of having CircularLocations, and
> keeping the circularity issue out of the feature system itself.  It's
> not a 100% clean solution, but I think it should work out okay.
>

I'll look into making a circular location.
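
As a rough illustration of the idea (a minimal sketch only, not BioJava
code): a location on a circular sequence can be treated as wrapping the
origin whenever its start is greater than its end, as in the 900 - 100
example above. The class name CircularSpan and all of its methods are
hypothetical, and positions are assumed to be 1-based and inclusive.

```java
// Hypothetical sketch of a location on a circular sequence.
// None of these names come from BioJava; they are for illustration only.
final class CircularSpan {
    private final int start;      // 1-based, inclusive
    private final int end;        // 1-based, inclusive
    private final int seqLength;  // length of the circular sequence

    CircularSpan(int start, int end, int seqLength) {
        if (seqLength < 1 || start < 1 || start > seqLength
                || end < 1 || end > seqLength) {
            throw new IllegalArgumentException("positions must lie in 1.." + seqLength);
        }
        this.start = start;
        this.end = end;
        this.seqLength = seqLength;
    }

    // A span with start > end is assumed to run through the origin.
    boolean wrapsOrigin() {
        return start > end;
    }

    // Number of symbols covered by the span.
    int length() {
        return wrapsOrigin() ? (seqLength - start + 1) + end : end - start + 1;
    }

    // True if the 1-based position lies inside the span.
    boolean contains(int pos) {
        if (pos < 1 || pos > seqLength) {
            return false;
        }
        return wrapsOrigin() ? (pos >= start || pos <= end)
                             : (pos >= start && pos <= end);
    }

    public static void main(String[] args) {
        CircularSpan span = new CircularSpan(900, 100, 1000);
        System.out.println(span.wrapsOrigin());  // true
        System.out.println(span.length());       // 201
        System.out.println(span.contains(950));  // true
        System.out.println(span.contains(500));  // false
    }
}
```

Keeping the wrap-around logic inside the location means the feature types
themselves need no knowledge of circularity.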

So far I have started making a subclass of ViewSequence to act as a view
onto a linear SymbolList. Can anyone see problems with this?
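
The index handling such a view needs can be sketched as below. This is
only an illustration under stated assumptions, not BioJava code: the
class and method names are hypothetical, a plain String stands in for
the underlying SymbolList, indices are 1-based, zero is rejected, and
-1 is assumed to mean the last symbol (the position just before 1 on
the circle).

```java
// Hypothetical sketch of 1-based index wrapping for a circular view.
// A String stands in for the underlying linear SymbolList.
class CircularIndexSketch {

    // Maps any non-zero index onto the range 1..length.
    static int wrap(int index, int length) {
        if (length <= 0) {
            throw new IllegalArgumentException("length must be positive");
        }
        if (index == 0) {
            throw new IllegalArgumentException("0 is not a valid index");
        }
        if (index > 0) {
            // 1..length map to themselves; length + 1 maps back to 1.
            return (index - 1) % length + 1;
        }
        // Assumption: -1 is the last symbol, -2 the one before it, and so on.
        return Math.floorMod(index, length) + 1;
    }

    // Returns the symbol at a possibly out-of-range 1-based index.
    static char symbolAt(String seq, int index) {
        return seq.charAt(wrap(index, seq.length()) - 1);
    }

    public static void main(String[] args) {
        String seq = "ACGT";
        System.out.println(symbolAt(seq, 5));   // A (same as index 1)
        System.out.println(symbolAt(seq, -1));  // T (same as index 4)
        System.out.println(wrap(1001, 1000));   // 1
    }
}
```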

I am a little uncomfortable that features added to the view will not be
added to the underlying sequence, so the view and the sequence must always
be kept together (as long as they are circular, anyhow).

A question for the parsing experts: what is the easiest way to interpret a
circular sequence and its features, and then build them from, say, a
GenBank file?

Mark

 
>   Thomas.
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
> 

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mark Schreiber			Ph: 64 3 4797875
Rm 218				email mark_s@sanger.otago.ac.nz
Department of Biochemistry	email m.schreiber@clear.net.nz
University of Otago		
PO Box 56
Dunedin
New Zealand
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~