[Biojava-l] Wrapping SimpleGappedSequence

Richard Holland holland at ebi.ac.uk
Mon Nov 26 12:55:23 UTC 2007


I have made the changes you suggest below in CVS. Hopefully it will work
for you now.

cheers,
Richard

Ditlev Egeskov Brodersen wrote:
> Dear Richard and all,
> 
>   I've been dissecting the delegation problem encountered when instantiating
> SimpleGappedSequence(Sequence) with an already gapped sequence. The
> constructor calls the parent SimpleGappedSymbolList(), which in Richard's
> CVS update of 161107 now contains a separate overloaded constructor for the
> gapped case:
> 
>   public SimpleGappedSymbolList(GappedSymbolList gappedSource)
> 
>   However, when instantiating a new SimpleGappedSequence based on an
> existing gapped sequence (with several blocks), the blocks were lost. 
> 
>   After checking the path of code execution it appeared that for some
> reason, the old SimpleGappedSymbolList(SymbolList) was called. So I modified
> SimpleGappedSequence.java to include an overloaded constructor also for the
> descendant class, identical to the other constructor but with a
> GappedSequence argument:
> 
>   public SimpleGappedSequence(GappedSequence seq) {
>     super(seq);
>     this.sequence = seq;
>     createOnUnderlying = false;
>   }
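
A likely explanation for the old constructor being called: Java picks among overloaded constructors at compile time, from the declared type of the argument, not from its runtime class. Inside SimpleGappedSequence(Sequence seq) the call super(seq) only sees a Sequence, so it binds to the SymbolList overload even when the object passed in is gapped. The sketch below illustrates this with hypothetical stand-in types (PlainList, GappedList, Parent, Child, ChildWithOverload); none of them are BioJava classes.

```java
// Hypothetical stand-ins for SymbolList / GappedSymbolList; not BioJava types.
interface PlainList {}
interface GappedList extends PlainList {}

class Parent {
    final String chosen;
    Parent(PlainList p)  { chosen = "Parent(PlainList)"; }
    Parent(GappedList g) { chosen = "Parent(GappedList)"; }
}

class Child extends Parent {
    // The declared type of p is PlainList, so super(p) binds to
    // Parent(PlainList) regardless of the runtime class of the argument.
    Child(PlainList p) { super(p); }
}

class ChildWithOverload extends Parent {
    ChildWithOverload(PlainList p)  { super(p); }
    // The extra overload keeps the declared type gapped all the way up.
    ChildWithOverload(GappedList g) { super(g); }
}

final class OverloadDemo {
    static final class GappedImpl implements GappedList {}

    public static void main(String[] args) {
        GappedList g = new GappedImpl();
        System.out.println(new Child(g).chosen);             // prints Parent(PlainList)
        System.out.println(new ChildWithOverload(g).chosen); // prints Parent(GappedList)
    }
}
```

This matches the fix described here: adding a GappedSequence overload to the subclass lets the compiler select the gapped parent constructor.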
> 
>   Now, the correct parent constructor
> (SimpleGappedSymbolList(GappedSymbolList)) was called. However, there are
> two other problems with the new SimpleGappedSymbolList constructor that
> need to be corrected for it to work as expected. First, the initial
> introduction of a single, large block is missing from the new code, so
> insert:
> 
>   Block b = new Block(1, length, 1, length);
>   blocks.add(b);
> 
>   Secondly, the code for transferring the gaps from the sequence string needs
> to use two separate indices; otherwise the gaps will be placed wrongly
> because their position is affected by previously inserted gaps:
> 
>   int n=1;
>   for(int i=1;i<=this.length();i++) {
>     if(this.alpha.getGapSymbol().equals(gappedSource.symbolAt(i)))
>       this.addGapInSource(n);
>     else
>       n++;
>   }
> 
>   In other words, the index giving the position of the gaps should only
> increment when there are NO gaps at the corresponding position in the gapped
> string.
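
The two-index idea can be tried out on plain strings, without BioJava. The sketch below is only an illustration under that assumption: GapTransferSketch, gapSourcePositions and applyGaps are hypothetical names, '-' stands in for the gap symbol, and a gap at source position p is taken to sit immediately before the p-th ungapped symbol.

```java
import java.util.ArrayList;
import java.util.List;

final class GapTransferSketch {

    /** One entry per gap: the 1-based source position at which it is inserted. */
    static List<Integer> gapSourcePositions(String gapped) {
        List<Integer> positions = new ArrayList<>();
        int n = 1; // index into the ungapped source; advances only on non-gap symbols
        for (int i = 1; i <= gapped.length(); i++) { // index into the gapped view
            if (gapped.charAt(i - 1) == '-') {
                positions.add(n);
            } else {
                n++;
            }
        }
        return positions;
    }

    /** Rebuilds the gapped string from the source and the (ascending) gap positions. */
    static String applyGaps(String source, List<Integer> positions) {
        StringBuilder out = new StringBuilder();
        int next = 0;
        for (int p = 1; p <= source.length() + 1; p++) {
            // Emit every gap recorded at source position p before the symbol itself.
            while (next < positions.size() && positions.get(next) == p) {
                out.append('-');
                next++;
            }
            if (p <= source.length()) {
                out.append(source.charAt(p - 1));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Integer> positions = gapSourcePositions("MS--EKLMPR---TTWAKG");
        System.out.println(positions);
        System.out.println(applyGaps("MSEKLMPRTTWAKG", positions));
    }
}
```

For MS--EKLMPR---TTWAKG the recorded positions are 3, 3, 9, 9, 9, which agrees with the addGapsInSource(3, 2) and addGapsInSource(9, 3) calls in the test program further down the thread.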
> 
>   Following these changes, the GappedSequenceTest program from last week now
> works as expected:
> 
>  aSymbolList = MSE--KLMPRT---TWAKG
>  aSequence   = MSE--KLMPRT---TWAKG
> 
>  Gaps are not parsed when a SimpleGappedSequence is constructed from a 
>  gapped Sequence object:
>  aGapped     = MSE--KLMPRT---TWAKG
>  Gapped position 10 = Plain position 10
> 
>  aSymbolList = MSEKLMPRTTWAKG
>  aSequence   = MSEKLMPRTTWAKG
> 
>  Gaps introduced through addGapsInSource work ok:
>  aGapped     = MS--EKLMPR---TTWAKG
>  Gapped position 10 = Plain position 8
> 
>  Now a new SimpleGappedSequence object is created from the previous one:
>  aGapped2    = MS--EKLMPR---TTWAKG
>  Gapped position 10 = Plain position 8
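
The "Gapped position 10 = Plain position 8" lines can be checked by hand: the plain position is the number of non-gap symbols up to and including the gapped position. A minimal string-based sketch follows; GappedCoordinateSketch and gappedToPlain are hypothetical names, and returning -1 for a position that falls on a gap is a choice made for this sketch only, not a statement about what BioJava's gappedToLocation does there.

```java
final class GappedCoordinateSketch {

    /**
     * Maps a 1-based position in a gapped string to the 1-based position
     * in the ungapped string, or -1 if the position is a gap ('-').
     */
    static int gappedToPlain(String gapped, int gappedPos) {
        if (gappedPos < 1 || gappedPos > gapped.length()) {
            throw new IndexOutOfBoundsException("gappedPos out of range: " + gappedPos);
        }
        if (gapped.charAt(gappedPos - 1) == '-') {
            return -1; // sketch-only convention for gap positions
        }
        int plain = 0;
        for (int i = 0; i < gappedPos; i++) {
            if (gapped.charAt(i) != '-') {
                plain++; // count real symbols up to and including gappedPos
            }
        }
        return plain;
    }

    public static void main(String[] args) {
        System.out.println(gappedToPlain("MS--EKLMPR---TTWAKG", 10)); // prints 8
    }
}
```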
> 
>   -- Ditlev
> 
> --
>  
> Ditlev E. Brodersen, Ph.D.
> Lektor, Associate Professor
>  
> Department of Molecular Biology   Office:  +45 89425259
> University of Aarhus              Lab:     +45 89425022
> Gustav Wieds Vej 10c              Fax:     +45 86123178
> DK-8000 Aarhus C                  Email:   deb at mb.au.dk
> Denmark                           Lab WWW: www.bioxray.dk/~deb
> 
> 
>  -----Original Message-----
>  From: biojava-l-bounces at lists.open-bio.org [mailto:biojava-l-
>  bounces at lists.open-bio.org] On Behalf Of Richard Holland
>  Sent: 18 November 2007 18:12
>  To: Ditlev Egeskov Brodersen
>  Cc: biojava-l at biojava.org
>  Subject: Re: [Biojava-l] Wrapping SimpleGappedSequence
>  
>  Interesting stuff. I'm not sure why it isn't working so I'll have to
>  have a closer look.
>  
>  I'm currently on annual leave but will investigate when I return (Nov
>  27th).
>  
>  cheers,
>  Richard
>  
>  On Sun, November 18, 2007 10:50 am, Ditlev Egeskov Brodersen wrote:
>   Hi Richard,
>  
>     I thought that what you say was correct too, but I can't get it to
>   work. Below is a small test program to check this. First, I create a
>   SimpleGappedSequence via the chain text with gaps -> SymbolList ->
>   Sequence -> GappedSequence. Gaps are there but not "understood", as
>   expected. Next, I create the same sequence non-gapped in the above way,
>   then introduce gaps with addGapsInSource. A gapped location is now
>   properly translated to a non-gapped sequence position. Finally, I
>   create a new SimpleGappedSequence based on the working one - as you can
>   see, the gaps are still there but not "understood"...
>  
>   aSymbolList = MSE--KLMPRT---TWAKG
>   aSequence   = MSE--KLMPRT---TWAKG
>  
>   Gaps are not parsed when a SimpleGappedSequence is constructed from a
>   gapped Sequence object:
>   aGapped     = MSE--KLMPRT---TWAKG
>   Gapped position 10 = Plain position 10
>  
>   aSymbolList = MSEKLMPRTTWAKG
>   aSequence   = MSEKLMPRTTWAKG
>  
>   Gaps introduced through addGapsInSource work ok:
>   aGapped     = MS--EKLMPR---TTWAKG
>   Gapped position 10 = Plain position 8
>  
>   Now a new SimpleGappedSequence object is created from the previous one:
>   aGapped2    = MS--EKLMPR---TTWAKG
>   Gapped position 10 = Plain position 10
>  
>   This should have been compiled with the new biojava.jar of 161107
>   (updated via CVS), but perhaps I made a mistake updating?
>  
>   Any clues?
>  
>   Thanks,
>  
>     Ditlev
>  
>   ---
>  
>   package gappedsequencetest;
>
>   import org.biojava.bio.*;
>   import org.biojava.bio.seq.*;
>   import org.biojava.bio.seq.impl.*;
>   import org.biojava.bio.symbol.*;
>
>   public class Main {
>
>       public static void main(String[] args) {
>           SymbolList aSymbolList = null;
>           try {
>               aSymbolList = ProteinTools.createProtein("MSE--KLMPRT---TWAKG");
>           }
>           catch(BioException ex) {}
>
>           System.out.println("aSymbolList = " + aSymbolList.seqString());
>
>           Sequence aSequence =
>               new SimpleSequence(aSymbolList, "", "mySequence", null);
>           System.out.println("aSequence   = " + aSequence.seqString() + "\n");
>
>           SimpleGappedSequence aGapped = new SimpleGappedSequence(aSequence);
>           System.out.println("Gaps are not parsed when a SimpleGappedSequence "
>               + "is constructed from a gapped Sequence object:");
>           System.out.println("aGapped     = " + aGapped.seqString());
>           System.out.println("Gapped position 10 = Plain position "
>               + aGapped.gappedToLocation(new PointLocation(10)).getMin() + "\n");
>
>           try {
>               aSymbolList = ProteinTools.createProtein("MSEKLMPRTTWAKG");
>           }
>           catch(BioException ex) {}
>
>           System.out.println("aSymbolList = " + aSymbolList.seqString());
>
>           aSequence = new SimpleSequence(aSymbolList, "", "mySequence", null);
>           System.out.println("aSequence   = " + aSequence.seqString() + "\n");
>
>           aGapped = new SimpleGappedSequence(aSequence);
>           aGapped.addGapsInSource(9, 3);
>           aGapped.addGapsInSource(3, 2);
>           System.out.println("Gaps introduced through addGapsInSource work ok:");
>           System.out.println("aGapped     = " + aGapped.seqString());
>           System.out.println("Gapped position 10 = Plain position "
>               + aGapped.gappedToLocation(new PointLocation(10)).getMin() + "\n");
>
>           SimpleGappedSequence aGapped2 = new SimpleGappedSequence(aGapped);
>           System.out.println("Now a new SimpleGappedSequence object is created "
>               + "from the previous one:");
>           System.out.println("aGapped2    = " + aGapped2.seqString());
>           System.out.println("Gapped position 10 = Plain position "
>               + aGapped2.gappedToLocation(new PointLocation(10)).getMin() + "\n");
>       }
>
>   }
>  
>   --
>  
>   Ditlev Egeskov Brodersen
>   Lektor
>   Bakkefaldet 30, Hasle
>   8210 Århus V
>  
>   www.lindeman-brodersen.dk
>  
>  
>   -----Original Message-----
>   From: Richard Holland [mailto:holland at ebi.ac.uk]
>   Sent: 16 November 2007 13:46
>   To: Ditlev Egeskov Brodersen
>   Cc: biojava-l at biojava.org
>   Subject: Re: Wrapping SimpleGappedSequence
>  
> SimpleGappedSequence extends SimpleGappedSymbolList, and the constructor
> delegates to the SimpleGappedSymbolList constructor.
> 
> When you extend SimpleGappedSequence you should delegate in your new
> constructor to the existing SimpleGappedSequence constructor, which in
> turn will delegate as above and preserve the gaps.
> 
> By passing any object which implements GappedSymbolList to the
> SimpleGappedSequence constructor, e.g. SimpleGappedSequence or
> SimpleGappedSymbolList, it will automatically choose the new constructor
> from SimpleGappedSymbolList which you hopefully should be able to see in
> the code you have just checked out. If passed any other
> non-GappedSymbolList object, it will use the old constructor that
> already existed from before.
> 
> cheers,
> Richard
> 
> Ditlev Egeskov Brodersen wrote:
>  Hi again,
> 
>    I updated CVS and got the new SimpleGappedSymbolList class, but there
>  seem to be no changes to the SimpleGappedSequence class, which is the
>  one I need to extend...have I missed something?
> 
>    Ditlev
> 
>  --
> 
>  Ditlev E. Brodersen, Ph.D.
>  Lektor, Associate Professor
> 
>  Department of Molecular Biology   Office:  +45 89425259
>  University of Aarhus              Lab:     +45 89425022
>  Gustav Wieds Vej 10c              Fax:     +45 86123178
>  DK-8000 Aarhus C                  Email:   deb at mb.au.dk
>  Denmark                           Lab WWW: www.bioxray.dk/~deb
> 
> 
>  -----Original Message-----
>  From: Richard Holland [mailto:holland at ebi.ac.uk]
>  Sent: 16 November 2007 11:47
>  To: Ditlev Egeskov Brodersen
>  Cc: biojava-l at biojava.org
>  Subject: Re: Wrapping SimpleGappedSequence
> 
>  The easiest way is simply for me to alter the constructor to
>  SimpleGappedSequence (and equivalently to SimpleGappedSymbolList) to
>  copy all gaps if passed another instance of GappedSymbolList as the
>  parameter. I've just done this in CVS so you should be able to update
>  your copy and observe the new behaviour.
> 
>  cheers,
>  Richard
> 
>  Ditlev Egeskov Brodersen wrote:
>  Hi again,
> 
>    thanks for the info - will do the check just to be proper. I have
>  another question: In my application, I would like to wrap the retrieved
>  SimpleGappedSequence objects inside another object that extends the
>  functionality with application-specific stuff. Ideally, I would do
>  this by extending the SimpleGappedSequence object and create it by
>  passing the SimpleGappedSequence from the alignment import to the
>  constructor of the parent, like so:
> 
>    class AlignedSequence extends SimpleGappedSequence {
>      public AlignedSequence(SimpleGappedSequence aGapped) {
>        super(aGapped);
>      }
> 
>      ..custom stuff..
>    }
> 
>  However, the problem is that there is only one constructor for the
>  SimpleGappedSequence, one which takes a simple Sequence object. I can
>  pass the derived class alright, but all gap information is lost again,
>  presumably because the SimpleGappedSequence constructor just takes out
>  the seqString() and puts it into its own sequence object.
> 
>  Shouldn't the constructor of the SimpleGappedSequence class recognise
>  when a derived (and gapped) sequence object is passed, and process it
>  accordingly? As it stands, I am forced to include the
>  SimpleGappedSequence as a private member of the AlignedSequence class,
>  which is not nearly as nice since all statements using the class will
>  have to do something like
> 
>    class AlignedSequence {
>      private SimpleGappedSequence gapped_sequence;
> 
>      public AlignedSequence(SimpleGappedSequence aGapped) {
>        gapped_sequence = aGapped;
>      }
> 
>      public SimpleGappedSequence getGappedSequence() {
>        return(gapped_sequence);
>      }
> 
>      ..custom stuff..
>    }
> 
>    ...
> 
>    AlignedSequence aAligned = new AlignedSequence(aGapped);
>    aAligned.getGappedSequence().seqString();
> 
>  rather than simply:
> 
>    AlignedSequence aAligned = new AlignedSequence(aGapped);
>    aAligned.seqString();
> 
>  In other words, is there any solution with the current setup that
>  would allow me to extend SimpleGappedSequence and not lose the gap
>  information?
>  --  Ditlev
> 
>  --
> 
>  Ditlev E. Brodersen, Ph.D.
>  Lektor, Associate Professor
> 
>  Department of Molecular Biology   Office:  +45 89425259
>  University of Aarhus              Lab:     +45 89425022
>  Gustav Wieds Vej 10c              Fax:     +45 86123178
>  DK-8000 Aarhus C                  Email:   deb at mb.au.dk
>  Denmark                           Lab WWW: www.bioxray.dk/~deb
> 
> 
>  -----Original Message-----
>  From: Richard Holland [mailto:holland at ebi.ac.uk]
>  Sent: 16 November 2007 10:50
>  To: Ditlev Egeskov Brodersen
>  Cc: biojava-l at biojava.org
>  Subject: Re: [Biojava-l] Parsing exising gaps
> 
>    The returned gapped sequences are all properly set up with gaps,
>  name etc. But as for other users, I think there may be some problems,
>  since the SimpleAlignment object only has a general symbol list
>  iterator, the user will have to cast each statement extracting a
>  sequence object, and
> 
>        SimpleSequence aSimple = (SimpleSequence)aSequences.next();
> 
>  throws a ClassCastException at run time. So old code might not run
>  with the update as far as I can see.
> 
>  This is true. However, such code would be unsupported by us as the API
>  clearly states that SimpleAlignment returns SymbolList instances, and
>  does not make any guarantees about the exact implementation details of
>  the objects it returns. To attempt to cast it to anything other than
>  SymbolList would be a mistake! (Although actually it is now returning a
>  guarantee of GappedSymbolList, which is what your code can now take
>  advantage of). To assume it will return SimpleSequence is outside the
>  behaviour defined by the API and therefore should not be relied upon.
> 
>  A more correct behaviour would be to test each item returned:
> 
>  	SymbolList symlist = (SymbolList)aSequences.next();
>  	if (symlist instanceof SimpleSequence) {
>  		SimpleSequence seq = (SimpleSequence)symlist;
>  		// Do simple-sequence stuff
>  	} else {
>  		// Do something else!
>  	}
> 
>  In future, I will modify the API to change the SymbolList guarantee to
>  a GappedSymbolList guarantee, but I can't do this right now as this
>  really would break everyone's code!
> 
>  We are currently planning a redesign as you may be aware, so issues
>  like this will hopefully be resolved as part of that process. For a
>  start, if we use Java 5 generics in future as we plan, we can strictly
>  specify what kinds of objects will be returned by things such as the
>  alignment API, making it easier for us to enforce API-compliant
>  behaviour in users' code.
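
As a rough illustration of the generics point, here is a minimal sketch of an iterator whose element type is part of its signature, so callers need neither a cast nor an instanceof test. Everything in it is a hypothetical stand-in (SymList, GappedSymList, TypedAlignmentSketch and its methods); it is not the BioJava alignment API, current or planned.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins, not BioJava interfaces.
interface SymList { String seqString(); }
interface GappedSymList extends SymList { int gapCount(); }

final class TypedAlignmentSketch {
    private final List<GappedSymList> rows;

    TypedAlignmentSketch(List<GappedSymList> rows) { this.rows = rows; }

    // The element type is declared here, so the compiler enforces it for callers.
    Iterator<GappedSymList> symbolListIterator() { return rows.iterator(); }

    /** Builds a toy gapped row from a string, using '-' as the gap symbol. */
    static GappedSymList row(final String s) {
        return new GappedSymList() {
            public String seqString() { return s; }
            public int gapCount() {
                int count = 0;
                for (char c : s.toCharArray()) {
                    if (c == '-') count++;
                }
                return count;
            }
        };
    }

    /** Sums the gaps over all rows; no casts are needed anywhere. */
    static int totalGaps(TypedAlignmentSketch alignment) {
        int total = 0;
        for (Iterator<GappedSymList> it = alignment.symbolListIterator(); it.hasNext();) {
            total += it.next().gapCount();
        }
        return total;
    }

    public static void main(String[] args) {
        TypedAlignmentSketch a =
            new TypedAlignmentSketch(Arrays.asList(row("MS--EK"), row("M-SEK")));
        System.out.println(totalGaps(a));
    }
}
```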
> 
>  cheers,
>  Richard
> 
> --
> Richard Holland (BioMart)
> EMBL EBI, Wellcome Trust Genome Campus,
> Hinxton, Cambridgeshire CB10 1SD, UK
> Tel. +44 (0)1223 494416
> 
> http://www.biomart.org/
> http://www.biojava.org/

>  --
>  Richard Holland
>  BioMart (http://www.biomart.org/)
>  EMBL-EBI
>  Hinxton, Cambridgeshire CB10 1SD, UK




--
Richard Holland (BioMart)
EMBL EBI, Wellcome Trust Genome Campus,
Hinxton, Cambridgeshire CB10 1SD, UK
Tel. +44 (0)1223 494416

http://www.biomart.org/
http://www.biojava.org/