[Biojava-dev] Plans for next biojava release - modularization

Scooter Willis HWillis at scripps.edu
Tue May 12 00:34:11 UTC 2009


Andreas

This is what I put together as the interface for the tree code. In the algorithm's main loop you simply call the appropriate progress method; this could be cleaned up to use a single progress method with a float for percentage complete. Passing the NJTree instance was required in this specific case because all the work is done when the NJTree class is instantiated. It really should be cleaned up so that it has a process method and can run in a thread if needed. The progress listener could be made generic for all long-running classes.

I have wrapped the NJTree code in a TreeConstructor class, which bridges to the BioJava framework and allows the NJTree code to be replaced by something compatible with the BioJava open source license if needed. I am still experimenting with performance optimizations and need to find out whether Jalview would contribute the NJTree code to BioJava. If not, I would write my own implementation, as the algorithm is not difficult.
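A minimal sketch of the cleanup described above, under some assumptions: a single generic listener replaces the two progress overloads and reports completion as a float fraction between 0 and 1 (a percentage would work equally well), and the work moves out of the constructor into a process() method, so the same object can be called directly or run in a thread. ProgressListener, LongRunningTask and CountingTask are hypothetical names for illustration, not existing BioJava classes.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical generic listener (not a BioJava API): one progress callback with a
// fractional completion value replaces the two progress(...) overloads.
interface ProgressListener {
    void progress(Object source, String state, float fractionComplete);
    void complete(Object source);
    void canceled(Object source);
}

// Hypothetical base class for any long-running computation. The expensive work lives
// in process(), not in the constructor, so callers can call process() directly or
// hand the object to a Thread/Executor as a Runnable.
abstract class LongRunningTask implements Runnable {
    private final List<ProgressListener> listeners = new CopyOnWriteArrayList<>();
    // volatile so that cancel() called from another thread (e.g. a GUI) is seen by the worker loop
    private volatile boolean canceled = false;

    public void addProgressListener(ProgressListener listener) {
        if (listener != null) listeners.add(listener);
    }

    public void cancel() { canceled = true; }

    protected boolean isCanceled() { return canceled; }

    protected void fireProgress(String state, float fractionComplete) {
        for (ProgressListener l : listeners) l.progress(this, state, fractionComplete);
    }

    /** Subclasses put the algorithm here and poll isCanceled() inside long loops. */
    protected abstract void doWork();

    public final void process() {
        doWork();
        boolean wasCanceled = canceled; // read once so every listener sees the same outcome
        for (ProgressListener l : listeners) {
            if (wasCanceled) l.canceled(this); else l.complete(this);
        }
    }

    @Override
    public void run() { process(); }
}

// Toy task standing in for something like the tree construction loop.
class CountingTask extends LongRunningTask {
    private final int total;
    int done = 0;

    CountingTask(int total) { this.total = total; }

    @Override
    protected void doWork() {
        for (int i = 0; i < total && !isCanceled(); i++) {
            done++;
            fireProgress("step", (float) done / total);
        }
    }
}
```

A GUI would typically call cancel() from its event thread while the task runs elsewhere; the volatile flag is what makes that request visible to the worker loop.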

I was also thinking that we could have Java code that provides functionality such as Blast by making a web service call to an external, publicly supported service. Instead of parsing Blast flat-file results, you can call an external service such as http://www.ebi.ac.uk/Tools/webservices/services/wublast via web services and get well-structured results.

Scooter 


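package org.biojavax.phylo;

import org.biojavax.phylo.jalview.NJTree;

/**
 * Callback interface for reporting the progress of a neighbor-joining tree build.
 *
 * @author willishf
 */
public interface NJTreeProgressListener {
    /** Reports progress as a percentage complete, with a description of the current state. */
    public void progress(NJTree njtree, String state, int percentageComplete);
    /** Reports progress as the number of completed steps out of a known total. */
    public void progress(NJTree njtree, String state, int currentCount, int totalCount);
    /** Called once the tree has been built. */
    public void complete(NJTree njtree);
    /** Called if tree construction was canceled before it completed. */
    public void canceled(NJTree njtree);
}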

**********************************************************************************************
This code could be abstracted out into a base class, or simply added to any class that needs to
notify external listeners
**********************************************************************************************
    // These members are meant to live inside NJTree itself: `this` is handed to
    // each listener as the NJTree whose progress is being reported.
    Vector<NJTreeProgressListener> progessListenerVector = new Vector<NJTreeProgressListener>();

    // Register a listener; null is ignored.
    public void addProgessListener(NJTreeProgressListener treeProgessListener) {
        if (treeProgessListener != null) {
            progessListenerVector.add(treeProgessListener);
        }
    }

    // Unregister a listener; null is ignored.
    public void removeProgessListener(NJTreeProgressListener treeProgessListener) {
        if (treeProgessListener != null) {
            progessListenerVector.remove(treeProgessListener);
        }
    }

    // Notify every listener that tree construction has finished.
    public void broadcastComplete() {
        for (NJTreeProgressListener treeProgressListener : progessListenerVector) {
            treeProgressListener.complete(this);
        }
    }

    // Report progress as a percentage of the total work.
    public void updateProgress(String state, int percentage) {
        for (NJTreeProgressListener treeProgressListener : progessListenerVector) {
            treeProgressListener.progress(this, state, percentage);
        }
    }

    // Report progress as currentCount completed steps out of totalCount.
    public void updateProgress(String state, int currentCount, int totalCount) {
        for (NJTreeProgressListener treeProgressListener : progessListenerVector) {
            treeProgressListener.progress(this, state, currentCount, totalCount);
        }
    }

***************************************************************************************
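The banner above suggests moving this listener bookkeeping into a base class. One possible alternative, sketched here as an assumption rather than anything in BioJava, is a small helper held as a field, so NJTree would not need to give up any existing superclass. It uses CopyOnWriteArrayList instead of Vector: Vector's iterator is fail-fast, so a listener that unregisters itself inside complete() can make the for-each loops above throw ConcurrentModificationException or silently skip the next listener. ListenerSupport is a hypothetical name.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical reusable helper (not part of BioJava): a class that needs to notify
// listeners can hold one of these as a field instead of inheriting from a base class.
final class ListenerSupport<L> {
    // CopyOnWriteArrayList iterates over a snapshot, so listeners may add or remove
    // listeners from inside a callback without breaking the broadcast loop.
    private final List<L> listeners = new CopyOnWriteArrayList<>();

    void add(L listener) {
        if (listener != null) listeners.add(listener);
    }

    void remove(L listener) {
        if (listener != null) listeners.remove(listener);
    }

    // One broadcast method covers every event type: the caller supplies the callback.
    void fire(Consumer<? super L> event) {
        for (L listener : listeners) event.accept(listener);
    }

    int size() { return listeners.size(); }
}
```

Inside NJTree, broadcastComplete() could then shrink to something like support.fire(l -> l.complete(this)), and each updateProgress variant to a similar one-liner.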

package org.biojavax.phylo;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Vector;

import org.biojava.bio.seq.*;
import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;
import org.biojavax.phylo.jalview.NJSequence;
import org.biojavax.phylo.jalview.NJTree;
import org.biojavax.phylo.jalview.TreeConstructionAlgorithm;
import org.biojavax.phylo.jalview.TreeType;

/**
 *
 * @author willishf
 */
public class TreeConstructor extends Thread {

   
    NJTree njtree = null;
    NJSequence[] sequences = null;
    TreeType treeType;
    TreeConstructionAlgorithm treeConstructionAlgorithm;
    NJTreeProgressListener treeProgessListener;

    public TreeConstructor(SequenceIterator iter, TreeType _treeType, TreeConstructionAlgorithm _treeConstructionAlgorithm, NJTreeProgressListener _treeProgessListener) {
        treeType = _treeType;
        treeConstructionAlgorithm = _treeConstructionAlgorithm;
        treeProgessListener = _treeProgessListener;
        ArrayList<NJSequence> sequenceArray = new ArrayList<NJSequence>();
        while (iter.hasNext()) {
            try {
                Sequence seq = iter.nextSequence();
                NJSequence njsequence = new NJSequence(seq.getName(), seq.seqString());
                sequenceArray.add(njsequence);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        sequences = new NJSequence[sequenceArray.size()];
        sequenceArray.toArray(sequences);
    }

    public TreeConstructor(Vector<RichSequence> sequenceVector, TreeType _treeType, TreeConstructionAlgorithm _treeConstructionAlgorithm, NJTreeProgressListener _treeProgessListener) {
        treeType = _treeType;
        treeConstructionAlgorithm = _treeConstructionAlgorithm;
        treeProgessListener = _treeProgessListener;
        sequences = new NJSequence[sequenceVector.size()];
        int index = 0;
        for (RichSequence seq : sequenceVector) {

            NJSequence njsequence = new NJSequence(seq.getName(), seq.seqString());
            sequences[index] = njsequence;
            index++;
        }

    }

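    public void cancel() {
        // Caveat: njtree is only assigned after the NJTree constructor returns, i.e.
        // after all the work is done, so as written cancel() cannot interrupt a
        // build that is still in progress.
        if (njtree != null)
            njtree.cancel();
    }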

    public void process() throws Exception {
        njtree = new NJTree(sequences, treeType, treeConstructionAlgorithm, treeProgessListener);
    }

    @Override
    public void run() {
        try {
            process();
        } catch (Exception e) {
            e.printStackTrace();

        }
    }

    public String getNewickString() {
        if (njtree != null) {
            return njtree.toString();
        } 
        return "";
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            // No path given: fall back to a hard-coded local FASTA file.
            args = new String[] { "C:\\MutualInformation\\project\\hiv\\hiv-genes-genome.fasta" };
        }
        try {
            //prepare a BufferedReader for file io
            BufferedReader br = new BufferedReader(new FileReader(args[0]));
            SimpleNamespace ns = new SimpleNamespace("biojava");

            // You can use any of the convenience methods found in the BioJava 1.6 API
            RichSequenceIterator rsi = RichSequence.IOTools.readFastaProtein(br, ns);

            long readTime = System.currentTimeMillis();
            TreeConstructor treeConstructor = new TreeConstructor(rsi, TreeType.NJ, TreeConstructionAlgorithm.PID, new ProgessListenerStub());
            treeConstructor.process();
            long treeTime = System.currentTimeMillis();
            String newick = treeConstructor.getNewickString();

            System.out.println("Tree time " + (treeTime - readTime));
            System.out.println(newick);

        } catch (FileNotFoundException ex) {
            //can't find file specified by args[0]
            ex.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }

    }
}




-----Original Message-----
From: andreas.prlic at gmail.com on behalf of Andreas Prlic
Sent: Mon 5/11/2009 6:53 PM
To: Scooter Willis
Cc: biojava-dev
Subject: Re: [Biojava-dev] Plans for next biojava release - modularization
 
Hi Scooter,

I like the idea of supporting multiple threads and parallelizing code
where possible. Is there a reference implementation that you would
recommend for how progress listeners should be implemented?  I suppose
the neighbor joining code you mention below is not part of biojava...

Andreas
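Neighbor joining starts from an all-against-all distance matrix, and filling that matrix is a step that parallelizes naturally, since every pair is independent. The sketch below assumes the sequences are already aligned to equal length and uses a simplified percent-identity distance (1 minus the fraction of identical columns); the PID option in the tree code presumably means percent identity, but its exact gap handling is not shown in this thread. ParallelDistances is a hypothetical class, and Java parallel streams stand in for whatever threading approach BioJava might adopt.

```java
import java.util.stream.IntStream;

// Hypothetical sketch: compute a pairwise distance matrix in parallel.
// Assumes pre-aligned, equal-length sequences; "percent identity" here is simply
// identical columns / alignment length, ignoring any special gap handling.
final class ParallelDistances {

    static double pidDistance(String a, String b) {
        if (a.length() != b.length())
            throw new IllegalArgumentException("sequences must be aligned to equal length");
        if (a.isEmpty()) return 0.0; // two empty sequences are treated as identical
        int same = 0;
        for (int k = 0; k < a.length(); k++)
            if (a.charAt(k) == b.charAt(k)) same++;
        return 1.0 - (double) same / a.length();
    }

    static double[][] distanceMatrix(String[] seqs) {
        int n = seqs.length;
        double[][] d = new double[n][n]; // diagonal stays 0.0
        // Each row i is an independent unit of work that fills d[i][j] and d[j][i]
        // for j > i, so no two threads ever write the same cell.
        IntStream.range(0, n).parallel().forEach(i -> {
            for (int j = i + 1; j < n; j++) {
                double dist = pidDistance(seqs[i], seqs[j]);
                d[i][j] = dist;
                d[j][i] = dist;
            }
        });
        return d;
    }
}
```

Rows near the top of the matrix do more work than rows near the bottom, so a production version might split the work by pairs instead; splitting by rows keeps the example short and makes it easy to see that each cell has exactly one writer.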

On Mon, May 11, 2009 at 6:50 AM, Scooter Willis <HWillis at scripps.edu> wrote:
> Andreas
>
> Another theme that should be considered is providing a multi-threaded
> version of any module with a long run time. This would have a couple of
> elements. A progress listener interface should be standard, where core
> code would send progress messages to listeners that external code can
> use to display feedback to the user. I did this with the
> Neighbor Joining code for tree construction and it provides needed
> feedback in a GUI. Otherwise the user gets frustrated: they don't know
> whether the code they are about to execute will take 10 minutes or
> 8 hours to complete, and they think the software is not working. The
> reverse direction matters too: when an operation is canceled, the core
> code should stop processing its long-running loop. Once the code has
> completed, the listener's process-complete method is called, allowing
> the next step in the external code to continue. The developer would have
> the choice to call the "process" method directly or run it in a thread
> and wait for the complete callback to be called.
>
> This is the first step toward letting core, long-running
> processes take advantage of multiple threads to complete the
> computational task faster. Not all code can be parallelized easily, but
> if the algorithm can take advantage of running in parallel then it
> should. This then opens up a couple of cloud computing frameworks that
> extend Java's multi-threading concepts across a cluster, such as
> http://www.terracotta.org/. If we put an emphasis on having code that
> runs well in a thread, we are one step closer to an architecture that can
> run in a cloud. The computational problems are only going to get bigger,
> and with Amazon EC2 and approaches like http://www.eucalyptus.com/,
> computing cycles are going to be cheap as long as the
> software/libraries can easily take advantage of them.
>
> Thanks
>
> Scooter
>
> -----Original Message-----
> From: biojava-dev-bounces at lists.open-bio.org
> [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of Andreas
> Prlic
> Sent: Monday, May 11, 2009 12:27 AM
> To: biojava-dev
> Subject: [Biojava-dev] Plans for next biojava release - modularization
>
> Hi biojava-devs,
>
> It is time to start working on the next biojava release. I would
> like to modularize the current code base and apply some of the ideas
> that have emerged around Richard's "biojava 3" code. In principle the
> idea is that all changes should be backwards compatible with the
> interfaces provided by the current biojava 1.7 release. Backwards
> compatibility shall only be broken if the functionality is being
> replaced with something that works better, and this gets documented
> accordingly. For the build functionality I would suggest sticking with
> what Richard's biojava 3 code base already provides. Since we will
> try to be backwards compatible, all code development should be part of
> the biojava-trunk, and the first step will be to move from the ant build
> scripts to a maven build process. Following this procedure will allow
> us to use, e.g., the code refactoring tools provided by Eclipse, which
> should come in handy.
>
> The modules I would like to see should provide self-contained
> functionality, and cross dependencies should be restricted to a
> minimum. I would suggest the following modules:
>
> biojava-core: Contains everything that cannot easily be modularized
> or for which nobody volunteers to become a module maintainer.
> biojava-phylogeny: Scooter expressed some interest in providing such a
> module and becoming package maintainer for it.
> biojava-structure: Everything protein structure related. I would be
> package maintainer.
> biojava-blast: Blast parsing is a frequently requested functionality
> and it would be good to have this code self-contained. A package
> maintainer for this will still need to be nominated at a later stage.
> Any suggestions for other modules?
>
> Let me know what you think about this.
>
> Andreas
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
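The quoted proposal lets the developer either call process directly or run the job in a thread and wait for the complete callback. Below is a sketch of both modes with a toy task; CompletionListener, SummingTask and RunModes are hypothetical names, and using a CountDownLatch to wait for the callback is an assumption of this sketch, not something taken from the thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical callback interface; only the completion event is shown here.
interface CompletionListener {
    void complete(SummingTask task);
}

// Toy long-running job standing in for something like tree construction.
class SummingTask implements Runnable {
    private final long n;
    private final CompletionListener listener;
    private volatile long result; // written by the worker thread, read by the caller

    SummingTask(long n, CompletionListener listener) {
        this.n = n;
        this.listener = listener;
    }

    // Blocking entry point: does the work on the caller's thread, then notifies.
    public void process() {
        long sum = 0;
        for (long i = 1; i <= n; i++) sum += i;
        result = sum;
        if (listener != null) listener.complete(this);
    }

    // Thread entry point: same work, so the caller chooses how to run it.
    @Override
    public void run() { process(); }

    public long getResult() { return result; }
}

final class RunModes {
    // Option 1: synchronous call; the result is ready when process() returns.
    static long runBlocking(long n) {
        SummingTask task = new SummingTask(n, null);
        task.process();
        return task.getResult();
    }

    // Option 2: run on a background thread and wait for the complete() callback.
    static long runInThread(long n) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        SummingTask task = new SummingTask(n, t -> done.countDown());
        new Thread(task, "summing-task").start();
        if (!done.await(10, TimeUnit.SECONDS))
            throw new IllegalStateException("task did not complete in time");
        return task.getResult();
    }
}
```

Note that complete() runs on the worker thread. In a Swing GUI you would not block on the latch; the callback would instead hand its result back to the event thread, for example with SwingUtilities.invokeLater.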
