[Biojava-dev] First draft of a remote blast service class
Andy Yates
ayates at ebi.ac.uk
Thu Jun 11 15:53:35 UTC 2009
Really the map/enum pattern is nearly knocking on the door of the
prototype pattern & is a very good way to go for this kind of system
where target values are never set in stone (well only for a particular
release of a service).
If anyone is interested there's a very good bit of information from:
http://steve-yegge.blogspot.com/2008/10/universal-design-pattern.html
Andy
Richard Holland wrote:
> Good stuff! My 2p's worth:
>
> setSequence() should be overloaded to accept all forms of possible
> sequence input - whatever is decided on as the standard way of
> referencing sequence data in BJ3. The original plan for BJ3 was to allow
> String/CharSequence and List<Symbol> (see
> http://www.biojava.org/wiki/BioJava3:HowTo )
>
> setAdvancedOptions() should not accept a String, but rather a Properties
> or a Map<String,String>, where the keys of the Map/Properties are
> restricted to a range of acceptable values determined (and published,
> maybe as an enum?) by each of the implementation classes (e.g.
> RemoteQBlastService). The implementation class then uses this to
> construct the call string. The reason for doing it this way is that (a)
> it allows the parameters to be verified by checking them against a known
> list of allowable key/values, and (b) it allows for non-URL based remote
> requests to be constructed from the values, e.g. SOAP calls.
>
> I would also replace the static int HTML/TEXT/XML with an enum as
> numeric constants are sometimes a Bad Thing.
>
> The setProgram() method in my mind is specific to Blast, as opposed to
> being a generic Pairwise Alignment concept. Therefore it might be better
> to move this to a Blast-specific sub-interface or make it only appear in
> the implementation classes that refer to Blast.
>
> Finally, the JavaDocs for the various set() methods are incorrect -
> they're all mostly the same in fact! :)
>
> But overall it looks good.
>
> cheers,
> Richard
>
> On Thu, 2009-06-11 at 09:52 -0400, Sylvain Foisy wrote:
>> Hi to all,
>>
>> I've been working on this for the past week or so and after discussing this
>> with Andreas, I am putting my code here for critical review. I'll put this
>> stuff in biojava-live as soon as Andreas can fix my SVN access.
>>
>> First, an interface called RemotePairwiseAlignementSerivce defines the basic
>> components of a remote service: sequence/database/progam/run options/output
>> options. RemoteQBlastService implements this interface and runs remote
>> Qblast requests and creates output in either text, XML or HTML. At present
>> time, regular blastall programs work, no blastpgp/megablast support yet.
>>
>> I'll need some guidance to make it work on other type of web services like
>> EBI.
>>
>> Best regards
>>
>> Sylvain
>>
>> ===================================================================
>>
>> Sylvain Foisy, Ph. D.
>> Consultant Bio-informatique / Bioinformatics
>> Diploide.net - TI pour la vie / IT for Life
>>
>> Courriel: sylvain.foisy at diploide.net
>> Web: http://www.diploide.net
>> Tel: (514) 893-4363
>> ===================================================================
>>
>> import java.io.InputStream;
>>
>> import org.biojava.bio.BioException;
>> /**
>> * This interface specifies minimal information needed to execute a pairwise
>> alignment on a remote service.
>> *
>> * Example of service: QBlast service at NCBI
>> * Web Service at EBI
>> *
>> * @author Sylvain Foisy
>> * @since 1.8
>> *
>> */
>> public interface RemotePairwiseAlignementService {
>>
>> /**
>> * This field specifies that the output format of results
>> * is text.
>> *
>> */
>> public static final int TEXT = 0;
>>
>> /**
>> * This field specifies that the output format of results
>> * is XML.
>> *
>> */
>> public static final int XML = 1;
>>
>> /**
>> * This field specifies that the output format of results
>> * is HTML.
>> *
>> */
>> public static final int HTML = 2;
>>
>> /**
>> * Setting the database to use for doing the pairwise alignment
>> *
>> * @param db: a <code>String</code> with a valid database ID for the
>> service used.
>> *
>> */
>> public void setDatabase(String db);
>>
>> /**
>> * Setting the sequence to be align for this for this request
>> *
>> * @param seq: a <code>String</code> with a sequence to be aligned.
>> *
>> */
>> public void setSequence(String seq);
>>
>> /**
>> * Setting the program to use for this pairwise alignment
>> *
>> * @param prog: a <code>String</code> with a valid database ID for the
>> service used.
>> *
>> */
>> public void setProgram(String prog);
>>
>> /**
>> * Setting all other options to use for this pairwise alignment
>> *
>> * @param db: a <code>String</code> with a valid database ID for the
>> service used.
>> *
>> */
>> public void setAdvancedOptions(String str);
>>
>> /**
>> * Doing the actual analysis on the instantiated service
>> *
>> * @throws BioException
>> */
>> public void executeSearch() throws BioException;
>>
>> /**
>> * Getting the actual alignment results from this instantiated service
>> *
>> * @return : an <code>InputStream</code> with the actual alignment
>> results
>> * @throws BioException
>> */
>> public InputStream getAlignmentResults() throws BioException;
>> }
>>
>> import java.io.BufferedReader;
>> import java.io.IOException;
>> import java.io.InputStream;
>> import java.io.InputStreamReader;
>> import java.io.OutputStreamWriter;
>> import java.net.MalformedURLException;
>> import java.net.URL;
>> import java.net.URLConnection;
>>
>> import org.biojava.bio.BioException;
>>
>> /**
>> * RemoteQBlastService - A simple way of submitting BLAST request to the
>> QBlast
>> * service at NCBI.
>> *
>> * <p>
>> * NCBI provides a Blast server through a CGI-BIN interface.
>> RemoteQBlastService simply
>> * encapsulates an access to it by giving users access to get/set methods to
>> fix
>> * sequence, program and database as well as advanced options.
>> * </p>
>> *
>> * <p>
>> * As of version 1.0, only blastall programs are usable. blastpgp and
>> megablast are high-priorities.
>> * </p>
>> *
>> * @author Sylvain Foisy
>> * @version 1.0
>> * @since 1.8
>> *
>> *
>> */
>> public class RemoteQBlastService implements RemotePairwiseAlignementService{
>>
>> // public static final int TEXT = 0;
>> // public static final int XML = 1;
>> // public static final int HTML = 2;
>>
>> private static String baseurl =
>> "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi";
>> private URL aUrl;
>> private URLConnection uConn;
>> private OutputStreamWriter fromQBlast;
>> private BufferedReader rd;
>>
>> private String seq = null;
>> private String prog = null;
>> private String db = null;
>> private String outputFormat = null;
>> private String advanced = null;
>>
>> private String rid;
>> private long step;
>> private boolean done = false;
>> private long start;
>>
>> public RemoteQBlastService() throws BioException {
>> try {
>> aUrl = new URL(baseurl);
>> uConn = setQBlastProperties(aUrl.openConnection());
>>
>> outputFormat = "Text";
>> }
>> /*
>> * Needed but should never be thrown since the URL is static and
>> known to exist
>> */
>> catch (MalformedURLException e) {
>> throw new BioException("It looks like the URL for NCBI QBlast
>> service is bad");
>> }
>> /*
>> * Intercept if the program can't connect to QBlast service
>> */
>> catch (IOException e) {
>> throw new BioException(
>> "Impossible to connect to QBlast service at this time.
>> Check your network connection");
>> }
>> }
>>
>> /**
>> * This method execute the Blast request via the Put command of the
>> CGI-BIN
>> * interface. It gets the estimated time of completion by capturing the
>> * value of the RTOE variable and sets a loop that will check for
>> completion
>> * of analysis at intervals specified by RTOE.
>> *
>> * <p>
>> * It also capture the value for the RID variable, necessary for
>> fetching
>> * the actual results after completion.
>> * </p>
>> *
>> * @throws BioException
>> * if it is not possible to sent the BLAST command
>> */
>> public void executeSearch() throws BioException {
>>
>> if (seq == null || db == null || prog == null) {
>> throw new BioException(
>> "Impossible to execute QBlast request. One or more of
>> seq|db|prog has not been set");
>> }
>> /*
>> * sending the command to execute the Blast analysis
>> */
>> String cmd = "CMD=Put&SERVICE=plain" + "&" + seq + "&" + prog + "&"
>> + db + "&" + "FORMAT_TYPE=HTML";
>>
>> if (advanced != null) {
>> cmd += cmd + "&" + advanced;
>> }
>>
>> try {
>>
>> uConn = setQBlastProperties(aUrl.openConnection());
>>
>> fromQBlast = new OutputStreamWriter(uConn.getOutputStream());
>>
>> fromQBlast.write(cmd);
>> fromQBlast.flush();
>>
>> // Get the response
>> rd = new BufferedReader(new InputStreamReader(uConn
>> .getInputStream()));
>>
>> String line = "";
>>
>> while ((line = rd.readLine()) != null) {
>> if (line.contains("RID")) {
>> String[] arr = line.split("=");
>> rid = arr[1].trim();
>> } else if (line.contains("RTOE")) {
>> String[] arr = line.split("=");
>> step = Long.parseLong(arr[1].trim()) * 1000;
>> start = System.currentTimeMillis() + step;
>> }
>> }
>> } catch (IOException e) {
>> throw new BioException(
>> "Can't submit sequence to BLAST server at this time.");
>> }
>> /*
>> * Getting the info out of the NCBI system
>> */
>> while (!done) {
>> long prez = System.currentTimeMillis();
>> done = isReady(rid, prez);
>> }
>> }
>>
>> /**
>> * <p>This method is used only for the executeBlastSearch method to
>> check for completion of
>> * request using the NCBI specified RTOE variable</p>
>> *
>> * @param id
>> * @param present
>> * @return
>> */
>> private boolean isReady(String id, long present) {
>>
>> boolean ready = false;
>> String check = "CMD=Get&RID=" + id;
>> /*
>> * If present time is less than the start of the search added to
>> step
>> * obtained from NCBI, just do nothing ;-)
>> */
>> if (present < start) {
>> ;
>> }
>> /*
>> * If we are at least step seconds in the future from the actual
>> call of
>> * method executeBlastSearch()
>> */
>> else {
>> try {
>> uConn = setQBlastProperties(aUrl.openConnection());
>>
>> fromQBlast = new
>> OutputStreamWriter(uConn.getOutputStream());
>> fromQBlast.write(check);
>> fromQBlast.flush();
>>
>> rd = new BufferedReader(new InputStreamReader(uConn
>> .getInputStream()));
>>
>> String line = "";
>>
>> while ((line = rd.readLine()) != null) {
>> if (line.contains("READY")) {
>> ready = true;
>> } else if (line.contains("WAITING")) {
>> /*
>> * Else, move start forward in time...
>> */
>> start = present + step;
>> }
>> }
>> } catch (IOException e) {
>> e.printStackTrace();
>> }
>> }
>> return ready;
>> }
>>
>> /**
>> * <p>This method extracts this actual Blast report. The default format
>> is Text but can be changed before with the method
>> * setQBlastOutputFormat.</p>
>> *
>> *
>> * @return
>> * @throws BioException
>> */
>> public InputStream getAlignmentResults() throws BioException {
>> String srid = "CMD=Get&RID=" + rid;
>> srid += "&FORMAT_TYPE=" + outputFormat;
>>
>> if(!this.done){
>> throw new BioException("Unable to get report at this time. Your
>> Blast request has not been processed yet.");
>> }
>>
>> try {
>> uConn = setQBlastProperties(aUrl.openConnection());
>>
>> fromQBlast = new OutputStreamWriter(uConn.getOutputStream());
>> fromQBlast.write(srid);
>> fromQBlast.flush();
>>
>> return uConn.getInputStream();
>>
>> } catch (IOException ioe) {
>> throw new BioException(
>> "It is not possible to fetch Blast report from NCBI at
>> this time");
>> }
>> }
>>
>> /**
>> * <p>
>> * Set the sequence to be blasted using the String that correspond to
>> the
>> * sequence.
>> * </p>
>> *
>> * <p>
>> * Take note that this method is mutually exclusive to setGIToBlast()
>> for a
>> * given Blast request.
>> * </p>
>> *
>> * @param aStr
>> * : a String with the sequence
>> */
>> public void setSequence(String aStr) {
>> this.seq = "QUERY=" + aStr;
>> }
>>
>> /**
>> * Simply return a string with the blasted sequence.
>> *
>> * @return seq : a string with the sequence
>> */
>> public String getSeqToBlast() {
>> return this.seq;
>> }
>>
>> /**
>> * <p>
>> * Set the sequence to be blasted using the NCBI GI value. At this time,
>> * there is no effort made to check the validity of this GI.
>> * </p>
>> *
>> * <p>
>> * Take note that this method is mutually exclusive to setSeqToBlast()
>> for a
>> * given Blast request.
>> * </p>
>> *
>> * @param gi
>> * : an integer value representing a NCBI GI
>> */
>> public void setGIToBlast(String gi) {
>> this.seq = "QUERY=" + gi;
>> }
>>
>> /**
>> * <p>
>> * Simply return a string with the sequence blasted.
>> * </p>
>> *
>> * @return GI : a String with the GI of the blasted sequence
>> */
>> public String getGIToBlast() {
>> return this.seq;
>> }
>>
>> /**
>> * <p>
>> * This method set the program to be used to blast the given
>> sequence/GI. At
>> * this time, there is no attempt at checking the matching of sequence
>> type
>> * to program.
>> * </p>
>> *
>> * @param prog
>> * : a String representing the program specified for this
>> QBlast
>> * request.
>> *
>> */
>> public void setProgram(String prog) {
>> this.prog = "PROGRAM=" + prog;
>> }
>>
>> /**
>> * <p>
>> * Simply returns the program used for the given Blast request.
>> * </p>
>> *
>> * @return prog : a String with the program used for this QBlast
>> request.
>> */
>> public String getProgram() {
>> return this.prog;
>> }
>>
>> /**
>> * <p>
>> * This method set the database to be used to blast the given
>> sequence/GI.
>> * At this time, there is no attempt at checking the matching of
>> sequence
>> * type to database.
>> * </p>
>> *
>> * @param db: a String for the database specified for this QBlast
>> request
>> */
>> public void setDatabase(String db) {
>> this.db = "DATABASE=" + db;
>> }
>>
>> /**
>> * <p>
>> * Simply returns the database used for the given Blast request.
>> * </p>
>> *
>> * @return db: a String with the database used for this QBlast request.
>> */
>> public String getBlastDatabase() {
>> return this.db;
>> }
>>
>> /**
>> * <p>This method let the user specify which format to use for
>> generating the output.</p>
>> *
>> * @param type:an integer taken from the static constant of this class,
>> either be TEXT, XML or HTML
>> */
>> public void setQBlastOutputFormat(int type) {
>>
>> switch (type) {
>> case 0:
>> this.outputFormat = "Text";
>> break;
>> case 1:
>> this.outputFormat = "XML";
>> break;
>> case 2:
>> this.outputFormat = "HTML";
>> break;
>> }
>> }
>>
>> /**
>> * <p>
>> * Simply returns the output format used for the given Blast report.
>> * </p>
>> *
>> * @return outputFormat : a String with the format specified for the
>> QBlast report.
>> */
>> public String getQBlastOutputFormat() {
>> return this.outputFormat;
>> }
>>
>> /**
>> * <p>This method is to be used if a request is to use non-default
>> values at submission. According to QBlast info,
>> * the accepted parameters for PUT requests are:</p>
>> *
>> * <ul>
>> * <li>-G: cost to create a gap. Default = 5 (nuc-nuc) / 11 (protein) /
>> non-affine for megablast</li>
>> * <li>-E: Cost to extend a gap. Default = 2 (nuc-nuc) / 1 (protein) /
>> non-affine for megablast</li>
>> * <li>-r: integer to reward for match. Default = 1</li>
>> * <li>-q: negative integer for penalty to allow mismatch. Default =
>> -3</li>
>> * <li>-e: expectation value. Default = 10.0</li>
>> * <li>-W: word size. Default = 3 (proteins) / 11 (nuc-nuc) / 28
>> (megablast)</li>
>> * <li>-y: dropoff for blast extensions in bits, using default if not
>> specified. Default = 20 for blastn, 7 for all others
>> * (except megablast for which it is not applicable).</li>
>> * <li>-X: X dropoff value for gapped alignment, in bits. Default = 30
>> for blastn/megablast, 15 for all others.</li>
>> * <li>-Z: final X dropoff value for gapped alignement, in bits. Default
>> = 50 for blastn, 25 for all others
>> * (except megablast for which it is not applicable)</li>
>> * <li>-P: equals 0 for multiple hits 1-pass, 1 for single hit 1-pass.
>> Does not apply to blastn ou megablast.</li>
>> * <li>-A: multiple hits window size. Default = 0 (for single hit
>> algorithm)</li>
>> * <li>-I: number of database sequences to save hits for. Default =
>> 500</li>
>> * <li>-Y: effective length of the search space. Default = 0 (0
>> represents using the whole space)</li>
>> * <li>-z: a real specifying the effective length of the database to
>> use. Default = 0 (0 represents the real size)</li>
>> * <li>-c: an integer representing pseudocount constant for PSI-BLAST.
>> Default = 7</li>
>> * <li>-F: any filtering directive</li>
>> * </ul>
>> *
>> * <p>You have to be aware that at not moment is there any error
>> checking on the use of these parameters by this class.</p>
>> * @param aStr: a String with any number of optional parameters with an
>> associated value.
>> *
>> */
>> public void setAdvancedOptions(String aStr) {
>> this.advanced = "OTHER_ADVANCED=" + aStr;
>> }
>>
>> /**
>> *
>> * Simply return the string given as argument via
>> setBlastAdvancedOptions
>> *
>> * @return advanced: the string with the advanced options
>> */
>> public String getBlastAdvancedOptions() {
>> return this.advanced;
>> }
>>
>> /**
>> *
>> * Simply return the QBlast RID for this specific QBlast request
>> *
>> * @return rid: the string with the RID
>> */
>> public String getBlastRID() {
>> return this.rid;
>> }
>>
>> /**
>> * A simple method to check the availability of the QBlast service
>> *
>> * @throws BioException
>> */
>> public void printRemoteBlastInfo() throws BioException {
>> try {
>> OutputStreamWriter out = new OutputStreamWriter(uConn
>> .getOutputStream());
>>
>> out.write("CMD=Info");
>> out.flush();
>>
>> // Get the response
>> BufferedReader rd = new BufferedReader(new
>> InputStreamReader(uConn
>> .getInputStream()));
>>
>> String line = "";
>>
>> while ((line = rd.readLine()) != null) {
>> System.out.println(line);
>> }
>>
>> out.close();
>> rd.close();
>> } catch (IOException e) {
>> throw new BioException(
>> "Impossible to get info from QBlast service at this
>> time. Check your network connection");
>> }
>> }
>>
>> private URLConnection setQBlastProperties(URLConnection conn) {
>>
>> URLConnection tmp = conn;
>>
>> conn.setDoOutput(true);
>> conn.setUseCaches(false);
>>
>> tmp.setRequestProperty("User-Agent", "Biojava/RemoteQBlastService");
>> tmp.setRequestProperty("Connection", "Keep-Alive");
>> tmp.setRequestProperty("Content-type",
>> "application/x-www-form-urlencoded");
>> tmp.setRequestProperty("Content-length", "200");
>>
>> return tmp;
>> }
>> }
>>
>>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
More information about the biojava-dev
mailing list