[Biojava-dev] Biojava Interface to BLAST web/remote services

mark.schreiber at novartis.com mark.schreiber at novartis.com
Fri Jun 5 03:47:42 UTC 2009


Hi -

Just some observations from past experience:

You could write an interface called something like RemoteSimilaritySearch 
that contains the minimal information all SOAP/CGI-BIN sequence search 
services might be expected to require and return, although it's pretty hard 
to anticipate what that might be.  Possibly more useful would be 
RemoteBLAST, RemoteFASTA, etc. interfaces that extend 
RemoteSimilaritySearch.  Concrete implementations of, for example, 
RemoteBLAST could include the SOAP service at EBI and the CGI-BIN service 
at NCBI.
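
A rough sketch of how that hierarchy might look. Only the three interface 
names come from the suggestion above; every method name, and the 
InMemoryBlast stand-in (which does no network I/O), is a hypothetical 
placeholder rather than existing BioJava API.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Assumed minimum that every remote search service can offer. */
interface RemoteSimilaritySearch {
    /** Submits a query and returns a service-specific job identifier. */
    String submit(String querySequence, String database) throws IOException;

    /** Returns true once the job has finished on the server. */
    boolean isFinished(String jobId) throws IOException;

    /** Fetches the raw result document of a finished job. */
    String fetchRawResult(String jobId) throws IOException;
}

/** BLAST-specific options (illustrative only). */
interface RemoteBLAST extends RemoteSimilaritySearch {
    void setProgram(String program);
}

/** FASTA-specific options (illustrative only). */
interface RemoteFASTA extends RemoteSimilaritySearch {
    void setKtup(int ktup);
}

/** Stand-in for a concrete service such as EBI SOAP or NCBI CGI-BIN. */
class InMemoryBlast implements RemoteBLAST {
    private final Map<String, String> jobs = new HashMap<>();
    private String program = "blastp";

    @Override
    public void setProgram(String program) {
        this.program = program;
    }

    @Override
    public String submit(String querySequence, String database) {
        // A real implementation would send the request over the wire here.
        String jobId = "job-" + (jobs.size() + 1);
        jobs.put(jobId, program + " " + database + " " + querySequence);
        return jobId;
    }

    @Override
    public boolean isFinished(String jobId) {
        return jobs.containsKey(jobId);
    }

    @Override
    public String fetchRawResult(String jobId) {
        String result = jobs.get(jobId);
        if (result == null) {
            throw new IllegalArgumentException("Unknown job: " + jobId);
        }
        return result;
    }
}
```

Client code written against RemoteSimilaritySearch would not need to know 
which transport sits behind it.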

The RemoteBLAST and RemoteFASTA interfaces should make it possible to modify 
any parameter of BLAST/FASTA as appropriate, and should have the option to 
throw an UnsupportedOperationException, as not all services will allow 
the setting of all parameters.
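
As a minimal illustration of that contract, assuming a generic 
name/value setter: ParameterizedSearch and RestrictedBlastService are 
hypothetical names, and the parameter names used with them are examples, 
not a list of what any real service accepts.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical slice of a RemoteBLAST-style interface: parameters only. */
interface ParameterizedSearch {
    /**
     * Sets a search parameter.
     *
     * @throws UnsupportedOperationException if this service cannot set it
     */
    void setParameter(String name, String value);

    /** Returns a read-only view of the parameters set so far. */
    Map<String, String> getParameters();
}

/** Illustrative service that exposes only a subset of the options. */
class RestrictedBlastService implements ParameterizedSearch {
    private final Set<String> supported;
    private final Map<String, String> values = new LinkedHashMap<>();

    RestrictedBlastService(Set<String> supported) {
        this.supported = supported;
    }

    @Override
    public void setParameter(String name, String value) {
        if (!supported.contains(name)) {
            throw new UnsupportedOperationException(
                "This service does not allow setting '" + name + "'");
        }
        values.put(name, value);
    }

    @Override
    public Map<String, String> getParameters() {
        return Collections.unmodifiableMap(values);
    }
}
```

Failing at the setter means the caller finds out about an unsupported 
option before a job is submitted, not after.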

In general, trying to make an implementation that talks to an HTML 
interface to BLAST is asking for trouble (as they can change very easily). 
It is best to code to a SOAP/REST service or, if you have to, a CGI-BIN 
interface. You should only make an implementation that talks to a web form 
as a last resort, and even then it probably shouldn't go into BioJava 
(maybe post it on the cookbook).

The most stable version of the BLAST output is the XML. Parsing the 
text/html output has been a constant source of headaches for BioJava. 
Implementations of remote BLAST services should try to parse the XML format 
if it is available (SOAP and REST will be XML anyway, although not always 
BLAST XML).
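
To show why the XML is easy to work with, here is a small sketch that 
pulls hit identifiers and E-values out of a BLAST XML document using only 
the JDK's DOM parser. The element names (Hit, Hit_id, Hsp_evalue) are 
those of the NCBI BLAST XML format; the BlastXmlHits class is a 
hypothetical helper, not BioJava's parser. Real output carries a DOCTYPE 
pointing at an external DTD, which this sketch makes no attempt to 
handle.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class BlastXmlHits {
    /** Returns one "hitId<TAB>evalue" line per Hit, using its first HSP. */
    static List<String> summarize(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                    xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse BLAST XML", e);
        }

        List<String> lines = new ArrayList<>();
        NodeList hits = doc.getElementsByTagName("Hit");
        for (int i = 0; i < hits.getLength(); i++) {
            Element hit = (Element) hits.item(i);
            lines.add(firstText(hit, "Hit_id") + "\t"
                + firstText(hit, "Hsp_evalue"));
        }
        return lines;
    }

    /** Text of the first descendant with the given tag, or "" if absent. */
    private static String firstText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0
            ? ""
            : nodes.item(0).getTextContent().trim();
    }
}
```

For large result sets a streaming (SAX or StAX) parser would be the more 
likely choice; DOM is used here only to keep the example short.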

All the BLAST services I have used will return a job number not a result. 
The client will then need to poll that job number until it is complete and 
then get the results for the job. The client will need to handle this 
sensibly without timing out (unless the user wants to allow a time out). 
Sensible threading will be required.
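
A minimal sketch of such a polling loop, assuming the status check is 
supplied by the caller. JobPoller and its parameters are hypothetical; a 
timeout of zero or less is taken to mean "wait indefinitely".

```java
import java.util.function.Predicate;

class JobPoller {
    /**
     * Polls until isDone accepts the job id or the timeout elapses.
     *
     * @param intervalMillis pause between two status checks
     * @param timeoutMillis  give up after this long; <= 0 waits forever
     * @return true if the job finished, false if the timeout was reached
     */
    static boolean waitUntilDone(String jobId, Predicate<String> isDone,
                                 long intervalMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!isDone.test(jobId)) {
            if (timeoutMillis > 0 && System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Restore the flag so callers can see the interruption.
                Thread.currentThread().interrupt();
                throw new IllegalStateException(
                    "Interrupted while polling " + jobId, e);
            }
        }
        return true;
    }
}
```

The call blocks, so it would normally run on a worker thread (for example 
via an ExecutorService) so that the caller's thread stays free.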

Converting results back into SeqSimilaritySearchResult makes sense 
although please note that Andreas has suggested renaming the packages for 
these (which I support as the old package name is not informative).

Under a mavenized system the whole similarity search system could go into 
its own module.

Just my $0.02

- Mark

biojava-dev-bounces at lists.open-bio.org wrote on 06/04/2009 10:16:07 PM:

> Sylvain
> 
> I think the way you submit the query/parameters of the search or parse
> a BLAST file would be different and we would not worry about the SAX
> API/File dependency of parsing a file. We do need a Class that would
> contain the search parameters and this should as an object follow the
> same inputs available via the union of HTML interfaces for the supported
> BLAST engines. Some search engines will have more inputs or specificity
> over others so that will require some analysis. This search parameter
> class should be independent of a particular BLAST web service engine
> allowing a user to submit the same search to multiple services with
> minimum overhead. 
> 
> But once you get the results then having the ability to use the same
> general iteration of results/hits will allow those who have invested in
> the BLAST file parsing API to easily insert the new web services
> approach.
> 
> From the biojava cookbook SeqSimilaritySearchHit is the class that
> contains the results and should be the class used to contain the results
> from the web service query. In the web service approach you should be
> able to get the collection of SeqSimilaritySearchResult and
> SeqSimilaritySearchHit from each of the supported BLAST web services.
> The assumption is that SeqSimilaritySearchResult and
> SeqSimilaritySearchHit have been properly designed to represent BLAST
> data. 
> 
> Scooter
> 
>       //output some blast details
>       for (Iterator i = results.iterator(); i.hasNext(); ) {
>         SeqSimilaritySearchResult result =
>             (SeqSimilaritySearchResult)i.next();
> 
>         Annotation anno = result.getAnnotation();
> 
>         for (Iterator j = anno.keys().iterator(); j.hasNext(); ) {
>           Object key = j.next();
>           Object property = anno.getProperty(key);
>           System.out.println(key+" : "+property);
>         }
>         System.out.println("Hits: ");
> 
>         //list the hits
>         for (Iterator k = result.getHits().iterator(); k.hasNext(); ) {
>           SeqSimilaritySearchHit hit =
>               (SeqSimilaritySearchHit)k.next();
>           System.out.print("\tmatch: "+hit.getSubjectID());
>           System.out.println("\te score: "+hit.getEValue());
>         }
> 
>         System.out.println("\n");
>       }
> 
>     }
> 
> -----Original Message-----
> From: Sylvain Foisy [mailto:sylvain.foisy at diploide.net] 
> Sent: Thursday, June 04, 2009 9:57 AM
> To: Scooter Willis; Andreas Prlic
> Cc: biojava-dev at lists.open-bio.org
> Subject: Re: [Biojava-dev] Biojava Interface to BLAST web/remote
> services
> 
> Hi Scooter,
> 
> That is one way of doing it ;-) I was thinking of creating an object that
> the user would either:
> 
> - Feed into the BJ Blast parser
> - Do something else entirely.
> 
> Best regards
> 
> Sylvain
> 
> On 04/06/09 09:28, "[NAME]" <[ADDRESS]> wrote:
> 
> > Sylvain
> > 
> > Given that BioJava already has a BLAST file parser that returns results
> > the goal should be to have a remote/web call return the same set of
> > classes as if you had parsed the file locally. That is going to be my
> > approach. Once we get a couple services working we can integrate into a
> > common factory/interface approach.
> > 
> > Thanks
> > 
> > Scooter
> > 
> > 
> > -----Original Message-----
> > From: biojava-dev-bounces at lists.open-bio.org
> > [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of Sylvain
> > Foisy
> > Sent: Thursday, June 04, 2009 9:07 AM
> > To: Scooter Willis; Andreas Prlic
> > Cc: biojava-dev at lists.open-bio.org
> > Subject: Re: [Biojava-dev] Biojava Interface to BLAST web/remote
> > services
> > 
> > Hi Scooter,
> > 
> > On 04/06/09 07:38, "[NAME]" <[ADDRESS]> wrote:
> > 
> >> Looks like they rolled their own URL interface and did not do a WSDL. Not a big
> >> deal but does appear they have some sort of submit get a "ticket" and then
> >> check back with the "ticket" identifier for the results. The BioJava API would
> >> hide the transport layer so you could use a custom URL approach or web
> >> services. 
> > 
> > That is basically the way it works. I am working on a RemoteBlastWrapper
> > class that would do exactly what you are writing.
> > 
> > 
> >> Not sure how the other WSDL interfaces handle long running tasks but I assume
> >> the Web Services can handle a call that takes say 5 minutes to respond without
> >> timing out. Some process would need to distinguish between a long running
> >> server task and a server that is no longer responding.
> > 
> > We'll have to try ;-)
> > 
> > Best regards
> > 
> > Sylvain
> > 
> > 
> > ===================================================================
> > 
> >  Sylvain Foisy, Ph. D.
> >  Consultant Bio-informatique / Bioinformatics
> >  Diploide.net - TI pour la vie / IT for Life
> > 
> >  Courriel: sylvain.foisy at diploide.net
> >  Web: http://www.diploide.net
> >  Tel: (514) 893-4363
> > ===================================================================
> > 
> > 
> > _______________________________________________
> > biojava-dev mailing list
> > biojava-dev at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-dev
> 
