[Biojava-l] Blast

Thomas Down td2@sanger.ac.uk
Wed, 7 Nov 2001 11:39:58 +0000


On Tue, Nov 06, 2001 at 06:14:17PM -0800, David Waring wrote:
> I have set up a system that calls a local version of blastall. With it you
> can blast a SequenceDB against your local blast database and get back a list
> of SequenceSimilarityResults. This also provides access to the sequences in
> the blast database via a call to fastacmd. So you have access to the query
> and subject sequences from the SequenceSimilaritySearchResult. As Thomas
> mentions there must be environment-specific information available for this to
> work. This is why I have not tried to put it into biojava. I also have
> similar classes for cross_match and RepeatMasker.

Great -- this sounds like the sort of thing we want :-).

I note that when Gerald Loeffler first did a search API
for BioJava, he included a SeqSimilaritySearcher interface.
However, this part of the system hasn't yet been implemented.
Perhaps this is the way to go?

An alternative -- which sounds like it might be closer to what
you've got, and seems quite logical -- would be to have:

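  public interface SearchableSequenceDB extends SequenceDB {
      // Search this database with a single query sequence.
      public SeqSimilaritySearchResult search(SymbolList sl, Map params);
      // Search this database using the sequences in db as queries.
      public SeqSimilaritySearchResult search(SequenceDB db, Map params);

      public Annotation getAnnotation();
  }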

Or something to that effect...
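
For illustration only, here is a minimal sketch of how a blastall-backed
implementation of such a search() method might turn the params Map into a
command line. The class name BlastallCommand and the map keys "program" and
"evalue" are assumptions made up for this example, not part of BioJava or of
David's code; -p, -d, -i and -e are the usual blastall options for program,
database, query file and expectation value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: maps search parameters onto a blastall command line.
// The key names ("program", "evalue") are assumptions for this sketch.
class BlastallCommand {

    static List<String> build(String database, String queryFile,
                              Map<String, String> params) {
        List<String> cmd = new ArrayList<>();
        cmd.add("blastall");
        cmd.add("-p");                                   // BLAST program
        cmd.add(params.getOrDefault("program", "blastn"));
        cmd.add("-d");                                   // database to search
        cmd.add(database);
        cmd.add("-i");                                   // FASTA file holding the query
        cmd.add(queryFile);
        if (params.containsKey("evalue")) {
            cmd.add("-e");                               // expectation value cutoff
            cmd.add(params.get("evalue"));
        }
        return cmd;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("program", "blastp");
        params.put("evalue", "1e-5");
        // Print the command instead of running it, so no BLAST install is needed.
        System.out.println(String.join(" ", build("swissprot", "query.fasta", params)));
    }
}
```

The resulting list could be handed to java.lang.ProcessBuilder; parsing the
output into search results is left out here.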

> [Configuration files]
>
> Of course there are probably better ways to do this. And perhaps someone
> would have ideas about how to abstract it a bit so we could handle cross
> network activity as well as local system calls. If we can come up with a
> design, I would be happy to move my things over to biojava.

This sounds like the kind of thing we need.

But since you talk about abstracting them (I especially
like the idea of being able to drop in any big compute
farms/whatever which might be available at a given site),
how about using JNDI (Java Naming and Directory Interface):

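  import javax.naming.Context;
  import javax.naming.InitialContext;

  Context dir = new InitialContext();  // throws NamingException on failure
  // Look up a searcher by name and cast it to the proposed interface.
  SearchableSequenceDB searcher = (SearchableSequenceDB)
          dir.lookup("bio/databases/embl");
  // Run your searches.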

[Please, please, let's not get into arguments about the exact
syntax/semantics of the name strings themselves right now...
Plenty of time for that later :-)]

We could write a really simple JNDI service provider which
seeded a context with searchers based on the kind of config
file you provide.  But for big sites, it also allows all
sorts of other tricks -- for instance, keeping metadata about
all the available databases on an LDAP server.
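
A very rough sketch of what such a "really simple" provider could look like,
under several assumptions: the class name ConfigSeededContextFactory is
invented for this example, the config is taken to be a java.util.Properties
of name=location pairs, and the bound values stay plain strings where a real
provider would construct searcher objects. Only lookup(String) and close()
are supported; every other Context method reports that it is unsupported.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.Properties;
import javax.naming.Context;
import javax.naming.NameNotFoundException;
import javax.naming.NamingException;
import javax.naming.OperationNotSupportedException;
import javax.naming.spi.InitialContextFactory;

// Hypothetical provider: each config entry becomes one binding in the context.
class ConfigSeededContextFactory implements InitialContextFactory {

    @Override
    public Context getInitialContext(Hashtable<?, ?> env) throws NamingException {
        final Map<String, Object> bindings = new HashMap<>();
        for (Map.Entry<?, ?> e : env.entrySet()) {
            // Skip JNDI's own settings such as java.naming.factory.initial.
            if (e.getKey() instanceof String
                    && !((String) e.getKey()).startsWith("java.naming.")) {
                bindings.put((String) e.getKey(), e.getValue());
            }
        }
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getDeclaringClass() == Object.class) {
                // toString/equals/hashCode are answered by the bindings map.
                return method.invoke(bindings, args);
            }
            String op = method.getName();
            if (op.equals("lookup") && args[0] instanceof String) {
                Object found = bindings.get(args[0]);
                if (found == null) {
                    throw new NameNotFoundException((String) args[0]);
                }
                return found;
            }
            if (op.equals("close")) {
                return null;
            }
            throw new OperationNotSupportedException(op);
        };
        return (Context) Proxy.newProxyInstance(
                ConfigSeededContextFactory.class.getClassLoader(),
                new Class<?>[] { Context.class }, handler);
    }

    public static void main(String[] args) throws NamingException {
        Properties config = new Properties();
        config.setProperty("bio/databases/embl", "/data/blast/embl"); // made-up location
        Context dir = new ConfigSeededContextFactory().getInitialContext(config);
        System.out.println(dir.lookup("bio/databases/embl"));
    }
}
```

To plug this in behind a plain new InitialContext(), the class would need to
be public and named in the java.naming.factory.initial property; the sketch
calls the factory directly to stay self-contained.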

What do you think?

    Thomas.