[Biojava-l] Sequence Alignment

David Waring dwaring@u.washington.edu
Wed, 13 Jun 2001 14:40:07 -0700


I plan to build (or find) a tool that takes multiple large (BAC-sized) DNA
sequences, finds the overlaps, and presents them in a GUI that allows users
to view the overlapping sequences in a pictorial mode and to zoom in to the
actual sequence. This is the first stage of a project that will get more
complicated later. Looking at the biojava API, it seems that the foundation
is there, and this is where I would start. Before proceeding, I would like
to find out whether anyone in the biojava community has already done
something similar or is working on the same thing.

Classes for most of the pieces already exist. With regard to finding
alignments, it seems that there are classes for SearchHit, SearchResult, and
SubHit, but no actual searcher. Does a functional SeqSimilaritySearcher
exist, or is anyone working on one?

I have already written a cross_match searcher that uses as its engine a
system call to Phil Green's cross_match program and parses the output to
generate a result object similar to the biojava SeqSimilaritySearchResult. I
am planning to rewrite it to fit the biojava interfaces, which should not
take long. I have built it in such a way that the system-based engine could
easily be replaced with a pure-Java engine; that is, once such an engine is
written, it can simply be plugged in.
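To make the plug-in idea concrete, here is a minimal sketch of how a searcher
might be separated from its engine. Every name in it (Hit, SearchEngine,
ExactOverlapEngine, OverlapSearcher) is hypothetical and is not a biojava
type. The toy engine only finds an exact suffix/prefix overlap; it is not a
substitute for cross_match, and a system-call engine would simply be another
implementation of the same interface.

```java
import java.util.ArrayList;
import java.util.List;

/** One pairwise hit (hypothetical class); coordinates are 1-based, inclusive. */
final class Hit {
    final String queryId, subjectId;
    final int queryStart, queryEnd, subjectStart, subjectEnd;

    Hit(String queryId, String subjectId,
        int queryStart, int queryEnd, int subjectStart, int subjectEnd) {
        this.queryId = queryId;
        this.subjectId = subjectId;
        this.queryStart = queryStart;
        this.queryEnd = queryEnd;
        this.subjectStart = subjectStart;
        this.subjectEnd = subjectEnd;
    }
}

/** The pluggable part (hypothetical): anything that turns two sequences into hits. */
interface SearchEngine {
    List<Hit> search(String queryId, String query, String subjectId, String subject);
}

/** Toy pure-Java engine: longest exact overlap of the query's end with the subject's start. */
final class ExactOverlapEngine implements SearchEngine {
    private final int minOverlap;

    ExactOverlapEngine(int minOverlap) {
        if (minOverlap < 1) {
            throw new IllegalArgumentException("minOverlap must be at least 1");
        }
        this.minOverlap = minOverlap;
    }

    @Override
    public List<Hit> search(String queryId, String query, String subjectId, String subject) {
        List<Hit> hits = new ArrayList<>();
        int max = Math.min(query.length(), subject.length());
        // Try the longest candidate overlap first and stop at the first match.
        for (int len = max; len >= minOverlap; len--) {
            if (query.regionMatches(query.length() - len, subject, 0, len)) {
                hits.add(new Hit(queryId, subjectId,
                        query.length() - len + 1, query.length(), 1, len));
                break;
            }
        }
        return hits;
    }
}

/** The searcher only knows the interface, so engines can be swapped freely. */
final class OverlapSearcher {
    private final SearchEngine engine;

    OverlapSearcher(SearchEngine engine) {
        this.engine = engine;
    }

    List<Hit> search(String queryId, String query, String subjectId, String subject) {
        return engine.search(queryId, query, subjectId, subject);
    }
}
```

With this split, new OverlapSearcher(new ExactOverlapEngine(3)) could later be
given a different engine without touching the calling code.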

Biojava seems to have plenty of useful classes to work with, but I wonder
whether applications using them are available. I am particularly interested
in a GUI application that displays alignments, or even one that just displays
a single sequence, which could give me a place to start.

Finally, I think that I will have to come up with a new class encapsulating
an alignment of multiple very long sequences that overlap at their ends, as
shown below.

-------------         -----------------
          ------------------        --------
                                     ---------------

The defined Alignment interface seems geared more toward alignments of
multiple homologous sequences:

------------xx------
---yy----------x----
z---z-----z---------

Am I correct in this assumption or am I missing something?
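For illustration, the end-overlap case could be modelled by placing each
sequence at an offset on a shared coordinate axis, rather than padding every
sequence to a common length. The sketch below assumes ungapped placements and
uses hypothetical names (Placement, TilingLayout) that are not part of
biojava; a real class would also need to carry the sequences themselves and
any gaps inside the overlaps.

```java
import java.util.ArrayList;
import java.util.List;

/** A sequence placed on a shared axis (hypothetical class); start is 1-based. */
final class Placement {
    final String id;
    final int start;
    final int length;

    Placement(String id, int start, int length) {
        this.id = id;
        this.start = start;
        this.length = length;
    }

    /** Last position covered by this sequence, inclusive. */
    int end() {
        return start + length - 1;
    }
}

/** A set of placed sequences that may overlap at their ends (hypothetical class). */
final class TilingLayout {
    private final List<Placement> placements = new ArrayList<>();

    /** Adds a sequence at the given start position and returns its placement. */
    Placement place(String id, int start, int length) {
        Placement p = new Placement(id, start, length);
        placements.add(p);
        return p;
    }

    /** Ids of all sequences covering the given position, in insertion order. */
    List<String> coveringAt(int position) {
        List<String> ids = new ArrayList<>();
        for (Placement p : placements) {
            if (position >= p.start && position <= p.end()) {
                ids.add(p.id);
            }
        }
        return ids;
    }

    /** Number of shared positions between two placements, or 0 if they are disjoint. */
    static int overlapLength(Placement a, Placement b) {
        int lo = Math.max(a.start, b.start);
        int hi = Math.min(a.end(), b.end());
        return Math.max(0, hi - lo + 1);
    }

    /** Rightmost position covered by any sequence, or 0 if the layout is empty. */
    int spanEnd() {
        int max = 0;
        for (Placement p : placements) {
            max = Math.max(max, p.end());
        }
        return max;
    }
}
```

A viewer could call coveringAt for each visible position to decide which
sequences to draw, and overlapLength to annotate the joins between neighbours.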

Thanks,

David


|||||||||||||||||||||||||||||||||||||||||||||||||||||||
|   David Waring
|   Systems Programmer
|   University of Washington Genome Center
|   dwaring@u.washington.edu
|   (206) 221-6902
|||||||||||||||||||||||||||||||||||||||||||||||||||||||