[Biojava-l] sequence similarity searches - some code

Simon Brocklehurst simon.brocklehurst@CambridgeAntibody.com
Fri, 30 Jun 2000 19:04:17 +0100

Gerald Loeffler wrote:

> regarding the example given below (which looks _very_ encouraging!): is
> it an accident that none of the values (sequences) of your
> <biojava:QuerySequence>-elements contains a gap?? I mean, do these
> elements contain the sequences exactly in the way as the BLAST-output
> contains them (including gaps) or do you process the sequences somehow??

Gaps are as they would appear in the Blast output (dashes, white-space etc.).  In fact, there should actually be a preserved white-space gap in the match consensus line in the example (where the XXXXXX bit is in the
QuerySequence element). But as I suspected might happen, some wrapping occurred to **** this up!

What does happen when processing Blast-like alignments is that you get a single BlastLikeAlignment per HSP.  This is, I think, what is wanted in terms of modelling the data, i.e. as opposed to what Blast gives you, which
is the alignment split over several "blocks" for formatting reasons.
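
To illustrate the idea of one alignment per HSP, here is a minimal sketch of joining the formatted blocks back together. It is not the code from the pre-release; the class HspBlockJoiner, the type Block and the method join are hypothetical names made up for this example, and it assumes the line labels and coordinates have already been stripped from each block. Gaps (dashes) and white-space in the consensus line are kept exactly as they are.

```java
import java.util.Arrays;
import java.util.List;

public class HspBlockJoiner {

    /** One formatted block of an HSP (hypothetical type): query, consensus and subject lines. */
    public static final class Block {
        final String query;
        final String match;
        final String subject;

        public Block(String query, String match, String subject) {
            this.query = query;
            this.match = match;
            this.subject = subject;
        }
    }

    /**
     * Concatenates the blocks of one HSP into a single alignment.
     * Returns {query, match, subject}; dashes and white-space are preserved.
     */
    public static String[] join(List<Block> blocks) {
        StringBuilder query = new StringBuilder();
        StringBuilder match = new StringBuilder();
        StringBuilder subject = new StringBuilder();
        for (Block b : blocks) {
            // The three lines of a block must cover the same alignment columns.
            if (b.query.length() != b.match.length()
                    || b.query.length() != b.subject.length()) {
                throw new IllegalArgumentException("Block lines differ in length");
            }
            query.append(b.query);
            match.append(b.match);
            subject.append(b.subject);
        }
        return new String[] { query.toString(), match.toString(), subject.toString() };
    }

    public static void main(String[] args) {
        // Two made-up blocks belonging to the same HSP.
        List<Block> blocks = Arrays.asList(
                new Block("ACD-", "AC  ", "ACEF"),
                new Block("GH", "G ", "GK"));
        for (String line : join(blocks)) {
            System.out.println("[" + line + "]");
        }
    }
}
```

The length check is there because a consensus line whose white-space has been trimmed or re-wrapped would otherwise silently shift every later column of the alignment.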

In the pre-release code I sent you, there is a pre-compiled app, Blast2XML, that will convert Blast output to this XML format.  This might be easier to get going with than delving into the workings of the code (not as
elegant as it needs to be, yet).

Simon M. Brocklehurst, Ph.D.
Head of Bioinformatics & Advanced IS
Cambridge Antibody Technology
The Science Park, Melbourn, Cambridgeshire, UK