[Biojava-l] sequence similarity searches - some code

Simon Brocklehurst simon.brocklehurst@CambridgeAntibody.com
Fri, 30 Jun 2000 15:16:38 +0100


Gerald Loeffler wrote:

>
> should be possible - but i haven't looked at the XML-SAX-BLAST-parser
> code yet. I'll try to do this next week. Of course i'd be happy if i
> could use existing code to actually get at the alignments...

The SAX parser parses all the information content of BLAST alignments in detail, so it should be
pretty easy. For alignments, the SAX events generated by the parser are as if they came from an
XML fragment something like this (hopefully the wrapping in my e-mail software won't mangle it):

      <biojava:Hit sequenceLength="214">
        <biojava:HitId id="P07155;P27109;P27428"
                       metaData="none">
        </biojava:HitId>
        <biojava:HitDescription>HIGH MOBILITY GROUP PROTEIN HMG1 (HMG-1) (AMPHOTERIN) (HEPARIN-BINDING</biojava:HitDescription>
        <biojava:HSPCollection>
          <biojava:HSP>
            <biojava:HSPSummary numberOfPositives="154"
                                expectValue="2e-89"
                                alignmentSize="168"
                                percentagePositives="91"
                                percentageIdentity="91"
                                numberOfIdentities="154"
                                score="326">
            </biojava:HSPSummary>
            <biojava:BlastLikeAlignment>
              <biojava:QuerySequence startPosition="1"
 stopPosition="168">GKGDPKKPRGKMSSYAFFVQTCREEHKKKHPDASVNFSEFSKKCSERWKTMSAKEKGKFEDMAKADKARYEREMKTYIPPKGETKKKFKDPNAPKRPPSAFFLFCSEYRPKIKGEHPGLSIGDVAKKLGEMWNNTAADDKQPXXXXXXXXXXXXXXDIAAYRAKGKPD</biojava:QuerySequence>

              <biojava:MatchConsensus xml:space="preserve">GKGDPKKPRGKMSSYAFFVQTCREEHKKKHPDASVNFSEFSKKCSERWKTMSAKEKGKFEDMAKADKARYEREMKTYIPPKGETKKKFKDPNAPKRPPSAFFLFCSEYRPKIKGEHPGLSIGDVAKKLGEMWNNTAADDKQP              DIAAYRAKGKPD</biojava:MatchConsensus>
              <biojava:HitSequence startPosition="1"
 stopPosition="168">GKGDPKKPRGKMSSYAFFVQTCREEHKKKHPDASVNFSEFSKKCSERWKTMSAKEKGKFEDMAKADKARYEREMKTYIPPKGETKKKFKDPNAPKRPPSAFFLFCSEYRPKIKGEHPGLSIGDVAKKLGEMWNNTAADDKQPYEKKAAKLKEKYEKDIAAYRAKGKPD</biojava:HitSequence>

            </biojava:BlastLikeAlignment>
          </biojava:HSP>
        </biojava:HSPCollection>

etc...
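
To get at the alignments from events like these, something along the following lines should work. It is only a rough sketch: the class name HspCollector and its Hsp holder are made up for illustration, the element and attribute names are simply those in the fragment above, and the namespace URI in the sample is a placeholder. For a self-contained demo it feeds a shortened XML string through the standard JAXP parser; with the BLAST parser itself, the same kind of handler would presumably be registered as the content handler instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that collects one record per biojava:HSP element. */
public class HspCollector extends DefaultHandler {

    /** Summary attributes plus the three alignment strings of one HSP. */
    public static final class Hsp {
        public String score;
        public String expectValue;
        public String query;
        public String consensus;
        public String hit;
    }

    private final List<Hsp> hsps = new ArrayList<>();
    private Hsp current;          // HSP being built, null outside biojava:HSP
    private StringBuilder text;   // non-null only inside a sequence element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        switch (qName) {
            case "biojava:HSP":
                current = new Hsp();
                break;
            case "biojava:HSPSummary":
                if (current != null) {
                    current.score = atts.getValue("score");
                    current.expectValue = atts.getValue("expectValue");
                }
                break;
            case "biojava:QuerySequence":
            case "biojava:MatchConsensus":
            case "biojava:HitSequence":
                text = new StringBuilder();
                break;
            default:
                break;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Character data may arrive in several chunks, so accumulate it.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        switch (qName) {
            case "biojava:QuerySequence":
                if (current != null) current.query = text.toString();
                text = null;
                break;
            case "biojava:MatchConsensus":
                if (current != null) current.consensus = text.toString();
                text = null;
                break;
            case "biojava:HitSequence":
                if (current != null) current.hit = text.toString();
                text = null;
                break;
            case "biojava:HSP":
                hsps.add(current);
                current = null;
                break;
            default:
                break;
        }
    }

    /** Parses an XML string shaped like the fragment above and returns its HSPs. */
    public static List<Hsp> parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(false); // match on prefixed names such as "biojava:HSP"
        HspCollector handler = new HspCollector();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.hsps;
    }

    public static void main(String[] args) throws Exception {
        // Shortened sample; "urn:example:biojava" is a placeholder namespace URI.
        String xml =
            "<biojava:Hit xmlns:biojava=\"urn:example:biojava\" sequenceLength=\"214\">"
          + "<biojava:HSPCollection><biojava:HSP>"
          + "<biojava:HSPSummary score=\"326\" expectValue=\"2e-89\"></biojava:HSPSummary>"
          + "<biojava:BlastLikeAlignment>"
          + "<biojava:QuerySequence startPosition=\"1\" stopPosition=\"8\">GKGDXXKP</biojava:QuerySequence>"
          + "<biojava:MatchConsensus xml:space=\"preserve\">GKGD  KP</biojava:MatchConsensus>"
          + "<biojava:HitSequence startPosition=\"1\" stopPosition=\"8\">GKGDPKKP</biojava:HitSequence>"
          + "</biojava:BlastLikeAlignment>"
          + "</biojava:HSP></biojava:HSPCollection></biojava:Hit>";
        for (Hsp h : parse(xml)) {
            System.out.println("score=" + h.score + " E=" + h.expectValue);
            System.out.println(h.query);
            System.out.println(h.consensus);
            System.out.println(h.hit);
        }
    }
}
```

The three strings of an HSP have the same length, so printing them one under the other gives the usual BLAST-style view of the alignment; the spaces in the consensus line only survive if they are kept exactly as delivered.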

--
Simon M. Brocklehurst, Ph.D.
Head of Bioinformatics & Advanced IS
Cambridge Antibody Technology
The Science Park, Melbourn, Cambridgeshire, UK
http://www.CambridgeAntibody.com/
mailto:simon.brocklehurst@CambridgeAntibody.com