[Biojava-l] sequence similarity searches - some code
Gerald Loeffler
Gerald.Loeffler@vienna.at
Fri, 30 Jun 2000 19:27:14 +0200
regarding the example given below (which looks _very_ encouraging!): is
it an accident that none of the values (sequences) of your
<biojava:QuerySequence> elements contains a gap? I mean, do these
elements contain the sequences exactly as they appear in the BLAST output
(including gaps), or do you process the sequences somehow?
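
One way to check this empirically is to collect the text of the sequence
elements from the SAX event stream and look for gap characters. The
following is only a minimal sketch under stated assumptions, not the
BioJava API: the class GapCheck and its methods are hypothetical names, it
uses the standard JAXP SAX parser on an XML string rather than the BLAST
SAXParser itself, it assumes gaps would show up as '-' as they do in raw
BLAST output, and the short sequences in main() are made up for
illustration. Only the element names are taken from the fragment quoted
below.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of BioJava.
class GapCheck {

    /** Collects the character data of QuerySequence and HitSequence elements. */
    static class SequenceCollector extends DefaultHandler {
        final List<String> sequences = new ArrayList<>();
        // Non-null only while the parser is inside a sequence element.
        private StringBuilder current;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (isSequenceElement(qName)) {
                current = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // characters() may be called several times per element, so accumulate.
            if (current != null) {
                current.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (isSequenceElement(qName)) {
                sequences.add(current.toString());
                current = null;
            }
        }

        private static boolean isSequenceElement(String qName) {
            return qName.equals("biojava:QuerySequence") || qName.equals("biojava:HitSequence");
        }
    }

    /** Parses an XML fragment and returns the sequence strings in document order. */
    static List<String> collectSequences(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // The quoted fragment declares no namespace for the "biojava" prefix,
            // so match on the raw qualified names instead.
            factory.setNamespaceAware(false);
            SequenceCollector handler = new SequenceCollector();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.sequences;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse fragment", e);
        }
    }

    /** Assumes '-' marks a gap, as in raw BLAST output. */
    static boolean containsGap(String sequence) {
        return sequence.indexOf('-') >= 0;
    }

    public static void main(String[] args) {
        // Made-up example data: the query has one gap, the hit has none.
        String xml =
            "<biojava:BlastLikeAlignment>"
          + "<biojava:QuerySequence startPosition=\"1\" stopPosition=\"6\">GKG-PKK</biojava:QuerySequence>"
          + "<biojava:HitSequence startPosition=\"1\" stopPosition=\"7\">GKGDPKK</biojava:HitSequence>"
          + "</biojava:BlastLikeAlignment>";
        for (String s : collectSequences(xml)) {
            System.out.println("gap=" + containsGap(s) + " " + s);
        }
    }
}
```

If the parser passes the alignment lines through unchanged, an HSP with
gaps should make containsGap() return true for at least one of its
sequences.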
cheers,
gerald
Simon Brocklehurst wrote:
>
> Gerald Loeffler wrote:
>
> >
> > should be possible - but i haven't looked at the XML-SAX-BLAST-parser
> > code yet. I'll try to do this next week. Of course i'd be happy if i
> > could use existing code to actually get at the alignments...
>
> All the information content of Blast alignments is parsed in detail by the SAXParser, so it should be
> pretty easy. In the context of alignments, the SAX events generated by the parser would be as if from an
> XML fragment something like (hopefully wrapping on my e-mail software won't screw this up):
>
> <biojava:Hit sequenceLength="214">
> <biojava:HitId id="P07155;P27109;P27428"
> metaData="none">
> </biojava:HitId>
> <biojava:HitDescription>HIGH MOBILITY GROUP PROTEIN HMG1 (HMG-1) (AMPHOTERIN)
> (HEPARIN-BINDING</biojava:HitDescription>
> <biojava:HSPCollection>
> <biojava:HSP>
> <biojava:HSPSummary numberOfPositives="154"
> expectValue="2e-89"
> alignmentSize="168"
> percentagePositives="91"
> percentageIdentity="91"
> numberOfIdentities="154"
> score="326">
> </biojava:HSPSummary>
> <biojava:BlastLikeAlignment>
> <biojava:QuerySequence startPosition="1"
> stopPosition="168">GKGDPKKPRGKMSSYAFFVQTCREEHKKKHPDASVNFSEFSKKCSERWKTMSAKEKGKFEDMAKADKARYEREMKTYIPPKGETKKKFKDPNAPKRPPSAFFLFCSEYRPKIKGEHPGLSIGDVAKKLGEMWNNTAADDKQPXXXXXXXXXXXXXXDIAAYRAKGKPD</biojava:QuerySequence>
>
> <biojava:MatchConsensus
> xml:space="preserve">GKGDPKKPRGKMSSYAFFVQTCREEHKKKHPDASVNFSEFSKKCSERWKTMSAKEKGKFEDMAKADKARYEREMKTYIPPKGETKKKFKDPNAPKRPPSAFFLFCSEYRPKIKGEHPGLSIGDVAKKLGEMWNNTAADDKQP              DIAAYRAKGKPD</biojava:MatchConsensus>
> <biojava:HitSequence startPosition="1"
> stopPosition="168">GKGDPKKPRGKMSSYAFFVQTCREEHKKKHPDASVNFSEFSKKCSERWKTMSAKEKGKFEDMAKADKARYEREMKTYIPPKGETKKKFKDPNAPKRPPSAFFLFCSEYRPKIKGEHPGLSIGDVAKKLGEMWNNTAADDKQPYEKKAAKLKEKYEKDIAAYRAKGKPD</biojava:HitSequence>
>
> </biojava:BlastLikeAlignment>
> </biojava:HSP>
> </biojava:HSPCollection>
>
> etc...
>
> --
> Simon M. Brocklehurst, Ph.D.
> Head of Bioinformatics & Advanced IS
> Cambridge Antibody Technology
> The Science Park, Melbourn, Cambridgeshire, UK
> http://www.CambridgeAntibody.com/
> mailto:simon.brocklehurst@CambridgeAntibody.com
--
Gerald.Loeffler@vienna.at _________________ Software Architect
http://www.imp.univie.ac.at ____ http://www.daemonstration.com
OOA&D, Java, J2EE, JSP, Servlets, JavaBeans, ODBMS, RDBMS, XML