[Biojava-l] sequence similarity searches - some code

Gerald Loeffler Gerald.Loeffler@vienna.at
Fri, 30 Jun 2000 19:27:14 +0200


regarding the example given below (which looks _very_ encouraging!): is
it an accident that none of the values (sequences) of your
<biojava:QuerySequence> elements contains a gap? I mean, do these
elements contain the sequences exactly as the BLAST output contains
them (including gaps), or do you process the sequences somehow?
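
To make the question concrete, here is a minimal Java sketch of the check
i have in mind. It uses only the standard JAXP/SAX classes, not the actual
Biojava parser. The class name GapCheck, the sample alignment and the
placeholder namespace URI are made up for illustration, and it assumes
that gaps would show up as '-' characters, as they do in plain BLAST
output:

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class GapCheck {

    // Made-up alignment; the namespace URI is a placeholder, not Biojava's.
    static final String SAMPLE =
        "<biojava:BlastLikeAlignment xmlns:biojava=\"http://example.org/placeholder\">"
      + "<biojava:QuerySequence startPosition=\"1\" stopPosition=\"5\">ACD-FG</biojava:QuerySequence>"
      + "<biojava:HitSequence startPosition=\"1\" stopPosition=\"6\">ACDEFG</biojava:HitSequence>"
      + "</biojava:BlastLikeAlignment>";

    /** Returns the text of the QuerySequence and HitSequence elements (last HSP wins). */
    static Map<String, String> alignedSequences(String xml) throws Exception {
        final Map<String, String> found = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a sequence element

            private boolean isSequence(String qName) {
                return qName.equals("biojava:QuerySequence")
                    || qName.equals("biojava:HitSequence");
            }

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (isSequence(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver one text node in several chunks, so append.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (text != null && isSequence(qName)) {
                    found.put(qName, text.toString().trim());
                    text = null;
                }
            }
        };
        // The default factory is not namespace-aware, so qName keeps the "biojava:" prefix.
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return found;
    }

    /** True if the aligned sequence contains at least one BLAST-style gap character. */
    static boolean hasGap(String alignedSequence) {
        return alignedSequence.indexOf('-') >= 0;
    }

    public static void main(String[] args) throws Exception {
        for (Map.Entry<String, String> e : alignedSequences(SAMPLE).entrySet()) {
            System.out.println(e.getKey() + " gap=" + hasGap(e.getValue()));
        }
    }
}
```

If the elements carry the sequences unprocessed, a gapped alignment should
make hasGap() return true for at least one of the two sequences.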

	cheers,
	gerald

Simon Brocklehurst wrote:
> 
> Gerald Loeffler wrote:
> 
> >
> > should be possible - but i haven't looked at the XML-SAX-BLAST-parser
> > code yet. I'll try to do this next week. Of course i'd be happy if i
> > could use existing code to actually get at the alignments...
> 
> All the information content of Blast alignments is parsed in detail by the SAXParser, so it should be
> pretty easy. In the context of alignments, the SAX events generated by the parser would be as if from an
> XML fragment something like (hopefully wrapping on my e-mail software won't screw this up):
> 
>       <biojava:Hit sequenceLength="214">
>         <biojava:HitId id="P07155;P27109;P27428"
>                        metaData="none">
>         </biojava:HitId>
>         <biojava:HitDescription>HIGH MOBILITY GROUP PROTEIN HMG1 (HMG-1) (AMPHOTERIN) (HEPARIN-BINDING</biojava:HitDescription>
>         <biojava:HSPCollection>
>           <biojava:HSP>
>             <biojava:HSPSummary numberOfPositives="154"
>                                 expectValue="2e-89"
>                                 alignmentSize="168"
>                                 percentagePositives="91"
>                                 percentageIdentity="91"
>                                 numberOfIdentities="154"
>                                 score="326">
>             </biojava:HSPSummary>
>             <biojava:BlastLikeAlignment>
>               <biojava:QuerySequence startPosition="1"
>  stopPosition="168">GKGDPKKPRGKMSSYAFFVQTCREEHKKKHPDASVNFSEFSKKCSERWKTMSAKEKGKFEDMAKADKARYEREMKTYIPPKGETKKKFKDPNAPKRPPSAFFLFCSEYRPKIKGEHPGLSIGDVAKKLGEMWNNTAADDKQPXXXXXXXXXXXXXXDIAAYRAKGKPD</biojava:QuerySequence>
> 
>               <biojava:MatchConsensus
> xml:space="preserve">GKGDPKKPRGKMSSYAFFVQTCREEHKKKHPDASVNFSEFSKKCSERWKTMSAKEKGKFEDMAKADKARYEREMKTYIPPKGETKKKFKDPNAPKRPPSAFFLFCSEYRPKIKGEHPGLSIGDVAKKLGEMWNNTAADDKQP              DIAAYRAKGKPD</biojava:MatchConsensus>
>               <biojava:HitSequence startPosition="1"
>  stopPosition="168">GKGDPKKPRGKMSSYAFFVQTCREEHKKKHPDASVNFSEFSKKCSERWKTMSAKEKGKFEDMAKADKARYEREMKTYIPPKGETKKKFKDPNAPKRPPSAFFLFCSEYRPKIKGEHPGLSIGDVAKKLGEMWNNTAADDKQPYEKKAAKLKEKYEKDIAAYRAKGKPD</biojava:HitSequence>
> 
>             </biojava:BlastLikeAlignment>
>           </biojava:HSP>
>         </biojava:HSPCollection>
> 
> etc...
> 
> --
> Simon M. Brocklehurst, Ph.D.
> Head of Bioinformatics & Advanced IS
> Cambridge Antibody Technology
> The Science Park, Melbourn, Cambridgeshire, UK
> http://www.CambridgeAntibody.com/
> mailto:simon.brocklehurst@CambridgeAntibody.com

-- 
   Gerald.Loeffler@vienna.at _________________ Software Architect
   http://www.imp.univie.ac.at ____ http://www.daemonstration.com
   OOA&D, Java, J2EE, JSP, Servlets, JavaBeans, ODBMS, RDBMS, XML