[Biojava-l] parse blast results

Howard Ungar howard_ungar@yahoo.com
Thu, 25 Oct 2001 14:03:35 -0700 (PDT)


Jennifer,
It sounds like my Alignment object needs to be expanded to support
multiple hits.  If you can send me an example Blast report which
contains multiple hits within the same sequence I can modify the
BlastHandler and BlastReport to handle this.
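
As a rough illustration of the kind of change meant here, the sketch below keeps a list of [start, end] ranges per hit rather than a single start/end pair. The class name MultiHitAlignment and its methods are hypothetical, invented for this example. They are not the Alignment.java from the zip file and not part of the BioJava API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: not the Alignment.java distributed in the zip file.
public class MultiHitAlignment {
    // Hit name -> every [start, end] range reported for that hit, in report order.
    private final Map<String, List<int[]>> rangesByHit = new LinkedHashMap<>();

    // Append a range instead of overwriting the one already stored for this hit.
    public void addRange(String hitName, int start, int end) {
        rangesByHit.computeIfAbsent(hitName, k -> new ArrayList<>())
                   .add(new int[] {start, end});
    }

    // Number of ranges recorded for a hit (0 if the hit is unknown).
    public int rangeCount(String hitName) {
        List<int[]> ranges = rangesByHit.get(hitName);
        return ranges == null ? 0 : ranges.size();
    }

    // Render the layout from the example: ">" marks the first range of each hit,
    // and later ranges of the same hit are indented with a space.
    public String format(String queryName) {
        StringBuilder sb = new StringBuilder("Query ").append(queryName).append('\n');
        for (Map.Entry<String, List<int[]>> entry : rangesByHit.entrySet()) {
            boolean first = true;
            for (int[] range : entry.getValue()) {
                sb.append(first ? ">" : " ")
                  .append(entry.getKey())
                  .append(" [").append(range[0]).append(", ").append(range[1]).append("]\n");
                first = false;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        MultiHitAlignment alignment = new MultiHitAlignment();
        alignment.addRange("hit1", 1, 300);
        alignment.addRange("hit1", 400, 600);
        alignment.addRange("hit1", 700, 1000);
        alignment.addRange("hit2", 1, 700);
        System.out.print(alignment.format("NM_00000"));
    }
}
```

A handler would call addRange once for each range it sees in the report, so that a second range for the same hit is added to the list and does not replace the first.
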
Howard
--- Jennifer Pan <jpan@incellico.com> wrote:
> Hello Howard and Hello all, 
> 
> I was trying to parse an NCBI BLAST results file,
> using the BlastReport Howard provided.
> For a hit sequence that has multiple MSPs, I could only retrieve the
> start and end position of one MSP within this hit sequence.
> --------------------
> for example, I would like to see: 
> Query NM_00000
> >hit1 [1, 300]
>  hit1 [400, 600]
>  hit1 [700, 1000]
> >hit2 [1, 700]
> >hit3 [200, 500]
> 
> and I've gotten 
> Query NM_00000
> >hit1 [400, 600]
> >hit2 [1, 700]
> >hit3 [200, 500]
> -----------------------------------------
> 
> Any hints or help here?
> 
> many thanks
> 
> -Jennifer 
> 
> -----Original Message-----
> From: Howard Ungar [mailto:howard_ungar@yahoo.com]
> Sent: Friday, October 19, 2001 11:53 AM
> To: richard cai; biojava-l@biojava.org
> Subject: Re: [Biojava-l] parse blast results
> 
> 
> --- richard cai <cairi1@yahoo.com> wrote:
> > Thanks, Howard.  This is exactly what I need. 
> > 
> > Richard Cai
> > 
> Richard,
> There are four files in the attached zip file:
> BlastSAXParser.java - contains changes to the parser to explicitly
> create a "QueryName" attribute.
> BlastHandler.java - uses the parser to read the QueryName attribute.
> BlastReport.java - prints the results.
> Alignment.java - support object to pass data between the handler and
> the report.
> 
> Let me know if you need anything else to get this working.
> 
> --- Keith James <kdj@sanger.ac.uk> wrote:
> > This stems from the fact that the DTD has undergone recent
> > modifications since Simon's group first committed their code. They
> > refined the DTD, while incorporating a couple of suggested changes,
> > one of which was adding QueryId.
> > 
> > This is why the DTD reads, for example:
> > 
> > <!ELEMENT biojava:Header (biojava:RawOutput, QueryId?, DatabaseId?)>
> > 
> > so that the new elements (here QueryId and DatabaseId) are not (yet)
> > mandatory. This avoids breaking existing code, but leaves extraction
> > of this information to the user (it's embedded in the
> > biojava:RawOutput element). See Javadoc in
> > org.biojava.bio.program.ssbind.BlastDBQueryHandler.
> 
> Keith,
> I think I tried to get the QueryId from the RawOutput, but the parser
> was not starting to include data until later in the Blast Report file.
> I don't think my changes will fit into the current DTD as you described
> it.  I could modify what I did to create these attributes in the
> RawOutput element if you would like me to.  Or would it be better to
> consider modifying the DTD (and risk breaking existing code)?
> Either way, please take a look at what I did and provide some feedback
> about how you would like me to proceed.
> 
> Howard
> 
> 
> 

