[Biojava-l] parse blast results

Jennifer Pan jpan@incellico.com
Thu, 25 Oct 2001 16:27:26 -0400


Hello Howard and Hello all, 

I was trying to parse an NCBI BLAST results file
using the BlastReport Howard provided.
For a hit sequence that has multiple MSPs, I could only retrieve the
start and end positions of one MSP within that hit sequence.
--------------------
For example, I would like to see:
Query NM_00000
>hit1 [1, 300]
 hit1 [400, 600]
 hit1 [700, 1000]
>hit2 [1, 700]
>hit3 [200, 500]

but what I get is:
Query NM_00000
>hit1 [400, 600]
>hit2 [1, 700]
>hit3 [200, 500]
-----------------------------------------

Any hints or help here?
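
To make the desired output concrete, here is a minimal sketch of the
bookkeeping I have in mind. It is not taken from the BlastReport or
BlastHandler code: the class HitRanges and its methods addRange and
format are hypothetical names, and it uses newer Java collections than
the posted code may. The only idea is that each hit id maps to a list of
ranges, so a second MSP is appended rather than replacing the first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of BioJava or of the posted BlastReport code.
public class HitRanges {
    // hit id -> every [start, end] range recorded for that hit, in arrival order
    private final Map<String, List<int[]>> rangesByHit = new LinkedHashMap<>();

    // Call once per MSP; ranges for the same hit accumulate instead of overwriting.
    public void addRange(String hitId, int start, int end) {
        rangesByHit.computeIfAbsent(hitId, k -> new ArrayList<>())
                   .add(new int[] {start, end});
    }

    // Prints the first range of a hit with '>' and later ranges with a leading space.
    public String format(String queryId) {
        StringBuilder sb = new StringBuilder("Query ").append(queryId).append('\n');
        for (Map.Entry<String, List<int[]>> entry : rangesByHit.entrySet()) {
            boolean first = true;
            for (int[] range : entry.getValue()) {
                sb.append(first ? '>' : ' ')
                  .append(entry.getKey())
                  .append(" [").append(range[0]).append(", ").append(range[1]).append("]\n");
                first = false;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        HitRanges ranges = new HitRanges();
        ranges.addRange("hit1", 1, 300);
        ranges.addRange("hit1", 400, 600);
        ranges.addRange("hit1", 700, 1000);
        ranges.addRange("hit2", 1, 700);
        ranges.addRange("hit3", 200, 500);
        System.out.print(ranges.format("NM_00000"));
    }
}
```

If the handler currently keeps a single start/end pair per hit, that
alone could explain why only one MSP survives, but that is a guess on my
part.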

many thanks

-Jennifer 

-----Original Message-----
From: Howard Ungar [mailto:howard_ungar@yahoo.com]
Sent: Friday, October 19, 2001 11:53 AM
To: richard cai; biojava-l@biojava.org
Subject: Re: [Biojava-l] parse blast results


--- richard cai <cairi1@yahoo.com> wrote:
> Thanks, Howard.  This is exactly what I need. 
> 
> Richard Cai
> 
Richard,
There are four files in the attached zip file:
BlastSAXParser.java - contains changes to the parser to explicitly
create a "QueryName" attribute.
BlastHandler.java - uses the parser to read the QueryName attribute.
BlastReport.java - prints the results.
Alignment.java - support object to pass data between the handler and
the report.

Let me know if you need anything else to get this working.

--- Keith James <kdj@sanger.ac.uk> wrote:
> This stems from the fact that the DTD has undergone recent
> modifications since Simon's group first committed their code. They
> refined the DTD, while incorporating a couple of suggested changes,
> one of which was adding QueryId.
> 
> This is why the DTD reads, for example:
> 
> <!ELEMENT biojava:Header (biojava:RawOutput, QueryId?, DatabaseId? )>
> 
> so that the new elements (here QueryId and DatabaseId) are not (yet)
> mandatory. This avoids breaking existing code, but leaves extraction
> of this information to the user (it's embedded in the
> biojava:RawOutput element). See Javadoc in
> org.biojava.bio.program.ssbind.BlastDBQueryHandler.
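
As an illustration of what "extraction by the user" could look like,
here is a rough sketch that pulls the query id out of raw report text.
It assumes the raw text contains the usual "Query= ..." line of an NCBI
BLAST text report; the class QueryIdFromRawOutput and the method queryId
are hypothetical names and are not part of BioJava, and the sample
string in main is made up.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of BioJava.
public class QueryIdFromRawOutput {
    // Matches a line that starts with "Query=" and captures the first
    // whitespace-delimited token after it on the same line.
    private static final Pattern QUERY_LINE =
            Pattern.compile("^Query=[ \\t]*(\\S+)", Pattern.MULTILINE);

    // Returns the query id if a "Query=" line is present, otherwise empty.
    public static Optional<String> queryId(String rawOutput) {
        Matcher m = QUERY_LINE.matcher(rawOutput);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up fragment standing in for the text of a raw report header.
        String raw = "BLASTN\n\nQuery= NM_00000 example mRNA\n";
        System.out.println(queryId(raw).orElse("(no Query= line found)"));
    }
}
```

This only works if the raw text handed to the user actually includes the
header lines of the report.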

Keith,
I think I tried to get the QueryId from the RawOutput, but the parser
was not starting to include data until later in the Blast Report file. 
I don't think my changes will fit into the current DTD as you described
it.  I could modify what I did to create these attributes in the
RawOutput element if you would like me to.  Or would it be better to
consider modifying the DTD (and risk breaking existing code)?
Either way, please take a look at what I did and provide some feedback
about how you would like me to proceed.

Howard


