[Biojava-l] documentation was blast parsing and empty hits

Doug Rusch drusch@tcag.org
Thu, 3 Oct 2002 15:40:11 -0400


Good point. I have not been checking the CVS version; I was working only with the latest release candidate of BioJava. My assumption that nothing has changed since 2000 is based solely on the information in the Javadoc at the BioJava.org site. I was under the impression that changes to the code would be reflected and dated in the Javadoc.

I have no problem contributing to the BioJava effort, but I am finding the learning curve to be fairly steep. Much of the difficulty arises because the documentation has not been consistently maintained. I find I have to go to the source more often than not to find out what is really going on. When I do that, I find that many of the demos and test programs are out of date with the code base. On top of that, the demos are really quite simple, and I haven't seen any source for powerful apps that integrate a variety of BioJava packages.

Is there a list of ongoing projects associated with BioJava? Is XFF still being worked on? What is the ssaha demo? It seems to be undocumented. What is happening with AGAVE? Is it even actively in use anymore? Should it be archived?

I guess I have meandered off the topic a bit... Last relevant question: how do I go about getting myself CVS access?

Thanks!
Doug Rusch
TCAG.org

-----Original Message-----
From:	Simon Brocklehurst [mailto:simon.brocklehurst@CambridgeAntibody.com]
Sent:	Thu 10/3/2002 3:11 PM
To:	Doug Rusch
Cc:	biojava-l@biojava.org
Subject:	Re: [Biojava-l] blast parsing and empty hits
Doug Rusch wrote:
> 
> Actually I have made changes that fix both the no summary and "No hits found" problems, though I have not done extensive testing and I do not know if this would work for wu-blast yet. It's more of a hack than a nice solution, though. It would be nice to use the regex in 1.4 to put together a nice clear parser and I may do that in the near future. I am still surprised that this is even a problem. Is the community that small that obvious problems like this have not been fixed much earlier?
> 

Hi Doug,

I think you're assuming that what I'm sure is a genuine problem for you
is a problem in a broad variety of use cases.  I suspect that's an
incorrect assumption. In general, people don't get terribly excited at
the prospect of parsing search reports that don't have any hits.
Furthermore, for many use cases, workarounds to deal with missing SAX
events and/or empty documents in the special case of empty blast reports
will often either be trivial or not be required at all.
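
As a rough illustration of how trivial such a workaround can be, here is
a minimal sketch that checks a report for the "No hits found" marker
before handing it to any parser. It assumes the whole report is already
held in a String and that an empty NCBI Blast report contains that
phrase; the class and method names are hypothetical and are not part of
BioJava.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not a BioJava class.
final class EmptyBlastCheck {
    // Assumption: an empty report contains this phrase. Any decoration
    // around it (such as asterisks) is ignored by matching the phrase alone.
    private static final Pattern NO_HITS = Pattern.compile("No hits found");

    private EmptyBlastCheck() {}

    // Returns true if the report text contains the "No hits found" marker.
    static boolean hasNoHits(String reportText) {
        return NO_HITS.matcher(reportText).find();
    }
}
```

A caller could skip the SAX parse entirely when hasNoHits(...) returns
true, so the missing events never come into play.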

This issue of the biojava blast SAX driver producing events equivalent
to malformed XML in the case of empty blast reports has been known for
a while (I think problems with empty blast reports were first posted to
the biojava list in Dec 2001).  Clearly having this as a known bug, and
not fixing it, isn't ideal - but the reason why no-one has fixed this yet
is that it simply hasn't caused anyone enough grief.

Regarding your previous comments about the frequency of code updates: I'm
not sure where you got the idea that the Blast parsing code hadn't been
updated in almost two years.
If you're interested in the update histories of classes in the SAX parsing
biojava package, you can see them at the URL below:

http://cvs.biojava.org/cgi-bin/viewcvs/viewcvs.cgi/biojava-live/src/org/biojava/bio/program/sax/?cvsroot=biojava

The last update seems to be 4 weeks ago when Keith James from the Sanger
center added support for NCBI Blast versions 2.2.2 and 2.2.3.

Despite your initial problems, I do hope you'll give the biojava parser
a chance - it might not actually be as bad as you think! Your bug fixes
for NCBI Blast parsing for empty reports would be really appreciated by
the community, I'm sure. If you get yourself cvs access, you could
easily apply them.

Simon
--
Simon M. Brocklehurst, Ph.D.
Director of Informatics & Robotics
Cambridge Antibody Technology
The Science Park, Melbourn, Cambridgeshire, UK
http://www.CambridgeAntibody.com/
mailto:simon.brocklehurst@CambridgeAntibody.com