[Biojava-dev] SearchIO module for biojava
Jose Manuel Duarte
jose.duarte at psi.ch
Sun Aug 30 11:51:21 UTC 2015
Hi Paolo
Sorry for the late reply, only now I've been able to have a quick look
at your new SearchIO module and I think it looks fantastic. In my
opinion this is a great and needed feature. I would definitely want to
use it for my own projects as soon as possible. Some comments inline below.
On 22.08.2015 19:26, Paolo Pavan wrote:
>
> Also note that this is a required part of another module I have
> written that can potentially be of community interest: a biojava-run
> module, to bless it similarly to something already listened. This
> latter aims to be a generic module used to run an analysis performed
> by an external program. In my case I needed ncbi blast search. So the
> API was written to declare a database of biojava Sequence objects,
> pass a collection of query sequences and retrieve in output Result
> objects of the SearchIO module.
> I know from previous attempt echoed in the mailing list that the
> orientation of the project was to reimplement the blast algorithm in
> pure Java and I agree that it would be a great idea. But until now
> this project as far as I know is late and I solved the platform
> portability issue by including several binaries for all the platforms
> (well, the major) packaging all together in one jar file relying upon
> this great Java facility.
> Anyway, all this came later.
The biojava-run module sounds interesting too, do you have it anywhere
in github that we can have a look at it?
>
> Just to spend few technical comments on the SearchIO module:
> - included in core module since it defines a new base data structure
> - include a dependency from biojava-alignment. This is not compulsory,
> it is there since the alignment data structure is included in that
> package. In my opinion, moving this important data structure in core
> will solve this and avoid similar problems in the future. This is also
> the reason why I choose to add those new implemented Hits/Hsp etc
> directly in core, after all search is one of the most important tasks
> in bioinformatics.
I agree it only makes sense to have it in core right now. However the
dependency on biojava-alignment is not ideal. In principle the best
solution would be to move the alignment data structure to core, I agree
with you. For the time being another possibility would be to put all the
new SearchIO stuff in biojava-alignment, but that doesn't really improve
the structure of things. I'd vote for moving the alignment data
structure as a sounder and better long-term solution.
This would surely require a new minor version, so that all the new stuff
would be released as biojava 4.2.
> - BlastXML parser is implemented in the BlastXMLQuery class. Maybe
> this name it is not so meaningful, it comes from the original class
> that is still there in biojava even if it seems not so much utilised,
> that I initially started to improve trying to remain tighter to the
> original project. From here also the use of the class XMLHelper and
> some deprecated tags I added. From the old thread I understood that
> there was not any "elective choice" of biojava for XML parsing, but
> anyway the job was already done with the XMLHelper module and so this
> class came to new life.
> - it was designed to be easy to extend: add support for a new file
> format a developer must just write a single class that implements the
> ResultFactory interface (I have implemented also a blast tabular
> parser to show it). The Api for biojava user does not change, it is just:
> SearchIO reader = new SearchIO(new File("BlastReport.blastxml"),
> blastResultFactory);
>
> - it is possible to auto recognise file formats relying upon standard
> file extension. Just try a different constructor:
> SearchIO reader = new SearchIO(new File("BlastReport.blastxml"));
>
How about the different blast XML formats? Would it work with all the
latest ones? Blast+ 2.2.31 has introduced some modifications to the
format (see http://www.ncbi.nlm.nih.gov/books/NBK131777/)
> If you agree that this feature would be interesting for the project I
> can send a pull request for the SearchIO part and then push on my
> GitHub also the run module.
>
I think it is a very nice addition, in my opinion you should go ahead
with the pull request which will make it easier for everyone to check it
out and review it for a while.
Jose
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.open-bio.org/pipermail/biojava-dev/attachments/20150830/e032085c/attachment.html>
More information about the biojava-dev
mailing list