[Biojava-dev] Biojava-Mapreduce paradigm...

Mark Fortner phidias51 at gmail.com
Thu Mar 28 19:12:19 UTC 2013


Hi Prasanna,



> 1) *Were you planning on submitting this as a GSOC project?*
> At present I would like to contribute to the BioJava community and be a
> constructive part of the bioinformatics community.
>

I'm guessing from your response that you haven't looked into GSOC yet.  It
stands for the Google Summer of Code, and it's a way for Google to sponsor
development in open source projects.  There's a wiki page
<http://biojava.org/wiki/Google_Summer_of_Code_2013> that has some details
about BioJava's participation in the program, and general information about
the program is at <https://developers.google.com/open-source/soc/>.
The reason I asked is that there's a deadline coming up for submissions.
Andreas has the details.



>  2) *Were you planning on using GATE for items 1 & 2?*
> For part-of-speech tagging and chunking I want to use GATE and OpenNLP.
> Again, these are rule-based and ML-based, so I would like to improve the
> rules with state-of-the-art algorithms and also with my own implementation,
> which would be distributable. Also, can we use LGPL-licensed ML libraries
> for this, in order to avoid re-inventing the wheel?
>

GATE uses LGPL v3, OpenNLP uses the Apache License 2.0, and BioJava uses
LGPL v2.1.  Andreas may be able to confirm whether these are compatible.
This link may help:
<http://stackoverflow.com/questions/1978511/is-there-a-chart-of-which-oss-license-is-compatible-with-which>


>
> 3) *Would there be ways to plugin different ontologies (such as the NCI
> Metathesaurus)?*
> I would like to plug in ontologies like UML in the current scenario. I also
> want to explore the other thesauri mentioned in the literature.
>


Not really sure I understood this part.  Do you want to use UML (Unified
Modeling Language, i.e. the stuff used for class diagrams, sequence
diagrams, etc.) as an ontology?



>  What types of network formats were you planning on supporting?
> I am interested in supporting the Cytoscape, SBML, and GraphML formats.
>
> Are you planning on supporting both visual representations of networks as
> well as network graph data?
> Since visualization plays a vital role in biological network analysis, I
> am interested in exploring large-scale, real-time visualization using tools
> like d3.js together with the MapReduce paradigm. I would like to know
> whether this would be helpful, and which other visualization tools would be
> useful to include here.
>  I am still in the phase of fully developing and testing the code. I am
> very much concerned about restrictions on using third-party libraries. If
> I know there are no conflicts in using them, I will just go ahead with the
> nod from the community. My intention is that developers should need
> minimal effort to reuse the code, so that they can maximize the benefits
> of cloud computing using Apache Hadoop and Amazon EMR. They can also carry
> out large-scale literature mining and network analysis while worrying
> little about cloud infrastructure and architecture to attain their goal.
>
>
Looking forward to seeing what you come up with.

Regards,

Mark


