[GSoC] generative biolearn project, further steps

Sarthak Sehgal sarthaksehgal00 at gmail.com
Sun Mar 22 07:36:15 UTC 2020


Hi Anton,

The admins discussed this and, after briefly reviewing the proposal, we are
a little concerned that it is a new R&D project and not aligned with the
best interests of GSoC. In particular, it seems to devote less time to code
and more to research. As we understand the program, we expect roughly the
following guidelines to be followed:
1. The proposal must show at least 60% of the time spent on writing FOSS
software (code, docs, tests)
2. The applicant must be able to dedicate at least 30 hours per week

If you feel that the project aligns with the goals of GSoC and the
guidelines above, please go ahead with the proposal. The org admins will
review all the proposals once again for slot allotment when the application
period is over.

Also, I have updated the tags. As BioJS is not participating under OBF this
year, I have removed the "biojs" tag and added some suitable tags for each
project.

Best,
Sarthak

On Fri, Mar 20, 2020 at 5:12 AM Anton Kulaga <antonkulaga at gmail.com> wrote:

> She is still working on her proposal draft, but in short (and probably
> with some distortion, as I have not seen her full draft yet), it will be
> more about an open-source bioinformatics workflow as well as
> dataset/repository creation.
> She wants to address the problem of comparing protein interactions between
> vertebrates for cases when for some species (usually human and popular
> animals) there is structural data on interactions while for other species
> there are only multiple sequence alignments. In such cases, the modeling is
> way easier as contact surfaces are known from some protein pairs. As some
> species have way better working DNA-repair, cancer resistance and other
> properties, knowledge of differences in interactions of some of their
> proteins can be very valuable. At the same time using structural data known
> for part of the species simplifies modeling a lot (she already did such
> types of comparisons in other projects).
> Apart from the obvious benefit for aging researchers (as we will apply
> what she does to protein-protein interactions relevant for aging), there are
> the following benefits for people doing comparative biology:
> 1) Everybody who wants to do a similar analysis has to go through similar
> pre- and post-processing workflows: get the protein, get its interactions for
> which structural data is available, get orthologous genes for the
> interactions, evaluate for which of them the analysis is likely not to be
> hard due to similarities in sequences and contact surfaces, and so on. In
> other words, the pre- and post-processing steps before/after molecular
> dynamics simulations.
> 2) There are no common databases/repositories where one can see
> comparisons of protein-protein interactions between species, see predicted
> differences in terms of strength of binding (and other parameters) and
> submit one's own predictions. Overall, biology as a whole is heavily biased
> towards interactions of gene products in well-researched model species plus
> human.
>
> The project fits well with the cross-species project that we are
> working on right now (there we plan to publish two papers soon, one
> about the database that compares expression of >30 species
> in different tissues, the other about Ensembl-based gene predictions for
> prolongevity genes). So we see high value in also having structural
> information on key protein-protein interactions for the most important
> protein pairs, and we are ready to allocate time/people to help her if her
> GSoC project is selected.
>
> I understand the project has a large part devoted to research and writing
> new code, while it is recommended to focus on integration with
> existing popular software libraries or tools. We would be grateful for
> suggestions on how she can modify her student proposal to integrate better
> with existing bioinformatics OSS libraries. What we also noticed (from
> interviewing many students) is that many of them successfully completed
> research-focused projects, as well as projects about writing new software,
> in GSoC 2019/2018, so it looks like there are ways to have projects with a
> large research component, especially if the research part is not risky.
>
> I also want to ask a question that the students we interviewed asked
> us. When they submit their proposal to OBF there are only two tags (openms
> and biojs), while the project idea list is far more diverse. Which tag
> should they choose if they do not have an openms or biojs component? Or
> should they include an openms or biojs component to be eligible? In many
> projects it is possible to add a thin biojs integration, but it would be far
> from the core of the project.
>
> Sincerely,
> Anton Kulaga
>
> Bioinformatician at Computational Biology of Aging Group
> 296 Splaiul Independentei, Bucharest, Romania, 060031
> http://aging-research.group
>
>
> On Thu, 19 Mar 2020 at 23:29, Michael Crusoe <michael.crusoe at gmail.com>
> wrote:
>
>> Would this be a new software project?
>>
>> --
>> Michael R. Crusoe
>>
>> On Thu, Mar 19, 2020, 21:27 Anton Kulaga <antonkulaga at gmail.com> wrote:
>>
>>> Dear Michael and Sarthak,
>>>
>>> We recently got a student with extremely high expertise (several papers
>>> in good journals) in structural bioinformatics. We had a call with her, and
>>> she suggested a good idea that has nothing to do with the generative
>>> biolearn project (the GSoC project we announced on the OBF website), but
>>> fits very well with one of our current projects (the project devoted to
>>> cross-species comparisons) by complementing it along the structural
>>> dimension. It may also produce quite useful OSS code.
>>> I wonder whether we need to list the cross-species direction in the OBF
>>> project_ideas Google doc, or whether it is enough for her to apply with her
>>> idea even though it is unrelated to the generative biolearn project.
>>> She also asked me to ask the OBF admins whether there can be any issue with
>>> her working as a research assistant in the same institute (but a totally
>>> different lab, with no previous collaboration with mentors) as one of the
>>> mentors.
>>>
>>> Sincerely,
>>> Anton Kulaga
>>>
>>> Bioinformatician at Computational Biology of Aging Group
>>> 296 Splaiul Independentei, Bucharest, Romania, 060031
>>> http://aging-research.group
>>>
>>>
>>> On Sun, 23 Feb 2020 at 12:54, Michael Crusoe <michael.crusoe at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Sat, Feb 22, 2020 at 3:41 PM Anton Kulaga <antonkulaga at gmail.com>
>>>> wrote:
>>>>
>>>>> Thank you very much for accepting my project idea suggestion, I am
>>>>> already getting messages/emails from students.
>>>>>
>>>>
>>>> You are very welcome, and I'm glad to hear about the quick response!
>>>>
>>>>
>>>>> I discovered (some of the students complained) that I forgot to put my
>>>>> contact email in the project description. Could you please update the
>>>>> project description? (I put emails as suggestions in your Google doc.)
>>>>>
>>>>
>>>> I don't have permission to update the website, but I believe Sarthak
>>>> does.
>>>>
>>>>
>>>>> Also, could you please clarify the further steps in GSOC?
>>>>>
>>>>
>>>> Here's the GSoC 2020 timeline
>>>> https://developers.google.com/open-source/gsoc/timeline
>>>>
>>>>
>>>>> From what I understood:
>>>>> * right now we (as mentors) talk with students and interview them, and
>>>>> the best of the students write detailed applications describing what
>>>>> they will do in our projects
>>>>>
>>>>
>>>> Most importantly, the students successfully submit their proposals to
>>>> Google.
>>>>
>>>>
>>>>> * OBF requests slots (how many students will get funded) and OBF
>>>>> admins make the final decisions on which students will get funded
>>>>>
>>>>
>>>> First the potential mentors and the admins review all the OBF student
>>>> applications and assign them scores in a spreadsheet. Then we collectively
>>>> decide who we approve (with the consent of their potential mentors) and
>>>> convert that into a number of slots to request from Google.
>>>>
>>>> Once Google grants us a number of slots it is then a race to select
>>>> students (who may also have proposals with other organizations). If a
>>>> student is not available we are free to use that slot for another proposal,
>>>> if agreed by the future mentors and the OBF GSoC admins.
>>>>
>>>>
>>>>>
>>>>> Do I get it right? I wonder how many slots the project can have and what
>>>>> the limitations are. What is the process for the final selection of
>>>>> students by the OBF admins, and how is it done?
>>>>>
>>>>
>>>> There is no formal maximum to the number of slots, but we will be
>>>> penalized in the future based upon the performance of our students, and if
>>>> we request more slots than we can fill.
>>>>
>>>> Final selection is usually easy and straightforward: we assign the
>>>> slots to the previously agreed upon students as long as their potential
>>>> mentors are still in agreement.
>>>>
>>>>
>>>>> Also, I noticed that you have "Feel free to propose your own entirely
>>>>> new idea." In that case, can we write at the end of our project
>>>>> description that we will be grateful for any bioinformatics/ML project
>>>>> ideas (even if not directly connected with generative biolearn) that
>>>>> would be beneficial for biology of aging research?
>>>>>
>>>>
>>>> Yes, please suggest that text in the Google doc.
>>>>
>>>>
>>>>>
>>>>>
>>>>> Sincerely,
>>>>> Anton Kulaga
>>>>>
>>>>> Bioinformatician at Computational Biology of Aging Group
>>>>> 296 Splaiul Independentei, Bucharest, Romania, 060031
>>>>> http://aging-research.group
>>>>>
>>>>
>>>>
>>>> --
>>>> Michael R. Crusoe
>>>>
>>>