[BioRuby] GSoC project

Konstantin Stepanyuk konstantin.s.stepanyuk at gmail.com
Fri Apr 9 05:19:04 UTC 2010


Hi Naohisa,

Thanks for your comments!
The updated version of my application is below.
Some quick comments are inline.

> Write more about your Ruby programming experiences.
> In addition, can you show URL to Ruby scripts you wrote?

I have only basic knowledge of Ruby and its standard library. I wrote more
about my Ruby experience in the plan below.
Examples of my scripts:
- solving the traveling salesman problem: http://paste2.org/p/764145
Some handy small tools:
- a generator of random DNA sequences: http://paste2.org/p/764147
- a script that walks a file tree and fixes some string: http://paste2.org/p/764146


> Please improve project plan more. For example:
> * Preparation. E.g. to subscribe to ruby-core mailing list
> [skip]

I've added this to the plan, but I think I will do all of this during
the Community Bonding period.

> * Extracting bioruby-1.4.0.tar.gz, looking at lib/ and
> test/unit (and test/functional), and checking existence
> of test files and directories corresponding to library
> main files.

I've already checked out the git repository and succeeded in running the tests.
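
The check suggested above (does each library file have a matching test
file?) could be sketched roughly as follows. This is only a minimal sketch:
it assumes that a file lib/bio/foo.rb is expected to have its unit test at
test/unit/bio/test_foo.rb, and the helper names expected_test_path and
untested_files are made up for illustration.

```ruby
# Sketch: list library files that have no matching unit test file.
# ASSUMPTION: lib/bio/foo.rb is tested by test/unit/bio/test_foo.rb.
# The helper names below are hypothetical, not part of BioRuby.

# Map a library path to the unit test path it would be expected to have.
def expected_test_path(lib_path, lib_root = "lib", test_root = "test/unit")
  relative = lib_path.sub(/\A#{Regexp.escape(lib_root)}\//, "")
  dir, base = File.split(relative)
  parts = [test_root]
  parts << dir unless dir == "."
  parts << "test_#{base}"
  File.join(*parts)
end

# Return the library files whose expected test file is absent, sorted.
def untested_files(lib_files, test_files)
  existing = {}
  test_files.each { |path| existing[path] = true }
  lib_files.reject { |path| existing[expected_test_path(path)] }.sort
end

if __FILE__ == $0
  # Run from the root of a checked-out source tree.
  lib_files  = Dir.glob("lib/**/*.rb")
  test_files = Dir.glob("test/unit/**/test_*.rb")
  untested_files(lib_files, test_files).each { |path| puts path }
end
```

Run from the top of the source tree, it prints one library path per line
for every file without a test under the assumed naming scheme. The output
is only a starting point for prioritizing, since a test file may exist
under a different name.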

> Although it is very difficult, but if you can, it is
> good to estimate the needed efforts. It is also good
> to prioritize classes/modules to write tests. I think
> Bio::GenBank and Bio::GenPept are high priority.

I think that estimating the quality and coverage of the current tests is
quite a complex task that will take a lot of time, so I have included it as a
phase in the development plan. I've already played with the tests, and IMO
there is room for improvement.

Thanks,
Kostya.

Updated application:

1. Contact information

Full Name: Konstantin Stepanyuk

Address:
Pirogova str. 20/1, app. 800,
Novosibirsk,
Zip code: 630090
Russian Federation.

E-mail: konstantin.s.stepanyuk at gmail.com
Phone: +7 923 247 2424
ICQ: 427601980


2. Motivation and goals.
Bioinformatics is one of my primary fields of interest. I already have a
solid background in bioinformatics, since I have been participating in the
Unipro UGENE (http://ugene.unipro.ru) open-source bioinformatics project for
two years. My current research area at university includes local sequence
alignment and genome assembly.

I highly appreciate the Ruby programming language, and I was very glad to
learn that there is an open-source Ruby-based bioinformatics project.
I believe that cross-version compatibility of BioRuby is an important issue
for the project, since the project is quite modern and promising. One of the
main tasks in porting BioRuby to Ruby 1.9.2 is improving test coverage,
since the project currently has rather few unit tests. Better coverage will
make us more confident when introducing compatibility and conformance fixes.


3. My skills summary and work experience
Programming languages: C++ (3 years), Java, Ruby, Python.
Ruby experience: basic knowledge. Able to write simple scripts no larger
than ~200 LoC.
I've written some algorithms in Ruby, such as sorts and simulated annealing
for the traveling salesman problem, as well as several networking scripts
(simple TCP servers/clients) and handy 'one-liners' for everyday tasks.

Projects:
* Unipro UGENE - free and open-source Integrated Bioinformatic Tools
(http://ugene.unipro.ru).
- Role: C++ and Qt developer for two years (Unipro LLC).
- Implemented and tested several algorithms, such as Smith-Waterman local
sequence alignment (and its SSE, CUDA and ATI Stream versions).
* Apache Harmony - clean-room implementation of the J2SE platform
(http://harmony.apache.org).
- Role: Intern at Intel Corporation.
- Implemented a tool for aggregating and reporting performance and statistical
counters.


4. A project plan.
I propose to divide the total work into two big milestones, according to
the Google schedule. The plan also includes a preparation phase, which will
be performed during the Community Bonding period.

0) Preparation:
- Establishing the Ruby environment:
* Installing several current versions of Ruby (1.8.7 and 1.9.1) and checking
out the Ruby repository to be able to regularly build the newest version.
* Subscribing to the Ruby development mailing list to follow the current
status of the project.
- Establishing the BioRuby environment:
* Checking out the BioRuby codebase.
* Choosing the right tools to work with the BioRuby code. The Vim + Rakefiles
way is surely reliable, but using a high-level IDE such as JetBrains RubyMine
will be considered.

1) Improving test coverage of the project. 23 May - 16 July (total 8 weeks)

2) Porting the project codebase to be compatible with Ruby 1.9.2. 16 July -
20 August (total 5 weeks).

Each of these chunks of work is divided into several subparts:

1)
- Evaluate test coverage (1 week). This includes:
* prioritizing the classes/modules to write tests for.
* measuring coverage. Rcov is the first candidate to use.
* considering integration of test coverage metrics into the build process.

- Write unit tests according to the plan. Consider creating a stress-test
suite. (6-7 weeks)

2)
- Compile the list of incompatibilities with the new version of Ruby (1 week)
- Port the codebase (4 weeks)
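
A first pass at the incompatibility list could be partly automated. The
following is a minimal sketch, not a complete checker: the pattern list is
deliberately tiny and heuristic, covering only a few well-known changes
(the $KCODE variable being ignored, and the jcode and ftools libraries being
removed in Ruby 1.9). The names SUSPECT_PATTERNS and scan_source are
hypothetical.

```ruby
# Sketch: flag source lines that use idioms known to change in Ruby 1.9.
# ASSUMPTION: the pattern list is illustrative only and far from exhaustive.
# An array of pairs is used so that the order is stable on Ruby 1.8 too.
SUSPECT_PATTERNS = [
  [/\$KCODE/,                  "$KCODE is ignored by Ruby 1.9"],
  [/require\s+['"]jcode['"]/,  "the jcode library was removed in Ruby 1.9"],
  [/require\s+['"]ftools['"]/, "the ftools library was removed in Ruby 1.9"]
]

# Return [line_number, message] pairs for every suspect line in source.
def scan_source(source)
  findings = []
  source.split("\n").each_with_index do |line, index|
    next if line =~ /\A\s*#/ # skip comment-only lines
    SUSPECT_PATTERNS.each do |pattern, message|
      findings << [index + 1, message] if line =~ pattern
    end
  end
  findings
end

if __FILE__ == $0
  Dir.glob("lib/**/*.rb").sort.each do |path|
    # Read in binary mode so that encoding problems do not abort the scan.
    source = File.open(path, "rb") { |f| f.read }
    scan_source(source).each do |line_number, message|
      puts "#{path}:#{line_number}: #{message}"
    end
  end
end
```

Such a scan cannot find behavioral changes that depend on object types
(for example, changes in String methods), so running the unit tests under
both Ruby versions remains the main way to detect incompatibilities.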

5. My plans for the summer
I plan for the GSoC project to be my primary occupation during the summer,
but I'm going to have a one-week vacation in July.



On Thu, 8 Apr 2010 16:55:25 +0800
Konstantin Stepanyuk <konstantin.s.stepanyuk at gmail.com> wrote:

> Hi Pjotr and folks,
>
> here is my proposal written according to the scheme published on the OBF
> GSoC page. It is quite compact since I have not yet dug deeply into the
> codebase and tests. So I will appreciate any help or
> suggestions, and I'm looking forward to contributing to your project
> during the GSoC.
>
> Thanks!
> Kostya.
>
> 1. Contact information
>
> Full Name: Konstantin Stepanyuk
>
> Address:
> Pirogova str. 20/1, app. 800,
> Novosibirsk,
> Zip code: 630090
> Russian Federation.
>
> E-mail: konstantin.s.stepanyuk at gmail.com
> Phone: +7 923 247 2424
> ICQ: 427601980
>
>
> 2. Motivation and goals.
> Bioinformatics is one of my primary fields of interests. I already
> have a solid background in bioinformatics since I have been
> participating in Unipro UGENE (http://ugene.unipro.ru) open-source
> bioinformatics project for two years. My existing research area in
> university includes local sequence alignment and genome assembly.
>
> I highly appreciate the Ruby programming language, and I was very glad to
> learn that there is an open-source Ruby-based
> bioinformatics project.
> I believe that cross-version compatibility of BioRuby is an important
> issue for the project, since the project is quite modern and promising.
> One of the main tasks in porting BioRuby to Ruby 1.9.2 is improving test
> coverage, since the project currently has rather few unit tests. Better
> coverage will make us more confident when introducing compatibility and
> conformance fixes.
>
>
> 3. My skills summary and work experience
>  Programming languages: C++ (3 years), Java, Ruby, Python.
>  Projects:
>  * Unipro UGENE - free and open-source Integrated Bioinformatic Tools
> (http://ugene.unipro.ru).
>    - Role: C++ and Qt developer for two years (Unipro LLC).
>    - Implemented and tested several algorithms, such as Smith-Waterman
> local sequence alignment (and its SSE, CUDA and ATI Stream versions).
>  * Apache Harmony - clean-room implementation of J2SE platform
> (http://harmony.apache.org).
>    - Role: Intern at Intel Corporation.
>    - Implemented a tool for aggregating and reporting performance and
> statistical counters.
>
>
> 4. A project plan.
> I propose to divide the total work into two big milestones,
> according to the Google schedule.
>
> 1) Improving test coverage of the project. 23 May - 16 July (total 8
> weeks)
>
> 2) Porting the project codebase to be compatible with Ruby 1.9.2. 16
> July - 20 August (total 5 weeks).
>
> Each of these chunks of work is divided into several subparts:
>
> 1)
>  - Evaluate test coverage (1 week). Consider integration of some tool
> to build process to automate test coverage reporting.
>    Create concrete test plan which will be targeted to improve test
> coverage up to 90-100%
>  - Write unit-tests according to the plan. Consider creating the
> stress-test suite. (6-7 weeks)
>
> 2)
>  - Elaborate the list of incompatibilities with new version of Ruby (1
> week)
>  - Port the codebase (4 weeks)
>
> 5. My plans for the summer
> I plan for the GSoC project to be my primary occupation during the
> summer, but I'm going to have a one-week vacation in July.
>
> On 4/7/10, Pjotr Prins <pjotr.public14 at thebird.nl> wrote:
> > Hi Konstantin,
> >
> > Not much time left. Leave us enough time to help comment.
> >
> > Pj.
> >
> > On Wed, Apr 07, 2010 at 01:02:07PM +0800, Konstantin Stepanyuk wrote:
> >> Hi All,
> >>
> >> My name is Kostya Stepanyuk, I'm an undergraduate student from
> >> Novosibirsk State University in Russia and I'm looking forward to
> >> participating in the 'Ruby 1.9.2 support of BioRuby' GSoC project.
> >>
> >> I already have a background in bioinformatics since I have been
> >> participating in Unipro UGENE (http://ugene.unipro.ru) open-source
> >> bioinformatics project for a long time. Also, I highly appreciate the Ruby
> >> programming language and I was very glad to learn that there is
> >> an open-source Ruby-based bioinformatics project.
> >>
> >> My motivation in participating in this project is to improve my
> >> knowledge of Ruby, to familiarize myself with your great project and
> >> to help BioRuby improve in quality and popularity. I'm looking
> >> forward to contributing to your promising project!
> >>
> >> I'm going to send the full application as soon as possible.
> >>
> >> Thanks,
> >> Kostya.
> >> _______________________________________________
> >> BioRuby Project - http://www.bioruby.org/
> >> BioRuby mailing list
> >> BioRuby at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioruby
> >


