[Biojava-l] [Fwd: Re: Fwd: Proposing a project on "Biojava alignment lead"]

jitesh dundas jbdundas at gmail.com
Sat Apr 17 02:20:12 UTC 2010


Hi Everyone,

I went through the URLs sent by Dr Chapman. Interesting work that you
are doing here. :)

I was wondering if anyone could consider the ideas below. I would
also like to be part of the research work being carried out using
BioJava (especially in sequence alignment and miRNA signature
analysis, particularly for cancers).

1) A set of tools for converting flat data (e.g. sequence strings,
taxonomy strings) into BioJava-like objects (e.g. SymbolLists,
NCBITaxon). These BioJava-like objects could then be used for more
advanced applications. In addition, a set of tools for manipulating
the BioJava-like objects.

2) Module?: biojava-ws-blast
Module?: biojava-ws-biolit
Proposed Module: biojava-j2ee   Lead: Mark Schreiber

- This would probably take the form of SessionBeans and WebServices
that can be deployed to GlassFish/JBoss etc. to provide biological
services for people who want to build client-server or SOA apps.

3) I also liked what Mr. Gang Wu is working on (I read the
discussions). I was wondering if I could do something of that sort.
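To make idea 1) above more concrete, here is a minimal sketch of such a converter, assuming plain A/C/G/T DNA strings and semicolon-separated lineage strings. The class `FlatToObjects`, the `Nucleotide` enum, and both methods are hypothetical illustrations, not BioJava classes; in BioJava 1.x itself, `DNATools.createDNA` already turns a DNA string into a `SymbolList`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical converters from flat strings to simple typed objects.
// Illustrative stand-ins only, not BioJava's SymbolList or NCBITaxon.
final class FlatToObjects {

    // A minimal DNA alphabet standing in for a BioJava-like symbol type.
    enum Nucleotide { A, C, G, T }

    // Converts a flat DNA string into an unmodifiable list of symbols,
    // rejecting any character outside A/C/G/T (case-insensitive).
    static List<Nucleotide> parseDna(String flat) {
        List<Nucleotide> symbols = new ArrayList<>(flat.length());
        for (char c : flat.toUpperCase(Locale.ROOT).toCharArray()) {
            switch (c) {
                case 'A': symbols.add(Nucleotide.A); break;
                case 'C': symbols.add(Nucleotide.C); break;
                case 'G': symbols.add(Nucleotide.G); break;
                case 'T': symbols.add(Nucleotide.T); break;
                default:
                    throw new IllegalArgumentException("Illegal DNA symbol: " + c);
            }
        }
        return Collections.unmodifiableList(symbols);
    }

    // Converts a semicolon-separated lineage string (an assumed format)
    // into an unmodifiable list of rank names, skipping empty entries.
    static List<String> parseLineage(String flat) {
        List<String> ranks = new ArrayList<>();
        for (String part : flat.split(";")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                ranks.add(name);
            }
        }
        return Collections.unmodifiableList(ranks);
    }

    public static void main(String[] args) {
        System.out.println(parseDna("acgT"));
        System.out.println(parseLineage("Eukaryota; Metazoa; Chordata;"));
    }
}
```

Validating once at conversion time means later tools can work with typed, immutable objects instead of re-checking raw strings.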

May I ask the leads to tell me how I could chip in?

Regards,
Jitesh Dundas



On 4/16/10, Mark Chapman <chapman at cs.wisc.edu> wrote:
> A great place to start finding ideas is the wiki.
> Both http://biojava.org/wiki/BioJava:Modules
> and http://biojava.org/wiki/BioJava3_Proposal
> list the next steps planned/desired for BioJava.
>
> What research area did you have in mind?
>
> Have fun,
> Mark
>
>
> On 4/16/2010 8:57 AM, jitesh dundas wrote:
>> Dear Sir,
>>
>> I am very interested in contributing to this project.
>>
>> I am looking for a good problem, more on the research side. I can also
>> help with coding (I also work as a software engineer:
>> J2EE/Eclipse/JBoss/Tomcat).
>>
>> Is there anything that I could work on?
>>
>> Regards,
>> Jitesh Dundas
>>
>> On 4/8/10, Andreas Dräger <andreas.draeger at uni-tuebingen.de> wrote:
>>> Hi all,
>>>
>>> This e-mail is just for your information about somebody new, who'd like
>>> to contribute to our project.
>>>
>>> Cheers
>>> Andreas
>>>
>>>
>>> Subject:
>>> Re: Fwd: Proposing a project on "Biojava alignment lead"
>>> From:
>>> Andreas Dräger <andreas.draeger at uni-tuebingen.de>
>>> Date:
>>> Wed, 07 Apr 2010 09:27:13 +0200
>>> To:
>>> Cai Shaojiang<caishaojiang at gmail.com>
>>>
>>> Hi Cai Shaojiang,
>>>
>>> Thank you for your e-mail! I don't know what happened to the e-mail list.
>>> Sometimes it takes a while due to the spam filters, I guess.
>>>
>>>   >  I am a PhD student from the National University of Singapore. My major
>>> research area is local alignment algorithms and data structures for SNP
>>> identification, and I have used Java and Eclipse for software
>>> development for years. I am very interested in your GSoC programme. I see
>>> that there is a module called "biojava-alignment lead" which you mentor. I
>>> want to propose a new project on this module. I have several questions
>>> about this module.
>>>
>>> Yes, that's me. So great to get your support.
>>>
>>>   >  1. It seems that pairwise alignment is meant to find similarity
>>> between two short sequences. The existing pairwise alignment is based on
>>> dynamic programming; is it the Smith-Waterman algorithm?
>>>
>>> So, currently, BioJava contains three different alignment approaches.
>>> There are two deterministic algorithms, i.e., Smith-Waterman for local
>>> alignment and Needleman-Wunsch for global alignment. Third, there is the
>>> possibility to apply Hidden Markov Models for alignment. An example of
>>> the latter approach should be in the cookbook.
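The two deterministic algorithms named above differ mainly in how the dynamic-programming matrix is initialised and where the score is read off. The following is a minimal, score-only sketch (no traceback) with a linear gap penalty; the class name `DpAlignment` and the scoring scheme (match +1, mismatch -1, gap -2) are arbitrary assumptions for illustration and do not reflect BioJava's implementation or its substitution matrices.

```java
// Score-only sketch of Needleman-Wunsch (global) and Smith-Waterman (local)
// with a linear gap penalty. Hypothetical class; scoring values are assumptions.
final class DpAlignment {
    static final int MATCH = 1;
    static final int MISMATCH = -1;
    static final int GAP = -2;

    static int score(String a, String b, boolean local) {
        int n = a.length(), m = b.length();
        int[][] h = new int[n + 1][m + 1];
        // Global: the first row/column accumulate gap penalties.
        // Local: they stay 0, so an alignment may start anywhere.
        if (!local) {
            for (int i = 1; i <= n; i++) h[i][0] = i * GAP;
            for (int j = 1; j <= m; j++) h[0][j] = j * GAP;
        }
        int best = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int diag = h[i - 1][j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? MATCH : MISMATCH);
                int cell = Math.max(diag, Math.max(h[i - 1][j] + GAP, h[i][j - 1] + GAP));
                if (local) {
                    cell = Math.max(cell, 0); // Smith-Waterman: scores never drop below 0
                    best = Math.max(best, cell);
                }
                h[i][j] = cell;
            }
        }
        // Global score sits in the bottom-right cell; local score is the matrix maximum.
        return local ? best : h[n][m];
    }

    public static void main(String[] args) {
        System.out.println(score("GATTACA", "GCATGCA", false));
        System.out.println(score("GATTACA", "GCATGCA", true));
    }
}
```

The zero floor and the "take the maximum anywhere" rule are what let Smith-Waterman report the best matching sub-region instead of forcing both sequences to align end to end.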
>>>
>>>   >  2. What is the exact task of "refactoring of underlying data
>>> structures"?
>>>
>>> Yes, this is something I already did last week, but it could still be
>>> improved. The problem was that the alignment algorithms actually
>>> produced a kind of string that looks similar to the output of BLAST.
>>> This string contained the score, the computation time, the length of the
>>> alignment etc. However, people wanted to perform higher-level
>>> computation on the score value or evaluate some other information.
>>> Now, the alignment produces a data structure that contains all the
>>> information and can, in addition, also produce such a BLAST-like
>>> output. There is, however, still the following problem: the data
>>> structure requires both sequences in the pairwise alignment to have an
>>> identical length. In the case of local alignment this is especially
>>> awkward, because gaps are inserted to pad the sequences. The data
>>> structure then tries to keep the original sequence coordinates, so the
>>> values "query start", "query end", "subject start", and "subject end"
>>> are needed to shift the sequences against each other when displaying
>>> the output. So you cannot simply print the sequences one below the
>>> other; you first have to shift them. Please check out the latest
>>> version of this package via anonymous SVN and have a look ;-)
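One possible shape for such a result structure is sketched below, under assumptions: the class `PairwiseResult`, its fields, and `identity()` / `toBlastLikeString()` are hypothetical names, not the classes in the BioJava trunk. It keeps the computed values as plain fields for further computation, stores the aligned rows so they can be printed one below the other without shifting (the coordinates serve only as labels), and derives the BLAST-like text on demand.

```java
// Hypothetical pairwise alignment result; names are illustrative assumptions,
// not BioJava's actual classes.
final class PairwiseResult {
    final int score;
    final long timeMillis;
    // Aligned rows of equal length; '-' marks a gap. Stored already aligned,
    // so they can be printed one below the other without any shifting.
    final String alignedQuery;
    final String alignedSubject;
    // 1-based inclusive coordinates in the original sequences, used only as labels.
    final int queryStart, queryEnd, subjectStart, subjectEnd;

    PairwiseResult(int score, long timeMillis, String alignedQuery, String alignedSubject,
                   int queryStart, int queryEnd, int subjectStart, int subjectEnd) {
        if (alignedQuery.length() != alignedSubject.length()) {
            throw new IllegalArgumentException("aligned rows must have equal length");
        }
        this.score = score;
        this.timeMillis = timeMillis;
        this.alignedQuery = alignedQuery;
        this.alignedSubject = alignedSubject;
        this.queryStart = queryStart;
        this.queryEnd = queryEnd;
        this.subjectStart = subjectStart;
        this.subjectEnd = subjectEnd;
    }

    int length() {
        return alignedQuery.length();
    }

    // Higher-level computation on stored values: fraction of columns
    // holding identical, non-gap residues.
    double identity() {
        if (length() == 0) return 0.0;
        int same = 0;
        for (int i = 0; i < length(); i++) {
            char q = alignedQuery.charAt(i);
            if (q != '-' && q == alignedSubject.charAt(i)) same++;
        }
        return (double) same / length();
    }

    // BLAST-like text is a view derived from the data, not the primary result.
    String toBlastLikeString() {
        StringBuilder match = new StringBuilder();
        for (int i = 0; i < length(); i++) {
            char q = alignedQuery.charAt(i);
            match.append(q != '-' && q == alignedSubject.charAt(i) ? '|' : ' ');
        }
        return String.format("Score = %d, Length = %d, Time = %d ms%n", score, length(), timeMillis)
                + String.format("Query  %5d %s %d%n", queryStart, alignedQuery, queryEnd)
                + String.format("%13s%s%n", "", match) // 13 spaces line up under the rows
                + String.format("Sbjct  %5d %s %d%n", subjectStart, alignedSubject, subjectEnd);
    }

    public static void main(String[] args) {
        PairwiseResult r = new PairwiseResult(7, 3, "AC-GT", "ACTGT", 1, 4, 10, 14);
        System.out.printf("identity = %.2f%n", r.identity());
        System.out.print(r.toBlastLikeString());
    }
}
```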
>>>
>>>   >  3. My existing research aims to deal with aligning short reads
>>> (10s to 100s of bp) against extremely long sequences (e.g., the human
>>> genome). As far as I know, no such alignment tool exists that is
>>> implemented in Java. Would you consider this direction?
>>>
>>> See, this would be very nice to include. But this requires that we no
>>> longer pad the short sequence with many, many gap symbols (just a waste
>>> of memory), but improve the data structure instead. There is already an
>>> UnequalLengthAlignment (just a data structure, no algorithm), and I
>>> think we could use this as a starting point. Your algorithm would then
>>> only need to produce such a data structure, and this would be fine.
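As a rough illustration of the unequal-length idea above, the sketch below stores each short read once, together with its 0-based offset on the long reference, instead of padding it with gap symbols to the reference length. The class `ReadPlacements` and its methods are hypothetical, and it assumes ungapped placements (no indels), which a real short-read aligner would also have to represent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical unequal-length alignment: one long reference plus short reads
// stored at 0-based offsets, with no gap padding. Assumes ungapped placements.
final class ReadPlacements {
    private final String reference;
    private final List<String> reads = new ArrayList<>();
    private final List<Integer> offsets = new ArrayList<>();

    ReadPlacements(String reference) {
        this.reference = reference;
    }

    // Records a read at a 0-based offset and returns its index.
    int add(String read, int offset) {
        if (offset < 0 || offset + read.length() > reference.length()) {
            throw new IllegalArgumentException("read does not fit at offset " + offset);
        }
        reads.add(read);
        offsets.add(offset);
        return reads.size() - 1;
    }

    // Storage grows with total read length, not with reference length per read.
    int storedReadCharacters() {
        int total = 0;
        for (String r : reads) total += r.length();
        return total;
    }

    // Counts mismatches between a read and the reference at its offset.
    int mismatches(int index) {
        String read = reads.get(index);
        int offset = offsets.get(index);
        int count = 0;
        for (int k = 0; k < read.length(); k++) {
            if (read.charAt(k) != reference.charAt(offset + k)) count++;
        }
        return count;
    }

    // Renders reference positions [from, to) with the read underneath;
    // padding is created only for the displayed window.
    String render(int index, int from, int to) {
        String read = reads.get(index);
        int offset = offsets.get(index);
        StringBuilder row = new StringBuilder();
        for (int p = from; p < to; p++) {
            int k = p - offset;
            row.append(k >= 0 && k < read.length() ? read.charAt(k) : ' ');
        }
        return reference.substring(from, to) + "\n" + row;
    }

    public static void main(String[] args) {
        ReadPlacements a = new ReadPlacements("ACGTACGTAC");
        int i = a.add("GTTC", 2);
        System.out.println(a.mismatches(i));
        System.out.println(a.render(i, 0, 8));
    }
}
```

An aligning algorithm would then only need to emit (read, offset) pairs, which matches the suggestion that it should "only produce such a data structure".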
>>>
>>>   >  4. It seems that the existing tools are just lacking some
>>> refactoring and representation interfaces. Are there any further underlying tasks?
>>>
>>> Hm. Yes: with the release of BioJava 3, the data structures have changed
>>> again. So maybe some adaptation to the new structures is also
>>> required.
>>>
>>>   >  I have been keeping an eye on GSoC since last month, but I am sorry
>>> to find that I sent the initial e-mail to the mailing list before
>>> subscribing to it...
>>>
>>> Ok. Sounds good. Thanks for your interest. So I suggest: Download the
>>> latest trunk, have a look, play around and if you can improve something
>>> we'll put it into the trunk and write your name into the authors' tag.
>>>
>>> Cheers
>>> Andreas
>>>
>>> --
>>> Dipl.-Bioinform. Andreas Dräger
>>> Eberhard Karls University Tübingen
>>> Center for Bioinformatics (ZBIT)
>>> Sand 1
>>> 72076 Tübingen
>>> Germany
>>>
>>> Phone: +49-7071-29-70436
>>> Fax:   +49-7071-29-5091
>>> _______________________________________________
>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>
>>
>



