[Biojava-l] Analytical Tool- Prediction of Unknown Protein's location on a Predicted pathway

Sat Apr 17 02:31:46 UTC 2010

Dear All,

I wanted to propose an analytical tool in BioJava.

For example, suppose we have a large dataset with complete pathway
information and the related annotations (e.g. the p53 pathway with
all the genes, proteins, miRNAs, etc. involved). Could we then find
the location of a specific unknown (newly predicted) protein/gene on
a predicted pathway?

This is a suggestion for one of the things we could do on the
analytical side. Could we think of doing something of this sort in
BioJava (or at least make it capable of handling such use cases)?

Any ideas / comments are most welcome...

Regards,
Jitesh Dundas

On 4/17/10, jitesh dundas <jbdundas at gmail.com> wrote:
> Hi Everyone,
>
> I went through the URLs sent by Dr Chapman. Interesting work that you
> are doing here. :)
>
> I was wondering whether anyone could consider me for the items below.
> I would also like to be part of the research work being carried out
> using BioJava (especially sequence alignment and miRNA signature
> analysis, particularly for cancers).
>
> 1) A set of tools for converting flat data (e.g. sequence strings,
> taxonomy strings) into BioJava-like objects (e.g. SymbolLists,
> NCBITaxon). These BioJava-like objects could then be used for more
> advanced applications.
> A set of tools for manipulating the BioJava-like objects.
>
> 2) Module?: biojava-ws-blast Module?: biojava-ws-biolit
> Proposed Module: biojava-j2ee Lead: Mark Schreiber
>
> - This would probably take the form of SessionBeans and WebServices
> that can be deployed to Glassfish/ JBoss etc to provide biological
> services for people who want to make client server or SOA apps.
>
> 3) I also liked what Mr. Gang Wu is working on (I read the
> discussions). I was wondering if I could do something of that sort...
>
> May I request the leads to tell me how I could chip in...
>
> Regards,
> Jitesh Dundas
>
>
>
> On 4/16/10, Mark Chapman <chapman at cs.wisc.edu> wrote:
>> A great place to start finding ideas is the wiki.
>> Both http://biojava.org/wiki/BioJava:Modules
>> and http://biojava.org/wiki/BioJava3_Proposal
>> list the next steps planned/desired for BioJava.
>>
>> What research area did you have in mind?
>>
>> Have fun,
>> Mark
>>
>>
>> On 4/16/2010 8:57 AM, jitesh dundas wrote:
>>> Dear Sir,
>>>
>>> I am very interested in contributing to this project.
>>>
>>> I am looking for a good problem, more on the research side. I can also
>>> help with coding (I work as a software engineer:
>>> J2EE/Eclipse/JBoss/Tomcat).
>>>
>>> Anything that I could work on...
>>>
>>> Regards,
>>> Jitesh Dundas
>>>
>>> On 4/8/10, Andreas Dräger<andreas.draeger at uni-tuebingen.de>  wrote:
>>>> Hi all,
>>>>
>>>> This e-mail is just for your information about somebody new, who'd like
>>>> to contribute to our project.
>>>>
>>>> Cheers
>>>> Andreas
>>>>
>>>>
>>>> Subject:
>>>> Re: Fwd: Proposing a project on "Biojava alignment lead"
>>>> From:
>>>> Andreas Dräger<andreas.draeger at uni-tuebingen.de>
>>>> Date:
>>>> Wed, 07 Apr 2010 09:27:13 +0200
>>>> To:
>>>> Cai Shaojiang<caishaojiang at gmail.com>
>>>>
>>>> Hi Cai Shaojiang,
>>>>
>>>> Thank you for your e-mail! I don't know what happened to the e-mail
>>>> list. Sometimes it takes a while due to the spam filters, I guess.
>>>>
>>>>   >  I am a PhD student from National University of Singapore. My major
>>>> research area is local alignment algorithms and data structures for SNP
>>>> identification. And I have used Java and Eclipse for years for software
>>>> development. I am very interested in your GSoC programme. I find that
>>>> there is a module called "biojava-alignment lead" whose mentor is you.
>>>> I want to propose a new project on this module. I have several
>>>> questions about this module.
>>>>
>>>> Yes, that's me. So great to get your support.
>>>>
>>>>   >  1. It seems that pairwise alignment finds the similarity between
>>>> two short sequences. The existing pairwise alignment is based on
>>>> dynamic programming; is it the Smith-Waterman algorithm?
>>>>
>>>> So, currently, BioJava contains three different alignment approaches.
>>>> There are two deterministic algorithms, i.e., Smith-Waterman for local
>>>> alignment and Needleman-Wunsch for global alignment. Third, there is
>>>> the possibility to apply Hidden Markov Models for alignment. An
>>>> example of the latter approach should be in the cookbook.
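
The difference between the two deterministic algorithms named above can be sketched in a few lines. The following is a minimal, score-only illustration in plain Java, not the BioJava API: the class name, the method names, and the simple match/mismatch/linear-gap scoring are all assumptions made for this example.

```java
// Minimal sketch: score-only dynamic programming for pairwise alignment.
// Hypothetical names and scoring scheme; this is NOT the BioJava API.
class AlignmentScoreSketch {

    /** Smith-Waterman (local): cells are clamped at 0; the result is the matrix maximum. */
    static int localScore(String a, String b, int match, int mismatch, int gap) {
        int[][] h = new int[a.length() + 1][b.length() + 1]; // first row/column stay 0
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int sub = (a.charAt(i - 1) == b.charAt(j - 1)) ? match : mismatch;
                int diag = h[i - 1][j - 1] + sub;   // align a[i-1] with b[j-1]
                int up = h[i - 1][j] + gap;         // gap in b
                int left = h[i][j - 1] + gap;       // gap in a
                h[i][j] = Math.max(0, Math.max(diag, Math.max(up, left)));
                best = Math.max(best, h[i][j]);
            }
        }
        return best;
    }

    /** Needleman-Wunsch (global): leading gaps are charged; the result is the last cell. */
    static int globalScore(String a, String b, int match, int mismatch, int gap) {
        int[][] h = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) h[i][0] = i * gap;
        for (int j = 1; j <= b.length(); j++) h[0][j] = j * gap;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int sub = (a.charAt(i - 1) == b.charAt(j - 1)) ? match : mismatch;
                int diag = h[i - 1][j - 1] + sub;
                int up = h[i - 1][j] + gap;
                int left = h[i][j - 1] + gap;
                h[i][j] = Math.max(diag, Math.max(up, left));
            }
        }
        return h[a.length()][b.length()];
    }

    public static void main(String[] args) {
        // gap is passed as a negative number that is added to the score
        System.out.println(localScore("ACGT", "TTACGTTT", 2, -1, -2)); // 8
        System.out.println(globalScore("ACGT", "AGT", 2, -1, -2));     // 4
    }
}
```

The only structural differences are the clamp at zero and where the final score is read; a real implementation would also keep traceback pointers and use a substitution matrix instead of two fixed scores.
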
>>>>
>>>>   >  2. What is the exact task of "refactoring of underlying data
>>>> structures"?
>>>>
>>>> Yes, this is something I already did last week, but it could still be
>>>> improved. The problem was that the alignment algorithms actually
>>>> produced a kind of string that looks similar to the output of BLAST.
>>>> This string contained the score, the computation time, the length of
>>>> the alignment, etc. People, however, wanted to perform higher-level
>>>> computation on the score value or evaluate some other information.
>>>> Now, the alignment will produce a data structure that contains all
>>>> the information and can, in addition to that, also produce such a
>>>> BLAST-like output. There is, however, still the following problem:
>>>> The data structure requires both sequences in the pairwise alignment
>>>> to have an identical length. In the case of local alignment this is
>>>> especially stupid (actually), because gaps are inserted to fill the
>>>> sequences. And then the data structure tries to keep the old sequence
>>>> coordinates, with the effect that the numbers "query start",
>>>> "query end", "subject start", and "subject end" are required to shift
>>>> the sequences against each other when displaying the output. So, you
>>>> cannot easily print the sequences below each other; you first have to
>>>> shift them. Please check out the latest version of this package via
>>>> anonymous svn and have a look ;-)
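
The idea of returning a data structure rather than a preformatted string can be illustrated roughly as follows. This is only a sketch under assumptions: the class name, its fields, and the output layout are invented here and do not reproduce the actual BioJava classes or the exact BLAST format.

```java
// Minimal sketch of an alignment result object. Hypothetical names and
// layout; this does NOT mirror the real BioJava data structure.
class AlignmentResultSketch {
    final int score;
    final long millis;                 // computation time
    final int queryStart, queryEnd;    // 1-based, inclusive, in the original query
    final int subjectStart, subjectEnd;
    final String alignedQuery;         // gapped rows, '-' marks a gap
    final String alignedSubject;

    AlignmentResultSketch(int score, long millis, int queryStart, int queryEnd,
                          int subjectStart, int subjectEnd,
                          String alignedQuery, String alignedSubject) {
        if (alignedQuery.length() != alignedSubject.length()) {
            throw new IllegalArgumentException("gapped rows must have equal length");
        }
        this.score = score;
        this.millis = millis;
        this.queryStart = queryStart;
        this.queryEnd = queryEnd;
        this.subjectStart = subjectStart;
        this.subjectEnd = subjectEnd;
        this.alignedQuery = alignedQuery;
        this.alignedSubject = alignedSubject;
    }

    /** Number of alignment columns, gaps included. */
    int length() {
        return alignedQuery.length();
    }

    /** Columns where both rows carry the same non-gap symbol. */
    int identities() {
        int n = 0;
        for (int i = 0; i < alignedQuery.length(); i++) {
            char q = alignedQuery.charAt(i);
            if (q != '-' && q == alignedSubject.charAt(i)) n++;
        }
        return n;
    }

    /** Text rendering derived from the fields; callers can still use the raw values. */
    String toBlastLikeString() {
        StringBuilder sb = new StringBuilder();
        sb.append("Score = ").append(score)
          .append(", Length = ").append(length())
          .append(", Time = ").append(millis).append(" ms\n");
        sb.append("Query ").append(queryStart).append(' ')
          .append(alignedQuery).append(' ').append(queryEnd).append('\n');
        sb.append("Sbjct ").append(subjectStart).append(' ')
          .append(alignedSubject).append(' ').append(subjectEnd).append('\n');
        return sb.toString();
    }
}
```

With the values kept as fields, a caller can filter or rank results by `score` or `identities()` directly and only call `toBlastLikeString()` when a printable report is needed.
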
>>>>
>>>>   >  3. My current research aims at aligning short reads
>>>> (10s~100s bp) against extremely long sequences (e.g., the human
>>>> genome). As far as I know, no such alignment tool implemented in
>>>> Java exists. Would you consider this direction?
>>>>
>>>> See, this would be very nice to include. But this requires that we no
>>>> longer fill the short sequence with many, many gap symbols (just a
>>>> waste of memory), but improve the data structure. There is already an
>>>> UnequalLengthAlignment (just a data structure, no algorithm) and I
>>>> think we could use this as a starting point. Then your algorithm should
>>>> only produce such a data structure and this would be fine.
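
To illustrate why padding the short sequence is wasteful, here is a rough sketch of the alternative: store the read once together with its offset into the long sequence, and build a padded row only if a display needs it. The class and method names are hypothetical, the placement is ungapped for simplicity, and this is not the BioJava data structure mentioned above.

```java
// Minimal sketch: a short read placed on a long subject by offset only.
// Hypothetical names; ungapped placement is a simplifying assumption.
class OffsetAlignmentSketch {
    final String subject;      // long sequence, referenced rather than copied
    final String query;        // short read
    final int subjectStart;    // 0-based position where the read begins

    OffsetAlignmentSketch(String subject, String query, int subjectStart) {
        if (subjectStart < 0 || subjectStart + query.length() > subject.length()) {
            throw new IllegalArgumentException("read does not fit at this offset");
        }
        this.subject = subject;
        this.query = query;
        this.subjectStart = subjectStart;
    }

    /** Exclusive end position of the read on the subject. */
    int subjectEnd() {
        return subjectStart + query.length();
    }

    /** The part of the subject that lies under the read. */
    String subjectWindow() {
        return subject.substring(subjectStart, subjectEnd());
    }

    /**
     * The padded representation: one symbol per subject position.
     * Built on demand for display; its size grows with the subject, not the read.
     */
    String paddedQueryRow() {
        StringBuilder sb = new StringBuilder(subject.length());
        for (int i = 0; i < subject.length(); i++) {
            boolean covered = i >= subjectStart && i < subjectEnd();
            sb.append(covered ? query.charAt(i - subjectStart) : '-');
        }
        return sb.toString();
    }
}
```

For a 50 bp read on a genome-sized subject, the offset form keeps 50 symbols plus one integer, whereas the padded row would need one symbol per subject position. Supporting gaps inside the read would require an extra edit script or a list of aligned blocks, which this sketch leaves out.
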
>>>>
>>>>   >  4. It seems that the existing tools just lack some refactoring
>>>> and representation interfaces. Are there any more underlying tasks?
>>>>
>>>> Hm. Yes: With the release of BioJava 3 data structures have changed
>>>> again. So maybe there's also some adaptation to the new structure
>>>> required.
>>>>
>>>>   >  I have been keeping an eye on GSoC since last month, but I am
>>>> sorry to find that I sent my initial email to the mailing list before
>>>> I subscribed to it...
>>>>
>>>> Ok. Sounds good. Thanks for your interest. So I suggest: Download the
>>>> latest trunk, have a look, play around and if you can improve something
>>>> we'll put it into the trunk and write your name into the authors' tag.
>>>>
>>>> Cheers
>>>> Andreas
>>>>
>>>> --
>>>> Dipl.-Bioinform. Andreas Dräger
>>>> Eberhard Karls University Tübingen
>>>> Center for Bioinformatics (ZBIT)
>>>> Sand 1
>>>> 72076 Tübingen
>>>> Germany
>>>>
>>>> Phone: +49-7071-29-70436
>>>> Fax:   +49-7071-29-5091
>>>> _______________________________________________
>>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>
>>>
>>
>