[Biojava-l] GSoC Application

Mark Chapman chapman at cs.wisc.edu
Thu Apr 8 20:45:21 UTC 2010


Hi Andreas,

Thanks for the feedback.

Difficulties and risks:
By viewing progressive multiple sequence alignment as four separate stages, I 
believe the pieces become easier to manage.  However, I also expect a few of my 
ideas to prove quite challenging to implement.  One of these challenges will be 
efficient parallelization.  Instead of spending all summer finding the optimal 
approach, I plan to write routines that a simple implementation calls in 
sequence and a separate implementation calls in parallel.  Later work could 
then extend the parallelism to a distributed computing framework such as 
Hadoop or Condor.  Another difficult aspect is designing a general interface 
for choosing anchors in profile-profile alignment.  The Myers-Miller algorithm 
chooses optimal midpoints as anchors in an internal decision process.  I hope 
to generalize this to allow external identification of candidate anchors as 
well.
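
To illustrate the sequential/parallel split described above, here is a minimal 
Java sketch.  It assumes each unit of work (for example, one pairwise score) 
can be wrapped as a Callable.  TaskRunner, pairwiseTasks, and identity are 
hypothetical names, not existing BioJava API, and identity is only a toy 
stand-in for a real pairwise alignment score.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; none of these names are existing BioJava API.
final class TaskRunner {

    // Simple implementation: run each task in order on the calling thread.
    static <T> List<T> runSequential(List<Callable<T>> tasks) throws Exception {
        List<T> results = new ArrayList<>();
        for (Callable<T> task : tasks) {
            results.add(task.call());
        }
        return results;
    }

    // Parallel implementation: the same tasks, run on a fixed thread pool.
    // invokeAll returns its futures in the same order as the input list.
    static <T> List<T> runParallel(List<Callable<T>> tasks, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<T> results = new ArrayList<>();
            for (Future<T> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    // Toy stand-in for a pairwise score (an assumption for illustration):
    // identical positions divided by the length of the longer sequence.
    static double identity(String a, String b) {
        int shared = Math.min(a.length(), b.length());
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 0.0;
        }
        int matches = 0;
        for (int k = 0; k < shared; k++) {
            if (a.charAt(k) == b.charAt(k)) {
                matches++;
            }
        }
        return (double) matches / longest;
    }

    // One task per unordered pair (i < j) of input sequences.
    static List<Callable<Double>> pairwiseTasks(List<String> seqs) {
        List<Callable<Double>> tasks = new ArrayList<>();
        for (int i = 0; i < seqs.size(); i++) {
            for (int j = i + 1; j < seqs.size(); j++) {
                final String a = seqs.get(i);
                final String b = seqs.get(j);
                tasks.add(() -> identity(a, b));
            }
        }
        return tasks;
    }
}
```

Because both runners accept the same task list and return results in the same 
order, the parallel version can be checked against the sequential one, and a 
distributed runner could later be added behind the same shape.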

Structural alignment integration:
At least three options exist for inserting structural information into the 
multiple sequence alignment task: pairwise scoring, anchoring, and profile 
scoring.  First, scores from pairwise structural alignments could be used to 
construct the similarity matrix.  This would create a guide tree that aligns 
sequences with similar structures earlier in the progressive alignment.  Second, 
structural alignment could identify possible anchors.  The profile-profile 
alignments would then conserve known structures when two profiles share some 
anchor candidates.  Both of these options are in my plan.  The third option 
would follow the consistency method of profile-profile alignment, which 
replaces scoring from a substitution matrix with a consistency score.  This 
technique is used in T-Coffee and ProbCons.  The consistency score reflects 
how often residues from each profile are aligned to one another across a set 
of pairwise alignments.  If these were structural pairwise alignments, then 
the multiple sequence alignment would preserve structural information.  Later 
work could implement this method as an alternative profile-profile alignment.
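
As a rough illustration of the third option, the Java sketch below scores a 
pair of profile columns by the fraction of cross-profile residue pairs that 
also appear in a library of pairwise alignments.  This is a deliberate 
simplification: it omits the library extension step of T-Coffee and the 
probabilistic model of ProbCons.  ConsistencyScore, key, and columnScore are 
hypothetical names, not BioJava API, and the representation of a column as 
{sequence index, residue position} entries is an assumption.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch; class and method names are not BioJava API.
final class ConsistencyScore {

    // Encodes "residue posA of sequence seqA is aligned to residue posB of
    // sequence seqB".  The lower sequence index always comes first, so the
    // key does not depend on argument order.
    static String key(int seqA, int posA, int seqB, int posB) {
        if (seqA > seqB) {
            return key(seqB, posB, seqA, posA);
        }
        return seqA + ":" + posA + "|" + seqB + ":" + posB;
    }

    // Each column is a list of {sequence index, residue position} entries;
    // gapped sequences are simply absent from the column.  The score is the
    // fraction of cross-profile residue pairs found in the library, which
    // could be filled from sequence or structural pairwise alignments.
    static double columnScore(List<int[]> colX, List<int[]> colY,
                              Set<String> library) {
        int pairs = colX.size() * colY.size();
        if (pairs == 0) {
            return 0.0;
        }
        int supported = 0;
        for (int[] x : colX) {
            for (int[] y : colY) {
                if (library.contains(key(x[0], x[1], y[0], y[1]))) {
                    supported++;
                }
            }
        }
        return (double) supported / pairs;
    }
}
```

A profile-profile aligner could call columnScore wherever it would otherwise 
sum substitution-matrix scores, so the dynamic programming itself would not 
need to change.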

I'll try to incorporate these ideas when I revise my application later 
tonight.  And thanks again for your input.

Mark


On 4/8/2010 12:26 PM, Andreas Prlic wrote:
> Hi Mark,
>
> looks pretty good,
>
> * The time schedule feels tight. Where do you see possible
> difficulties and risks? What might take longer than expected?
>
> * I would like to be able to use 3D structure alignment information to
> guide the final alignment. This should increase reliability of the
> final alignment for remote sequence similarities. Any thoughts on how
> to accomplish this?
>
> Andreas
>
>
>
>
> On Thu, Apr 8, 2010 at 5:47 AM, Mark Chapman <chapman at cs.wisc.edu> wrote:
>> I would appreciate any feedback on my proposal from mentors or other
>> developers.  Check it out at:
>> http://socghop.appspot.com/gsoc/student_proposal/show/google/gsoc2010/mark_chapman/t127055148817
>>
>> Thanks in advance,
>> Mark
>> _______________________________________________
>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>
>
>


