[Bioperl-l] Summer of Code Proposal

Fei Hu hufeiyc at gmail.com
Thu Apr 7 13:08:46 UTC 2011


Messina:

I corrected some writing mistakes.
I also added a whole new section that discusses RAxML and compares it to
other programs.
Thank you so much.

Best
Fei

On Thu, Apr 7, 2011 at 4:51 AM, Dave Messina <David.Messina at sbc.su.se> wrote:

> Hi,
>
> Looking pretty good, particularly the project plan section.
>
> You might also add some text in your introduction which shows the
> importance of RAxML. Say that it's widely used and demonstrate that with
> the number of citations, number of downloads, or similar data.
>
> Also, there are some small English mistakes (for example wrap instead of
> wrapper, provide instead of provides), so ask a native English speaker to do
> some editing.
>
> Good luck! I'd love to see this happen.
>
> Dave
>
>
> On Apr 6, 2011, at 20:06, Fei Hu <hufeiyc at gmail.com> wrote:
>
> > Hi all,
> >
> > Below is my GSoC 2011 proposal, which describes my plan and thoughts.
> > As time is really tight now, I need your advice to make it more realistic
> > and reasonable.
> > I appreciate your time in reviewing it.
> > I am also looking for a mentor who is interested in this project and
> > willing to guide me through the summer.
> >
> > Best
> > Fei
> >
> > PS: Thanks Chris Fields for your valuable suggestion.
> >
> >
> > Name     Fei HU
> > Address  Rm. 3D-11, Swearingen Engineering Building, University of South
> > Carolina
> > Email      hufeiyc at gmail.com
> >
> > Why you are interested in the project you are proposing and are
> > well-suited to undertake it.
> > I like to use Perl to organize and automate my analysis pipeline, from
> > extracting data to running various packages and analyzing the results.
> > I would like more people to know its virtues and make use of it, and
> > BioPerl provides a perfect platform for that.
> > My current research is on gene order phylogeny reconstruction under the
> > maximum likelihood (ML) criterion (other approaches include maximum
> > parsimony (MP) and neighbor-joining (NJ) based methods). My phylogeny
> > inference pipeline uses RAxML to build an ML tree and PAML to estimate
> > the internal (ancestral) sequences. While baseml from PAML is well
> > supported in BioPerl, RAxML is not yet available. Although I wrote my
> > own wrapper for RAxML, it would be even better for BioPerl to wrap RAxML
> > so that everyone can use it easily.
> > I have used RAxML extensively and also modified its source so that it
> > can analyze gene order data. With a good understanding of Perl and RAxML
> > and, what is more, the willingness to make BioPerl better, I am prepared
> > to undertake this project.
> > Programs or projects you have previously authored or contributed to
> > I implemented the algorithm described in [1] in Perl (open source). I
> > also use and learn Perl on a daily basis.
> > A project plan for the project you are proposing
> > The wrapper should be consistent in style and API with the existing
> > packages supported by Tools::Run. I plan for it to cover the most
> > popular functionality that RAxML currently provides:
> > 1. Binary sequence analysis (0-1, binary characters) and multi-state
> > sequence analysis (0-9A-V, 32 characters; the available models are
> > ORDERED, MK and GTR). This is useful for morphological data.
> > 2. DNA analysis and amino acid analysis, with a custom transition
> > matrix (AA only) and rate heterogeneity.
> > 4. Conduct standard bootstrapping and rapid bootstrapping, the final
> > thorough inference [2], and the relatively new bootstopping.
> > 5. Accept a user-supplied starting tree or an incomplete constraint tree.
> > 6. Specify a column weight file name to assign individual weights to each
> > column of the alignment.
> > 7. Specify an exclude file name; the file contains a specification of
> > the alignment positions you wish to exclude.
> > 8. Automatically generate a random seed for the program.
> > 9. And more to be added.
> > Other plans that may benefit users:
> > 1. Call Bio::SeqIO to parse the input and write it out in interleaved or
> > sequential PHYLIP format so that RAxML can read it.
> > 2. Design a set of more understandable options, for example:
> > use “--model” instead of “-P” to specify a custom model file;
> > use “--workingdir” instead of “-w” to specify the working directory.
> > The old style will still be accepted for those who prefer it.
> > 3. Implement more sophisticated exception handling and a running mode
> > summary. There is a huge number of argument combinations that can cause
> > errors. For example, to run rapid bootstrapping plus a thorough
> > inference, one needs to give “-f a” and “-x {random seed}” together with
> > the number of replicates “-# {number}”. If any one of them is missing,
> > RAxML does not report at once that all three are necessary; it usually
> > reports only the “nearest” error it can spot. In my plan, if one wants
> > to conduct rapid bootstrapping (RBS) plus inference, the wrapper will
> > inform the user that all three are necessary and then guide the user to
> > correct the call. In sum, I plan to dig the error conditions out of the
> > source code and group them according to their functionality, so that
> > error messages are no longer reported independently of each other.
> > Another “trivial” thought: when the run ID already exists, RAxML exits
> > directly without offering a choice, which is annoying when overwriting
> > would be fine. I suggest a switch to define the behavior (overwrite, add
> > a postfix to the name, exit, or skip this run).
> > 4. Conduct preliminary post-processing and return the results as a value
> > or list: output the maximum likelihood score of each bootstrapped tree;
> > enumerate the branches whose confidence value is larger than a
> > threshold; return a hash table containing the branch lengths, the
> > running time and the final ML score. Further analysis can be done by
> > other packages anyway.
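To make points 2 and 3 above more concrete, here is a minimal sketch of how
the wrapper could translate long option names and report every missing
argument of a running mode at once. It is written in Python only to keep it
short; the real wrapper would be Perl. The mode name "rbs_plus_inference" and
both function names are made up for illustration. Only the flags "-P", "-w",
"-f a", "-x" and "-#" come from the proposal.

```python
# Long option names from the proposal and the RAxML flags they stand for.
LONG_TO_SHORT = {
    "--model": "-P",
    "--workingdir": "-w",
}

# Flags each running mode needs. None means "any value is accepted".
# The mode name is hypothetical; the three flags are the ones the
# proposal lists for rapid bootstrapping plus a thorough inference.
REQUIRED_BY_MODE = {
    "rbs_plus_inference": {"-f": "a", "-x": None, "-#": None},
}


def to_raxml_flags(options):
    """Return a copy of options with long names replaced by RAxML flags.

    Names that are not in LONG_TO_SHORT (the old style) pass through as-is.
    """
    return {LONG_TO_SHORT.get(name, name): value
            for name, value in options.items()}


def missing_for_mode(mode, flags):
    """Return one message per problem, covering the whole group of flags."""
    problems = []
    for flag, required_value in REQUIRED_BY_MODE[mode].items():
        if flag not in flags:
            shown = flag if required_value is None else f"{flag} {required_value}"
            problems.append(f"missing {shown}")
        elif required_value is not None and flags[flag] != required_value:
            problems.append(
                f"{flag} must be {required_value}, got {flags[flag]}")
    return problems


if __name__ == "__main__":
    flags = to_raxml_flags({"--workingdir": "/tmp/run", "-x": "12345"})
    for message in missing_for_mode("rbs_plus_inference", flags):
        print(message)
```

The point of the design is that the user names the mode once and gets the
complete list of problems back, instead of fixing one error per run.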
> >
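The run ID switch mentioned under point 3 could look roughly like the sketch
below (again Python for brevity, not the final Perl code). The function
names and the policy names are hypothetical. The sketch assumes that RAxML
output files are named RAxML_<kind>.<run ID>, for example RAxML_info.run1,
and that the check happens in the wrapper before RAxML is started.

```python
import glob
import os


def existing_outputs(workdir, run_id):
    """List files in workdir that look like RAxML outputs for run_id."""
    pattern = os.path.join(glob.escape(workdir),
                           "RAxML_*." + glob.escape(run_id))
    return glob.glob(pattern)


def resolve_run_id(workdir, run_id, policy="exit"):
    """Decide which run ID to use when outputs for run_id may already exist.

    Returns the run ID to pass to RAxML, or None if the run should be
    skipped. Raises FileExistsError under the "exit" policy.
    """
    found = existing_outputs(workdir, run_id)
    if not found:
        return run_id
    if policy == "overwrite":
        # Remove the old outputs so RAxML does not refuse to start.
        for path in found:
            os.remove(path)
        return run_id
    if policy == "postfix":
        # Try run_id_1, run_id_2, ... until an unused name is found.
        n = 1
        while existing_outputs(workdir, f"{run_id}_{n}"):
            n += 1
        return f"{run_id}_{n}"
    if policy == "skip":
        return None
    if policy == "exit":
        raise FileExistsError(f"outputs for run ID {run_id!r} already exist")
    raise ValueError(f"unknown policy: {policy!r}")
```

Defaulting to "exit" keeps the current RAxML behavior unless the user asks
for something else.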
> > Any obligations, vacations, or plans for the summer that may require
> > scheduling during the GSoC work period.
> > No special obligations or vacations.
> >
> >
> > [1]Hu, F., Gao, N. and Tang, J., "Maximum Likelihood Phylogenetic
> > Reconstruction Using Gene Order Encodings", CIBCB 2011, accepted.
> > [2]Stamatakis A, Hoover P, Rougemont J: A rapid bootstrap algorithm for
> > the RAxML web-servers. Syst. Biol. 2008, 57:758–771.
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>



-- 
*Fei Hu
Bioinformatics Lab
3D-11 Swearingen Building
U of South Carolina
Tel: 803-397-5240*



