[BioRuby] GSoC: MAF parser questions

Clayton Wheeler cswh at umich.edu
Mon Mar 26 07:26:14 UTC 2012


Hello,

I'm Clayton Wheeler, an undergraduate microbiology student at the University of Michigan. I'm particularly interested in molecular biology and microbial ecology, but in a previous life I was a programmer and system administrator for 11 years, so I'm also very interested in applying that background to bioinformatics. The lab I've been working in uses bioinformatics tools and next-generation sequencing quite a bit, especially for studying hydrothermal plume communities; I've been working on the wet-lab side, but this is an area I'd like to learn more about. I've also been using Ruby for the last six years or so, and I'd like to help develop it as a bioinformatics platform. The GSoC projects for BioRuby caught my eye, and I'd like to see if there's a good way I could contribute there.

The parser projects for MAF and GFF3 seem interesting and doable; I've been looking into the MAF parser project in particular, and I have a few questions about it. 

First, does the set of operations on MAF data described in Blankenberg et al. (http://1.usa.gov/GQRYvX) cover the use cases that such a parser and indexing system should support?

Is https://github.com/polyatail/biopython the correct location of the BioPython MAF code described in http://biopython.org/wiki/Multiple_Alignment_Format? (It looks like it's been 11 months since the last commit.) 

Also, I've found a few existing tools for working with MAF files, such as the scripts shipped with bx-python and PHAST, but not many worked examples of how they'd be used in practice. Are there any reasonably real-world examples of MAF operations available that demonstrate the kinds of indexed access, large data sizes, etc. that should be supported in bio-alignment?

Thanks,

Clayton Wheeler
cswh at umich.edu