[GSoC] Questions on next steps for MAF parsing for bio-maf

Clayton Wheeler cswh at umich.edu
Tue Jul 10 23:45:33 UTC 2012


Hi all,

In the course of working out my plan for the rest of my bio-maf project, I have come up with a few questions I'm not able to answer:

https://github.com/csw/bioruby-maf/wiki/Questions

* Is it useful to build indexes on sequences other than the reference sequence?

* Should the score field of an alignment block be zeroed or removed whenever the block is modified?

* How, precisely, should selection based on features in GTF/GFF3 files work?

* When converting a MAF Block/Sequence to bio-alignment representation, how should we handle quality metadata (from 'q' lines)? It is tied to the actual sequence data, so it would need to be maintained in parallel if a column were deleted.

* Is supporting the bx-python index format still desirable? Performance with Kyoto Cabinet indexes seems competitive, and the indexes are neither very large nor very expensive to build.

* Blankenberg et al. mention this filtering mode: "removing blocks which have aligned species occurring between non-syntenic chromosomes or strands". This is unfortunately a bit cryptic; what exactly is it meant to do?

* Are coverage statistics useful or appropriate to provide?
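
To make the 'q' line question concrete, here is a minimal sketch of what "maintained in parallel" would involve. The AlignedRow struct and delete_column method are hypothetical names for illustration only; they are not part of bio-maf or bio-alignment. It assumes the quality string has one character per alignment column, as in MAF 'q' lines.

```ruby
# Hypothetical sketch, not the bio-maf or bio-alignment API.
# A row pairs the aligned text (from an 's' line) with its per-column
# quality string (from a 'q' line); quality is nil when no 'q' line exists.
AlignedRow = Struct.new(:source, :text, :quality)

# Return new rows with the column at index `col` removed from both the
# sequence text and, when present, the quality string, so the two stay
# the same length. The input rows are left unmodified.
def delete_column(rows, col)
  rows.map do |row|
    text = row.text.dup
    text.slice!(col)
    quality = row.quality && row.quality.dup.tap { |q| q.slice!(col) }
    AlignedRow.new(row.source, text, quality)
  end
end

rows = [
  AlignedRow.new("hg18.chr1",    "ACGAT", nil),
  AlignedRow.new("panTro2.chr1", "AC-AT", "99-78")
]

# Drop the third alignment column (index 2) from every row.
trimmed = delete_column(rows, 2)
trimmed.each { |r| puts [r.source, r.text, r.quality.inspect].join("\t") }
```

If bio-alignment cannot carry a per-column annotation like this alongside each sequence, the quality data would either have to be dropped on conversion or kept in a separate structure that is updated on every column edit.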

Any insight that you might be able to offer would be helpful.

Thanks,

Clayton Wheeler
cswh at umich.edu






