[GSoC] Questions on next steps for MAF parsing for bio-maf

Tue Jul 10 23:45:33 UTC 2012

Hi all,

While working out my plan for the rest of my bio-maf project, I have come up with a few questions I haven't been able to answer on my own:

https://github.com/csw/bioruby-maf/wiki/Questions

* Is it useful to build indexes on sequences other than the reference sequence?

* Should the score field of an alignment block be zeroed or removed whenever the block is modified?

* How, precisely, should selection based on features in GTF/GFF3 files work?

* When converting a MAF Block/Sequence to the bio-alignment representation, how should we handle quality metadata from 'q' lines? That data is tied to the actual sequence data, so it would have to be kept in step with the sequence text if a column were deleted.

* Is supporting the bx-python index format still desirable? Performance with Kyoto Cabinet indexes seems competitive, and the indexes are neither very large nor very expensive to build.

* Blankenberg et al. mention a filtering mode that is unfortunately a bit cryptic: "removing blocks which have aligned species occurring between non-syntenic chromosomes or strands". What exactly does it do?

* Are coverage statistics useful or appropriate to provide?
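To make the 'q' line question concrete, here is a minimal sketch of what keeping quality data "in parallel" would involve when columns are deleted. It assumes that a 'q' line carries one character per alignment column (with '-' at gap positions), so it has the same width as its 's' line text, and that only some rows carry quality data. The `Row` struct and the `delete_column` / `delete_gap_only_columns` helpers are hypothetical illustrations, not part of bio-maf or bio-alignment.

```ruby
# Hypothetical stand-in for one MAF 's' line plus its optional 'q' line.
# Assumption: quality, when present, has one character per text column.
Row = Struct.new(:src, :text, :quality)

# Return new rows with alignment column `col` removed from the sequence
# text AND from the quality string, so the two stay aligned.
def delete_column(rows, col)
  rows.map do |row|
    text = row.text.dup
    text.slice!(col)
    quality = row.quality&.dup   # rows without a 'q' line keep nil
    quality&.slice!(col)
    Row.new(row.src, text, quality)
  end
end

# Example operation that needs this: drop columns that are gaps in every row.
def delete_gap_only_columns(rows)
  width = rows.first.text.length
  gap_cols = (0...width).select { |c| rows.all? { |r| r.text[c] == '-' } }
  # Delete right to left so the remaining column indices stay valid.
  gap_cols.reverse.reduce(rows) { |acc, c| delete_column(acc, c) }
end
```

For example, with rows `"AC-GT"` (no quality) and `"AC-GA"` (quality `"99-95"`), removing the all-gap column yields `"ACGT"`, `"ACGA"`, and quality `"9995"`. A representation that stores sequence text alone would silently drop or misalign that quality string.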

Any insight you can offer would be helpful.

Thanks,

Clayton Wheeler
cswh at umich.edu