[GSoC] [Biopython-dev] GSoC python variant update
Brad Chapman
chapmanb at 50mail.com
Wed Aug 8 13:55:36 UTC 2012
Lenna,
This all sounds great and will be a nice practical addition to
Biopython. Thanks for taking it on. Some specific thoughts on your questions:
> * I'm representing intron locations relative to CDS coords using the
> HGVS standards: http://www.hgvs.org/mutnomen/refseq_figure.html
> I'd like to know if there are other common ways of representing such
> positions.
I don't know of one myself, so it's great to be following a standard
rather than reinventing something. Nice work.
> * In order to customize the display of positions (e.g. 0-based or
> 1-based), I'm using a class as a configuration container. I've read on
> StackOverflow that attempts to use globals or a singleton class are
> discouraged in Python, but I have not found practical suggestions for
> how to implement module-wide configurations. Suggestions are welcome.
With configuration items like this, you have two choices:
- A global variable.
- Pass the configuration to every function that needs it.
There are tradeoffs with both approaches, but for this case I agree with
your decision to use globals. Most people will want 0-based, Biopython-style
coordinates, but a global setting gives those who don't a knob to switch over.
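As a rough sketch of how the two choices can coexist (every name here is
made up for illustration and is not Biopython API): a module-level default
object acts as the global knob, and functions accept an optional explicit
configuration that overrides it.

```python
class MapperConfig(object):
    """Hypothetical container for display settings (illustration only)."""

    def __init__(self, one_based=False):
        # False: show positions 0-based (Biopython style).
        # True: show positions 1-based.
        self.one_based = one_based


# Choice 1: a module-wide default that users can modify in place.
DEFAULT_CONFIG = MapperConfig()


def format_position(pos, config=None):
    """Render a 0-based internal position as a display string.

    Choice 2: callers may pass their own config; if they do not,
    the module-wide default is used.
    """
    cfg = DEFAULT_CONFIG if config is None else config
    return str(pos + 1) if cfg.one_based else str(pos)


print(format_position(9))                                # 0-based default
print(format_position(9, MapperConfig(one_based=True)))  # explicit override
```

Keeping positions 0-based internally and converting only at display time
means the global setting never changes the results of the mapping itself.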
> * Any advice about circular genomes or strandedness is also welcome.
Circular handling is an unresolved issue in Biopython:
https://redmine.open-bio.org/issues/2578
It's a bit tricky, especially with features that span the origin.
I'd prioritize handling strandedness since you're going to have plenty
of reverse strand coding sequences. You're mapping not only within the
coding region but also back to the original sequence on the reverse
strand. So in your g2c mapping, the original gene goes from
e1 -> s1 -> e0 -> s0 as you read 5' to 3' across the sequence. The best
place to get started is to pick a reverse strand gene and then work
through the mappings, thinking through the orientations. I find drawing
it out to be the easiest way.
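To make the orientation concrete, here is a minimal sketch of a
genomic-to-CDS lookup. It is not your mapper's actual code: the function
name, the (start, end) exon tuples, and the 0-based half-open convention
are assumptions for the example. With half-open intervals, the first
coding base of a reverse strand gene sits at e1 - 1.

```python
def g2c(gpos, exons, strand):
    """Map a 0-based genomic position to a 0-based CDS position.

    exons:  list of (start, end) genomic intervals, 0-based, half-open
    strand: +1 for forward, -1 for reverse
    Returns None if gpos falls outside every exon (e.g. in an intron).
    """
    ordered = sorted(exons)
    if strand == -1:
        # Reading 5' to 3' on the reverse strand visits the exons
        # in descending genomic order: e1 -> s1 -> e0 -> s0.
        ordered = ordered[::-1]

    offset = 0  # number of CDS bases in the exons already passed
    for start, end in ordered:
        if start <= gpos < end:
            if strand == 1:
                return offset + (gpos - start)
            # Reverse strand: count down from the last base of the exon.
            return offset + (end - 1 - gpos)
        offset += end - start
    return None


exons = [(10, 20), (30, 40)]   # (s0, e0), (s1, e1)
print(g2c(39, exons, -1))      # first coding base on the reverse strand
print(g2c(10, exons, -1))      # last coding base on the reverse strand
print(g2c(25, exons, -1))      # intronic position
```

Working a handful of positions like these by hand against a drawing of
the gene is a quick way to catch off-by-one errors at the exon edges.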
> * This mapper will work for SeqRecords, SeqFeatures, FeatureLocations,
> etc. Are there other Biopython objects that store sequence coordinates
> and thus should be mappable?
That sounds like a great start. Thanks again for this,
Brad