[Biopython-dev] Coordinate Mapper pull request

Reece Hart reece at harts.net
Fri Apr 18 05:57:23 UTC 2014


Lenna- I hope that grad school is treating you well. As the diagnostic
sequencing space has heated up, accurate transcript-genome mapping has
become ever more relevant to genome interpretations.

The mapping code from that original post, now 4 years old I think, was
quite primitive. My recollection is that the original post didn't even
handle minus strand transcripts. Perhaps you've improved it since then.
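
For anyone unfamiliar with why strand matters here, a minimal sketch of
transcript-to-genome mapping over exons follows. This is an illustration
only: it is neither the code from the original post nor anything from the
hgvs package. The function name, the 0-based offsets, and the half-open
exon intervals are all assumptions made for the example.

```python
from typing import List, Tuple


def transcript_to_genome(exons: List[Tuple[int, int]], strand: int, tpos: int) -> int:
    """Map a 0-based transcript offset to a 0-based genomic coordinate.

    Hypothetical helper for illustration only.
    exons  -- half-open (start, end) genomic intervals, in any order
    strand -- +1 for plus strand, -1 for minus strand
    tpos   -- 0-based offset from the transcript's 5' end
    """
    if strand not in (1, -1):
        raise ValueError("strand must be +1 or -1")
    if tpos < 0:
        raise ValueError("transcript position must be non-negative")

    # Walk exons in transcript order: ascending genomic order on the plus
    # strand, descending on the minus strand.
    ordered = sorted(exons, reverse=(strand == -1))

    remaining = tpos
    for start, end in ordered:
        length = end - start
        if remaining < length:
            # On the minus strand the transcript runs from the exon's
            # highest genomic base down to its lowest.
            return start + remaining if strand == 1 else end - 1 - remaining
        remaining -= length

    raise ValueError("position lies outside the transcript")


exons = [(100, 110), (200, 220)]
plus_first = transcript_to_genome(exons, 1, 0)
minus_first = transcript_to_genome(exons, -1, 0)
```

On the minus strand both the exon order and the direction within each exon
are reversed, which is the case a plus-strand-only mapper gets wrong.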

A team of us at Invitae have recently released a much more robust HGVS
parser, formatter, and mapper (Apache 2.0 licensed). It's available at
https://bitbucket.org/invitae/hgvs. This package currently relies on
transcripts from the Universal Transcript Archive, which has current and
recent historical transcripts from multiple sources and alignments to
reference genomes and patches using splign and blat. UTA is available via
postgresql/libpq at uta.invitae.com:5432 (code is at
https://bitbucket.org/invitae/uta, but is only required for loading the
database).

This pair is reasonably accurate, but does have shortcomings. Users are
advised to read through issues to understand limitations and the
development roadmap. We are actively working on improving these tools.
Patches are certainly welcome. (For those going to HVP in May, I'll be
speaking about it there and would love to connect with users.)

-Reece


