[Biopython] how to find closest genes for a given location
Brad Chapman
chapmanb at 50mail.com
Thu Feb 25 13:34:31 UTC 2010
Hi Sameet;
> I have multiple locations from human genomes. I want to determine
> what are the closest genes on either side of the location, and if it
> is in the location how far from the TSS the given location is. I was
> thinking of using the CCDS database, because it contains information
> for the genes that have been verified. Is there any other
> better/smarter way of doing it?
I don't know of a ready-to-go Python library that does this, but
you could put something together using the interval intersection
library in bx-python:
http://bitbucket.org/james_taylor/bx-python/src/tip/lib/bx/intervals/intersection.pyx
You would build up an interval tree of gene features from a source
such as CCDS, and then loop through your BED file and query each
location against the tree. Overlapping genes come back from the
intersection itself. For finding the closest non-overlapping genes,
look at the tree's upstream_of_interval and downstream_of_interval
methods.
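
To make the logic concrete, here is a minimal pure-Python sketch of the
same idea. It does not use bx-python: a plain linear scan per chromosome
stands in for the interval tree, so it is only meant to show what gets
computed, not how to do it quickly. The Gene record, the function names,
the 0-based half-open coordinates, and the sample genes are all
assumptions made up for illustration; they are not CCDS data or part of
any library.

```python
from collections import namedtuple

# Hypothetical gene record: 0-based, half-open [start, end) coordinates.
Gene = namedtuple("Gene", ["name", "chrom", "start", "end", "strand"])


def build_index(genes):
    """Group genes by chromosome so a lookup only scans one chromosome."""
    index = {}
    for gene in genes:
        index.setdefault(gene.chrom, []).append(gene)
    return index


def tss_offset(gene, pos):
    """Signed distance from the TSS to pos.

    Positive values lie downstream of the TSS in the direction of
    transcription. For minus-strand genes the TSS is the last base.
    """
    if gene.strand == "+":
        return pos - gene.start
    return (gene.end - 1) - pos


def annotate(index, chrom, pos):
    """Report overlapping genes (with TSS offsets) and the nearest
    non-overlapping gene on each side of pos."""
    genes = index.get(chrom, [])
    inside = [(g.name, tss_offset(g, pos))
              for g in genes if g.start <= pos < g.end]
    # Genes that end at or before pos lie entirely to the left of it.
    left = [g for g in genes if g.end <= pos]
    # Genes that start after pos lie entirely to the right of it.
    right = [g for g in genes if g.start > pos]
    nearest_left = max(left, key=lambda g: g.end) if left else None
    nearest_right = min(right, key=lambda g: g.start) if right else None
    return {
        "inside": inside,
        "left": nearest_left.name if nearest_left else None,
        "right": nearest_right.name if nearest_right else None,
    }


# Made-up coordinates for illustration only, not real gene records.
genes = [
    Gene("A", "chr1", 100, 200, "+"),
    Gene("B", "chr1", 500, 800, "-"),
    Gene("C", "chr1", 1000, 1200, "+"),
]
index = build_index(genes)
result = annotate(index, "chr1", 600)
```

With many locations and a full gene set the linear scan gets slow, which
is where the interval tree earns its keep. Note that "left" and "right"
here are in genome coordinates; whether a neighbour counts as upstream
or downstream depends on the strand you care about.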
For a non-Python approach, the ChIPpeakAnno R package in Bioconductor
does what you are looking for:
http://bioconductor.org/packages/2.5/bioc/html/ChIPpeakAnno.html
rpy2 is an excellent gateway to R from Python:
http://rpy.sourceforge.net/rpy2.html
Hope this helps,
Brad