[Biopython] Working with genomic intervals

Aaron Quinlan aaronquinlan at gmail.com
Mon Aug 15 23:54:31 UTC 2011


Dear Peter, Sean, and Laurent,
   Thanks so much for the useful suggestions.
Best,
Aaron


On Aug 15, 2011, at 2:17 AM, Laurent Gautier wrote:

> On 2011-08-14 18:00, biopython-request at lists.open-bio.org wrote:
>> On Sun, Aug 14, 2011 at 7:11 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>> >  On Friday, August 12, 2011, Aaron Quinlan <aaronquinlan at gmail.com> wrote:
>>>> >>  All,
>>>> >>
>>>> >>  I apologize in advance if this is a naive question.
>>>> >>  I am wondering if BioPython provides libraries for
>>>> >>  working with genomic intervals in BED, GFF, or
>>>> >>  any other similar format? I am looking for libraries
>>>> >>  that handle the parsing of files in these formats
>>>> >>  into Python objects, as well as libraries for
>>>> >>  manipulating (intersection, merging, counting,
>>>> >>  etc.) intervals. I know this exists in Galaxy's
>>>> >>  bx-python, but am wondering if there are similar
>>>> >>  libraries in BioPython?
>>>> >>
>>>> >>  Gratefully,
>>>> >>  Aaron
>>> >
>>> >  Hi Aaron,
>>> >
>>> >  Have a look at http://biopython.org/wiki/GFF_Parsing
>>> >  where Brad is working on this. He's also spoken
>>> >  highly of bx-python as I recall.
>> I would second the bx-python vote.  Not only are the "normal" interval
>> classes covered, but there are also some variants (clustering is one
>> that comes to mind).
>> 
>> Sean
> 
> One can also access the range utilities available in Bioconductor from
> Python, for example using the Bioconductor extension to rpy2, or rpy2
> directly (maybe using its dynamic class mapping features, as shown below):
> 
> from rpy2.robjects.packages import importr
> iranges = importr("IRanges")
> # Python class IRanges as an API to Bioconductor's IRanges::IRanges
> from rpy2.robjects.methods import RS4, RS4Auto_Type
> class IRanges(RS4):
>    __metaclass__ = RS4Auto_Type
>    __rpackagename__ = "IRanges"
>    __rname__ = "IRanges"
> 
> # now in action
> 
> >>> from rpy2.robjects.vectors import IntVector
> >>> ir = IRanges(iranges.IRanges(start = IntVector(range(10)), width = 11))
> >>> print(ir)
> IRanges of length 10
>     start end width
> [1]      0  10    11
> [2]      1  11    11
> [3]      2  12    11
> [4]      3  13    11
> [5]      4  14    11
> [6]      5  15    11
> [7]      6  16    11
> [8]      7  17    11
> [9]      8  18    11
> [10]     9  19    11
> >>> print(IRanges(ir.reduce__IRanges(ir)))
> IRanges of length 1
>    start end width
> [1]     0  19    20
> 
> 
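
For readers without R at hand, the merge performed by reduce in the example
above can be sketched in plain Python. This is only an illustration, not the
bx-python or IRanges API: the function name reduce_intervals is hypothetical,
intervals are assumed to be closed integer (start, end) pairs, and adjacent
intervals are merged as well as overlapping ones, which is an assumption about
the default behaviour of reduce.

```python
def reduce_intervals(intervals):
    """Merge overlapping or adjacent closed integer intervals.

    `intervals` is an iterable of (start, end) pairs with start <= end.
    Returns a sorted list of merged (start, end) pairs.
    Hypothetical helper for illustration only.
    """
    merged = []
    for start, end in sorted(intervals):
        # Extend the last merged interval if this one overlaps it
        # or starts immediately after it (closed integer coordinates).
        if merged and start <= merged[-1][1] + 1:
            last_start, last_end = merged[-1]
            merged[-1] = (last_start, max(last_end, end))
        else:
            merged.append((start, end))
    return merged


# Same ranges as in the rpy2 session: starts 0..9, each of width 11.
intervals = [(start, start + 11 - 1) for start in range(10)]
print(reduce_intervals(intervals))
```

Sorting first means each interval only has to be compared with the most
recently merged one, so the whole pass is O(n log n).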




