[Biopython-dev] OBO parser & DAG

Iddo Friedberg idoerg at gmail.com
Wed Jan 8 16:55:16 UTC 2014


Hi Bartek.

See inline responses below. Does anyone else have ideas on how to do this?

On Tue, Jan 7, 2014 at 3:51 PM, Bartek Wilczynski
<bartek at rezolwenta.eu.org> wrote:

> Hi,
>
> I've talked with Kamil today and we've looked through some of your code.
>
> We have actually evaluated networkx before implementing our library and
> thought it was unnecessary for our purposes to make it a requirement for
> biopython as the enrichment analysis is using a relatively small subset of
> graph operations.
>
>
I understand your rationale, but I disagree with it, mainly for design
reasons.

1. Enrichment analysis is only one of many different applications that can
be performed with GO. Therefore, saying that features are unnecessary
because a particular use case does not require them should not be a design
consideration for a module that is intended for general use. Rather, a
generic package handling ontologies should be just that: generic, and
disengaged from any kind of application. Therefore, if your package is
intended for biopython the use-case (enrichment analysis) should be
decoupled from the parser + data structure.

2. The graph features that you wrote in Digraph exist in networkx anyway,
or am I missing something? So why not take advantage of nx instead of
redoing it even if it does have many redundant (for you) graph manipulation
& diagnostic features? Someone else may want to use these features,
including the graphics nx provides, etc.
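
To illustrate point 2, here is a minimal sketch of the kind of queries a
networkx DiGraph answers out of the box. The term IDs are placeholders (not
real GO terms), and the child -> parent edge direction and the "relation"
attribute name are my assumptions, not something either code base fixes.

```python
import networkx as nx

# Toy ontology fragment. Edges point from child to parent and carry
# the relationship type as an edge attribute (attribute name is an assumption).
g = nx.DiGraph()
g.add_edge("T:0003", "T:0002", relation="is_a")
g.add_edge("T:0002", "T:0001", relation="is_a")
g.add_edge("T:0004", "T:0002", relation="part_of")

# With child -> parent edges, nx.descendants() follows edges forward,
# so it returns every term on the way up to the root.
terms_above_3 = nx.descendants(g, "T:0003")

# nx.ancestors() walks edges backwards: every term below T:0001.
terms_below_1 = nx.ancestors(g, "T:0001")

# Sanity checks that a hand-written digraph would have to reimplement.
is_dag = nx.is_directed_acyclic_graph(g)
order = list(nx.topological_sort(g))
```

None of this needs custom graph code; the same object can also be handed to
the nx drawing and export functions.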



> However it would be very easy to make functions for converting our
> ontologies to networkx digraphs, either with or without gene annotations as
> additional attributes.
>
>
Well, the idea is actually to maintain ontologies as nx digraphs from the
start. But yes, I agree that the conversion would be easy.
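
For what it's worth, such a conversion could look roughly like the sketch
below. The input layout (a dict mapping each term to a list of
(parent, relation) pairs) and the function name to_networkx are hypothetical;
I have not checked how Bio/Ontology stores its graph internally.

```python
import networkx as nx

def to_networkx(parents, annotations=None):
    """Build an nx.DiGraph from {term: [(parent, relation), ...]}.

    annotations is an optional {term: iterable of gene IDs} mapping that is
    stored as a "genes" node attribute (attribute name is an assumption).
    """
    graph = nx.DiGraph()
    for term, links in parents.items():
        graph.add_node(term)
        for parent, relation in links:
            # Edge direction child -> parent, relation kept as edge data.
            graph.add_edge(term, parent, relation=relation)
    for term, genes in (annotations or {}).items():
        # add_node() on an existing node just updates its attributes.
        graph.add_node(term, genes=set(genes))
    return graph

# Placeholder terms and gene names, for illustration only.
toy = to_networkx(
    {"B": [("A", "is_a")], "A": []},
    annotations={"B": ["gene1", "gene2"]},
)
```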


> As for support for different types of transitivity in relations of
> different type (as in your inference of ancestry for is_a and part_of
> relations) we are currently not supporting it, but after thinking about it,
> we will make a change to support this feature. We will probably let the
> user (optionally) define the transitivity between relationship types
> (e.g. is_a + part_of becomes part_of).
>
> In general, it would be very helpful if you could give us some rough idea
> about your expected use cases. For example: are you expecting to modify the
> graphs in the networkx objects? What will you use the inferred ancestor
> lists for? So that the changes we make will be as useful to the community
> as possible.
>
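To make the user-defined transitivity concrete, here is a rough sketch in
plain Python. Only the rule "is_a + part_of becomes part_of" comes from the
message above; the rest of the table, the COMPOSE name and the input layout
are assumptions for illustration.

```python
# Hypothetical composition table:
# (relation so far, next relation) -> inferred relation.
COMPOSE = {
    ("is_a", "is_a"): "is_a",
    ("is_a", "part_of"): "part_of",
    ("part_of", "is_a"): "part_of",
    ("part_of", "part_of"): "part_of",
}

def inferred_ancestors(term, parents, compose=COMPOSE):
    """Return {ancestor: set of inferred relations} for term.

    parents maps each term to a list of (parent, relation) pairs.
    Relation pairs missing from the table are treated as non-transitive.
    """
    result = {}
    stack = list(parents.get(term, []))
    while stack:
        node, rel = stack.pop()
        seen = result.setdefault(node, set())
        if rel in seen:
            continue  # this (ancestor, relation) pair was already expanded
        seen.add(rel)
        for parent, step in parents.get(node, []):
            combined = compose.get((rel, step))
            if combined is not None:
                stack.append((parent, combined))
    return result

# Placeholder terms: C is_a B, B part_of A.
toy_parents = {"C": [("B", "is_a")], "B": [("A", "part_of")], "A": []}
toy_result = inferred_ancestors("C", toy_parents)
```

Passing a different table via compose is how a user would opt in or out of
particular inferences.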


The idea is that expected use cases should not impact the design of a basic
parser + data structure. In my lab, we are looking at inferred ancestors
lists to calculate semantic similarity, but it really doesn't matter what
we (or anyone) will end up using the GO module for. If you provide
enrichment analysis *on top* of the parser + data structure (as a separate
module), and we provide semantic similarity (again as a separate module *on
top* of the parser + data structure) those are nice bonuses. But the parser
+ data structure should be as general as possible. That is: include all the
information in the OBO file, placed in a digraph structure that can be
comprehensively interrogated, visualized and manipulated (which is what nx
offers).
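
As a rough illustration of "keep everything, decide nothing", here is a
minimal stanza reader in plain Python. It is a sketch, not either group's
parser: the function names are made up, header lines before the first stanza
are skipped for brevity, and the "!" comment handling is naive (escaped "\!"
is not supported).

```python
from collections import defaultdict

def parse_obo(lines):
    """Yield (stanza_type, {tag: [values]}) per stanza, keeping every tag."""
    stanza_type, tags = None, None
    for raw in lines:
        line = raw.split("!", 1)[0].strip()  # drop trailing "! comment"
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            if stanza_type is not None:
                yield stanza_type, dict(tags)
            stanza_type, tags = line[1:-1], defaultdict(list)
        elif stanza_type is not None and ":" in line:
            tag, value = line.split(":", 1)
            tags[tag.strip()].append(value.strip())
    if stanza_type is not None:
        yield stanza_type, dict(tags)

def term_edges(stanzas):
    """Yield (child, parent, relation) for is_a and relationship tags."""
    for stanza_type, tags in stanzas:
        if stanza_type != "Term":
            continue
        child = tags["id"][0]
        for parent in tags.get("is_a", []):
            yield child, parent, "is_a"
        for value in tags.get("relationship", []):
            relation, parent = value.split()[:2]
            yield child, parent, relation

# Made-up sample in OBO flat-file style; IDs are placeholders.
SAMPLE = """\
format-version: 1.2

[Term]
id: X:0001
name: root

[Term]
id: X:0002
name: child
is_a: X:0001 ! root
relationship: part_of X:0001 ! root
""".splitlines()

stanzas = list(parse_obo(SAMPLE))
edge_list = list(term_edges(stanzas))
```

The stanza dicts can go onto graph nodes as attributes and the triples onto
edges. Note that two terms may be linked by more than one relation, as in the
sample, so a multigraph (e.g. nx.MultiDiGraph) may be the safer container.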



> Naturally, if anyone else wants to contribute their ideas or use-cases,
> you are most welcome...
>
> best
> Bartek
>
>
>
> On Mon, Jan 6, 2014 at 11:42 PM, Iddo Friedberg <idoerg at gmail.com> wrote:
>
>> I will meet with my student tomorrow (cc'd) and we can continue this
>> further.
>>
>> Osama: Bartek Wilczynski's group has been working on an OBO parser / GO
>> module too. Their parser seems complete & useful. Their digraph
>> implementation is not in networkx, so I'm not sure about adopting that as
>> is. In any case, let's meet tomorrow and talk, and maybe the four of us can
>> work out a collaborative plan if we feel it's useful.
>>
>> Cheers,
>>
>> Iddo
>>
>>
>> On Mon, Jan 6, 2014 at 5:26 PM, Bartek Wilczynski <
>> bartek at rezolwenta.eu.org> wrote:
>>
>>> Hi,
>>>
>>> I will meet with Kamil sometime this week and we will discuss options
>>> for switching to networkx or at least adding some compatibility layer for
>>> it. I think the information about the edge type is preserved in the DAG
>>> after parsing, so I'm not sure what you mean by "supporting" other types of
>>> relationships. Our interest was mostly in ontology term enrichment
>>> analysis, which Kamil implemented, and his version is also usable for
>>> parsing, but I think we are quite open to changes still at this point and
>>> I'm sure we will be able to come up with a good version merging the
>>> important features from both versions.
>>>
>>> best
>>> Bartek
>>>
>>>
>>> On Mon, Jan 6, 2014 at 11:17 PM, Iddo Friedberg <idoerg at gmail.com> wrote:
>>>
>>>> Hi Bartek,
>>>>
>>>> Thanks. I looked at it a bit.
>>>>
>>>> Any reason why you did your own digraphs instead of using networkx? See
>>>> also: http://biopython.org/wiki/Gene_Ontology#GO_Directed_Acyclic_Graph
>>>>
>>>> That said, it seems very mature. But networkx provides many functions
>>>> for plotting, interrogating and manipulating graphs.
>>>>
>>>> Your OBO parser seems quite mature. Are you planning to add other edge
>>>> types? (E.g. "part_of").
>>>>
>>>> What we are trying to do here, is set up (besides the parser & DAG
>>>> implementation) also a measure of DAG similarities. This is due to my
>>>> interest in assessing function similarity. That will be a separate module
>>>> (perhaps not even useful to Biopython).
>>>>
>>>> So now I am not sure what to do :/ The IO modules seem complete &
>>>> usable, but I would have rather seen a DAG implementation using networkx.
>>>>
>>>> Ideas?
>>>>
>>>>
>>>>
>>>> On Mon, Jan 6, 2014 at 4:59 PM, Bartek Wilczynski <
>>>> bartek at rezolwenta.eu.org> wrote:
>>>>
>>>>> Hi Iddo,
>>>>>
>>>>> My student has also recently implemented a module for ontologies.
>>>>> Maybe we can somehow merge these efforts. His code can be found here:
>>>>> https://github.com/tosterovic/biopython
>>>>>
>>>>> the relevant part is Bio/Ontology
>>>>>
>>>>> best
>>>>> Bartek
>>>>>
>>>>> On Mon, Jan 6, 2014 at 8:56 PM, Iddo Friedberg <idoerg at gmail.com> wrote:
>>>>>
>>>>>>  Hi all,
>>>>>>
>>>>>> Is there any effort going on for developing the OBO parser &
>>>>>> Bio-ontology DAG? If not, my lab wants to push this. We already have
>>>>>> a basic representation using digraph from networkx, and a basic OBO
>>>>>> parser. But I'm checking to make sure there is no duplicate effort
>>>>>> here.
>>>>>>
>>>>>> All very initial development.
>>>>>>
>>>>>> Parser:
>>>>>> https://github.com/idoerg/go-parser
>>>>>>
>>>>>> The relevant module is
>>>>>> https://github.com/idoerg/go-parser/blob/master/go_obo_parser.py
>>>>>>
>>>>>>
>>>>>> DAG:
>>>>>> https://github.com/osamajomaa/DAGON
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Iddo
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Iddo Friedberg
>>>>>> http://iddo-friedberg.net/contact.html
>>>>>>
>>>>>> ++++++++++[>+++>++++++>++++++++>++++++++++>+++++++++++<<<<<-]>>>>++++.>
>>>>>> ++++++..----.<<<<++++++++++++++++++++++++++++.-----------..>>>+.-----.
>>>>>> .>-.<<<<--.>>>++.>+++.<+++.----.-.<++++++++++++++++++.>+.>.<++.<<<+.>>
>>>>>> >>----.<--.>++++++.<<<<------------------------------------.
>>>>>> _______________________________________________
>>>>>> Biopython-dev mailing list
>>>>>> Biopython-dev at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Bartek Wilczynski
>>>>> ==================
>>>>> Institute of Informatics
>>>>> University of Warsaw
>>>>> http://www.mimuw.edu.pl/~bartek
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>
>



-- 
Iddo Friedberg
http://iddo-friedberg.net/contact.html
++++++++++[>+++>++++++>++++++++>++++++++++>+++++++++++<<<<<-]>>>>++++.>
++++++..----.<<<<++++++++++++++++++++++++++++.-----------..>>>+.-----.
.>-.<<<<--.>>>++.>+++.<+++.----.-.<++++++++++++++++++.>+.>.<++.<<<+.>>
>>----.<--.>++++++.<<<<------------------------------------.


