[Biopython-dev] Pathway Module

Wed Aug 1 02:02:41 EDT 2001

Hi,

Thanks to Cayte for taking the initiative on getting a Pathway module
discussion going. Below are my ramblings on what I think such a module
should be like. This is all off the top of my head, so any feedback
would be greatly appreciated.

First of all, I think it is a useful exercise to consider what kind of
tasks would benefit from the availability of reaction/pathway classes.
I can think of the following:

 * Elementary mode analysis and MCA.
   - Involves converting a set of reactions to a stoichiometry matrix
 * Mapping genes clustered by location or expression to pathways
 * Route queries (how can we transform A to B given a set of enzymes?)
 * Neighborhood queries (which enzymes are k-separated from enzyme Y?)
   - All three of these focus on the graph structure of the pathways.
 * Dynamic simulations

The last task is beyond the scope of anything we could do on this project,
not only because of the technical challenges, but also because of the lack of
information about kinetics. There is a fair amount of kinetic information in
databases like EMP and Brenda, but these numbers are extremely context
specific and irregular. I therefore think that information like reaction
temperature, free energies, experimentally determined kinetics, and even
which organism a reaction has been observed in are best left in the Record
objects of the individual database modules.

I think the core of a biopython pathway module should be a relatively
lightweight abstraction for pathway connectivity, and not much more.
Below is a quick description of what I imagine it could look like.
Note that this is a description of an *abstraction*, not a python
*implementation*.

CLASSES:

Species:
 - A very light class for representing any biochemical species that
   is present in the system we're interested in. It could be a small
   molecule, an enzyme, whatever.

 a unique name or id              - identifies what this species is (EC
                                    number, CAS number, something like that)
 a user-defined reference         - ref to object containing further
                                    information, probably an appropriate
                                    Record

Reaction:
 - Represents any biochemical transformation that can take place in the
   system, such as an enzymatic reaction, or a spontaneous transformation.

 a set S of Species objects       - the substrates
 a set P of Species objects       - the products
 a set E of Species objects       - the enzymes
 a set F of Species objects       - the factors (cofactors, effectors,
                                    inhibitors?)
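
To make the two descriptions above a bit more concrete, here is a minimal
Python sketch. It is only one possible reading of the abstraction, not a
proposed implementation: the use of frozen dataclasses, the field names,
and the "_rev" suffix given to a reversed reaction are all my assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, FrozenSet


@dataclass(frozen=True)
class Species:
    """Any biochemical species, identified only by its name or id."""
    name: str
    # User-defined reference (e.g. a database Record); ignored when
    # comparing or hashing, so two Species with the same name are equal.
    ref: Any = field(default=None, compare=False)


@dataclass(frozen=True)
class Reaction:
    """A transformation described by four sets of Species."""
    name: str
    substrates: FrozenSet[Species] = frozenset()
    products: FrozenSet[Species] = frozenset()
    enzymes: FrozenSet[Species] = frozenset()
    factors: FrozenSet[Species] = frozenset()

    def reverse(self) -> "Reaction":
        # Swap substrates and products; enzymes and factors are unchanged.
        # The "_rev" naming is an assumption made for this sketch.
        return Reaction(self.name + "_rev", self.products, self.substrates,
                        self.enzymes, self.factors)


A, B, C, D, E = (Species(n) for n in "ABCDE")
R1 = Reaction("smelly", frozenset({A, B}), frozenset({C, D}), frozenset({E}))
R3 = R1.reverse()
```

Keeping both classes immutable and hashable means they can be stored
directly in sets, which is what the System description relies on.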

System:
 - Represents the biochemical system we're interested in. It is essentially
   a directed multi-graph where the vertices are Species and the edges are
   labeled with references to the reaction that links the parent vertex
   to the child vertex.

 a set V of Species objects       - these are all biochemical species in
                                    this system, including metabolites,
                                    enzymes and whatnot

 a set E of tuples (from, to, reaction)
   where from, to refer to elements in V and reaction is a
  (not necessarily unique) Reaction object where from is
   a substrate and to is a product.
                                  - these are the 'edges' that collectively
                                    define a multi-graph representing the
                                    network connectivity

So for example, in a system with Species A,B,C,D,E and one Reaction
R1: A + B -E-> C + D, the System object would be

S1:
 V = {A,B,C,D,E}
 E = {(A,C,R1), (A,D,R1), (B,C,R1), (B,D,R1)}
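
As a rough sketch of how V and the edge set could be derived from a
collection of reactions, following the (from, to, reaction) definition of
an edge given above: species are plain strings here, a reaction is a
(name, substrates, products, enzymes) tuple, and build_system is a
made-up helper name, all chosen purely for illustration.

```python
from itertools import product

# Hypothetical stand-in for a Reaction object:
# (name, substrates, products, enzymes)
R1 = ("R1", {"A", "B"}, {"C", "D"}, {"E"})


def build_system(reactions):
    """Return (V, E) for a collection of reactions."""
    vertices, edges = set(), set()
    for name, substrates, products, enzymes in reactions:
        # Every species mentioned by the reaction becomes a vertex.
        vertices |= substrates | products | enzymes
        # One edge per (substrate, product) pair, labelled with the reaction.
        for src, dst in product(substrates, products):
            edges.add((src, dst, name))
    return vertices, edges


V, E = build_system([R1])
print(sorted(V))  # ['A', 'B', 'C', 'D', 'E']
print(sorted(E))
# [('A', 'C', 'R1'), ('A', 'D', 'R1'), ('B', 'C', 'R1'), ('B', 'D', 'R1')]
```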

USAGE:

 This is a collection of imagined user interactions with the pathway module:

 First we create a bunch of Species objects which refer to descriptions of
 them, such as KEGG or WIT records. This step will usually happen inside a
 database parser:

 A = Species('A',ref1)
 B = Species('B',ref2)
 C = Species('C',ref3)
 ...

 Then we create any Reaction objects. This will also usually happen inside a
 parser module:

 R1 = Reaction(name='smelly',substrates=[A,B],enzymes=[E],products=[C,D])
 R2 = Reaction(name='decay',substrates=[C])
 R3 = R1.reverse()

 It should be easy to create a System object from a collection of
 Reactions. Connectivity should be inferred automatically when several
 reactions are combined:

 >>> S = System()
 >>> S.add_reaction(R1)
 >>> S.add_reaction(R2)
 >>> S.species()
 [Species('A'), Species('B'), ..., Species('E')]

 We might be interested in only some of the species:

 >>> S.enzymes()
 [Species('E')]
 >>> S.metabolites()
 [Species('A'), Species('B'), Species('C'), Species('D')]

 Other useful information:

 >>> S.stochiometry()
 [[-1 -1 1 1], [0 0 -1 0]]
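
 For what it's worth, here is a small sketch of how such a matrix could be
 computed, with one row per reaction and one column per metabolite. It
 assumes every stoichiometric coefficient is 1 and represents a reaction as
 a bare (substrates, products) pair; both are simplifications made only for
 this example.

```python
def stoichiometry(reactions, metabolites):
    """Build a matrix with one row per reaction, one column per metabolite."""
    matrix = []
    for substrates, products in reactions:
        row = []
        for m in metabolites:
            # +1 if produced, -1 if consumed, 0 if not involved
            # (assumes all coefficients are 1).
            row.append((m in products) - (m in substrates))
        matrix.append(row)
    return matrix


# R1: A + B -> C + D, and R2: C decays to nothing tracked by the system.
reactions = [({"A", "B"}, {"C", "D"}), ({"C"}, set())]
print(stoichiometry(reactions, ["A", "B", "C", "D"]))
# [[-1, -1, 1, 1], [0, 0, -1, 0]]
```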

 Putting the information to use:

 flux analysis:

 >>> from Bio.Pathway import Metatool
 >>> Metatool.find_elementary_modes(S, externals=[A,D])
 ...Metatool output...

 neighborhood queries:

 >>> from Bio.Pathway import Graph
 >>> Graph.find_neighbours(S, E1, separation=3)
 [[E2, E3], [E4], []]
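
 One way such a query could work is a breadth-first search that returns one
 list per separation level. The sketch below assumes the enzyme-to-enzyme
 adjacency has already been extracted from the System into a plain dict;
 that dict and its contents are invented for the example.

```python
def find_neighbours(adjacency, start, separation):
    """Return a list of layers; layer k-1 holds nodes exactly k steps away."""
    seen = {start}
    frontier = [start]
    layers = []
    for _ in range(separation):
        next_frontier = []
        for node in frontier:
            for neighbour in adjacency.get(node, ()):
                # Only record a node the first time it is reached, so each
                # node appears in the layer of its shortest distance.
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        layers.append(next_frontier)
        frontier = next_frontier
    return layers


# Made-up adjacency between enzymes, for illustration only.
adjacency = {"E1": ["E2", "E3"], "E2": ["E4"], "E3": ["E4"], "E4": []}
print(find_neighbours(adjacency, "E1", separation=3))
# [['E2', 'E3'], ['E4'], []]
```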

 ...and so on. You get the picture.

Appendix :) - reply to Cayte:

> Step is separate from reaction, because a reaction could occur in
> more than one pathway.

I'm not sure I see the rationale for this. It is true that a reaction
can occur in several pathways, but unless there is information about a
reaction that only applies to a specific pathway there is no need to
keep a separate Step object - you can just let two different pathway
objects reference the same reaction object.

> There may be other information associate with reaction, like
> temperature, but I haven't come across it yet in the WIT or
> EMP databases.

As I said above, I don't think we should represent kinetics and
other "volatile" information in the core pathway objects.

 - Tarjei