Mmunin: Integer-linear-programming-based cutting-plane algorithm for shortest source-sink hyperpaths
Overview
Cell signaling pathways are cornerstones of molecular and cellular biology. They underly cellular communication, govern environmental response, and their perturbation has been implicated in the cause of many diseases. While signaling pathways have classically been modeled as ordinary graphs, using directed or undirected edges to link pairs of interacting molecules, it has been shown that ordinary graphs cannot adequately represent cellular activity that involves the assembly and disassembly of protein complexes, or multiway reactions among such complexes. We instead model these pathways using directed hypergraphs, a generalization of ordinary graphs where an edge, now called a hyperedge, is directed from one set of vertices, called its tail, to another set of vertices, called its head.Biologically, a typical cell-signaling pathway consists of membrane-bound receptors that bind to extracellular ligands, triggering intracellular cascades of reactions, culminating in the activation of transcriptional regulators and factors. Computationally, treating receptors as sources, and transcription factors as targets, finding the most efficient way to synthesize a particular transcription factor from a set of receptors maps to the shortest hyperpath problem we consider here: Given a cell-signaling network whose reactants and reactions are modeled by the vertices and weighted hyperedges of a directed hypergraph, together with a set of sources and a target, find a hyperpath consisting of hyperedges from the sources to the target of minimum total weight.
Methods
We formulate Shortest Hyperpaths as an integer linear program (ILP) based on a characterization of the problem in terms of superpaths. We solve this ILP with a cutting plane algorithm that starts with an initial set of constraints that are a subset of those in the full ILP, and iteratively adds violated constraints until a hyperpath is returned, which will always be provably optimal.Integer linear program
The variables of our ILP encode the hyperedges in a superpath \(F\). For every hyperedge \(e \in E\), there is a variable \(x_e\), which takes on integer values \(x_e \in \{0,1\}\). An assignment of values to these variables encodes a superpath \(F\) by \(x_e =1\) iff \(e \in F\). We represent the collection of all variables in the ILP by a vector \(x = (x_e) e \in E\), where \(x \in \{0,1\}^{|E|}\). The constraints consist of one constraint for each \(s,t\)-cut, encoded by ensuring the indicator variables for edges crossing the cut sum to a value greater than 1.
Cutting plane algorithm
For a hypergraph of \(n\) vertices and \(m\) hyperedges, the integer linear program given by (4) above has \(m\) variables and \(\Theta (2^n)\) constraints (corresponding to the number of \(s,t\)-cuts). Thus for a large hypergraph, we cannot even feasibly write down the corresponding ILP, due to its exponentially-many constraints. Nevertheless, we can actually compute optimal solutions to this full ILP in practice, even for very large inputs, using an approach known as a cutting plane algorithm. A cutting plane algorithm computes over a subset of the constraints of the full integer linear program, and solves a series of less-constrained problems, stopping once it detects it has a solution to the full ILP.
Initial constraints
The cutting plane algorithm is seeded with a set of initial cut constraints based on the estimated distance to each vertex from the source (calculated by a hyperpath heuristic Hhugin). In addition to these distance-based cuts, we also add cuts that capture connectivity, minimality, and ensure the target is produced. These initial constraints significantly reduce the running time of the cutting plane algorithm.
Results
Our cutting plane algorithm is remarkably fast in practice, with a median running time under 10 seconds, as measured over all instances from Reactome.
Video
The following video was presented at RECOMB 2023 and gives more detailed information on the algorithms implemented in MmuninPublications
The methods implemented in Mmunin are given in the following publication, which should be cited under noteworthy use of Mmunin- Spencer Krieger. Algorithmic inference of cellular reaction pathways and protein secondary structure. Department of Computer Science, The University of Arizona, July 2022. link
- Spencer Krieger and John Kececioglu. “Computing shortest hyperpaths for pathway inference in cellular reaction networks.” Proceedings of the 27th International Conference on Research in Computational Molecular Biology (RECOMB), Springer LBNI 13976, 155-173, 2023. link
Source code
Source code for Mmunin along with documentation will be available on GitHub shortly.The list of extra sources that were added during the pathway reconstruction experiment is available here.