Datasets
We found optimal hyperpaths with Mmunin on instances from the NCI-PID and Reactome datasets.
The NCI-PID dataset aggregates all pathways from NCI-PID, and the Reactome dataset aggregates all pathways from Reactome.
To build a hypergraph for each dataset, we map each protein or small molecule to a vertex and each reaction to a hyperedge, with reactants in the tail and products in the head.
This results in 9,009 vertices and 8,456 hyperedges for the NCI-PID hypergraph, and 20,458 vertices and 11,802 hyperedges for the Reactome hypergraph.
For all instances, we create a supersource s and add a single hyperedge e with only s in its tail and a head containing all vertices with no in-edges (which we call our sources).
Then, for each individual instance, we connect one target v (a vertex with no out-edges) to a sink t by an ordinary graph edge (v, t). This results in 2,636 NCI-PID instances and 5,066 Reactome instances.
Running time
Mmunin is remarkably fast in practice, with a median running time under 10 seconds.
The maximum running time over all ~7,000 instances is under 30 minutes, demonstrating that it is now feasible to find optimal general shortest hyperpaths very fast, even for large instances with over 10,000 hyperedges.
We note that for instances of pathway reconstruction, the running time increased to around 24 hours for instances with twice the number of double-reachable edges.
Source code
Source code for Mmunin along with documentation is available on GitHub.