As per our email, our Message Passing algorithm currently works on a grid, which is where message passing is most effective. A brief description of our Message Passing algorithm:
The reduction operation performed is the $\min$ behind $f_{min}$: the message passing technique exploits it by keeping track of the current minimum across dimensions and computing $f_{min}$ in the fewest operations.
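To make the reduction concrete, here is a minimal min-sum message passing sketch on a path graph, assuming $f$ decomposes into pairwise factors $f_i(x_i, x_{i+1})$ over grid-valued variables. The function name `min_sum_path` and the random factor tables are made up for illustration; this is a sketch of the technique, not our implementation.

```python
import numpy as np

def min_sum_path(factors):
    """Compute f_min = min_x sum_i f_i(x_i, x_{i+1}) by message passing.

    factors: list of 2-D arrays; factors[i][a, b] = f_i(x_i = a, x_{i+1} = b).
    The reduction is a running min over one dimension at a time.
    """
    # message[a] = min over x_1..x_i of the partial sum ending with x_{i+1} = a
    message = np.zeros(factors[0].shape[0])
    for table in factors:
        # add the incoming message to the factor, then reduce (min) over the
        # left variable -- this is the min reduction described above
        message = np.min(message[:, None] + table, axis=0)
    return message.min()

# toy example: 4 variables on a 5-point grid per dimension
rng = np.random.default_rng(0)
factors = [rng.random((5, 5)) for _ in range(3)]
print(min_sum_path(factors))
```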
Using Message Passing yields 2 major benefits:
DIRECT relies on the definition of potentially optimal rectangles. An interval (a rectangle of a particular size) $j$ is potentially optimal if $$ \forall i \in I_3 : f(c_j) \leq f(c_i) $$ where $I_3$ is the set of indices of rectangles of size $d_j$. In other words, the intervals that attain $f_{min}$ within their size class $d$ are potentially optimal.
DIRECT considers potentially optimal rectangles by looking at the optimal rectangles across the various sizes (zoom levels, in our zooming language). At every iteration, all potentially optimal rectangles are selected and subdivided.
Note that if we have $d$ dimensions, it is guaranteed that after $\log_3(N)^d$ iterations the space explored would be equivalent to an $N^d$ hypercube grid.
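A minimal sketch of the simplified potentially-optimal rule above (the minimum center value within each size class). The `potentially_optimal` helper and the `(size, center value)` tuple representation are assumed for illustration only.

```python
from collections import defaultdict

def potentially_optimal(rects):
    """rects: list of (size d_j, center value f(c_j)) pairs.

    Under the simplified rule above, rectangle j is potentially optimal
    if f(c_j) <= f(c_i) for every rectangle i of the same size d_j,
    i.e. it attains the minimum within its size class.
    """
    best = defaultdict(lambda: float("inf"))
    for d, fc in rects:
        best[d] = min(best[d], fc)
    return [j for j, (d, fc) in enumerate(rects) if fc == best[d]]

rects = [(1.0, 3.2), (1.0, 2.8), (1/3, 2.9), (1/3, 3.5)]
print(potentially_optimal(rects))  # -> [1, 2]: the best of each size class
```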
I think there are two ideas to resolve the problem:
The key idea is not to compute the min at every step. That is, we compute the messages as the unreduced $f_{a,b}$, do not reduce them, and keep the messages stored inside the junction nodes.
Then, the key modification to consider is to propagate the marginal set of points with their associated values; we then use the messages to compute the desired $f(x)$ after all messages are computed, or when appropriate. Even though this method seems to offer no benefit compared with just evaluating $f(x)$ directly, there can be a computational benefit in the long run (see the sketch after the cost comparison below):
Consider your example; in addition, assume the nodes form a path graph with edges $[(1,2), (2,3), \ldots, (d-1, d)]$.
In computational terms, the measured ratio of compute time for an evaluation equivalent to one $F$ using the $(d-1)$ factor computations is $(d-1)f \sim 1.5F$. The ratio gets better as the graph becomes more complex; the computation here assumes the simple path graph mentioned above.
Considering each additional iteration, message passing pays about $2f$ while direct evaluation pays about $2F$.
Here, we assume that the message passing overhead itself is negligible.
After $k$ additional iterations, message passing has spent about $1.5F + 2kf$ versus about $F + 2kF$ for direct evaluation.
So there would be a certain $k$ beyond which MP becomes faster, since at every iteration $2f < 2F$.
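As one concrete reading of idea 1 on the path-graph example, here is a minimal sketch that stores the unreduced partial sums ("messages") at each junction and reuses them when a later query shares a prefix with an earlier one. `JunctionCache` and its prefix-keyed cache are hypothetical illustrations, not our actual data structure; the $2f$-versus-$2F$ saving shows up as the shorter resume loop.

```python
class JunctionCache:
    """Evaluate f(x) = sum_i f_i(x_i, x_{i+1}) on a path graph, storing
    the unreduced partial sums at every junction node.

    A fresh point costs (d-1) factor evaluations (~1.5 F as measured);
    a point sharing a prefix with a cached one pays only for the factors
    past the shared prefix, which is where the per-iteration saving comes from.
    """

    def __init__(self, factors):
        self.factors = factors   # factors[i](a, b) = f_i(x_i = a, x_{i+1} = b)
        self.cache = {}          # prefix of x -> partial sum up to that junction

    def evaluate(self, x):
        # find the longest cached prefix of x and resume from there
        total, start = 0.0, 0
        for i in range(len(x) - 1, 0, -1):
            if x[:i + 1] in self.cache:
                total, start = self.cache[x[:i + 1]], i
                break
        for i in range(start, len(x) - 1):
            total += self.factors[i](x[i], x[i + 1])
            self.cache[x[:i + 2]] = total  # store at the junction node
        return total

factors = [lambda a, b: (a - b) ** 2 for _ in range(3)]  # 4 variables
jc = JunctionCache(factors)
print(jc.evaluate((0.0, 1.0, 2.0, 3.0)))  # fresh point: 3 factor evaluations
print(jc.evaluate((0.0, 1.0, 2.0, 5.0)))  # shared prefix: 1 factor evaluation
```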
Very similar to the zooming technique, we adapt the DIRECT technique: instead of splitting the selected rectangle only along its longest side, we explore the best voxel of every size. For example:
Let $L_i$ be the set of voxels at level $i$, and let $L_{i,j}$ be the $j$-th best voxel in $L_i$, where "best" is defined by $f_{min}$.
So, in every iteration we explore all of the potentially optimal voxels: in iteration 1 we explore $L_{1,1}$; in iteration 2, $L_{1,2}$ and $L_{2,1}$; in iteration 3, $L_{1,3}$, $L_{2,2}$, and $L_{3,1}$; and so on. Note that the number of voxels explored in every iteration is linear in the iteration number (see the sketch below).
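A hedged sketch of this exploration schedule, assuming each level keeps its voxels in a min-heap keyed by $f_{min}$, so the $j$-th pop from level $i$ is $L_{i,j}$. The `explore_schedule` helper and the toy voxel labels are illustrative, not our actual structure.

```python
import heapq

def explore_schedule(levels, iteration):
    """Pop the next-best unexplored voxel from every level 1..iteration.

    levels: dict mapping level i -> min-heap of (f_min, voxel) pairs.
    Returns the voxels explored this iteration -- linear in the
    iteration number, matching the enumeration above.
    """
    explored = []
    for i in range(1, iteration + 1):
        if levels.get(i):  # a level may be exhausted or not yet created
            explored.append(heapq.heappop(levels[i]))
    return explored

# toy run: three levels of voxels with made-up f_min values
levels = {
    1: [(0.9, "v11"), (1.4, "v12"), (2.0, "v13")],
    2: [(0.7, "v21"), (1.1, "v22")],
    3: [(0.5, "v31")],
}
for h in levels.values():
    heapq.heapify(h)
for k in (1, 2, 3):
    print(k, explore_schedule(levels, k))  # 1, then 2, then 3 voxels
```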
The best voxel can be computed directly using our current message passing setup. However, it is still not clear whether there is a way to compute the next-best (and subsequent) voxels efficiently.
In contrast to our zooming technique, this technique continues to explore the current best voxel at every level, not just the best hypercube as we do now. I think this would allow us to avoid being too greedy and missing a better optimum. The shortcoming is that we consume much more computation: the number of voxels we commit to exploring grows with every iteration, so the total over the entirety of the algorithm is far larger.
For this to be possible, the math behind DIRECT needs to work when exploring hypercubes instead of rectangles, and perhaps also inherit the other properties (I haven't taken a deep look at this yet). My intuition is that it would behave similarly, since in the vanilla version of DIRECT the full grid of the space is also reached after $\log_3(N)^d$ iterations.
Also, this would be parallelizable.