Skimmed the paper quickly. This does not look like RL; it's a genetic algorithm. In a previous life I worked in compbio (protein structure prediction), and we built hundreds of such heuristic-based algorithms (Monte Carlo simulated annealing, GAs, ...). The moment you have a good energy function (one that provides some sort of gradient) and a fast enough sampling function (LLMs), you can do lots of cool optimization with sufficient compute.
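To make that recipe concrete, here is roughly what those loops look like, sketched as Monte Carlo simulated annealing. `energy` and `propose` are stand-ins, not anything from the paper; in the LLM setting, `propose` would be the model generating a mutated candidate:

```python
import math
import random

def anneal(start, energy, propose, steps=10_000, t_start=1.0, t_end=0.01):
    """Generic Monte Carlo simulated annealing loop.

    energy(x) scores a candidate (lower is better) and propose(x) samples
    a nearby candidate; both are left abstract here on purpose.
    """
    best, best_e = start, energy(start)
    cur, cur_e = best, best_e
    for i in range(steps):
        # geometric cooling schedule from t_start down to t_end
        t = t_start * (t_end / t_start) ** (i / steps)
        nxt = propose(cur)
        nxt_e = energy(nxt)
        # Metropolis criterion: always accept improvements, sometimes
        # accept worse candidates so the search can escape local minima
        if nxt_e < cur_e or random.random() < math.exp((cur_e - nxt_e) / t):
            cur, cur_e = nxt, nxt_e
            if cur_e < best_e:
                best, best_e = cur, cur_e
    return best, best_e
```

Swap the acceptance rule and keep a population instead of a single candidate and you have a GA; the ingredients are the same.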
> This does not look like RL. It's a genetic algorithm.
Couldn't you say that, if you squint hard enough, a GA looks like a category of RL? There are certainly a lot of similarities; the main difference is how each new population of solutions is generated. I would not be at all surprised if they're using a GA/RL hybrid.
This depends quite a bit on what you’re trying to optimize.
Gradient descent is literally following the negative of the gradient to minimize a function. It requires a continuous domain and either analytical or numerical derivatives of the cost function, and it has well-known issues in narrow valleys and other complex landscapes.
It’s also a local minimization technique and cannot escape local minima by itself.
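A toy illustration (mine, not from the thread): plain gradient descent on a simple multimodal function just rolls into whichever minimum is downhill from its starting point.

```python
def grad_descent(df, x0, lr=0.01, steps=2_000):
    # plain gradient descent: repeatedly step along the negative gradient
    x = x0
    for _ in range(steps):
        x -= lr * df(x)
    return x

# f(x) = x^4 - 3x^2 + x has a local minimum near x ~ 1.13
# and its global minimum near x ~ -1.30
df = lambda x: 4 * x**3 - 6 * x + 1

print(grad_descent(df, x0=2.0))   # ~1.13: stuck in the nearest local minimum
print(grad_descent(df, x0=-2.0))  # ~-1.30: finds the global minimum only because it started nearby
```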
_Stochastic_ gradient descent and related techniques can overcome some of these difficulties, but are still more or less local minimization techniques and require differentiable and continuous scoring functions.
In contrast, genetic algorithms try to find global minima, do not require differentiable scoring functions, and can operate on both continuous and discrete domains. They have their own disadvantages.
Different techniques for different problems. The field of numerical optimization is vast and ancient for a reason.
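For a concrete contrast, here is a minimal real-valued GA on the same toy function as the gradient-descent sketch above (again just an illustration): it only ever calls f, never a derivative, and the population plus mutation gives it a decent shot at the global minimum regardless of where individuals start.

```python
import random

def ga_minimize(f, bounds=(-3.0, 3.0), pop_size=40, gens=100, mut_sigma=0.3):
    """Tiny real-valued GA: needs only f(x), no derivatives."""
    lo, hi = bounds
    pop = [random.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=f)                          # rank candidates, lower f is better
        parents = pop[: pop_size // 2]           # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = (a + b) / 2                  # crossover: blend two parents
            child += random.gauss(0, mut_sigma)  # mutation keeps exploring
            children.append(min(hi, max(lo, child)))
        pop = parents + children
    return min(pop, key=f)

f = lambda x: x**4 - 3 * x**2 + x
print(ga_minimize(f))  # tends to land near the global minimum around x ~ -1.30
```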
I guess that's now becoming true with LLMs.
Faster LLMs -> More intelligence