Diffusion models are evolutionary algorithms
(gonzoml.substack.com)
80 points by che_shr_cat 4 days ago | 15 comments
bob1029 4 days ago | prev | next |
I have a hard time with the analogy due to how important population dynamics and solution diversity are to evolutionary algorithms.
In an EA, each candidate in the population represents a complete potential solution. As the diversity & size of the population increases, the potential for convergence on high quality solutions also increases. I do not see the same concept in diffusion models.
dawnofdusk 9 hours ago | root | parent | next |
>I do not see the same concept in diffusion models.
I believe:
Population size = noise dimensionality
Population diversity = noise magnitude
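A toy sketch of how I read that mapping (my own toy numbers, not the paper's construction): treat a batch of noise draws as a population, where dimensionality sets the degrees of freedom per member and the noise scale sets the spread.

    import numpy as np

    rng = np.random.default_rng(0)
    pop_size, dim = 32, 64              # dim ~ degrees of freedom per "member"
    sigma_hi, sigma_lo = 1.0, 0.1       # noise magnitude ~ population diversity

    x_hi = rng.normal(0.0, sigma_hi, (pop_size, dim))
    x_lo = rng.normal(0.0, sigma_lo, (pop_size, dim))

    # Average distance from the batch mean: larger sigma -> more diverse batch.
    print(np.linalg.norm(x_hi - x_hi.mean(0), axis=1).mean())
    print(np.linalg.norm(x_lo - x_lo.mean(0), axis=1).mean())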
amelius 6 hours ago | root | parent |
But iiuc, an EA needs to keep an entire population in memory at once.
I don't think this is the case for diffusion models.
bob1029 6 hours ago | root | parent |
You can maintain a population of arbitrary size when using approaches such as tournament selection. Each worker can pull a subset of the population from another datacenter and run an isolated tournament. You also don't have to run the entire population each iteration. You can keep win-loss statistics and handle final selection/crossover/mutation in a separate thread once enough results have accumulated.
Put more simply, run it like chess and use things like Elo rating to determine fitness at selection time.
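A minimal sketch, with a made-up play() standing in for a real head-to-head evaluation:

    import random

    def elo_update(r_a, r_b, a_won, k=32):
        # Standard Elo: expected score for A, then symmetric update.
        e_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
        delta = k * ((1.0 if a_won else 0.0) - e_a)
        return r_a + delta, r_b - delta

    def play(a, b):
        # Stand-in for actually evaluating two candidates against each other.
        return random.random() < 0.5

    # Each worker only ever touches the pair it sampled, so the full
    # population never has to sit in one process's memory.
    ratings = {f"cand{i}": 1000.0 for i in range(100)}

    for _ in range(5000):
        a, b = random.sample(sorted(ratings), 2)   # isolated mini-tournament
        ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], play(a, b))

    # Selection/crossover/mutation can then run off the accumulated ratings.
    parents = sorted(ratings, key=ratings.get, reverse=True)[:10]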
nickpsecurity 40 minutes ago | root | parent | prev | next |
I’ll add to your observation about them being complete solutions. Also, since the author thinks they’re like NN’s, we’ll do a direct comparison where members are a complete solution. Here’s an evolutionary algorithm:
1. A series of NN models with random weights.
2. A way to score how well each NN model works.
3. Iteratively mixing pieces of models with each other based on how well they scored.
I didn’t see that in articles about deep learning at all. It was like they were totally different things. They were so different that I looked into using both GA’s and simulated annealing for optimization of NN’s.
I’d say that EA’s and NN-like methods are different and complementary.
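A minimal sketch of those three steps, with flat weight vectors standing in for real NN models and a made-up fitness function:

    import numpy as np

    rng = np.random.default_rng(0)

    def fitness(w):
        # Stand-in scoring (step 2); a real one would run the NN on data.
        return -np.sum((w - 0.5) ** 2)

    def crossover(a, b):
        # Step 3: mix pieces of two parent weight vectors.
        mask = rng.random(a.shape) < 0.5
        return np.where(mask, a, b)

    def mutate(w, rate=0.1, scale=0.1):
        return w + (rng.random(w.shape) < rate) * rng.normal(0, scale, w.shape)

    # Step 1: a population of models with random weights.
    pop = [rng.normal(0, 1, 50) for _ in range(30)]

    for gen in range(200):
        pop.sort(key=fitness, reverse=True)
        elite = pop[:10]
        children = []
        while len(children) < len(pop) - len(elite):
            i, j = rng.choice(len(elite), size=2, replace=False)
            children.append(mutate(crossover(elite[i], elite[j])))
        pop = elite + children

    print(fitness(pop[0]))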
EvoQ 4 hours ago | root | parent | prev | next |
I think that in evolutionary algorithms, updates happen via natural selection, while in diffusion models, updates happen via denoising. Perturbation in evolutionary algorithms is done with mutations; in a diffusion model, it is done with the diffusion process.
Seems cool!
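If that's right, the two update rules look structurally parallel. A toy sketch, where the "denoiser" is just a function pointing at the data mode rather than a trained network:

    import numpy as np

    rng = np.random.default_rng(0)

    def ea_step(pop, fitness, sigma=0.1):
        cand = pop + sigma * rng.normal(size=pop.shape)        # mutation
        keep = fitness(cand) > fitness(pop)                    # selection
        return np.where(keep[:, None], cand, pop)

    def diffusion_step(x, denoise, sigma=0.1, step=0.5):
        x = x + sigma * rng.normal(size=x.shape)               # noising
        return x + step * (denoise(x) - x)                     # denoising

    fitness = lambda p: -np.linalg.norm(p, axis=1)             # toy: prefer the origin
    denoise = lambda x: np.zeros_like(x)                       # toy: mode at the origin

    pop = rng.normal(0, 1, (16, 2))
    x = rng.normal(0, 1, (16, 2))
    for _ in range(50):
        pop = ea_step(pop, fitness)
        x = diffusion_step(x, denoise)
    print(np.abs(pop).mean(), np.abs(x).mean())   # both contract toward the mode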
wakawaka28 4 hours ago | root | parent | prev |
I agree with you. I suppose you could simulate having a large population by maintaining a set of snapshots, but that still doesn't seem like the same thing as a classic evolutionary technique. There is no sane way to do things like crossover, I guess, unless you can average the weights from multiple snapshots (a strategy that might not work at all IMO).
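For reference, the averaging idea would look something like this (hypothetical snapshots below; averaging independently trained nets tends to fail because of weight-permutation symmetry, which is part of why I doubt it):

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical weight snapshots (flat vectors) from saved checkpoints.
    snapshots = [rng.normal(0, 1, 100) for _ in range(4)]

    # "Crossover" as a random convex combination of snapshot weights.
    alphas = rng.dirichlet(np.ones(len(snapshots)))
    child = sum(a * w for a, w in zip(alphas, snapshots))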
I think I have seen research papers addressing the intersection of neural network optimization and EAs, but direct optimization techniques seem to have been preferred for managing the actual weights. Manipulating the input data is more along the lines of perturbation methods. If you look hard enough you can probably see whatever you want, but it's a stretch.
To me, neural network training is a distinct task that draws from many areas of optimization and does not fit neatly under any one umbrella.
jcgrillo 2 hours ago | root | parent |
Admittedly I haven't dug too deep into the references, including the paper the article refers to--and I'm not too well versed in diffusion models either. Those caveats aside, I'm also having trouble making the analogy, and similarly for me the hangup is about crossover. If I recall correctly, what distinguishes evolution from, say, simulated annealing at varying temperatures, is precisely the crossover operator. It's not obvious where crossover is happening here.
throwaway314155 7 hours ago | prev | next |
Michael Levin's work is fascinating. Seems there's no field he can't help contribute to from a biological perspective.
SubiculumCode 9 hours ago | prev | next |
In the end: Linear Regression.
adamnemecek 4 days ago | prev |
They are all bialgebras.
will_byrd 3 days ago | root | parent | next |
Interesting comment. Would you mind expanding on that observation? Are there any references you'd suggest looking at that help make the connection more clear? Thank you!
tbalsam 3 days ago | root | parent | prev |
I saw this comment, thought, "yeah this reminds me of Adam Nemecek", and lo and behold.
upghost 5 hours ago | next |
This article uses the term "evolutionary algorithm" far too narrowly, and it is causing a lot of confusion in the comments. I would STRONGLY recommend checking out the book "Evolutionary Optimization Algorithms" by Dan Simon[1]. It is incredibly readable. The type being referred to in the article is the classic "Genetic Algorithm" variant of evolutionary algorithms. But genetic programming, evolutionary programming, simulated annealing, ant colony optimization, particle swarm optimization, differential evolution, estimation of distribution algorithms, biogeography-based optimization, cultural algorithms, and opposition-based learning algorithms are just a few examples of other types of evolutionary algorithms.
In general, they are a great approach to solving any non-convex optimization problem where gradient descent is not a practical choice.
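As a tiny taste of the family, here is a minimal (1+1) evolution strategy on the Rastrigin function, a standard non-convex benchmark where plain gradient descent tends to get stuck in local minima:

    import numpy as np

    def rastrigin(x):
        # Highly multimodal benchmark; global minimum of 0 at the origin.
        return 10 * len(x) + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

    rng = np.random.default_rng(0)
    x = rng.uniform(-5.12, 5.12, 5)
    fx, sigma = rastrigin(x), 0.5

    for _ in range(20000):
        y = x + sigma * rng.normal(size=x.shape)   # mutate
        fy = rastrigin(y)
        if fy < fx:                                # keep the better of the two
            x, fx = y, fy

    print(fx)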
[1]: https://books.google.com/books/about/Evolutionary_Optimizati...