Teaching Pong to Play Itself: My First Neural Network Experiment
I wanted to experiment with a basic learning AI to understand it better. So I taught Pong to play itself.
Generation 0: 0% win rate, random paddle movement. The AI was just mashing buttons and hoping for the best.
Generation 50: 98% win rate, perfect predictive tracking. The AI wasn’t just reacting — it was anticipating where the ball would go before it got there.
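To make "predictive tracking" concrete: an agent that anticipates the ball has effectively learned something equivalent to projecting the ball's trajectory, including wall bounces, to the paddle's x-position. A minimal sketch of that computation (the function and its parameters are my own illustration, not code from this project):

```python
def predict_ball_y(x, y, vx, vy, paddle_x, court_height):
    """Predict the y-coordinate where the ball crosses paddle_x,
    reflecting off the top (court_height) and bottom (0) walls."""
    if vx == 0:
        return y                      # ball moving vertically; never crosses
    t = (paddle_x - x) / vx           # time until the ball reaches the paddle
    if t < 0:
        return y                      # ball moving away from the paddle
    raw_y = y + vy * t                # position if there were no walls
    period = 2 * court_height         # reflections repeat every 2 * height
    folded = raw_y % period           # "fold" the straight line back into court
    return folded if folded <= court_height else period - folded
```

The evolved networks never see code like this, of course; the point is that a Generation 50 agent's input-to-output mapping approximates this function, while Generation 0's does not.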
That progression happened through NEAT (NeuroEvolution of Augmenting Topologies). The system evolves both the weights and the topology of neural networks, starting from minimal complexity and adding structure only as needed.
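NEAT's two structural mutations are worth seeing in miniature. A toy sketch of "add connection" and "add node" (illustrative only; real NEAT tracks per-mutation innovation numbers globally for crossover, and the genome layout here is my own simplification):

```python
import random

def add_connection(genome, innovation):
    """Mutation: connect two previously unconnected nodes."""
    a, b = random.sample(genome["nodes"], 2)
    if (a, b) not in genome["connections"]:
        genome["connections"][(a, b)] = {
            "weight": random.uniform(-1, 1),
            "enabled": True,
            "innovation": innovation,   # historical marking used in crossover
        }

def add_node(genome, conn_key, new_node_id, innovation):
    """Mutation: split an existing connection with a new hidden node."""
    old = genome["connections"][conn_key]
    old["enabled"] = False              # old direct link is disabled, not deleted
    a, b = conn_key
    genome["nodes"].append(new_node_id)
    # Incoming link gets weight 1.0, outgoing keeps the old weight, so the
    # network's behavior is (nearly) unchanged at the moment of mutation.
    genome["connections"][(a, new_node_id)] = {
        "weight": 1.0, "enabled": True, "innovation": innovation}
    genome["connections"][(new_node_id, b)] = {
        "weight": old["weight"], "enabled": True, "innovation": innovation + 1}
```

Starting every genome as a bare input-to-output net and growing hidden structure only through mutations like these is what "minimal complexity, adding structure as needed" means in practice.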
The honest part: I didn’t understand why novelty search mattered at first. Pure Elo optimization made all agents converge to a single “safe” strategy. They’d all play the same way, and no one discovered anything interesting.
The breakthrough was adding novelty rewards — giving points for unique behaviors, not just winning. That maintained population diversity and enabled the discovery of non-obvious, superior solutions. Suddenly agents were trying weird strategies that actually worked.
What didn’t work: Trying to make the training too fast initially. I cranked up the mutation rate and the population collapsed into chaos. The agents were changing so fast they couldn’t build on successful strategies. Slowing it down made the evolution actually meaningful.
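The knob in question is per-weight mutation. A sketch of what "cranking up the mutation rate" means at the genome level (parameter values are illustrative, not the ones I actually used):

```python
import random

def mutate_weights(weights, rate=0.1, scale=0.3):
    """Perturb each weight with probability `rate` by Gaussian noise of
    std dev `scale`. High rate/scale rewrites the network faster than
    selection can evaluate it; modest values let good structure persist
    long enough for offspring to build on it."""
    return [w + random.gauss(0, scale) if random.random() < rate else w
            for w in weights]
```

At rate near 1.0 a child barely resembles its parent, so a winning strategy can't survive a single generation intact; dialing it down is what made the lineage-by-lineage improvement visible.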
The lesson: Diversity isn’t just nice to have — it’s essential for discovery. Without it, optimization just finds the local maximum and stops.