Ever since ChatGPT was released, I've been pondering a concept that's been floating around in my mind rather disjointedly. Today, triggered by a new input, these thoughts seem to be coalescing, though I'm still very much in the process of working through them. The notion I've been grappling with is this:

Verification is much easier than generation.

It's a simple statement, but its implications run deeper than they first appear.

The RLHF Connection

This concept appears to be at the core of Reinforcement Learning from Human Feedback (RLHF). In RLHF, when an AI model outputs candidate answers, humans step in to verify which ones are good or bad. It's not about humans creating the perfect answer from scratch; it's about humans judging the quality of machine-generated outputs. A reward model is then built from this human feedback data, and the AI is trained against it. This process suggests that while creation, coming up with novel, high-quality outputs, is incredibly challenging, critiquing or verifying those outputs is far more straightforward for humans.

The Haiku Example

To illustrate this principle more concretely, let's consider an example that Andrej Karpathy has used. Imagine we're tackling the task of generating haikus. In a traditional supervised learning approach, we'd need human annotators to manually write high-quality haikus from scratch. That's quite a challenge: creating good poetry isn't easy! But with RLHF, the process looks quite different. The model generates a bunch of haikus, and humans simply need to verify their quality. They don't need to be poets themselves; they just need to recognize good poetry when they see it. This approach makes it much easier and more efficient to build useful datasets because, let's face it, everyone can be a critic, but few can create truly good content. It's the difference between recognizing a masterpiece and painting one yourself.

A Broader Perspective

I've been mulling over this phenomenon for a while, trying to find ways to generalize it beyond AI training. Today, I came across a perspective from Alexandr Wang, the CEO of Scale AI, a company that focuses on producing datasets for AI companies. Wang's explanation of the significance of RLHF really struck a chord with me. Here's what he said:

"And what RLHF means is that this allows the models to actually exceed human performance in a lot of cases because it's kind of like how every human in the world can be a movie critic, but almost none of us can make a movie. So each of us can say ways in which a movie could be better or could be improved, but obviously I can't make a movie. In the same way, if humans can teach the model what better looks like and how to improve, then the model can keep improving even far beyond what human capability is."

This analogy to movie criticism is intriguing. It seems to capture an essence of why RLHF might be so powerful: it could be leveraging our ability to judge quality, even in domains where we might not be able to produce high-quality work ourselves.

The Gradient Descent Analogy

As I've been pondering Wang's words, I've been trying to frame this concept in terms that align with my understanding of machine learning. One way to think about it might be in terms of gradient descent, but with an interesting twist. Imagine a high-dimensional loss landscape with a global minimum: that's our goal, the optimal performance we're trying to reach.
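To make the landscape picture concrete, here's a minimal, purely illustrative sketch of ordinary gradient descent on a toy two-parameter surface. The loss function, starting point, and learning rate are arbitrary choices I'm using for illustration, not anything from a real training setup.

```python
import numpy as np

def loss(theta):
    # A toy bowl-shaped "landscape" with its global minimum at (1, -2).
    return (theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2

def grad(theta):
    # Analytic gradient of the toy loss above.
    return np.array([2.0 * (theta[0] - 1.0), 2.0 * (theta[1] + 2.0)])

theta = np.array([5.0, 5.0])  # arbitrary starting point on the landscape
lr = 0.1                      # step size

for step in range(100):
    theta = theta - lr * grad(theta)  # move downhill along the local gradient

print(theta, loss(theta))     # ends up close to (1, -2), the global minimum
```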
In traditional gradient descent, we're working at a low level, directly adjusting parameters to minimize a predefined loss function. But the verification process in RLHF seems to be operating at a higher level. It's almost like a meta gradient descent, if you will. Instead of adjusting parameters against a fixed, hand-specified objective, we're using human feedback data to guide the overall direction of improvement. It's a higher level of nudging, where humans provide broad directional guidance rather than fine-grained adjustments.

I think the really challenging part, both for models and humans, is actually moving the point downwards, the act of descending the gradient. This is where brute force and loads of GPU computation come in, requiring massive matrix multiplications to make these incremental improvements. Models excel at this part, performing it reliably and efficiently in a way that humans simply can't match.

Thoughts

If we humans can consistently nudge a model in the right direction through our verification process, we can guide it towards superhuman performance. This highlights the crucial role of verification in pushing the boundaries of AI capabilities. What's particularly fascinating is that models can reach levels of performance we can't reach ourselves, yet we can still verify and critique their outputs without possessing that level of capability. It's like being able to recognize a masterpiece without having the ability to paint one. This asymmetry between creation and verification is what makes RLHF so powerful.

And the big question that keeps nagging at me is: how large is this intelligence asymmetry? How far can RLHF push the boundaries before the discriminator (us humans, or another AI) can no longer effectively improve the creator (the AI)?

As I continue to mull over these ideas, I'm looking for different angles to approach this concept. There's still so much to unpack about the interplay between human verification and AI creation, and I'm excited to see where these thoughts lead me next.
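As a small coda, here's a toy sketch of the loop this whole post circles around: a reward model trained purely from pairwise "which of these is better?" judgments, then used to pick better outputs. Everything here, from the linear reward model to the synthetic preference data and the hyperparameters, is a stand-in of my own invention, not how any production RLHF pipeline works; the point is only the asymmetry it encodes: the verifier only ever compares, it never generates.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8

# Each candidate output is represented as a feature vector, and a linear
# reward model r(x) = w . x stands in for "how good the verifier thinks x is".
w = np.zeros(DIM)

# Synthetic "human verification" data: pairs of (preferred, rejected).
# The hidden preference direction is a stand-in for human taste.
hidden_taste = rng.normal(size=DIM)
pairs = []
for _ in range(500):
    a, b = rng.normal(size=DIM), rng.normal(size=DIM)
    pairs.append((a, b) if a @ hidden_taste >= b @ hidden_taste else (b, a))

# Fit the reward model with a Bradley-Terry style objective:
# maximize log sigmoid(r(preferred) - r(rejected)).
lr = 0.05
for _ in range(20):
    for chosen, rejected in pairs:
        margin = (chosen - rejected) @ w
        p = 1.0 / (1.0 + np.exp(-margin))          # P(chosen beats rejected)
        w += lr * (1.0 - p) * (chosen - rejected)  # gradient ascent on log-likelihood

# "Nudging the generator": among fresh candidates, keep the one the learned
# reward model scores highest, a verification-derived signal guiding generation.
candidates = rng.normal(size=(10, DIM))
best = candidates[np.argmax(candidates @ w)]
print("picked candidate aligns with hidden taste:", best @ hidden_taste > 0)
```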