GAN – Review
This time, I’m reviewing the paper “Generative Adversarial Nets”, known as GAN. This classic 2014 paper by Ian Goodfellow and colleagues introduced adversarial training for generative models, using two networks, a generator and a discriminator.
The central concept is: a generator attempts to produce realistic samples to fool a discriminator, while the discriminator tries to distinguish real samples from generated ones. This competition pushes both models to improve, enabling the generator to produce realistic outputs without directly modeling complex probability distributions.
Key Concepts
Generator and Discriminator
GAN consists of two neural networks trained simultaneously. The Generator (G) learns to map random noise to data that looks like real samples, and the Discriminator (D) learns to classify whether a given sample is real or generated. This interplay drives learning.
Minimax Game and Value Function
GAN training is framed as a two-player minimax game over a single value function V(D, G). D tries to maximize V by assigning high probability to real samples and low probability to generated ones, while G tries to minimize the same V by producing samples that D scores as real. In practice this becomes two competing optimization steps, one ascending (for D) and one descending (for G).
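For reference, this is the value function as given in the paper, where x is drawn from the data distribution and z from the noise prior p_z:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The first term rewards D for scoring real samples highly. The second rewards D for scoring generated samples low, and it is the only term G can influence.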
Noise as Input (Latent Representation)
The generator takes random noise as input (often standard Gaussian noise), which it transforms into realistic data. This noise acts as a latent representation, similar in purpose to latent spaces in other generative models like VAE or diffusion models.
Key Takeaways (What I Learned)
Police vs. Counterfeiters Analogy
The authors’ analogy (police versus counterfeiters) makes the adversarial setup intuitive: the discriminator tries to catch counterfeit samples, while the generator improves to evade detection. The analogy clarifies the competitive dynamics and why both sides keep improving.
Why the Order of Optimization Matters
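In theory, the discriminator would be trained to optimality before each generator update, but the paper notes that doing so in the inner loop is computationally prohibitive and would overfit on a finite dataset. Its Algorithm 1 instead alternates k discriminator steps with one generator step, and the experiments use k = 1, which gives the familiar 1:1 step ratio.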
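The alternating schedule can be sketched on a toy 1-D problem. Everything below is an assumption made for illustration, not the paper's setup: the "real" data is Gaussian, the generator is a linear map G(z) = a*z + b, the discriminator is a logistic model D(x) = sigmoid(w*x + c), and gradients are written out by hand. It only shows the order of updates and makes no claim about convergence.

```python
import math
import random


def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(steps=200, k=1, batch=32, lr=0.05, seed=0):
    """Alternate k discriminator ascent steps with one generator descent step."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator G(z) = a*z + b (hypothetical toy model)
    w, c = 0.0, 0.0  # discriminator D(x) = sigmoid(w*x + c) (hypothetical toy model)

    for _ in range(steps):
        for _ in range(k):
            # Discriminator: ascend V = E[log D(x)] + E[log(1 - D(G(z)))].
            xs = [rng.gauss(3.0, 0.5) for _ in range(batch)]           # "real" samples
            gs = [a * rng.gauss(0.0, 1.0) + b for _ in range(batch)]   # generated samples
            grad_w = sum((1 - sigmoid(w * x + c)) * x for x in xs) / batch
            grad_w -= sum(sigmoid(w * g + c) * g for g in gs) / batch
            grad_c = sum(1 - sigmoid(w * x + c) for x in xs) / batch
            grad_c -= sum(sigmoid(w * g + c) for g in gs) / batch
            w += lr * grad_w
            c += lr * grad_c

        # Generator: descend E[log(1 - D(G(z)))] with D held fixed.
        zs = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        d_fake = [sigmoid(w * (a * z + b) + c) for z in zs]
        grad_a = sum(-d * w * z for d, z in zip(d_fake, zs)) / batch
        grad_b = sum(-d * w for d in d_fake) / batch
        a -= lr * grad_a
        b -= lr * grad_b

    return a, b, w, c
```

Calling `train_toy_gan(k=3)` runs three discriminator steps per generator step, while the default `k=1` matches the 1:1 ratio. Note that the generator step here uses the original log(1 - D(G(z))) objective.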
The “Saturation” Problem
At first, it was unclear to me what it means for the generator’s gradient to saturate. The authors point out that early in training, when the generator is still poor, the discriminator can reject its samples with high confidence, so log(1 - D(G(z))) flattens out and gives the generator almost no gradient. Their practical fix is to train G to maximize log D(G(z)) instead, which has the same fixed point but much stronger gradients early on. Understanding this clarified why the balance between discriminator and generator strength matters.
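To make the saturation concrete, here is a minimal sketch comparing two generator losses: the original log(1 - D(G(z))), and the alternative the paper suggests, which amounts to minimizing -log D(G(z)). It assumes the discriminator ends in a sigmoid, so D = sigmoid(a) for some logit a, and it looks at the gradient with respect to that logit only. The function name `generator_grads` is my own.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def generator_grads(logit):
    """Gradient of each generator loss w.r.t. the discriminator logit on a fake sample."""
    d = sigmoid(logit)           # D(G(z)): probability the discriminator assigns to "real"
    saturating = -d              # d/da log(1 - sigmoid(a))
    non_saturating = -(1.0 - d)  # d/da [-log sigmoid(a)]
    return saturating, non_saturating


for logit in (-6.0, -2.0, 0.0, 2.0):
    sat, non_sat = generator_grads(logit)
    print(f"D(G(z))={sigmoid(logit):.4f}  saturating={sat:+.4f}  non-saturating={non_sat:+.4f}")
```

When the discriminator confidently rejects a sample (a very negative logit), the first gradient shrinks toward zero while the second stays close to -1. That is exactly the regime the generator is in early in training.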
Noise as a Form of Latent Space
Initially, calling the generator input “noise” felt unintuitive. Noise serves as a random seed or latent code that gets mapped to structured data, introducing randomness into an otherwise deterministic network and enabling continuous generation and interpolation in the latent space.
Interpolation and Connection to VAEs
GANs support interpolation even though, unlike VAEs, they have no encoder. A VAE explicitly shapes its latent space, while a GAN gets there indirectly: the generator is a continuous function of z trained only through the discriminator’s feedback, never by fitting individual data points, so nearby latent codes tend to map to similar outputs. The paper illustrates this with digits obtained by linearly interpolating between points in z space, all without an explicit density model.
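Latent interpolation itself is simple to sketch. The generator below is a hypothetical stand-in (a fixed tanh map), not a trained network. The point is only that walking a straight line in z space and pushing each point through a continuous generator yields a smooth path of outputs.

```python
import math


def toy_generator(z):
    # Hypothetical stand-in for a trained generator: any continuous map works here.
    z1, z2 = z
    return (math.tanh(z1 + 0.5 * z2), math.tanh(z2 - 0.5 * z1))


def interpolate(z_start, z_end, steps):
    """Return generator outputs along the straight line from z_start to z_end (steps >= 2)."""
    outputs = []
    for i in range(steps):
        t = i / (steps - 1)  # t runs from 0.0 to 1.0 inclusive
        z = tuple((1 - t) * s + t * e for s, e in zip(z_start, z_end))
        outputs.append(toy_generator(z))
    return outputs


path = interpolate((-1.0, 0.0), (1.0, 0.5), steps=5)
```

With a real GAN, `toy_generator` would be replaced by the trained G, and each element of `path` would be a generated sample rather than a 2-D point.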
Simplicity of Theoretical Results
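The theoretical results, particularly the global optimality condition (p_g = p_data), are straightforward. For a fixed generator, the optimal discriminator is D*(x) = p_data(x) / (p_data(x) + p_g(x)). Substituting it back into the value function leaves the generator minimizing a constant plus twice the Jensen-Shannon divergence between p_data and p_g, which is zero only when the two distributions match.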
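For reference, the two results behind the optimality argument, as stated in the paper (the optimal discriminator for a fixed G, and the resulting criterion for the generator):

```latex
D^*_G(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}

C(G) = \max_D V(D, G)
     = -\log 4 + 2 \cdot \mathrm{JSD}\big(p_{\text{data}} \,\|\, p_g\big)
```

Since the Jensen-Shannon divergence is non-negative and zero only when its arguments are equal, C(G) reaches its minimum of -log 4 exactly when p_g = p_data.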
GANs vs. VAEs
Reflecting on VAE and GAN together: a VAE explicitly approximates an intractable posterior and optimizes a variational bound on the likelihood, while a GAN sidesteps explicit density modeling via the adversarial setup. The appeal of GAN is its simplicity paired with strong discriminative feedback, which the generator uses as its training signal.
Summary & Final Thoughts
“Generative Adversarial Nets” introduced a framework that pits generators against discriminators to produce realistic data without explicitly modeling complex distributions. The combination of an intuitive training game and a clear theoretical link to divergence minimization helps explain why GANs became foundational in generative modeling.
Exploring GAN alongside VAE deepened my understanding of how different approaches tackle generative tasks.