GAN – Review | Hun Tae Kim

This time, I’m reviewing “Generative Adversarial Nets”, the original GAN paper. This 2014 paper by Ian Goodfellow and colleagues introduced adversarial training for generative models using two networks: a generator and a discriminator.

The central idea is simple: a generator tries to produce realistic samples that fool a discriminator, while the discriminator tries to distinguish real samples from generated ones. This competition pushes both models to improve, allowing the generator to produce realistic outputs without directly modeling complex probability distributions.

Key concepts

Generator and discriminator

A GAN consists of two neural networks trained simultaneously. The generator (G) learns to map random noise to data that looks like real samples, and the discriminator (D) learns to classify whether a given sample is real or generated. This interaction drives learning.

Minimax game and value function

GAN training is framed as a minimax game over a value function V(D, G): D tries to maximize V, which amounts to classifying real and generated samples correctly, while G tries to minimize it. In practice the two optimizations alternate, with gradient ascent steps for D and gradient descent steps for G.
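For reference, the value function from the paper can be written as follows, where p_data is the data distribution and p_z is the noise prior fed to the generator:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The first term rewards D for assigning high probability to real samples; the second rewards D for assigning low probability to generated ones, and it is the only term G can influence.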

Noise as input (latent representation)

The generator takes random noise as input (often standard Gaussian noise), which it transforms into realistic data. This noise acts as a latent representation, similar in purpose to latent spaces in other generative models like VAE or diffusion models.
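As a rough illustration of "noise in, sample out", here is a minimal sketch in plain Python. The one-layer `toy_generator`, its dimensions, and its untrained random weights are assumptions made up for this example, not the architecture from the paper.

```python
import math
import random

def sample_noise(dim, rng):
    """Draw one latent vector z with i.i.d. standard Gaussian entries."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def toy_generator(z, weights, biases):
    """Map a latent vector to an output vector via tanh(W z + b).

    Stand-in for a real generator network; purely illustrative.
    """
    return [
        math.tanh(sum(w * zi for w, zi in zip(row, z)) + b)
        for row, b in zip(weights, biases)
    ]

rng = random.Random(0)
latent_dim, data_dim = 4, 3  # hypothetical sizes

# Hypothetical untrained parameters; a real GAN would learn these.
weights = [[rng.uniform(-1, 1) for _ in range(latent_dim)] for _ in range(data_dim)]
biases = [0.0] * data_dim

z = sample_noise(latent_dim, rng)
x_fake = toy_generator(z, weights, biases)
print("z      =", [round(v, 3) for v in z])
print("G(z)   =", [round(v, 3) for v in x_fake])
```

The network itself is deterministic: the same z always gives the same output, so all the variety in generated samples comes from resampling z.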

What I learned

Police vs. counterfeiters analogy

The authors’ analogy (police versus counterfeiters) makes the adversarial setup intuitive: the discriminator tries to catch counterfeit samples, while the generator improves to evade detection. The analogy clarifies the competitive dynamics and why both sides keep improving.

Why the order of optimization matters

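In theory, the analysis assumes the discriminator is optimal for the current generator before the generator updates. Optimizing D to completion in the inner loop is computationally prohibitive and, on a finite dataset, would overfit, so the paper's algorithm alternates k discriminator steps with one generator step. The authors used k = 1 in their experiments.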
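The alternating schedule can be sketched with a toy one-dimensional GAN in plain Python. Everything concrete here is an assumption for illustration: real data drawn from N(3, 1), a generator G(z) = z + theta with a single shift parameter, a logistic discriminator D(x) = sigmoid(w * x + b), hand-derived gradients, and the learning rate and batch size.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def train_toy_gan(steps=200, k=1, lr=0.05, batch=64, seed=0):
    """Alternate k discriminator steps with one generator step."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0   # discriminator D(x) = sigmoid(w * x + b)
    theta = 0.0       # generator G(z) = z + theta
    d_updates = g_updates = 0

    for _ in range(steps):
        # k steps of gradient ascent on
        # E[log D(x)] + E[log(1 - D(G(z)))] with respect to (w, b).
        for _ in range(k):
            real = [rng.gauss(3.0, 1.0) for _ in range(batch)]
            fake = [rng.gauss(0.0, 1.0) + theta for _ in range(batch)]
            d_real = [sigmoid(w * x + b) for x in real]
            d_fake = [sigmoid(w * x + b) for x in fake]
            grad_w = (sum((1 - d) * x for d, x in zip(d_real, real))
                      - sum(d * x for d, x in zip(d_fake, fake))) / batch
            grad_b = (sum(1 - d for d in d_real) - sum(d_fake)) / batch
            w += lr * grad_w
            b += lr * grad_b
            d_updates += 1

        # One step of gradient descent on E[log(1 - D(G(z)))]
        # with respect to theta.
        fake = [rng.gauss(0.0, 1.0) + theta for _ in range(batch)]
        d_fake = [sigmoid(w * x + b) for x in fake]
        grad_theta = sum(-d * w for d in d_fake) / batch
        theta -= lr * grad_theta
        g_updates += 1

    return theta, d_updates, g_updates

theta, d_updates, g_updates = train_toy_gan(steps=200, k=1)
print(f"theta={theta:.3f}, D updates={d_updates}, G updates={g_updates}")
```

Raising `k` gives the discriminator more updates per generator update, which moves it closer to the "optimal D" assumption at the cost of extra compute.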

The “saturation” problem

Initially, the idea of the generator’s gradient saturating, or becoming ineffective, was unclear to me. The authors point out that early in training, when G is still poor, D can reject generated samples with high confidence. In that regime log(1 - D(G(z))) saturates and the generator’s gradient becomes nearly zero. Their fix is to train G to maximize log D(G(z)) instead, which has the same fixed point but gives much stronger gradients early in learning. This clarified why balancing the discriminator and generator matters.
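A small numeric check makes the saturation concrete. The sketch assumes the discriminator output is a sigmoid of a logit and compares the gradient, with respect to that logit, of the original generator loss log(1 - D(G(z))) with the alternative the paper suggests, which is to maximize log D(G(z)), written here as minimizing -log D(G(z)).

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def saturating_grad(logit):
    """d/d(logit) of log(1 - sigmoid(logit)), the original minimax loss."""
    return -sigmoid(logit)

def non_saturating_grad(logit):
    """d/d(logit) of -log(sigmoid(logit)), the alternative generator loss."""
    return -(1.0 - sigmoid(logit))

# Very negative logits mean D is confident the sample is fake.
for logit in (-6.0, -2.0, 0.0, 2.0):
    d = sigmoid(logit)
    print(f"D(G(z))={d:.4f}  "
          f"saturating={saturating_grad(logit):+.4f}  "
          f"non-saturating={non_saturating_grad(logit):+.4f}")
```

When D(G(z)) is close to 0, the first gradient is close to 0 as well, while the second is close to -1, so the generator still receives a useful signal exactly when it is doing worst.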

Noise as a form of latent space

Initially, calling the generator input “noise” felt unintuitive. Noise serves as a random seed or latent code that gets mapped to structured data. It introduces randomness into an otherwise deterministic network and enables continuous generation and interpolation in latent space.

Interpolation and connection to VAEs

GANs support interpolation even though, unlike VAEs, they have no encoder. VAEs explicitly model the latent space to allow interpolation; GANs get it indirectly. The generator is a continuous function of z, so nearby latent codes map to nearby outputs. Because it is trained through the discriminator’s feedback rather than by reconstructing individual data points, points between two latent codes tend to decode to plausible samples as well.
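Latent interpolation itself is only a few lines. In this sketch, `toy_smooth_generator` is a hypothetical fixed smooth map standing in for a trained generator, and the two latent codes are arbitrary.

```python
import math

def lerp(z0, z1, t):
    """Linear interpolation between two latent vectors, t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(z0, z1)]

def toy_smooth_generator(z):
    """Hypothetical continuous generator from R^2 to R^2 (not trained)."""
    return [math.tanh(z[0] + 0.5 * z[1]), math.tanh(z[1] - 0.5 * z[0])]

z_start, z_end = [-1.0, 0.5], [1.0, -0.5]

# Walk from z_start to z_end in 5 evenly spaced steps and decode each point.
path = [toy_smooth_generator(lerp(z_start, z_end, i / 4)) for i in range(5)]
for point in path:
    print([round(v, 3) for v in point])
```

Because the map is continuous, the decoded points change gradually along the path. Whether the intermediate outputs look realistic depends on training, which this toy does not model.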

Simplicity of theoretical results

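The theoretical results, particularly the global optimality condition (p_g = p_data), are straightforward. For a fixed generator, the optimal discriminator is D*(x) = p_data(x) / (p_data(x) + p_g(x)). Substituting it into the value function reduces the generator’s objective to -log 4 + 2 * JSD(p_data || p_g), where JSD is the Jensen-Shannon divergence between the true and generated distributions. Since JSD is non-negative and zero only when the two distributions match, the minimum is reached exactly when p_g = p_data.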
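The identity behind this result, V(D*, G) = -log 4 + 2 * JSD(p_data || p_g) with D*(x) = p_data(x) / (p_data(x) + p_g(x)), can be checked numerically on small discrete distributions. The two distributions below are made-up examples.

```python
import math

def kl(p, q):
    """KL divergence between discrete distributions (natural log)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence between discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def value_at_optimal_d(p_data, p_g):
    """V(D*, G) with D*(x) = p_data(x) / (p_data(x) + p_g(x))."""
    total = 0.0
    for pd, pg in zip(p_data, p_g):
        if pd + pg == 0:
            continue  # outcome has no mass under either distribution
        d_star = pd / (pd + pg)
        if pd > 0:
            total += pd * math.log(d_star)
        if pg > 0:
            total += pg * math.log(1.0 - d_star)
    return total

p_data = [0.5, 0.3, 0.2]  # hypothetical data distribution
p_g = [0.2, 0.3, 0.5]     # hypothetical generator distribution

lhs = value_at_optimal_d(p_data, p_g)
rhs = -math.log(4) + 2 * jsd(p_data, p_g)
print(lhs, rhs)

# When the generator matches the data, the value drops to its minimum, -log 4.
print(value_at_optimal_d(p_data, p_data), -math.log(4))
```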

GANs vs. VAEs

Reflecting on VAE and GAN together: VAE explicitly approximates an intractable posterior through a variational bound, while GAN sidesteps explicit density modeling through the adversarial setup. The appeal of GAN is its simplicity paired with strong discriminative feedback.

Summary

“Generative Adversarial Nets” introduced a simple but powerful setup: train a generator against a discriminator and let the competition shape the generator’s output. The theoretical link to divergence minimization helps explain why the idea became so influential.

Reading GAN alongside VAE helped me understand how different generative models avoid or approximate difficult probability distributions.