Paper
An archive of posts in this category
- Dropout - Review • 04/2025
- Rethinking Sequence-to-Sequence - Review • 04/2025
- Knowledge Distillation - Review • 04/2025
- Neural Probabilistic Language Model - Review • 04/2025
- Revisiting the 2014 Sequence-to-Sequence Paper • 04/2025
- DeepSeek GRM - Review • 04/2025
- RAG - Review • 04/2025
- Palatable Conceptions of Disembodied Being - Review • 04/2025
- On the Biology of a Large Language Model - Review • 04/2025
- Circuit Tracing - Review • 04/2025
- KAN - Review • 04/2025
- Llama 3 Paper - Review (Part 2) • 03/2025
- Llama 3 Paper - Review (Part 1) • 03/2025
- Scaling Laws Paper - Review • 03/2025
- DDPM - Review • 03/2025
- VQVAE - Review • 03/2025
- GAN - Review • 02/2025
- VAE - Review • 02/2025
- NeRF - Review • 02/2025
- DPO - Review • 02/2025
- RoFormer (RoPE) - Review • 02/2025
- Mamba - Review • 02/2025
- Structured State Spaces (S4) - Review • 02/2025
- Whisper - Review • 02/2025
- CLIP - Review • 02/2025
- DeepSeekMath - Review • 01/2025
- DeepSeekMoE - Review • 01/2025
- DeepSeek-V2 - Review • 01/2025
- ViT - Review • 01/2025
- BERT - Review • 01/2025
- BLT - Review • 01/2025
- Continuous Latent Reasoning for LLMs (COCONUT) - Review • 01/2025
- Let's Reproduce GPT-2 by Karpathy - Review • 01/2025
- Transformer Circuits (Anthropic) - Review • 01/2025
- Karpathy's "Let's Build GPT From Scratch" - Review • 01/2025