Another month, another batch of interesting links I’ve come across. Here’s what caught my attention in February 2025.


DeepSeek, China, OpenAI, and AI Megaclusters - Lex Fridman Podcast #459

  • I first encountered Dylan Patel on the Dwarkesh Podcast episode with Asianometry, so I was glad to see him on Lex Fridman’s show. He specializes in analyzing AI hyperscalers, and his insights go deep. Highly recommended if you’re interested in the infrastructure side of AI.

How to Scale Your Model

  • This was published by a group of researchers at Google DeepMind, including Sholto Douglas. It’s a step-by-step guide to scaling models across hardware, specifically TPUs. I’ve been working my way through it; it starts with the very basics but gets complicated quickly. It’s refreshing to see closed AI labs sharing their knowledge and leaving these “breadcrumbs” for the open-source community.

The Ultra-Scale Playbook: Training LLMs on GPU Clusters

  • Similar in nature to the DeepMind scaling book, but this one’s from the Hugging Face team. It’s also very detailed, presented as an interactive guide, and, I should say, quite a lot to digest.

Thinking Like Transformers

  • The author created a Python library and a detailed blog post explaining how computation actually works within a transformer network. He also developed an abstract conceptual framework for it. It’s really complicated, and I still don’t fully grasp it, but it’s fascinating nonetheless.

Gwern - Anonymous Writer Who Predicted AI Trajectory on $12K/Year Salary

  • This is from the Dwarkesh Podcast. He interviews a blogger named Gwern, whose work you really need to read to appreciate. Gwern runs an extensive blog at gwern.net, and his posts are incredibly in-depth and well-researched, more like standalone articles than blog posts. The sheer depth is staggering, unlike anything I’ve encountered, even in reputable sources like the New York Times. I think this guy is incredibly capable. It also made me think about how the current internet is dominated by big tech. From what I’ve heard, the pre-big-tech era had niche, individual creators posting unique content, and Gwern feels like a holdover from that era.

The Melancholy of Subculture Society - Gwern

  • Gwern’s pieces are very long, so they require dedicated time. This is one I read: a musing on how the internet, as technology evolves, is gradually segregating society into subcultures, and on the current state of things. I think it’s a well-thought-out piece.

2024 Letter - Zhengdong Wang

  • This blog post, written by a researcher at Google DeepMind, makes a bombshell claim about the current state of large language model research: models will achieve essentially anything, as long as the evaluations are clearly defined. In other words, if there is an eval, any eval, the model will succeed at it. It’s an audacious statement, but the implications are mind-blowing. I should probably write a dedicated blog post about this.

Sesame Research - Crossing the Uncanny Valley of Voice

  • A startup called Sesame revealed a really interesting voice-to-voice language model. They have a demo, and it works quite well. The key difference between Sesame and ChatGPT’s Advanced Voice Mode is that Sesame’s model produces nuanced tones and natural-sounding fillers like “ums” and “ahs.” It feels incredibly realistic. For the first 10 minutes, I was blown away. But as I probed further, it became clear that it doesn’t have the same level of intelligence as text-based LLMs. The same goes for ChatGPT’s Advanced Voice Mode, where OpenAI seems to have significantly limited the model to comply with its guidelines. In Sesame’s case, I suspect the limitations come purely from model and compute constraints.


Hope you enjoyed this month’s picks—March should bring even more to explore!