Link Archive - Feb 2025

Another month, another batch of interesting links I’ve come across. Here’s what caught my attention in February 2025.

I first encountered Dylan Patel on the Dwarkesh podcast with Asianometry, so I was glad to see him on Lex Fridman’s. He specializes in analyzing AI hyperscalers, so his insights are in-depth. Recommended if you’re interested in the infrastructure side of things.

This was published by a group of researchers at Google DeepMind, notably Sholto Douglas. It’s a step-by-step guide to scaling up compute on accelerator infrastructure, specifically TPUs. I’ve been working my way through it; it starts with the very basics but gets complicated quickly. It’s good to see closed AI labs leaving these “breadcrumbs” for the open-source community and sharing what they know.

Similar in nature to the DeepMind scaling book, but this one’s from the Hugging Face team. It’s also very detailed, with an interactive guide, and quite dense.

The author created a Python library and a detailed blog post explaining how computation actually works within a transformer network. He also developed an abstract conceptual framework for it. It’s complicated—I still don’t fully grasp it—but it’s fascinating.

This is from the Dwarkesh Podcast. The episode introduces a blogger named Gwern, and you really need to read his work to get a sense of him. He runs an extensive blog at gwern.net. His posts are in-depth and well-researched, closer to standalone articles than typical blog entries. That depth stands out from most of what I read, even in reputable outlets like the New York Times. He is clearly very capable. It also made me think about how today’s internet is dominated by big tech. From what I’ve heard, the pre-big-tech era had many niche, individual creators posting unique content, and Gwern feels like a holdover from that era.

Gwern’s pieces are very long, so they require dedicated time. This is one I read: a musing on how the internet, as technology evolves, is segregating society, and on the state of things more broadly. I think it’s a well-thought-out piece.

This blog post, written by a researcher at Google DeepMind, makes a strong statement. Speaking about the current state of their large language model research, he claims that models will achieve essentially anything as long as the evaluation is clearly defined. In other words, if there is an eval, any eval, the model will succeed at it. It’s an audacious claim, and the implications are significant. I should probably write a dedicated blog post about this.

A startup called Sesame revealed an interesting voice-to-voice language model. They have a demo, and it works quite well. The key difference between Sesame and ChatGPT’s Advanced Voice Mode is that Sesame’s model produces natural-sounding speech details, including fillers like “ums” and “ahs” and nuanced changes in tone. It feels very realistic. For the first 10 minutes, I was impressed. But as I probed the model further, it became clear that it doesn’t have the same level of intelligence as text-based LLMs. The same goes for ChatGPT’s Advanced Voice Mode, though there OpenAI seems to have significantly limited the model to comply with their guidelines. In Sesame’s case, I suspect the limitations come purely from model and computational constraints.

Hope you enjoyed this month’s picks—March should bring even more to explore!