Thoughts on o3 and o4-mini

So, here are my thoughts on OpenAI’s release of o3 and o4-mini. My short reaction: strong models, but a mixed bag. It’s easy to get caught up in the hype, and the release really is impressive in some ways, but there are a few nuances worth separating out.

For a quick recap: OpenAI released o3 and o4-mini, new variants of its reasoning models specifically trained to use tools and act more agentically. When you see the demos and people’s use cases, they really are fantastic. The models have more of a “person feel.” I’ve used them myself, and compared to previous models that mostly did research by fetching and analyzing web information, o3 and o4-mini feel much more agentic. Previous models could use function calling, but often as a separate, isolated step. These models seem to parse information, act on it, and use tools such as the command-line interface more fluidly. In that sense, they are really capable models.
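To make the "isolated call versus fluid tool use" distinction concrete, here is a minimal sketch of an agentic loop, where each action is chosen after looking at what earlier tools returned. Everything in it is hypothetical: `search`, `run_code`, and `scripted_policy` are stand-ins I made up for illustration, and the policy is a hard-coded script rather than a real model. It is not how OpenAI implements tool use.

```python
from typing import Callable, Optional

# Hypothetical tools: stand-ins for web search and a code sandbox.
def search(query: str) -> str:
    return f"results for {query!r}"

def run_code(source: str) -> str:
    return f"ran {len(source)} characters of code"

TOOLS: dict[str, Callable[[str], str]] = {"search": search, "run_code": run_code}

def scripted_policy(observations: list[str]) -> Optional[tuple[str, str]]:
    """Stand-in for the model: picks the next (tool, argument) from what it has seen."""
    if not observations:
        return ("search", "o3 release notes")
    if len(observations) == 1:
        # The second action is built from the first tool's output.
        return ("run_code", f"summarize({observations[0]!r})")
    return None  # nothing left to do

def agent_loop(policy: Callable[[list[str]], Optional[tuple[str, str]]],
               max_steps: int = 5) -> list[str]:
    """Act, observe, decide again, until the policy stops or the step budget runs out."""
    observations: list[str] = []
    for _ in range(max_steps):
        action = policy(observations)
        if action is None:
            break
        tool_name, argument = action
        observations.append(TOOLS[tool_name](argument))
    return observations

if __name__ == "__main__":
    for step, observation in enumerate(agent_loop(scripted_policy), start=1):
        print(step, observation)
```

The point of the sketch is only the shape of the loop: an isolated function call would stop after the first tool result, while here that result feeds the next decision.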

Context and comparison

But for me, after the initial impression settled, it felt like OpenAI basically released its version of Anthropic’s Claude 3.7 Sonnet, which is good at agentic tasks. Because of those agentic capabilities, 3.7 Sonnet became a go-to enterprise solution, especially for agentic coding IDEs. While OpenAI’s new models arrived two or three months later and are arguably better, the fundamental paradigm hasn’t drastically changed.

Another thing I noticed was the ChatGPT interface itself. When serving o3, OpenAI enabled function calling and tool use by default. This means the models can readily use capabilities like data analysis, the coding environment, web search, and other tools OpenAI has integrated into ChatGPT. All this combined gives the models a much broader action space, and that has a combinatorial effect on capabilities.
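Here is a rough way to see why I call the effect combinatorial. This is a toy count under my own assumptions (tools can be chained in any order, repeats allowed, up to a fixed number of steps), not a measurement of anything the models actually do. The function name `count_tool_sequences` is made up for this sketch.

```python
def count_tool_sequences(num_tools: int, max_steps: int) -> int:
    """Count ordered tool-call sequences of length 1..max_steps (repeats allowed)."""
    return sum(num_tools ** length for length in range(1, max_steps + 1))

if __name__ == "__main__":
    for num_tools in (1, 2, 4):
        print(num_tools, "tools:", count_tool_sequences(num_tools, max_steps=3))
    # 1 tools: 3
    # 2 tools: 14
    # 4 tools: 84
```

Going from one tool to four does not quadruple the options over three steps; under these assumptions it takes them from 3 to 84.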

Previously, if I wanted to research a topic while studying, I’d usually paste my content into the chat and query the model. Now the model can search the web itself, and it feels less like simple RAG and more like the model is working the fetched information into its own context, so I don’t have to supply as much relevant context manually. It feels more reliable in that sense. The capability of the chat interface itself has changed.

OpenAI as a product company

This led me to another thought: the overall trajectory of OpenAI as a company.

When I saw how OpenAI packaged these models and tools inside the chat interface, it clicked for me: as Sam Altman had said, OpenAI is now officially a product company. In hindsight, given where their revenue comes from, maybe they always were.

If you compare OpenAI’s revenue to Anthropic’s, OpenAI is the market leader, partly thanks to its first-mover advantage and the network effect that came with it. But most of OpenAI’s revenue comes from user subscriptions, with a smaller fraction from API usage. Anthropic is almost the opposite: most of its revenue comes from API usage, though even that API revenue lags behind OpenAI’s API revenue. This suggests subscriptions are more popular than API access, at least for OpenAI.

It then makes sense for OpenAI to prioritize products, because intelligence is becoming cheap, almost too cheap to meter. Model API prices are racing to the bottom, leaving thin margins, especially with Google, Anthropic, xAI, and others competing. Subscriptions are much cleaner cash.

So what do I mean by OpenAI acting as a product company? As I mentioned, o3 and o4-mini feel like a better version of Claude 3.7 Sonnet. There’s a qualitative jump, I’m not denying that, but the bigger thing is the delivery.

When Claude 3.7 Sonnet launched, Anthropic shipped something that felt more like an enterprise capability than a consumer product. And make no mistake, Claude is a capable, agentic model, which I’ve written more about separately. They also released the Model Context Protocol (MCP), which I’ve also written about and presented on. However, when it came to their consumer-facing product, Anthropic essentially put this smart model into a basic chat interface without giving it the tools needed to show its agentic power. The end user interacting with Claude AI wouldn’t even realize how smart and agentic the underlying model is.

Looking back, especially after seeing OpenAI’s o3 and o4-mini launch, I can see how Anthropic could have stolen OpenAI’s thunder. If they had built the necessary scaffolding and provided tools for Claude to use directly within the chat interface, allowing it to search the web and execute code agentically, the user experience would have been different. Claude can do these things via MCP, but the default interface doesn’t allow it. This forces users like me to manually scaffold MCP and handcraft custom environments just to tap into the model’s full potential, which is far from ideal. With o3, OpenAI made it frictionless; it just works.

MCP and financial realities

Given all that, I was surprised when OpenAI announced it would also support MCP for its models. Initially, I was skeptical they’d adopt it as a standard. First, Anthropic developed it. Second, it seemed counter to OpenAI’s strategy: they were preparing o3 and o4-mini as a product. They serve the models through the API too, but that doesn’t feel like the main priority. Their strategy seems to be building their own scaffolding and tool integration into the model, then selling the result as a product. MCP, as an open standard, pushes in a more open direction.

However, I think this kind of standardization is inevitable, so OpenAI likely followed suit. The main beneficiaries of MCP’s spread seem to be the open-weights and open-source community and Anthropic, not OpenAI. Personally, I think broader adoption of open standards is desirable, even if it wasn’t OpenAI’s first preference.

I believe that to achieve financial independence, OpenAI will aggressively build itself as a product company. I understand the criticisms about OpenAI deviating from its non-profit roots. But as I’ve covered before, the reality seems simple: they need money. They need financial independence to do what they set out to do. Because of scaling laws, capital is necessary to scale up compute and continue research. I think they feel they have no choice but to pursue this path. I know Sam Altman can sound manipulative, but I think it may be true that they didn’t fully anticipate this financial necessity early on, and now they feel forced into this position.

The walled garden strategy

Given this product path, OpenAI’s recent moves start to make more sense. Take their enhanced memory feature, for example. This is a play for user retention, a step toward building a walled garden. They want users integrated into their ecosystem.

Looking ahead, imagine if OpenAI develops a truly frontier, genius-level model. What if they only offer it through their ChatGPT interface, not the API? Since subscriptions are the real cash cow compared to the low-margin API race, this seems plausible, especially if AGI-level capabilities emerge. A country of geniuses in a datacenter, only accessible via chatgpt.com. They could make a lot of money this way.

Combine that potential model superiority with features like memory that build up personalized context over time, and the friction for users to switch to a competitor becomes immense. Your interaction history, your personalized AI, all of it stays within OpenAI’s walls. The memory capability becomes a strategic tool. It makes the platform stickier and much harder to leave. In a way, OpenAI is making our accumulated data a reason to stay, almost holding it hostage to stop us from jumping to competitors. Food for thought as these platforms evolve.

Conclusion

So yes, o3 and o4-mini are genuinely good models. But I think these kinds of performance improvements were somewhat expected given the trajectory of previous models and the industry. What feels non-trivial, and what not enough people seem to be paying attention to, is how OpenAI shipped them. The product integration, the default tool usage in the chat interface, and what this signals about their product-first direction: that’s the bigger story here.

Although OpenAI’s revenue mix already revealed its product focus, this release makes it obvious. ChatGPT’s first-mover advantage has generated a network effect: the more users it attracts, the more revenue, feedback, and data it gathers, which go into better models, which in turn attract even more users. It’s the classic aggregator flywheel. Even though Google currently dazzles with pure-model wins like Gemini 2.5 Pro, and Anthropic keeps pushing Claude 3.7 Sonnet, neither has matched the friction-free, fully integrated product OpenAI now offers. Unless OpenAI makes a catastrophic misstep, I think that flywheel will accelerate.