OpenAI launches o3 and o4-mini, AI models that ‘think with images’ and use tools autonomously




OpenAI launched two groundbreaking AI models today that can reason with images and use tools independently, representing what experts call a step change in artificial intelligence capabilities.

The San Francisco-based company introduced o3 and o4-mini, the latest in its “o-series” of reasoning models, which it claims are its most intelligent and capable models to date. These systems can integrate images directly into their reasoning process, search the web, run code, analyze files, and even generate images within a single task flow.

“There are some models that feel like a qualitative step into the future. GPT-4 was one of those. Today is also going to be one of those days,” said Greg Brockman, OpenAI’s president, during a press conference announcing the release. “These are the first models where top scientists tell us they produce legitimately good and useful novel ideas.”

How OpenAI’s new models ‘think with images’ to transform visual problem-solving

The most striking feature of these new models is their ability to “think with images” — not just see them, but manipulate and reason about them as part of their problem-solving process.

“They don’t just see an image — they think with it,” OpenAI said in a statement sent to VentureBeat. “This unlocks a new class of problem-solving that blends visual and textual reasoning.”

During a demonstration at the press conference, a researcher showed how o3 could analyze a physics poster from a decade-old internship, navigate its complex diagrams independently, and even identify that the final result wasn’t present in the poster itself.

“It must have just read, you know, at least like 10 different papers in a few seconds for me,” Brandon McKenzie, a researcher at OpenAI working on multimodal reasoning, said during the demo. He estimated the task would have taken him “many days just for me to even like, onboard myself, back to my project, and then a few days more probably, to actually search through the literature.”

The ability for AI to manipulate images in its reasoning process — zooming in on details, rotating diagrams, or cropping unnecessary elements — represents a novel approach that industry analysts say could revolutionize fields from scientific research to education.
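
For developers, the entry point to this capability is simply attaching an image to a request. The snippet below is a minimal sketch of that pattern using OpenAI’s Python SDK and the Chat Completions API; the model choice, prompt, and image URL are illustrative placeholders rather than anything taken from OpenAI’s announcement.

```python
# Minimal sketch: passing an image alongside a text prompt so the model can
# reason over both. Assumes the openai Python SDK is installed, OPENAI_API_KEY
# is set, and the account has access to the model; the URL is a placeholder.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o3",  # any vision-capable reasoning model; named here for illustration
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What final result does this poster report?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/poster.png"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```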

OpenAI executives emphasized that these releases represent more than just improved models — they’re complete AI systems that can independently use and chain together multiple tools when solving problems.

“We’ve trained them to use tools through reinforcement learning—teaching them not just how to use tools, but to reason about when to use them,” the company explained in its release.

Brockman highlighted the models’ extensive tool use capabilities: “They actually use these tools in their chain of thought as they’re trying to solve a hard problem. For example, we’ve seen o3 use like 600 tool calls in a row trying to solve a really hard task.”

This capability allows the models to perform complex, multi-step workflows without constant human direction. For instance, if asked about future energy usage patterns in California, the AI can search the web for utility data, write Python code to analyze it, generate visualizations, and produce a comprehensive report — all as a single fluid process.
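
In practice, a developer hands the model those tools and lets it decide when to invoke them. The sketch below shows roughly what that might look like against OpenAI’s Responses API, which exposes built-in web search and code execution tools; the exact tool configuration and the prompt are assumptions for illustration, not code from OpenAI.

```python
# Sketch of a multi-tool request: the model may search the web and run Python
# on its own while answering. The tool type names follow the Responses API's
# built-in tools; treat the exact configuration as an assumption.
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="o3",
    tools=[
        {"type": "web_search_preview"},  # built-in web search tool
        {"type": "code_interpreter", "container": {"type": "auto"}},  # sandboxed Python
    ],
    input=(
        "Find recent data on California's electricity demand, analyze the trend "
        "in Python, and summarize projected usage over the next five years."
    ),
)

print(response.output_text)
```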

OpenAI surges ahead of competitors with record-breaking performance on key AI benchmarks

OpenAI claims o3 sets new state-of-the-art benchmarks across key measures of AI capability, including Codeforces, SWE-bench, and MMMU. In evaluations by external experts, o3 reportedly makes 20 percent fewer major errors than its predecessor on difficult, real-world tasks.

The smaller o4-mini model is optimized for speed and cost efficiency while maintaining strong reasoning capabilities. On the AIME 2025 mathematics competition, o4-mini scored 99.5 percent when given access to a Python interpreter.

“I really do believe that with this suite of models, o3 and o4-mini, we’re going to see more advances,” Mark Chen, OpenAI’s head of research, said during the press conference.

The timing of this release is significant, coming just two days after OpenAI unveiled its GPT-4.1 model, which excels at coding tasks. The rapid succession of announcements signals an acceleration in the competitive AI landscape, where OpenAI faces increasing pressure from Google’s Gemini models, Anthropic’s Claude, and Elon Musk’s xAI.

Last month, OpenAI closed what amounts to the largest private tech funding round in history, raising $40 billion at a $300 billion valuation. The company is also reportedly considering building its own social network, potentially to compete with Elon Musk’s X platform and to secure a proprietary source of training data.

How OpenAI’s new models transform software engineering with unprecedented code navigation abilities

One area where the new models particularly excel is software engineering. Brockman noted during the press conference that o3 is “actually better than I am at navigating through our OpenAI code base, which is really useful.”

As part of the announcement, OpenAI also introduced Codex CLI, a lightweight coding agent that runs directly in a user’s terminal. The open-source tool allows developers to leverage the models’ reasoning capabilities for coding tasks, with support for screenshots and sketches.

“We’re also sharing a new experiment: Codex CLI, a lightweight coding agent you can run from your terminal,” the company announced. “You can get the benefits of multimodal reasoning from the command line by passing screenshots or low fidelity sketches to the model, combined with access to your code locally.”

To encourage adoption, OpenAI is launching a $1 million initiative to support projects using Codex CLI and OpenAI models, with grants available in increments of $25,000 in API credits.

Inside OpenAI’s enhanced safety protocols: How the company protects against AI misuse

OpenAI reports conducting extensive safety testing on the new models, particularly focused on their ability to refuse harmful requests. The company’s safety measures include completely rebuilding its safety training data and developing system-level mitigations to flag dangerous prompts.

“We stress tested both models with our most rigorous safety program to date,” the company stated, noting that both o3 and o4-mini remain below OpenAI’s “High” threshold for potential risks in biological, cybersecurity, and AI self-improvement capabilities.

During the press conference, OpenAI researchers Wenda and Ananya presented detailed benchmark results, noting that the new models were trained with more than 10 times the compute of previous versions to reach their capabilities.

When and how you can access o3 and o4-mini: Deployment timeline and commercial strategy

The new models are immediately available to ChatGPT Plus, Pro, and Team users, with Enterprise and Education customers gaining access next week. Free users can sample o4-mini by selecting “Think” in the composer before submitting queries.

Developers can access both models via OpenAI’s Chat Completions API and Responses API, though some organizations will need verification to access them.
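
For teams that already have access, a first call can be as small as the sketch below, which sends a single prompt to o4-mini through the Chat Completions API. It assumes the openai Python package is installed, an API key is configured, and that the reasoning_effort setting is available to the account; none of this is taken from OpenAI’s own examples.

```python
# Smallest possible smoke test of o4-mini through the Chat Completions API.
# Assumes the openai SDK is installed and OPENAI_API_KEY is set; the
# reasoning_effort knob is assumed to be enabled for the account.
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="o4-mini",
    reasoning_effort="medium",  # low / medium / high on o-series models
    messages=[{"role": "user", "content": "In one sentence, explain what a reasoning model is."}],
)

print(completion.choices[0].message.content)
```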

The release represents a significant commercial opportunity for OpenAI, as the models appear both more capable and more cost-efficient than their predecessors. “For example, on the 2025 AIME math competition, the cost-performance frontier for o3 strictly improves over o1, and similarly, o4-mini’s frontier strictly improves over o3-mini,” the company stated.

The future of AI: How OpenAI is bridging reasoning and conversation for next-generation systems

Industry analysts view these releases as part of a broader convergence in AI capabilities, with models increasingly combining specialized reasoning with natural conversation abilities and tool use.

“Today’s updates reflect the direction our models are heading in: we’re converging the specialized reasoning capabilities of the o-series with more of the natural conversational abilities and tool use of the GPT-series,” OpenAI noted in its release.

Ethan Mollick, associate professor at the Wharton School who studies AI adoption, described o3 as “a very strong model, but still a jagged one” in a social media post after the announcement.

As competition in the AI space continues to intensify, with Google, Anthropic, and others releasing increasingly powerful models, OpenAI’s dual focus on reasoning capabilities and practical tool use suggests a strategy aimed at maintaining its leadership position by delivering both intelligence and utility.

With o3 and o4-mini, OpenAI has crossed a threshold where machines begin to perceive images the way humans do—manipulating visual information as an integral part of their thinking process rather than merely analyzing what they see. This shift from passive recognition to active visual reasoning may ultimately prove more significant than any benchmark score, representing the moment when AI began to truly see the world through thinking eyes.


