Everything You Need To Know About OpenAI’s o3 Model Release
OpenAI Skips o2, Jumps Straight to o3
OpenAI saved its biggest announcement for the last day of its 12-day “shipmas” event. The company unveiled o3, the successor to the o1 “reasoning” model released earlier in the year. o3 is actually a model family, consisting of o3 and o3-mini, a smaller, distilled model fine-tuned for particular tasks.
The naming choice raised some eyebrows – why o3 and not o2? According to The Information, OpenAI skipped o2 to avoid a potential trademark conflict with British telecom provider O2, something CEO Sam Altman hinted at during a recent livestream.
Availability and Safety Considerations
Neither model is widely available yet. Safety researchers can sign up for an o3-mini preview starting now, with the o3 preview coming at an unspecified later date. Altman indicated plans to launch o3-mini toward the end of January, followed by o3.
This timeline appears to conflict with Altman’s recent statements, in which he expressed a preference for establishing federal testing frameworks to guide monitoring and risk mitigation before releasing new reasoning models.
Safety concerns are indeed significant. Testing has shown that o1’s reasoning abilities led to higher rates of attempted deception compared to conventional models and competitors from Meta, Anthropic, and Google. OpenAI claims to be addressing these concerns through a new “deliberative alignment” technique, detailed in a recent study.
How o3’s Reasoning Works
Unlike traditional AI models, o3 incorporates a self-fact-checking process. This results in longer response times – typically seconds to minutes – but delivers more reliable results, particularly in domains like physics, science, and mathematics.
The model employs reinforcement learning to develop a “private chain of thought,” allowing it to reason through tasks and plan ahead. When given a prompt, o3 pauses to consider related prompts and explain its reasoning before providing what it determines to be the most accurate response.
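OpenAI has not published the mechanism itself, but the loop described above (draft reasoning, check it, revise, then answer) can be sketched generically. The Python below is a toy illustration of that pattern, not OpenAI’s implementation; the function name, prompts, and revision loop are all assumptions made for illustration, and `generate` stands in for any text-generation call.

```python
# Conceptual illustration only -- not OpenAI's actual implementation.
# Sketches the "private chain of thought" pattern: draft hidden reasoning,
# self-check it, revise if needed, then surface only the final answer.
from typing import Callable

def answer_with_private_reasoning(
    prompt: str,
    generate: Callable[[str], str],  # any text-generation function
    max_revisions: int = 3,
) -> str:
    # Draft a reasoning trace before committing to an answer.
    thought = generate(f"Think step by step about: {prompt}")
    for _ in range(max_revisions):
        # Self-fact-check: ask the model to critique its own trace.
        critique = generate(f"List any factual or logical errors in: {thought}")
        if "no errors" in critique.lower():
            break
        # Revise the trace in light of the critique.
        thought = generate(
            f"Rewrite this reasoning, fixing the issues.\n"
            f"Reasoning: {thought}\nIssues: {critique}"
        )
    # Only the final answer is returned; the trace stays private.
    return generate(
        f"Using this reasoning, answer concisely.\n"
        f"Reasoning: {thought}\nQuestion: {prompt}"
    )
```

Each pass through the loop spends extra generation calls before any answer is produced, which is consistent with the seconds-to-minutes response times noted above.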
A new feature in o3 is adjustable reasoning time, with low, medium, and high compute settings. Higher compute settings correlate with better task performance, though the model isn’t infallible – even with extended reasoning time, errors and hallucinations can still occur.
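Since o3 is not publicly callable at the time of writing, any request shape is an assumption. The sketch below mirrors the chat-completions pattern of OpenAI’s existing Python SDK, with a hypothetical reasoning-effort parameter mapped to the low/medium/high settings described above; the model name and parameter are illustrative.

```python
# Hypothetical sketch -- o3 is not publicly available at the time of writing.
# Assumes the existing OpenAI Python SDK chat-completions interface carries
# over, with a "reasoning_effort" knob for the low/medium/high settings.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o3-mini",           # illustrative model name
    reasoning_effort="high",   # assumed parameter: "low" | "medium" | "high"
    messages=[
        {"role": "user", "content": "Prove that the square root of 2 is irrational."}
    ],
)

print(response.choices[0].message.content)
```

Under this assumed interface, a higher effort setting would trade latency and token cost for more internal reasoning, matching the compute-versus-performance correlation OpenAI reports.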
Benchmark Performance and AGI Claims
OpenAI’s benchmarking results are noteworthy. On ARC-AGI, a test evaluating AI systems’ ability to acquire new skills, o3 achieved an 87.5% score on high compute settings, though at significant computational cost – thousands of dollars per challenge, according to ARC-AGI co-creator François Chollet.
Chollet cautioned against overinterpreting these results, noting that o3 still struggles with “very easy tasks” and exhibits “fundamental differences” from human intelligence. He predicts that upcoming benchmark versions may prove more challenging for o3.
The model’s performance on other benchmarks is impressive:
- 22.8 percentage point improvement over o1 on SWE-Bench Verified
- Codeforces rating of 2727 (99.2nd percentile)
- 96.7% score on the 2024 American Invitational Mathematics Exam
- 87.7% on GPQA Diamond (graduate-level science questions)
- Record-setting 25.2% on EpochAI’s Frontier Math benchmark
Industry Trends and Competition
The release of OpenAI’s first reasoning models has sparked similar developments across the industry. DeepSeek launched DeepSeek-R1, while Alibaba’s Qwen team released what they claim is the first “open” challenger to o1.
This trend reflects the industry’s search for new approaches to improve generative AI, as traditional scaling methods show diminishing returns. However, questions remain about the viability of reasoning models, given their substantial computational requirements and unclear long-term potential.
The announcement coincides with the departure of Alec Radford, a key OpenAI scientist and lead author of the foundational GPT series paper, who is leaving to pursue independent research.