OpenAI Unveils o3 Model and Becomes First to Crack the ARC-AGI Benchmark in 5 Years

December 21, 2024

On the last day of its “12 Days of OpenAI” announcements, OpenAI saved the biggest update for last. The company announced the o3 and o3-mini reasoning models and, most importantly, made history: o3 became the first AI model to crack the coveted ARC-AGI benchmark, ending a five-year unbeaten streak.

On the ARC-AGI semi-private evaluation set, OpenAI’s o3 model scored a stellar 87.5% when given high-compute resources and more time to think. The ARC Prize threshold is set at 85%, which is roughly what humans typically achieve. For comparison, OpenAI’s o1 model scored only 32%.

ARC-AGI is designed to test AI models for generalized intelligence, focusing on the ability to solve new problems rather than relying on memorized patterns. With o3, OpenAI has made a historic breakthrough in generalized intelligence, one that could bring the company closer to achieving AGI (artificial general intelligence): an AI system that can match or even surpass human intelligence.
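
To make “solving new problems rather than relying on memorized patterns” concrete, here is a toy sketch in Python. It is not ARC’s official tooling. It assumes the public ARC task layout, where each task has a "train" list of demonstration input/output grids and a "test" list of pairs to answer, with grids made of integers 0-9. The transform library, the `solve` helper and the three tasks are hypothetical, and the scoring (exact cell-for-cell match, two attempts per test input, compared against the 85% bar) is a simplified assumption modeled on the ARC Prize setup. The point is that a solver must infer the rule from a few examples and then apply it to an unseen input.

```python
from typing import Callable

Grid = list[list[int]]

# Hypothetical library of grid transformations the toy solver is allowed to try.
TRANSFORMS: dict[str, Callable[[Grid], Grid]] = {
    "mirror_rows": lambda g: [list(reversed(row)) for row in g],
    "flip_vertical": lambda g: [list(row) for row in reversed(g)],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}


def solve(train: list[dict], test_input: Grid, max_attempts: int = 2) -> list[Grid]:
    """Guess outputs using only transforms consistent with every demo pair."""
    guesses = []
    for fn in TRANSFORMS.values():
        # The rule must be inferred from the demonstrations, not memorized.
        if all(fn(pair["input"]) == pair["output"] for pair in train):
            guesses.append(fn(test_input))
    return guesses[:max_attempts]  # assumed limit of two attempts per test input


def score_task(task: dict) -> float:
    """Fraction of test inputs answered with an exact, cell-for-cell match."""
    hits = sum(
        any(guess == pair["output"] for guess in solve(task["train"], pair["input"]))
        for pair in task["test"]
    )
    return hits / len(task["test"])


def benchmark_score(tasks: list[dict]) -> float:
    """Mean task score, as a percentage."""
    return 100 * sum(score_task(t) for t in tasks) / len(tasks)


# Three hypothetical tasks: mirror each row, flip top-to-bottom, recolour 1 -> 2.
TASKS = [
    {"train": [{"input": [[1, 0], [0, 0]], "output": [[0, 1], [0, 0]]},
               {"input": [[0, 0], [2, 0]], "output": [[0, 0], [0, 2]]}],
     "test": [{"input": [[3, 0], [0, 4]], "output": [[0, 3], [4, 0]]}]},
    {"train": [{"input": [[1, 2], [0, 0]], "output": [[0, 0], [1, 2]]},
               {"input": [[0, 3], [0, 0]], "output": [[0, 0], [0, 3]]}],
     "test": [{"input": [[5, 6], [0, 0]], "output": [[0, 0], [5, 6]]}]},
    {"train": [{"input": [[1, 0], [0, 0]], "output": [[2, 0], [0, 0]]}],
     "test": [{"input": [[0, 0], [0, 1]], "output": [[0, 0], [0, 2]]}]},
]

if __name__ == "__main__":
    score = benchmark_score(TASKS)
    print(f"score: {score:.1f}%  clears 85% bar: {score >= 85.0}")
```

Running it prints `score: 66.7%  clears 85% bar: False`. The fixed library handles the mirror and flip tasks but has no rule for recolouring, which is exactly the kind of novel problem ARC-AGI is built to expose.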

In addition to ARC-AGI, o3 scored 71.7% on SWE-bench Verified, reached a rating of 2,727 on Codeforces, and scored 96.7% on AIME 2024 and 87.7% on GPQA Diamond. All of these tests are highly challenging, and the scores are well above those of o1. Finally, on Epoch AI’s FrontierMath benchmark, whose problems can take expert mathematicians hours to solve, o3 achieved 25.2% accuracy. The previous best score was only 2.0%.

As for o3-mini, OpenAI says it is distilled from o3 and optimized for coding, speed, and cost efficiency. It offers three compute settings: low, medium, and high. At the medium setting, o3-mini outperforms the larger o1 model at a lower cost, and its latency is also lower than o1’s.

If you are wondering why it is called o3 and not o2: to avoid legal issues with O2, the UK-based mobile network operator, OpenAI decided to skip the o2 name altogether.

Finally, regarding availability, OpenAI says it is conducting safety testing on the o3 and o3-mini models, and it is also opening o3-mini to public safety testing. OpenAI plans to release o3-mini by the end of January 2025, with o3 following after rigorous testing and approval by regulators.

Arjun Sha

Passionate about Windows, ChromeOS, Android, security and privacy issues. Have a penchant for solving everyday computing problems.
