Saturday, December 21, 2024
HomeTechnologyOpenAI Unveils o3 Model and Becomes First to Crack the ARC-AGI Benchmark...

OpenAI Unveils o3 Model and Becomes First to Crack the ARC-AGI Benchmark in 5 Years


Last day of “12 days of OpenAIAs per the announcements, OpenAI revealed the biggest update. OpenAI announced O3 and O3-Mini reasoning models, and most importantly, OpenAI made history as O3 became the first AI model to crack this coveted model. ARC-AGI Benchmark, breaking a five-year unbeaten streak.

Image Credit: OpenAI via YouTube

On the ARC-AGI semi-private evaluation set, OpenAI’s O3 model scored a stellar 87.5% using higher-compute resources and more time to think. The ARC award limit was set at 85%, which is generally close to what humans can achieve. As you know, OpenAI O1 The model could score only 32% marks.

ARC-AGI is designed to test AI models for generalized intelligence, focusing on the ability to solve new problems rather than relying on memorized patterns. So with the o3 model, OpenAI has truly made a historic breakthrough in generalized intelligence. This could bring OpenAI closer to achievement AGI (Artificial General Intelligence) – An AI system that can match or even surpass human intelligence.

  • openai o3 codeforce
  • OpenAI O3 GPQA Diamond and Aim 2024

In addition to ARC-AGI, OpenAI o3 scored 71.7 in SWE-Bench Verified, 2,727 in Codeforces, 96.7 in AIME 2024, and 87.7 in GPQA Diamond. All these tests are highly challenging and the scores are much higher than those obtained by O1. Finally, in the EpochAI Frontier Math benchmark, which requires expert mathematicians to spend hours solving a problem, OpenAI O3 got a 25.2 accuracy. The previous best score was only 2.0.

As for the O3-mini model, OpenAI says it is a distilled model from O3, and is optimized for coding, faster performance, and cost-efficiency. The o3-Mini has three compute settings: low, medium, and high. At medium settings, the o3-mini performs better than the larger o1 model and costs less. Its latency is also less than the O1 model.

If you are wondering why it is called O3 and not O2, well, to avoid legal issues with O2, OpenAI, a UK-based mobile network operator, has decided to drop o2 altogether.

Finally, regarding availability, OpenAI says it is conducting security testing on o3 and o3-mini models. The company is also opening o3-mini model public safety testOpenAI plans to release the O3-mini model by the end of January 2025. And after that, after rigorous testing and approval by regulators, the O3 model will be released.

Table of Contents

Arjun Sha

Passionate about Windows, ChromeOS, Android, security and privacy issues. Have a penchant for solving everyday computing problems.


Blog Credit

Source link

RELATED ARTICLES

Leave a Reply

Most Popular

Recent Comments

Зарегистрируйтесь, чтобы получить 100 USDT on Farmer Wants A Wife star Claire Saunders shares urgent warning after ‘shock’ health scare

Discover more from MovieBird

Subscribe now to keep reading and get access to the full archive.

Continue reading