    National News Brief

    ARC-AGI-2: Leading AI models fail new test of artificial general intelligence

    By Team_NationalNewsBrief | March 25, 2025 | Science


    The ARC-AGI-2 benchmark is designed to be a difficult test for AI models

    Just_Super/Getty Images

    The most sophisticated AI models in existence today have scored poorly on a new benchmark designed to measure their progress towards artificial general intelligence (AGI). Brute-force computing power won't be enough to raise their scores, because evaluators now take the cost of running a model into account.

    There are many competing definitions of AGI, but it is generally taken to refer to an AI that can perform any cognitive task that humans can do. To measure this, the ARC Prize Foundation previously launched a test of reasoning abilities called ARC-AGI-1. Last December, OpenAI announced that its o3 model had scored highly on the test, leading some to ask if the company was close to achieving AGI.

    But now a new test, ARC-AGI-2, has raised the bar. It is difficult enough that no current AI system on the market can achieve more than a single-digit score out of 100 on the test, while every question has been solved by at least two humans in fewer than two attempts.

    In a blog post announcing ARC-AGI-2, ARC president Greg Kamradt said the new benchmark was required to test different skills from the previous iteration. “To beat it, you must demonstrate both a high level of adaptability and high efficiency,” he wrote.

    The ARC-AGI-2 benchmark differs from other AI benchmarks in that it focuses on AI models' ability to complete simple-seeming tasks, such as inferring a symbolic rule from a few example images and replicating the same change in a new image, rather than on their ability to match the performance of world-leading PhDs. Current models are good at "deep learning", which ARC-AGI-1 measured, but are not as good at the seemingly simpler tasks in ARC-AGI-2, which require more challenging thinking and interaction. OpenAI's o3-low model, for instance, scores 75.7 per cent on ARC-AGI-1, but just 4 per cent on ARC-AGI-2.
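    To make the kind of task described here more concrete, the Python sketch below sets up a toy puzzle in the ARC style: a few example input/output grids that share a hidden transformation, plus a new input whose output must be inferred. The grids, the three candidate rules and the brute-force search over them are hypothetical illustrations chosen for simplicity; they are not real ARC-AGI-2 tasks and do not reflect how any model is actually built or scored.

```python
# Toy ARC-style puzzle (hypothetical): example input/output grid pairs that
# demonstrate a hidden rule, plus a test input whose output must be predicted.
# Grids are lists of lists of small integers standing in for colours.
from typing import Callable, Dict, List, Optional

Grid = List[List[int]]


def flip_horizontal(g: Grid) -> Grid:
    # Mirror each row left-to-right.
    return [row[::-1] for row in g]


def flip_vertical(g: Grid) -> Grid:
    # Reverse the order of the rows (copying each row).
    return [row[:] for row in g[::-1]]


def transpose(g: Grid) -> Grid:
    # Swap rows and columns.
    return [list(col) for col in zip(*g)]


# A hand-written menu of candidate rules; this is an illustrative assumption.
CANDIDATE_RULES: Dict[str, Callable[[Grid], Grid]] = {
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
    "transpose": transpose,
}


def infer_rule(train_pairs) -> Optional[str]:
    """Return the first candidate rule consistent with every example pair, or None."""
    for name, rule in CANDIDATE_RULES.items():
        if all(rule(inp) == out for inp, out in train_pairs):
            return name
    return None


task = {
    "train": [
        ([[1, 0, 0], [2, 0, 0]], [[0, 0, 1], [0, 0, 2]]),
        ([[3, 4, 0]], [[0, 4, 3]]),
    ],
    "test_input": [[5, 0, 6], [0, 7, 0]],
}

rule_name = infer_rule(task["train"])
prediction = CANDIDATE_RULES[rule_name](task["test_input"])
print(rule_name, prediction)
```

    Choosing from a fixed menu of rules works here only because the hidden rule is trivial. The benchmark's stated aim is to reward adapting efficiently to unfamiliar rules, not exhaustive search.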

    The benchmark also adds a new dimension to measuring an AI’s capabilities, by looking at its efficiency in problem-solving, as measured by the cost required to complete a task. For example, while ARC paid its human testers $17 per task, it estimates that o3-low costs OpenAI $200 in fees for the same work.
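    The cost figures quoted above can be put side by side with a little arithmetic. The short Python sketch below uses only the numbers reported in this article. The "cost per solved task" figure is a hypothetical derived quantity, not an official ARC metric. It assumes the $200 estimate applies to every attempted task and treats the 4 per cent score as the fraction of tasks solved.

```python
# Figures quoted in the article (ARC Prize estimates). Everything derived from
# them below is illustrative arithmetic, not an official ARC-AGI-2 metric.
HUMAN_COST_PER_TASK = 17.0    # US dollars paid to human testers per task
O3_LOW_COST_PER_TASK = 200.0  # estimated US dollars per task for o3-low
O3_LOW_SCORE = 0.04           # o3-low's ARC-AGI-2 score (4 per cent)

# How many times more expensive o3-low is than a human tester, per task.
cost_ratio = O3_LOW_COST_PER_TASK / HUMAN_COST_PER_TASK

# Hypothetical: if every attempted task costs $200 and only 4 per cent are
# solved, each solved task effectively costs (cost per task) / (success rate).
o3_low_cost_per_solved_task = O3_LOW_COST_PER_TASK / O3_LOW_SCORE

print(f"o3-low costs about {cost_ratio:.1f}x the human rate per task")
print(f"Implied cost per solved task: ${o3_low_cost_per_solved_task:,.0f}")
```

    Under these assumptions, o3-low costs roughly 11.8 times the human rate per task, and about $5,000 per task it actually solves. This shows how a low score and a high per-task cost compound when efficiency is part of the evaluation.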

    “I think the new iteration of ARC-AGI now focusing on balancing performance with efficiency is a big step towards a more realistic evaluation of AI models,” says Joseph Imperial at the University of Bath, UK. “This is a sign that we’re moving from one-dimensional evaluation tests solely focusing on performance but also considering less compute power.”

    Any model that is able to pass ARC-AGI-2 would need to not just be highly competent, but also smaller and lightweight, says Imperial – with the efficiency of the model being a key component of the new benchmark. This could help address concerns that AI models are becoming more energy-intensive – sometimes to the point of wastefulness – to achieve ever-greater results.

    However, not everyone is convinced that the new measure is beneficial. “The whole framing of this as it testing intelligence is not the right framing,” says Catherine Flick at the University of Staffordshire, UK. Instead, she says these benchmarks merely assess an AI’s ability to complete a single task or set of tasks well, which is then extrapolated to mean general capabilities across a series of tasks.

    Performing well on these benchmarks should not be seen as a major moment towards AGI, says Flick: “You see the media pick up that these models are passing these human-level intelligence tests, where actually they’re not; what they are doing is really just responding to a particular prompt accurately.”

    And exactly what happens if or when ARC-AGI-2 is passed is another question – will we need yet another benchmark? “If they were to develop ARC-AGI-3, I’m guessing they would add another axis in the graph denoting [the] minimum number of humans – whether expert or not – it would take to solve the tasks, in addition to performance and efficiency,” says Imperial. In other words, the debate over AGI is unlikely to be settled soon.

    Copyright © 2024 Nationalnewsbrief.com All Rights Reserved.