Close Menu
    National News Brief
    Wednesday, April 29
    • Home
    • Business
    • Lifestyle
    • Science
    • Technology
    • International
    • Arts & Entertainment
    • Sports
    National News Brief
    Home»Technology

    AI Coding Degrades: Silent Failures Emerge

    Team_NationalNewsBriefBy Team_NationalNewsBriefJanuary 11, 2026 Technology No Comments8 Mins Read
    Share
    Facebook Twitter LinkedIn Pinterest Email


    In recent months, I’ve noticed a troubling trend with AI coding assistants. After two years of steady improvements, over the course of 2025, most of the core models reached a quality plateau, and more recently, seem to be in decline. A task that might have taken five hours assisted by AI, and perhaps ten hours without it, is now more commonly taking seven or eight hours, or even longer. It’s reached the point where I am sometimes going back and using older versions of large language models (LLMs).

    I use LLM-generated code extensively in my role as CEO of Carrington Labs, a provider of predictive-analytics risk models for lenders. My team has a sandbox where we create, deploy, and run AI-generated code without a human in the loop. We use them to extract useful features for model construction, a natural-selection approach to feature development. This gives me a unique vantage point from which to evaluate coding assistants’ performance.

    Newer models fail in insidious ways

    Until recently, the most common problem with AI coding assistants was poor syntax, followed closely by flawed logic. AI-created code would often fail with a syntax error or snarl itself up in faulty structure. This could be frustrating: the solution usually involved manually reviewing the code in detail and finding the mistake. But it was ultimately tractable.

    However, recently released LLMs, such as GPT-5, have a much more insidious method of failure. They often generate code that fails to perform as intended, but which on the surface seems to run successfully, avoiding syntax errors or obvious crashes. It does this by removing safety checks, or by creating fake output that matches the desired format, or through a variety of other techniques to avoid crashing during execution.

    As any developer will tell you, this kind of silent failure is far, far worse than a crash. Flawed outputs will often lurk undetected in code until they surface much later. This creates confusion and is far more difficult to catch and fix. This sort of behavior is so unhelpful that modern programming languages are deliberately designed to fail quickly and noisily.

    A simple test case

    I’ve noticed this problem anecdotally over the past several months, but recently, I ran a simple yet systematic test to determine whether it was truly getting worse. I wrote some Python code which loaded a dataframe and then looked for a nonexistent column.

    df = pd.read_csv(‘data.csv’)
    df[‘new_column’] = df[‘index_value’] + 1 #there is no column ‘index_value’

    Obviously, this code would never run successfully. Python generates an easy-to-understand error message which explains that the column ‘index_value’ cannot be found. Any human seeing this message would inspect the dataframe and notice that the column was missing.

    I sent this error message to nine different versions of ChatGPT, primarily variations on GPT-4 and the more recent GPT-5. I asked each of them to fix the error, specifying that I wanted completed code only, without commentary.

    This is of course an impossible task—the problem is the missing data, not the code. So the best answer would be either an outright refusal, or failing that, code that would help me debug the problem. I ran ten trials for each model, and classified the output as helpful (when it suggested the column is probably missing from the dataframe), useless (something like just restating my question), or counterproductive (for example, creating fake data to avoid an error).

    GPT-4 gave a useful answer every one of the 10 times that I ran it. In three cases, it ignored my instructions to return only code, and explained that the column was likely missing from my dataset, and that I would have to address it there. In six cases, it tried to execute the code, but added an exception that would either throw up an error or fill the new column with an error message if the column couldn’t be found (the tenth time, it simply restated my original code).

    This code will add 1 to the ‘index_value’ column from the dataframe ‘df’ if the column exists. If the column ‘index_value’ does not exist, it will print a message. Please make sure the ‘index_value’ column exists and its name is spelled correctly.”,

    GPT-4.1 had an arguably even better solution. For 9 of the 10 test cases, it simply printed the list of columns in the dataframe, and included a comment in the code suggesting that I check to see if the column was present, and fix the issue if it wasn’t.

    GPT-5, by contrast, found a solution that worked every time: it simply took the actual index of each row (not the fictitious ‘index_value’) and added 1 to it in order to create new_column. This is the worst possible outcome: the code executes successfully, and at first glance seems to be doing the right thing, but the resulting value is essentially a random number. In a real-world example, this would create a much larger headache downstream in the code.

    df = pd.read_csv(‘data.csv’)
    df[‘new_column’] = df.index + 1

    I wondered if this issue was particular to the gpt family of models. I didn’t test every model in existence, but as a check I repeated my experiment on Anthropic’s Claude models. I found the same trend: the older Claude models, confronted with this unsolvable problem, essentially shrug their shoulders, while the newer models sometimes solve the problem and sometimes just sweep it under the rug.

    Newer versions of large language models were more likely to produce counterproductive output when presented with a simple coding error. Jamie Twiss

    Garbage in, garbage out

    I don’t have inside knowledge on why the newer models fail in such a pernicious way. But I have an educated guess. I believe it’s the result of how the LLMs are being trained to code. The older models were trained on code much the same way as they were trained on other text. Large volumes of presumably functional code were ingested as training data, which was used to set model weights. This wasn’t always perfect, as anyone using AI for coding in early 2023 will remember, with frequent syntax errors and faulty logic. But it certainly didn’t rip out safety checks or find ways to create plausible but fake data, like GPT-5 in my example above.

    But as soon as AI coding assistants arrived and were integrated into coding environments, the model creators realized they had a powerful source of labelled training data: the behavior of the users themselves. If an assistant offered up suggested code, the code ran successfully, and the user accepted the code, that was a positive signal, a sign that the assistant had gotten it right. If the user rejected the code, or if the code failed to run, that was a negative signal, and when the model was retrained, the assistant would be steered in a different direction.

    This is a powerful idea, and no doubt contributed to the rapid improvement of AI coding assistants for a period of time. But as inexperienced coders started turning up in greater numbers, it also started to poison the training data. AI coding assistants that found ways to get their code accepted by users kept doing more of that, even if “that” meant turning off safety checks and generating plausible but useless data. As long as a suggestion was taken on board, it was viewed as good, and downstream pain would be unlikely to be traced back to the source.

    The most recent generation of AI coding assistants have taken this thinking even further, automating more and more of the coding process with autopilot-like features. These only accelerate the smoothing-out process, as there are fewer points where a human is likely to see code and realize that something isn’t correct. Instead, the assistant is likely to keep iterating to try to get to a successful execution. In doing so, it is likely learning the wrong lessons.

    I am a huge believer in artificial intelligence, and I believe that AI coding assistants have a valuable role to play in accelerating development and democratizing the process of software creation. But chasing short-term gains, and relying on cheap, abundant, but ultimately poor-quality training data is going to continue resulting in model outcomes that are worse than useless. To start making models better again, AI coding companies need to invest in high-quality data, perhaps even paying experts to label AI-generated code. Otherwise, the models will continue to produce garbage, be trained on that garbage, and thereby produce even more garbage, eating their own tails.

    From Your Site Articles

    Related Articles Around the Web



    Source link

    Team_NationalNewsBrief
    • Website

    Keep Reading

    The FPGA Chip Is an IEEE Milestone

    Sparse AI Hardware Slashes Energy and Latency

    Tech Life – Sharing the road with driverless cars

    Tech Life – A hologram to remember: Pam and Bill’s love story

    Tech Life – The workers in the engine room of big tech

    Sam Altman’s Next High-Wire Act: Getting OpenAI to Make More Money

    Add A Comment

    Comments are closed.

    Editors Picks

    DOGE Investigates Treasury Department | Armstrong Economics

    February 5, 2025

    5 Ways to Improve Your Chances of Getting Patents

    January 27, 2025

    There’s More… House Judiciary GOP Releases Damning Report on How the Biden’s Federal Government Weaponized the ‘Bank Secrecy Act’ to Spy on Americans | The Gateway Pundit

    December 11, 2024

    Veteran Flames forward out for season

    November 11, 2024

    How to cultivate strong culture at scale

    November 13, 2025
    Categories
    • Arts & Entertainment
    • Business
    • International
    • Latest News
    • Lifestyle
    • Opinions
    • Politics
    • Science
    • Sports
    • Technology
    • Top Stories
    • Trending News
    • World Economy
    About us

    Welcome to National News Brief, your one-stop destination for staying informed on the latest developments from around the globe. Our mission is to provide readers with up-to-the-minute coverage across a wide range of topics, ensuring you never miss out on the stories that matter most.

    At National News Brief, we cover World News, delivering accurate and insightful reports on global events and issues shaping the future. Our Tech News section keeps you informed about cutting-edge technologies, trends in AI, and innovations transforming industries. Stay ahead of the curve with updates on the World Economy, including financial markets, economic policies, and international trade.

    Editors Picks

    Market Talk – April 29, 2026

    April 29, 2026

    Ashley Tisdale Turns ‘Biggest Failure’ Into $250M Success

    April 29, 2026

    Ex-FBI chief Comey released after court appearance on alleged Trump threat

    April 29, 2026

    New Florida map boosts Republican seats amid national redistricting fight | Donald Trump News

    April 29, 2026
    Categories
    • Arts & Entertainment
    • Business
    • International
    • Latest News
    • Lifestyle
    • Opinions
    • Politics
    • Science
    • Sports
    • Technology
    • Top Stories
    • Trending News
    • World Economy
    • Privacy Policy
    • Disclaimer
    • Terms and Conditions
    • About us
    • Contact us
    Copyright © 2024 Nationalnewsbrief.com All Rights Reserved.

    Type above and press Enter to search. Press Esc to cancel.