AI can fix bugs—but can't find them: OpenAI's study highlights limits of LLMs in software engineering

OpenAI's recent research highlights the limitations of large language models (LLMs) in software engineering tasks. The study used a benchmark called SWE-Lancer to test models such as GPT-4o and Claude 3.5 Sonnet on real freelance coding tasks from Upwork. The results showed that while these models can fix bugs, they struggle to identify the underlying causes, which leads to frequent mistakes.

The research involved 1,488 tasks with a total payout of $1 million, divided into individual contributor and management tasks. Although Claude 3.5 Sonnet performed best, it resolved only 26.2% of issues correctly, indicating that LLMs cannot yet replace human engineers. The findings suggest that LLMs can assist with coding but need significant improvement before they are reliable in real-world applications.

Key AI Highlights in this Article

• LLMs struggle to identify root causes of coding issues.

• Claude 3.5 Sonnet was the best-performing model in the study.

Key AI Terms Mentioned in this Article

Large Language Models (LLMs)

LLMs are AI models designed to understand and generate human-like text, but they struggle with complex coding tasks.

Benchmarking

Benchmarking involves testing models against specific tasks to evaluate their performance, as seen with the SWE-Lancer dataset.

Freelance Software Engineering Tasks

These tasks represent real-world coding challenges that LLMs were tested on, revealing their limitations in practical applications.

Companies Mentioned in this Article

OpenAI

OpenAI is a research organization focused on developing advanced AI technologies, including LLMs like GPT-4o.

Anthropic

Anthropic is the AI company that develops Claude 3.5 Sonnet, a model tested in the study.

