The "First AI Software Engineer" Is Bungling the Vast Majority of Tasks It's Asked to Do

Cognition's Devin, touted as the first AI software engineer, performed poorly in independent testing, succeeding at only 3 of 20 tasks, a 15% success rate. Researchers from Answer.AI conducted a month-long analysis and found that 14 of the 20 tasks ended in failure, with the remaining 3 inconclusive. The findings raise concerns about the reliability of AI in software engineering roles, especially given the high expectations set by Cognition's marketing.

The analysis highlighted Devin's inability to recognize fundamental blockers, which led it to spend prolonged effort on impossible tasks. Despite impressive early demos, Devin's actual performance contrasts starkly with Cognition's claims about its capabilities. The results underscore the ongoing gap between AI promises and actual performance, and they call into question whether AI can replace human engineers in the near future.

• Cognition's Devin achieved only a 15% success rate in software tasks.

• Devin's performance raises doubts about AI's readiness to replace human engineers.

Key AI Terms Mentioned in this Article

AI Software Engineer

An AI software engineer is an AI system designed to perform software development tasks, the role Cognition claims for Devin.

Autonomous AI

Autonomous AI systems operate independently to complete tasks, but Devin's failures highlight limitations in this capability.

Machine Learning

Machine learning involves algorithms that improve through experience, which Devin struggled to apply effectively in real-world scenarios.

Companies Mentioned in this Article

Cognition

Cognition is an AI tech company that developed Devin, claiming it to be the first AI software engineer.

Answer.AI

Answer.AI is an independent AI research lab that conducted the analysis of Devin's performance.

