AI has a stupid secret: we're still not sure how to test for human levels of intelligence

Full Article
AI has a stupid secret: we're still not sure how to test for human levels of intelligence

San Francisco's AI leaders, Scale AI and the Center for AI Safety, have launched an initiative called 'Humanity's Last Exam' to test large language models like Google Gemini and OpenAI's o1. This initiative invites the public to submit questions that can effectively evaluate the capabilities of these AI systems, with a prize of $5,000 for the best submissions. The goal is to assess how close AI has come to achieving expert-level intelligence.

Despite current AI models excelling in various established tests, there remains uncertainty about the significance of these results due to potential pre-learned answers from vast training datasets. The challenge extends to defining and measuring intelligence, particularly artificial general intelligence (AGI), which surpasses human capabilities. As AI continues to evolve, new benchmarks and testing methods are essential to ensure accurate assessments of their intelligence.

• Scale AI and CAIS challenge public to test AI capabilities.

• New benchmarks needed to measure AI intelligence effectively.

Key AI Terms Mentioned in this Article

Large Language Models (LLMs)

LLMs like Google Gemini and OpenAI's o1 are central to the testing initiative launched by Scale AI and CAIS.

Artificial General Intelligence (AGI)

The article discusses the importance of measuring AGI as AI systems demonstrate broader intelligent behavior.

Model Collapse

The article highlights concerns about model collapse as AI-generated content floods training datasets.

Companies Mentioned in this Article

Scale AI

Scale AI is collaborating with CAIS to evaluate AI capabilities through public engagement.

Center for AI Safety

CAIS partners with Scale AI to launch the initiative aimed at testing AI intelligence.

Get Email Alerts for AI News

By creating an email alert, you agree to AIleap's Terms of Service and Privacy Policy. You can pause or unsubscribe from email alerts at any time.

Latest Articles

Alphabet's AI drug discovery platform Isomorphic Labs raises $600M from Thrive
TechCrunch 6month

Isomorphic Labs, the AI drug discovery platform that was spun out of Google's DeepMind in 2021, has raised external capital for the first time. The $600

AI In Education - Up-level Your Teaching With AI By Cloning Yourself
Forbes 6month

How to level up your teaching with AI. Discover how to use clones and GPTs in your classroom—personalized AI teaching is the future.

Trump's Third Term - How AI Can Help To Overthrow The US Government
Forbes 6month

Trump's Third Term? AI already knows how this can be done. A study shows how OpenAI, Grok, DeepSeek & Google outline ways to dismantle U.S. democracy.

Sam Altman Says OpenAI Will Release an 'Open Weight' AI Model This Summer
Wired 6month

Sam Altman today revealed that OpenAI will release an open weight artificial intelligence model in the coming months. "We are excited to release a powerful new open-weight language model with reasoning in the coming months," Altman wrote on X.

Popular Topics