Google DeepMind researchers introduce new benchmark to improve LLM factuality, reduce hallucinations

Full Article

Google DeepMind researchers have unveiled a new benchmark called FACTS Grounding, aimed at enhancing the factual accuracy of large language models (LLMs). This benchmark evaluates LLMs based on their ability to generate accurate responses from long-form documents, addressing the ongoing issue of hallucinations in AI outputs. The introduction of a FACTS leaderboard on Kaggle allows for real-time tracking of model performance, with Gemini 2.0 Flash currently leading with a score of 83.6%.

The FACTS dataset comprises 1,719 examples that require models to produce detailed, contextually relevant responses. Researchers emphasize the importance of grounding and factuality for the future success of LLMs, noting that the benchmark will evolve as new models are developed. This initiative represents a significant step towards improving the reliability of AI-generated information.

• FACTS Grounding benchmark aims to improve LLM factuality and reduce hallucinations.

• Gemini 2.0 Flash currently leads the FACTS leaderboard with 83.6% accuracy.

Key AI Terms Mentioned in this Article

Hallucinations

Hallucinations refer to factually inaccurate responses generated by LLMs, which remain a significant challenge.

Factuality

Factuality is the measure of how accurately an LLM can generate responses based on provided documents.

Benchmark

A benchmark is a standard or point of reference against which LLM performance can be measured, such as the FACTS Grounding.

Companies Mentioned in this Article

Google DeepMind

Google DeepMind is focused on advancing AI research, exemplified by the introduction of the FACTS Grounding benchmark.

Anthropic

Anthropic is involved in AI development, with its models ranking on the FACTS leaderboard for factuality.

OpenAI

OpenAI's models are included in the FACTS leaderboard, showcasing their performance in generating accurate responses.

Get Email Alerts for AI News

By creating an email alert, you agree to AIleap's Terms of Service and Privacy Policy. You can pause or unsubscribe from email alerts at any time.

Latest Articles

Alphabet's AI drug discovery platform Isomorphic Labs raises $600M from Thrive
TechCrunch 6month

Isomorphic Labs, the AI drug discovery platform that was spun out of Google's DeepMind in 2021, has raised external capital for the first time. The $600

AI In Education - Up-level Your Teaching With AI By Cloning Yourself
Forbes 6month

How to level up your teaching with AI. Discover how to use clones and GPTs in your classroom—personalized AI teaching is the future.

Trump's Third Term - How AI Can Help To Overthrow The US Government
Forbes 6month

Trump's Third Term? AI already knows how this can be done. A study shows how OpenAI, Grok, DeepSeek & Google outline ways to dismantle U.S. democracy.

Sam Altman Says OpenAI Will Release an 'Open Weight' AI Model This Summer
Wired 6month

Sam Altman today revealed that OpenAI will release an open weight artificial intelligence model in the coming months. "We are excited to release a powerful new open-weight language model with reasoning in the coming months," Altman wrote on X.

Popular Topics