Google DeepMind researchers introduce new benchmark to improve LLM factuality, reduce hallucinations

Google DeepMind researchers have unveiled FACTS Grounding, a new benchmark aimed at improving the factual accuracy of large language models (LLMs). The benchmark evaluates how accurately an LLM generates responses grounded in long-form documents, targeting the persistent problem of hallucinations in AI outputs. An accompanying FACTS leaderboard on Kaggle allows real-time tracking of model performance; Gemini 2.0 Flash currently leads with a score of 83.6%.

The FACTS dataset comprises 1,719 examples that require models to produce detailed, contextually relevant responses. Researchers emphasize the importance of grounding and factuality for the future success of LLMs, noting that the benchmark will evolve as new models are developed. This initiative represents a significant step towards improving the reliability of AI-generated information.

Key AI Highlights in this Article

• FACTS Grounding benchmark aims to improve LLM factuality and reduce hallucinations.

• Gemini 2.0 Flash currently leads the FACTS leaderboard with a score of 83.6%.

Key AI Terms Mentioned in this Article

Hallucinations

Hallucinations refer to factually inaccurate responses generated by LLMs, which remain a significant challenge.

Factuality

Factuality measures how accurately an LLM generates responses based on provided documents.

Benchmark

A benchmark is a standard or point of reference against which LLM performance can be measured, such as FACTS Grounding.

Companies Mentioned in this Article

Google DeepMind

Google DeepMind is focused on advancing AI research, exemplified by the introduction of the FACTS Grounding benchmark.

Anthropic

Anthropic is involved in AI development, with its models ranking on the FACTS leaderboard for factuality.

OpenAI

OpenAI's models are included in the FACTS leaderboard, showcasing their performance in generating accurate responses.

Source: VentureBeat

