Google DeepMind researchers have unveiled a new benchmark called FACTS Grounding, aimed at enhancing the factual accuracy of large language models (LLMs). This benchmark evaluates LLMs based on their ability to generate accurate responses from long-form documents, addressing the ongoing issue of hallucinations in AI outputs. The introduction of a FACTS leaderboard on Kaggle allows for real-time tracking of model performance, with Gemini 2.0 Flash currently leading with a score of 83.6%.
The FACTS dataset comprises 1,719 examples that require models to produce detailed, contextually relevant responses. Researchers emphasize the importance of grounding and factuality for the future success of LLMs, noting that the benchmark will evolve as new models are developed. This initiative represents a significant step towards improving the reliability of AI-generated information.
• FACTS Grounding benchmark aims to improve LLM factuality and reduce hallucinations.
• Gemini 2.0 Flash currently leads the FACTS leaderboard with 83.6% accuracy.
Hallucinations refer to factually inaccurate responses generated by LLMs, which remain a significant challenge.
Factuality is the measure of how accurately an LLM can generate responses based on provided documents.
A benchmark is a standard or point of reference against which LLM performance can be measured, such as the FACTS Grounding.
Google DeepMind is focused on advancing AI research, exemplified by the introduction of the FACTS Grounding benchmark.
Anthropic is involved in AI development, with its models ranking on the FACTS leaderboard for factuality.
OpenAI's models are included in the FACTS leaderboard, showcasing their performance in generating accurate responses.
VentureBeat 13month
ABC (Australian Broadcasting Corporation) 7month
Isomorphic Labs, the AI drug discovery platform that was spun out of Google's DeepMind in 2021, has raised external capital for the first time. The $600
How to level up your teaching with AI. Discover how to use clones and GPTs in your classroom—personalized AI teaching is the future.
Trump's Third Term? AI already knows how this can be done. A study shows how OpenAI, Grok, DeepSeek & Google outline ways to dismantle U.S. democracy.
Sam Altman today revealed that OpenAI will release an open weight artificial intelligence model in the coming months. "We are excited to release a powerful new open-weight language model with reasoning in the coming months," Altman wrote on X.