The Library of Congress, with its vast collection of 180 million works, is becoming a hotspot for AI companies seeking to train their language models. This digital archive, containing 185 petabytes of data, offers a wealth of public domain content that is free from copyright restrictions. AI developers are particularly drawn to this resource as they look for legal and diverse data to enhance their models.
Access to the Library's data is provided through an API, which has seen significant traffic growth since its launch in September 2022. Major companies such as OpenAI, Amazon, and Microsoft are exploring ways to use this data in AI applications, including improving library services and analyzing historical documents. Challenges remain, however, including the need to preserve historical accuracy and the risk of AI-generated inaccuracies.
• Library of Congress data is increasingly sought after by AI startups.
• AI companies face challenges in historical accuracy when using contemporary models.
AI language models are being trained on diverse datasets from the Library of Congress.
The Library's API allows users to access its vast data resources in a structured manner.
The Library of Congress's collections are appealing to AI developers because they consist largely of public domain materials.
OpenAI is interested in utilizing the Library's data to enhance its language models.
Amazon is exploring the Library of Congress as a resource for improving its AI capabilities.
Microsoft is also looking to leverage the Library's data for AI applications.
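The structured access mentioned above can be sketched in Python. This is only a minimal illustration, assuming the public loc.gov search endpoint returns JSON when `fo=json` is passed, that `c` and `sp` set the page size and page number, and that each entry under `results` carries a `title` field. The helper names `build_search_url` and `extract_titles` are hypothetical and not part of any official client.

```python
from urllib.parse import urlencode

# Assumption: the loc.gov search endpoint returns JSON when `fo=json` is set.
BASE_URL = "https://www.loc.gov/search/"


def build_search_url(query, per_page=25, page=1):
    """Build a search URL that asks for JSON output (hypothetical helper)."""
    params = {
        "q": query,     # free-text search terms
        "fo": "json",   # requested output format
        "c": per_page,  # assumed: number of results per page
        "sp": page,     # assumed: page number to fetch
    }
    return f"{BASE_URL}?{urlencode(params)}"


def extract_titles(payload):
    """Collect item titles from a decoded response.

    Assumed shape: {"results": [{"title": "..."}, ...]}.
    Items without a title yield an empty string.
    """
    return [item.get("title", "") for item in payload.get("results", [])]
```

A caller would fetch the URL with any HTTP client, decode the body as JSON, and pass the resulting dictionary to `extract_titles`. Keeping the URL construction separate from the network call makes the request easy to inspect before any traffic is sent.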