Microsoft's new free tool OmniParser V2 gives more power to large language models (LLMs)

Microsoft has launched OmniParser V2, a new open-source tool designed to help large language models (LLMs) automate graphical user interfaces (GUIs). The model is trained on an extensive dataset covering interactive element detection and icon functional captions, which improves its performance. By reducing the input image size of the icon caption model, OmniParser V2 achieves a 60% reduction in latency compared to its predecessor.

OmniParser V2 addresses two key challenges LLMs face in GUI automation: identifying which icons are interactable and understanding what each one does. The tool tokenizes UI screenshots into structured elements, so an AI model can predict the next action from the parsed data. With state-of-the-art accuracy on the ScreenSpot Pro benchmark, OmniParser V2 represents a significant advance in AI's ability to interact with user interfaces.
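To make the idea of "tokenizing" a screenshot concrete, here is a minimal Python sketch of what structured screen elements and the resulting LLM prompt might look like. It is an illustration only: the `UIElement` schema, the field names, and the helper functions are hypothetical and are not OmniParser's actual output format or API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class UIElement:
    """One parsed screen element (hypothetical schema, not OmniParser's real output)."""
    element_id: int
    caption: str                      # functional description of the icon or text
    bbox: Tuple[int, int, int, int]   # (left, top, right, bottom) in pixels
    interactable: bool


def serialize_elements(elements: List[UIElement]) -> str:
    """Render each interactable element as one text line for an LLM prompt."""
    lines = []
    for el in elements:
        if not el.interactable:
            continue  # non-interactable elements cannot be the target of an action
        lines.append(f"[{el.element_id}] {el.caption} @ {el.bbox}")
    return "\n".join(lines)


def build_prompt(task: str, elements: List[UIElement]) -> str:
    """Combine the user's task with the parsed elements into a single prompt."""
    return (
        f"Task: {task}\n"
        "Interactable elements:\n"
        f"{serialize_elements(elements)}\n"
        "Reply with the id of the element to click next."
    )


def click_point(element: UIElement) -> Tuple[int, int]:
    """Return the center of the bounding box as the pixel to click."""
    left, top, right, bottom = element.bbox
    return ((left + right) // 2, (top + bottom) // 2)
```

In this sketch the LLM sees only short text lines such as an id, a caption, and a bounding box, and replies with an element id; `click_point` then turns that choice back into screen coordinates for the automation layer.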

• OmniParser V2 reduces latency by 60% compared to its predecessor.

• The tool achieves state-of-the-art accuracy on the ScreenSpot Pro benchmark.

Key AI Terms Mentioned in this Article

Large Language Models (LLMs)

LLMs are deep-learning models pre-trained on vast text datasets, enabling them to understand and generate language across a wide range of tasks.

Graphical User Interface (GUI)

GUIs are visual interfaces that allow users to interact with software through graphical elements.

Tokenization

Tokenization in this context refers to converting UI screenshots into structured, interpretable elements for AI models.

Companies Mentioned in this Article

Microsoft

Microsoft is a leading technology company that develops AI tools like OmniParser V2 to enhance user interface automation.
