Microsoft Copilot Vision is here, letting AI see what you do online




Microsoft Copilot is getting smarter by the day. The Satya Nadella-led company has just announced that its AI assistant now has ‘vision’ capabilities that enable it to browse the internet with users.

While the feature was first announced in October this year, the company is now previewing it with a select set of Pro subscribers. According to Microsoft, these users will be able to trigger Copilot Vision on webpages open in their Edge browser and ask it about the content visible on the screen.

The feature is still in the early stages of development and fairly restricted, but once fully evolved, it could prove to be a game-changer for Microsoft’s enterprise customers, helping them with analysis and decision-making as they work with products in the company’s ecosystem (OneDrive, Excel, SharePoint, etc.).

In the long run, it will also be interesting to see how Copilot Vision fares against more open and capable agentic offerings, such as those from Anthropic and Emergence AI, which let developers integrate agents that see, reason and take actions across applications from different vendors.

What to expect with Copilot Vision?

When a user opens a website, they may or may not have a goal in mind. When they do, such as researching an academic paper, completing the task means going through the website, reading its content and then making a decision about it (for example, whether the content should be used as a reference for the paper). The same applies to other day-to-day web tasks like shopping.

With the new Copilot Vision experience, Microsoft aims to make this entire process simpler. Essentially, the user now has an assistant that sits at the bottom of their browser and can be called upon whenever needed to read the contents of the website, covering all text and images, and help with decision-making.

It can immediately scan, analyze and provide all the required information, considering the intended goal of the user — just like a second set of eyes.

The capability has far-reaching benefits, since it can accelerate your workflows in no time, as well as major implications, given that the agent is reading and assessing whatever you’re browsing. However, Microsoft has assured users that all the context and information they share is deleted as soon as the Vision session is closed. It also noted that websites’ data is not captured or stored for training the underlying models.

“In short, we’re prioritizing copyright, creators, and our user’s privacy and safety – and are putting them all first,” the Copilot team wrote in a blog post announcing the preview of the capability.

Expansion based on feedback

Currently, a select set of Copilot Pro subscribers in the US who have signed up for the early-access Copilot Labs program will be able to use the vision capabilities in their Edge browser. The capability is opt-in, which means they don’t have to worry about AI reading their screens all the time.

Further, at this stage, it will only work with select websites. Microsoft says it will take feedback from the early users and gradually improve the capability while expanding support to more Pro users and other websites. 

In the long run, the company may even expand these capabilities to other products in its ecosystem, such as OneDrive and Excel, allowing enterprise users to work and make decisions more easily. However, there’s no official confirmation yet. Not to mention, given the cautious approach signaled here, it may take some time to become a reality. 

Microsoft’s move to launch Copilot Vision’s preview comes at a time when competitors are raising the bar in the agentic AI space. Salesforce has already rolled out Agentforce across its Customer 360 offerings to automate workflows across domains like sales, marketing and service.

Meanwhile, Anthropic has launched ‘Computer Use,’ which allows developers to integrate Claude to interact with a computer desktop environment, performing tasks that were previously handled only by human workers, such as opening applications, interacting with interfaces and filling out forms.


