OpenAI has introduced a tool named Operator. It automates actions in a browser window through an AI model called Computer-Using Agent (CUA). The model processes screenshots and simulates mouse and keyboard input to complete tasks on websites.

The software is in a testing phase, available to Pro subscribers in the US for $200 (£161) a month. OpenAI plans a wider rollout once it has gathered enough feedback.

CUA is part of a move toward AI that executes tasks rather than simply giving text answers. Its debut follows earlier launches from Google (Project Mariner) and Anthropic, each of which presented its own browser automation system late last year.

Google’s Project Mariner, shown in December 2024, automates actions through Chrome, while Anthropic’s system, presented in October 2024, can move a cursor and perform commands. Operator takes a similar approach, with an eye on everyday use.

How Does It Work?

Operator relies on CUA, which examines a screenshot of the computer’s screen and picks an action such as clicking or typing. The model combines GPT-4o’s ability to interpret visual content with additional training.

CUA captures an image, interprets the on-screen elements, then triggers simulated inputs. The cycle repeats until it completes the request or calls on the user to step in.
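The cycle described above can be pictured as a simple loop. What follows is only a minimal sketch in Python, not OpenAI's implementation: the `Action` type, the `run_agent` function, and the `capture`, `decide` and `perform` callables are all hypothetical names invented for illustration, and the step limit is an assumption added so the loop always terminates.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """Hypothetical action record; not part of any published API."""
    kind: str          # "click", "type", "done" or "ask_user"
    detail: str = ""


def run_agent(
    capture: Callable[[], bytes],
    decide: Callable[[bytes, List[Action]], Action],
    perform: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Repeat screenshot -> decision -> simulated input until the task ends."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()                 # grab the current screen
        action = decide(screenshot, history)   # the model picks the next step
        history.append(action)
        if action.kind in ("done", "ask_user"):
            return history                     # finished, or hand back to the user
        perform(action)                        # simulate the mouse or keyboard input
    # Assumed safety valve: stop and ask the user if the task runs too long.
    history.append(Action("ask_user", "step limit reached"))
    return history


def demo():
    """Drive the loop with a scripted stand-in for the model."""
    script = iter([
        Action("click", "search box"),
        Action("type", "milk"),
        Action("done"),
    ])
    performed: List[Action] = []
    history = run_agent(
        capture=lambda: b"fake-screenshot-bytes",
        decide=lambda shot, hist: next(script),
        perform=performed.append,
    )
    return history, performed
```

In this sketch the model is replaced by a fixed script, so `demo()` simply shows two simulated inputs being carried out before the loop stops on the "done" action.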

In tests, it handles repetitive tasks well, for instance placing orders or filling in forms. Lengthy text editing and interface elements such as tables and calendars see a lower success rate, possibly because the model has had less exposure to those scenarios.

OpenAI has reported an 87% score on the WebVoyager benchmark for sites such as Amazon and Google Maps. On a test named WebArena, results drop to 58.1%. The company hopes to improve these marks over time.

How Is It Secured?

OpenAI has added multiple layers of protection against misuse. The tool refuses risky operations, for instance sending bank transfers. It also requests user approval before finalising significant actions, such as making a purchase.
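One way to picture the refuse-or-confirm behaviour is as a small policy check that runs before each action. The Python sketch below is illustrative only: the `Verdict` enum, the category names `bank_transfer` and `purchase`, and the `review` and `execute` functions are assumptions made for this example, and OpenAI has not published how its safeguards are actually built.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"
    REFUSE = "refuse"


# Hypothetical categories, taken from the two examples in the article.
REFUSED = {"bank_transfer"}
NEEDS_CONFIRMATION = {"purchase"}


def review(action_type: str) -> Verdict:
    """Classify an action before the agent is allowed to carry it out."""
    if action_type in REFUSED:
        return Verdict.REFUSE
    if action_type in NEEDS_CONFIRMATION:
        return Verdict.CONFIRM
    return Verdict.ALLOW


def execute(action_type: str, user_approves: Callable[[str], bool]) -> bool:
    """Return True only if the action is permitted to go ahead."""
    verdict = review(action_type)
    if verdict is Verdict.REFUSE:
        return False                        # never performed, whatever the user says
    if verdict is Verdict.CONFIRM:
        return user_approves(action_type)   # the user has the final word
    return True
```

Here a purchase goes ahead only when the approval callback returns `True`, while a bank transfer is declined regardless of the callback.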

Access to sensitive sites is monitored, and the tool doesn’t visit categories such as gambling or adult content. In internal tests it blocked nearly all attempts to steer it with instructions hidden in page text, though one slipped through.

Simon Willison, an engineer, warns that inventive attacks might eventually crack these safeguards once more users experiment with the model. He points out the track record of large language models succumbing to trickery. OpenAI says it will keep refining these protections and searching for gaps.

Privacy measures are present as well. Screen images go to OpenAI’s servers, and users who don’t want their data used for training can switch off model improvement settings. A single button press deletes browsing data and logs out from all sites.

Where Can It Be Found?

For now, access is limited to US-based Pro subscribers paying $200 a month, with more subscription options planned for later. The development team suggests users watch the small browser preview, especially when entering passwords or card details.

A few businesses, including DoorDash and Uber, have been enlisted for pilot schemes. The idea is to let the AI manage everyday chores that would otherwise consume time, from reordering groceries to booking transport.

OpenAI intends to refine the tool through user feedback, improve its reliability, and eventually merge these features into ChatGPT. The firm also plans to release CUA via an API, so that developers can build their own browser-based AI agents.

This could take the tedium out of form-filling and other repetitive online tasks, possibly making web usage less of a burden. OpenAI expects that, with more trials and improvement, tasks that once took multiple actions can be finished quickly with minimal user effort.






© 2025 The News Times UK. Designed and Owned by The News Times UK.