I built a universal AI agent that can actually use your PC

11 Jun 2026 14:51 21,394 views
A new prototype Windows tool lets AI agents control your computer like a human: clicking, typing, opening apps, and even troubleshooting problems on screen. Here’s how it works, why it matters, and what it could be used for next.

Most AI tools today can explain how to do something on your computer—but they can’t actually do it for you. This project flips that idea on its head: it’s a prototype Windows tool that lets an AI agent literally use your PC like a human, moving the mouse, clicking buttons, typing, and navigating apps based on what it sees on the screen.

What this AI agent actually does

The core idea is simple: instead of just giving you instructions, the AI gets to control your computer directly. The tool takes screenshots of your desktop, sends them to an AI model, and then executes whatever actions the model decides—clicks, key presses, scrolling, or typing.
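The loop described above can be sketched roughly as follows. This is a minimal Python illustration, not the tool's actual C# code. The names `capture_screen`, `ask_model`, and `perform` are hypothetical callables standing in for screen capture, the model API call, and input simulation, and the reply shape (`actions` plus a `done` flag) is an assumption made for the example.

```python
from typing import Callable


def run_agent(task: str,
              capture_screen: Callable[[], bytes],
              ask_model: Callable[[str, bytes], dict],
              perform: Callable[[dict], None],
              max_steps: int = 20) -> int:
    """Run the screenshot -> model -> actions loop and return the steps used."""
    for step in range(1, max_steps + 1):
        screenshot = capture_screen()
        # Assumed reply shape: {"actions": [...], "done": bool}
        reply = ask_model(task, screenshot)
        # Execute every action the model queued for this screenshot.
        for action in reply.get("actions", []):
            perform(action)
        if reply.get("done"):
            return step
    # The step limit was reached before the model reported completion.
    return max_steps
```

Passing the three callables in keeps the loop independent of any one operating system or model provider, which makes it easy to exercise with fakes.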

In one demo, the AI is asked to open Microsoft Paint and draw a self-portrait. With no special integrations or plugins, it:

• Installs or opens MS Paint
• Navigates the interface visually
• Selects the brush tool and different colors
• Draws a face, eyes, mouth, and hair by clicking and dragging on the canvas

Everything is done the way a human would, by looking at the screen and interacting with visible UI elements.

Real-world use case: troubleshooting your PC

Beyond fun demos, this kind of agent is genuinely useful for troubleshooting. For example, imagine you can’t delete a folder and Windows only shows a vague error message. With this tool, you can simply ask the AI, “Why can’t I delete this test folder?” and let it investigate.

In the example shared, the agent:

• Opens the Run dialog
• Launches Resource Monitor
• Switches to the CPU tab
• Searches the Associated Handles pane for the folder name
• Identifies that a Command Prompt (cmd.exe) process is locking the folder

Instead of you needing to know where Resource Monitor is or how to search handles, the AI figures it out visually and explains the cause. Future versions could even take the next step—closing the culprit process and then deleting the folder automatically.

How the agent sees and controls your computer

This tool is designed to mimic how a human uses a PC, not how a traditional automation script works. It doesn’t rely on app-specific integrations, plugins, or UI Automation hooks. In principle, if a human can use an app by looking at the screen, the AI can too.

Under the hood, the agent can perform a full range of input actions:

• Mouse: left, middle, and right click; double-click; click-and-drag; move without clicking; scroll
• Keyboard: type text; send key combinations with modifiers (Ctrl, Alt, Shift, etc.)
• Timing: wait between actions when needed
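One way to represent this action set is a small validated schema that the model's reply is parsed into before anything touches the mouse or keyboard. The sketch below is illustrative only: the action names, field names, and JSON layout are assumptions, not the tool's real wire format.

```python
import json
from dataclasses import dataclass

MOUSE_BUTTONS = {"left", "middle", "right"}

# Hypothetical action kinds and the fields each one must carry.
REQUIRED = {
    "click": {"x", "y", "button"},
    "double_click": {"x", "y"},
    "drag": {"x", "y", "to_x", "to_y"},
    "move": {"x", "y"},
    "scroll": {"x", "y", "amount"},
    "type": {"text"},
    "keys": {"combo"},   # e.g. ["ctrl", "shift", "esc"]
    "wait": {"ms"},
}


@dataclass(frozen=True)
class Action:
    kind: str
    params: dict


def parse_actions(raw: str) -> list[Action]:
    """Parse a JSON array of actions, rejecting unknown or incomplete ones."""
    actions = []
    for item in json.loads(raw):
        kind = item.get("type")
        if kind not in REQUIRED:
            raise ValueError(f"unknown action type: {kind!r}")
        params = {k: v for k, v in item.items() if k != "type"}
        missing = REQUIRED[kind] - set(params)
        if missing:
            raise ValueError(f"{kind} is missing {sorted(missing)}")
        if kind == "click" and params["button"] not in MOUSE_BUTTONS:
            raise ValueError(f"unknown mouse button: {params['button']!r}")
        actions.append(Action(kind, params))
    return actions
```

Validating before executing means a malformed model reply fails loudly instead of producing a stray click.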

Each of these actions is wired directly into the Windows API, so the EXE can reliably simulate real user input without extra dependencies.
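The article doesn't say which Windows input function the tool calls. If it were `SendInput` with the `MOUSEEVENTF_ABSOLUTE` flag (an assumption), pixel coordinates would first have to be normalized to the 0 to 65535 range that flag expects. A minimal sketch of that conversion, written in Python for illustration and with a hypothetical helper name, assuming a single display:

```python
def to_absolute(x: int, y: int, width: int, height: int) -> tuple[int, int]:
    """Map a pixel position to the 0..65535 range used for absolute mouse input."""
    if width < 2 or height < 2:
        raise ValueError("screen must be at least 2x2 pixels")
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError("point is outside the screen")
    # The top-left pixel maps to 0 and the bottom-right pixel to 65535.
    return (round(x * 65535 / (width - 1)),
            round(y * 65535 / (height - 1)))
```

Because the target range is independent of resolution, the same conversion works at 4K as at 1080p; only `width` and `height` change.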

Compatible with major AI models

The agent is model-agnostic and works with any of the three major AI providers:

• OpenAI
• Google (e.g., Gemini)
• Anthropic (Claude)

You provide your own API key for the model you want to use, and the tool sends screenshots plus context to that model. The model responds with a plan of actions, which the EXE then executes on your machine.
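A model-agnostic design like this usually comes down to one small interface that each provider adapter implements. The sketch below shows the idea in Python. The `VisionModel` protocol, the registry, and `FixedPlanModel` are all hypothetical names, and the real OpenAI, Google, and Anthropic adapters are deliberately left out rather than guessed at.

```python
from typing import Callable, Protocol


class VisionModel(Protocol):
    """What the agent needs from any provider: screenshot in, actions out."""

    def plan(self, task: str, screenshot_png: bytes) -> list[dict]:
        ...


class FixedPlanModel:
    """Stand-in provider for local testing; it ignores the screenshot."""

    def __init__(self, api_key: str, actions: list[dict]):
        self.api_key = api_key
        self._actions = actions

    def plan(self, task: str, screenshot_png: bytes) -> list[dict]:
        return list(self._actions)


_REGISTRY: dict[str, Callable[..., VisionModel]] = {}


def register(name: str, factory: Callable[..., VisionModel]) -> None:
    _REGISTRY[name.lower()] = factory


def create_model(name: str, **kwargs) -> VisionModel:
    """Look up a provider by name and build it with the user's settings."""
    try:
        factory = _REGISTRY[name.lower()]
    except KeyError:
        raise ValueError(f"unsupported provider: {name}") from None
    return factory(**kwargs)


register("fixed", FixedPlanModel)
```

With this shape, supporting another provider means registering one more factory; the rest of the agent only ever sees `plan`.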

Newer fast models—especially lightweight vision-capable ones like Gemini 3.5 Flash—help keep latency manageable. The agent can also queue multiple actions for the same screen (for example, drawing several elements in Paint) to reduce the number of round trips.

No Python, no Node.js, no complex setup

Most existing AI agent frameworks are aimed at developers. They often require Python, Node.js, multiple packages, and a fair bit of configuration before you see anything work. This project takes a different approach: it’s a single compiled EXE written in C# using only .NET and Microsoft packages.

Key points about the architecture:

• One-file deployment: just run the EXE, no runtime installation required
• No third-party dependencies: no Python, Node.js, or extra frameworks
• Built for high-resolution screens: tested at 4K with normal scaling, unlike many tools that require a low resolution or downscaling to 1080p or below

Right now, it only supports Windows, but the design allows for macOS and Linux backends if someone implements the necessary OS-specific input and screen-capture layers.

Why “computer use” is such a big deal

Many AI companies talk about “agents” and “computer use” APIs, but most of those are just interfaces for describing what an AI would do, not actually doing it. They still rely on a separate app or framework to click and type on your behalf—and those apps often don’t exist or are hard to set up.

This project argues that real computer control is one of the most practical uses of AI agents. Imagine if your built-in assistant on Windows could:

• Open system settings and navigate to the exact HDR toggle instead of giving outdated text instructions
• Adjust display, audio, or privacy settings by itself
• Install and configure apps while you watch
• Perform repetitive admin tasks across multiple programs

Instead of a chatbot that tells you what to do, you’d have an operator that just does it.

If you’re exploring the broader agent ecosystem, this project pairs well with roundups of AI agents that are actually worth using, though its focus is squarely on direct desktop control.

Prototype status and limitations

This tool is very much a prototype, not something you’d deploy in a production environment yet. A few important limitations and design choices:

• Not a 24/7 agent: it’s meant for task-based usage—give it a job, let it run, then stop it
• Step-limited: you can configure a maximum number of steps so it doesn’t run indefinitely
• Latency: every decision requires a screenshot and a model response, so it’s slower than a human who knows exactly where to click
• Manual API setup: you must bring your own API key, and usage will cost money depending on the model and volume

You can pause the agent mid-task or send new instructions while it’s working, which makes it more interactive. But it’s still early-stage software, with rough edges and room for improvement.
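Pausing and mid-task instructions could be handled by a small control object that the agent checks between steps. The following is a single-threaded Python sketch under that assumption; `AgentControl` and `next_task` are hypothetical names, and the real tool may work quite differently.

```python
from collections import deque


class AgentControl:
    """Holds the pause flag and any instructions sent while the agent runs."""

    def __init__(self) -> None:
        self.paused = False
        self._instructions: deque[str] = deque()

    def pause(self) -> None:
        self.paused = True

    def resume(self) -> None:
        self.paused = False

    def send(self, text: str) -> None:
        self._instructions.append(text)

    def drain(self) -> list[str]:
        """Return queued instructions in order and empty the queue."""
        items = list(self._instructions)
        self._instructions.clear()
        return items


def next_task(task: str, control: AgentControl) -> str:
    """Fold queued instructions into the task text for the next model call."""
    extra = control.drain()
    if not extra:
        return task
    return task + "\n" + "\n".join(f"Update: {e}" for e in extra)
```

An agent loop would skip its next step while `control.paused` is set and call `next_task` just before each model request, so new instructions take effect on the following screenshot.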

Planned improvements and future ideas

There’s a long list of potential upgrades that could make this kind of agent dramatically more powerful and reliable:

• Clipboard access: so the AI can copy, paste, and inspect text or images directly
• UI Automation support: using accessibility and automation APIs to “see” elements behind the scenes, improving accuracy and reducing mis-clicks
• Better explanations: clearer summaries of what went wrong or what the agent discovered during troubleshooting
• Conversation continuity: the ability to easily chain tasks, like “find what’s locking this folder” followed by “fix it and delete the folder”

Microsoft’s own research frameworks, like UFO, already explore some of these ideas for UI Automation, but they’re not packaged as user-friendly apps. This project aims to bridge that gap by turning research concepts into something you can actually run on your desktop.

Availability, licensing, and usage

The tool is published on GitHub under a source-available license. It’s free for personal use but not licensed for commercial use at this stage. The creator is keeping tight control over the codebase for now and isn’t accepting pull requests, so feedback is welcome, but any changes go through the creator directly.

Because it’s a prototype, you should treat it as an experimental tool:

• Expect bugs and quirks
• Don’t rely on it for critical workflows
• Be mindful of what you let it control, especially with elevated permissions

That said, if you’re interested in AI agents and desktop automation, it’s a fascinating glimpse into what near-future tools could look like. It also complements more specialized AI workflows, such as those described in guides on how an AI-powered SaaS was built with agent-style tools.

Why this matters for the future of AI agents

AI agents are often pitched as “digital workers,” but most of them are still trapped inside browsers, APIs, or narrow integrations. Letting an AI see your screen and control your computer safely is a critical step toward agents that can genuinely replace tedious, click-heavy tasks.

Instead of living in the background 24/7, this kind of agent is designed to be a focused helper: you sit at your PC, give it a task you don’t want to do, and watch it handle the boring parts. As models get faster and more reliable, this approach could become one of the most practical, everyday uses of AI on the desktop.
