Microsoft’s new Fara-7B is a 7-billion-parameter computer-use agent (CUA) model from Microsoft Research, built for UI automation that runs directly on your device rather than in the cloud. The model takes screenshots, the action history, and a task goal as input and generates concrete actions such as mouse clicks, keyboard input, and coordinate selections. It also implements Critical Points: moments where it pauses and asks for user approval before taking a potentially irreversible action.
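To make the observe, predict, act cycle and the approval pause concrete, here is a minimal sketch of such a loop. Everything in it is an assumption for illustration: the `Action` schema, the `irreversible` flag, and the callables passed to `run_agent` are hypothetical names, not Fara-7B's actual output format or API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    """One step proposed by the model (hypothetical schema, not Fara-7B's real format)."""
    kind: str                                  # e.g. "click", "type", "done"
    coords: Optional[Tuple[int, int]] = None   # pixel target for clicks
    text: Optional[str] = None                 # payload for keyboard input
    irreversible: bool = False                 # assumed marker for a critical point


def run_agent(
    goal: str,
    take_screenshot: Callable[[], bytes],
    predict: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    approve: Callable[[Action], bool],
    max_steps: int = 20,
) -> List[Action]:
    """Observe, predict, and act until the model signals completion."""
    history: List[Action] = []
    for _ in range(max_steps):
        # The model sees the goal, the current screen, and what it has done so far.
        action = predict(goal, take_screenshot(), history)
        if action.kind == "done":
            break
        # Critical point: hand control back to the user before anything irreversible.
        if action.irreversible and not approve(action):
            break
        execute(action)
        history.append(action)
    return history
```

In this sketch the loop stops as soon as the user declines a flagged action, so nothing irreversible runs without consent. A real integration would replace `predict` with a call to the model and `execute` with an input-automation layer.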
On-device AI that sees your screen and takes action, without the cloud.
What sets Fara-7B apart is that it works directly from the pixels on screen, without requiring access to underlying code or accessibility features. In principle, this lets it work with virtually any application interface, including proprietary software and complex web applications. The model consolidates perception, reasoning, and action generation into a single neural network rather than chaining multiple models together. It supports a context window of 128k tokens, which lets it reason over long UI interaction sequences.
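Pixel-level interaction means the model's output is ultimately a coordinate on a screenshot. One practical detail, sketched below, is mapping that coordinate back to the real display. This assumes the screenshot was resized before being passed to the model, which the article does not state; the function name and parameters are hypothetical.

```python
from typing import Tuple


def to_screen_coords(
    model_xy: Tuple[float, float],
    screenshot_size: Tuple[int, int],
    screen_size: Tuple[int, int],
) -> Tuple[int, int]:
    """Map a point predicted on a (possibly resized) screenshot to real screen pixels."""
    # Scale factors between the image the model saw and the physical display.
    sx = screen_size[0] / screenshot_size[0]
    sy = screen_size[1] / screenshot_size[1]
    x = round(model_xy[0] * sx)
    y = round(model_xy[1] * sy)
    # Clamp so a slightly out-of-range prediction still lands on screen.
    x = min(max(x, 0), screen_size[0] - 1)
    y = min(max(y, 0), screen_size[1] - 1)
    return x, y
```

For example, a click predicted at `(640, 360)` on a 1280×720 screenshot maps to `(1280, 720)` on a 2560×1440 display.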
The training process involved approximately 145,000 verified trajectories covering over 1 million individual interaction steps. Microsoft’s FaraGen system generated synthetic data from real webpages and human task examples, creating a cost-efficient pipeline that produced high-quality training examples for roughly $1 per verified trajectory.
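A quick back-of-the-envelope calculation puts these figures in perspective. The inputs are the approximate numbers quoted above; since the text says "over 1 million" steps, the average is a lower bound, and the total cost is only as precise as the "roughly $1" estimate.

```python
trajectories = 145_000      # verified trajectories (approximate, from the text)
steps = 1_000_000           # lower bound: the text says "over 1 million"
cost_per_trajectory = 1.0   # USD, rough figure from the text

avg_steps = steps / trajectories
total_cost = trajectories * cost_per_trajectory

print(f"average steps per trajectory: at least {avg_steps:.1f}")
print(f"approximate data generation cost: ${total_cost:,.0f}")
```

Under these assumptions, a typical trajectory is about seven steps long, and the full verified dataset would have cost on the order of $145,000 to generate.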
In benchmark testing, Fara-7B achieved a 73.5% success rate on the WebVoyager web navigation benchmark, outperforming larger models like GPT-4o when both were evaluated as computer-use agents. The model handles common tasks such as:
- Filling out forms
- Extracting information from websites
- Updating account details
- Completing booking and purchasing workflows
- Managing cross-site operations
Privacy benefits are substantial since all processing occurs on your device. Screenshots and interaction data stay on your computer, which can help organizations meet requirements under regulations such as HIPAA and GLBA, although local processing alone does not guarantee compliance. The model’s compact size allows it to run on consumer-grade GPUs and Copilot+ PCs with NPU acceleration, delivering low-latency performance without cloud dependencies.
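To see why a 7-billion-parameter model is plausible on consumer hardware, it helps to estimate the memory needed for the weights alone at different numeric precisions. This is a rough sketch: the article does not say which precision or quantization the on-device builds use, and the estimate ignores activations, the attention cache, and runtime overhead.

```python
PARAMS = 7_000_000_000  # nominal parameter count from the model name

# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: int, bytes_per_param: float) -> float:
    """Memory for the weights only, in gigabytes (10**9 bytes)."""
    return params * bytes_per_param / 1e9


for name, size in BYTES_PER_PARAM.items():
    print(f"{name:>9}: {weight_memory_gb(PARAMS, size):.1f} GB")
```

At 16-bit precision the weights come to about 14 GB, while 4-bit quantization would bring them down to about 3.5 GB.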
Microsoft has made Fara-7B widely available through Microsoft Foundry, Hugging Face, and an open GitHub repository, enabling developers and users to experiment with and integrate this powerful automation technology into their workflows.