Google Gemini’s Task Automation Shows Promise Despite Early Hiccups

Overview of Gemini Task Automation

Google has introduced a beta version of task automation for its Gemini AI on Android devices. The feature lets Gemini interact directly with apps such as food‑delivery and rideshare services, performing actions on the user’s behalf while the phone is used for other tasks.

How the Feature Works

When a user initiates an automation, Gemini operates in the background. A small text bar appears at the bottom of the screen describing the current step, such as selecting menu items in a food‑ordering app. The AI proceeds through the app's interface, tapping, scrolling, and entering information, until it reaches the final confirmation screen, where the user must manually approve the order or ride.
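
For readers curious what such a screen-driven loop looks like in code, here is a minimal Kotlin sketch of the general observe-decide-act pattern agents of this kind follow. Every type and name below is an illustrative assumption, not Gemini's actual implementation.

```kotlin
// Schematic observe-decide-act loop for a screen-driving agent.
// All names and types are hypothetical; this is not Gemini's code.

sealed interface Action {
    data class Tap(val x: Int, val y: Int) : Action
    data class Type(val text: String) : Action
    data class Scroll(val dy: Int) : Action
    object AwaitUserConfirmation : Action // hand control back to the user
}

// A textual description of the current screen, e.g. an accessibility-tree dump.
interface Screen { fun describe(): String }

// The model picks the next action from the goal and the current screen state.
interface Model { fun nextAction(goal: String, screen: String): Action }

// The device executes taps, typing, and scrolls, and reports the screen.
interface Device {
    fun currentScreen(): Screen
    fun perform(action: Action)
}

fun runAutomation(goal: String, model: Model, device: Device) {
    while (true) {
        val screen = device.currentScreen()
        val action = model.nextAction(goal, screen.describe())
        if (action is Action.AwaitUserConfirmation) return // user approves the final step
        device.perform(action) // tap, type, or scroll, then observe again
    }
}
```

The behavior the article describes, stopping at the final confirmation screen, corresponds to the loop exiting on an AwaitUserConfirmation step so the human can approve the order or ride.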

Performance and User Experience

Early testing shows that Gemini is noticeably slower than a human performing the same actions. Testers reported that a simple Uber order can take several minutes, and the AI sometimes makes incorrect selections that it later corrects on its own. The system can also pause when an app requires additional permissions, such as location access, or when a delivery address needs adjustment. After the issue is resolved, the automation can be restarted without trouble.

Accuracy and Reliability

Despite the slower pace, the AI’s final outputs are often accurate. In most cases, only minor tweaks are needed before confirming an order. When failures occur, they tend to happen early in the process, usually within the first couple of minutes, and are linked to app‑specific prompts that the AI cannot handle automatically.

Practical Use Cases Demonstrated

Testers tried several scenarios. One involved ordering a chicken teriyaki combo, where Gemini correctly added two half‑portion servings after interpreting the menu layout. Another scenario scheduled a ride to an airport based on a calendar entry. By accessing email and calendar data, Gemini identified flight details, suggested appropriate departure times, and set up an Uber reservation after user confirmation.

Implications for Future Mobile Assistants

The experience underscores that current mobile apps, designed primarily for human interaction, are not optimized for AI control. Gemini's current approach, which reasons its way through interfaces built for human fingers and eyes, highlights the need for developers to adopt more robust integration methods, such as the Model Context Protocol (MCP) or Android's app functions, which would give AI agents cleaner, structured data interfaces.
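
To make the contrast concrete, here is a hedged sketch of the kind of structured interface such approaches aim for: instead of an agent tapping through screens, the app exposes typed functions an assistant can call directly. The interface, types, and names below are assumptions for illustration, not the actual MCP or Android app-functions API.

```kotlin
// Hypothetical structured app actions of the kind MCP or Android app
// functions aim to standardize. Names and types are illustrative only.

data class RideRequest(val pickup: String, val dropoff: String, val arriveBy: String)
data class RideQuote(val etaMinutes: Int, val fareEstimate: String)

// An agent calls typed functions the app itself exposes and receives
// structured data back, instead of parsing pixels or view hierarchies.
interface RideActions {
    fun quoteRide(request: RideRequest): RideQuote
    // Returns a confirmation id; the booking still waits for user approval.
    fun reserveRide(request: RideRequest): String
}

fun main() {
    // With such an interface, the airport-ride scenario above becomes a
    // single typed call rather than minutes of simulated taps.
    val request = RideRequest(
        pickup = "Home",
        dropoff = "Airport Terminal 2",
        arriveBy = "2025-06-01T07:30",
    )
    println(request) // an agent would pass this to quoteRide / reserveRide
}
```

The design point is that typed inputs and outputs remove the guesswork that makes screen-driving slow and error-prone, while still leaving the final confirmation in the user's hands.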

Industry Perspective

Sameer Samat, Google’s head of Android, noted that Gemini’s current method is a stopgap that leverages reasoning while other integration techniques mature. Observers see the beta as a glimpse of what AI‑driven mobile assistants could become once apps are built with AI compatibility in mind.

Conclusion

While Gemini’s task automation is still slow, occasionally error‑prone, and limited to a handful of services, it demonstrates a functional AI assistant capable of navigating real‑world apps. The feature’s limitations point to a broader industry challenge: redesigning app interfaces to better serve AI agents. If developers adopt newer protocols, future versions of Gemini could provide a smoother, faster, and more reliable mobile assistant experience.
Source: The Verge