FunctionGemma: Steering Your Phone Without Calling the Cloud

Google has slipped a 270-million-parameter model into the phone: it books an event, adds a contact or turns on the flashlight without any data leaving for the cloud. At what cost?

"Create a lunch event for tomorrow at noon." The words are murmured on a subway car, with no signal. A second later the event sits in the calendar and the invite is ready to go. Nothing in that exchange has left the phone: not the voice, not the time, not the guest's name. The model that understood the command and pressed the right button fits in a handful of megabytes, parked permanently inside the device.

In December 2025 Google released FunctionGemma, a 270-million-parameter model built for a single job: turning a plain-language instruction into a function call, that is, a command the system can execute. Add a contact, switch on the flashlight, set a reminder. Tiny by the standards of frontier models, it is not meant to hold forth: it acts, and it does so where you are, without reaching for a distant server.
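How much room do 270 million parameters take? Roughly the parameter count times the bytes used to store each weight. The precision FunctionGemma actually ships in on a given phone is not stated here, so this back-of-the-envelope sketch simply tabulates a few common ones; it counts the weights alone, and runtime buffers or quantization metadata would add to these figures.

```python
# Back-of-the-envelope size of a 270M-parameter model's weights.
# Weights only: activations, runtime buffers and quantization metadata are ignored.

PARAMS = 270_000_000

# Bytes per weight at a few common storage precisions.
BYTES_PER_WEIGHT = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weights_megabytes(params: int, bytes_per_weight: float) -> float:
    """Size of the weights in decimal megabytes (1 MB = 10**6 bytes)."""
    return params * bytes_per_weight / 1_000_000


for dtype, nbytes in BYTES_PER_WEIGHT.items():
    print(f"{dtype:>8}: {weights_megabytes(PARAMS, nbytes):6.0f} MB")
```

Depending on precision, the weights land somewhere between about 135 MB (4-bit) and 540 MB (bfloat16).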

Intelligence pared to the bone

FunctionGemma is derived from Gemma 3 270M, Google's smallest open release. Its specialty is not conversation but translating intent into JSON, the structured format that apps know how to read. Its 256,000-entry vocabulary breaks JSON syntax and function names into relatively few tokens, and every token saved is computation, memory and battery spared on the phone.
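What the model hands back to the app is a structured call rather than a sentence. The sketch below assumes the call has already been extracted as a JSON object with a `name` and an `arguments` field, and checks it against a catalogue of allowed actions before anything runs. The action names, their parameters and the JSON shape are hypothetical illustrations; FunctionGemma's own output format is defined in its model card and may differ.

```python
import json

# Hypothetical catalogue: action name -> parameters it accepts.
CATALOGUE = {
    "create_calendar_event": {"title", "start"},
    "add_contact": {"first_name", "last_name", "phone"},
    "set_flashlight": {"on"},
}


def parse_call(model_output: str) -> tuple[str, dict]:
    """Decode a JSON function call and reject anything outside the catalogue."""
    call = json.loads(model_output)
    name = call["name"]
    args = call.get("arguments", {})
    if name not in CATALOGUE:
        raise ValueError(f"unknown action: {name}")
    unexpected = set(args) - CATALOGUE[name]
    if unexpected:
        raise ValueError(f"unexpected arguments for {name}: {sorted(unexpected)}")
    return name, args


# Illustrative output for "Create a lunch event for tomorrow at noon".
# The date is an arbitrary placeholder standing in for "tomorrow".
output = '{"name": "create_calendar_event", "arguments": {"title": "Lunch", "start": "2025-12-19T12:00"}}'
name, args = parse_call(output)
print(name, args)
```

Checking the call against the catalogue means a misread command fails loudly instead of triggering an action the app never offered.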

The figure that matters comes from an in-house evaluation called "Mobile Actions." Out of the box, the model picked the right action a little over half the time, 58%. Once fine-tuned on examples of those phone actions, it reached 85%. The gap sums up the method: the goal is not a universal mind but a narrow, dependable executor for a known catalogue of moves. Where a large model is asked to know everything, this one is asked only never to fumble the few things it does.
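A score like 58% or 85% is the share of test prompts for which the model produced the expected call. How "Mobile Actions" scores a call is not described here, so this sketch shows two plausible readings side by side: a strict exact match on the action name and every argument, and a looser match on the action name alone. The `Call` type, the helper names and the test cases are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    name: str
    arguments: tuple  # sorted (key, value) pairs, so argument order does not matter


def make_call(name: str, **arguments) -> Call:
    """Build a Call whose arguments compare equal regardless of order."""
    return Call(name, tuple(sorted(arguments.items())))


def _check(predicted: list, expected: list) -> None:
    if len(predicted) != len(expected) or not expected:
        raise ValueError("need two non-empty lists of equal length")


def exact_match_accuracy(predicted: list[Call], expected: list[Call]) -> float:
    """Share of prompts where the action name and every argument match."""
    _check(predicted, expected)
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


def action_accuracy(predicted: list[Call], expected: list[Call]) -> float:
    """Looser score: share of prompts where only the action name matches."""
    _check(predicted, expected)
    return sum(p.name == e.name for p, e in zip(predicted, expected)) / len(expected)


# Made-up cases: the third prediction picks the right action with a wrong time.
expected = [
    make_call("set_flashlight", on=True),
    make_call("add_contact", first_name="Ana", last_name="Silva"),
    make_call("create_calendar_event", title="Lunch", start="12:00"),
    make_call("set_flashlight", on=False),
]
predicted = [
    make_call("set_flashlight", on=True),
    make_call("add_contact", first_name="Ana", last_name="Silva"),
    make_call("create_calendar_event", title="Lunch", start="13:00"),
    make_call("set_flashlight", on=False),
]
print(exact_match_accuracy(predicted, expected), action_accuracy(predicted, expected))
```

The same predictions score differently under the two rules, which is why a headline percentage only means something alongside its scoring rule.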

FunctionGemma does not stand alone. On Android, Gemini Nano already lives inside a system service, AICore, summarizing an article, proofreading a message or transcribing speech with no connection. On iPhone, Apple Intelligence runs a model of roughly three billion parameters on the phone's own chip, backed by an on-device index of the user's data. Three approaches, one idea: bring the intelligence down into the hand instead of housing it far away.

Why "on-device" is not a detail

The difference is not one of comfort but of architecture. When the model runs locally and the feature has no permission to reach the network, the data physically cannot leave. Privacy stops being a promise written into terms of service: it becomes a material constraint. Google wraps Gemini Nano in its Private Compute Core, and Apple indexes the phone's data with differential-privacy techniques.

The benefit also shows up in seconds and in round trips avoided. No journey to a data center: the answer comes off the chip, centimeters from the screen. On the subway, on a plane or in the basement of a parking garage, the assistant keeps working when the network has given up. For anyone who dictates three reminders a day and corrects ten messages, those are frictions removed one by one, and a device that obeys without first checking in with a distant server.

The tiny brain has blind spots

The smallness that makes its virtue also sets its limit. A 270-million-parameter model does not reason, it recognizes. Its 85% reliability reads both ways: on that benchmark, roughly one command in seven is misread. Tolerable for switching on a light, far less so for confirming a transfer or sending a message to the wrong person. The fast executor remains an executor to watch, and the more it acts unprompted, the higher the stakes of that seventh time.
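The arithmetic behind "one in seven," and what it implies once commands pile up. This sketch assumes the 85% benchmark figure carries over to everyday use and that each command succeeds or fails independently of the others; neither assumption is established by the evaluation.

```python
def commands_per_miss(accuracy: float) -> float:
    """Average number of commands per misread one: 1 / error rate."""
    return 1.0 / (1.0 - accuracy)


def p_all_correct(accuracy: float, n_commands: int) -> float:
    """Probability that n independent commands are all executed correctly."""
    return accuracy ** n_commands


ACCURACY = 0.85  # the fine-tuned "Mobile Actions" figure, assumed to hold in daily use

print(f"one miss every {commands_per_miss(ACCURACY):.1f} commands")
for n in (1, 5, 10):
    print(f"{n:>2} commands, all correct: {p_all_correct(ACCURACY, n):.0%}")
```

Under these assumptions, a run of ten commands comes through without a single slip only about one time in five.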

This local intelligence also carries an entry fee. Apple Intelligence demands an iPhone 15 Pro or newer, able to hold multi-gigabyte models in memory; Gemini Nano depends on a recent Android and the presence of AICore. Older phones stay tethered to the cloud. The promise of data that never moves becomes, in practice, a privilege reserved for the newest, and therefore priciest, hardware. The privacy that an architecture can guarantee turns, quietly, into something you buy.

Other limits follow from the same design:
- The model handles only a catalogue of expected actions; any request outside it must be sent elsewhere.
- Keeping a model on the device takes up storage, and each inference draws on the battery.
- Updates to the model depend on the manufacturer, not the user.

The cloud that climbs back through the window

The subtlest point lies in the boundary itself. Consumer assistants are not purely local: they are hybrid. On Android, the agentic layer pairs an on-device Gemini Nano for simple tasks with a remote, cloud-based Gemini for complex, multi-step requests. The small model then acts as a switchboard: it decides what to handle on the spot and what to forward to the server.
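The switchboard role can be pictured as a small routing function: answer on the device when the request maps to a known action with enough confidence, forward it otherwise. The confidence score, the 0.8 default threshold and the action names below are hypothetical; this illustrates the idea and is not a description of how Android's hybrid assistant actually decides.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical on-device catalogue.
CATALOGUE = {"create_calendar_event", "add_contact", "set_flashlight"}


@dataclass
class LocalGuess:
    action: Optional[str]  # best-matching action, or None if nothing fits
    confidence: float      # hypothetical score in [0, 1] from the small model


def route(guess: LocalGuess, threshold: float = 0.8) -> str:
    """Return "local" if the small model should handle the request, else "cloud"."""
    if guess.action in CATALOGUE and guess.confidence >= threshold:
        return "local"
    return "cloud"


print(route(LocalGuess("set_flashlight", 0.97)))
print(route(LocalGuess("add_contact", 0.55)))
print(route(LocalGuess(None, 0.0)))
```

The `threshold` argument is where the line falls: lower it and more stays on the phone, raise it and more is forwarded, so whoever sets it decides how much leaves the device.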

But forwarding to the server is precisely letting the data out. The gesture you thought was local turns remote again the moment the request outgrows the chip, and the user rarely sees where the line falls. No light flashes to announce that this particular sentence is about to travel. "Local first" quietly becomes "local when convenient," set by a threshold the manufacturer chooses, not the person holding the phone.
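One way to make that boundary visible is to let a policy chosen by the user, rather than a vendor threshold alone, decide whether a request may leave, and to announce every departure. The sketch is hypothetical throughout: the `CloudPolicy` values, the `ask_user` and `notify` callbacks and the overall design are illustrations, not a feature of Android or iOS.

```python
from enum import Enum
from typing import Callable


class CloudPolicy(Enum):
    NEVER = "never"    # nothing leaves; out-of-scope requests are refused
    ASK = "ask"        # the user approves each request before it is sent
    ALWAYS = "always"  # forwarded without asking, but still announced


def dispatch(
    request: str,
    handled_locally: bool,
    policy: CloudPolicy,
    ask_user: Callable[[str], bool],
    notify: Callable[[str], None],
) -> str:
    """Decide where a request goes, never sending it off-device silently."""
    if handled_locally:
        return "local"
    if policy is CloudPolicy.NEVER:
        return "refused"
    if policy is CloudPolicy.ASK and not ask_user(request):
        return "refused"
    notify(f"Sending to the cloud: {request!r}")
    return "cloud"


# A list stands in for an on-screen indicator of what left the phone.
log: list[str] = []
print(dispatch("Summarize my week of emails", False, CloudPolicy.ASK, lambda r: False, log.append))
print(dispatch("Turn on the flashlight", True, CloudPolicy.NEVER, lambda r: False, log.append))
print(dispatch("Plan a three-city trip", False, CloudPolicy.ALWAYS, lambda r: True, log.append))
print(log)
```

Here the line sits where the user put it, and every crossing leaves a trace the user can see.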

The real question, then, is not the model's power but the location of that boundary, and the hand holding the pen that draws it. A phone that keeps the line where the user wants it truly serves them; a phone that nudges it along serves the platform first. The intelligence has indeed come down into the hand. What remains to be seen is who decides what it keeps to itself, and what it goes on whispering far away.