Gemini Nano and Siri AI: Your Assistant Learns to Skip the Cloud

On the Galaxy S26 and the new Siri, part of the assistant is leaving the cloud for the phone's own chip. More privacy and autonomy, with a few strings attached.

On a Galaxy S26, when a suspicious call comes in, something unusual happens: the phone listens to the conversation for you. While the caller fishes for a code, pushes you to install an app, or tries to move you onto another messaging service, an AI model spots these social-engineering moves and warns you. The detail that matters sits in one line from Google: the audio is processed ephemerally; nothing is recorded, stored on the device, or sent to anyone. The computation happens on the chip, not on some distant server.

That shift, from the cloud into your pocket, is the real story. For a decade the voice assistant lived elsewhere: you spoke, your voice raced off to a data center, the answer came back. In 2026, part of that intelligence is moving back into the phone itself. In June, at its developer conference, Apple unveiled a new Siri built on in-house models whose lightest version runs entirely on the device. Samsung and Google, for their part, run Gemini Nano locally on the Galaxy S26. The question is no longer just what the assistant can do, but where it does it.

A three-billion-parameter model in your pocket

The turn hinges on a slice of silicon: the NPU, a processor dedicated to AI computation, now etched into every high-end phone. It is what lets models once confined to servers run on the spot. Apple describes a family of models whose on-device core, called AFM 3 Core, holds three billion parameters: enough to understand a request, summarize a text, sort photos, without ever opening a connection.
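To put three billion parameters in perspective, a back-of-the-envelope calculation gives the memory the weights alone would occupy. The precisions below (16, 8, and 4 bits per weight) are illustrative assumptions, not figures published by Apple, and the estimate ignores everything else a running model needs in memory.

```python
# Rough size of a model's weights in memory.
# Assumption: the bit widths below are common choices used for illustration;
# the source does not say which precision the on-device model uses.

PARAMETERS = 3_000_000_000  # three billion, as stated for the on-device model


def weights_size_gb(parameters: int, bits_per_weight: int) -> float:
    """Return the size of the weights in gigabytes (1 GB = 10**9 bytes)."""
    total_bytes = parameters * bits_per_weight / 8
    return total_bytes / 10**9


if __name__ == "__main__":
    for bits in (16, 8, 4):
        size = weights_size_gb(PARAMETERS, bits)
        print(f"{bits:>2} bits per weight: about {size:.1f} GB")
```

Under these assumptions the weights alone run from about 6 GB at 16 bits down to 1.5 GB at 4 bits, which suggests why compression matters so much on a phone.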

The advantage is first of all physical. A request handled locally has no journey to make: no round trip to a data center, no network wait. The answer arrives almost at once, and it stays available when the network does not. The model no longer sleeps in the cloud; it lives in the device you are holding.

What you gain when the computation stays local

The first gain is privacy, and it is anything but theoretical. Take scam detection: to judge whether a call is fraudulent, you have to analyze its content, which means listening. Handing that listening to a server would mean routing your most ordinary conversations through a company's servers. By processing everything on the chip, the phone can read without keeping: the audio evaporates once analyzed. Your most intimate piece of data, your voice, never leaves your hand.

The second gain is independence from the network. Translation, transcription, and dictation that work offline mean an assistant that follows you onto a plane, into a tunnel, into a foreign country with no data plan. Where the connected assistant falls silent the moment the signal bar drops, the on-device assistant keeps going. You reclaim a capability that no longer depends on a subscription or on coverage.

The third gain is quieter: latency. An assistant that answers instantly changes how you use it. You speak to it the way you turn your head, without the delay that used to remind you, every time, that a machine on the other side of the world was thinking in your place.

The half that stays in the cloud

Still, "on the device" does not mean "everything, on the device." A three-billion-parameter model, however deft, weighs far less than the giants that answer online. It excels at short, well-framed tasks, and stalls on long reasoning or encyclopedic knowledge. Apple owns this: a local orchestrator handles what it can, then escalates the heavy requests to its servers, a setup called Private Cloud Compute, engineered to be stateless and verifiable, but which remains, by definition, outside your pocket.
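The split between chip and server can be pictured as a small routing function. What follows is a minimal sketch, not Apple's logic: the task names, the word limit, and the `route` function are all hypothetical, chosen only to show how a local-first orchestrator might escalate what it cannot handle.

```python
from dataclasses import dataclass

# Hypothetical task names and threshold, for illustration only.
LOCAL_TASKS = {"summarize", "dictate", "translate", "sort_photos"}
MAX_LOCAL_WORDS = 2_000


@dataclass
class Request:
    task: str     # e.g. "summarize" or "long_reasoning"
    text: str     # the user's input
    online: bool  # whether a network connection is available


def route(request: Request) -> str:
    """Decide where a request runs: "local", "cloud", or "unavailable"."""
    fits_local = (
        request.task in LOCAL_TASKS
        and len(request.text.split()) <= MAX_LOCAL_WORDS
    )
    if fits_local:
        return "local"   # short, well-framed task: stays on the chip
    if request.online:
        return "cloud"   # heavy request: escalated to a server
    return "unavailable"  # too heavy for the chip and no network


if __name__ == "__main__":
    print(route(Request("summarize", "a short note", online=False)))
    print(route(Request("long_reasoning", "plan a two-week trip", online=True)))
```

In this sketch the user never chooses the destination; the function does, which is precisely the point of friction.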

The boundary shifts, and above all it is invisible. You do not decide, request by request, what stays local and what leaves for elsewhere: the device arbitrates for you. The privacy promise then becomes conditional, resting on trust in how that split is made. "On the device" is a sales pitch that is half true, and that half is not always the one you assume.

Reserved for the newest phones

There is one last toll, a hardware one. Running these models demands a recent chip and plenty of memory: at Samsung, on-device intelligence requires at least twelve gigabytes of RAM and a latest-generation processor, which, as of today, rules out anything that is not a Galaxy S26. Scam detection, for its part, works only in English and in the United States for now, and it is not enabled by default.
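Stacked together, these conditions behave like a checklist in which every box must be ticked. The sketch below encodes only the requirements named above; the `Device` fields and the function are hypothetical and do not reflect how Samsung or Google actually gate the feature.

```python
from dataclasses import dataclass

MIN_RAM_GB = 12             # twelve gigabytes, as stated for Samsung
SUPPORTED_LANGUAGE = "en"   # English only, for now
SUPPORTED_COUNTRY = "US"    # United States only, for now


@dataclass
class Device:
    ram_gb: int
    latest_gen_chip: bool  # hypothetical stand-in for the processor requirement
    language: str
    country: str
    user_opted_in: bool    # the feature is not enabled by default


def scam_detection_blockers(device: Device) -> list[str]:
    """Return the reasons scam detection cannot run; an empty list means it can."""
    blockers = []
    if device.ram_gb < MIN_RAM_GB:
        blockers.append("not enough RAM")
    if not device.latest_gen_chip:
        blockers.append("processor too old")
    if device.language != SUPPORTED_LANGUAGE:
        blockers.append("language not supported")
    if device.country != SUPPORTED_COUNTRY:
        blockers.append("country not supported")
    if not device.user_opted_in:
        blockers.append("feature not enabled")
    return blockers


if __name__ == "__main__":
    older_phone = Device(8, False, "en", "US", user_opted_in=True)
    print(scam_detection_blockers(older_phone))
```

A single failed check is enough to leave the phone unprotected, however willing its owner.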

The irony bites. This protection against fraudsters is exactly what would help most on a parent's aging phone, and that is precisely where it will not arrive for a long while. The dividend of local AI, more privacy, more autonomy, goes first to those who renew their device at full price. Privacy, along the way, starts to look like a premium service.

A swing of the pendulum

For a decade, everything climbed toward the cloud: our files, our photos, our assistants. Watching a share of that intelligence come back down into the object we hold is a quiet reversal, and a rather healthy one. An assistant that reasons on the spot, with no intermediary and no signal, hands the phone's owner a little of the control the cloud had taken from them.

Provided you read the fine print. The autonomy regained is real, but bounded: by hardware you must renew, by an invisible line between chip and server, by a model that will always stay more modest than its counterpart in the cloud. The right question to ask your next phone is not "can it do this," but "does it do it without anything leaving the device." The answer, more and more often, will be: partly.