What AI Chooses Not to See

In 2017, a founding paper declared that attention was all you needed. In 2026 the industry is quietly walking it back. To contain the cost, new models no longer learn to retain everything, but to choose what they will not look at.

In 2017, a paper with an almost arrogant title, "Attention Is All You Need," founded modern artificial intelligence. Its idea fit in a sentence: to understand a text, every word must consult every other. This total attention gave models their fluency. It also saddled them with a cost the industry now finds unbearable, because it grows with the square of the text's length.

On June 1, 2026, the Chinese firm MiniMax released M3, an open-weight model able to handle a million words in one pass. The trick has a name: sparse attention. Instead of making each word look at all the others, a lightweight branch first sorts which blocks of memory matter, and only those are read. MiniMax reports a per-word compute cost cut to one twentieth at a million words, and a setup phase nearly ten times faster.

Forgetting Becomes a Method

M3 is not an isolated case. At the 2026 ICLR conference, Google presented a method to compress the memory of words already read; everywhere, labs are multiplying ways to erase, pack down, and discard. At a million words, that memory devours 70 to 90 percent of a graphics card. The shared bet is plain: a machine that retains everything is too costly to run. So it is taught to forget, selectively.

The reversal deserves a pause. For eight years, progress meant seeing more, longer, finer. Here the gain comes from the opposite gesture, that of not looking. The craft no longer lies in remembering everything, but in choosing what to neglect without it showing. Performance shifts from memory to triage.

The Sorting No One Oversees

This economy resembles human attention more than we admit. We do not perceive everything: we filter, rank, forget, and that is precisely what frees us to think. A mind that held on to every detail would be paralyzed. Under the pressure of cost, engineers are rediscovering an old truth of cognition.

But a model that decides what it ignores also decides what it will never see. The branch that separates the relevant from the negligible becomes a quiet judge whose rulings no one can audit. In making AI leaner, we hand it a power we wield without noticing: that of looking away.