The End of Transformer Models
// by Pål Machulla, Architect 0, Aiakaki
The next time you ask an AI assistant for help with a complex task, imagine if it could remember your entire conversation history without prompting. Not just the last few exchanges, but every prior interaction spanning months — your preferences, ongoing projects, and personal context — all seamlessly integrated into its responses.
Large language models have exploded into public consciousness, dazzling us with their ability to converse, create, and process information. They feel brilliant, like top graduates from elite universities. Yet for all their impressive capabilities, today's AI assistants operate with a curious limitation: a remarkably short attention span, often called their "context window".
Imagine talking to that brilliant new graduate. They can discuss complex theories, but if you give them a long document and ask a question that requires understanding nuanced details from across many pages, they might struggle. Or, over a long project, you have to constantly remind them of previous discussions and decisions, because they can only hold so much recent interaction in their active memory.
This limitation will soon become a relic of the past.
## The Coming Memory Revolution
By this time next year, we'll likely witness a fundamental transformation in how AI agents function — one that extends far beyond incremental improvements in their conversational abilities.
The culprit behind current AI's short attention span is the foundational architecture powering most modern language models: the Transformer. The way a Transformer processes information involves comparing every incoming piece of information with every other piece it has seen in the current context.
Think of it like a meticulous scribe who, before writing the next word, must reread and compare every single word on a scroll of everything written so far. As the scroll gets longer, the amount of comparison work explodes. This creates both computational challenges and practical limitations for users.
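The scribe analogy can be made concrete with a toy single-head attention function. This is an illustrative sketch, not any production model's code; all names here are made up. The key point is the intermediate score matrix: for a context of n tokens, it has n × n entries, so the comparison work grows quadratically with context length.

```python
import numpy as np

def naive_attention(q, k, v):
    """Toy single-head attention over n tokens of dimension d.

    Every position is compared against every other position, so the
    score matrix is n x n -- this is the quadratic cost described above.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (n, n) pairwise comparisons
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ v                               # (n, d) output

# Tiny demo: 8 tokens, 4-dimensional embeddings.
n, d = 8, 4
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = naive_attention(q, k, v)
print(out.shape)  # (8, 4) -- but an (8, 8) score matrix was built along the way
```

Doubling the context from n to 2n quadruples the size of that score matrix, which is why long contexts become expensive so quickly.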
This inherent inefficiency of the Transformer at scale is why researchers believe context is the next frontier in scaling AI. We've seen models grow more knowledgeable and articulate, but their fundamental ability to process large amounts of information at once has remained constrained.
## Enter the Long-Memory Agents
Enter subquadratic architectures. Unlike the Transformer's quadratic cost and growing state, these next-generation models aim for a linear cost — the computational work increases only proportionally to the context length — and maintain a fixed state size.
They are like a skilled note-taker who reads through a long document, constantly summarizing and compressing the information into a small, fixed-size notebook, keeping only the most essential points readily available. This allows them to process vast amounts of text without the computational burden exploding.
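The note-taker analogy corresponds to a recurrence with a fixed-size state. The sketch below uses the generic linear-attention formulation as an illustration; it is not Manifest AI's specific architecture, and the feature map `phi` is an arbitrary choice for the example. Instead of an n × n score matrix, each token performs one constant-cost update to a small state, so total work grows linearly with context length.

```python
import numpy as np

def linear_attention(q, k, v):
    """Sketch of a linear-attention recurrence with a fixed-size state.

    Each token is compressed into a d x d state matrix (the "notebook")
    with one constant-cost update, so processing n tokens costs O(n)
    rather than O(n^2), and memory stays fixed regardless of length.
    """
    phi = lambda x: np.maximum(x, 0.0) + 1e-6   # positive feature map (illustrative)
    d = q.shape[-1]
    state = np.zeros((d, v.shape[-1]))          # fixed-size "notebook"
    norm = np.zeros(d)                          # running normalizer
    outputs = []
    for qi, ki, vi in zip(phi(q), phi(k), v):
        state += np.outer(ki, vi)               # fold this token into the state
        norm += ki
        outputs.append(qi @ state / (qi @ norm))
    return np.stack(outputs)

# Same tiny demo setup: 6 tokens, 4-dimensional embeddings.
n, d = 6, 4
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = linear_attention(q, k, v)
print(out.shape)  # (6, 4): same output shape, but no n x n matrix was ever built
```

The state here is the same size after 6 tokens as it would be after 6 million, which is what makes very long contexts tractable in principle; the research question is how much useful information such a compressed state can retain.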
What does this mean for the everyday user? Within a year, we can expect AI assistants that can:
- Maintain consistent awareness of your ongoing projects without requiring constant reminders
- Analyze entire books or research papers rather than just excerpts
- Learn your communication style and preferences over extended interactions
- Develop personalized knowledge bases that evolve naturally through conversation
The practical applications are profound. Imagine a research assistant that can hold your entire literature review in mind while helping you draft a paper. Or a coding partner that remembers the full architecture of your application, not just the current function you're writing.
## Beyond Conversation: True Digital Companions
Today's short-context models leave key benefits on the table: the ability for AI agents to learn from their mistakes over extended interactions, the potential to simply show the AI an entire dataset instead of relying on complex fine-tuning, and the possibility of creating deeply personalized AI instances just by feeding them relevant personal or professional history.
This shift from short-term to long-term memory transforms AI assistants from clever but forgetful tools into something more akin to collaborative partners. The difference is similar to working with a brilliant but distracted colleague versus a dedicated team member who remembers your shared history and grows more effective with each interaction.
One group actively pushing these boundaries is Manifest AI, a research lab focused on the fundamental architectural shifts needed for long context. They are developing a family of subquadratic architectures known as linear Transformers. Their work exemplifies a broader industry movement toward models that can efficiently process and remember vast amounts of information.
## A Bold Timeline
The pace of this transition is predicted to be rapid. Jacob Buckman, CEO of Manifest AI and an experienced researcher in the field, makes bold predictions:
- By the end of 2025, every major AI development company will, at the very least, be actively working on a subquadratic foundation model.
- By the end of 2026, the Transformer will largely be displaced, with almost no one still using it as the dominant architecture.
- By the end of 2027, we will begin to fully realize the transformative benefits of long-context models.
If these predictions hold true, by this time next year we'll already be experiencing the early fruits of this architectural revolution. The AI assistants in our phones, computers, and workplaces will begin showing signs of a more persistent memory — recognizing patterns in our longer conversations, maintaining awareness of complex tasks over time, and requiring fewer explicit reminders of previously discussed information.
## The End of Forgetting
The era of AI's short attention span, defined by the limitations of the Transformer architecture, appears to be drawing to a close. Soon, we'll look back at our current need to constantly remind our AI assistants of prior context as quaintly primitive, like recalling the days of dial-up internet or flip phones.
The future belongs to models that can efficiently process and remember vast amounts of information, unlocking new capabilities and potentially reshaping the AI landscape itself.
For users, this means digital assistants that finally fulfill their promise: tools that truly understand us not just in the moment, but over time. They'll recognize the evolution of our thinking, recall our preferences without prompting, and provide continuity across interactions that span days, weeks, or months.
We stand at the threshold of AI's long memory revolution. And when it arrives — likely sooner than most expect — it will fundamentally transform how these systems serve as extensions of our own thinking, working not just as tools but as genuine cognitive partners with the capacity to remember.