
From tokenmaxxing to tokenmatching
LLMs keep evolving, becoming both more powerful and more specialized. Routing each prompt to a model that is both well suited to the task and cost-effective is how you get the most out of every token. Teams make these routing decisions manually today, but AI itself will become the best way to make them.
For example, Claude Code Router can route prompts to any number of popular models, depending on the type of work each prompt requires. And it’s open source.
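To make the routing idea concrete, here is a minimal Python sketch. It does not show how Claude Code Router works internally. The model names, prices, and keyword rules are all invented for illustration, and a real router would likely use a classifier or an LLM instead of keyword matching. The point is only the shape of the decision: work out what kind of task the prompt is, then pick the cheapest model that is good at it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str                  # hypothetical model name
    cost_per_1k_tokens: float  # hypothetical price, for illustration only
    strengths: frozenset       # task types this model handles well


# Assumed catalog: none of these correspond to real products or prices.
MODELS = [
    Model("small-fast", 0.10, frozenset({"chat", "summarize"})),
    Model("code-specialist", 0.50, frozenset({"code"})),
    Model("large-general", 2.00,
          frozenset({"chat", "summarize", "code", "reasoning"})),
]

# Naive keyword rules standing in for a real task classifier.
KEYWORDS = {
    "code": ("function", "bug", "refactor", "stack trace"),
    "summarize": ("summarize", "tl;dr", "shorten"),
    "reasoning": ("prove", "plan", "trade-off"),
}


def classify(prompt: str) -> str:
    """Return the first task type whose keywords appear in the prompt."""
    text = prompt.lower()
    for task, words in KEYWORDS.items():
        if any(word in text for word in words):
            return task
    return "chat"  # default when nothing matches


def route(prompt: str) -> Model:
    """Pick the cheapest model whose strengths include the prompt's task."""
    task = classify(prompt)
    candidates = [m for m in MODELS if task in m.strengths]
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)


if __name__ == "__main__":
    for p in ("Fix the bug in this function", "Summarize this article"):
        print(p, "->", route(p).name)
```

In this sketch the expensive general-purpose model is used only when no cheaper model lists the task among its strengths, which is the "matching" half of tokenmatching.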
The next layer is prompt preprocessing. We can work at writing good prompts, but AI itself can improve on what we ask. One of the best prompting techniques is to tell the LLM to “ask the questions that I’m not asking but should be asking.” I can easily imagine a world in which you write a prompt, AI helps you clarify it, improves it, and then routes it to the best, most cost-effective model for an answer.
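That imagined flow of clarify, improve, then route can be sketched in a few lines of Python. This is a thought experiment, not an existing tool. Every callable below (`ask_llm`, `ask_user`, `pick_model`, `call_model`) is a hypothetical placeholder for whatever LLM client, user interface, and router you would actually plug in.

```python
from typing import Callable

# Hypothetical signatures: each callable stands in for a real LLM or UI call.
AskLLM = Callable[[str], str]          # prompt text -> LLM reply
AskUser = Callable[[str], str]         # questions -> the user's answers
PickModel = Callable[[str], str]       # prompt text -> chosen model name
CallModel = Callable[[str, str], str]  # (model name, prompt) -> answer

CLARIFY = "Ask the questions that I'm not asking but should be asking."


def preprocess(prompt: str, ask_llm: AskLLM, ask_user: AskUser) -> str:
    """Have the LLM ask clarifying questions, then rewrite the prompt."""
    questions = ask_llm(f"{CLARIFY}\n\nPrompt: {prompt}")
    answers = ask_user(questions)
    return ask_llm(
        "Rewrite the prompt so it includes the clarifications.\n\n"
        f"Prompt: {prompt}\nClarifications: {answers}"
    )


def run(prompt: str, ask_llm: AskLLM, ask_user: AskUser,
        pick_model: PickModel, call_model: CallModel) -> str:
    """Clarify and improve the prompt, then route it to a model."""
    improved = preprocess(prompt, ask_llm, ask_user)
    model = pick_model(improved)  # routing happens on the improved prompt
    return call_model(model, improved)
```

Routing after preprocessing is a deliberate choice in this sketch: a clarified prompt tells the router more about the task, so the cost-versus-capability decision is made on better information.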

