IQh - a new way of pricing AI systems
Amidst all of the discussion around the costs and capabilities of AI reasoning, it's the perfect time to rethink how these AI systems are priced.
As a physics major, I distinctly recall my professors stressing the importance of getting the units of measure correct. Even if the final number was off by a factor of pi, having the appropriate units and being within the right order of magnitude often earned me at least partial credit on exams. Looking ahead to the future of AI has me pondering a question: are tokens the right unit for pricing AI models?
Take a simple example:
“Is 7428958698238956023987723967931 prime?” - roughly 20 tokens
“Take this hour-long call transcript, plus this 30-page compliance guide, and tell me if the salesperson asked about pre-existing conditions” - roughly 30,000 tokens.
Should these input and output tokens be priced the same? Clearly a model system will apply more effort to one prompt than the other.
In these simple examples, coarse model tiers like 2.5, Pro, or Lite may suffice. However, this starts to break down as model systems become increasingly dynamic and involve many inference steps to generate answers. While most are touting 2025 as the year of Agents, one of the most important trends is the rise of chain-of-thought reasoning systems, which involve multiple parallel model inferences to generate, critique, refine, and reason about a prompt. DeepSeek’s R1 reasoning model took the world by storm, open-sourcing the architecture and model and making reasoning capabilities significantly cheaper. They weren’t the first to build in this direction, though: OpenAI’s o3 unveiling at the end of last year sparked a huge debate around the future of AI system design striving towards general intelligence. Meta also recently released a paper outlining a reasoning system called COCONUT, which uses embeddings rather than text at intermediate steps to improve performance and efficiency. Even early-stage startups like Fireworks have been building model systems for complex reasoning, and no doubt this design pattern will continue to evolve well into 2025 and beyond.
One could argue that chain-of-thought reasoning is itself an Agentic pattern, but I’m not here to play semantics. Autonomously applying reinforcement learning and supervised fine-tuning to systems of models at scale seems beyond the scope of the database queries and customer service workflows most people mean by Agents, but I digress. Pricing these reasoning systems requires a slew of new economic considerations that capture how much time and effort went into generating an answer:
Number of refinement steps generating the answer
Number of refinement steps occurring in parallel
Size of the models at each step
Latency requirements
Quantity of reasoning tokens exchanged between steps
Amount of context pulled from retrieval or search systems
So many more…
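To make these levers concrete, here is a back-of-envelope cost model that folds a few of them into a single per-answer price. Every function name and rate below is hypothetical - a sketch of how the levers might compose, not any provider's actual pricing:

```python
# Toy per-answer cost model combining a few of the levers above.
# All rates are made up purely for illustration.
def answer_cost_usd(
    refinement_steps: int,      # sequential refinement passes
    parallel_width: int,        # passes run in parallel at each step
    model_params_b: float,      # model size, in billions of parameters
    reasoning_tokens: int,      # tokens exchanged between steps
    retrieval_tokens: int,      # context pulled from retrieval/search
    usd_per_b_param_step: float = 0.0001,
    usd_per_k_tokens: float = 0.002,
) -> float:
    # Inference cost scales with steps, parallelism, and model size.
    inference = refinement_steps * parallel_width * model_params_b * usd_per_b_param_step
    # Token cost covers both reasoning traffic and retrieved context.
    tokens = (reasoning_tokens + retrieval_tokens) / 1000 * usd_per_k_tokens
    return inference + tokens
```

Under these toy rates, ten refinement steps at a parallel width of four on a 70B-parameter model, with 80,000 total tokens, comes out to a few tens of cents - the point is only that each lever contributes its own separable term to the bill.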
As the levers of control with these systems become increasingly granular, I would bet that engineering teams will be able to provision reasoning capabilities with a certain degree of accuracy and precision. “I want a system with an IQ of 100 to work on this problem for up to 4 hours” more accurately represents how people think about labor economics. When you think about it, this is actually how people price consulting services - a higher hourly rate for a more capable person.
These reasoning systems can dynamically route a prompt to an appropriately powerful model, and even generate reasoning plans within budget constraints. The reasoning capability of a model, combined with the amount of effort applied to arrive at an answer, seems to be the best basis for pricing these systems.
This led me to the notion of the IQ-hour, or IQh, which succinctly captures the interplay between reasoning capability and effort applied. Of course, the measure of IQ doesn’t come without limitations and caveats. The relationship between IQ and socioeconomic status is highly contested, and IQ doesn’t capture domain-specific performance. In my head, IQ is distinct from domain knowledge, which comes in the form of context, like pulling medical research into a prompt. The model system then reasons over that body of knowledge with varying degrees of IQ. All of that said, IQ is the most widely accepted unit for quantifying intelligence and reasoning, and makes for a good shorthand approximation here.
An analogous unit to IQh in physics is the kWh, the unit of energy in which your electricity usage is billed: power drawn in kW multiplied by time consumed in hours. As a simple example, with 10 Wh you can run a 1 W light for 10 hours, or a more powerful 10 W light for 1 hour. In this scheme, IQ represents the reasoning power of the model, multiplied by the time it gets used. A powerful model would solve a task in less time but could still cost the same as a less powerful model taking longer. After all, Andrew Ng famously proclaimed that “AI is the new electricity” back in 2017 - great to see it coming to fruition.
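The analogy reduces to the same multiplication in both domains. A toy calculation, where `iq_hours` is a hypothetical billing quantity rather than any real metering standard:

```python
def watt_hours(power_w: float, hours: float) -> float:
    """Energy consumed: power multiplied by time."""
    return power_w * hours

def iq_hours(iq: float, hours: float) -> float:
    """Reasoning 'energy': capability multiplied by time applied."""
    return iq * hours

# A 1 W light for 10 h and a 10 W light for 1 h draw the same 10 Wh.
assert watt_hours(1, 10) == watt_hours(10, 1) == 10

# Likewise, a 100-IQ system working 4 h and a 200-IQ system working 2 h
# would both bill 400 IQh under this scheme.
assert iq_hours(100, 4) == iq_hours(200, 2) == 400
```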
I could see an argument that IQh is simply an extension of a GPU-hour metric, and that each step in the chain-of-thought reasoning can be priced by the amount of time the compute is utilized. However, I tend to favor output pricing over input pricing; your utility doesn’t bill for the natural gas consumed or the cost of solar installations, it just charges you for the kWh you use. If you have two competing reasoning offerings, each could price at the same GPU-hour level but deliver different outcomes based on each system's design. True outcome-based pricing in these systems is intractable because of their general nature, and determining the value of outcomes is often an arduous task. Just ask an engineer about the value of addressing technical debt and you’re in for hours of debate.
IQ is also more fungible as a potential standardized definition of reasoning, whereas it goes without saying that not all high-performance compute clusters are created equal: Nvidia releases new chipsets every year, AMD and others race to compete, and many cloud providers are provisioning their own chips. It’s easier to imagine lowering the cost of an IQh than increasing the reasoning capability of a GPU-hour, because compute clusters are so heterogeneous and sensitive to the economics of purchasing the hardware. Each cloud provider procures and provisions with its own methodology, but the output unit of IQ can be applied ubiquitously across any application.
I think the IQh concept is well aligned with the economics and heuristics of building AI systems. Even if we never measure reasoning down to IQ-point precision, some combination of reasoning power and time is the right direction. In the spirit of strong opinions, weakly held - I’d love to hear how other people are thinking about pricing AI reasoning systems.