Pricing your SaaS Products in the AI Age

Daniel McKinnon
May 6, 2026

People don't like "dumb" software. That's why companies and developers are rushing to embed AI into their software products, only to be hit with expensive API bills.

Adding a “rewrite” button looks like a simple API call, but the truth is, instead of working like a fixed-cost server, it works like a taxi meter. Imagine how many tokens are consumed with users editing photos or videos. Every click makes the bill go up. Now, the cost scales not with your infrastructure but with human behavior.

People don’t use AI features like the settings menu. They don’t just tap buttons and be satisfied with the results. When they’re dissatisfied with the first few options, they click the “regenerate” button repeatedly until they see what they’re actually looking for. Long-winded outputs and repeat attempts destroy your margins in ways that are hard to see until the bill arrives. The problem is you don’t have much control over user behavior and it eats into your revenues.

There’s also the compounding effect that may be difficult to foresee because you’re having fun adding AI features to your software: summary tool, chat assistant, auto-tagging and more. You feel the need to add every available feature because you fear that others will outcompete you. While each feature feels cheap on its own, the costs stack exponentially. You then pay for the same data to be processed in four different ways. Often, a huge chunk of this data are duplicates. You pay for the same result twice.

If your team is treating AI like a static utility, you’re in for a great surprise when the API bills arrive and your credit card declines. Shipping the feature without walls makes it susceptible to abuse from power users, edge cases who use more tokens than the rest of your customers. If you’re unprepared to deal with this, expect your tech budget to collapse. Your most active users are also your biggest liabilities.

The solution starts with careful planning. Setting soft caps that alert your team when you’re approaching spending limits and hard caps that stop usage automatically serve as early-warning. Track AI spending to see where people are using their tokens then optimize that area. Instead of averaging costs, you should make predictions with user behavior. Build your cost model around your power users then stress-test the model.

Since tokens are where you’ll spend most of your money on, it helps to pay attention to how you manage your API calls. Instead of having the model read your entire prompt all the time, you need to try prompt caching. This way, you are storing a block of tokens on the provider’s servers so that next time you send a request with that same block, the model doesn’t have to re-process the request from scratch. This approach is best for cases when system prompts and tools lists are huge, long conversations and when users ask multiple questions about the same document or codebase. Aside from caching, batching can also be implemented. This way, you’re not asking the model to give an instant answer, but rather, you tell it to reply within 24 hours. The model will process your request during off-peak hours which results in savings. This is best for data enrichment, summarization or any request that does not require immediate replies.

Then, the most crucial step is to use more efficient models fit for the task. For example, lighter models may be sufficient for less complex tasks while the more advanced models can be called upon for processing more complicated ones.

The primary problem that companies face with putting AI on their software isn’t infrastructure. It’s designing their software without going over the budget. Optimising your use of AI is becoming a key requirement in new development.

Enter your email to download this resource
Oops! Something went wrong while submitting the form.

People don't like "dumb" software. That's why companies and developers are rushing to embed AI into their software products, only to be hit with expensive API bills.

Adding a “rewrite” button looks like a simple API call, but the truth is, instead of working like a fixed-cost server, it works like a taxi meter. Imagine how many tokens are consumed with users editing photos or videos. Every click makes the bill go up. Now, the cost scales not with your infrastructure but with human behavior.

People don’t use AI features like the settings menu. They don’t just tap buttons and be satisfied with the results. When they’re dissatisfied with the first few options, they click the “regenerate” button repeatedly until they see what they’re actually looking for. Long-winded outputs and repeat attempts destroy your margins in ways that are hard to see until the bill arrives. The problem is you don’t have much control over user behavior and it eats into your revenues.

There’s also the compounding effect that may be difficult to foresee because you’re having fun adding AI features to your software: summary tool, chat assistant, auto-tagging and more. You feel the need to add every available feature because you fear that others will outcompete you. While each feature feels cheap on its own, the costs stack exponentially. You then pay for the same data to be processed in four different ways. Often, a huge chunk of this data are duplicates. You pay for the same result twice.

If your team is treating AI like a static utility, you’re in for a great surprise when the API bills arrive and your credit card declines. Shipping the feature without walls makes it susceptible to abuse from power users, edge cases who use more tokens than the rest of your customers. If you’re unprepared to deal with this, expect your tech budget to collapse. Your most active users are also your biggest liabilities.

The solution starts with careful planning. Setting soft caps that alert your team when you’re approaching spending limits and hard caps that stop usage automatically serve as early-warning. Track AI spending to see where people are using their tokens then optimize that area. Instead of averaging costs, you should make predictions with user behavior. Build your cost model around your power users then stress-test the model.

Since tokens are where you’ll spend most of your money on, it helps to pay attention to how you manage your API calls. Instead of having the model read your entire prompt all the time, you need to try prompt caching. This way, you are storing a block of tokens on the provider’s servers so that next time you send a request with that same block, the model doesn’t have to re-process the request from scratch. This approach is best for cases when system prompts and tools lists are huge, long conversations and when users ask multiple questions about the same document or codebase. Aside from caching, batching can also be implemented. This way, you’re not asking the model to give an instant answer, but rather, you tell it to reply within 24 hours. The model will process your request during off-peak hours which results in savings. This is best for data enrichment, summarization or any request that does not require immediate replies.

Then, the most crucial step is to use more efficient models fit for the task. For example, lighter models may be sufficient for less complex tasks while the more advanced models can be called upon for processing more complicated ones.

The primary problem that companies face with putting AI on their software isn’t infrastructure. It’s designing their software without going over the budget. Optimising your use of AI is becoming a key requirement in new development.

Enter your email to download this resource
Oops! Something went wrong while submitting the form.