AI subscriptions are becoming less attractive

Maximilian Schwarzmüller
Computing/Software · Small Business/Startups · Business News

Transcript

00:00:00This morning I woke up seeing this post here on X which mentions that Anthropic seemingly pulled the
00:00:09plug on Claude Code for the Pro plan, so that you need the more expensive subscription plans
00:00:17in order to be able to use Claude Code or use your subscription in Claude Code.
00:00:22Now, Anthropic was quick to comment on this, mentioning that this is just a small
00:00:27test they're running on only 2% of new prosumer signups.
00:00:32I find it kind of weird to run this kind of test, and I also think that Anthropic could have
00:00:40seen what was coming their way when running a test like this: the impact a test like this would have
00:00:47on their image and what people would think. Because of course this clearly fits the narrative
00:00:53of what we're already seeing, where we're getting less usage out of our subscriptions, we see
00:00:59stricter limits, we see degrading model performance, as it seems all these
00:01:08things were happening over the last couple of weeks. I mean, Anthropic aggressively cracked
00:01:14down on the usage of their subscriptions outside of Claude Code. If you wanted to use it with OpenCode,
00:01:21for example, they cracked down on that, so that all kind of gives us a clear, bigger picture.
00:01:28And what kind of fits that picture or narrative is this news article GitHub published a couple
00:01:37of days ago, where they made it clear that they would pause new signups for the GitHub Copilot Pro,
00:01:43Pro Plus and Student plans, that they are tightening usage limits for individual plans,
00:01:49and, most importantly, that the Opus models are no longer available in the Pro plans. That of
00:01:56course all kind of makes sense, but we have to dive a bit deeper into the economics of what's going on
00:02:02to understand why this is happening and, most importantly, what this means for us in the
00:02:07future. It clearly means that the days of unlimited usage and heavy subsidies are over, and to understand
00:02:17this we have to understand the economics of these subscriptions and of token usage, or
00:02:25token consumption, because of course these subscription models offered by Anthropic, by
00:02:34OpenAI, by GitHub only really work if the majority of users are not using up all
00:02:43the available usage they have. That's pretty much the case for any subscription offering out there,
00:02:49not just for these AI subscriptions. If you have a Netflix subscription and you spend 24/7 watching
00:02:56Netflix, you will very likely not be a super profitable customer for them, but most people
00:03:02don't do that, and that is how these companies can make a profit. That's true for all subscriptions,
00:03:10obviously. Now we can see the true price, or a price that's closer to the true price, of our AI
00:03:19requests if we take a look at the API pricing pages of these companies. There, for example, we can see
00:03:26that the latest model by Anthropic, Claude Opus 4.7, has an input token price of five dollars per
00:03:35million tokens and an output token price of 25 dollars per million tokens, and we can put that in
00:03:42relation to other models they have. We can of course also put it into relation to what OpenAI has to
00:03:47offer. There we see that GPT-5.4, which most Codex users are probably using right now, has
00:03:54an input price of two dollars fifty per million tokens, so only half of what we had for Opus 4.7,
00:04:03and an output price of 22.50, so a bit less than what we saw for Opus. Now
00:04:11it's probably fair to assume that these API prices are prices that leave these companies
00:04:20at a break-even point or a small profit regarding their gross margin. So if we just look at the
00:04:29inference cost specifically, we can probably assume that they will turn a profit if you use their
00:04:36APIs. Now for that it's important to understand that the cost of running AI models
00:04:43depends in the end on two main factors: it's the training of the AI models that costs money, and
00:04:53it's the inference, of course. So we have these two factors here that come into play for these
00:04:59AI companies. Now of course the training cost is a one-time thing, right? You train a model once
00:05:06and that is super expensive, but it's a one-time thing. Of course these companies then
00:05:12train more and more models, and it's a new one-time cost for each model, but once a model is trained,
00:05:18it's no longer incurring any training cost, except maybe for further fine-tuning runs or models
00:05:25derived from that base model. But the big chunk of cost is only incurred once. Now for inference,
00:05:33naturally, that's different: this is an ongoing cost, it's per request in the end, because of course
00:05:41inference is the process of producing the concrete output for your prompt, for the task that you send
00:05:48to a model provider. And inference is of course what's happening all the time when you're using
00:05:53Claude Code, when you're using Codex, but also of course when you send a prompt in ChatGPT or in
00:05:58any other way. Now this of course is where you want to be at least break-even with your API pricing,
00:06:07because otherwise it means that you lose money on every request you receive. And whilst you could of
00:06:13course be doing that to grow your market share, and whilst I wouldn't rule out that companies
00:06:19are occasionally doing that, doing it long term will of course not be viable, because you'll go
00:06:25out of business. Now naturally you also need to earn back your training cost at some point, so ideally
00:06:34these incoming requests your users are sending to you give you enough of a gross margin on your
00:06:41inference cost that that margin also covers your training cost, your staff cost and so on. So
00:06:48that's the economics of how you can run and offer these AI models. Now as mentioned, the
00:06:57API pricing is probably the part where these companies are not losing massive amounts of
00:07:02money, but of course as a consumer, as a customer, you do. If you were to power Claude Code with these
00:07:10on-demand prices of Opus, you would be paying way, way more than if you were using their subscriptions,
00:07:18because of course with the Max subscription, for example, for only 200 bucks you're getting
00:07:26lots of usage out of this plan, you'll get many millions of tokens out of this plan. And if you
00:07:34take a look at what output tokens would normally cost you per million tokens, you can see that
00:07:39normally, if you ignore the input tokens (which you shouldn't, but if you ignore them), for these two
00:07:44hundred dollars we should not even get 10 million output tokens, because one million tokens
00:07:51costs us 25 dollars, so we would only get eight million output tokens. And if you consider
00:07:56input tokens, it would be less than that. And clearly, if you had any long-running sessions, if you've been
00:08:02using Claude Code for example for a week and you track your token usage, you will see that you
00:08:08can go above that limit, and you definitely could in the past. And that makes it obvious why these
00:08:14companies are trying to limit how much usage you can get out of your subscriptions,
00:08:19and why I think we'll definitely see higher subscription prices in the future, maybe already
00:08:25in the near future. Now of course it's not super easy for these companies to increase their prices,
00:08:30though, because of market share: obviously all these companies want to aggressively capture market share,
00:08:37the reasoning being that if you're the main company that's established as the coding agent provider in
00:08:45lots of enterprises out there, in lots of companies out there, they will probably pay higher subscription
00:08:51prices in the future. So you don't want to start increasing your prices too early, because that could
00:08:57drive some of your customers to your competition, which you don't want, of course. On the other hand,
00:09:02you don't want to go broke. I mean, for example, OpenAI recently raised 122 billion dollars
00:09:09to accelerate the next phase of AI, and you could read that this would only give them
00:09:17around 18 months of runway, so 18 months until they need to raise again. So clearly you can't
00:09:26continue subsidizing all that usage forever, because if you go out of business, then all your customers
00:09:32are going to your competition anyway. So there is a trade-off here, and that's exactly the
00:09:39difficult spot these companies are facing right now. That's the economics here. Now of course, as you
00:09:44probably read, and also felt if you're a gamer for example, we're at a point in time where, because of
00:09:52all the AI stuff that's happening, we are facing a big compute scarcity and crisis, and high prices for
00:10:01memory and everything related to what these AI models and these AI data centers need. So
00:10:08memory is expensive because inference needs lots of memory; if you've tried running models
00:10:13locally on your system, you know you need lots of memory for that. So memory prices went up.
00:10:19But it's not just memory, it's also networking gear, because of course you're running both the training
00:10:25and the inference not on one single chip but on huge racks and clusters of chips, and all these
00:10:31clusters need connections between the clusters, between the chips, so that you can build a super
00:10:36GPU, so to say. And this networking gear is in high demand and therefore expensive. And then of course
00:10:43we also have energy and data centers. We need both: we need data centers to put those chips in, and that's
00:10:52why lots of construction is happening there. But then these data centers need energy, right? And you've
00:10:58heard about that too: energy is another big problem. You can't get it from the grid; it's simply not
00:11:05built for that, there isn't enough energy available there. That's why all these new data centers are
00:11:12moving to off-grid solutions, where the energy is produced next to the data center with gas turbines
00:11:21or nuclear power. But that of course all takes time, and it also takes components, and there isn't an
00:11:28endless number of companies that can build these power plants, there isn't an endless number of
00:11:35components available for building these power plants. So that's all limiting the amount of
00:11:42compute that can go online, which in turn is missing for the inference and of course also for the
00:11:48training. Now historically, and with that I mean only like one or two years ago, the incentive for
00:11:54these companies was to dedicate a lot of compute resources towards training, because that gives you
00:12:00better models, which lets you stay ahead or get ahead in the AI race. And that incentive still exists, but
00:12:07nowadays there also is a bigger incentive and higher importance on the inference part,
00:12:14because it's the inference part that gives you customers, that gives you visibility in the market.
00:12:19Because if nobody can use your models, then it's great that you have good models, but you're
00:12:25not gaining any market share. So you need inference; that has become way more important. So companies have
00:12:30to split the scarce compute resources and data center capacities between these two ends. And
00:12:38especially since the beginning of this year, we're also seeing changed usage behavior of customers.
00:12:45The GitHub news article here actually is pretty open about this: "Agentic workflows have
00:12:51fundamentally changed Copilot's compute demands. Long-running, parallelized sessions now regularly
00:12:57consume far more resources than the original plan structure was built to support." And it's the same of
00:13:04course for Anthropic and OpenAI. In the past, and again this only means like a year ago or so,
00:13:10these companies, not exclusively but to a huge extent, really focused on occasional chat sessions. A
00:13:20user, a customer, would occasionally come along and ask ChatGPT or Claude a question, and of course that
00:13:27could have been multiple times a day, but it was just a couple of questions, a couple of answers,
00:13:33a couple of follow-up questions: way fewer tokens than all these long-running agentic
00:13:39workflows and coding sessions have. In those coding sessions, or whichever agentic workflows you're
00:13:44running, you're burning through hundreds of thousands and millions of tokens very quickly, far
00:13:51quicker than you could with just your occasional chat session. Now given that all these
00:13:58modern models we're dealing with are typically thinking models, the token amount also got
00:14:05higher compared to a year or two ago, because a response simply takes more tokens due to that
00:14:12thinking process. Those are still tokens, even if you don't see them in the final response.
00:14:17Therefore, the amount of tokens consumed got way, way bigger than it was a year or two
00:14:24ago, again bringing us to the point that inference is becoming more important, because you need way
00:14:29more inference to handle all that token generation that is going on. And that's the reason why all
00:14:37these new models are pretty expensive when used through the API, but even more importantly, why
00:14:43these subscriptions are so difficult for these companies right now. They introduced those
00:14:49subscriptions in the past, when way fewer tokens were being consumed, and now they're at a point
00:14:56where for the same subscription price people are using way more tokens. That's the difficulty.
00:15:03Now especially for Anthropic, for example, I could imagine they are feeling the pain a bit more
00:15:09than OpenAI, not just because their models seem to be more expensive to run if you just take a look
00:15:16at the API pricing, but also of course because historically, even a year ago already, Anthropic
00:15:22had more enterprise and business customers. Which is good for them to some extent; it's a stable
00:15:29revenue base. And ChatGPT, or OpenAI, has been more consumer-based: they had more normal people, normal
00:15:38consumers, as customers. Now they're also moving more towards business, but historically, because they
00:15:43had the ChatGPT moment, they had more normal people as customers. The disadvantage for Anthropic now, of
00:15:50course, is that these business customers are exactly the customers that are running these agentic
00:15:55workflows, or that tend to run these agentic workflows. I mean, your mom and dad, if they're
00:16:00paying for ChatGPT at all, which they likely aren't, are not running agentic workflows,
00:16:06but you are, your company is. And that of course makes the subscriptions even more difficult for
00:16:11Anthropic, I would imagine, than for OpenAI, where there still are plenty of normies in the subscription,
00:16:18I would guess. Still, they're definitely feeling the pain as well. And what does this all mean now? What
00:16:24do changes like this, or changes like in this X post where Anthropic is running tests to pull Claude Code
00:16:32from the cheaper plans, what does this all mean for us? I think it's pretty obvious: we'll see even
00:16:38stricter limits in the future, and therefore we might reach a point where the subscriptions
00:16:42don't really feel like they're worth it anymore, and I think that will be the point where we'll see
00:16:48higher prices. It's not unreasonable, I think, to believe that these coding subscriptions, or generally
00:16:55these agentic usage subscriptions, will cost many thousands of dollars a month at some point. Not
00:17:03this year, most likely, but at some point, because of course companies may start comparing the cost of
00:17:10these subscriptions against the cost of employees. Yeah, and that's of course not great news, and I
00:17:17may be totally wrong, but it is definitely what I think will happen. And of course, when you make
00:17:23that comparison, there's a lot of room for these subscriptions to get way, way more expensive.
00:17:30Obviously the subscriptions then won't be for the normal people anymore, so I think we'll also see
00:17:35new subscription offerings for them, which simply have way stricter usage limits, which are enough
00:17:41for ChatGPT but not enough for agentic workflows. But for the professional use, for the agentic
00:17:47workflows, we'll see stricter limits and higher prices. I'm not sure when, because, you know, market
00:17:52share, right, what I mentioned before. But eventually we'll see that, because ultimately,
00:17:58as mentioned, OpenAI has around 18 months of runway. They probably want to stay in business,
00:18:03same for Anthropic, and therefore that is what I think we'll see here, in a year or so. I don't know.

Key Takeaway

AI providers are ending the era of unlimited usage and heavy subsidies because agentic workflows are consuming far more tokens than original subscription models were priced to support, necessitating tighter limits and eventual price increases.

Highlights

Anthropic is testing the removal of Claude Code access from Pro plans, requiring more expensive subscription tiers.

GitHub stopped new signups for Pro, Pro Plus, and Student plans while tightening usage limits due to heavy consumption.

Claude Opus 4.7 API pricing is $5 per million input tokens and $25 per million output tokens.

GPT-5.4 API pricing is $2.50 per million input tokens and $22.50 per million output tokens.

Agentic workflows create significantly higher compute demands than occasional chat sessions, as they process millions of tokens rapidly.

OpenAI raised $122 billion, yet projections suggest this only provides approximately 18 months of operational runway without sustained profitability.

AI compute scarcity, specifically for high-demand networking gear, memory, and energy, is limiting the available capacity for training and inference.
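Putting the two API price points above side by side makes the per-request cost gap concrete. A minimal sketch in Python: the per-million-token prices are the figures quoted in the transcript, while the request sizes (100k input tokens, 10k output tokens for one agentic coding turn) are illustrative assumptions, not measured values.

```python
# Per-request cost at on-demand API rates.
# Prices (input $/M, output $/M) are the figures quoted in the transcript;
# the request sizes below are illustrative assumptions.

PRICES = {
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT-5.4": (2.50, 22.50),
}

def request_cost(model, input_tokens, output_tokens):
    """Dollar cost of a single request at on-demand API rates."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# One hypothetical agentic coding turn: ~100k tokens of context in, ~10k out.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 100_000, 10_000):.3f}")
```

At these assumed request sizes, an Opus 4.7 turn works out to about $0.75 and a GPT-5.4 turn to about $0.48, illustrating why the input-token rate dominates when agentic sessions carry large contexts.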

Timeline

Shift in Subscription Access and Limits

  • Anthropic is testing the restriction of Claude Code to higher-tier plans for 2% of new prosumer signups.
  • GitHub paused new signups for Copilot Pro and Student plans to implement stricter individual usage limits.
  • Companies are aggressively curtailing usage outside of official platforms to protect subscription margins.

Evidence indicates a clear trend toward tightening constraints on AI subscriptions. Recent actions by Anthropic and GitHub demonstrate a move to protect compute resources as users shift toward more intensive usage patterns. These restrictions follow a pattern of stricter limits and perceived performance degradation across major AI services.

Economics of Token Consumption

  • Subscription models rely on the majority of users not exhausting their allotted usage limits.
  • API pricing serves as a benchmark for the true cost of inference, with Claude Opus 4.7 costing $25 per million output tokens.
  • Training costs are one-time capital expenditures, while inference costs are ongoing and directly linked to usage volume.

The current economic model for AI subscriptions is unsustainable when users transition from occasional chat to high-volume agentic workflows. API pricing from providers like Anthropic and OpenAI highlights the massive disparity between flat-rate subscriptions and the actual cost of high-volume inference. Companies are currently subsidizing heavy users, a practice that cannot continue long-term without compromising business viability.
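The subscription-versus-API disparity can be checked with the transcript's own numbers. A minimal sketch, assuming the $200 Max plan price and the Opus 4.7 rates quoted earlier; the 3:1 input-to-output token ratio is an illustrative assumption.

```python
# How many output tokens would $200 buy at on-demand API rates?
# Prices are the Claude Opus 4.7 figures quoted in the transcript;
# the 3:1 input:output ratio is an assumption for illustration.

BUDGET = 200.0            # monthly subscription price, dollars
IN_PRICE = 5.00 / 1e6     # dollars per input token
OUT_PRICE = 25.00 / 1e6   # dollars per output token

# Ignoring input tokens entirely (as the transcript does first):
output_only = BUDGET / OUT_PRICE                 # 8 million output tokens

# Charging 3 input tokens alongside every output token:
blended = BUDGET / (OUT_PRICE + 3 * IN_PRICE)    # 5 million output tokens

print(f"output only: {output_only / 1e6:.0f}M tokens")
print(f"with 3:1 input: {blended / 1e6:.0f}M tokens")
```

Eight million output tokens is the figure the transcript arrives at; once input tokens are counted, the API-equivalent allowance shrinks further, which is why long-running Claude Code sessions can consume far more than $200 worth of on-demand inference.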

Compute Scarcity and Infrastructure Pressures

  • High demand for specialized networking gear, memory, and energy is driving up the cost of AI infrastructure.
  • Data center expansion is constrained by power grid limitations, forcing reliance on expensive, off-grid energy solutions.
  • Providers must now split scarce compute capacity between model training and customer-facing inference.

Infrastructure requirements have become a critical bottleneck for AI companies. The need for massive GPU clusters creates extreme pressure on memory, networking, and electricity supply. Because energy cannot be easily scaled via the public grid, building new capacity is a slow and capital-intensive process that limits the overall available compute for both training and inference.

Future Outlook for AI Costs

  • Agentic workflows and 'thinking' models require significantly more tokens than simple chat interfaces.
  • Enterprise-focused companies like Anthropic are feeling usage-related cost pressures earlier due to their customer base.
  • Subscription prices for professional agentic tools will likely increase to compete with the cost of human labor.

The transition to agentic workflows has fundamentally altered the economics of AI usage. Models now generate far more tokens per task due to internal reasoning processes, vastly increasing the cost of inference. As companies move past their initial cash runways, they will likely segment subscriptions, offering lower-cost options for casual users while significantly raising prices for professional, agent-heavy workloads.
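The token inflation from internal reasoning can be shown with a toy calculation. All token counts below are assumptions for illustration; only the $25-per-million output rate comes from the transcript.

```python
# Hidden "thinking" tokens are billed as output tokens even though the
# user never sees them, multiplying the cost of an otherwise short answer.
# Token counts here are illustrative assumptions.

OUT_PRICE = 25.00 / 1e6   # dollars per output token (Opus 4.7 rate)

visible_answer = 1_000    # tokens in the response the user actually reads
reasoning = 8_000         # hidden chain-of-thought tokens (assumed)

plain_cost = visible_answer * OUT_PRICE
thinking_cost = (visible_answer + reasoning) * OUT_PRICE

print(f"plain: ${plain_cost:.3f}, thinking: ${thinking_cost:.3f}, "
      f"{thinking_cost / plain_cost:.0f}x")
```

Under these assumed counts, the same visible answer costs nine times as much once reasoning tokens are billed, which is the mechanism behind the per-task token growth described above.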
