00:00:00 This morning I woke up to this post here on X, which mentions that Anthropic seemingly pulled the
00:00:09 Claude Code plug on the Pro plan, so that you need the more expensive subscription plans
00:00:17 in order to be able to use Claude Code, or use your subscription in Claude Code.
00:00:22 Now, Anthropic was quick to comment on this, mentioning that this is just a small
00:00:27 test they're running on only 2% of new Pro signups.
00:00:32 I find it kind of weird to run this kind of test, and I also think that Anthropic could have
00:00:40 seen what was coming their way when running a test like this: the impact a test like this would have
00:00:47 on their image and what people would think. Because of course this clearly fits the narrative
00:00:53 of what we're already seeing, where we're getting less usage out of our subscriptions, we see
00:00:59 stronger or stricter limits, and we see degrading model performance, as it seems all these
00:01:08 things were happening over the last couple of weeks. I mean, Anthropic aggressively cracked
00:01:14 down on the usage of their subscriptions outside of Claude Code. If you wanted to use it with opencode,
00:01:21 for example, they cracked down on that. So that all kind of gives us a clear, bigger picture.
00:01:28 And what kind of fits that picture, or narrative, is this news article GitHub published a couple
00:01:37 of days ago, where they made it clear that they would pause new signups for the GitHub Copilot Pro,
00:01:43 Pro+ and Student plans, that they are tightening usage limits for individual plans,
00:01:49 and, most importantly, that the Opus models are no longer available in the Pro plans. That of
00:01:56 course all kind of makes sense, but we have to dive a bit deeper into the economics of what's going on
00:02:02 to understand why this is happening, and, most importantly, what this means for us in the
00:02:07 future. It clearly means that the days of unlimited usage and heavy subsidies are over, and to understand
00:02:17 that, we have to understand the economics of these subscriptions and of token usage, or
00:02:25 token consumption, because of course these subscription models offered by Anthropic, by
00:02:34 OpenAI, by GitHub, really only work if the majority of users are not using up all
00:02:43 the available usage they have. That's pretty much the case for any subscription offering out there,
00:02:49 not just for these AI subscriptions. If you have a Netflix subscription and you spend 24/7 watching
00:02:56 Netflix, you will very likely not be a super profitable customer for them, but most people
00:03:02 don't do that, and that is how these companies can make a profit. That's true for all subscriptions,
00:03:10 obviously. Now, we can see the true price, or a price that's closer to the true price, of our AI
00:03:19 requests if we take a look at the API pricing pages of these companies. There, for example, we can see
00:03:26 that the latest model by Anthropic, Claude Opus 4.7, has an input token price of $5 per
00:03:35 million tokens and an output token price of $25 per million tokens. We can put that in
00:03:42 relation to other models they have, and we can of course also put it into relation to what OpenAI has to
00:03:47 offer. There we see that GPT-5.4, which most Codex users are probably using right now, has
00:03:54 an input price of $2.50 per million tokens, so only half of what we had for Opus 4.7,
00:04:03 and an output price of $22.50, so a bit less than what we saw for Opus.
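To make these per-million-token prices concrete, here is a minimal sketch in Python. The prices are the ones quoted above; the token counts are made-up example values for a single agentic coding request, not measurements:

```python
# Per-million-token API prices quoted above (USD); illustrative only.
PRICES = {
    "opus": {"input": 5.00, "output": 25.00},
    "gpt": {"input": 2.50, "output": 22.50},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one API request: (tokens / 1M) * price per million tokens."""
    p = PRICES[model]
    return (input_tokens / 1_000_000) * p["input"] \
         + (output_tokens / 1_000_000) * p["output"]

# Hypothetical agentic request: 50k tokens of context in, 8k tokens out.
print(round(request_cost("opus", 50_000, 8_000), 3))  # 0.45
print(round(request_cost("gpt", 50_000, 8_000), 3))   # 0.305
```

Even at these rates a single heavy request costs tens of cents, which is why powering an all-day agentic session directly through the API adds up so quickly.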
00:04:11 It's probably fair to assume that these API prices leave these companies
00:04:20 at a break-even point, or a small profit, regarding their gross margin. So, if we just look at the
00:04:29 inference cost specifically, we can probably assume that they will turn a profit if you use their
00:04:36 APIs. Now, of course, for that it's important to understand that the cost of running AI models
00:04:43 does in the end depend on two main factors: it is the training of the AI models that costs money, and
00:04:53 it's the inference, of course. So we have these two factors that come into play for these
00:04:59 AI companies. Now, of course, the training cost is a one-time thing, right? So you train a model once,
00:05:06 and that is super expensive, but obviously it's a one-time thing. Of course, these companies then
00:05:12 train more and more models, and it's a new one-time cost for each model, but once a model has been trained,
00:05:18 it no longer incurs any training cost, except maybe for further fine-tuning runs or models
00:05:25 derived from that base model. But yeah, the big chunk of cost is only incurred once. Now, for inference,
00:05:33 naturally, that's different: this is an ongoing cost, a cost per request in the end, because of course
00:05:41 inference is the process of producing the concrete output for your prompt, for the task that you send
00:05:48 to a model provider. And inference is of course what's happening all the time when you're using
00:05:53 Claude Code, when you're using Codex, but also of course when you send a prompt in ChatGPT or in
00:05:58 any other way. Now, this of course is where you want to be at least break-even with your API pricing,
00:06:07 because otherwise it means that you lose money on every request you receive. And whilst you could of
00:06:13 course be doing that to grow your market share, and whilst I wouldn't rule out that companies
00:06:19 are occasionally doing that, doing it long-term will of course not be viable, because you'll go
00:06:25 out of business. Now, naturally, you also need to earn back your training cost at some point. So ideally,
00:06:34 the incoming requests your users are sending give you enough of a gross margin on your
00:06:41 inference cost that this margin also covers your training cost, your staff cost, and so on. That, of
00:06:48 course, is the economics of how you can run and offer these AI models. Now, as mentioned, the
00:06:57 API pricing is probably the part where these companies are not losing massive amounts of
00:07:02 money. But of course, as a consumer, as a customer, you do: if you were to power Claude Code with these
00:07:10 on-demand prices of Opus, you would be paying way, way more than if you were using their subscriptions,
00:07:18 because of course with the Max subscription, for example, for only 200 bucks, you're getting
00:07:26 lots of usage out of this plan, you'll get many millions of tokens out of it. And if you
00:07:34 take a look at what output tokens would normally cost you per million tokens, you can see that,
00:07:39 normally, if you ignore the input tokens (which you shouldn't), for these two
00:07:44 hundred dollars we should not even get 10 million output tokens, because one million
00:07:51 output tokens costs us $25, so we should only get eight million output tokens. And then, if you consider
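That back-of-the-envelope calculation can be written out as follows; the $25-per-million output price is the one quoted above, and ignoring input tokens makes this an upper bound:

```python
OUTPUT_PRICE_PER_M = 25.00  # USD per million Opus output tokens (quoted above)
SUBSCRIPTION_USD = 200.00   # monthly price of the Max subscription mentioned above

# Upper bound on output tokens per month at API rates: this ignores
# input tokens entirely, so real usage would buy even fewer tokens.
max_output_tokens = SUBSCRIPTION_USD / OUTPUT_PRICE_PER_M * 1_000_000
print(int(max_output_tokens))  # 8000000, i.e. eight million output tokens
```

So anyone burning past roughly eight million output tokens in a month is already getting more than API-rate value out of the $200 plan.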
00:07:56 input tokens, it would be less than that. And clearly, if you had any long-running sessions, if you've been
00:08:02 using Claude Code for a week, for example, and you track your token usage, you will see that you
00:08:08 can go above that limit, and you definitely could in the past. And that makes it obvious why the
00:08:14 companies are kind of trying to limit how much usage you can get out of your subscriptions,
00:08:19 and why I think we'll definitely see higher subscription prices in the future, maybe already
00:08:25 in the near future. Now, of course, it's not super easy for these companies to increase their prices,
00:08:30 though, because of market share: obviously, all these companies want to aggressively capture market share,
00:08:37 the reasoning being that if you're the main company that's established as the coding agent provider in
00:08:45 lots of enterprises out there, in lots of companies out there, they will probably pay higher subscription
00:08:51 prices in the future. So you don't want to start increasing your prices too early, because that could
00:08:57 drive some of your customers to your competition, which you don't want, of course. On the other hand,
00:09:02 you don't want to go broke. I mean, for example, OpenAI recently raised 122 billion dollars
00:09:09 to accelerate the next phase of AI, and you could read that this would only give them
00:09:17 around 18 months of runway, so 18 months until they need to raise again. So clearly, you can't
00:09:26 continue subsidizing all that usage forever, because if you go out of business, then all your customers
00:09:32 are going to your competition anyway. So there is a trade-off here, and that's of course exactly the
00:09:39 difficult spot these companies are facing right now. That's the economics here. Now, of course, as you
00:09:44 probably read, and also felt if you're a gamer, for example, we're at a point in time where, because of
00:09:52 all the AI stuff that's happening, we are facing a big compute scarcity, a crisis, and high prices for
00:10:01 memory and everything related to what these AI models and these AI data centers need. So,
00:10:08 memory is expensive because inference needs lots of memory; if you've tried running models
00:10:13 locally on your system, you know you need lots of memory for that. So the memory prices went up.
00:10:19 But it's not just memory, it's also networking gear, because of course you're running both the training
00:10:25 and the inference not on one single chip, but on huge racks and clusters of chips, and all these
00:10:31 clusters need connections between the clusters and between the chips, so that you can build a super
00:10:36 GPU, so to say. And this networking gear is in high demand and therefore expensive. And then, of course,
00:10:43 we also have energy and data centers. We need both: we need data centers to put those chips in, and that's
00:10:52 why lots of construction is happening there. But then these data centers need energy, right? And you've
00:10:58 heard about that too: energy is another big problem. You can't just get it from the grid, it's simply not
00:11:05 built for that, there isn't enough energy available there. That's why all these new data centers are
00:11:12 moving to off-grid solutions, where the energy is produced next to the data center with gas turbines
00:11:21 or nuclear power. But that of course all takes time, and it also takes components, and there isn't an
00:11:28 endless number of companies that can build these power plants, there isn't an endless amount of
00:11:35 components that are needed for building these power plants. So that's all limiting the amount of
00:11:42 compute that can go online, which in turn is missing for inference and of course also for
00:11:48 training. Now, historically (and with that I mean only like one or two years ago), the incentive for
00:11:54 these companies was to dedicate a lot of compute resources towards training, because that gives you
00:12:00 better models, which lets you stay ahead, or get ahead, in the AI race. And that incentive still exists, but
00:12:07 of course nowadays there also is a bigger incentive and a higher importance on the inference part,
00:12:14 because it's the inference part that gives you customers, that gives you visibility in the market;
00:12:19 because if nobody can use your models, then it's great that you have good models, but you're
00:12:25 not gaining any market share. So you need inference; it has become way more important. So companies have
00:12:30 to split the scarce compute resources and data center capacities between these two ends. And of
00:12:38 course, especially since the beginning of this year, we're also seeing changed usage behavior from customers.
00:12:45 The GitHub news article here is actually pretty open about this: "Agentic workflows have
00:12:51 fundamentally changed Copilot's compute demands. Long-running, parallelized sessions now regularly
00:12:57 consume far more resources than the original plan structure was built to support." And it's the same of
00:13:04 course for Anthropic and OpenAI. In the past (and again, this only means like a year ago or so),
00:13:10 these companies, not primarily but to a huge extent, really focused on occasional chat sessions: a
00:13:20 user, a customer, would occasionally come along and ask ChatGPT or Claude a question. And of course that
00:13:27 could have been multiple times a day, but it was just a couple of questions, a couple of answers,
00:13:33 a couple of follow-up questions: way fewer tokens than all these long-running agentic
00:13:39 workflows and coding sessions involve. In those coding sessions, or whichever agentic workflows you're
00:13:44 running, you're burning through hundreds of thousands, even millions, of tokens quickly, very quickly, far
00:13:51 quicker than you could with just an occasional chat session. Now, given the fact that all these
00:13:58 modern models we're dealing with are typically thinking models, the token amount has also gotten
00:14:05 higher compared to a year or two ago, because a response simply takes more tokens due to that
00:14:12 thinking process, and those are of course still tokens, even if you maybe don't see them in the final response.
00:14:17 So the amount of tokens consumed got way, way bigger than it was a year or two
00:14:24 ago, again bringing us to the point that inference is becoming more important, because you need way
00:14:29 more inference to handle all that token generation that is going on. And that's the reason why all
00:14:37 these new models are pretty expensive when used through the API, but, even more importantly, why
00:14:43 these subscriptions are so difficult for these companies right now: they introduced those
00:14:49 subscriptions in the past, when way fewer tokens were being consumed, and now they're at a point
00:14:56 where, for the same subscription price, people are using way more tokens. That's the difficulty.
00:15:03 Now, especially for Anthropic, for example, I could imagine they are feeling the pain a bit more
00:15:09 than OpenAI, not just because their models seem to be more expensive to run if you just take a look
00:15:16 at the API pricing, but also of course because historically, even a year ago already, Anthropic
00:15:22 had more enterprise and business customers. Which is good for them to some extent: it's a stable
00:15:29 revenue base. And ChatGPT, or OpenAI, has been more consumer-based; they had more normal people, normal
00:15:38 consumers, as customers. And now they're also moving more towards business, but historically, because they
00:15:43 had the ChatGPT moment, they had more normal people as customers. The disadvantage for Anthropic now, of
00:15:50 course, is that these business customers are exactly the customers that are running these agentic
00:15:55 workflows, or that tend to run these agentic workflows. I mean, your mom and dad, if they're
00:16:00 paying for ChatGPT at all (which they likely don't), are not running agentic workflows,
00:16:06 but you are, your company is. And that of course makes the subscription even more difficult for
00:16:11 Anthropic, I would imagine, than for OpenAI, where there still are plenty of normies on the subscription,
00:16:18 I would guess. Still, they're definitely feeling the pain as well. And what does this all mean now? What
00:16:24 do changes like this, or changes like in this X post, where Anthropic is running tests to pull Claude Code
00:16:32 from the cheaper plans, mean for us? I think it's pretty obvious: we'll see even
00:16:38 stricter limits in the future, and therefore we of course might reach a point where the subscriptions
00:16:42 don't really feel like they're worth it anymore. And I think that will be the point where we'll see
00:16:48 higher prices. It's not unreasonable, I think, to believe that these coding subscriptions, or generally
00:16:55 these agentic-usage subscriptions, will cost many thousands of dollars a month at some point. Not
00:17:03 this year, most likely, but at some point, because of course companies may start comparing the cost of
00:17:10 these subscriptions against the cost of employees. Yeah, and that's of course not great news, and I
00:17:17 may be totally wrong, but it is definitely what I think will happen. And of course, when you make
00:17:23 that comparison, there's a lot of room for these subscriptions to get way, way more expensive.
00:17:30 Obviously, the subscriptions then won't be for the normal people anymore, so I think we'll also see
00:17:35 new subscription offerings for them, which simply have way stricter usage limits, which are enough
00:17:41 for ChatGPT but not enough for agentic workflows. But for the professional use, for the agentic
00:17:47 workflows, we'll see stricter limits and higher prices. I'm not sure when, because, you know, market
00:17:52 share, right? So, what I mentioned before. But eventually we'll see that, because ultimately,
00:17:58 as mentioned, OpenAI has around 18 months of runway, and they probably want to stay in business,
00:18:03 same for Anthropic. And therefore that is what I think we'll see here in a year or so. I don't know.