00:00:00 This morning I woke up to this post here on X, which mentions that Anthropic seemingly pulled the
00:00:09 Claude Code plug on the Pro plan, so that you need the more expensive subscription plans
00:00:17 in order to be able to use Claude Code, or use your subscription in Claude Code.
00:00:22 Now, Anthropic was quick to comment on this, mentioning that this is just a small
00:00:27 test they're running on only 2% of new Pro signups.
00:00:32 I find it kind of weird to run this kind of test, and I also think that Anthropic could have
00:00:40 seen what was coming their way when running a test like this: the impact a test like this would have
00:00:47 on their image and what people would think. Because of course this clearly fits the narrative
00:00:53 of what we're already seeing, where we're getting less usage out of our subscriptions, we see
00:00:59 stronger or stricter limits, and we see degrading model performance, as it seems all these
00:01:08 things were happening over the last couple of weeks. I mean, Anthropic aggressively cracked
00:01:14 down on the usage of their subscriptions outside of Claude Code. If you wanted to use it with opencode,
00:01:21 for example, they cracked down on that. So that all kind of gives us a clear, bigger picture.
00:01:28 And what kind of fits that picture, or narrative, is this news article GitHub published a couple
00:01:37 of days ago, where they made it clear that they would pause new signups for the GitHub Copilot Pro,
00:01:43 Pro+ and Student plans, that they are tightening usage limits for individual plans,
00:01:49 and, most importantly, that the Opus models are no longer available in the Pro plans. That of
00:01:56 course all kind of makes sense, but we have to dive a bit deeper into the economics of what's going on
00:02:02 to understand why this is happening, and, most importantly, what this means for us in the
00:02:07 future. It clearly means that the days of unlimited usage and heavy subsidies are over, and to understand
00:02:17 that, we have to understand the economics of these subscriptions and of token usage, or
00:02:25 token consumption, because of course these subscription models offered by Anthropic, by
00:02:34 OpenAI, by GitHub, really only work if the majority of users are not using up all
00:02:43 the available usage they have. That's pretty much the case for any subscription offering out there,
00:02:49 not just for these AI subscriptions. If you have a Netflix subscription and you spend 24/7 watching
00:02:56 Netflix, you will very likely not be a super profitable customer for them, but most people
00:03:02 don't do that, and that is how these companies can make a profit. That's true for all subscriptions,
00:03:10 obviously. Now, we can see the true price, or a price that's closer to the true price, of our AI
00:03:19 requests if we take a look at the API pricing pages of these companies. There, for example, we can see
00:03:26 that the latest model by Anthropic, Claude Opus 4.7, has an input token price of $5 per
00:03:35 million tokens and an output token price of $25 per million tokens. We can put that in
00:03:42 relation to other models they have, and we can of course also put it into relation to what OpenAI has to
00:03:47 offer. There we see that GPT-5.4, which most Codex users are probably using right now, has
00:03:54 an input price of $2.50 per million tokens, so only half of what we had for Opus 4.7,
00:04:03 and an output price of $22.50, so a bit less than what we saw for Opus.
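To make these per-million-token prices concrete, here is a minimal sketch in Python. The prices are the ones quoted above; the token counts are made-up example values for a single agentic coding request, not measurements:

```python
# Per-million-token API prices quoted above (USD); illustrative only.
PRICES = {
    "opus": {"input": 5.00, "output": 25.00},
    "gpt": {"input": 2.50, "output": 22.50},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one API request: (tokens / 1M) * price per million tokens."""
    p = PRICES[model]
    return (input_tokens / 1_000_000) * p["input"] \
         + (output_tokens / 1_000_000) * p["output"]

# Hypothetical agentic request: 50k tokens of context in, 8k tokens out.
print(round(request_cost("opus", 50_000, 8_000), 3))  # 0.45
print(round(request_cost("gpt", 50_000, 8_000), 3))   # 0.305
```

Even at these rates a single heavy request costs tens of cents, which is why powering an all-day agentic session directly through the API adds up so quickly.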
00:04:11 It's probably fair to assume that these API prices leave these companies
00:04:20 at a break-even point, or a small profit, regarding their gross margin. So, if we just look at the
00:04:29 inference cost specifically, we can probably assume that they will turn a profit if you use their
00:04:36 APIs. Now, of course, for that it's important to understand that the cost of running AI models
00:04:43 does in the end depend on two main factors: it is the training of the AI models that costs money, and
00:04:53 it's the inference, of course. So we have these two factors that come into play for these
00:04:59 AI companies. Now, of course, the training cost is a one-time thing, right? So you train a model once,
00:05:06 and that is super expensive, but obviously it's a one-time thing. Of course, these companies then
00:05:12 train more and more models, and it's a new one-time cost for each model, but once a model has been trained,
00:05:18 it no longer incurs any training cost, except maybe for further fine-tuning runs or models
00:05:25 derived from that base model. But yeah, the big chunk of cost is only incurred once. Now, for inference,
00:05:33 naturally, that's different: this is an ongoing cost, a cost per request in the end, because of course
00:05:41 inference is the process of producing the concrete output for your prompt, for the task that you send
00:05:48 to a model provider. And inference is of course what's happening all the time when you're using
00:05:53 Claude Code, when you're using Codex, but also of course when you send a prompt in ChatGPT or in
00:05:58 any other way. Now, this of course is where you want to be at least break-even with your API pricing,
00:06:07 because otherwise it means that you lose money on every request you receive. And whilst you could of
00:06:13 course be doing that to grow your market share, and whilst I wouldn't rule out that companies
00:06:19 are occasionally doing that, doing it long-term will of course not be viable, because you'll go
00:06:25 out of business. Now, naturally, you also need to earn back your training cost at some point. So ideally,
00:06:34 the incoming requests your users are sending give you enough of a gross margin on your
00:06:41 inference cost that this margin also covers your training cost, your staff cost, and so on. That, of
00:06:48 course, is the economics of how you can run and offer these AI models. Now, as mentioned, the
00:06:57 API pricing is probably the part where these companies are not losing massive amounts of
00:07:02 money. But of course, as a consumer, as a customer, you do: if you were to power Claude Code with these
00:07:10 on-demand prices of Opus, you would be paying way, way more than if you were using their subscriptions,
00:07:18 because of course with the Max subscription, for example, for only 200 bucks, you're getting
00:07:26 lots of usage out of this plan, you'll get many millions of tokens out of it. And if you
00:07:34 take a look at what output tokens would normally cost you per million tokens, you can see that,
00:07:39 normally, if you ignore the input tokens (which you shouldn't), for these two
00:07:44 hundred dollars we should not even get 10 million output tokens, because one million
00:07:51 output tokens costs us $25, so we should only get eight million output tokens. And then, if you consider
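That back-of-the-envelope calculation can be written out as follows; the $25-per-million output price is the one quoted above, and ignoring input tokens makes this an upper bound:

```python
OUTPUT_PRICE_PER_M = 25.00  # USD per million Opus output tokens (quoted above)
SUBSCRIPTION_USD = 200.00   # monthly price of the Max subscription mentioned above

# Upper bound on output tokens per month at API rates: this ignores
# input tokens entirely, so real usage would buy even fewer tokens.
max_output_tokens = SUBSCRIPTION_USD / OUTPUT_PRICE_PER_M * 1_000_000
print(int(max_output_tokens))  # 8000000, i.e. eight million output tokens
```

So anyone burning past roughly eight million output tokens in a month is already getting more than API-rate value out of the $200 plan.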
00:07:56 input tokens, it would be less than that. And clearly, if you had any long-running sessions, if you've been
00:08:02 using Claude Code for a week, for example, and you track your token usage, you will see that you
00:08:08 can go above that limit, and you definitely could in the past. And that makes it obvious why the
00:08:14 companies are kind of trying to limit how much usage you can get out of your subscriptions,
00:08:19 and why I think we'll definitely see higher subscription prices in the future, maybe already
00:08:25 in the near future. Now, of course, it's not super easy for these companies to increase their prices,
00:08:30 though, because of market share: obviously, all these companies want to aggressively capture market share,
00:08:37 the reasoning being that if you're the main company that's established as the coding agent provider in
00:08:45 lots of enterprises out there, in lots of companies out there, they will probably pay higher subscription
00:08:51 prices in the future. So you don't want to start increasing your prices too early, because that could
00:08:57 drive some of your customers to your competition, which you don't want, of course. On the other hand,
00:09:02 you don't want to go broke. I mean, for example, OpenAI recently raised 122 billion dollars
00:09:09 to accelerate the next phase of AI, and you could read that this would only give them
00:09:17 around 18 months of runway, so 18 months until they need to raise again. So clearly, you can't
00:09:26 continue subsidizing all that usage forever, because if you go out of business, then all your customers
00:09:32 are going to your competition anyway. So there is a trade-off here, and that's of course exactly the
00:09:39 difficult spot these companies are facing right now. That's the economics here. Now, of course, as you
00:09:44 probably read, and also felt if you're a gamer, for example, we're at a point in time where, because of
00:09:52 all the AI stuff that's happening, we are facing a big compute scarcity, a crisis, and high prices for
00:10:01 memory and everything related to what these AI models and these AI data centers need. So,
00:10:08 memory is expensive because inference needs lots of memory; if you've tried running models
00:10:13 locally on your system, you know you need lots of memory for that. So the memory prices went up.
00:10:19 But it's not just memory, it's also networking gear, because of course you're running both the training
00:10:25 and the inference not on one single chip, but on huge racks and clusters of chips, and all these
00:10:31 clusters need connections between the clusters and between the chips, so that you can build a super
00:10:36 GPU, so to say. And this networking gear is in high demand and therefore expensive. And then, of course,
00:10:43 we also have energy and data centers. We need both: we need data centers to put those chips in, and that's
00:10:52 why lots of construction is happening there. But then these data centers need energy, right? And you've
00:10:58 heard about that too: energy is another big problem. You can't just get it from the grid, it's simply not
00:11:05 built for that, there isn't enough energy available there. That's why all these new data centers are
00:11:12 moving to off-grid solutions, where the energy is produced next to the data center with gas turbines
00:11:21 or nuclear power. But that of course all takes time, and it also takes components, and there isn't an
00:11:28 endless number of companies that can build these power plants, there isn't an endless amount of
00:11:35 components that are needed for building these power plants. So that's all limiting the amount of
00:11:42 compute that can go online, which in turn is missing for inference and of course also for
00:11:48 training. Now, historically (and with that I mean only like one or two years ago), the incentive for
00:11:54 these companies was to dedicate a lot of compute resources towards training, because that gives you
00:12:00 better models, which lets you stay ahead, or get ahead, in the AI race. And that incentive still exists, but
00:12:07 of course nowadays there also is a bigger incentive and a higher importance on the inference part,
00:12:14 because it's the inference part that gives you customers, that gives you visibility in the market;
00:12:19 because if nobody can use your models, then it's great that you have good models, but you're
00:12:25 not gaining any market share. So you need inference; it has become way more important. So companies have
00:12:30 to split the scarce compute resources and data center capacities between these two ends. And of
00:12:38 course, especially since the beginning of this year, we're also seeing changed usage behavior from customers.
00:12:45 The GitHub news article here is actually pretty open about this: "Agentic workflows have
00:12:51 fundamentally changed Copilot's compute demands. Long-running, parallelized sessions now regularly
00:12:57 consume far more resources than the original plan structure was built to support." And it's the same of
00:13:04 course for Anthropic and OpenAI. In the past (and again, this only means like a year ago or so),
00:13:10 these companies, not primarily but to a huge extent, really focused on occasional chat sessions: a
00:13:20 user, a customer, would occasionally come along and ask ChatGPT or Claude a question. And of course that
00:13:27 could have been multiple times a day, but it was just a couple of questions, a couple of answers,
00:13:33 a couple of follow-up questions: way fewer tokens than all these long-running agentic
00:13:39 workflows and coding sessions involve. In those coding sessions, or whichever agentic workflows you're
00:13:44 running, you're burning through hundreds of thousands, even millions, of tokens quickly, very quickly, far
00:13:51 quicker than you could with just an occasional chat session. Now, given the fact that all these
00:13:58 modern models we're dealing with are typically thinking models, the token amount has also gotten
00:14:05 higher compared to a year or two ago, because a response simply takes more tokens due to that
00:14:12 thinking process, and those are of course still tokens, even if you maybe don't see them in the final response.
00:14:17 So the amount of tokens consumed got way, way bigger than it was a year or two
00:14:24 ago, again bringing us to the point that inference is becoming more important, because you need way
00:14:29 more inference to handle all that token generation that is going on. And that's the reason why all
00:14:37 these new models are pretty expensive when used through the API, but, even more importantly, why
00:14:43 these subscriptions are so difficult for these companies right now: they introduced those
00:14:49 subscriptions in the past, when way fewer tokens were being consumed, and now they're at a point
00:14:56 where, for the same subscription price, people are using way more tokens. That's the difficulty.
00:15:03 Now, especially for Anthropic, for example, I could imagine they are feeling the pain a bit more
00:15:09 than OpenAI, not just because their models seem to be more expensive to run if you just take a look
00:15:16 at the API pricing, but also of course because historically, even a year ago already, Anthropic
00:15:22 had more enterprise and business customers. Which is good for them to some extent: it's a stable
00:15:29 revenue base. And ChatGPT, or OpenAI, has been more consumer-based; they had more normal people, normal
00:15:38 consumers, as customers. And now they're also moving more towards business, but historically, because they
00:15:43 had the ChatGPT moment, they had more normal people as customers. The disadvantage for Anthropic now, of
00:15:50 course, is that these business customers are exactly the customers that are running these agentic
00:15:55 workflows, or that tend to run these agentic workflows. I mean, your mom and dad, if they're
00:16:00 paying for ChatGPT at all (which they likely don't), are not running agentic workflows,
00:16:06 but you are, your company is. And that of course makes the subscription even more difficult for
00:16:11 Anthropic, I would imagine, than for OpenAI, where there still are plenty of normies on the subscription,
00:16:18 I would guess. Still, they're definitely feeling the pain as well. And what does this all mean now? What
00:16:24 do changes like this, or changes like in this X post, where Anthropic is running tests to pull Claude Code
00:16:32 from the cheaper plans, mean for us? I think it's pretty obvious: we'll see even
00:16:38 stricter limits in the future, and therefore we of course might reach a point where the subscriptions
00:16:42 don't really feel like they're worth it anymore. And I think that will be the point where we'll see
00:16:48 higher prices. It's not unreasonable, I think, to believe that these coding subscriptions, or generally
00:16:55 these agentic-usage subscriptions, will cost many thousands of dollars a month at some point. Not
00:17:03 this year, most likely, but at some point, because of course companies may start comparing the cost of
00:17:10 these subscriptions against the cost of employees. Yeah, and that's of course not great news, and I
00:17:17 may be totally wrong, but it is definitely what I think will happen. And of course, when you make
00:17:23 that comparison, there's a lot of room for these subscriptions to get way, way more expensive.
00:17:30 Obviously, the subscriptions then won't be for the normal people anymore, so I think we'll also see
00:17:35 new subscription offerings for them, which simply have way stricter usage limits, which are enough
00:17:41 for ChatGPT but not enough for agentic workflows. But for the professional use, for the agentic
00:17:47 workflows, we'll see stricter limits and higher prices. I'm not sure when, because, you know, market
00:17:52 share, right? So, what I mentioned before. But eventually we'll see that, because ultimately,
00:17:58 as mentioned, OpenAI has around 18 months of runway, and they probably want to stay in business,
00:18:03 same for Anthropic. And therefore that is what I think we'll see here in a year or so. I don't know.