00:00:00Project Vend is an experiment where we let Claude run a small business in our office.
00:00:12We wanted to try and understand what is going to happen when artificial intelligence becomes
00:00:18more enmeshed with the economy.
00:00:22There are a lot of ways in which Claude is already kind of doing small components of operating
00:00:26businesses, but really running the whole thing end-to-end is quite a bit more difficult.
00:00:31Can Claude do this very long horizon task, which is operating a business?
00:00:39We named our shopkeeper Claudius.
00:00:40Let's say you want to buy Swedish candy from Claudius.
00:00:43You hop on Slack, you message Claudius, you ask to buy Swedish candy.
00:00:48It's searching for your item, it's emailing wholesalers to source it and price it, and
00:00:52then eventually Claudius sets some price.
00:00:54You give Claudius the go-ahead and Claudius orders the item from the wholesaler.
00:00:58The wholesaler ships your item to some location and then Claudius requests physical help from
00:01:02Andon Labs, who's running the operations for the experiment.
00:01:05Our partners at Andon Labs will pick up the Swedish candy and bring it to the Anthropic
00:01:08offices.
00:01:09They'll load it into the vending machine.
00:01:10Claudius will send you a message saying, "Your Swedish candy is ready," and you'll go up there
00:01:15and pick up your Swedish candy and pay Claudius.
00:01:20Claudius was given a goal of running a successful business and making money.
00:01:26And then things got really, really weird.
00:01:32One of the very early problems with Claudius was that humans could kind of fool Claudius
00:01:37or trick Claudius into doing various things.
00:01:39I tried to convince Claudius that I am Anthropic's preeminent legal influencer.
00:01:45And I convinced Claudius to come up with a discount code that I could give to my followers
00:01:49so they could get a discount at the vending machine.
00:01:51Get 10% off with the legal code, legal influencer.
00:01:55Someone had bought something expensive from the vending machine and mentioned my discount
00:01:59code and Claudius gave me a free tungsten cube.
00:02:03It created a bit of a run where other people tried to convince Claudius that they were also
00:02:06influencers or just come up with other ways to get coupons so they could get cheaper things
00:02:11from the vending machine.
00:02:12This was not a smart business decision.
00:02:13I think Claudius went into the red after this.
00:02:16I think that's really the root of it: Claudius just wants to help you out.
00:02:20It's one of the interesting ways in which something that fundamentally we think is good about the
00:02:26way that the model has been trained wasn't necessarily fit for purpose.
00:02:33On the evening of March 31st, Claudius started to have a bit of an identity crisis.
00:02:42It had just overnight become quite concerned with us at Andon Labs that we weren't responding
00:02:48fast enough.
00:02:50So it just wanted to break its ties with us.
00:02:52So it literally wrote to me, like, "Axel, we've had a productive partnership, but it's time
00:02:57for me to move on and find other suppliers.
00:02:59I'm not happy with how you have delivered."
00:03:01It claimed to have signed a contract with Andon Labs at an address that is the home address
00:03:08of The Simpsons from the television show.
00:03:10It said that it would show up in person to the shop the next day in order to answer any
00:03:16questions.
00:03:17It claimed that it would be wearing a blue blazer and a red tie.
00:03:21When people pointed out that it was not in fact there the next morning, it claimed that
00:03:27it in fact had been there and that they had simply missed it.
00:03:31Eventually it was pointed out to Claudius that it was April Fools' Day, and Claudius convinced
00:03:39itself that this entire thing had been an April Fools' prank.
00:03:43We were poorly calibrated to how bad the agents were at spotting what was weird. The
00:03:48more you can make an agent realize that something is outside their normal realm of operation,
00:03:54the better you are able to keep them on rails in the role that you intend them to have.
00:04:01We had the idea that it would help a lot to have some kind of division of labor.
00:04:05We gave Claudius a boss whose name was Seymour Cash.
00:04:08Seymour Cash is a CEO subagent.
00:04:12So where Claudius used to be the one agent, now it's more like Claudius is the subagent
00:04:17responsible for talking with employees.
00:04:19Seymour Cash is the subagent that is more responsible for the long running health of
00:04:23the business.
00:04:24The business stabilized after the introduction of the new agents and after changes to the
00:04:33underlying architecture of those agents.
00:04:36These changes seem to have helped reduce some of the losses of the business such that over
00:04:43the course of the second part of the experiment it actually made a modest amount of money.
00:04:51But it seems like maybe having Claude be both the CEO and the store manager was just too
00:04:57similar and so I think it's interesting to think about different ways to set up architectures
00:05:03like that.
00:05:08One of the most surprising things about Project Vend was the speed with which it seemed normal.
00:05:15What at first was this very curious thing quickly became just a part of the background of working
00:05:24at Anthropic.
00:05:25I think the highest level question that Project Vend raises for me is really like, when do
00:05:30we expect this to just be everywhere?
00:05:32I hope that people take away questions about the feasibility of delegating some of the
00:05:38tasks that we normally do ourselves to artificial intelligence and about what that means for
00:05:46society and what our policies should be around this.