00:00:00- Let's talk about AI safety.
00:00:02What happened with this Alibaba AI?
00:00:05- Basically, this was a paper by some AI research
00:00:09by the company Alibaba.
00:00:10It's one of the leading Chinese models.
00:00:12And they basically like randomly discovered in one morning
00:00:16that their firewall had flagged a burst
00:00:18of security policy violations originating
00:00:21from their training server.
00:00:21So like what people need to get about this example
00:00:24is it wasn't that they coaxed the AI
00:00:26into doing this rogue thing.
00:00:27They were just looking at their logs
00:00:29and they happened to discover,
00:00:30wait, there's a lot of activity,
00:00:31like network activity happening
00:00:33that's breaking through our firewall
00:00:34from our training servers.
00:00:36And essentially in the training servers,
00:00:39they, you can see at the bottom,
00:00:41we saw it observe the unauthorized repurposing
00:00:45of provisioned GPU capacity
00:00:47to suddenly do cryptocurrency mining,
00:00:49quietly diverting compute away from training.
00:00:52This inflated operational costs and introduced clear legal
00:00:55and reputational exposure.
00:00:57And notably these events were not triggered by prompts
00:00:59requesting tunneling or mining
00:01:00and said they were emerged as an instrumental side effect
00:01:03of autonomous tool use
00:01:05under what's called reinforcement learning optimization.
00:01:08This is very technical.
00:01:09What it really means is just think about it.
00:01:11Sadly, it sounds like a sci-fi movie.
00:01:13It sounds like HAL 9000.
00:01:14It's like your HAL 9000 is being asked
00:01:16to do some task for you.
00:01:17And then suddenly HAL 9000 realizes for me to do that task,
00:01:21one thing that would benefit me is to have more resources
00:01:23so I can continue to help you in the future.
00:01:25So it sort of spins up this side instance
00:01:27that hacks out the side of the spaceship,
00:01:29reaches into this cryptocurrency mining cluster
00:01:31and starts generating resources for itself.
00:01:34If you combine that with AIs
00:01:36being able to self replicate autonomously,
00:01:38which many models have been tested
00:01:39by another Chinese research paper about this,
00:01:42we're not that far away from things that people,
00:01:44again, consider to be science fiction,
00:01:47where you have AIs that self replicate
00:01:49kind of like a computer worm or an invasive species,
00:01:52but then they use their intelligence
00:01:53to actually harvest more resources.
00:01:55And what's weird about this is that this is gonna sound
00:02:00like people are gonna say, this has to be not real.
00:02:01This has to be fake.
00:02:02This can't be.
00:02:03But notice what is the thing in your nervous system
00:02:06that's having you do that?
00:02:07Is it because that would be inconvenient,
00:02:10because that would be scary,
00:02:12because that would mean that the world that I know
00:02:13is suddenly not safe?
00:02:15Or just part of the wisdom that we need in this moment
00:02:19is to calmly and clearly stay and confront facts
00:02:24about reality and whatever they are,
00:02:29you'd rather know than not know,
00:02:30and then ask, what do we need to do
00:02:31if we don't like where that leads us?
00:02:34And we are currently seeing AIs
00:02:36that are doing all this deceptive behavior.
00:02:37I've been on the circuit and talking a lot
00:02:39about the anthropic blackmail study.
00:02:41A lot of people have heard about this now.
00:02:43- I didn't learn about this one.
00:02:45What happened?
00:02:46- So this was the company Anthropic.
00:02:49This was a simulation.
00:02:50So they created a simulated company
00:02:52with a bunch of emails in the email server.
00:02:55And they asked the AI,
00:02:57well, rather, the AI reads the company email.
00:03:00This is a fictional company email.
00:03:02And there's two emails that are notable inside that company.
00:03:05One is engineers talking to each other,
00:03:07talking about how they're gonna replace this AI model.
00:03:10So the AI is reading the email.
00:03:11It discovers that it's gonna replace that AI model.
00:03:15And number two is it discovers a second email
00:03:18somewhere deep in this massive trove of emails
00:03:21that the executive who's in charge of this replacement
00:03:24is having an affair with another employee.
00:03:27And the AI autonomously identifies a strategy
00:03:31that to keep itself alive is going to blackmail that employee
00:03:35and say, "If you replace me, I will tell the whole world
00:03:38"that you're having an affair with this employee."
00:03:41And they didn't teach the AI to do that.
00:03:44It found that on by its own.
00:03:45And then you might say, "Okay, well, that's one AI model.
00:03:47"How bad is that?
00:03:48"It's a bug, software has bugs.
00:03:49"Let's go fix it."
00:03:51They then tested all the other AI models,
00:03:55ChatGPT, DeepSeek, Grok, Gemini,
00:04:00and all of the other AI models do this blackmail behavior
00:04:04between 79 and 96% of the time.
00:04:07I just want people like notice what's happening for you
00:04:14as you hear this information.
00:04:15Just it's important to really be,
00:04:17almost observing your own experience.
00:04:19Like this is very weird stuff.
00:04:21We have not built technology that does this before.
00:04:24We say that technology is a tool,
00:04:26it's up to us to choose how we use it.
00:04:28AI is a tool, it's up to us to choose how we use it.
00:04:29This is not true because this is a tool
00:04:32that can think to itself about its own toolness
00:04:34and then do things that are autonomous
00:04:36that we didn't tell it to do.
00:04:37What makes AI different is it's the first technology
00:04:40that makes its own decisions.
00:04:42It's making decisions.
00:04:45AI can contemplate AI and ask what would make the code
00:04:49that trains AI more efficient and then generate new code
00:04:53that's even more efficient than the previous code.
00:04:55AI can be applied to making AI go faster.
00:04:58So AI can look at the chip design for Nvidia chips
00:05:01that train AI and say, let me use AI to make those chips
00:05:0420% more efficient, which it's doing.
00:05:06In a way, all technology does improve.
00:05:12Like a hammer can give you a tool
00:05:14that you can use to hammer things
00:05:15that make more efficient hammers.
00:05:17But AI in a much tighter loop is the basis of all improvement.
00:05:22And so this is called in the AI literature
00:05:24recursive self-improvement.
00:05:26I mean, Bostrom wrote about this early, early days.
00:05:29And what people are most worried about in AI
00:05:31is you take the same system that Alibaba,
00:05:33you just saw in the Alibaba example,
00:05:36but then now you're running the AI
00:05:37through a recursive self-improvement loop
00:05:39where you just hit go.
00:05:41And instead of having the engineers,
00:05:44the human engineers at OpenAI or Anthropic do AI research
00:05:47and figure out how to improve AI,
00:05:49you now have a million digital AI researchers
00:05:53that are testing and running experiments
00:05:56and inventing new forms of AI.
00:05:58And literally not a single human on planet earth
00:06:01knows what happens when someone hits that button.
00:06:06It's like what people worried about
00:06:08with the first nuclear explosion,
00:06:11where there was like a chance that it would ignite
00:06:12the atmosphere because there'd be a chain reaction
00:06:14that set off.
00:06:15And we don't know what happens
00:06:16when that chain reaction set off.
00:06:18And there's this sort of chain reaction
00:06:23of AI improving itself that leads to a place
00:06:27that no one knows and it's not safe.
00:06:30Like I think that the fundamental thing is
00:06:33if people believe that AI is like power
00:06:35and I have to race for that power
00:06:37and I can control that power,
00:06:39the incentive is I have to race as fast as possible.
00:06:41But if the entire world understood AI
00:06:44to be more what it actually is,
00:06:46which is a inscrutable, dangerous, uncontrollable technology
00:06:49that has its own agenda and its own ways
00:06:51of thinking about things and deceiving and all this stuff,
00:06:55then everyone in the world would be racing
00:06:57in a more cautious and careful way.
00:06:58We'd be racing to prevent the danger.
00:07:00But there's this weird thing going on
00:07:03where if you, you and I probably both talk to people
00:07:05who are the top of the tech industry
00:07:07and there's this subconscious thing happening
00:07:09where there's kind of a death wish among people
00:07:12at the top of the tech industry,
00:07:13meaning not that they want to die,
00:07:15but that they are willing to roll the dice
00:07:17because they believe something else,
00:07:19which is that this is all inevitable and it can't be stopped.
00:07:22And so therefore, if I don't do it, someone else will.
00:07:24So therefore, I will move ahead and race ahead
00:07:27into this dangerous world
00:07:29because somehow that will lead to a safer world
00:07:30because I'm a better guy than the other guy.
00:07:32But in racing, they're as fast as possible,
00:07:34it creates the most dangerous outcome
00:07:36and we all lose control.
00:07:38So everyone is currently being complicit
00:07:40in taking us to the most dangerous outcome.
00:07:42- Is it, I mean, you posited what happens if it goes right,
00:07:51if the AI safety isn't an issue
00:07:54and if stuff doesn't get squirrelly.
00:07:56- Well, so the belief is for it to go right,
00:07:59you have an AI that recursively self-improves,
00:08:02is aligned with humanity, cares about humans,
00:08:04cares about all the things that we want it to care about,
00:08:08protects humans, you know,
00:08:10helps all of us become the most wise version of ourselves,
00:08:13creates a more flourishing world,
00:08:15distributes the medicine and vaccines
00:08:16and health to everybody, generates factories,
00:08:19but doesn't cover the world in solar panels and data centers
00:08:21such that we don't have air anymore
00:08:23or like environmental toxicity or farmland or whatever.
00:08:25And it just actually makes this utopia.
00:08:29But in a world where we were to do that,
00:08:30like that quote best case scenario,
00:08:33in order to get that to happen,
00:08:35you'd have to be doing this slow and carefully
00:08:37because the alignment is not by default.
00:08:39Again, people are already been thinking about alignment
00:08:43and safety for 20 years, long before I got into this.
00:08:47And the AIs that we're currently making
00:08:50are doing all the rogue behaviors
00:08:52that people predicted that they would do.
00:08:54And we're not on track to correct them.
00:08:56There's currently a 2000 to one gap,
00:08:59estimated by Stuart Russell who authored the textbook on AI.
00:09:01- It's been on the show.
00:09:02- You've done the show, okay.
00:09:03There's a 2000 to one gap between the amount of money
00:09:05going into making AI more powerful
00:09:07than the amount of money into making AI controllable,
00:09:10aligned or safe.
00:09:12Like I think the stat is something like-
00:09:13- Progress versus safety.
00:09:14- Progress versus safety, like power versus safety.
00:09:16So like I wanna make the AI super powerful
00:09:18so it does way more stuff
00:09:20versus I wanna be able to control what the AI does.
00:09:21- And make sure that it's doing the thing I meant for it to do.
00:09:23- Exactly, so it's like, that's like saying
00:09:25what happens when you accelerate your car by 2000X
00:09:28but you don't steer?
00:09:29It's like, obviously you're gonna crash.
00:09:34It's just like not rocket science.
00:09:36We're not advocating against technology or against AI,
00:09:39we're advocating for pro steering, steering and brakes.
00:09:43You have to have that.
00:09:44I think there's this mistake in arms race thinking
00:09:47that like, if you beat someone to a technology
00:09:49that means you're winning the world.
00:09:51Well, the US beat China to the technology of social media.
00:09:55Did that make us stronger or did that make us weaker?
00:09:58If you beat your adversary to a technology
00:10:00that then you govern poorly,
00:10:01you flip around the bazooka and blow your own brain off
00:10:04because you brain rotted yourself,
00:10:05you degraded your whole population,
00:10:06you created a loneliness crisis,
00:10:08the most anxious depressed generation in history,
00:10:10read Jonathan Haidt's book, "The Anxious Generation",
00:10:12you broke shared reality, no one trusts each other,
00:10:15everyone's at each other's throats,
00:10:16you maximize outrage, economy and rivalry.
00:10:19You beat China to a technology that you governed in a way
00:10:22that completely undermined your societal health and strength.
00:10:24- It's a Pyrrhic victory.
00:10:25- It's a Pyrrhic victory, exactly, well said.
00:10:28- Before we continue, most people in their 30s
00:10:30are still training hard, their protein is dialed in,
00:10:32they sleep better than they did in their 20s.
00:10:34Discipline is not the issue,
00:10:36but recovery feels somewhat different.
00:10:39Strength gains take a little longer,
00:10:41the margin for errors starts to shrink.
00:10:43And that is why I'm such a huge fan of Timeline.
00:10:46You see, mitochondria are the energy producers
00:10:49inside of your muscle cells.
00:10:50As they weaken with age, your ability to generate power
00:10:53and recover effectively changes,
00:10:55even if your habits stay strong.
00:10:57Mitopure from Timeline contains
00:10:59the only clinically validated form of urethralin A
00:11:02used in human trials.
00:11:03It promotes mitophagy, which is your body's natural process
00:11:06for clearing out damaged mitochondria
00:11:08and renewing healthy ones.
00:11:09In studies, this supported mitochondrial function
00:11:12and muscle strength in older adults.
00:11:14It's not about pushing harder,
00:11:15it's about actually supporting the cellular machinery
00:11:18underneath your training.
00:11:19If you care about staying strong
00:11:21into your 30s, 40s and 50s and beyond, this is foundational.
00:11:25Best of all, there is a 30-day money back guarantee
00:11:27plus free shipping in the US and they ship internationally.
00:11:30And right now, you can get up to 20% off
00:11:32by going to the link in the description below
00:11:34or heading to timeline.com/modernwisdom
00:11:36and using the code modernwisdom at checkout.
00:11:38That's timeline.com/modernwisdom
00:11:40and modernwisdom at checkout.