00:00:00Did you know that the Internet is infested with zombies? No, not that kind of zombie.
00:00:05The ones I'm talking about are called BGP zombies. BGP stands for Border Gateway Protocol
00:00:12and it is the way big networks announce which IP addresses they can deliver traffic to.
00:00:18And the Internet is connected by this huge global map that tells networks how to reach each other
00:00:24and all of this is maintained through the BGP system. And for the most part,
00:00:29this system runs smoothly, but sometimes unexpected zombies appear in the system.
00:00:35But why does that happen? Well, that's what we're going to find out in today's video.
00:00:39So in the BGP system, when a network wants traffic to reach a new location,
00:00:49it advertises a route. When it wants traffic to stop coming through an old location,
00:00:54it withdraws that route. This happens all the time. Networks shift traffic between data centers,
00:01:00move customers to new edges, or take servers offline for maintenance.
00:01:04Withdrawing a route is simply how they tell the rest of the world that the path is no longer valid.
00:01:09But sometimes something strange happens. A route gets withdrawn,
00:01:13yet some networks keep believing it still exists. They continue sending traffic down a path that
00:01:20should be gone. And this is called a BGP zombie. It's an outdated route that refuses to disappear
00:01:26from the global routing table, even though the network that created it has already removed it.
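To make the idea concrete, here is a minimal toy model of announce and withdraw messages. It is only a sketch and does not speak real BGP: the `Router` class, the `drops_withdrawals` flag, and the `find_zombies` helper are all hypothetical names invented for this illustration, and the prefix is a documentation range.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Toy router: a name plus the set of prefixes it believes are reachable."""
    name: str
    drops_withdrawals: bool = False  # hypothetical flag simulating a lost withdrawal
    routes: set = field(default_factory=set)

    def on_announce(self, prefix: str) -> None:
        self.routes.add(prefix)

    def on_withdraw(self, prefix: str) -> None:
        if self.drops_withdrawals:
            return  # the withdrawal never takes effect, so the route lingers
        self.routes.discard(prefix)


def find_zombies(routers, prefix):
    """Return the names of routers that still hold a prefix after its withdrawal."""
    return [r.name for r in routers if prefix in r.routes]


routers = [Router("A"), Router("B", drops_withdrawals=True), Router("C")]
prefix = "203.0.113.0/24"  # documentation prefix, used only as an example

for r in routers:
    r.on_announce(prefix)   # the origin advertises the route
for r in routers:
    r.on_withdraw(prefix)   # the origin later withdraws it

zombies = find_zombies(routers, prefix)
print(zombies)  # router B still believes the withdrawn route exists
```

In this model, router B is the one holding a zombie: the origin has removed the route, but B's table still contains it.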
00:01:32So what happens to the traffic when this zombie is alive? It does not reach its destination. It
00:01:38might loop between routers for a moment before getting dropped. It might also take a long detour
00:01:43across several networks until it reaches a dead end. Or it might land on a network that tries to
00:01:49forward it but still can't deliver it anywhere useful. From a user's perspective, this could
00:01:55translate to a page hanging or timing out, or an app failing to connect for a short period.
00:02:01Sometimes it's barely noticeable. But other times the slowdown is very visible.
00:02:06The next logical question is why routers fail to update the global map right away.
00:02:11The answer comes down to how BGP processes changes. When a more specific route disappears,
00:02:17routers search for a less specific fallback. That search takes time. During that window,
00:02:23some routers fail to clear the old entry. They get stuck with stale information. Cloudflare
00:02:29observed that these zombies lasted somewhere between 6 and 11 minutes in large networks.
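The fallback from a more specific route to a less specific one can be sketched with Python's standard `ipaddress` module. This is a simplified illustration of longest-prefix matching, not how a real router stores its table; the `longest_prefix_match` helper and the example prefixes are assumptions made for this sketch.

```python
import ipaddress


def longest_prefix_match(table, address):
    """Return the most specific prefix in `table` containing `address`, or None."""
    addr = ipaddress.ip_address(address)
    candidates = [net for net in table if addr in net]
    return max(candidates, key=lambda net: net.prefixlen, default=None)


specific = ipaddress.ip_network("203.0.113.0/24")  # the route that gets withdrawn
covering = ipaddress.ip_network("203.0.112.0/23")  # the less specific fallback

# A router that processed the withdrawal falls back to the covering route.
healthy_table = {specific, covering}
before = longest_prefix_match(healthy_table, "203.0.113.10")
healthy_table.discard(specific)
after = longest_prefix_match(healthy_table, "203.0.113.10")

# A router that never cleared the old entry keeps preferring it.
stale_table = {specific, covering}
stale = longest_prefix_match(stale_table, "203.0.113.10")

print(before, after, stale)
```

The stale router keeps choosing the /24 because the most specific match always wins, which is why a single leftover entry is enough to keep sending traffic down the old path.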
00:02:34IP version 4 zombies tended to survive even longer than IP version 6 ones. But eventually,
00:02:40the system corrects itself because every route in BGP has a timer on it. If a router does not
00:02:46receive fresh updates for a while, it deletes the route automatically. Even if a router misses
00:02:52the withdrawal the first time, ongoing BGP chatter from its neighbors will eventually
00:02:57inform it that the route is gone. Once enough surrounding routers agree on the new state,
00:03:03the zombie disappears. Cloudflare discovered this behavior while working with BYOIP or
00:03:09Bring Your Own IP customers. In these situations, Cloudflare temporarily advertises a customer's
00:03:15IP space and then withdraws it after the handoff. The withdrawal itself is supposed to be clean,
00:03:21but instead they saw that some providers sometimes continued using the old route long after it was
00:03:27gone. That mismatch caused the traffic to take unexpected and inefficient paths into Cloudflare's
00:03:33network. To fix the problem, Cloudflare introduced a safer method. Instead of withdrawing the old
00:03:38route outright, they first announce the same route from a stable location. That forces routers
00:03:45around the world to switch cleanly to the new version. Only then do they withdraw the old
00:03:50announcement. This prevents the fallback surge that causes zombies in the first place. Cloudflare also
00:03:56tuned their internal systems so the transitions happen more smoothly in the future. If you want
00:04:02to read more about this topic, Cloudflare published a very detailed blog post explaining this issue.
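The ordering in the safer method described earlier (announce from the stable location first, withdraw the old announcement second) can be sketched as a tiny event replay. This is a toy model under stated assumptions, not Cloudflare's actual tooling: the `has_coverage_gap` function and the location names `old-edge` and `stable-site` are hypothetical, and real BGP propagation delays are not modeled.

```python
def has_coverage_gap(initial_origins, events):
    """Return True if, at any point, no location is announcing the prefix."""
    origins = set(initial_origins)
    for action, location in events:
        if action == "announce":
            origins.add(location)
        elif action == "withdraw":
            origins.discard(location)
        else:
            raise ValueError(f"unknown action: {action}")
        if not origins:
            return True  # routers would have to hunt for a fallback here
    return False


# Withdrawing outright leaves a moment with no announcement at all.
naive = [("withdraw", "old-edge")]

# Announcing from the stable location first keeps the prefix covered throughout.
safer = [("announce", "stable-site"), ("withdraw", "old-edge")]

naive_gap = has_coverage_gap({"old-edge"}, naive)
safer_gap = has_coverage_gap({"old-edge"}, safer)
print(naive_gap, safer_gap)
```

The point of the ordering is that the prefix always has at least one valid announcement, so routers can switch to the new route instead of searching for a fallback.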
00:04:07So in conclusion, BGP zombies are a reminder that even the most fundamental parts of the internet can behave
00:04:13unexpectedly under certain conditions. And yet, most of the time, these issues are resolved before
00:04:19users notice anything. But sometimes zombies might appear. The internet is held together by millions
00:04:25of routing decisions happening every second. And occasional surprises like BGP zombies show just how
00:04:31much coordination is needed to keep everything running smoothly. So that's basically it. Now
00:04:37you know what BGP zombies are. The next time something hangs or fails to load, you might just
00:04:43have encountered a zombie on the internet. If you like technical breakdowns like these, be sure to
00:04:48smash that like button underneath the video. And don't forget to subscribe to our channel. This has
00:04:53been Andris from Better Stack and I will see you in the next videos.