BGP Zombies: Why Vanished Routes Paralyze Networks

In BGP (Border Gateway Protocol), the system of signposts that steers traffic across the internet, few things are as perplexing as a deleted route that survives like a ghost and keeps pulling in traffic. This phenomenon is called a BGP zombie: a specific IP prefix remains an active route on some routers around the world even though an engineer has explicitly withdrawn it.

This isn't just a simple data error. During data center migrations or maintenance, traffic that follows a zombie route never reaches its destination. It is either silently dropped or caught in a routing loop until its TTL expires. In the complex cloud environments of 2026, the ability to control these phantom routes is no longer optional but a core skill for engineers targeting 99.9% availability.

Three Causes of Stuck Data

Normally, a withdrawal message sent by the origin AS (Autonomous System) propagates hop by hop and removes the prefix from the Routing Information Base (RIB) of every router that had learned it. If this chain breaks at any point, a zombie is born.
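To make the failure mode concrete, here is a toy model in which each router's RIB is a plain dictionary and one router simply never receives the withdrawal. The `Router`, `propagate`, and `zombies` names are hypothetical, and real implementations keep separate Adj-RIB-In and Loc-RIB tables; this is only a sketch of how a single lost message leaves a stale entry behind.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Toy router: its RIB maps a prefix to the AS path it currently believes."""
    name: str
    rib: dict = field(default_factory=dict)

    def receive_announce(self, prefix, as_path):
        self.rib[prefix] = as_path

    def receive_withdraw(self, prefix):
        # pop() with a default tolerates withdrawals for unknown prefixes.
        self.rib.pop(prefix, None)


def propagate(routers, prefix, as_path=None, lost=frozenset()):
    """Deliver an announcement (as_path given) or a withdrawal (as_path None)
    to every router whose name is not in `lost`."""
    for r in routers:
        if r.name in lost:
            continue  # message never arrives: software bug, stalled TCP, RR fault
        if as_path is None:
            r.receive_withdraw(prefix)
        else:
            r.receive_announce(prefix, as_path)


def zombies(routers, prefix):
    """Names of routers that still hold a prefix the origin has withdrawn."""
    return [r.name for r in routers if prefix in r.rib]


routers = [Router("r1"), Router("r2"), Router("r3")]
propagate(routers, "203.0.113.0/24", as_path=[64500])
propagate(routers, "203.0.113.0/24", lost={"r2"})  # r2 misses the withdrawal
print(zombies(routers, "203.0.113.0/24"))  # ['r2']
```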

Software Bugs and TCP Jams: When a router handles large volumes of routing information, memory-management bugs, or a BGP process that cannot drain data from its TCP session fast enough, can cause withdrawal messages to be lost or ignored. The result is a mismatch: the session stays up, but the route it should have removed is never updated.
Path Hunting and MRAI Timers: When a path disappears, routers may temporarily advertise invalid "ghost" paths while they search for an alternative. During this exploration, the MRAI (Minimum Route Advertisement Interval) timer, which exists to damp network instability, delays each round of updates and can extend a zombie's lifespan by 30 minutes or more.
Route Reflector Synchronization Errors: If a route reflector in a large network fails to propagate a withdrawal to its clients, every client keeps the stale route and the zombie spreads across the entire AS.
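The sketch below shows how MRAI stretches path hunting at a single hop. It assumes, as a simplification, that every update to a peer (withdrawals included) is paced by one MRAI timer and that only the latest pending state is sent when the timer expires; implementations differ on whether withdrawals are rate-limited. The function name `mrai_schedule` is hypothetical.

```python
def mrai_schedule(changes, mrai_s):
    """Return the (send_time, state) updates one BGP speaker sends to one peer.

    changes: time-sorted (time, state) pairs for a single prefix, where state is
    an AS-path tuple or None for a withdrawal. Simplifying assumption: every
    update, withdrawals included, is paced by the same MRAI timer.
    """
    sent = []
    timer_expires = float("-inf")  # earliest time the next update may go out
    pending, has_pending = None, False
    for t, state in changes:
        # If a queued update's timer expired before this change, send it then.
        if has_pending and timer_expires <= t:
            sent.append((timer_expires, pending))
            timer_expires += mrai_s
            has_pending = False
        if t >= timer_expires:
            sent.append((t, state))  # timer idle: send immediately
            timer_expires = t + mrai_s
        else:
            pending, has_pending = state, True  # only the latest state survives
    if has_pending:
        sent.append((timer_expires, pending))
    return sent


# The speaker learns a ghost path, then a longer ghost path, then the withdrawal.
changes = [(0, (64501, 64500)), (1, (64502, 64501, 64500)), (2, None)]
print(mrai_schedule(changes, mrai_s=30))  # [(0, (64501, 64500)), (30, None)]
```

With a 30-second MRAI, the peer keeps the ghost path for 30 seconds instead of 2; across several hops of path exploration these per-hop delays add up.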

Black Holes That Destroy User Experience

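Stale information from zombie routes deals a serious blow to user experience. Routers always apply the Longest Prefix Match rule and prefer the most specific route, so a zombie more-specific prefix (for example, a /25 carved out of a valid /24) wins over the valid covering route for every address it contains.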
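A minimal longest-prefix-match lookup with Python's standard `ipaddress` module shows why the zombie wins. The table contents and AS numbers are illustrative documentation values, and `longest_prefix_match` is a hypothetical helper, not how a router's forwarding hardware is implemented.

```python
import ipaddress


def longest_prefix_match(table, dest):
    """Return (network, next_hop) for the most specific route covering dest, or None."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, hop) for net, hop in table.items() if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)


table = {
    ipaddress.ip_network("203.0.113.0/24"): "valid path via AS64500",
    # Zombie: withdrawn at the origin, but still installed on this router.
    ipaddress.ip_network("203.0.113.128/25"): "zombie path via AS64511",
}

print(longest_prefix_match(table, "203.0.113.200")[1])  # zombie path via AS64511
print(longest_prefix_match(table, "203.0.113.10")[1])   # valid path via AS64500
```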

For example, if AS1 withdraws a route but a zombie copy survives at an upstream provider, traffic can bounce between the two networks without ever reaching its destination until it is discarded. Users see webpages that stall or apps that throw connection errors, which directly erodes service reliability.
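The loop can be simulated at the AS level. The topology is an assumption chosen to match the example: AS1 has withdrawn the prefix and falls back to a default route toward its upstream AS2, while AS2 still holds a zombie route pointing back at AS1. The `forward` helper is hypothetical and treats each AS as a single hop that decrements the TTL.

```python
def forward(fib, start, prefix, ttl=64):
    """Follow next hops for `prefix` from `start`; return (outcome, hops visited)."""
    node, path = start, []
    while ttl > 0:
        path.append(node)
        next_hop = fib[node].get(prefix)
        if next_hop is None:
            return "dropped: no route", path
        if next_hop == "local":
            return "delivered", path
        node = next_hop
        ttl -= 1  # each forwarding hop decrements the IP TTL
    return "dropped: TTL expired", path


fib = {
    # AS1 withdrew the prefix and now falls back to its default route via AS2.
    "AS1": {"203.0.113.0/24": "AS2"},
    # AS2 still holds the zombie route pointing back at AS1.
    "AS2": {"203.0.113.0/24": "AS1"},
}
outcome, path = forward(fib, "AS2", "203.0.113.0/24", ttl=6)
print(outcome, path)  # dropped: TTL expired ['AS2', 'AS1', 'AS2', 'AS1', 'AS2', 'AS1']
```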

IPv4 is particularly susceptible to long-lived zombies because of its very large routing table. IPv6 deserves attention too, however: its traffic has grown sharply in recent years, so the impact of failures there keeps increasing.

A 2-Step Announcement Strategy for Availability

To minimize risk, global infrastructure companies use a "Make-Before-Break" approach.

Preemptive New Advertisement: Before removing the existing route, announce the target prefix from the new location so that routing tables worldwide learn the new path.
Safe Withdrawal: Once the new route has propagated and stabilized globally (usually after several minutes), withdraw the old route.
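The two steps can be sketched as a small orchestration function. Everything here is an assumption: `announce_new`, `withdraw_old`, and `visibility` are hypothetical hooks you would wire to your own router automation and to an external view of the internet (for instance, the fraction of route-collector peers that see the new path), and the 95% threshold and timeouts are placeholders, not recommended values.

```python
import time


def make_before_break(announce_new, withdraw_old, visibility,
                      threshold=0.95, poll_interval_s=60.0, timeout_s=1800.0,
                      sleep=time.sleep):
    """Announce the new path, wait until it is widely visible, then withdraw the old one.

    All callables are hypothetical hooks. Returns True if the old route was
    withdrawn, False if visibility never reached `threshold` before `timeout_s`.
    """
    announce_new()  # step 1: make
    waited = 0.0
    while visibility() < threshold:
        if waited >= timeout_s:
            return False  # keep the old route: two valid paths beat zero
        sleep(poll_interval_s)
        waited += poll_interval_s
    withdraw_old()  # step 2: break
    return True
```

Returning without withdrawing on timeout is a deliberate choice: if the new path never becomes visible, leaving the old one in place is safer than risking a window with no valid route at all.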

This strategy ensures that at least one valid path remains even if a particular router misses a withdrawal message, which fundamentally lowers the probability that traffic follows a zombie route to a dead end.

Optimization Settings for 99.9% Availability

To detect physical failures quickly, tune the default BGP timers to fit your environment. BFD (Bidirectional Forwarding Detection), a lightweight liveness protocol that is often offloaded to hardware, can detect failures in under a second.

Timer Type	Default Value	Recommended Optimized Value	Expected Effect
Keepalive	60s	7 ~ 10s	Increased neighbor state check frequency
Hold-time	180s	21 ~ 30s	Shorter failure declaration and session reset
MRAI (eBGP)	30s	0 ~ 5s	Accelerated route convergence speed
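The arithmetic behind the table is simple enough to check in code. BGP alone declares a silent neighbor down when the hold timer expires, and RFC 4271 suggests a keepalive of one third of the hold time, which is where the 60/180 and 10/30 pairs come from. BFD detection time is roughly the packet interval multiplied by the detect multiplier; the 300 ms x 3 BFD values below are illustrative assumptions, and `detection_times` is a hypothetical helper.

```python
def detection_times(hold_s, bfd_interval_ms, bfd_multiplier):
    """Rough failure-detection comparison for one BGP session.

    BGP alone declares a silent neighbor down when the hold timer expires;
    RFC 4271 suggests a keepalive of one third of the hold time. BFD declares
    the path down after roughly `bfd_multiplier` missed packets.
    """
    if hold_s < 3:
        # RFC 4271 allows 0 (no keepalives) or >= 3 s; 0 disables detection entirely.
        raise ValueError("hold time must be at least 3 seconds here")
    return {
        "keepalive_s": hold_s / 3,
        "bgp_detection_s": hold_s,
        "bfd_detection_s": bfd_interval_ms * bfd_multiplier / 1000,
    }


print(detection_times(180, 300, 3))  # {'keepalive_s': 60.0, 'bgp_detection_s': 180, 'bfd_detection_s': 0.9}
print(detection_times(30, 300, 3))   # {'keepalive_s': 10.0, 'bgp_detection_s': 30, 'bfd_detection_s': 0.9}
```

Note that the MRAI row affects how fast updates propagate after a failure is detected, not how fast the failure itself is detected.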

Securing Visibility for Resilient Infrastructure

BGP zombies arise from the structural limitations of a trust-based protocol. To defend against them, you must go beyond simply changing configurations and gain visibility from a global internet perspective.

Use BMP (BGP Monitoring Protocol) to monitor the integrity of your routing tables in real time. Stay proactive by using tools such as RIPE RIS or Cloudflare Radar to continuously check how your routes appear from outside your network. Combining careful timer tuning with this external visibility and with security standards such as RPKI is the most dependable way to protect your services from zombie routes that roam like ghosts.
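Once you collect external observations, flagging zombies reduces to a time comparison: which peers still carried a prefix well after you withdrew it? The sketch below assumes you have already turned a BMP feed or RIPE RIS data into `Observation` records; that parsing step, the `Observation` and `find_zombies` names, and the 90-minute grace period are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Observation:
    peer: str          # a route-collector peer or BMP-monitored session
    prefix: str
    seen_at: datetime  # a time at which the peer was seen carrying the prefix


def find_zombies(observations, prefix, withdrawn_at, grace=timedelta(minutes=90)):
    """Return sorted peers that still carried `prefix` more than `grace` after withdrawal."""
    cutoff = withdrawn_at + grace
    return sorted({o.peer for o in observations
                   if o.prefix == prefix and o.seen_at > cutoff})


t0 = datetime(2026, 1, 1, 12, 0)  # hypothetical withdrawal time
obs = [
    Observation("peerA", "203.0.113.0/24", t0 + timedelta(minutes=5)),  # normal convergence
    Observation("peerB", "203.0.113.0/24", t0 + timedelta(hours=3)),    # still visible: zombie
]
print(find_zombies(obs, "203.0.113.0/24", t0))  # ['peerB']
```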
