Equal-Cost Multi-Path Routing (ECMP) HowTo
About
This document provides example use cases and scenarios that show how Equal-Cost Multi-Path (ECMP) routing can be applied and how it may behave in different situations.
Introduction
ECMP allows routes to be configured so that multiple paths exist simultaneously for the same destination. This means that different flows bound for the same destination can be routed over different paths. This can be useful for a number of reasons, for example:
- Load balancing of traffic flows, bound for the same destination, over different paths.
- Distribution of frames on multiple links for better bandwidth utilization.
- Allow traffic flows to select an alternate path on potential direct link failure, without the need for waiting on potential routing protocol updates.
This document provides a few different use cases and scenarios to help describe how ECMP can be applied and how it functions. They are intended to give a basic understanding of how ECMP is used, so that it can be applied to solve similar scenarios.
Case 1: Stateless Load Balancing
In this use case, we aim to provide a router configuration that performs basic load balancing with ECMP by using two different default gateways. Figure 1 below shows an example setup.
                       .--.-.
                      (  (   )__
                     (_,  \ ) ,_)   Internet
                       '-'--`--'
                        |     |
                        |     |
                .-------'     '-------.
                |                     |
                |                     |
           .----+----.           .----+----.
           |         |           |         |
           |  ISP1   |           |  ISP2   |
           |         |           |         |
           '----+----'           '----+----'
            .99 |                     | .99
                |                     |
  172.16.1.0/24 '-------.     .-------' 172.16.2.0/24
                        |     |
                     .1 |     | .1
                      .-+-----+-.
                      |         |
                      |   R1    |
                      |         |  GW: 172.16.1.99
                      '----+----'      172.16.2.99
                           |
                           |
                        .--.-.
                       (  (   )__
                      (_,  \ ) ,_)   Lan
                        '-'--`--'
On the device R1 we have simply configured two different default routes, with the same cost, each pointing to one of the ISPs as the next hop gateway. The intention is that traffic flows from devices in the Lan will be distributed as evenly as possible between ISP1 and ISP2.
Note
This could of course be scaled up by having even more next hops towards additional devices. This example uses only two next hops for the sake of simplicity.
Configure
No ECMP-specific configuration is needed on router R1; we simply configure two default routes towards two different next hop addresses:
Note
Remember to NOT configure the routes with different distances.
    R1:/#> configure ip
    R1:/config/ip/#> route default 172.16.1.99
    R1:/config/ip/#> route default 172.16.2.99
    R1:/config/ip/#> leave
    Applying configuration.
    Configuration activated. Remember "copy run start" to save to flash (NVRAM).
    R1:/#>
Now that the routes have been configured, the routing table should look something like the following:
    R1:/#> show ip route
    S - Static | C - Connected | K - Kernel route | > - Selected route
    O - OSPF | R - RIP | [Distance/Metric] | * - FIB route
    S>* 0.0.0.0/0 [1/0] via 172.16.1.99, vlan1, weight 1, 00:00:03
      *                 via 172.16.2.99, vlan2, weight 1, 00:00:03
    C>* 172.16.1.0/24 is directly connected, vlan1, 00:01:32
    C>* 172.16.2.0/24 is directly connected, vlan2, 00:01:32
Notice that the 0.0.0.0/0 (default) destination has two next hop addresses listed for it. Traffic bound for the Internet that enters the router R1 from any device located in the Lan network should now be given a next hop towards either ISP1 or ISP2, on a per-flow basis.
Note
The effectiveness of the load balancing depends on how the hash mapping is performed. There is no guarantee of an exact 50/50 split across the two ISPs.
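The per-flow behaviour described in the note can be illustrated with a minimal Python sketch. It is not the hash used by R1: the choice of SHA-256 over the 5-tuple, the function name pick_next_hop and the host addresses 10.0.0.10 and 203.0.113.5 are assumptions made purely for illustration. The point is only that a given flow always maps to the same next hop, while the overall split depends on the flows that happen to exist.

```python
import hashlib
from collections import Counter

# The two default gateways from Case 1.
NEXT_HOPS = ["172.16.1.99", "172.16.2.99"]


def pick_next_hop(src_ip, dst_ip, proto, src_port, dst_port, next_hops=NEXT_HOPS):
    """Map a flow's 5-tuple to one next hop (illustrative hash, not the router's)."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


# 1000 hypothetical TCP flows from one LAN host to one Internet server,
# differing only in their source port.
flows = [("10.0.0.10", "203.0.113.5", "tcp", 40000 + i, 443) for i in range(1000)]
share = Counter(pick_next_hop(*flow) for flow in flows)

# The counts are usually close to, but rarely exactly, 500/500.
print(share)
```

Running the sketch with only a handful of flows instead of 1000 tends to show a much more uneven split, which is why a small number of long-lived flows may load one ISP noticeably more than the other.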
Case 2: Load Balancing and Redundant Routes
This use case shows how we can configure multiple routers with ECMP routes to provide load balancing together with several redundant routes. An example of this type of setup is shown in Figure 2 below.
 Network-A                                                           Network-B
  .--.-.                                                             .--.-.
 (  (   )__                                                         (  (   )__
(_,  \ ) ,_)                                                       (_,  \ ) ,_)
  '-'--`--'                                                          '-'--`--'
     |                 .---------.           .---------.                 |
     |          NET-1  |         |   NET-3   |         |  NET-5          |
172.16.1.0/24  .-------+   R2    +-----------+   R4    +-------.  172.16.2.0/24
     |       .´      .2|         |.2       .4|         |.4      `.       |
     |     .´          '----+----+           +----+----'          `.     |
.----+----+´                |     `.       .´     |                `+----+----.
|         |.1               |       `.   .´       |               .6|         |
|   R1    |          NET-9  |         `.´         |  NET-10         |   R6    |
|         |.1               |  NET-7.´   `.NET-8  |               .6|         |
'---------+.                |     .´       `.     |                .+---------'
           `.          .----+----+´         `+----+----.          .´
             `.      .3|         |.3       .5|         |.5      .´
               `-------+   R3    +-----------+   R5    +-------´
                NET-2  |         |   NET-4   |         |  NET-6
                       '---------'           '---------'
Networks ==========================================
NET-1: 192.168.10.0/24 NET-6: 192.168.60.0/24
NET-2: 192.168.20.0/24 NET-7: 192.168.70.0/24
NET-3: 192.168.30.0/24 NET-8: 192.168.80.0/24
NET-4: 192.168.40.0/24 NET-9: 192.168.90.0/24
NET-5: 192.168.50.0/24 NET-10: 192.168.100.0/24
===================================================
In the setup above, we want to ensure connectivity between the two networks Network-A and Network-B. On each of the routers, routes are configured with the same distance so that they are of equal cost and ECMP routing can be utilized. Because of the redundant paths, link failures should not impact the traffic sent between the two networks.
With this setup many paths will exist at the same time that could be utilized, depending on how the multi-path hash is calculated for each traffic flow. Each of the routers will technically be able to send the traffic on any equal cost route. As an example, the following paths are of the same cost and could all be selected for different flows:
Network-A -> R1 -> R2 -> R4 -> R6 -> Network-B
Network-A -> R1 -> R2 -> R5 -> R6 -> Network-B
Network-A -> R1 -> R3 -> R5 -> R6 -> Network-B
Network-A -> R1 -> R3 -> R4 -> R6 -> Network-B
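As a cross-check of the list of paths, the following Python sketch enumerates every minimum-hop path between R1 and R6 in the topology of Figure 2. It is only an illustration: the LINKS table is transcribed from the figure, hop count is used as a stand-in for route cost, and the function name equal_cost_paths is an assumption that does not correspond to anything on the routers.

```python
from collections import deque

# Links from Figure 2: network name -> the two routers it connects.
LINKS = {
    "NET-1": ("R1", "R2"), "NET-2": ("R1", "R3"),
    "NET-3": ("R2", "R4"), "NET-4": ("R3", "R5"),
    "NET-5": ("R4", "R6"), "NET-6": ("R5", "R6"),
    "NET-7": ("R3", "R4"), "NET-8": ("R2", "R5"),
    "NET-9": ("R2", "R3"), "NET-10": ("R4", "R5"),
}


def equal_cost_paths(src, dst, links=LINKS):
    """Return all router paths from src to dst that use the minimum hop count."""
    neighbours = {}
    for a, b in links.values():
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Breadth-first search from the destination: hop count of each router to dst.
    hops = {dst: 0}
    queue = deque([dst])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)

    if src not in hops:
        return []  # dst is unreachable from src

    def walk(node):
        # Only follow neighbours that are exactly one hop closer to dst.
        if node == dst:
            return [[dst]]
        paths = []
        for nxt in sorted(neighbours[node]):
            if hops.get(nxt) == hops[node] - 1:
                paths.extend([node] + rest for rest in walk(nxt))
        return paths

    return walk(src)


for path in equal_cost_paths("R1", "R6"):
    print(" -> ".join(path))
```

The paths over NET-9 and NET-10 never appear in the result because they always add a hop, which matches the reasoning for giving those routes a higher distance.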
Note
Any route specified between R2 and R3, and likewise between R4 and R5, should be configured with a higher distance, so that the cost will not be the same as for the other routes. The reason for this is that we do not want any of those routes to be part of the ECMP routing, since that path requires an extra hop. Rather, such a route should only be utilized if the others are not usable, because of link failure.
In most link failure scenarios, depending on the number of link failures and where they are situated, connectivity should still be possible between the two networks. However, depending on where the link failure is, it is possible that some traffic flows may use a path that is not the most efficient. As an example, if the links for NET-3 and NET-8 go down, both of the following paths could be valid:
Network-A -> R1 -> R3 -> R5 -> R6 -> Network-B
Network-A -> R1 -> R2 -> R3 -> R5 -> R6 -> Network-B
As we can see, in that specific scenario some traffic flows could use a path that has an extra hop. This is mostly a problem with static routing; a dynamic routing protocol should select the better path once the updated routing information has propagated to each of the routers.
Warning
With this setup we could still end up in a situation where we route traffic to a device that is not able to forward it further, if we experience multiple link failures. As an example, if the links for networks NET-3, NET-7 and NET-10 on router R4 are down, router R6 could still forward traffic towards R4, since with static routing it cannot know that those links are down. With a dynamic routing protocol this should not be an issue.
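The two failure scenarios above can be traced with a small Python simulation of the static routes used in this use case. It is a sketch under stated assumptions, not a model of the router software: the route tables are transcribed from the static route commands in this document, a route is assumed to become unusable only when its own directly connected link is down, and the names active_next_hops and possible_paths are hypothetical helpers.

```python
# (next-hop router, link used, distance) per router, taken from the static
# route commands in this document.
ROUTES_TO_B = {
    "R1": [("R2", "NET-1", 1), ("R3", "NET-2", 1)],
    "R2": [("R4", "NET-3", 1), ("R5", "NET-8", 1), ("R3", "NET-9", 10)],
    "R3": [("R5", "NET-4", 1), ("R4", "NET-7", 1), ("R2", "NET-9", 10)],
    "R4": [("R6", "NET-5", 1), ("R5", "NET-10", 10)],
    "R5": [("R6", "NET-6", 1), ("R4", "NET-10", 10)],
}
ROUTES_TO_A = {
    "R6": [("R4", "NET-5", 1), ("R5", "NET-6", 1)],
    "R4": [("R2", "NET-3", 1), ("R3", "NET-7", 1), ("R5", "NET-10", 10)],
    "R5": [("R3", "NET-4", 1), ("R2", "NET-8", 1), ("R4", "NET-10", 10)],
    "R2": [("R1", "NET-1", 1), ("R3", "NET-9", 10)],
    "R3": [("R1", "NET-2", 1), ("R2", "NET-9", 10)],
}


def active_next_hops(routes, down):
    """Next hops of the usable routes that share the lowest distance."""
    usable = [route for route in routes if route[1] not in down]
    if not usable:
        return []
    best = min(distance for _, _, distance in usable)
    return [hop for hop, _, distance in usable if distance == best]


def possible_paths(table, start, goal, down, path=None):
    """Every path a flow could take hop by hop, given the set of failed links."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    hops = active_next_hops(table.get(start, []), down)
    if not hops:
        return [path + ["DROPPED"]]  # no usable route left on this router
    results = []
    for hop in hops:
        if hop in path:
            results.append(path + [hop, "LOOP"])
        else:
            results.extend(possible_paths(table, hop, goal, down, path))
    return results


# NET-3 and NET-8 down: flows hashed to R2 take the extra hop over NET-9.
for p in possible_paths(ROUTES_TO_B, "R1", "R6", {"NET-3", "NET-8"}):
    print(" -> ".join(p))

# NET-3, NET-7 and NET-10 down: R6 may still pick R4, which has no way forward.
for p in possible_paths(ROUTES_TO_A, "R6", "R1", {"NET-3", "NET-7", "NET-10"}):
    print(" -> ".join(p))
```

In the second scenario one of the possible outcomes ends in DROPPED at R4, which is the situation the warning describes: R6 only sees its own links, both of which are up.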
Configure With Static Routes
This is an example of how we can configure the devices with static routes. It is somewhat more cumbersome than simply configuring either RIP or OSPF, and it becomes increasingly difficult the larger the topology becomes. However, since the topology in this example is not particularly extensive, it is still useful as an example.
For this configuration it is assumed that all of the interfaces have already been configured as intended. We need to configure each of the routers with the necessary static routes to be able to reach Network-A at 172.16.1.0/24 and Network-B at 172.16.2.0/24. Since we want to utilize ECMP routing, all the possible next hops for each destination IP address must be considered.
When we set up the routes over the networks NET-9 and NET-10, we need to use a higher distance value. Setting a higher distance value for those routes increases the cost, so that they are not considered equal as part of the ECMP routing. This is because a next hop over those two networks always results in an additional hop to reach any destination in either Network-A or Network-B. We only want those routes to be considered if the other routes are unavailable, due to link failure for instance. In essence, they should serve as a backup path, and not be part of the regular multi-path distribution of traffic flows that results from the ECMP routing.
R1 Configuration:
    R1:/#> configure
    R1:/config/#> ip
    R1:/config/ip/#> route 172.16.2.0/24 192.168.10.2
    R1:/config/ip/#> route 172.16.2.0/24 192.168.20.3
    R1:/config/ip/#> leave
    Applying configuration.
    Configuration activated. Remember "copy run start" to save to flash (NVRAM).
    R1:/#> show ip route
    S - Static | C - Connected | K - Kernel route | > - Selected route
    O - OSPF | R - RIP | [Distance/Metric] | * - FIB route
    C>* 172.16.1.0/24 is directly connected, vlan7, 02:29:32
    S>* 172.16.2.0/24 [1/0] via 192.168.10.2, vlan1, weight 1, 00:29:57
      *                     via 192.168.20.3, vlan2, weight 1, 00:29:57
    C>* 192.168.10.0/24 is directly connected, vlan1, 01:02:26
    C>* 192.168.20.0/24 is directly connected, vlan2, 00:29:57
R2 Configuration:
    R2:/#> configure
    R2:/config/#> ip
    R2:/config/ip/#> route 172.16.1.0/24 192.168.10.1
    R2:/config/ip/#> route 172.16.2.0/24 192.168.30.4
    R2:/config/ip/#> route 172.16.2.0/24 192.168.80.5
    R2:/config/ip/#> route 172.16.1.0/24 192.168.90.3 10
    R2:/config/ip/#> route 172.16.2.0/24 192.168.90.3 10
    R2:/config/ip/#> leave
    Applying configuration.
    Configuration activated. Remember "copy run start" to save to flash (NVRAM).
    R2:/#> show ip route
    S - Static | C - Connected | K - Kernel route | > - Selected route
    O - OSPF | R - RIP | [Distance/Metric] | * - FIB route
    S>* 172.16.1.0/24 [1/0] via 192.168.10.1, vlan2, weight 1, 01:04:50
    S   172.16.1.0/24 [10/0] via 192.168.90.3, vlan4, weight 1, 01:28:08
    S>* 172.16.2.0/24 [1/0] via 192.168.30.4, vlan1, weight 1, 00:31:51
      *                     via 192.168.80.5, vlan3, weight 1, 00:31:51
    S   172.16.2.0/24 [10/0] via 192.168.90.3, vlan4, weight 1, 01:22:15
    C>* 192.168.10.0/24 is directly connected, vlan2, 01:04:50
    C>* 192.168.30.0/24 is directly connected, vlan1, 00:31:51
    C>* 192.168.80.0/24 is directly connected, vlan3, 00:32:10
    C>* 192.168.90.0/24 is directly connected, vlan4, 01:28:08
R3 Configuration:
    R3:/#> configure
    R3:/config/#> ip
    R3:/config/ip/#> route 172.16.1.0/24 192.168.20.1 1
    R3:/config/ip/#> route 172.16.2.0/24 192.168.40.5 1
    R3:/config/ip/#> route 172.16.2.0/24 192.168.70.4 1
    R3:/config/ip/#> route 172.16.1.0/24 192.168.90.2 10
    R3:/config/ip/#> route 172.16.2.0/24 192.168.90.2 10
    R3:/config/ip/#> leave
    Applying configuration.
    Configuration activated. Remember "copy run start" to save to flash (NVRAM).
    R3:/#> show ip route
    S - Static | C - Connected | K - Kernel route | > - Selected route
    O - OSPF | R - RIP | [Distance/Metric] | * - FIB route
    S>* 172.16.1.0/24 [1/0] via 192.168.20.1, vlan1, weight 1, 00:39:17
    S   172.16.1.0/24 [10/0] via 192.168.90.2, vlan4, weight 1, 01:34:24
    S>* 172.16.2.0/24 [1/0] via 192.168.40.5, vlan2, weight 1, 00:39:15
      *                     via 192.168.70.4, vlan3, weight 1, 00:39:15
    S   172.16.2.0/24 [10/0] via 192.168.90.2, vlan4, weight 1, 01:29:50
    C>* 192.168.20.0/24 is directly connected, vlan1, 00:39:17
    C>* 192.168.40.0/24 is directly connected, vlan2, 00:39:15
    C>* 192.168.70.0/24 is directly connected, vlan3, 01:25:02
    C>* 192.168.90.0/24 is directly connected, vlan4, 01:34:24
R4 Configuration:
    R4:/#> configure
    R4:/config/#> ip
    R4:/config/ip/#> route 172.16.1.0/24 192.168.30.2 1
    R4:/config/ip/#> route 172.16.1.0/24 192.168.70.3 1
    R4:/config/ip/#> route 172.16.2.0/24 192.168.50.6 1
    R4:/config/ip/#> route 172.16.2.0/24 192.168.100.5 10
    R4:/config/ip/#> route 172.16.1.0/24 192.168.100.5 10
    R4:/config/ip/#> leave
    Applying configuration.
    Configuration activated. Remember "copy run start" to save to flash (NVRAM).
    R4:/#> show ip route
    S - Static | C - Connected | K - Kernel route | > - Selected route
    O - OSPF | R - RIP | [Distance/Metric] | * - FIB route
    S>* 172.16.1.0/24 [1/0] via 192.168.30.2, vlan2, weight 1, 00:41:40
      *                     via 192.168.70.3, vlan3, weight 1, 00:41:40
    S   172.16.1.0/24 [10/0] via 192.168.100.5, vlan4, weight 1, 01:27:57
    S>* 172.16.2.0/24 [1/0] via 192.168.50.6, vlan1, weight 1, 00:40:34
    S   172.16.2.0/24 [10/0] via 192.168.100.5, vlan4, weight 1, 01:27:57
    C>* 192.168.30.0/24 is directly connected, vlan2, 00:41:40
    C>* 192.168.50.0/24 is directly connected, vlan1, 00:40:34
    C>* 192.168.70.0/24 is directly connected, vlan3, 01:27:53
    C>* 192.168.100.0/24 is directly connected, vlan4, 01:27:57
R5 Configuration:
    R5:/#> configure
    R5:/config/#> ip
    R5:/config/ip/#> route 172.16.1.0/24 192.168.40.3 1
    R5:/config/ip/#> route 172.16.1.0/24 192.168.80.2 1
    R5:/config/ip/#> route 172.16.2.0/24 192.168.60.6 1
    R5:/config/ip/#> route 172.16.2.0/24 192.168.100.4 10
    R5:/config/ip/#> route 172.16.1.0/24 192.168.100.4 10
    R5:/config/ip/#> leave
    Applying configuration.
    Configuration activated. Remember "copy run start" to save to flash (NVRAM).
    R5:/#> show ip route
    S - Static | C - Connected | K - Kernel route | > - Selected route
    O - OSPF | R - RIP | [Distance/Metric] | * - FIB route
    S>* 172.16.1.0/24 [1/0] via 192.168.40.3, vlan1, weight 1, 00:43:08
      *                     via 192.168.80.2, vlan3, weight 1, 00:43:08
    S   172.16.1.0/24 [10/0] via 192.168.100.4, vlan4, weight 1, 01:29:03
    S   172.16.2.0/24 [10/0] via 192.168.100.4, vlan4, weight 1, 01:29:03
    S>* 172.16.2.0/24 [1/0] via 192.168.60.6, vlan2, weight 1, 01:30:28
    C>* 192.168.40.0/24 is directly connected, vlan1, 00:43:12
    C>* 192.168.60.0/24 is directly connected, vlan2, 01:30:28
    C>* 192.168.80.0/24 is directly connected, vlan3, 00:43:08
    C>* 192.168.100.0/24 is directly connected, vlan4, 01:29:03
R6 Configuration:
    R6:/#> configure
    R6:/config/#> ip
    R6:/config/ip/#> route 172.16.1.0/24 192.168.50.4 1
    R6:/config/ip/#> route 172.16.1.0/24 192.168.60.5 1
    R6:/config/ip/#> leave
    Applying configuration.
    Configuration activated. Remember "copy run start" to save to flash (NVRAM).
    R6:/#> show ip route
    S - Static | C - Connected | K - Kernel route | > - Selected route
    O - OSPF | R - RIP | [Distance/Metric] | * - FIB route
    S>* 172.16.1.0/24 [1/0] via 192.168.50.4, vlan2, weight 1, 00:42:20
      *                     via 192.168.60.5, vlan1, weight 1, 00:42:20
    C>* 172.16.2.0/24 is directly connected, vlan7, 01:56:12
    C>* 192.168.50.0/24 is directly connected, vlan2, 00:42:20
    C>* 192.168.60.0/24 is directly connected, vlan1, 01:31:09
Configure With OSPF
In this example we will configure the routers with OSPF instead of static routes. The end result should be mostly the same in terms of the ECMP route selection. In addition, the setup should be more resilient in the failure scenarios where static routes could stop working.
Warning
This setup is supposed to replace the configuration in the example Configure With Static Routes above, not to be combined with it.
As in the previous example, we assume that all of the relevant interfaces have already been configured. The configuration example simply focuses on the setup of OSPF on the routers. In this example we use a very simple OSPF configuration; the only thing we really need to do is to tell OSPF which networks it should know about.
Only routers R1 and R6 need an additional configuration option, the redistribute connected command. We want OSPF to know about the networks Network-A and Network-B, which are directly connected to R1 and R6 respectively but exist outside of the OSPF domain. Thus, if we want to be able to route traffic between those two networks, their existence must be made known to the OSPF routing domain. Otherwise no routes towards those destinations will be inserted in any of the routers' routing tables.
Note
While using OSPF will ensure a more optimal path during link failures, the failover time could in some cases be a bit longer. The reason for this is that the routing information from the protocol must propagate throughout the network.
R1 Configuration:
    R1:/#> configure
    R1:/config/#> router
    R1:/config/router/#> ospf
    R1:/config/router/ospf/#> network 192.168.10.0/24
    R1:/config/router/ospf/#> network 192.168.20.0/24
    R1:/config/router/ospf/#> redistribute connected
    R1:/config/router/ospf/#> leave
    Applying configuration.
    Configuration activated. Remember "copy run start" to save to flash (NVRAM).
    R1:/#>
R2 Configuration:
    R2:/#> configure
    R2:/config/#> router
    R2:/config/router/#> ospf
    R2:/config/router/ospf/#> network 192.168.10.0/24
    R2:/config/router/ospf/#> network 192.168.30.0/24
    R2:/config/router/ospf/#> network 192.168.90.0/24
    R2:/config/router/ospf/#> network 192.168.80.0/24
    R2:/config/router/ospf/#> leave
    Applying configuration.
    Configuration activated. Remember "copy run start" to save to flash (NVRAM).
    R2:/#>
R3 Configuration:
    R3:/#> configure
    R3:/config/#> router
    R3:/config/router/#> ospf
    R3:/config/router/ospf/#> network 192.168.20.0/24
    R3:/config/router/ospf/#> network 192.168.90.0/24
    R3:/config/router/ospf/#> network 192.168.70.0/24
    R3:/config/router/ospf/#> network 192.168.40.0/24
    R3:/config/router/ospf/#> leave
    Applying configuration.
    Configuration activated. Remember "copy run start" to save to flash (NVRAM).
    R3:/#>
R4 Configuration:
    R4:/#> configure
    R4:/config/#> router
    R4:/config/router/#> ospf
    R4:/config/router/ospf/#> network 192.168.30.0/24
    R4:/config/router/ospf/#> network 192.168.50.0/24
    R4:/config/router/ospf/#> network 192.168.100.0/24
    R4:/config/router/ospf/#> network 192.168.70.0/24
    R4:/config/router/ospf/#> leave
    Applying configuration.
    Configuration activated. Remember "copy run start" to save to flash (NVRAM).
    R4:/#>
R5 Configuration:
    R5:/#> configure
    R5:/config/#> router
    R5:/config/router/#> ospf
    R5:/config/router/ospf/#> network 192.168.100.0/24
    R5:/config/router/ospf/#> network 192.168.60.0/24
    R5:/config/router/ospf/#> network 192.168.80.0/24
    R5:/config/router/ospf/#> network 192.168.40.0/24
    R5:/config/router/ospf/#> leave
    Applying configuration.
    Configuration activated. Remember "copy run start" to save to flash (NVRAM).
    R5:/#>
R6 Configuration:
    R6:/#> configure
    R6:/config/#> router
    R6:/config/router/#> ospf
    R6:/config/router/ospf/#> network 192.168.50.0/24
    R6:/config/router/ospf/#> network 192.168.60.0/24
    R6:/config/router/ospf/#> redistribute connected
    R6:/config/router/ospf/#> leave
    Applying configuration.
    Configuration activated. Remember "copy run start" to save to flash (NVRAM).
    R6:/#>
Status and Verification
If we have managed to configure everything correctly we should now be able to send traffic between the two networks. We could do some basic verification by simply pinging some device from one network to the other:
    root@Network-A-Host1:/home/admin # ping 172.16.2.99 -c 3
    PING 172.16.2.99 (172.16.2.99): 56 data bytes
    64 bytes from 172.16.2.99: seq=0 ttl=60 time=4.063 ms
    64 bytes from 172.16.2.99: seq=1 ttl=60 time=7.465 ms
    64 bytes from 172.16.2.99: seq=2 ttl=60 time=6.815 ms
    --- 172.16.2.99 ping statistics ---
    3 packets transmitted, 3 packets received, 0% packet loss
    round-trip min/avg/max = 4.063/6.114/7.465 ms
In order to check what specific path we are taking to reach the target, we could utilize traceroute:
    root@Host-A:/home/admin # traceroute 172.16.2.99
    traceroute to 172.16.2.99 (172.16.2.99), 30 hops max, 46 byte packets
     1  172.16.1.1 (172.16.1.1)  1.268 ms  0.956 ms  0.914 ms
     2  192.168.10.2 (192.168.10.2)  2.167 ms  2.722 ms  1.618 ms
     3  192.168.70.4 (192.168.70.4)  2.787 ms  1.928 ms  1.722 ms
     4  192.168.60.6 (192.168.60.6)  3.757 ms  1.791 ms  1.287 ms
     5  172.16.2.99 (172.16.2.99)  1.682 ms  2.385 ms  2.022 ms
The traceroute output corresponds to the following hops:
Network-A-Host1 -> R1 -> R2 -> R4 -> R6 -> Network-B-Host1
When we perform a traceroute to another host, in this case located at 172.16.2.222, we can see that because of the ECMP routing a different path is selected:
    root@Host-A:/home/admin # traceroute 172.16.2.222
    traceroute to 172.16.2.222 (172.16.2.222), 30 hops max, 46 byte packets
     1  172.16.1.1 (172.16.1.1)  1.522 ms  0.852 ms  0.884 ms
     2  192.168.20.3 (192.168.20.3)  1.532 ms  1.306 ms  1.409 ms
     3  192.168.80.5 (192.168.80.5)  2.445 ms  1.319 ms  1.289 ms
     4  192.168.60.6 (192.168.60.6)  2.082 ms  1.702 ms  1.622 ms
     5  172.16.2.222 (172.16.2.222)  5.326 ms  4.561 ms  3.892 ms
The traceroute output corresponds to the following hops:
Network-A-Host1 -> R1 -> R3 -> R5 -> R6 -> Network-B-Host2
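Translating traceroute hops to router names by hand gets tedious when comparing many flows. The Python sketch below shows one way it could be automated for this particular topology. It relies on assumptions that only hold for this example: the host part of each transit address equals the router number, as labelled in Figure 2 (.1 to .6), and 172.16.1.1 is taken to be the Network-A side of R1 because it is the first hop in the output above. The helper names hops_from_traceroute and hop_name are hypothetical.

```python
import ipaddress
import re

# Transit networks NET-1 .. NET-10 (192.168.10.0/24 .. 192.168.100.0/24).
TRANSIT_NETS = [ipaddress.ip_network(f"192.168.{n}.0/24") for n in range(10, 101, 10)]
# Router addresses outside the transit networks (assumption, see lead-in).
KNOWN = {"172.16.1.1": "R1"}


def hops_from_traceroute(output):
    """Extract the hop addresses, in order, from traceroute text output."""
    hops = []
    for line in output.splitlines():
        match = re.match(r"\s*\d+\s+(\d+\.\d+\.\d+\.\d+)\s", line)
        if match:
            hops.append(match.group(1))
    return hops


def hop_name(address):
    """Name a hop using the numbering convention of Figure 2."""
    if address in KNOWN:
        return KNOWN[address]
    ip = ipaddress.ip_address(address)
    host_part = ip.packed[-1]
    if any(ip in net for net in TRANSIT_NETS) and 1 <= host_part <= 6:
        return f"R{host_part}"
    return address  # not a known router interface, for example an end host


sample = """traceroute to 172.16.2.99 (172.16.2.99), 30 hops max, 46 byte packets
 1  172.16.1.1 (172.16.1.1)  1.268 ms  0.956 ms  0.914 ms
 2  192.168.10.2 (192.168.10.2)  2.167 ms  2.722 ms  1.618 ms
 3  192.168.70.4 (192.168.70.4)  2.787 ms  1.928 ms  1.722 ms
 4  192.168.60.6 (192.168.60.6)  3.757 ms  1.791 ms  1.287 ms
 5  172.16.2.99 (172.16.2.99)  1.682 ms  2.385 ms  2.022 ms
"""
print(" -> ".join(hop_name(hop) for hop in hops_from_traceroute(sample)))
```

Note that a router may answer traceroute from any of its interface addresses. In the sample, R4 replies from its NET-7 address 192.168.70.4 even though the probe arrived from R2, which is why the sketch maps on the host part rather than on the network.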
Link Failure
If we have a link failure, traffic should still continue to flow. As an example, when we ping a host located in Network-B at 172.16.2.99 from Network-A, it takes the following route:
Network-A-Host1 -> R1 -> R2 -> R4 -> R6 -> Network-B-Host1
Then a link failure occurs for NET-3, between R2 and R4. When this occurs, the traffic is instead routed along the following path:
Network-A-Host1 -> R1 -> R2 -> R5 -> R6 -> Network-B-Host1
We now hop from R2 to R5 over NET-8 instead. This can clearly be seen with traceroute:
    root@Host-A:/home/admin # traceroute 172.16.2.99
    traceroute to 172.16.2.99 (172.16.2.99), 30 hops max, 46 byte packets
     1  172.16.1.1 (172.16.1.1)  1.017 ms  0.776 ms  1.826 ms
     2  192.168.10.2 (192.168.10.2)  2.072 ms  2.215 ms  2.577 ms
     3  192.168.80.5 (192.168.80.5)  3.365 ms  1.871 ms  1.691 ms
     4  192.168.60.6 (192.168.60.6)  2.324 ms  2.509 ms  2.301 ms
     5  172.16.2.99 (172.16.2.99)  2.441 ms  2.976 ms  2.020 ms