Nov
06

High availability solutions often use a virtual gateway protocol to avoid a single point of failure. We are going to discuss high availability for the IPsec tunnel in the sample topology presented below. In this topology we need to protect traffic between VLAN67 and VLAN58 travelling across the VLAN146 segment. To accomplish this, we will configure R6 to establish an IPsec tunnel with a virtual gateway representing both R1 and R4.

IPsec HA Scenario

On the diagram, R1 and R4 form an HSRP virtual gateway with the IP address “155.1.146.254” (VIP – virtual IP). Let’s assume R1 is the primary router in the group, used by R6 to reach the subnet behind R5. (We will discuss the routing options to achieve this below.) Since R1 is the primary router, we configure it to track the Frame-Relay cloud connection, so that it gives up the primary forwarder role if this connection fails. Here is the sample HSRP configuration for R1 and R4:

R1:
interface FastEthernet0/0
 ip address 155.1.146.1 255.255.255.0
 standby 1 ip 155.1.146.254
 standby 1 timers 1 3
 standby 1 priority 200
 standby 1 preempt
 standby 1 name VLAN146
 standby 1 track Serial 0/0 110
 crypto map VPN redundancy VLAN146

R4:
interface FastEthernet0/1
 ip address 155.1.146.4 255.255.255.0
 standby 1 ip 155.1.146.254
 standby 1 timers 1 3
 standby 1 preempt
 standby 1 name VLAN146
 crypto map VPN redundancy VLAN146

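We use the simplest form of tracking, just monitoring the interface status. If Serial 0/0 goes down, R1’s priority drops by 110, from 200 to 90, which is below R4’s default of 100, so R4 preempts and takes over. This will not work properly in case of a single DLCI failure, because the physical interface stays up. Therefore, in real-life scenarios, you may want to track a sub-interface state or use IP SLA based trackers. Note the use of a name for the standby group. The IPsec HA configuration needs it later.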
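
As a rough sketch of the IP SLA based alternative, R1 could probe R5 across the Frame-Relay cloud and tie its HSRP priority to the probe result. The target address 155.1.0.5, the probe frequency and the object numbers are assumptions made for illustration only. The keywords shown are those of newer releases; older 12.4 images use ip sla monitor and track 1 rtr 1 instead, so check the syntax on your own code version.

R1:
! Assumption: 155.1.0.5 is R5's address on the Frame-Relay segment
ip sla 1
 icmp-echo 155.1.0.5 source-interface Serial0/0
 frequency 5
ip sla schedule 1 life forever start-time now
!
! Tracked object follows the reachability reported by the probe
track 1 ip sla 1 reachability
!
interface FastEthernet0/0
 ! Replace the plain interface tracking with the tracked object
 no standby 1 track Serial 0/0 110
 standby 1 track 1 decrement 110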

Stateless IPsec HA

Stateless IPsec HA relies on the following steps and functions.

  1. Configure the crypto map to source IPsec Phase 1/Phase 2 packets off the HSRP VIP. This is accomplished by applying the crypto map under the interface using the syntax crypto map VPN redundancy VLAN146, which binds the ISAKMP/IPsec sockets to the virtual IP address. After this, you configure the remote end (in this case – R6) to establish the IPsec tunnel with the VIP address.
  2. When needed, the VIP migrates from the failed router to the backup by using HSRP failover mechanics. This procedure uses the HSRP timers and/or the tracking configuration in the router. If the primary router loses its connection to the shared segment, it takes slightly more than the HSRP hold time for the backup router to take the active role. Using HSRP millisecond timers you may set this interval to a subsecond value, e.g. standby 1 timers msec 15 msec 50.
  3. ISAKMP uses the DPD (Dead Peer Detection) feature to detect the loss of the original peer. The procedure was proposed by Cisco engineers and is documented in RFC 3706 (an informational RFC). DPD is a scalable way to detect remote peer failure. Instead of sending periodic “hello” packets, DPD relies on the presence of inbound traffic to verify remote peer availability. This significantly reduces the amount of management traffic in large hub-and-spoke VPN deployments. Here is how DPD works:
    • Dead peer detection is only performed on demand (there is an option to make it periodic, though). There is a special interval, called the DPD interval, which controls how long the router believes the remote end is alive without receiving any traffic from it.
    • If the DPD interval has expired and the local router has traffic to send to the remote end, the local router sends out a special ISAKMP “R-U-THERE” message. The data traffic is still encrypted and sent out at the same time. If the remote end responds with an ACK message, the local router resets the timer and continues sending outbound traffic. Every valid inbound data packet also resets the DPD interval, letting the router know that the remote end is alive.
    • If the remote peer does not respond to the R-U-THERE message, the local router retries a few times (right now, it seems to be hard-coded to 5 times), waiting the retry-timeout interval (2 seconds by default) on every turn. If the remote end still does not respond after these attempts, the local router considers the tunnel dead and deletes the ISAKMP/IPsec SAs. After this, the router will try to re-establish the tunnel when new traffic triggers the respective crypto-map entry.

    Note that ISAKMP DPD and HSRP failover occur in parallel. Default HSRP timers are usually quick enough to fail over to the backup router before DPD declares the remote end dead. After this, ISAKMP re-negotiation with the new active router occurs and a new IPsec tunnel is established.

    (Since version 12.3(7)T, IOS has supported periodic DPD messages, which essentially turn the protocol into a keepalive mechanism. Note that periodic messages will significantly increase the amount of traffic in deployments with a large number of tunnels.)

  4. A special feature allows only the active HSRP router to inject the route for the remote subnet (in our case it’s VLAN67) into the dynamic routing protocol. In our example this means R5 should route towards the remote network using the proper (active) exit point. Without it, if both R1 and R4 inject a default route towards R5 and R1 loses its connection to VLAN146, R1 may still advertise the default route, and R5 may keep sending traffic to a router that can no longer reach R6.

    This goal can be achieved using RRI (reverse route injection) and redistributing static routes into the IGP (RIP in our case) on both R1 and R4. Note that this procedure may slow down convergence, as it takes the secondary router some time to inject the new information and propagate it through the IGP domain. In addition, old routing information may still exist in the domain, further slowing down the process. When a single IGP runs over the whole domain and no summarization takes place, RRI might be redundant, as the routing information is advertised by the routers automatically. For example, if we were running RIP up to R6, there would be no need for RRI, since the VLAN67 information would be propagated down to R5 across both routers. In this situation, you may need to tune metric values (e.g. using offset-lists with RIP) to make sure R5 prefers the active router first.
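
The two tuning knobs mentioned in steps 2 and 4 could look roughly like this. The timer values, the access-list number and the offset are illustrative assumptions, not tested recommendations. The offset-list part only applies to the variant where RIP runs all the way to R6 and RRI is not used.

R1:
! Illustrative sub-second HSRP timers (hello 200 ms, hold 750 ms)
interface FastEthernet0/0
 standby 1 timers msec 200 msec 750

R4:
interface FastEthernet0/1
 standby 1 timers msec 200 msec 750
!
! Assumption: RIP runs up to R6, so both R1 and R4 advertise VLAN67 to R5.
! Adding 3 hops on R4 makes R5 prefer the path via R1 while R1 still has the route.
access-list 67 permit 155.1.67.0 0.0.0.255
!
router rip
 offset-list 67 out 3

Keep in mind that the offset is a static preference. It does not follow the HSRP state by itself. It only works because R1 stops advertising VLAN67 once it loses the route.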

Here is a sample configuration for stateless IPsec HA. Note that we tuned the ISAKMP DPD interval down to the minimum allowed value and made the DPD messages periodic. R1 and R4 are configured to use the VIP as the ISAKMP/IPsec tunnel source and to redistribute RRI routes into RIP.

R1:
crypto isakmp policy 10
 encr 3des
 hash md5
 authentication pre-share
!
crypto isakmp key CISCO address 155.1.146.6
crypto isakmp keepalive 10 periodic
!
!
crypto ipsec transform-set 3DES_MD5 esp-3des esp-md5-hmac
!
ip access-list extended TRAFFIC
 permit ip 155.1.58.0 0.0.0.255 155.1.67.0 0.0.0.255
!
crypto map VPN 10 ipsec-isakmp
 set peer 155.1.146.6
 set transform-set 3DES_MD5
 match address TRAFFIC
 reverse-route static
!
router rip
 redistribute static
!
interface FastEthernet0/0
 crypto map VPN redundancy VLAN146

R4:
crypto isakmp policy 10
 encr 3des
 hash md5
 authentication pre-share
!
crypto isakmp key CISCO address 155.1.146.6
crypto isakmp keepalive 10 periodic
!
!
crypto ipsec transform-set 3DES_MD5 esp-3des esp-md5-hmac
!
ip access-list extended TRAFFIC
 permit ip 155.1.58.0 0.0.0.255 155.1.67.0 0.0.0.255
!
crypto map VPN 10 ipsec-isakmp
 set peer 155.1.146.6
 set transform-set 3DES_MD5
 match address TRAFFIC
 reverse-route static
!
router rip
 redistribute static
!
interface FastEthernet0/1
 crypto map VPN redundancy VLAN146

R6:
ip route 0.0.0.0 0.0.0.0 155.1.146.254
!
crypto isakmp policy 10
 encr 3des
 hash md5
 authentication pre-share
!
crypto isakmp key CISCO address 155.1.146.254
crypto isakmp keepalive 10 periodic
!
!
crypto ipsec transform-set 3DES_MD5 esp-3des esp-md5-hmac
!
ip access-list extended TRAFFIC
 permit ip 155.1.67.0 0.0.0.255 155.1.58.0 0.0.0.255
!
crypto map VPN 10 ipsec-isakmp
 set peer 155.1.146.254
 set transform-set 3DES_MD5
 match address TRAFFIC
!
interface FastEthernet0/0.146
 encapsulation dot1Q 146
 ip address 155.1.146.6 255.255.255.0
 crypto map VPN

Testing stateless IPsec HA

Assume that R1 is the primary HSRP router and that the tunnel has already been established. Let’s check the state on all key routers in the topology.

Rack1R6#show crypto isakmp sa detail
Codes: C - IKE configuration mode, D - Dead Peer Detection
       K - Keepalives, N - NAT-traversal
       X - IKE Extended Authentication
       psk - Preshared key, rsig - RSA signature
       renc - RSA encryption

C-id  Local           Remote          I-VRF    Status Encr Hash Auth DH Lifetime Cap.
3     155.1.146.6     155.1.146.254            ACTIVE 3des md5  psk  1  23:55:02 D
       Connection-id:Engine-id =  3:1(software)

Rack1R1#show standby
FastEthernet0/0 - Group 1
  State is Active
    5 state changes, last state change 02:31:23
  Virtual IP address is 155.1.146.254
  Active virtual MAC address is 0000.0c07.ac01
    Local virtual MAC address is 0000.0c07.ac01 (v1 default)
  Hello time 1 sec, hold time 3 sec
    Next hello sent in 0.684 secs
  Preemption enabled
  Active router is local
  Standby router is 155.1.146.4, priority 100 (expires in 2.576 sec)
  Priority 200 (configured 200)
    Track interface Serial0/0 state Up decrement 110
  IP redundancy name is "VLAN146" (cfgd)

Rack1R1#show ip route static
     155.1.0.0/24 is subnetted, 9 subnets
S       155.1.67.0 [1/0] via 155.1.146.6

Rack1R5#ping 155.1.67.7 source 155.1.58.5

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 155.1.67.7, timeout is 2 seconds:
Packet sent with a source address of 155.1.58.5
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 64/98/220 ms

Next we start a continuous ping flow from VLAN67 to VLAN58 to check how long it takes for the IPsec tunnel to switch over to the backup router. Initially R1 is the primary router on the segment, and we shut down R1’s Ethernet interface while pinging from SW1 (VLAN67). In the output below you can see the interval of missed pings, which lasted for approximately 24 seconds. It took around 20 seconds for R6 to declare the old SA dead. The remaining 4 seconds probably went to establishing the new SA and to R4 advertising the new prefix to R5.

Rack1SW1#ping 155.1.58.5 repeat 1000000 timeout 2

Type escape sequence to abort.
Sending 1000000, 100-byte ICMP Echos to 155.1.58.5, timeout is 2 seconds:

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.
...........!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.
Success rate is 98 percent (1221/1235), round-trip min/avg/max = 67/70/177 ms

Here is part of the output of the debug crypto isakmp command on R6. This output appears after we shut down R1’s Fa0/0 interface and R6 stops receiving return traffic from its peer. We can see R6 probing the VIP address five times before declaring the peer dead and removing the IPsec SA.

ISAKMP (0:134217729): incrementing error counter on sa, attempt 4 of 5: PEERS_ALIVE_TIMER
ISAKMP: set new node -1535446578 to QM_IDLE
CryptoEngine0: generate hmac context for conn id 1
ISAKMP:(0:1:SW:1):Sending NOTIFY DPD/R_U_THERE protocol 1
        spi 2235112032, message ID = -1535446578
ISAKMP:(0:1:SW:1): seq. no 0x42B0D70B
ISAKMP:(0:1:SW:1): sending packet to 155.1.146.254 my_port 500 peer_port 500 (I) QM_IDLE
ISAKMP:(0:1:SW:1):purging node -1535446578
ISAKMP:(0:1:SW:1):Input = IKE_MESG_FROM_TIMER, IKE_TIMER_PEERS_ALIVE
ISAKMP:(0:1:SW:1):Old State = IKE_P1_COMPLETE  New State = IKE_P1_COMPLETE 

ISAKMP (0:134217729): incrementing error counter on sa, attempt 5 of 5: PEERS_ALIVE_TIMER
ISAKMP:(0:1:SW:1):peer 155.1.146.254 not responding!
ISAKMP:(0:1:SW:1):peer does not do paranoid keepalives.

ISAKMP:(0:1:SW:1):deleting SA reason "P1 errcounter exceeded (PEERS_ALIVE_TIMER)" state (I) QM_IDLE       (peer 155.1.146.254)
IPSEC(key_engine): got a queue event with 1 kei messages
Delete IPsec SA by DPD, local 155.1.146.6 remote 155.1.146.254 peer port 500
IPSEC(delete_sa): deleting SA,
  (sa) sa_dest= 155.1.146.6, sa_proto= 50,
    sa_spi= 0x4C62A262(1281532514),
    sa_trans= esp-3des esp-md5-hmac , sa_conn_id= 2001,
  (identity) local= 155.1.146.6, remote= 155.1.146.254,
    local_proxy= 155.1.67.0/255.255.255.0/0/0 (type=4),
    remote_proxy= 155.1.58.0/255.255.255.0/0/0 (type=4)
crypto engine: deleting IPSec SA SW:1
crypto_engine: IPSec SA delete
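
The detection time of around 20 seconds can be roughly reconstructed from the DPD settings. The line below writes out the retry interval explicitly. The value 2 is simply the default mentioned earlier, and the arithmetic in the comments is an estimate, not a measured breakdown.

R6:
! Syntax: crypto isakmp keepalive <interval> <retry-interval> [periodic | on-demand]
! The retry interval of 2 seconds is spelled out here only for clarity
crypto isakmp keepalive 10 2 periodic
!
! Rough worst-case detection time with these values:
!   up to 10 s until the next R-U-THERE is due
!   + 5 unanswered attempts x 2 s = 10 s
!   = about 20 s before the ISAKMP/IPsec SAs are deleted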

A snapshot of the routers’ output after the failover has occurred:

Rack1R4#show standby
FastEthernet0/1 - Group 1
  State is Active
    26 state changes, last state change 01:30:40
  Virtual IP address is 155.1.146.254
  Active virtual MAC address is 0000.0c07.ac01
    Local virtual MAC address is 0000.0c07.ac01 (v1 default)
  Hello time 1 sec, hold time 3 sec
    Next hello sent in 0.000 secs
  Preemption enabled
  Active router is local
  Standby router is unknown
  Priority 100 (default 100)
  IP redundancy name is "VLAN146" (cfgd)

Rack1R5#show ip route 155.1.67.0
Routing entry for 155.1.67.0/24
  Known via "rip", distance 120, metric 1
  Redistributing via rip
  Last update from 155.1.0.4 on Serial0/0, 00:00:06 ago
  Routing Descriptor Blocks:
  * 155.1.0.4, from 155.1.0.4, 00:00:06 ago, via Serial0/0
      Route metric is 1, traffic share count is 1

Rack1R4#show ip route static
     155.1.0.0/24 is subnetted, 8 subnets
S       155.1.67.0 [1/0] via 155.1.146.6
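
To confirm that R6 has re-negotiated with the new active router, the usual show commands can be repeated on both ends after the failover. The list below is only a suggested checklist. No output is shown because it was not captured during this test.

Rack1R6#show crypto isakmp sa
Rack1R6#show crypto ipsec sa peer 155.1.146.254
Rack1R4#show crypto isakmp sa
Rack1R4#show crypto ipsec sa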

As for using this feature in real life, be aware of its slow convergence time (not to mention odd bugs that may vary based on your IOS version ;). It is still better than nothing, but you may prefer to run a dynamic routing protocol over GRE/DMVPN tunnels encrypted by IPsec. Relying on an IGP for failover usually results in faster convergence and better stability. However, if you need a solution for remote VPN server redundancy, FHRPs (first-hop redundancy protocols) or SLB (server load-balancing) might be the only option.
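
For comparison, here is a minimal sketch of the GRE alternative on R6, with one IPsec-protected tunnel to R1’s physical address. A second tunnel to R4 would be built the same way. The tunnel number, the 10.0.16.0/24 addressing and the profile name GRE_PROTECT are hypothetical and not part of the original topology, and the matching configuration on R1 is omitted.

R6:
! The peer is now the physical address of R1, not the HSRP VIP
crypto isakmp key CISCO address 155.1.146.1
!
crypto ipsec profile GRE_PROTECT
 set transform-set 3DES_MD5
!
! Hypothetical tunnel addressing
interface Tunnel16
 ip address 10.0.16.6 255.255.255.0
 tunnel source FastEthernet0/0.146
 tunnel destination 155.1.146.1
 tunnel protection ipsec profile GRE_PROTECT
!
! Run the IGP over the tunnel. How VLAN67 is advertised into it
! (network statement, passive interfaces or redistribution) is left out here.
router rip
 version 2
 network 10.0.0.0

With this design the IGP, not DPD, detects the failure of a gateway. No IKE re-negotiation is needed at failover if the tunnel to the second gateway is already up.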

About Petr Lapukhov, 4xCCIE/CCDE:

Petr Lapukhov's career in IT began in 1988 with a focus on computer programming, and progressed into networking with his first exposure to Novell NetWare in 1991. Initially involved with Kazan State University's campus network support and UNIX system administration, he went on to become a networking consultant, taking part in many network deployment projects. Petr currently has over 12 years of experience working in the Cisco networking field, and is the only person in the world to have obtained four CCIEs in under two years, passing each on his first attempt. Petr is an exceptional case in that he has been working with all of the technologies covered in his four CCIE tracks (R&S, Security, SP, and Voice) on a daily basis for many years. When not actively teaching classes, he is developing self-paced products, studying for the CCDE Practical & the CCIE Storage Lab Exam, and completing his PhD in Applied Mathematics.


14 Responses to “IPsec VPN High Availability with HSRP”

 
  1. Petr,
    This is a great post and certainly directly pertains to the labs I am working on in the IE Volume 1 Workbook. Still I have questions. Assuming that I am trying to get this to function for the purpose of accomplishing it on the CCIE lab if I see it, what about the code versions? It doesn’t seem to me that I can do commands such as “reverse-route static” and “crypto isakmp keepalive 10 periodic” as the options for “static” and “periodic” are not there.

    So, am I overthinking it? If I can get it to fail from the active to the standby and the pings resume, but I cant get it to fail back in the opposite direction would you say I should move on anyhow? Is it supposed to fail back to the other router if preempt is enabled?

    Thanks!

    Brandon

  2. Brandon,

    IPsec HA first appeared in the 12.2T code train. Back when I was using 12.2T, I recall it had a lot of bugs, like not deleting the IPsec SA once the ISAKMP SA had been torn down. This might be the issue in your case, since this is what may happen when you switch back to the primary router. In my 12.4 tests I found it switching back correctly, but this is only with 12.4.

    As for the command syntax I’m using, it’s 12.4; omitting the “periodic” and “static” keywords in 12.2T should not affect the core functionality.

    In the lab they most likely use some show commands to test your configuration, like “show standby”, “show crypto isakmp sa” and “show crypto ipsec sa”. Additionally, they probably run a simple ping test, but I’m sure they don’t do in-depth testing like switching the active router back and forth.

  3. Maxwell says:

    Hi Petr,

    This is a basic question I am asking so please excuse me if I’m not up to standards.

    I have checked the HSRP scenario several times and its working fine, however I used only preempt on the Router which has high priority. What is the advantage or any standard of having preempted on both Routers who will have HSRP?

    As per my understanding the periodic keyword in ISAKMP will help DPD to reduce the management traffic in a large HUB & Spoke environments; however it is not a mandatory command isn’t it?

    If I look from the CCIE LAB point of view what is the best way to use the preempt & periodic, and maintain my points and still have the best practice?

    Regards
    Maxwell

  4. cisco asa says:

    IOS 12.2T had indeed a lot of bugs with IPSEC high availability using HSRP. We had to upgrade many routers to 12.4 to solve our problems but we are very happy with it now.

  5. Furqan Yaseen says:

    Very Good Stuff !!!! Solution clear more concepts about HSRP Tech.

  6. sunil says:

    I want to active both interface in my VPN router cat it will work.

    Please suggest

    I have 2 vpn box i want to configure 1500 remote site per vpn means load balance ,if 1 vpn goes down 2nd vpn will take care another 3000 site.

    please reply .

  7. Johan says:

    Hi,

    The exponation of how DPD works is far better then the one found on Cisco.com!

    Love this site, thanks
    Johan

  8. Brian says:

    Hi Petr,

    Any suggestions when using two ISP’s? I’m running a multihomed BGP environment with HSRP on the interfaces of my advertised subnet. I configured IPSec on both routers but it seems that I cannot not get phase 1/2 negotiation to begin unless I apply my crypto map to my external interface (ISP facing interface), which isn’t running HSRP. I was initially trying to apply the crypto map to the interface that has my advertised subnet VIP.

    • @Brian

      I would suggest running IPSec encrypted GRE tunnels and using either

      1) GRE keepalives with static routes
      2) Static reliable routing with object tracking
      3) IGP protocol across the tunnels

      to provide redundancy. This scheme should have relatively fast convergence and help you avoid any IPsec IKE re-negotiations in case of a primary gateway failure. Moreover, you may even use it for active-active load-balancing by means of equal-cost multipath.

  9. Asif says:

    Hi Petr

    I love this site, great work done. You Guyz are awsome

    I would like you guyz to post more curious scenario for VPN.
    Dual Head IPsec over GRE with load Balancing / load Sharing, full redundant scenarios ( not DMVPN )

    Referring to the same diagram as above ; how do we get this working ….

    R4 and R5 to establish IPSEC Gre Tunnel
    R1 and R5 to establish IPSEC Gre Tunnel
    Ospf / Eigrp will be routing protocol over GRE Tunnel

    Load on Two Tunnel is shared unequally as Internet Bandwidth of R4 is 10MB and Internet Bandwidth of R1 is 3MB.

  10. Hemant says:

    Hi Petr,

    I have a question here. In this case, why do we need to configure isakmp keepalives? As per my understanding, keepalives should be needed only when we need to detect if a peer is available or not and then failover to another peer if first peer is not available. However, in this case, from R5′s point of view, there’s only a single peer (because set peer on R5 would be HSRP Virtual IP address). So, in this case, HSRP should take care of the failover feature and it should be transparent to R5. Hence, keepalives shouldn’t be required. Am I wrong or misunderstanding something?

  12. Wilson says:

    Hi Petr Lapukhov,
    What IP address is set as remote on R5?

  13. Fahad Afzal says:

    Great post!! Thanks a ton..

 
