
Fixing VM-based pfSense CARP announcement echoes when using teamed network adapters

A few people who have tried to run pfSense as a virtual appliance on an ESX(i) host have found that CARP may refuse to work. Both pfSense nodes remain in the “Backup” state: neither is willing to take the Master role and start serving the VIP. The problem appears only when the VIP belongs to a virtual network interface that is backed by multiple physical adapters in a teamed configuration.

The best clue to the solution can be found in the pfSense logs. They show that the primary pfSense node does try to become Master, but each time it receives a CARP announcement with the same advertisement frequency as its own, which makes it drop back to the Backup state (a router must see only lower-frequency advertisements to remain Master). In fact, those advertisements are its own.
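The rule in parentheses can be sketched in a few lines of Python. This is a deliberately simplified model, not pfSense's actual code: it assumes the advertisement interval is `advbase + advskew/256` seconds (the two values pfSense exposes as the base and skew of a CARP VIP) and ignores everything else that takes part in a real election. The names `Advertisement` and `master_must_step_down` are made up for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advertisement:
    advbase: int  # base advertisement interval, in seconds
    advskew: int  # skew, in 1/256ths of a second

    def interval(self) -> float:
        # Assumed formula: a smaller interval means more frequent advertisements.
        return self.advbase + self.advskew / 256


def master_must_step_down(own: Advertisement, received: Advertisement) -> bool:
    # Simplified rule: a Master keeps its role only while every advertisement
    # it hears is less frequent (has a longer interval) than its own.
    return received.interval() <= own.interval()


primary = Advertisement(advbase=1, advskew=0)
secondary = Advertisement(advbase=1, advskew=100)

# An advertisement from the slower secondary does not threaten the primary.
print(master_must_step_down(primary, secondary))  # False

# The primary's own advertisement, echoed back, has exactly the same
# frequency, so the primary concludes it must drop to Backup.
print(master_must_step_down(primary, primary))  # True
```

The second call is the situation seen in the logs: the node cannot tell its own echoed advertisement from that of an equally eager peer.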

So, the real question is: why does the router VM receive packets that it has just sent? To answer it, we need to remember that, first, CARP advertisements are multicast, and, second, the typical ESX teaming setup uses the “Route based on the originating virtual switch port ID” option. This setting means that any given vNIC will consistently use the same pNIC unless a hardware failure occurs. When the host uses this setting, the physical switch ports that the host’s pNICs are connected to are left in their default configuration, with no link aggregation.

Now, what happens when a CARP advertisement is sent? It exits the host on one pNIC and travels to the switch, which, because the packet is a multicast, sends it out of its other ports, including those connected to the other pNICs in the same team as the originating pNIC. The multicast comes back into the host, where it’s sent to all VMs on the same vSwitch, including the originating router. Oops.

We can argue whether or not this ESX behavior is correct, but the important fact is that VMware didn’t seem to be interested in changing it: the problem existed in 3.5 and still exists in 4.0, and there was no VMware-provided solution until 4.0U2. If you’re on that version, you can use the new Net.ReversePathFwdCheckPromisc option (refer to the 4.0U2 release notes). Alternatively, we can fix it ourselves in a very simple way: make the switch aware of the teamed nature of the pNICs involved. This way the switch won’t send the multicast packets back to the host.
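To make the echo path and the effect of the switch-side fix concrete, here is a toy Python model of how a switch floods a multicast frame. It is only a sketch under simplifying assumptions: the port names are invented, the switch is assumed to flood multicasts to every port, and an aggregated link is modelled as a set of ports that the switch treats as one logical port.

```python
def flood_multicast(ingress, ports, lags=()):
    """Return the switch ports a multicast frame is sent out of.

    ingress: port the frame arrived on
    ports:   all ports on the (toy) switch
    lags:    groups of ports aggregated into one logical link
    """
    # The frame never goes back out of the logical port it came in on.
    ingress_lag = next((set(g) for g in lags if ingress in g), {ingress})
    egress, served_lags = [], []
    for port in ports:
        if port in ingress_lag:
            continue
        lag = next((set(g) for g in lags if port in g), None)
        if lag is not None:
            # Only one member of an aggregated link carries a given frame.
            if lag in served_lags:
                continue
            served_lags.append(lag)
        egress.append(port)
    return egress


def echoes_back_to_host(host_ports, sending_port, ports, lags=()):
    # True if the frame sent by the host re-enters it on another pNIC.
    return any(p in host_ports for p in flood_multicast(sending_port, ports, lags))


ports = ["port-vmnic0", "port-vmnic1", "port-other"]
host_ports = {"port-vmnic0", "port-vmnic1"}

# Default switch configuration: the two pNIC ports are unrelated to the switch.
print(echoes_back_to_host(host_ports, "port-vmnic0", ports))  # True

# Switch aware of the team: both pNIC ports form one aggregated link.
print(echoes_back_to_host(host_ports, "port-vmnic0", ports, lags=[host_ports]))  # False
```

In the first case the advertisement leaves on one pNIC and comes back on the other; in the second the switch sees a single logical port and has nowhere to echo the frame.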

I will let you figure out the correct setting for the switch (different vendors use different names for the same thing: Cisco has EtherChannel, Nortel calls it MultiLink Trunking, etc.). As for the host side, change the load balancing algorithm from the default to “Route based on IP hash”. Just keep in mind that the connection may not work until you have made the changes on both ends, so make sure you’re not transferring anything important to/from the VMs on the same vSwitch while you’re making the changes. (I’m assuming your management network is on a different vSwitch; otherwise you’re on your own.)

Update: Thank you to Anne Jan Elsinga for pointing out that 4.0U2 provides a new option that can be used to solve the problem. I’ve modified the post to reflect this.

  1. March 30, 2010 at 19:56

    Thanks for posting that workaround. I’ve known about this issue for quite some time but never encountered a situation where I needed to fix it before now.

    This VMware issue affects all multicast traffic, not just CARP, and such broken behavior also breaks other services. In the case I hit, OSPF was the thing that was causing grief. OpenOSPFD was seeing all its own traffic echoed back to it: hellos, LS-Updates, etc. OpenOSPFD sees that as some other router, which caused it to flake out in weird, inconsistent ways.

    This did resolve the problem for OSPF as well, thanks!

  2. August 5, 2010 at 09:49

    Thanks for the post. During my search for an answer I also came across this one.

    Next to this solution there seems to be another one. Since ESX 4.0 Update 2 it is possible to discard multicast packets received from a second network adapter connected to the same vSwitch.

    According to the release notes: “Duplicate multicast packets are generated when the vSwitch has at least two vmnics and promiscuous mode enabled
    Consider a vSwitch that has more than one uplink and has the promiscuous mode enabled. Some of the packets that come in from the uplinks that are not currently used by the promiscuous port, are not discarded. This behavior might mislead some applications, such as the CARP protocol instance.
    This issue is resolved in this release. Starting with this release the Net.ReversePathFwdCheckPromisc configuration option is provided to explicitly discard all the packets coming in from the currently unused uplinks, for the promiscuous port.
    Note: If the value of the Net.ReversePathFwdCheckPromisc configuration option is changed when the ESX instance is running, you need to enable or re-enable the promiscuous mode for the change in the configuration to take effect.”

  3. Markus Mann
    February 24, 2013 at 16:43

    YMMD! I had the same problem on an ESXi 5.0u1. But I had to reboot the ESXi host to make it work; disabling and re-enabling promiscuous mode on the vSwitch didn’t help in my case.

