Archive

Archive for March, 2010

Fixing VM-based pfSense CARP announcement echoes when using teamed network adapters

A few people who have tried to run pfSense as a virtual appliance in an ESX(i) host have found that CARP may refuse to work. Both pfSense nodes remain in “Backup” state – none of them is willing to take the Master role and start serving the VIP. This problem can be observed only if the VIP belongs to a virtual network interface that has multiple underlying physical adapters in a teamed configuration.

The best clue to the solution can be found in pfSense logs. There, one can see that the primary pfSense node actually tries to become a Master, but every time it receives a CARP announcement with the same advertisement frequency as its own, which makes it drop to Backup state (a router must see only lower frequency advertisements to remain Master). In fact, those advertisements are its own.

So, the real question is: why does the router VM receive IP packets that it has just sent? To answer it, we need to remember that, first, CARP advertisements are multicast, and, second, the typical ESX teaming setup uses the “Route based on the originating virtual switch port ID” option. This setting means that any given vNIC will consistently use the same pNIC, unless a hardware failure occurs. When this setting is used by the host, the physical switch, which the host’s pNICs are connected to, has all its corresponding ports configured by default, with no link aggregation.

Now, what happens when a CARP advertisement is sent? It exits the host on one pNIC, travels to the switch, where, being a multicast, it’s sent to other switch ports, including the other pNICs in the same team as the originating pNIC. The multicast comes back into the host, where it’s sent to all VMs on the same vSwitch, including the originating router. Oops.

We can argue whether or not this ESX behavior is correct, but the important fact is that VMware doesn’t seem to be interested in changing this behavior (the problem existed in 3.5 and still exists in 4.0). Instead, but there has been no VMware solution until 4.0U2. If you’re on this version, you can use the new Net.ReversePathFwdCheckPromisc option (refer to 4.0U2 release notes). Or we can fix it by ourselves, in a very simple way. We need to make the switch aware of the teamed nature of the pNICs involved. This way the switch won’t send the multicast packets back to the host.

I will let you figure out the correct setting for the switch (different vendors use different names for the same thing: Cisco has EtherChannel, Nortel calls it MultiLink Trunking, etc.). As for the host side, change the load balancing algorithm from the default to “Route based on IP hash”. Just keep in mind that until you have made the changes on both ends, that connection may not work, so make sure you’re not transferring anything important to/from the VMs on the same vSwitch while you’re making the changes. (I’m assuming your management network is on a different vSwitch; otherwise you’re on your own.)

Update: Thank you to Anne Jan Elsinga for pointing out that 4.0U2 provides a new option that can be used to solve the problem. I’ve modified the post to reflect this.

Advertisement