Archive
Cisco ASA Group Policy Access Lists Are Evil
Anyone who has ever had to use a Cisco ASA firewall knows how access control works. You create ACLs, which are collections of rules specifying allowed and denied traffic. Every rule names the source of the traffic and its destination. If the source and destination IPs and/or ports of a packet match those of a permit rule, the packet is allowed through; if they match a deny rule, the packet is dropped. Simple, right?
Not exactly. There’s an exception to this principle: the ACLs used for filtering traffic going through VPN tunnels. They are called “group policy access lists” or just “VPN filters” (after the “vpn-filter” CLI command used to apply them). In those ACLs, the definitions of source and destination are turned upside down. Though not always; it’s more that you no longer know what they mean at all. Source is now everything on the remote side of the tunnel, and destination is everything local to the firewall. The logic is very different, even though the syntax remains exactly the same.
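For reference, a VPN filter is an ordinary ACL; it just gets attached to a group policy instead of being applied to an interface with access-group. A minimal sketch of how that looks (the names TUNNEL-GP and FILTER-ACL are made up for illustration):

group-policy TUNNEL-GP internal
group-policy TUNNEL-GP attributes
 vpn-filter value FILTER-ACL

The filter ACL itself is written with the same access-list command as any other, which is exactly where the confusion starts.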
Now, let’s take a look at what this means for our security. We’ll use the same ACL twice: first with its meaning as a standard access rule, and then with its meaning as a VPN filter.
access-list web-access extended permit tcp 192.168.1.0 255.255.255.0 host 5.5.5.5 eq http
This rule will allow all our internal users on 192.168.1.0/24 network to access the external web server on 5.5.5.5. It’s very straightforward, does exactly what we want it to, no surprises here. But if that web server happens to sit on the other side of a VPN tunnel, everything changes. We need to rewrite the rule to put that web server as the source, since it’s on the remote end, and our internal network as the destination:
access-list web-access extended permit tcp host 5.5.5.5 eq http 192.168.1.0 255.255.255.0
It achieves the same result, but also something else, completely unintended and very insecure. Now anyone who has root access to that web server can reach any TCP port on any host on the 192.168.1.0 network, as long as they open the connection with port 80 as the source. The ASA no longer knows which side is supposed to initiate the connection and has no way to distinguish a legitimate client-to-web-server connection from someone running netcat against the client network. The client network is suddenly no more secure than that remote web server.
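To make that concrete, here is roughly what such an attacker could do (a hypothetical example; the target address 192.168.1.10 and port 22 are picked arbitrarily, and root is only needed to bind a source port below 1024):

nc -s 5.5.5.5 -p 80 192.168.1.10 22

As far as the VPN filter is concerned, this is traffic from host 5.5.5.5, port 80, to the 192.168.1.0/24 network, so it sails right through.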
And this is why VPN filters are evil.
P.S. VPN filters actually have one useful capability, which is conspicuously absent from the standard ASA access control mechanism: by virtue of being applied to the traffic between two specific sets of networks, they effectively implement the concept of zones, so popular (and so useful!) in other vendors’ firewalls. I’ll cover this concept – or rather the lack of it and the nastiness of the necessary workarounds – in a future post.
Why pfSense is not production ready
(Caveat: everything said below applies only to pfSense 1.2.3, since that is the only version I have ever used.)
pfSense is a great piece of software. Easy to install, easy to configure, very powerful, lightweight, stable. It’s no surprise that so many people use it when they need a software firewall or router. But after running it in production for about half a year, I have come to the conclusion that it was the wrong decision to use it in a critical role. Here’s why.
Over this period, I had exactly three issues with pfSense. One of them, the breakage of CARP due to multicasts coming back over teamed physical adapters, is mostly VMware’s fault, and I’m not going to count it against pfSense. The other two, however, are clearly a reflection of the FOSS mindset (or rather the lack of resources).
The first of the two is the default size of the state table: 10,000 entries. That number is fine for home use or a small startup’s web site, but any organization beyond infancy will have more traffic and will need to increase it. The change is simple and can be made on the fly, so it may not seem like a problem, but it’s easy to miss and difficult to troubleshoot: connections just randomly time out or take a long time to establish, while pfSense happily keeps its system logs free of any notifications. Considering that each table entry occupies just 1 KB of memory, it would make a lot of sense to set the default to a much larger number or, better yet, to resize the table dynamically.
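For anyone hitting this, the fix is quick: in 1.2.3 the maximum-states value can be raised in the webGUI advanced settings, which boils down to a pf limit along these lines (100,000 is just an example figure, roughly 100 MB of RAM at 1 KB per entry):

set limit states 100000

From the shell, pfctl -si shows the current number of state entries and pfctl -sm shows the configured hard limit, which makes it easy to tell whether you’re running into the ceiling.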
The second problem is much nastier: something is broken in IP fragmentation handling. In our specific case it affected EDNS responses (DNSSEC-enabled servers now return 2-3 KB responses, which necessarily get fragmented). pfSense’s scrub feature would reassemble them for analysis and then send them on to the destination, again fragmented, but the second fragment would arrive with a broken checksum, making reassembly at the destination or at any intermediary firewall impossible. There are hints that this may actually be a problem with the em driver’s checksum offload, but at this point it’s irrelevant: if pfSense can’t do something as basic as IP fragment processing, regardless of the underlying drivers and hardware (this was the pfSense-distributed virtual appliance, so no compatibility issues should be expected), it doesn’t qualify as a production-ready firewall.
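If you’re stuck with it in the meantime, one thing worth trying, given the offload hints above, is turning off hardware checksum offload on the em interfaces and seeing whether the corruption goes away. From the shell that would look roughly like this (em0 is an assumed interface name, and the change does not survive a reboot unless made permanent):

ifconfig em0 -txcsum -rxcsum

That’s a diagnostic step rather than a fix; it only helps confirm or rule out the em offload theory.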
I expect it to be gone from our environment in about two weeks.
Fixing VM-based pfSense CARP announcement echoes when using teamed network adapters
A few people who have tried to run pfSense as a virtual appliance on an ESX(i) host have found that CARP may refuse to work. Both pfSense nodes remain in the “Backup” state – neither is willing to take the Master role and start serving the VIP. This problem is observed only if the VIP belongs to a virtual network interface that has multiple underlying physical adapters in a teamed configuration.
The best clue to the solution can be found in the pfSense logs. There, one can see that the primary pfSense node actually tries to become Master, but each time it receives a CARP announcement with the same advertisement frequency as its own, which makes it drop back to the Backup state (a router must see only lower-frequency advertisements to remain Master). In fact, those advertisements are its own.
So the real question is: why does the router VM receive IP packets that it has just sent? To answer it, we need to remember that, first, CARP advertisements are multicast, and, second, the typical ESX teaming setup uses the “Route based on the originating virtual switch port ID” option. This setting means that any given vNIC will consistently use the same pNIC unless a hardware failure occurs. When the host is set up this way, the corresponding ports on the physical switch are left in their default configuration, with no link aggregation.
Now, what happens when a CARP advertisement is sent? It exits the host on one pNIC and travels to the switch, where, being a multicast, it is sent out to the other switch ports, including the ones connected to the other pNICs in the same team as the originating pNIC. The multicast comes back into the host, where it’s delivered to all VMs on the same vSwitch, including the originating router. Oops.
We can argue whether or not this ESX behavior is correct, but the important fact is that VMware doesn’t seem interested in changing it (the problem existed in 3.5 and still exists in 4.0), and there was no VMware-side solution until 4.0U2. If you’re on that version, you can use the new Net.ReversePathFwdCheckPromisc option (refer to the 4.0U2 release notes). Or we can fix it ourselves, in a very simple way: make the switch aware of the teamed nature of the pNICs involved, so that it no longer sends the multicast packets back to the host.
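If you take the 4.0U2 route, the option can be flipped from the service console along these lines (a sketch only; I haven’t used it myself, so treat the release notes as authoritative):

esxcfg-advcfg -s 1 /Net/ReversePathFwdCheckPromisc

The same setting should also be reachable from the vSphere Client under the host’s Advanced Settings, in the Net section.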
I will let you figure out the correct setting for the switch (different vendors use different names for the same thing: Cisco has EtherChannel, Nortel calls it MultiLink Trunking, etc.). As for the host side, change the load balancing algorithm from the default to “Route based on IP hash”. Just keep in mind that until you have made the changes on both ends, that connection may not work, so make sure you’re not transferring anything important to/from the VMs on the same vSwitch while you’re making the changes. (I’m assuming your management network is on a different vSwitch; otherwise you’re on your own.)
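For what it’s worth, on the Cisco side this means a static EtherChannel (“Route based on IP hash” works with static aggregation, not LACP, on standard vSwitches). A minimal sketch with made-up interface numbers:

interface range GigabitEthernet0/1 - 2
 channel-group 1 mode on

The essential part is channel-group ... mode on applied to every switch port that the teamed pNICs plug into; the rest of the port configuration stays whatever your environment requires.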
Update: Thank you to Anne Jan Elsinga for pointing out that 4.0U2 provides a new option that can be used to solve the problem. I’ve modified the post to reflect this.
Missing access points after upgrade of Cisco Wireless LAN Controller to Release 5.2
If you upgrade your Cisco Wireless LAN Controller to Release 5.2, and suddenly some or all access points disappear, go to Controller, Advanced, Master Controller Mode, check the box, and power cycle the missing access points.
Root cause: 5.2 introduces the CAPWAP protocol as the replacement for LWAPP. Some access points won’t transition to the new protocol unless a Master controller tells them to. Unfortunately, Master Controller Mode gets disabled after every reboot of the controller. Since you always have to reboot to apply a new software release, the problematic access points are left without guidance until you manually check that Master Controller Mode box.
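If you prefer the CLI, the same setting can be toggled there; to the best of my recollection the command is:

config network master-base enable

and show network summary will confirm whether the mode is currently enabled. Either way, remember that it won’t survive the next controller reboot.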