Why pfSense is not production ready

November 2, 2010

(Caveat: everything said below applies only to pfSense 1.2.3, since this is the only version I have ever used.)

pfSense is a great piece of software. Easy to install, easy to configure, very powerful, lightweight, stable. It’s no surprise that so many people use it when they need a software firewall or router. But after running it in production for about half a year, I have come to the opinion that using it in a critical role was the wrong decision. And here’s why.

Over this period, I had exactly three issues with pfSense. One of them, the breakage of CARP due to multicasts coming back over teamed physical adapters, is mostly VMware’s fault, and I’m not going to count it against pfSense. The other two, however, are clearly a reflection of the FOSS mindset (or rather, of a lack of resources).

The first of the two is the default number of entries in the state table: 10,000. This number is fine for home use or a small startup’s web site, but any organization beyond infancy will have more traffic and will need to increase the table size. The change is simple and can be made on the fly, so it may not seem like a problem, but it’s easy to miss and difficult to troubleshoot: connections just randomly time out or take a long time to establish, while pfSense happily keeps its system logs free of any notifications. Considering that each table entry occupies just 1 KB of memory, it would make a lot of sense to set the default to a much larger number or, better yet, to implement dynamic table resizing.
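To put the 1 KB figure in perspective, here is a back-of-the-envelope sketch of how much memory a full state table would take at various limits. The function name and the larger limits are made up for illustration; the only number taken from above is the roughly 1 KB per entry, which is itself an approximation.

```python
# Assumption: each state table entry occupies roughly 1 KB, as stated above.
BYTES_PER_STATE = 1024


def state_table_memory_mib(max_states: int, bytes_per_state: int = BYTES_PER_STATE) -> float:
    """Return the approximate memory, in MiB, used by a completely full state table."""
    return max_states * bytes_per_state / (1024 * 1024)


# The first value is the pfSense 1.2.3 default; the other two are hypothetical limits.
for limit in (10_000, 100_000, 1_000_000):
    print(f"{limit:>9,} states -> about {state_table_memory_mib(limit):.1f} MiB")
```

Under that assumption, the default table tops out at under 10 MiB, and even a tenfold increase stays below 100 MiB, which is why a larger default seems cheap.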

The second problem is much nastier. Something is broken in the handling of IP fragmentation. In our specific case it affected EDNS responses (DNSSEC-enabled servers now return responses 2-3 KB long, which necessarily become fragmented). pfSense’s scrub feature would reassemble them for analysis, then send them on to the destination, again in fragmented form, and the second fragment would arrive with a broken checksum, which made reassembly at the destination or at any intermediary firewall impossible. There are some hints that this may actually be a problem with checksum offload in the em driver, but at this point it’s irrelevant: if pfSense can’t do something as basic as IP fragment processing, regardless of the underlying drivers and hardware (in this case it was actually the virtual appliance distributed by pfSense, so no compatibility issues should be expected), it doesn’t qualify as a production-ready firewall.
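For readers wondering why a 2-3 KB DNS response "necessarily" gets fragmented, here is a rough sketch of how an IPv4 sender would split such a UDP datagram. It assumes a standard 1500-byte Ethernet MTU and a 20-byte IP header without options; neither number comes from our actual setup, and the function name is made up for illustration.

```python
def ipv4_fragments(udp_payload_len: int, mtu: int = 1500, ip_header_len: int = 20):
    """Split a UDP datagram into IPv4 fragments.

    Returns a list of (fragment_offset_in_8_byte_units, data_bytes, more_fragments).
    Assumptions: 1500-byte MTU and a 20-byte IP header (no options).
    """
    total = udp_payload_len + 8  # the 8-byte UDP header travels in the first fragment
    # Every fragment except the last must carry a multiple of 8 bytes of data.
    max_data = (mtu - ip_header_len) // 8 * 8
    fragments = []
    offset = 0
    while offset < total:
        size = min(max_data, total - offset)
        more = offset + size < total
        fragments.append((offset // 8, size, more))
        offset += size
    return fragments


# A hypothetical 2500-byte EDNS response ends up in two fragments.
for frag_offset, size, more in ipv4_fragments(2500):
    print(f"offset={frag_offset:>3} data={size:>4} bytes more_fragments={more}")
```

Only the first fragment carries the UDP header, so a firewall that wants to inspect ports has to reassemble the datagram first, and whatever it sends onward has to be fragmented again correctly.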

I expect it to be gone from our environment in about two weeks.

Categories: Networking