Connection tracking state replication for Netfilter
===================================================
$Id: README,v 1.3 2004/08/12 14:26:49 hidden Exp $

Welcome to ct_sync, a kernel-level implementation of conntrack state
replication for Netfilter.

The ct_sync patchset and kernel module provide part of the basic
infrastructure required to implement Netfilter firewall clusters capable
of stateful failover. However, ct_sync is only part of the whole
solution; it depends on infrastructure provided by other services.

An IP-level failover solution is the base of the infrastructure.
Throughout development I've used keepalived (without LVS, only VRRP),
which can do everything needed for our purposes. Of course, the
keepalived configuration has to be tailored to our needs, and a handful
of scripts are necessary to interface ct_sync to keepalived. While any
other solution could be used to control IP failover, this README assumes
keepalived.

The idea is the following: you define a virtual router with a virtual IP
address, which is backed by more than one real router. However, only one
of them forwards packets at a time; the others are simply waiting. If
something goes wrong with the currently active router (called the
master), one of the remaining slaves gets selected and takes over the
virtual IP address. This is all implemented by keepalived, and our only
remaining task is to replicate the state table of Netfilter's connection
tracking subsystem from the master to the slave nodes. This is exactly
what ct_sync does.

ct_sync does not synchronize the iptables rules; that must be done
independently. (It is not required to have the same ruleset on all
nodes: under some circumstances it might be meaningful to have distinct
rulesets for each node.)

The role of a node determines what ct_sync should be doing.
In the case of a master node, ct_sync sends messages to the slaves
whenever the state of a connection tracking entry changes, and re-sends
lost messages when a slave indicates an error. If the node is a slave,
it receives the update messages generated by the master and updates its
own connection tracking table accordingly.


1. Prerequisites
----------------

Before using ct_sync, make sure that you have set up the hardware and
software environment necessary to operate your cluster.

ct_sync needs an Ethernet network for replication traffic. Every node
needs a separate interface dedicated to ct_sync, and all communication
uses multicast UDP packets with neither encryption nor authentication.
I'd suggest setting up an Ethernet network exclusively for this purpose.
Network equipment is really cheap nowadays; for two nodes all you need
is a crossover cable.

The suggested network topology looks like this:

                   Network A
                       |
                   =========
                     switch
                   =========
                  /         \
               /               \
            /                     \
        ========                ========
         node A -----repl.-----  node B
        ========    traffic     ========
            \                     /
               \               /
                  \         /
                   =========
                     switch
                   =========
                       |
                   Network B

The necessary software is your preferred IP-failover solution, plus the
scripts that propagate node state changes to the ct_sync kernel module.


2. Patching the kernel with the ct_sync patchtree, using quilt
--------------------------------------------------------------

Unfortunately, ct_sync depends on a couple of changes to the core
Netfilter code, so it cannot be used without patching and recompiling
your kernel. To make this easier, a complete patch-tree is provided.

Our patch set is managed with quilt, so you're encouraged to apply the
tree using quilt if you plan to contribute to the project. To do this,
install quilt first: download it from its Savannah project page
(http://savannah.nongnu.org/projects/quilt), or install the binary
package for your favourite Linux distribution (Debian testing/unstable
includes quilt, for example).
The next step is to unpack a vanilla kernel source (2.4.26 in our case)
and create two directories for quilt, '.pc' and 'patches':

    hidden@nienna:patchtree$ tar xjf linux-2.4.26.tar.bz2 && cd linux-2.4.26
    hidden@nienna:linux-2.4.26$ mkdir .pc patches

Then you should copy the contents of the 'patches' directory of the CVS
module to the 'patches' directory you've just created inside the kernel
sources:

    hidden@nienna:linux-2.4.26$ cp ~/cvs/netfilter/netfilter-ha/patches/* \
        patches/

At this point, quilt should know everything about the patches to be
applied. You can easily test this:

    hidden@nienna:linux-2.4.26$ quilt unapplied
    raw
    pf_packet
    pf_packet_remove_warning
    connmark
    nfnetlink-ctnetlink
    kgdb-1.9
    export_ip_conntrack_helpers
    export_ip_nat_helpers
    export_ip_conntrack_find
    export_hash_conntrack
    export_ct_id
    export_ip_nat_lock
    export_ip_nat_hash
    ct_sync_config_and_makefile

To apply all those patches, type:

    hidden@nienna:linux-2.4.26$ quilt push -a

NOTE: the patchtree contains a few other useful goodies as well, which
are not strictly needed by ct_sync. These are kgdb 1.9, connection
marks, and the raw table with NOTRACK/TRACE targets. You may safely
remove these from quilt's 'series' file if you don't need them.

The patchtree does not include the actual ct_sync sources; you have to
copy them into your kernel tree after applying the complete patchtree.
To make things easily manageable using quilt, use the following magic
incantations (NOTE: a couple of commands were too long, so I broke them
into two lines):

    # First, we create a new patch, which will contain our addon files
    hidden@nienna:linux-2.4.26$ quilt new ct_sync.patch
    Patch ct_sync is now on top

    # We name those files _before_ copying them to the tree
    hidden@nienna:linux-2.4.26$ quilt add include/linux/netfilter_ipv4/\
    > {cts_buff.h,ct_sync.h,ct_sync_main.h,ct_sync_proto.h,ct_sync_sock.h}
    File include/linux/netfilter_ipv4/cts_buff.h added to patch ct_sync
    File include/linux/netfilter_ipv4/ct_sync.h added to patch ct_sync
    File include/linux/netfilter_ipv4/ct_sync_main.h added to patch ct_sync
    File include/linux/netfilter_ipv4/ct_sync_proto.h added to patch ct_sync
    File include/linux/netfilter_ipv4/ct_sync_sock.h added to patch ct_sync

    # Add C source files
    hidden@nienna:linux-2.4.26$ quilt add net/ipv4/netfilter/\
    > {ct_sync_main.c,ct_sync_proto.c,ct_sync_sock.c}
    File net/ipv4/netfilter/ct_sync_main.c added to patch ct_sync
    File net/ipv4/netfilter/ct_sync_proto.c added to patch ct_sync
    File net/ipv4/netfilter/ct_sync_sock.c added to patch ct_sync

    # Copy those files (note the space before the backslash, so that the
    # destination stays a separate argument)
    hidden@nienna:linux-2.4.26$ cp ~/cvs/netfilter/netfilter-ha/ct_sync/*.h \
    > include/linux/netfilter_ipv4
    hidden@nienna:linux-2.4.26$ cp ~/cvs/netfilter/netfilter-ha/ct_sync/*.c \
    > net/ipv4/netfilter/

    # Refresh our quilt patch
    hidden@nienna:linux-2.4.26$ quilt refresh

OK, we are ready. Now run your favourite configuration tool, and enable
the following options under Networking options/Netfilter configuration:

    Netfilter netlink interface
    <*> Connection tracking event notifications
    Connection tracking netlink interface
    Connection tracking state synchronization

At this point, you're ready to recompile your kernel and continue
setting up the system.


3. Setting up keepalived
------------------------

First, download and install keepalived.
This is really simple if keepalived is available as a binary package for
your favourite distribution. Debian Sarge contains keepalived, and RPMs
for Fedora can be downloaded from keepalived's home page
(http://keepalived.sf.net).

Once you have keepalived installed, you have to customize
keepalived.conf to your needs. Take a look at the example configuration
file in the ct_sync distribution; probably the only things you must
change are the IP addresses and the path to the ct_sync scripts.

There are two important points in the example configuration. The first
is the use of a sync group to bind the state of both sides of the
firewall, so that state transitions take place on both VRRP instances
(which are probably on two network interfaces, facing network A and B
respectively).

The second important point is that the priorities of all nodes should be
the same, so that preemption won't occur. That is, a node which was
formerly a master and has just been restarted won't become master again
unless the current master dies. This is usually the suggested mode of
operation.


4. Marking traffic to be replicated
-----------------------------------

If CONNMARK support is compiled in, ct_sync does not replicate all
conntrack entries, only those which have a special bit set in their
connection mark field. For example, you usually don't need your
administrative SSH connections to be replicated.

You must use the CONNMARK target to mark important connections. For
example, this command restores the old way of operation and marks every
connection:

    # iptables -t mangle -A PREROUTING -m state --state NEW \
    > -j CONNMARK --set-mark 0x40000000/0x40000000

This way, all your new conntrack entries will have bit 30 of their
connection mark field set. To tell ct_sync which bit of the mark field
should be treated as the special 'to-be-synced' bit, you have to use the
'cmarkbit=30' parameter when loading the module.
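
The relationship between the bit number given to 'cmarkbit' and the
mark/mask value passed to CONNMARK can be sketched in shell. This is
only an illustration: the helper name cmark_mask is made up here (it is
not part of ct_sync or iptables), and it assumes a 32-bit connection
mark field. The script prints the iptables command rather than running
it.

```shell
#!/bin/sh
# Hypothetical helper (not part of ct_sync or iptables): print the
# mark/mask value that has only the given bit set.
# Assumption: the connection mark field is 32 bits wide (bits 0-31).
cmark_mask() {
    bit=$1
    # Reject empty input, non-digits, leading zeros and anything longer
    # than two digits before doing arithmetic on it.
    case "$bit" in
        ''|*[!0-9]*|0?*|???*)
            echo "cmark_mask: bit must be a decimal number, 0-31" >&2
            return 1 ;;
    esac
    if [ "$bit" -gt 31 ]; then
        echo "cmark_mask: bit must be between 0 and 31" >&2
        return 1
    fi
    printf '0x%08x\n' "$((1 << $bit))"
}

# Bit 30 is the value used in the example above.
mask=$(cmark_mask 30)
echo "iptables -t mangle -A PREROUTING -m state --state NEW" \
     "-j CONNMARK --set-mark $mask/$mask"
```

For bit 30 this prints the same 0x40000000/0x40000000 value as the
command shown above. If you pick a different bit, the same number goes
into the 'cmarkbit' module parameter.
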

NOTE: the ct_sync patch-tree contains an updated CONNMARK patch, which
is unfortunately incompatible with older versions of iptables. If you
get 'invalid argument' errors when trying to add rules using the
CONNMARK target, you should upgrade your iptables user-space to the CVS
version, or apply the user-space portion of the CONNMARK patch to your
iptables source and recompile.


5. Loading the ct_sync module
-----------------------------

Loading ct_sync is actually quite easy. The only thing to take care of
is giving the correct module parameters. The possible parameters are
the following:

  - syncdev: the name of the Ethernet interface connected to the
    dedicated replication network. This interface must not be used for
    other purposes.

  - state: decides whether the node should initially be a master or a
    slave. It is generally a good idea to load the module as a slave,
    and use the /proc interface later to transition into master state.

  - id: a unique node id (possible values: 0-255). This should be
    statically configured to a distinct value on each node of the
    cluster.

  - l2drop: whether a slave node should drop all layer 2 (link layer)
    packets except those arriving on the dedicated replication
    interface. This can be useful to make sure that only the master
    processes incoming packets.

  - cmarkbit: the bit of the connection mark field that marks a
    connection as to be replicated (see section 4).

All of the parameters except l2drop and cmarkbit are mandatory: ct_sync
won't load if you omit any of them or give an invalid value.


6. Initiating state transitions using the /proc interface
---------------------------------------------------------

ct_sync creates a state file under /proc, which contains 0 if the node
is currently a slave, and 1 if it is the master.
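
Reading the state file can be wrapped in a small shell helper. This is
just a sketch: the function name ct_sync_role and the
CT_SYNC_STATE_FILE variable are made up for illustration, and the
default path is the one used by the echo and cat commands in this
section.

```shell
#!/bin/sh
# Default path: the one used by the echo/cat commands in this section.
# CT_SYNC_STATE_FILE is a made-up variable name for this sketch.
CT_SYNC_STATE_FILE=${CT_SYNC_STATE_FILE:-/proc/sys/net/ipv4/netfilter/ct_sync/state}

# Hypothetical helper: print "master" or "slave" for the given state
# file (first argument, defaulting to $CT_SYNC_STATE_FILE).
ct_sync_role() {
    file=${1:-$CT_SYNC_STATE_FILE}
    if [ ! -r "$file" ]; then
        echo "ct_sync_role: cannot read $file" >&2
        return 2
    fi
    value=$(cat "$file") || return 2
    case "$value" in
        0) echo slave ;;
        1) echo master ;;
        *) echo "ct_sync_role: unexpected state value: $value" >&2
           return 1 ;;
    esac
}
```

Calling ct_sync_role without arguments reports the role of the local
node. Passing a path as the first argument is handy for trying the
helper against an ordinary file on a machine without ct_sync.
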
Transitions can be initiated by writing the appropriate value into this
file. For example, to set the node to master state, issue the following
command:

    # echo 1 > /proc/sys/net/ipv4/netfilter/ct_sync/state

You can verify that this was successful by checking the contents of the
same file:

    # cat /proc/sys/net/ipv4/netfilter/ct_sync/state

The scripts called by keepalived use this interface to keep the ct_sync
state in step with keepalived.


7. Unimplemented features
-------------------------

Currently, ct_sync is incompatible with the protocol and NAT helper
modules of iptables. That is, any module which uses expectations won't
work, and may cause serious problems with ct_sync. Implementing
replication of expectations is high on our TODO list, so please be
patient... :)


8. Further information
----------------------

For more information on ct_sync, take a look at these documents:

Harald Welte: "How to replicate the fire?", Proceedings of the Ottawa
Linux Symposium 2002, pp. 565-572
    http://www.linux.org.uk/~ajh/ols2002_proceedings.pdf.gz

Krisztian's presentation on the proof-of-concept implementation
(outdated, and now inaccurate in a couple of places)
    http://people.netfilter.org/~kadlec/workshop-2003-budapest/nfws_ha.sxi

Harald's more up-to-date presentations at
    http://svn.gnumonks.org/trunk/presentation/netfilter-failover-lt2004/
    http://svn.gnumonks.org/trunk/presentation/netfilter-failover-ols2004/

If you have questions, you can ask them on the netfilter-failover
mailing list: netfilter-failover (at) lists (dot) netfilter (dot) org.
You can subscribe to the mailing list at:
    http://lists.netfilter.org/mailman/listinfo/netfilter-failover