Friday, May 05, 2006

Conning the Mark: Multiwan connections using IPTables, MARK, CONNMARK and iproute2

Over the past few months, I have been configuring a replacement multi-WAN NAT router/firewall for work. My colleagues and I decided to use Voyage Linux (a derivative of Debian Linux for embedded devices) on a Soekris net4801 box. See also the pictures on my coworker's (cyboc) blog.

Unlike other organizations that use their multi-WAN connections for automatic load balancing and traffic shaping, we simply use our extra WAN connection for redundancy. The internal server has two distinct external IP addresses, one per connection, and the router DNATs both of them to it. The idea is that users can reach the server through either IP address, though they might normally prefer one over the other, and can switch to the other connection if the one they are using performs poorly. No automatic load balancing is required or desired. In simple terms, our network looks something like this:

[Network diagram]


As part of this configuration, we wanted to have network traffic that came in on one interface properly exit again through that same interface. I was able to configure most of the firewall and NAT parts of the router with relative ease using iptables but was stumped when it came to the routing table and how to route packets in and out of their own respective interfaces.

The Problem Defined

A traditional routing table generally has only one effective default gateway at a time; if several are defined, they are used in priority order. Thus, there is no guarantee that the reply to a packet that arrived on one line will be routed back out through that same interface. At best, the reply will leave via the default gateway or via some other static route in the routing table. Furthermore, traditional routing tables only allow destination-based routing. That is, we can create entries that choose a route based on a packet's destination address, but not on its source address.

Enter iproute2

After some research, I discovered that iproute2 solves a lot of my problems. Amongst many other things, iproute2 allows for source-based routing and for routing based on packet markers. More on packet markers later.

Be warned though, iproute2 is a rather complex beast! The user manual is far from friendly, and it took me a few tries to get it to do what I wanted.

Attempt 1: Source based routing

My first attempt involved source-based routing. Since users will be using WAN connection #1 for this server most of the time, I routed all traffic originating from the server's IP address out WAN connection #1. This works, but it requires a manual change to the routing policy when WAN connection #1 goes down: an administrator has to switch the source-based rule so that traffic originating from the server's IP address goes out WAN connection #2 instead.
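As a rough sketch of what this attempt amounts to, the script below prints the policy rule for normal operation and the two commands an administrator would have to issue by hand on failover. The address 172.16.1.131, the priority 1000, the helper names, and the dry-run switch APPLY are illustrative assumptions; the tables wan_one and wan_two are assumed to be defined as described later in this article.

```shell
# Dry-run helper: prints the command unless APPLY=1 is set
# (in which case it is executed, which requires root).
run() {
    if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

SERVER_IP=172.16.1.131   # assumption: the server's LAN address (illustrative)

# Send everything sourced from the server through one WAN's routing table.
pin_server_to() {
    run ip rule add from "$SERVER_IP" table "$1" prio 1000
}

# Manual failover: drop the rule for the dead WAN, add one for the other.
fail_over() {
    run ip rule del from "$SERVER_IP" table "$1" prio 1000
    pin_server_to "$2"
}

pin_server_to wan_one        # normal operation
fail_over wan_one wan_two    # what someone must do by hand when WAN #1 dies
```

The need to run the second step manually is exactly the weakness of this approach.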

Wouldn't it be simpler if the router could somehow just remember which connection a packet came in on and route subsequent replies through that same interface?

Attempt 2: Packet marking based routing

My second attempt centered on tracking connections and routing accordingly. To do this, I discovered iptables' packet marking and connection marking.

In short, iptables has two targets that one can use for marking: CONNMARK and MARK. CONNMARK marks a connection. The mark is stored with the connection tracking entry, so once set it stays associated with every packet in the same "conversation".

The other marker is the packet mark, set by iptables' MARK target. (Couldn't they have come up with better names?!) The MARK target only marks individual packets. Packet marks are not persistent like connection marks, i.e. they only retain their value for the duration of that one packet's lifespan.

Now when I first went diving into this, I erroneously thought that one could simply set the CONNMARK when a packet came in on one WAN line, and have the routing tables detect that CONNMARK and route accordingly. As I soon discovered though, iproute2 only recognizes packet MARKs, not CONNMARKs. Thus, to do what I wanted, the CONNMARK value had to be copied to the MARK value each time a packet was about to be routed.

Solution Part 1: Configuring the mangle table in iptables

Given the above restrictions with CONNMARK and MARK, I wrote out in plain English the steps I wanted my router to take when marking and routing packets.

  • If this is the first packet in a connection (i.e. it has neither a CONNMARK nor a MARK), then set the MARK of the packet to 1 or 2 depending on which line it came in on. Save this MARK to the CONNMARK value and accept the packet for routing.
  • If, however, a CONNMARK does exist, then restore that CONNMARK to the MARK value. Check to see what the MARK value is. If it is 1 or 2, then ACCEPT the packet for routing.

Once the packet is accepted for routing, route it according to these rules:

  • If the packet has a MARK value of 1 then use the routing table for WAN connection #1.
  • Else if the packet has a MARK value of 2, then use the routing table for WAN connection #2.

Now that you understand the English algorithm, I will translate it into pseudocode in the same order in which it must appear in iptables' mangle table:

  • Restore the packet's CONNMARK to the MARK. (If one doesn't exist, then no mark is set.)
  • If packet MARK is 1, then it means that there is already a connection mark and the original packet came in on WAN #1, so ACCEPT.
  • Else, we need to mark the packet. If the packet is incoming on eth1 then set MARK to 1
  • If packet MARK is 2, then it means there is already a connection mark and the original packet came in on WAN #2, so ACCEPT.
  • Else, we need to mark the packet. If the packet is incoming on eth2 then set MARK to 2
  • Save MARK to CONNMARK. This rule is reached only if rules 2 and 4 did not match. A new mark will have been written by rule 3 or rule 5 (if the packet arrived on eth1 or eth2), and it is saved here to the connection mark.

Finally, the actual iptables commands:
iptables -A PREROUTING -t mangle -j CONNMARK --restore-mark
iptables -A PREROUTING -t mangle --match mark --mark 1 -j ACCEPT
iptables -A PREROUTING -t mangle -i eth1 -j MARK --set-mark 1
iptables -A PREROUTING -t mangle --match mark --mark 2 -j ACCEPT
iptables -A PREROUTING -t mangle -i eth2 -j MARK --set-mark 2
iptables -A PREROUTING -t mangle -j CONNMARK --save-mark
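
To see why the order of these six rules matters, their traversal can be traced with a toy shell model. This is only a simulation for reasoning about the rules, not something the router needs; the function name trace_prerouting and the convention that 0 means "no mark" are my own assumptions for illustration.

```shell
# trace_prerouting CONNMARK IFACE
# Prints "MARK CONNMARK" as they stand after the mangle PREROUTING rules.
# 0 stands for "no mark".
trace_prerouting() {
    connmark=$1
    iface=$2

    mark=$connmark                            # rule 1: CONNMARK --restore-mark
    if [ "$mark" = 1 ]; then                  # rule 2: mark 1 -> ACCEPT, stop
        echo "$mark $connmark"; return
    fi
    if [ "$iface" = eth1 ]; then mark=1; fi   # rule 3: MARK --set-mark 1, continue
    if [ "$mark" = 2 ]; then                  # rule 4: mark 2 -> ACCEPT, stop
        echo "$mark $connmark"; return
    fi
    if [ "$iface" = eth2 ]; then mark=2; fi   # rule 5: MARK --set-mark 2, continue
    connmark=$mark                            # rule 6: CONNMARK --save-mark
    echo "$mark $connmark"
}

trace_prerouting 0 eth1   # new connection arriving on WAN #1      -> 1 1
trace_prerouting 1 eth0   # the server's reply, coming from the LAN -> 1 1
trace_prerouting 0 eth0   # new connection started inside the LAN   -> 0 0
trace_prerouting 0 eth2   # first reply to it, arriving on WAN #2   -> 2 2
```

The second case is the important one: the reply enters on the LAN interface, yet it leaves the mangle table carrying the mark of the WAN line the connection originally arrived on.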

Solution Part 2: Configuring iproute2 to route according to the packet markers

Now that the connections and packets are marked as they come in, we need to instruct the router to route according to the marker on each packet. This is done using the routing policy database available in iproute2. In essence, this database holds a list of rules which, when matched, ask the router to consult specific routing tables rather than the main routing table. In this way, we can define rules that say: when the packet has a marker value of "1", use the wan_one routing table; similarly, if the packet has a marker value of "2", use the wan_two routing table.

Several things need to be done in order to put all this together:

1. Modify the file /etc/iproute2/rt_tables.
2. Add two custom tables at the bottom of the file. For simplicity, give the tables the same numbers as your packet marker values.
myrouter:/etc/iproute2# more rt_tables
#
# reserved values
#
255 local
254 main
253 default
0 unspec
#
# local
#
1 wan_one
2 wan_two

3. Define each routing table (wan_one and wan_two) by adding the routes specific to that connection. Note, however, that you must also add routes for the other traffic that will be looked up there (notably packets destined for the local LAN). This is because each special table contains a default route, so a lookup in it always finds a match and the main routing table is never consulted. This is what I have in my two routing tables:
myrouter:/etc/iproute2# ip route show table wan_one
172.16.1.0/24 dev eth0 scope link
default via 149.99.251.145 dev eth1

myrouter:/etc/iproute2# ip route show table wan_two
172.16.1.0/24 dev eth0 scope link
default via 66.119.160.1 dev eth2

These are the commands I entered to get the routing tables above:
ip route add 172.16.1.0/24 dev eth0 table wan_one
ip route add default via 149.99.251.145 dev eth1 table wan_one

ip route add 172.16.1.0/24 dev eth0 table wan_two
ip route add default via 66.119.160.1 dev eth2 table wan_two

4. Next, you must define the iproute2 rules that will tell iproute2 to use the special routing tables. Do this by issuing the following commands:
ip rule add fwmark 1 table wan_one prio 1024
ip rule add fwmark 2 table wan_two prio 1025

Note: the prio (priority) numbers are simply there to ensure that they get placed in the right order and relatively near the top of the rules. You may need to adjust this number if you have other rules in your policy database.

You can verify that the rules were entered correctly by issuing an ip rule show command.
myrouter:/usr/local/sbin# ip rule show
0: from all lookup local
1024: from all fwmark 0x1 lookup wan_one
1025: from all fwmark 0x2 lookup wan_two
32766: from all lookup main
32767: from all lookup default

5. Add a default gateway to the main routing table to define the path that unmarked packets must take.
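
Steps 1, 2 and 5 have no commands listed above, so here is a minimal sketch of how they might be scripted. The function names are hypothetical, and the choice of WAN #1 as the default path for unmarked traffic is an assumption based on it being the connection we normally prefer.

```shell
# add_rt_table FILE ID NAME
# Appends "ID NAME" to FILE unless a table called NAME is already declared,
# so the script can be re-run safely.
add_rt_table() {
    file=$1
    id=$2
    name=$3
    if ! grep -Eq "^[0-9]+[[:space:]]+${name}([[:space:]]|\$)" "$file" 2>/dev/null; then
        printf '%s %s\n' "$id" "$name" >> "$file"
    fi
}

# set_main_default GATEWAY DEVICE
# Step 5: default route in the main table, used by unmarked packets.
# "replace" acts like "add" but does not fail if a default route already exists.
set_main_default() {
    ip route replace default via "$1" dev "$2"
}

# Intended use, as root:
#   add_rt_table /etc/iproute2/rt_tables 1 wan_one
#   add_rt_table /etc/iproute2/rt_tables 2 wan_two
#   set_main_default 149.99.251.145 eth1
```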

Conclusion

You're done! Packets coming in on WAN connection #1 should now be marked with 1 and then be routed according to table wan_one. Similarly for WAN connection #2 and wan_two.

A few interesting notes in addition:
  • I have not described here any of the firewalling or NAT processes. Obviously you need to have these set up and tested correctly before doing the CONNMARKing and MARKing.
  • Packets originating from inside the LAN will not receive a connection mark at first, and thus will fall through to the main routing table. They will route out the default gateway specified there. However, the first reply packet and every subsequent related packet should receive a connection mark, and follow one of the special routing tables.
  • Because of this peculiar behaviour for packets originating from inside the LAN, and because of the nature of network address translation, it is necessary to explicitly state the ISP's gateway in each of the default routes in the special tables. In other words, it is not enough to simply put "ip route add default dev eth2 table wan_two". Instead, this should be issued: "ip route add default via 66.119.160.1 dev eth2 table wan_two".
  • Debugging the above solution can be a bit of a pain. I found that the iptables (mangling) part of the whole exercise can be checked relatively easily through logging and the "iptables -L --line-numbers -n -v -t mangle" command, but I found no equivalent functionality in iproute2. This, more than anything else, caused grief when things weren't working.
  • I have posted an addendum to this article which includes a few important details left out in this article.
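
For the debugging point above, here are a few commands that may help, offered as a sketch rather than a recipe. They assume root access; the "conntrack" command assumes the conntrack-tools package is installed, the "mark" argument to "ip route get" assumes a reasonably recent iproute2, and 203.0.113.10 is just a placeholder destination address.

```shell
# Show the mangle PREROUTING rules with per-rule packet and byte counters.
iptables -t mangle -L PREROUTING -n -v --line-numbers

# Temporarily log packets right after the restore-mark rule by inserting a
# LOG rule as rule 2; the kernel log line shows MARK=0x1 or MARK=0x2 when
# a mark is set on the packet.
iptables -t mangle -I PREROUTING 2 -j LOG --log-prefix "mangle-pre: "
# Remove the LOG rule again when done.
iptables -t mangle -D PREROUTING 2

# List the tracked connections that carry connection mark 1.
conntrack -L --mark 1

# Ask the kernel which route a packet carrying fwmark 2 would take.
ip route get 203.0.113.10 mark 2
```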

3 comments:

Jonathan & Karen Ng said...

GGruendgens,

Thank you for your comment and the link. You bring up some interesting points. In fact, we started our original research with the same article you mentioned. However, I believe the difference is highlighted in this quote from the LARTC article:


"It will work for all processes running on the router itself, and for the local network, if it is masqueraded. If it is not, then you either have IP space from both providers [...] you will want to add rules selecting which provider to route out from based on the IP address of the machine in the local network."


In our setup, we have several servers behind the router/firewall on the local LAN. Each of these servers has corresponding "external" IP addresses provided by the two ISPs. The router has these external IP addresses bound to its own interfaces, each on the interface for the ISP that provided it, and DNATs/SNATs them to and from our servers so that, to the outside world, our servers appear to have external IP addresses with the desired service ports open. This is useful because it lets us host two identical services (say, Web servers) on two different external IPs if we want to.

Here's an illustration:

ISP 1:
123.234.1.10 --maps-to--> router's eth1
123.234.1.11 --router-DNATs-to--> server 1, port 1494
123.234.1.12 --router-DNATs-to--> server 2, port 25,80
123.234.1.13 --router-DNATs-to--> server 3, port 80

ISP 2:
216.111.1.10 --maps-to--> router's eth2
216.111.1.11 --router-DNATs-to--> server 1, port 1494
216.111.1.12 --router-DNATs-to--> server 2, port 25

Router has local LAN address 172.16.1.1
Server 1 has local LAN address 172.16.1.131
Server 2 has local LAN address 172.16.1.130
Server 3 has local LAN address 172.16.1.140

Now because most of these servers have two external IPs (i.e. users can access the service via either provider) and because they map to only ONE internal IP, there is no way for the router to tell which interface to send SNATed packets back out. The only thing it can do is consult the default routing table which would have a default gateway, or at best (as the article suggests) consult a rule in the routing policy database that says essentially "if the packet originates FROM 172.16.1.x, send it out according to routing table {aaaa}", where aaaa is the routing table for one of the providers.

This latter part actually works (sometimes) but only insomuch as receiving packets from one provider and sending a reply packet out via another provider. However, this is less than desirable, as the whole point of this is to handle the situation when one of the providers might be down. Also, many ISPs will not let you send a packet with a source address other than one of their own.

Thus, in our situation, it is not possible to do what we want to do without connection marking.

Hope that helps to clarify things.

BTW, if you haven't already, you may also wish to read the addendum article I also posted which clarifies a few additional details.

Jonathan.

Anonymous said...

Great article, even after all those years.

I also saw your second post.

It works very well, thanks.

Harry said...

Very helpful article, much appreciated!

Come to think of it, we don't have to set the MARK on the ORIGINAL, ingress traffic for the "per-interface default routes" feature, since no routing table is consulted at all when the traffic is targeting the local host. As long as we remember the CONNMARK for the connection and restore it properly to the REPLY packets' MARK (as in the OUTPUT chain of the mangle table), the interface-specific routing table will be used for the REPLY traffic instead of the main one.

Put another way, if we set MARK = CONNMARK on the ingress, ORIGINAL traffic, then, as you have pointed out, we have to populate each interface-specific routing table with routes to the downstream private networks that may be hidden behind DNAT rules. Alternatively, we can fall back on the main routing table, which already knows the routes to the downstream interfaces, for the ORIGINAL traffic, and use the interface-specific routing tables only for the REPLY traffic. This spares us the trouble of populating the interface-specific routing tables with any routes other than the default one.