Non-Deterministic Docker Networking and Source Based IP Routing


In the open source Docker engine, a new networking model introduced in Docker 1.9 enabled the creation of separate "networks" for containers to attach to. This, however, can lead to a nasty little problem where a port that is supposed to be exposed on the host isn't accessible from the outside. There are a few bug reports related to this issue.


This problem happens because Docker wires all of these containers up to each other and to the various "networks" using port forwarding/NAT via iptables. Let's take a popular example application that exhibits the problem, the Docker 3rd Birthday Application, and show what the problem is and why it happens.

We'll clone the git repo first and then check out the latest commit as of 2016-05-25:

# git clone
# cd docker-birthday-3/
# git checkout 'master@{2016-05-25}'
HEAD is now at 4f2f1c9... Update Dockerfile

Next we'll bring up the application:

# cd example-voting-app/
# docker-compose up -d
Creating network "examplevotingapp_front-tier" with the default driver
Creating network "examplevotingapp_back-tier" with the default driver
Creating db
Creating redis
Creating examplevotingapp_voting-app_1
Creating examplevotingapp_worker_1
Creating examplevotingapp_result-app_1

So this created two networks and brought up several containers to host our application. Let's poke around to see what's there:

# docker network ls
NETWORK ID          NAME                          DRIVER
23c96b2e1fe7        bridge                        bridge
cd8ecb4c0556        examplevotingapp_front-tier   bridge
5760e64b9176        examplevotingapp_back-tier    bridge
bce0f814fab1        none                          null
1b7e62bcc37d        host                          host
# docker ps -a --format "table {{.Names}}\t{{.Image}}\t{{.Ports}}"
NAMES                           IMAGE                         PORTS
examplevotingapp_result-app_1   examplevotingapp_result-app>80/tcp
examplevotingapp_voting-app_1   examplevotingapp_voting-app>80/tcp
redis                           redis:alpine        >6379/tcp
db                              postgres:9.4                  5432/tcp
examplevotingapp_worker_1       manomarks/worker

Looks like we should be able to connect to the examplevotingapp_voting-app_1 application on host port 5000, which is bound to all interfaces. Does it work?:

# ip -4 -o a
1: lo    inet scope host lo\       valid_lft forever preferred_lft forever
2: eth0    inet brd scope global dynamic eth0\       valid_lft 2921sec preferred_lft 2921sec
3: docker0    inet scope global docker0\       valid_lft forever preferred_lft forever
106: br-cd8ecb4c0556    inet scope global br-cd8ecb4c0556\       valid_lft forever preferred_lft forever
107: br-5760e64b9176    inet scope global br-5760e64b9176\       valid_lft forever preferred_lft forever
# curl --connect-timeout 5 &>/dev/null && echo success || echo failure
# curl --connect-timeout 5 &>/dev/null && echo success || echo failure

Did it work? Yes and no.

That's right. There is something complicated going on with the networking here. I can connect from localhost, but I can't connect to the public IP of the host. Docker wires things up in iptables so that traffic can go into and out of containers following a strict set of rules; see the iptables output if you are interested. This works fine if you only have one network interface per container, but it can break down when you have multiple interfaces attached to a container.
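If you want to poke at that wiring yourself, the usual places to look are the nat and filter tables on the Docker host. These are standard iptables commands (run as root); the exact rules will vary from host to host, so treat this as a sketch of where to look rather than expected output:

```shell
# Run as root on the docker host; output varies by setup.
iptables -t nat -S DOCKER       # DNAT rules that publish container ports on the host
iptables -t nat -S POSTROUTING  # MASQUERADE rules for traffic leaving the bridges
iptables -S DOCKER              # filter rules gating traffic into/out of containers
```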

Let's jump into the examplevotingapp_voting-app_1 container and check out some of the networking:

# docker exec -it examplevotingapp_voting-app_1 /bin/sh
/app # ip -4 -o a
1: lo    inet scope host lo\       valid_lft forever preferred_lft forever
112: eth1    inet scope global eth1\       valid_lft forever preferred_lft forever
114: eth0    inet scope global eth0\       valid_lft forever preferred_lft forever
/app #
/app # ip route show
default via dev eth0 dev eth1  src dev eth0  src

So there is a clue: we have two interfaces, but our default route goes out of eth0. It just so happens that our iptables rules (see the linked iptables output from above) performed DNAT for tcp dpt:5000 to the container's address on the eth1 side. So traffic from the outside comes into this container on the eth1 interface but leaves it on the eth0 interface, which doesn't play nice with the iptables rules Docker has set up.

We can prove that by asking the kernel what route a packet will take when it leaves the machine:

/app # ip route get from from via dev eth0

Which basically means it will leave from eth0 even though it came in on eth1. The Docker documentation was updated in this git commit to try to explain the behavior when multiple interfaces are attached to a container.
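ip route get is handy for this kind of debugging in general: it asks the kernel which route it would actually pick for a given packet. The addresses and device name below are made-up examples, not values from this setup; substitute ones that exist on your system:

```shell
# Ask the kernel which route/interface it would use for an outbound packet.
ip route get 8.8.8.8

# Ask which route a reply would take for traffic that arrived on a
# particular interface with a particular source address (placeholders).
ip route get 8.8.8.8 from 172.18.0.2 iif eth1
```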

Testing Out the Theory Using Source Based IP Routing

To test out this theory we can use source based IP routing (some reading on that here). Basically, the idea is to create policy rules that make IP traffic leave on the same interface it came in on.
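In outline, the recipe is just two commands. The subnet, gateway, and device below are placeholders I've made up for illustration, not values from the app; adjust them to wherever the inbound traffic actually arrives:

```shell
# Placeholder values, not from this setup: use the subnet/gateway/device
# that the inbound traffic actually arrives on. Requires root (or a
# privileged container).
SUBNET="172.18.0.0/16"; GATEWAY="172.18.0.1"; DEV="eth1"

# Packets whose source address is in $SUBNET consult routing table 200...
ip rule add from "$SUBNET" table 200

# ...and table 200 sends everything back out the interface it arrived on.
ip route add default via "$GATEWAY" dev "$DEV" table 200
```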

To perform the test we'll need our container to be privileged so that we can add routes. Modify the docker-compose.yml to add privileged: true to the voting-app service:

  voting-app:
    build: ./voting-app/.
    volumes:
      - ./voting-app:/app
    ports:
      - "5000:80"
    networks:
      - front-tier
      - back-tier
    privileged: true

Take down and bring up the application:

# docker-compose down
# docker-compose up -d

Exec into the container and create a new policy rule for packets originating from the network eth1 is attached to. Tell packets matching this rule to look up routing table 200:

# docker exec -it examplevotingapp_voting-app_1 /bin/sh
/app # ip rule add from table 200

Now add a default route for that traffic to routing table 200. Then show the routing table, as well as the rules:

/app # ip route add default via dev eth1 table 200
/app # ip route show table 200
default via dev eth1
/app # ip rule show
0:      from all lookup local
32765:  from lookup 200
32766:  from all lookup main
32767:  from all lookup default
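The rules are evaluated in priority order (0 first), so our new rule at priority 32765 wins over the main table for any packet it matches. If you later want to undo the experiment inside the container, something like this should clean it up (a sketch, assuming table 200 as above; requires root):

```shell
# Remove the policy rule that points at table 200, then empty the table.
ip rule del table 200
ip route flush table 200
```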

Now ask the kernel where a packet originating from our address will get sent:

/app # ip route get from from via dev eth1

And finally, go back to the host and check to see if everything works now:

# curl --connect-timeout 5 &>/dev/null && echo success || echo failure
# curl --connect-timeout 5 &>/dev/null && echo success || echo failure


I don't know if source based routing can be incorporated into Docker to fix this problem, or if there is a better solution. I guess we'll have to wait and find out.



NOTE: I used the following versions of software for this blog post:

# rpm -q docker docker-compose kernel-core