This little article is about the network namespace utilities in Linux that lie at the heart of container networking, as seen in container orchestration tools and service meshes.
- veth pair
- One Container
- A Quick Note On The Routing Table, While He’s Away
- Bridging
- Routing
- Listing the ARP Table
- Conclusion
As everyone knows, containers are only possible because of the addition of namespaces and cgroups to the Linux kernel. Because of this, we’ve begun to see a lot of projects that take advantage of these additions to allow for some very cool technologies.
One of these is virtual networking. Because the net network namespace allows for processes (i.e., containers) to have their own network stacks, we can create virtual networks in software that are analogous to their hardware counterparts.
For instance, we can create virtual bridges (more accurately known as multi-port switches) and routers that connect subnetworks of containers. These subnetworks can and usually do exist within their own network namespace, isolated from other network namespaces (such as the root network namespace) and any resources contained therein.
So, what comprises a network stack? It includes:
- network interfaces (devices)
- routing tables
- the ARP (neighbor) table
- netfilter (firewall) rules
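Each of these can be inspected from the shell. A quick sketch, run in the root namespace (iptables may need to be installed separately on minimal systems):
$ ip link            # network interfaces (devices)
$ ip route           # the routing table
$ ip neigh           # the ARP (neighbor) table
$ sudo iptables -L   # netfilter (firewall) rules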
This article will be a brief introduction into the bits and bobs needed to create a fully functioning virtual network that will be able to not only access the other containers in its subnetwork but the other network interfaces in the root network namespace and the outside Internet.
We’ll become marginally acquainted with the following tools and utilities:
- ip (and its link, address, route, and netns subcommands) from the iproute2 package
- nsenter from the util-linux package
- iptables
Let’s get started!
weeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
I highly suggest doing these steps in a virtual machine or some other environment that won’t affect the network stack on your host machine. But do what you want, you always do.
veth pair
veth pairs are virtual devices that reside in the host’s network namespace. We can think of them as two ends of a tunnel, where traffic sent from one end will automatically appear on the other.
Here is the command to create a veth pair, with one end named veth0 and the other ceth0 (“c” will indicate the end that will be in the container):
$ sudo ip link add veth0 type veth peer name ceth0
Note that they have been created, but they are in the DOWN state and do not have an assigned IP address.
$ ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:8d:c0:4d brd ff:ff:ff:ff:ff:ff
altname enp0s3
inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic eth0
valid_lft 84821sec preferred_lft 84821sec
inet6 fe80::a00:27ff:fe8d:c04d/64 scope link
valid_lft forever preferred_lft forever
3: ceth0@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 72:45:4d:cb:0b:b8 brd ff:ff:ff:ff:ff:ff
4: veth0@ceth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 1a:b9:5e:69:61:53 brd ff:ff:ff:ff:ff:ff
Let’s now assign an IPv4 address to the veth0 interface and bring it up:
$ sudo ip addr add 172.18.0.10/12 dev veth0
$ sudo ip link set veth0 up
$ ip a show veth0
4: veth0@ceth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether ce:72:8c:c9:87:3c brd ff:ff:ff:ff:ff:ff
inet 172.18.0.10/12 scope global veth0
valid_lft forever preferred_lft forever
inet6 fe80::cc72:8cff:fec9:873c/64 scope link
valid_lft forever preferred_lft forever
Interestingly, once the interface is assigned an address and is brought up, the kernel creates a routing rule based on the CIDR address that was assigned to the virtual Ethernet device (recall that there was no such rule before):
Here was the state of the routing table before the veth pair was created and at least one end brought UP:
$ ip r
default via 10.0.2.2 dev eth0
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
Now, we see that the kernel has helpfully created a new rule for us:
$ ip r
default via 10.0.2.2 dev eth0
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
172.16.0.0/12 dev veth0 proto kernel scope link src 172.18.0.10 linkdown
This new derived routing rule was able to be created by the kernel in the root namespace because the veth0 device was assigned an address in CIDR notation.
Note, however, that if you were to assign the address without the netmask information (the /12 suffix, in this case), the kernel would be unable to create a routing rule, as it would not have been given enough information to do so.
In addition, if you were to delete the device after creating it, you’d see that the kernel would also then remove the routing rule (if it had been automatically created by the kernel).
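Both behaviors are easy to verify with a throwaway pair, leaving the veth0/ceth0 pair above untouched. A sketch (the tmp0/tmp1 names and the 172.20.0.1 address are made up):
$ sudo ip link add tmp0 type veth peer name tmp1
$ sudo ip addr add 172.20.0.1 dev tmp0    # no netmask suffix; iproute2 treats this as a /32
$ sudo ip link set tmp0 up
$ ip r                                    # no connected-network rule appears for tmp0
$ sudo ip link delete tmp0                # removes tmp1, and any derived rule would go with it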
The routing table will report linkdown in the new rule because the other end of the veth pair (ceth0) is still down.
We’ll also assign an IPv4 address to the other end of the veth pair:
$ sudo ip link set ceth0 up
$ sudo ip addr add 172.18.0.20/12 dev ceth0
Let’s check the state of things now:
$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:8d:c0:4d brd ff:ff:ff:ff:ff:ff
altname enp0s3
inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic eth0
valid_lft 85380sec preferred_lft 85380sec
inet6 fe80::a00:27ff:fe8d:c04d/64 scope link
valid_lft forever preferred_lft forever
3: ceth0@veth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 52:56:bc:1c:58:c0 brd ff:ff:ff:ff:ff:ff
inet 172.18.0.20/12 scope global ceth0
valid_lft forever preferred_lft forever
inet6 fe80::5056:bcff:fe1c:58c0/64 scope link
valid_lft forever preferred_lft forever
4: veth0@ceth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether ce:72:8c:c9:87:3c brd ff:ff:ff:ff:ff:ff
inet 172.18.0.10/12 scope global veth0
valid_lft forever preferred_lft forever
inet6 fe80::cc72:8cff:fec9:873c/64 scope link
valid_lft forever preferred_lft forever
And, the routing table:
$ ip r
default via 10.0.2.2 dev eth0
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
172.16.0.0/12 dev veth0 proto kernel scope link src 172.18.0.10
172.16.0.0/12 dev ceth0 proto kernel scope link src 172.18.0.20
Notice that the last rule was just added by the kernel when the other virtual Ethernet device was assigned an IP address in CIDR notation.
Now, we can ping each interface, but that’s not very useful:
$ ping -c2 172.18.0.10
PING 172.18.0.10 (172.18.0.10) 56(84) bytes of data.
64 bytes from 172.18.0.10: icmp_seq=1 ttl=64 time=0.049 ms
64 bytes from 172.18.0.10: icmp_seq=2 ttl=64 time=0.082 ms
--- 172.18.0.10 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1014ms
rtt min/avg/max/mdev = 0.049/0.065/0.082/0.016 ms
$ ping -c2 172.18.0.20
PING 172.18.0.20 (172.18.0.20) 56(84) bytes of data.
64 bytes from 172.18.0.20: icmp_seq=1 ttl=64 time=0.042 ms
64 bytes from 172.18.0.20: icmp_seq=2 ttl=64 time=0.123 ms
--- 172.18.0.20 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1029ms
rtt min/avg/max/mdev = 0.042/0.082/0.123/0.040 ms
These endpoints were created in the root network namespace, so this is not very interesting. Let’s now move one end into another (new) network namespace to enable communication between the namespaces.
Here are all the commands we ran in this section in a handy little script:
1_linux_networking.sh
#!/bin/bash
set -eo pipefail
LANG=C
umask 0022
if [ -z "$1" ]
then
printf "Usage: %s add|delete\n" "$0"
exit 1
fi
if [ "$1" = "delete" ]
then
# Deleting one end of the pair will remove the other automatically.
sudo ip link delete veth0
elif [ "$1" = "add" ]
then
sudo ip link add veth0 type veth peer name ceth0
sudo ip link set veth0 up
sudo ip link set ceth0 up
sudo ip addr add 172.18.0.10/12 dev veth0
sudo ip addr add 172.18.0.20/12 dev ceth0
printf "ping %s\n" 172.18.0.10
printf "ping %s\n" 172.18.0.20
else
printf "Unrecognized parameter \`%s\`.\n" "$1"
exit 1
fi
One Container
Let’s now turn our furrowed brow towards how we can begin building a container.
For this next exercise, you will need tools that come with the iproute2 package. Most distributions will have them installed by default unless you’re running a bare-bones installation.
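If they’re missing, on a Debian-family system something like the following should pull in both iproute2 and util-linux (which provides the nsenter utility used below); package names may differ on other distributions:
$ sudo apt-get install iproute2 util-linux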
We’ll begin again by creating our new friends, the veth pair, assigning one end an IP address, and bringing it up. Nothing new here.
$ sudo ip link add veth0 type veth peer name ceth0
$ sudo ip addr add 172.18.0.10/12 dev veth0
$ sudo ip link set veth0 up
However, this time we’re going to create a new network namespace. This is an essential part of creating the container and is crucial to the success of our endeavor.
$ sudo ip netns add net0
So, why is this necessary for this exercise?
Well, in order to begin isolating one process from another, there needs to be separation. Namespaces facilitate that, with the net namespace being the focus of this article, of course.
Once the new network namespace is created, you can use the nsenter utility from the util-linux package to run programs in different namespaces.
For example, here we’re using the --net option to enter the network namespace and run the bash program:
$ sudo nsenter --net=/run/netns/net0 bash
If no command is given, it will execute the default shell (i.e., the value of the SHELL environment variable).
Of course, all eight namespaces (as of kernel 5.6) can be specified. Just replace net above with the namespace of your choice. See the nsenter manpage for details.
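nsenter can also target the namespaces of an existing process rather than a named file. A sketch (the target PID 1234 is hypothetical):
$ sudo nsenter --target 1234 --net --uts --pid --mount bash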
Note that the bash process doesn’t run in its own pid namespace, as one wasn’t created (or, if it was, it wasn’t listed as an option to the nsenter command above). We can see this from inside the container after running the above command:
# ps -1
  PID TTY      STAT   TIME COMMAND
    1 ?        Ss     0:11 /lib/systemd/systemd --system --deserialize=22
What does this mean? Well, if the process had its own pid namespace, then systemd would not be PID 1, it would be bash, which was listed as the program in the command above.
Also, notice that we’re not running in a rootless container, as the UIDs and GIDs don’t map to a non-privileged user on the host:
# cat /proc/$$/[u,g]id_map
         0          0 4294967295
         0          0 4294967295
If this had been created as a rootless container, then the 0 in the second column would instead be the UID of a non-privileged user on the host, like:
# cat /proc/$$/[u,g]id_map
         0       1000          1
         0       1000          1
Anyway…
While we’re still in the new net0 network namespace, let’s see what network devices we have access to:
# ip link list
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
We can see that it does indeed have its own network stack, as evidenced by the fact that there is no Ethernet device listed (the eth0 device of the host, that is).
Additionally, there are no routing rules:
# ip route list
Error: ipv4: FIB table does not exist.
Dump terminated
Issue the exit command to return to the host.
If you don’t want to enter the namespace, you can use the exec subcommand of the ip-netns command:
$ sudo ip netns exec net0 ip l
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
Next, we’ll move one of the devices into the net0 namespace, leaving the other in the host (root) namespace.
$ sudo ip link set ceth0 netns net0
If we list the network devices again, we’ll see that the ceth0 device is no longer listed in the root namespace because it has been moved into the net0 namespace:
$ ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
link/ether 08:00:27:8d:c0:4d brd ff:ff:ff:ff:ff:ff
altname enp0s3
4: veth0@if3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 16:c9:69:a9:8b:90 brd ff:ff:ff:ff:ff:ff link-netns net0
Let’s run the same command in the net0 namespace to view veth0’s peer:
$ sudo nsenter --net=/run/netns/net0 ip l
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: ceth0@if4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 82:90:89:29:74:4a brd ff:ff:ff:ff:ff:ff link-netnsid 0
Now that each end of the veth pair is in a different namespace, we are on our way to creating a container.
Weeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
In order to send traffic between these devices, we must bring them both up and assign them IPv4 or IPv6 addresses:
$ sudo ip link set veth0 up
$ sudo ip addr add 172.18.0.11/16 dev veth0
$ ip address show veth0
4: veth0@if3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN group default qlen 1000
link/ether 16:c9:69:a9:8b:90 brd ff:ff:ff:ff:ff:ff link-netns net0
inet 172.18.0.11/16 scope global veth0
valid_lft forever preferred_lft forever
Do the same in the net0 namespace:
$ sudo nsenter --net=/run/netns/net0 bash
# ip link set ceth0 up
# ip addr add 172.18.0.10/16 dev ceth0
# ip a show ceth0
3: ceth0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 82:90:89:29:74:4a brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.18.0.10/16 scope global ceth0
valid_lft forever preferred_lft forever
inet6 fe80::8090:89ff:fe29:744a/64 scope link
valid_lft forever preferred_lft forever
For good practice, let’s bring up the loopback device, as well:
# ip link set lo up
It’s go time (still in the net0 namespace):
# ping -c2 172.18.0.11
PING 172.18.0.11 (172.18.0.11) 56(84) bytes of data.
64 bytes from 172.18.0.11: icmp_seq=1 ttl=64 time=0.027 ms
64 bytes from 172.18.0.11: icmp_seq=2 ttl=64 time=0.140 ms
--- 172.18.0.11 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1012ms
rtt min/avg/max/mdev = 0.027/0.083/0.140/0.056 ms
Unfortunately, we can’t (yet) ping the physical network device on the host or the outside Internet.
And, if we exit back to the host’s root namespace, we can also ping the ceth0 device in the net0 namespace:
$ ping -c2 172.18.0.10
PING 172.18.0.10 (172.18.0.10) 56(84) bytes of data.
64 bytes from 172.18.0.10: icmp_seq=1 ttl=64 time=0.064 ms
64 bytes from 172.18.0.10: icmp_seq=2 ttl=64 time=0.096 ms
--- 172.18.0.10 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.064/0.080/0.096/0.016 ms
So, if we step back, we essentially have the makings of a “container”. Yes, it’s too simplistic and not very functional, but it should give you an idea of how a container and its isolation can be achieved.
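To push the idea a little further, we could combine the net0 network namespace with new UTS and PID namespaces, which gets us one step closer to what a container runtime does. A rough sketch (not how any particular runtime actually works):
$ sudo nsenter --net=/run/netns/net0 unshare --uts --pid --fork --mount-proc bash
# hostname container0    # changes the hostname only inside the new UTS namespace
# ps -ef                 # bash now shows up as PID 1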
Here are all the commands we ran in this section in a handy little script:
2_linux_networking.sh
#!/bin/bash
set -eo pipefail
LANG=C
umask 0022
if [ -z "$1" ]
then
printf "Usage: %s add|delete\n" "$0"
exit 1
fi
if [ "$1" = "delete" ]
then
# Removing the namespace will also remove the interfaces within it,
# which subsequently also removes the other end of the pair in the
# root network namespace.
sudo ip netns delete net0
elif [ "$1" = "add" ]
then
sudo ip netns add net0
sudo ip link add veth0 type veth peer name ceth0
sudo ip link set veth0 up
sudo ip addr add 172.18.0.10/12 dev veth0
sudo ip link set ceth0 netns net0
sudo ip netns exec net0 ip addr add 172.18.0.20/12 dev ceth0
sudo ip netns exec net0 ip link set ceth0 up
sudo ip netns exec net0 ip link set lo up
printf "ping %s\n" 172.18.0.10
printf "ping %s\n" 172.18.0.20
printf "sudo ip netns list\n"
else
printf "Unrecognized parameter \`%s\`.\n" "$1"
exit 1
fi
To use this script, simply:
$ bash /vagrant/2_linux_networking.sh add
ping 172.18.0.10
ping 172.18.0.20
sudo ip netns list
This establishes a veth pair and moves one end into a new net namespace. By providing the CIDR when adding an IP address to the virtual devices, we don’t need to explicitly add a route.
The script will echo the ping and netns list commands to stdout.
To clean up:
$ bash /vagrant/2_linux_networking.sh delete
A Quick Note On The Routing Table, While He’s Away
Let’s now revisit the derived routing rule that was created for us in the net0 network namespace. Again, here is the routing rule:
# ip r
172.18.0.0/16 dev ceth0 proto kernel scope link src 172.18.0.10
What is this doing? It’s sending any packets destined for the 172.18.0.0/16 network through the ceth0 device; packets for any other destination have no matching route. This is why we cannot reach the host network or the Internet.
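You can ask the kernel directly which route, if any, it would pick for a given destination with ip route get. Something like this from inside net0 shows the difference:
$ sudo ip netns exec net0 ip route get 172.18.0.11   # matches the connected route via ceth0
$ sudo ip netns exec net0 ip route get 1.1.1.1       # fails: no matching route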
Bridging
Let’s now take a look at a more advanced use case. Specifically, creating two containers and wiring up the bits that will allow them to not only communicate with each other but also with the outside world.
What would happen if we duplicated all of our previous steps? That is, in addition to what we’ve already done, we:
- Create another network namespace.
- Add a veth pair.
- Move one half of the pair into the new namespace.
- Bring both interfaces up and assign each one a CIDR address, thus automatically also creating the new routing rules (sketched below).
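Here’s roughly what that second round of commands would look like (a sketch assuming the first pair already exists, with veth0 at 172.18.0.100 in the root namespace and ceth0 at 172.18.0.10 inside net0, matching the listings below):
$ sudo ip netns add net1
$ sudo ip link add veth1 type veth peer name ceth1
$ sudo ip link set veth1 up
$ sudo ip addr add 172.18.0.101/12 dev veth1
$ sudo ip link set ceth1 netns net1
$ sudo ip netns exec net1 ip addr add 172.18.0.20/12 dev ceth1
$ sudo ip netns exec net1 ip link set ceth1 up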
Well, there would be routing conflicts, because the table would have conflicting routing rules:
$ ip r
default via 10.0.2.2 dev eth0
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
172.16.0.0/12 dev veth0 proto kernel scope link src 172.18.0.100
172.16.0.0/12 dev veth1 proto kernel scope link src 172.18.0.101
Here’s a listing of the network devices in the net0 network namespace, followed by the routing table (note that the ip r after the semicolon actually runs back in the root namespace, which is why the eth0 and veth rules appear):
$ sudo ip netns exec net0 ip a; echo; ip r
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
13: ceth0@if14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 22:20:56:8c:2e:83 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.18.0.10/12 scope global ceth0
valid_lft forever preferred_lft forever
inet6 fe80::2020:56ff:fe8c:2e83/64 scope link
valid_lft forever preferred_lft forever
default via 10.0.2.2 dev eth0
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
172.16.0.0/12 dev veth0 proto kernel scope link src 172.18.0.100
172.16.0.0/12 dev veth1 proto kernel scope link src 172.18.0.101
And, the net1 network namespace:
$ sudo ip netns exec net1 ip a; echo; ip r
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
15: ceth1@if16: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether c2:7f:11:fd:44:66 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.18.0.20/12 scope global ceth1
valid_lft forever preferred_lft forever
inet6 fe80::c07f:11ff:fefd:4466/64 scope link
valid_lft forever preferred_lft forever
default via 10.0.2.2 dev eth0
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
172.16.0.0/12 dev veth0 proto kernel scope link src 172.18.0.100
172.16.0.0/12 dev veth1 proto kernel scope link src 172.18.0.101
The device in the net0 namespace would be able to ping its half of the pair (172.18.0.100) and the new veth device in the root network namespace (172.18.0.101), but the new virtual device in the net1 namespace (not shown) would not be able to ping either.
Ah, success:
$ sudo ip netns exec net0 ping -c2 172.18.0.101
PING 172.18.0.101 (172.18.0.101) 56(84) bytes of data.
64 bytes from 172.18.0.101: icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from 172.18.0.101: icmp_seq=2 ttl=64 time=0.080 ms
--- 172.18.0.101 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1071ms
rtt min/avg/max/mdev = 0.020/0.050/0.080/0.030 ms
Ah, failure:
$ sudo ip netns exec net1 ping 172.18.0.100
PING 172.18.0.100 (172.18.0.100) 56(84) bytes of data.
^C
--- 172.18.0.100 ping statistics ---
28 packets transmitted, 0 received, 100% packet loss, time 29337ms
As mentioned, there are conflicting routing table rules, and this needs to be addressed. And, the order of the routing rules matters: for instance, if the rules for the veth pairs were reversed, then the net1 device would have no problem reaching the net0 device, but the latter would then have issues.
If we want to keep these devices in the same IP network, then the way to fix this is to introduce a bridge device (alternatively, we could introduce a new IP network for each new network namespace and veth pair, as sketched below).
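That alternative would look roughly like this: give each pair its own small network so the root namespace ends up with two distinct connected routes instead of two conflicting ones. A sketch only, with made-up 10.200.x.x addressing:
$ sudo ip addr flush dev veth0
$ sudo ip addr add 10.200.0.1/24 dev veth0
$ sudo ip netns exec net0 ip addr flush dev ceth0
$ sudo ip netns exec net0 ip addr add 10.200.0.2/24 dev ceth0
$ sudo ip addr flush dev veth1
$ sudo ip addr add 10.200.1.1/24 dev veth1
$ sudo ip netns exec net1 ip addr flush dev ceth1
$ sudo ip netns exec net1 ip addr add 10.200.1.2/24 dev ceth1
We’ll stick with the bridge approach for the rest of the article, though, so the listings below keep the original (conflicting) 172.18.x.x addressing.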
Importantly, though, the veth interface in the net0 namespace cannot reach the physical interface on the host machine:
$ sudo ip netns exec net0 ping -c2 10.0.2.15
ping: connect: Network is unreachable
However, the physical interface on the host machine in the root network namespace can ping the interface in the net0 namespace:
$ ping -c2 172.18.0.10
PING 172.18.0.10 (172.18.0.10) 56(84) bytes of data.
64 bytes from 172.18.0.10: icmp_seq=1 ttl=64 time=0.066 ms
64 bytes from 172.18.0.10: icmp_seq=2 ttl=64 time=0.165 ms
--- 172.18.0.10 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1073ms
rtt min/avg/max/mdev = 0.066/0.115/0.165/0.049 ms
But not the interface in the net1 namespace:
$ ping -c2 172.18.0.20
PING 172.18.0.20 (172.18.0.20) 56(84) bytes of data.
From 172.18.0.100 icmp_seq=1 Destination Host Unreachable
From 172.18.0.100 icmp_seq=2 Destination Host Unreachable
--- 172.18.0.20 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1305ms
pipe 2
There’s nothing to be done about this except sit and cry. Unless…
If we were to create a bridge and attach veth0 and veth1 to it (rather than addressing them directly in the host network namespace), that would immediately fix these connection issues.
Let’s do that now.
To start, we’ll create a virtual bridge device to which we’ll attach one end of all the veth pairs that are created:
$ sudo ip link add name br0 type bridge
$ ip l show br0
7: br0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether de:b1:9e:51:f9:57 brd ff:ff:ff:ff:ff:ff
Next, we’ll bring it UP:
$ sudo ip link set br0 up
$ ip l show br0
7: br0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
link/ether de:b1:9e:51:f9:57 brd ff:ff:ff:ff:ff:ff
Note that, like the other virtual network devices we’ve created, the kernel doesn’t automatically create a route until an IP address is assigned to the device with a CIDR:
$ ip r
default via 10.0.2.2 dev eth0
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
$
$ sudo ip address add 172.18.0.1/12 dev br0
$ ip r
default via 10.0.2.2 dev eth0
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
172.16.0.0/12 dev br0 proto kernel scope link src 172.18.0.1 linkdown
The next bit will look familiar. Create a new net namespace, create a new veth pair, and bring the veth0 end UP:
$ sudo ip netns add net0
$ sudo ip link add veth0 type veth peer name ceth0
$ sudo ip link set veth0 up
Next, we’ll execute the crucial command that attaches the veth0 endpoint to the new master br0 bridge. As a sanity check, we’ll then list all the devices attached to the bridge:
$ sudo ip link set dev veth0 master br0
$ ip link show master br0
9: veth0@ceth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop master br0 state DOWN mode DEFAULT group default qlen 1000
link/ether 42:28:24:4e:f9:72 brd ff:ff:ff:ff:ff:ff
The other end is attached to the new net0 namespace. This will allow the virtual bridge device to act like a multi-port switch: it will learn where to send any traffic it receives that should be forwarded on to a subnetwork:
$ sudo ip link set ceth0 netns net0
Notice that as soon as this endpoint is attached to a different network namespace, it is no longer listed in the root net namespace:
$ ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
link/ether 08:00:27:8d:c0:4d brd ff:ff:ff:ff:ff:ff
altname enp0s3
7: br0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
link/ether de:b1:9e:51:f9:57 brd ff:ff:ff:ff:ff:ff
9: veth0@if8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master br0 state DOWN mode DEFAULT group default qlen 1000
link/ether 42:28:24:4e:f9:72 brd ff:ff:ff:ff:ff:ff link-netns net0
Next, assign the ceth0 end an address inside the net0 namespace and bring it up (without bringing it up, the link would stay down and the pings below would fail). As before, the kernel derives the connected route:
$ sudo ip netns exec net0 ip address add 172.18.0.153/12 dev ceth0
$ sudo ip netns exec net0 ip link set ceth0 up
$ sudo ip netns exec net0 ip r
172.16.0.0/12 dev ceth0 proto kernel scope link src 172.18.0.153
$ ping -c2 172.18.0.153
PING 172.18.0.153 (172.18.0.153) 56(84) bytes of data.
64 bytes from 172.18.0.153: icmp_seq=1 ttl=64 time=0.124 ms
64 bytes from 172.18.0.153: icmp_seq=2 ttl=64 time=0.043 ms
--- 172.18.0.153 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.043/0.083/0.124/0.040 ms
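Now that some traffic has crossed it, we can also watch the bridge doing the “learning” mentioned above. The bridge utility from iproute2 lists its ports and the MAC addresses it has seen (your addresses will differ):
$ bridge link show          # ports currently attached to br0
$ bridge fdb show br br0    # the forwarding database: learned MAC addresses per port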
However, we still can’t ping the host’s eth0 device from inside the net0 namespace:
$ sudo ip netns exec net0 ping 10.0.2.15
ping: connect: Network is unreachable
This problem is easy to diagnose: there’s no default route via the bridge device in the net0 namespace’s routing table. We’ll fix that now:
$ sudo ip netns exec net0 ip route add default via 172.18.0.1
$ sudo ip netns exec net0 ping -c2 10.0.2.15
PING 10.0.2.15 (10.0.2.15) 56(84) bytes of data.
64 bytes from 10.0.2.15: icmp_seq=1 ttl=64 time=0.030 ms
64 bytes from 10.0.2.15: icmp_seq=2 ttl=64 time=0.047 ms
--- 10.0.2.15 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.030/0.038/0.047/0.008 ms
$
$ sudo ip netns exec net0 ip r
default via 172.18.0.1 dev ceth0
172.16.0.0/12 dev ceth0 proto kernel scope link src 172.18.0.153
Ok, fantastic. But, there’s still a fairly big issue: we still can’t send or receive traffic from the outside world. We’ll look at resolving that in the next section.
$ sudo ip netns exec net0 ping -c2 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
^C
--- 1.1.1.1 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1005ms
Here are all the commands we ran in this section in a handy little script:
4_linux_networking.sh
#!/bin/bash
set -eo pipefail
LANG=C
umask 0022
if [ -z "$1" ]
then
printf "Usage: %s add|delete\n" "$0"
exit 1
fi
BRIDGE=br0
if [ "$1" = "delete" ]
then
sudo ip link delete "$BRIDGE"
for i in {0..1}
do
# Removing the namespace will also remove the interfaces within it,
# which subsequently also removes the other end of the pair in the
# root network namespace.
sudo ip netns delete "net$i"
done
elif [ "$1" = "add" ]
then
sudo ip link add name "$BRIDGE" type bridge
sudo ip address add 172.18.0.1/12 dev "$BRIDGE"
sudo ip link set "$BRIDGE" up
for i in {0..1}
do
sudo ip netns add "net$i"
sudo ip link add "veth$i" type veth peer name "ceth$i"
sudo ip link set "veth$i" up
# Attach the new interfaces to the bridge device.
sudo ip link set dev "veth$i" master "$BRIDGE"
sudo ip link set "ceth$i" netns "net$i"
INCREMENT=$((10 + 10 * "$i"))
sudo ip netns exec "net$i" ip address add "172.18.0.$INCREMENT/12" dev "ceth$i"
sudo ip netns exec "net$i" ip link set "ceth$i" up
# Add the route to the bridge interface so the new namespaces can reach the root namespace.
sudo ip netns exec "net$i" ip route add default via 172.18.0.1
done
else
printf "Unrecognized parameter \`%s\`.\n" "$1"
exit 1
fi
Routing
To enable traffic to flow to and from the outside world, there are two things that need to happen.
The first is to turn the machine into a router:
$ echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward
1
Of course, this setting won’t survive a reboot. To persist this value, edit the /etc/sysctl.conf configuration file.
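Equivalently, sysctl can set the value now and reload the persisted setting later (a sketch):
$ sudo sysctl -w net.ipv4.ip_forward=1
$ echo 'net.ipv4.ip_forward = 1' | sudo tee -a /etc/sysctl.conf
$ sudo sysctl -p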
The second is to enable network address translation (NAT) so that the router rewrites the containers’ private source IP addresses to its own address on outgoing traffic:
$ sudo iptables -t nat -A POSTROUTING -s 172.18.0.0/12 ! -o br0 -j MASQUERADE
$ sudo ip netns exec net0 ping -c2 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=61 time=16.5 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=61 time=18.3 ms
--- 1.1.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1008ms
rtt min/avg/max/mdev = 16.492/17.388/18.285/0.896 ms
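To verify the rule (and watch its packet counters tick up as the containers talk to the outside world), list the POSTROUTING chain. Keep in mind that forwarded traffic also has to pass the FORWARD chain; if its policy is DROP on your machine (as it is when Docker is installed, for example), something like the last two rules below would be needed:
$ sudo iptables -t nat -L POSTROUTING -n -v
$ sudo iptables -L FORWARD -n -v
$ sudo iptables -A FORWARD -i br0 -j ACCEPT
$ sudo iptables -A FORWARD -o br0 -j ACCEPT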
Listing the ARP Table
Check the ARP tables:
$ sudo ip netns exec net0 ip neigh
172.18.0.20 dev ceth0 lladdr 7e:31:d1:3d:07:ab REACHABLE
$ sudo ip netns exec net1 ip neigh
172.18.0.10 dev ceth1 lladdr a2:a9:f3:ae:d0:a8 REACHABLE
Conclusion
That’s it, Charlie. It’s way too long.