This post is going to be a brief look at DNS for Pods and Services in Kubernetes. Its intent is only to provide a good mental model for how DNS is assigned to new resources in the cluster.
Much of this article is jump-started from the DNS for Services and Pods documentation on the official Kubernetes site. Go and read it now.
Of course, there are different DNS configurations that can be given to a Pod depending upon the value of .spec.dnsPolicy, and I’m going to assume it’s ClusterFirst, which is the default if none is given. This decision has been made to keep this post to a reasonable length, so if any of the details don’t apply to the DNS policy that you’re using or are interested in, there you go, you’ve been warned.
For ClusterFirst, any DNS query that does not match the configured cluster domain suffix, such as www.kubernetes.io, is forwarded to an upstream nameserver by the DNS server.
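For reference, here’s a minimal PodSpec with the policy set explicitly (the name and image are just for illustration; omitting dnsPolicy entirely gives you the same ClusterFirst behavior):
apiVersion: v1
kind: Pod
metadata:
  name: dns-example
spec:
  dnsPolicy: ClusterFirst
  containers:
  - name: main
    image: busybox:1.36
    command: ["sleep", "infinity"]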
It’s easy to take DNS and service discovery for granted in Kubernetes. While Kubernetes doesn’t provide many things out-of-the-box, it does provide a DNS server that allows for cluster-scoped DNS resolution (usually with CoreDNS).
However, how does this process work? For example, when I deploy a workload into a cluster, whether it be a single Pod or a Deployment, how do the Pods become “DNS aware”? And, what about Services?
The most important player in this scenario is the kubelet. It is essentially the supervisor of the Node, i.e., an agent that runs on each Node (sometimes even the control plane Node) that ensures that the PodSpec(s) given to it by the apiserver are created. Importantly, it will make sure that the container(s) defined in the PodSpec are running and healthy.
kubelet
If we look at the parameters passed to the kubelet using ps, we can find the file that it is using as its configuration, parts of which (notably the cluster DNS settings) it writes into every container that it manages.
Here is a dirty pipeline that will help us find the parameter that we’re interested in (--config), whose value is the aforementioned configuration file:
$ ps aux | ag [k]ubelet | head -1 | awk '{ for (i=11; i<=NF; ++i) print $i}'
/usr/bin/kubelet
--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf
--kubeconfig=/etc/kubernetes/kubelet.conf
--config=/var/lib/kubelet/config.yaml
--container-runtime-endpoint=unix:///run/containerd/containerd.sock
--node-ip=172.18.0.5
--node-labels=zone=1
--pod-infra-container-image=registry.k8s.io/pause:3.9
--provider-id=kind://docker/beta/beta-worker
--runtime-cgroups=/system.slice/containerd.service
The awk command is printing from the 11th column onwards.
Here ‘tis:
--config=/var/lib/kubelet/config.yaml
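If grovelling through ps output isn’t your thing, the kubelet runs as a systemd service on kind Nodes, so the same flag should also show up in the unit’s drop-ins (a sketch; the unit name and the KUBELET_CONFIG_ARGS drop-in are the kubeadm defaults that kind uses):
$ docker exec beta-worker systemctl cat kubelet | grep -- --config
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"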
Accessing the configuration at /var/lib/kubelet/config.yaml depends on how you created the cluster. For example, I used kind, which is “Kubernetes in Docker”, and clearly, it uses Docker, which makes me a weenie (although I’m using a virtual machine, because everyone knows to never, ever install Docker on the host).
In other posts I’ve used kubeadm and minikube to create the cluster. It’s good to know more than one way to locally create a cluster.
We know that every Node has a kubelet as its agent, so we should be able to log into any Node in the cluster to view that configuration. Since we used kind, we can simply use Docker commands, since kind created all of the Nodes to run in containers.
Let’s list the running containers:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7cb692d8baa9 kindest/node:v1.29.2 "/usr/local/bin/entr…" 2 hours ago Up 2 hours beta-worker3
9e0d20e5f51b kindest/node:v1.29.2 "/usr/local/bin/entr…" 2 hours ago Up 2 hours beta-worker
2229f2f47381 kindest/node:v1.29.2 "/usr/local/bin/entr…" 2 hours ago Up 2 hours beta-worker2
c0ca7a51bff0 kindest/node:v1.29.2 "/usr/local/bin/entr…" 2 hours ago Up 2 hours beta-worker4
3df3a630ab3c kindest/node:v1.29.2 "/usr/local/bin/entr…" 2 hours ago Up 2 hours 127.0.0.1:43511->6443/tcp beta-control-plane
Just typing the word “[Dd]ocker” makes me feel dirty.
Since the kubelet will be running on every Node (possibly even the control plane Node, again, depending on how the cluster was created), we can simply exec into any one of them and view the config file:
$ docker exec -it beta-worker cat /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
cgroupDriver: systemd
cgroupRoot: /kubelet
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
containerRuntimeEndpoint: ""
cpuManagerReconcilePeriod: 0s
evictionHard:
  imagefs.available: 0%
  nodefs.available: 0%
  nodefs.inodesFree: 0%
evictionPressureTransitionPeriod: 0s
failSwapOn: false
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageGCHighThresholdPercent: 100
imageMaximumGCAge: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
logging:
  flushFrequency: 0
  options:
    json:
      infoBufferSize: "0"
  verbosity: 0
memorySwap: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
rotateCertificates: true
runtimeRequestTimeout: 0s
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s
Let’s print just the DNS-related configuration:
$ docker exec -it beta-worker grep -i -A2 dns /var/lib/kubelet/config.yaml
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
So, what is that IP address? It is the Kubernetes Service that exposes the CoreDNS Pods, which the cluster uses for service discovery.
Let’s turn to it now.
CoreDNS
CoreDNS is a newer DNS server written in Go that is modular, meaning that it depends on plugins to build out its functionality, with each plugin performing a DNS function. Using it at a high level is very easy, and it’s simple to implement most of the use cases that you’ll want in a basic, general way with very little up-front work.
As mentioned, the IP address 10.96.0.10 is that of the CoreDNS Service, which is acting as a load balancer to two coredns Pods:
$ kubectl -n kube-system get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 27h
This Service will round-robin the requests to the two pods that were created by default:
$ kubectl -n kube-system get po -l k8s-app=kube-dns -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-76f75df574-9lrgf 1/1 Running 1 (126m ago) 27h 10.244.0.2 beta-control-plane <none> <none>
coredns-76f75df574-vwwv5 1/1 Running 1 (126m ago) 27h 10.244.0.4 beta-control-plane <none> <none>
The IP addresses of the coredns Pods are the endpoints, of course:
$ kubectl -n kube-system describe endpoints
Name: kube-dns
Namespace: kube-system
Labels: k8s-app=kube-dns
kubernetes.io/cluster-service=true
kubernetes.io/name=CoreDNS
Annotations: endpoints.kubernetes.io/last-change-trigger-time: 2024-03-17T18:54:04Z
Subsets:
Addresses: 10.244.0.2,10.244.0.4
NotReadyAddresses: <none>
Ports:
Name Port Protocol
---- ---- --------
dns-tcp 53 TCP
dns 53 UDP
metrics 9153 TCP
Events: <none>
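kubectl describe endpoints uses the legacy Endpoints API; on recent clusters the same addresses also live in EndpointSlices, which you should be able to list like this (a sketch; EndpointSlices inherit the Service’s labels, so the same label selector works):
$ kubectl -n kube-system get endpointslices -l k8s-app=kube-dns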
So, the kubelet will write the IP address of the Service (10.96.0.10, which knows how to reach every coredns Pod through the endpoints abstraction) to every container’s /etc/resolv.conf file:
$ kubectl exec debug -it -- cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local home
nameserver 10.96.0.10
options ndots:5
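Those search domains, combined with ndots:5, are what make short names work: any name with fewer than five dots is tried with each search domain appended until one resolves. For example, the bare name kubernetes expands via the search list to kubernetes.default.svc.cluster.local, the Service for the API server (the 10.96.0.1 answer is what kind hands out by default; treat it as illustrative):
$ kubectl exec debug -it -- dig +short +search kubernetes
10.96.0.1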
Let’s also dump the coredns Service’s .spec as JSON:
$ kubectl -n kube-system get svc kube-dns -ojsonpath='{.spec}' | jq
{
  "clusterIP": "10.96.0.10",
  "clusterIPs": [
    "10.96.0.10"
  ],
  "internalTrafficPolicy": "Cluster",
  "ipFamilies": [
    "IPv4"
  ],
  "ipFamilyPolicy": "SingleStack",
  "ports": [
    {
      "name": "dns",
      "port": 53,
      "protocol": "UDP",
      "targetPort": 53
    },
    {
      "name": "dns-tcp",
      "port": 53,
      "protocol": "TCP",
      "targetPort": 53
    },
    {
      "name": "metrics",
      "port": 9153,
      "protocol": "TCP",
      "targetPort": 9153
    }
  ],
  "selector": {
    "k8s-app": "kube-dns"
  },
  "sessionAffinity": "None",
  "type": "ClusterIP"
}
For backwards compatibility, the name kube-dns is still used, even though Kubernetes no longer uses kube-dns itself.
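You can see the split naming directly, since the Deployment is named coredns while the Service keeps the legacy name (a sketch; the k8s-app=kube-dns label is what the default installation applies to both):
$ kubectl -n kube-system get deploy,svc -l k8s-app=kube-dns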
Corefile
Let’s briefly look at the configuration for CoreDNS, the Corefile. This can be found in a ConfigMap in the kube-system namespace:
$ kubectl -n kube-system describe cm coredns
Name: coredns
Namespace: kube-system
Labels: <none>
Annotations: <none>
Data
====
Corefile:
----
.:53 {
    errors
    health {
       lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       fallthrough in-addr.arpa ip6.arpa
       ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf {
       max_concurrent 1000
    }
    cache 30
    loop
    reload
    loadbalance
}
BinaryData
====
Events: <none>
. { } is the most minimal Corefile.
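Since the server is bound to port 53 of the Service we saw earlier, you can also query it directly and skip the resolv.conf machinery entirely (the answer here is the API server’s Service address in my kind cluster; yours may differ):
$ kubectl exec debug -- dig @10.96.0.10 +short kubernetes.default.svc.cluster.local
10.96.0.1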
Each coredns Pod will mount a volume to give the underlying container access to the Corefile, which the server expects to find at a certain location.
Observe:
$ kubectl -n kube-system get pod coredns-76f75df574-xdxtt -ojsonpath='{.spec.volumes[0]}' | jq
{
  "configMap": {
    "defaultMode": 420,
    "items": [
      {
        "key": "Corefile",
        "path": "Corefile"
      }
    ],
    "name": "coredns"
  },
  "name": "config-volume"
}
$ kubectl -n kube-system get pod coredns-76f75df574-xdxtt -ojsonpath='{.spec.containers[0].volumeMounts[0]}' | jq
{
  "mountPath": "/etc/coredns",
  "name": "config-volume",
  "readOnly": true
}
$ kubectl -n kube-system get pod coredns-76f75df574-xdxtt -ojsonpath='{.spec.containers[0].args}' | jq
[
  "-conf",
  "/etc/coredns/Corefile"
]
If you want to see the logs of the DNS requests, you’ll have to add the log plugin. Simply edit the coredns ConfigMap and add it:
.:53 {
    log
    errors
    health {
       lameduck 5s
    }
    ready
    ...
coredns will reload itself without any service disruption because of the reload plugin. Simply wait a short amount of time, and then you’ll be able to dump the logs.
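For example, you can tail the logs of both coredns Pods at once using the label we saw earlier (a sketch):
$ kubectl -n kube-system logs -f -l k8s-app=kube-dns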
Ok, so all that is so very, very interesting. So, we’ve learned that the kubelet, according to its config, will inject the IP address of the coredns Service into each container’s /etc/resolv.conf. That’s nice of it.
Pods
Here is the Pod manifest that we use to create the Pods in this section:
---
apiVersion: v1
kind: Pod
metadata:
  name: debug
  labels:
    name: debug
spec:
  containers:
  - name: debug
    image: btoll/debug:latest
    tty: true
    stdin: true
    imagePullPolicy: Always
    securityContext:
      capabilities:
        add: ["NET_ADMIN"]
  restartPolicy: Always
The NET_ADMIN capability was added because the debug container image contains iptables, which requires escalated privileges.
Let’s also see the IP address that was assigned to each Pod:
$ kubectl get po -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
debug 1/1 Running 0 2m14s 10.244.4.2 beta-worker <none> <none>
debug2 1/1 Running 0 22s 10.244.2.2 beta-worker4 <none> <none>
Pod A/AAAA Records
A and AAAA records aren’t created for Pod names. Instead, they are based upon the IP address and have the following form:
pod-ipv4-address.namespace.pod.cluster-domain.example
The debug Pod will then have the following A record:
10-244-4-2.default.pod.cluster.local
So, let’s ping the debug Pod from debug2 to see this in action:
$ kubectl exec debug2 -it -- ping -c2 10-244-4-2.default.pod
PING 10-244-4-2.default.pod.cluster.local (10.244.4.2) 56(84) bytes of data.
64 bytes from 10.244.4.2 (10.244.4.2): icmp_seq=1 ttl=62 time=0.086 ms
64 bytes from 10.244.4.2 (10.244.4.2): icmp_seq=2 ttl=62 time=0.286 ms
--- 10-244-4-2.default.pod.cluster.local ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 0.086/0.186/0.286/0.100 ms
And, with dig (not specifying a record type defaults to the A record):
$ kubectl exec debug2 -it -- dig +short +search 10-244-4-2.default.pod
10.244.4.2
Note that dig needs the +search option to be able to resolve a query that isn’t a FQDN.
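Put differently, handing dig the fully qualified name sidesteps the search list altogether (a quick sketch against the same Pod):
$ kubectl exec debug2 -it -- dig +short 10-244-4-2.default.pod.cluster.local
10.244.4.2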
Interestingly, there is another way to create an A record. Everybody knows that the .metadata.name of the Pod will also be its hostname within the Pod. Similarly, if the optional .spec.hostname field is present, it overrides the name of the Pod as the hostname.
For example, if .spec.hostname is set to foo, then:
$ kubectl exec debug -it -- hostname
foo
In addition, there is also an optional .spec.subdomain field, which is used to indicate that the Pod is part of a subgroup of the namespace. If we specify both .spec.hostname and .spec.subdomain in the PodSpec, a FQDN for the Pod will be written to its /etc/hosts:
spec:
  hostname: foo
  subdomain: bar
  containers:
  - name: debug
    image: btoll/debug:latest
    tty: true
    stdin: true
    imagePullPolicy: Always
    securityContext:
      capabilities:
        add: ["NET_ADMIN"]
  restartPolicy: Always
And, in /etc/hosts:
$ kubectl exec debug -- cat /etc/hosts | tail -1
10.244.4.2 foo.bar.default.svc.cluster.local foo
$ kubectl exec debug2 -- cat /etc/hosts | tail -1
10.244.2.2 foo2.bar.default.svc.cluster.local foo2
So, foo.bar.default.svc.cluster.local will be a resolvable name within the Pod itself. Right now, that’s not very helpful, as it can only be resolved in the Pod.
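We can prove that to ourselves: before the Service exists, the same lookup from the other Pod should come back empty, since nothing has published the name to the cluster’s DNS yet (a sketch; run it before applying the Service below):
$ kubectl exec debug2 -- dig +short foo.bar.default.svc.cluster.local
$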
However, if we expose the Pods with a Service that is named the same as the value of .spec.subdomain, then an A record will be created with that same FQDN that can be used within the cluster.
We’ll do just that in the next section.
Services
Here is the (headless) Service manifest that we’ll be using:
apiVersion: v1
kind: Service
metadata:
  name: bar
  labels:
    name: svc-bar
spec:
  selector:
    name: debug
  clusterIP: None
  ports:
  - name: foo
    port: 1234
  - name: bar
    port: 5678
Some important notes about this manifest:
- Its name must be the same as the PodSpec’s subdomain (i.e., bar).
- The label can be named anything, as usual.
- The selector must match the Pod labels, i.e., name=debug.
Now, if we look at its endpoints, we can see that it’s referencing the two debug Pods and will round-robin the requests to them:
$ kubectl get ep bar -ojsonpath='{.subsets[0].addresses}' | jq
[
  {
    "hostname": "foo",
    "ip": "10.244.4.2",
    "nodeName": "beta-worker2",
    "targetRef": {
      "kind": "Pod",
      "name": "debug",
      "namespace": "default",
      "uid": "e134e429-34f6-4cde-bed0-cbe16d163dc3"
    }
  },
  {
    "hostname": "foo2",
    "ip": "10.244.2.2",
    "nodeName": "beta-worker3",
    "targetRef": {
      "kind": "Pod",
      "name": "debug2",
      "namespace": "default",
      "uid": "f6134bcf-4a51-4469-be9e-9cda6c2d8cbc"
    }
  }
]
Let’s check out its SRV records. These list the named ports of the Service, with one answer per backing Pod, i.e., for both the debug and the debug2 Pods:
$ kubectl exec debug -it -- dig +short +search bar.default SRV
0 25 5678 foo.bar.default.svc.cluster.local.
0 25 1234 foo.bar.default.svc.cluster.local.
0 25 5678 foo2.bar.default.svc.cluster.local.
0 25 1234 foo2.bar.default.svc.cluster.local.
Reading the answers left to right, the fields are priority, weight, port, and target; CoreDNS splits a total weight of 100 evenly across the answers, which is why each of the four records gets 25 here (you’ll see 33 with three backing Pods, and 100 for the single ClusterIP answer, later on). Can we ping them?
$ kubectl exec debug -it -- ping -c2 foo2.bar
PING foo2.bar.default.svc.cluster.local (10.244.4.2) 56(84) bytes of data.
64 bytes from foo2.bar.default.svc.cluster.local (10.244.4.2): icmp_seq=1 ttl=62 time=0.085 ms
64 bytes from foo2.bar.default.svc.cluster.local (10.244.4.2): icmp_seq=2 ttl=62 time=0.188 ms
--- foo2.bar.default.svc.cluster.local ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1024ms
rtt min/avg/max/mdev = 0.085/0.136/0.188/0.051 ms
$ kubectl exec debug2 -it -- ping -c2 foo.bar
PING foo.bar.default.svc.cluster.local (10.244.2.2) 56(84) bytes of data.
64 bytes from foo.bar.default.svc.cluster.local (10.244.2.2): icmp_seq=1 ttl=62 time=0.218 ms
64 bytes from foo.bar.default.svc.cluster.local (10.244.2.2): icmp_seq=2 ttl=62 time=0.245 ms
--- foo.bar.default.svc.cluster.local ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.218/0.231/0.245/0.013 ms
You bet!
As a sanity check, let’s create one more Pod on the same subdomain to see what happens. We expect that it will be added to the backend Pods of the bar Service and be discoverable by its FQDN like the previous two debug Pods in the same domain (and in the same namespace):
---
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  labels:
    name: debug
spec:
  hostname: dnsutils
  subdomain: bar
  containers:
  - name: debug
    image: registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3
    tty: true
    stdin: true
    imagePullPolicy: Always
  restartPolicy: Always
Notice that this has a different container spec, but it shouldn’t matter. The important thing is that it is part of the bar subdomain.
After applying the manifest, we check out the Service endpoints:
[
  {
    "hostname": "foo",
    "ip": "10.244.1.2",
    "nodeName": "beta-worker2",
    "targetRef": {
      "kind": "Pod",
      "name": "debug",
      "namespace": "default",
      "uid": "e134e429-34f6-4cde-bed0-cbe16d163dc3"
    }
  },
  {
    "hostname": "dnsutils",
    "ip": "10.244.2.5",
    "nodeName": "beta-worker",
    "targetRef": {
      "kind": "Pod",
      "name": "dnsutils",
      "namespace": "default",
      "uid": "b45f9f0c-51c8-4c4f-bb3b-f33d22e17582"
    }
  },
  {
    "hostname": "foo2",
    "ip": "10.244.4.2",
    "nodeName": "beta-worker3",
    "targetRef": {
      "kind": "Pod",
      "name": "debug2",
      "namespace": "default",
      "uid": "f6134bcf-4a51-4469-be9e-9cda6c2d8cbc"
    }
  }
]
That looks promising. Can we ping it?
$ kubectl exec debug -it -- ping -c2 dnsutils.bar
PING dnsutils.bar.default.svc.cluster.local (10.244.2.5) 56(84) bytes of data.
64 bytes from dnsutils.bar.default.svc.cluster.local (10.244.2.5): icmp_seq=1 ttl=62 time=0.105 ms
64 bytes from dnsutils.bar.default.svc.cluster.local (10.244.2.5): icmp_seq=2 ttl=62 time=0.277 ms
--- dnsutils.bar.default.svc.cluster.local ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1022ms
rtt min/avg/max/mdev = 0.105/0.191/0.277/0.086 ms
Kool Moe Dee.
Here is the new Pod container’s /etc/hosts:
$ kubectl exec dnsutils -- cat /etc/hosts | tail -1
10.244.2.5 dnsutils.bar.default.svc.cluster.local dnsutils
Service A/AAAA Records
Both normal and headless Services are assigned an A and AAAA record in the form of:
service-name.namespace.svc.cluster-domain.example
So, for our bar Service, it will look like this:
bar.default.svc.cluster.local
The difference between the two types of Services is what the record resolves to.
For a normal Service, the A record will resolve to its ClusterIP. A headless Service, by contrast, is created with the value None assigned to the .spec.clusterIP field, which is what we defined above.
The A record of this headless Service will resolve to the set of IP addresses of all the Pods selected by the Service:
$ kubectl exec debug -- dig +short bar.default.svc.cluster.local A
10.244.2.2
10.244.1.2
10.244.4.2
Now, if we were to change that to a normal Service, i.e., one that has a ClusterIP assigned to it and acts as a load balancer, we can observe how the value of the DNS A record changes.
Do the following:
$ kubectl delete -f service.yaml
$ sed -i '/clusterIP: None/d' service.yaml
$ kubectl apply -f service.yaml
This will re-create the Service, but with a ClusterIP. Let’s list it:
$ kubectl get svc -l name=svc-bar
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
bar ClusterIP 10.96.67.41 <none> 1234/TCP,5678/TCP 8m40s
Now, when we list the A record, it should only list this ClusterIP, rather than all of the Pods its selector matched:
$ kubectl exec debug -- dig +short bar.default.svc.cluster.local A
10.96.67.41
That’s right, bubba.
Service SRV Records
For both a normal and a headless Service, SRV records will be created in the form of:
_port-name._port-protocol.service-name.namespace.svc.cluster-domain.example
Like A and AAAA records, the SRV record will resolve to a different value based upon the kind of Service. For a headless Service, it will return multiple answers, one for each Pod that is backing the Service, each containing the port number and the domain name of the Pod in the form hostname.service-name.namespace.svc.cluster-domain.example.
Let’s see what dig returns for the headless Service. (Note that the port names queried here, kilgore and trout, come from a tweak to the Service manifest above, with its two ports renamed and serving 1234/TCP and 5678/UDP, as the PORT(S) columns below confirm; the names themselves are arbitrary.)
$ kubectl exec debug -- dig _kilgore._tcp.bar.default.svc.cluster.local SRV
; <<>> DiG 9.18.24-1-Debian <<>> kilgore.tcp.bar.default.svc.cluster.local SRV
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57268
;; flags: qr aa rd; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 4
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 03419e27de6dcc5d (echoed)
;; QUESTION SECTION:
;kilgore.tcp.bar.default.svc.cluster.local. IN SRV
;; ANSWER SECTION:
kilgore.tcp.bar.default.svc.cluster.local. 30 IN SRV 0 33 1234 foo.bar.default.svc.cluster.local.
kilgore.tcp.bar.default.svc.cluster.local. 30 IN SRV 0 33 1234 foo2.bar.default.svc.cluster.local.
kilgore.tcp.bar.default.svc.cluster.local. 30 IN SRV 0 33 1234 dnsutils.bar.default.svc.cluster.local.
;; ADDITIONAL SECTION:
foo2.bar.default.svc.cluster.local. 30 IN A 10.244.4.2
foo.bar.default.svc.cluster.local. 30 IN A 10.244.1.2
dnsutils.bar.default.svc.cluster.local. 30 IN A 10.244.2.2
;; Query time: 0 msec
;; SERVER: 10.96.0.10#53(10.96.0.10) (UDP)
;; WHEN: Mon Mar 18 19:41:51 UTC 2024
;; MSG SIZE rcvd: 523
That was the long form of dig. Let’s use the +short option for both named ports:
$ kubectl exec debug -- dig +short _kilgore._tcp.bar.default.svc.cluster.local SRV
0 33 1234 foo2.bar.default.svc.cluster.local.
0 33 1234 dnsutils.bar.default.svc.cluster.local.
0 33 1234 foo.bar.default.svc.cluster.local.
$ kubectl exec debug -- dig +short _trout._udp.bar.default.svc.cluster.local SRV
0 33 5678 foo2.bar.default.svc.cluster.local.
0 33 5678 dnsutils.bar.default.svc.cluster.local.
0 33 5678 foo.bar.default.svc.cluster.local.
Note that I found the SRV records return the same information regardless of whether the port name and protocol are prefixed by an underscore (_) as the docs suggest.
Let’s turn it into a normal Service with a ClusterIP. Here are the same instructions as above:
$ kubectl delete -f service.yaml
$ sed -i '/clusterIP: None/d' service.yaml
$ kubectl apply -f service.yaml
Let’s see the IP address that has been assigned:
$ kubectl get svc -l name=svc-bar
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
bar ClusterIP 10.96.221.177 <none> 1234/TCP,5678/UDP 5m55s
And the SRV record:
$ kubectl exec debug -- dig _kilgore._tcp.bar.default.svc.cluster.local SRV
; <<>> DiG 9.18.24-1-Debian <<>> _kilgore._tcp.bar.default.svc.cluster.local SRV
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39593
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 2
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 566f507c65539ddc (echoed)
;; QUESTION SECTION:
;_kilgore._tcp.bar.default.svc.cluster.local. IN SRV
;; ANSWER SECTION:
_kilgore._tcp.bar.default.svc.cluster.local. 30 IN SRV 0 100 1234 bar.default.svc.cluster.local.
;; ADDITIONAL SECTION:
bar.default.svc.cluster.local. 30 IN A 10.96.221.177
;; Query time: 0 msec
;; SERVER: 10.96.0.10#53(10.96.0.10) (UDP)
;; WHEN: Mon Mar 18 19:53:09 UTC 2024
;; MSG SIZE rcvd: 221
You can still query the SRV record by the _port-name._protocol form, but it will return that of the Service:
$ kubectl exec debug -- dig +short _kilgore._tcp.bar.default.svc.cluster.local SRV
0 100 1234 bar.default.svc.cluster.local.
$ kubectl exec debug -- dig +short _trout._udp.bar.default.svc.cluster.local SRV
0 100 5678 bar.default.svc.cluster.local.
Weeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
What happens when the internal DNS server cannot resolve a hostname? It will send the query upstream, which in this case means the Node’s own resolver. Here is the beta-worker Node’s /etc/resolv.conf with its nameserver listed, which in my case happens to be a bridge device to which all of the veth pairs have one end connected (the other end is in another net namespace). These requests will then in turn get sent to the next upstream resolver.
Check out my article on Linux Container Networking to learn more about the net namespace, virtual ethernet adapters and how container orchestrators will create Pod networking.
$ docker exec -it beta-worker cat /etc/resolv.conf
search home
nameserver 172.18.0.1
options ndots:0
$
$ ip a show br-a6972fbee62e
5: br-a6972fbee62e: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:45:3c:b2:ca brd ff:ff:ff:ff:ff:ff
inet 172.18.0.1/16 brd 172.18.255.255 scope global br-a6972fbee62e
valid_lft forever preferred_lft forever
inet6 fc00:f853:ccd:e793::1/64 scope global
valid_lft forever preferred_lft forever
inet6 fe80::42:45ff:fe3c:b2ca/64 scope link
valid_lft forever preferred_lft forever
inet6 fe80::1/64 scope link
valid_lft forever preferred_lft forever
$
$ ip link list master br-a6972fbee62e
7: veth30ce669@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-a6972fbee62e state UP mode DEFAULT group default
link/ether 82:a7:bb:27:b2:86 brd ff:ff:ff:ff:ff:ff link-netnsid 0
9: veth9b4ec1d@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-a6972fbee62e state UP mode DEFAULT group default
link/ether 96:46:6f:8a:b6:5b brd ff:ff:ff:ff:ff:ff link-netnsid 1
11: veth4366ae6@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-a6972fbee62e state UP mode DEFAULT group default
link/ether 9a:e3:8c:8a:bc:a0 brd ff:ff:ff:ff:ff:ff link-netnsid 3
13: veth31754a5@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-a6972fbee62e state UP mode DEFAULT group default
link/ether ee:9a:5e:ab:ed:06 brd ff:ff:ff:ff:ff:ff link-netnsid 2
15: vethf6d50eb@if14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-a6972fbee62e state UP mode DEFAULT group default
link/ether f2:a7:5f:79:42:47 brd ff:ff:ff:ff:ff:ff link-netnsid 4
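To see the whole chain in action, resolve a public name from inside a Pod; the query travels Pod → CoreDNS → the Node’s resolver at 172.18.0.1 → the next upstream (output elided, since the answer will just be the site’s current public records):
$ kubectl exec debug -- dig +short www.kubernetes.io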
Conclusion
So, there you have it. We’ve all learned something, and we’ve done it together, like a couple of pals.