The Pod spec for your apps can be one of the more complex parts of your Kubernetes manifest design, and it needs many features enabled to be a safe and reasonably secure default.
This single-file repository is meant to be a starting point for your Pod specs, to add to Deployments, DaemonSets, StatefulSets, initContainers, etc.
It's based on years of consulting, the Kubernetes courses and workshops I do, and this tweet when I first had the idea.
The spec from ./pod.yaml
    spec:
      containers:

      # basic container details
      - name: my-container-name
        # never use reusable tags like latest or stable
        image: my-image:tag
        # hardcode the listening port if Dockerfile isn't set with EXPOSE
        ports:
        - containerPort: 8080
          protocol: TCP

        readinessProbe:     # I always recommend using these, even if your app has no listening ports (this affects any rolling update)
          httpGet:          # Lots of timeout values with defaults, be sure they are ideal for your workload
            path: /ready
            port: 8080
        livenessProbe:      # only needed if your app tends to go unresponsive or you don't have a readinessProbe, but this is up for debate
          httpGet:          # Lots of timeout values with defaults, be sure they are ideal for your workload
            path: /alive
            port: 8080

        resources:          # Because if limits = requests then QoS is set to "Guaranteed"
          limits:
            memory: "500Mi" # If container uses over 500MB it is killed (OOM)
            #cpu: "2"       # Not normally needed, unless you need to protect other workloads or QoS must be "Guaranteed"
          requests:
            memory: "500Mi" # Scheduler finds a node where 500MB is available
            cpu: "1"        # Scheduler finds a node where 1 vCPU is available

        # per-container security context
        # lock down privileges inside the container
        securityContext:
          allowPrivilegeEscalation: false # prevent sudo, etc.
          privileged: false               # prevent acting like host root

      terminationGracePeriodSeconds: 600  # default is 30, but you may need more time to gracefully shutdown (HTTP long polling, user uploads, etc)

      # per-pod security context
      # enable seccomp and force non-root user
      securityContext:

        seccompProfile:
          type: RuntimeDefault   # enable seccomp and the runtimes default profile

        runAsUser: 1001          # hardcode user to non-root if not set in Dockerfile
        runAsGroup: 1001         # hardcode group to non-root if not set in Dockerfile
        runAsNonRoot: true       # hardcode to non-root. Redundant to above if Dockerfile is set USER 1000
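Since the spec above is meant to be dropped into a workload resource, here is a minimal sketch of where it would sit inside a Deployment. The `my-app` name, the `app: my-app` label, and the replica count are hypothetical placeholders, not part of the original spec; only the fields under `template.spec` come from it, and they are abbreviated here.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app              # hypothetical name
spec:
  replicas: 2               # hypothetical; pick what your workload needs
  selector:
    matchLabels:
      app: my-app           # must match the template labels below
  template:
    metadata:
      labels:
        app: my-app
    # everything under this "spec:" is the Pod spec from pod.yaml
    spec:
      containers:
      - name: my-container-name
        image: my-image:tag
        ports:
        - containerPort: 8080
          protocol: TCP
        # ...probes, resources, and container securityContext go here
      terminationGracePeriodSeconds: 600
      securityContext:
        seccompProfile:
          type: RuntimeDefault
        runAsUser: 1001
        runAsGroup: 1001
        runAsNonRoot: true
```

DaemonSets and StatefulSets use the same `spec.template.spec` location for the Pod spec, so the same fields should carry over with only the outer resource changing.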
- For `spec.containers.resources`, it's good to review how Kubernetes Quality of Service (QoS) works, as it affects when your pod is evicted from a node that runs out of resources. For example, if your limits don't match your requests, then your pod only receives a QoS class of Burstable rather than the highest level of Guaranteed.
- You can remove `runAsUser`/`runAsGroup` if you are using a Dockerfile that sets the user/group to non-root (or ko or buildpacks, thanks @e_k_anderson), but some teams will still require these values hardcoded in the manifest (or in an admission controller) to enforce them server-side.
- If `runAsNonRoot` is true (as it should be), you may get the error `CreateContainerConfigError: Error: container has runAsNonRoot and image has non-numeric user (username), cannot verify user is non-root.` if your Dockerfile `USER` isn't an ID. Kubernetes wants it as an ID (not a friendly username like `node`) to ensure it's not just a user mapping to UID 0 (root). I think this can be avoided if you also hardcode the user in the manifest (`runAsUser`), but I haven't tested that.
- If you have over ~1,000 services in a namespace, maybe set `pod.spec.enableServiceLinks: false` to avoid minor container startup and TCP round-trip delays (thanks @e_k_anderson).
- You can likely avoid needing `pod.spec.containers.imagePullPolicy` because the defaults are smart and tend to do the right thing.
- `pod.spec.containers.securityContext.readOnlyRootFilesystem` is a good idea if possible, but usually doesn't work out-of-the-box with monoliths and traditional apps. YMMV.
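To make a few of the notes above concrete, here is a sketch of a stricter variant of the spec. It is an illustration under assumptions, not a recommended default: with the CPU limit commented out as in `pod.yaml`, the pod is classed as Burstable, so this variant sets CPU and memory limits equal to requests to reach Guaranteed. It also turns off service links and enables a read-only root filesystem. The `tmp` volume and the `/tmp` mount path are hypothetical; your app may need different (or more) writable paths.

```yaml
spec:
  enableServiceLinks: false          # skip injecting env vars for every Service in the namespace
  containers:
  - name: my-container-name
    image: my-image:tag
    resources:
      limits:
        memory: "500Mi"
        cpu: "1"                     # equal to the request, so QoS can be "Guaranteed"
      requests:
        memory: "500Mi"
        cpu: "1"
    securityContext:
      allowPrivilegeEscalation: false
      privileged: false
      readOnlyRootFilesystem: true   # app can only write to mounted volumes
    volumeMounts:
    - name: tmp                      # hypothetical scratch volume
      mountPath: /tmp                # assumes the app only needs to write here
  volumes:
  - name: tmp
    emptyDir: {}
```

Guaranteed QoS requires every container in the pod to have matching requests and limits for both CPU and memory, so any sidecars or initContainers would need the same treatment.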