What is Pod Priority?
Pod priority is a Kubernetes scheduling feature that lets the scheduler compare pods by a numeric priority value when deciding which pod to schedule first.
Let's discuss the two components of Pod Priority:
- Pod Preemption
- Pod Priority Class
Pod Preemption: When no node has enough free resources and higher-priority pods are waiting in the scheduling queue, Kubernetes can preempt (evict) lower-priority pods from a node to make room for them.
Pod Priority Class: Pods get their priority from a PriorityClass, a non-namespaced (cluster-wide) object that holds an integer value of 1,000,000,000 or lower. This value determines the priority of a pod relative to other pods. PriorityClass helps ensure that production or mission-critical workloads are allocated node resources ahead of non-critical workloads. The scheduler uses the value to order the scheduling queue and, when resources run out, to evict lower-priority pods so that high-priority pods can run. We assign a PriorityClass to a pod by setting the priorityClassName field in the pod spec to the name of the PriorityClass.
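To make the queue-ordering idea concrete, here is a minimal Python sketch. It is not the scheduler's code; it only mimics the behaviour that pending pods with a higher priority value are tried first, with ties going to the pod that has waited longer. The PendingPod class, the order_queue function, and the pod names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class PendingPod:
    name: str
    priority: int   # value taken from the pod's PriorityClass (0 if none)
    queued_at: int  # illustrative arrival order; smaller means it arrived earlier


def order_queue(pods):
    """Return pods with higher priority first; earlier arrivals win ties."""
    return sorted(pods, key=lambda p: (-p.priority, p.queued_at))


# Hypothetical pending pods, listed in arrival order.
queue = [
    PendingPod("batch-job", 0, 1),
    PendingPod("payments", 1_000_000, 2),
    PendingPod("checkout", 1_000_000, 3),
]

ordered_names = [p.name for p in order_queue(queue)]
print(ordered_names)
```

Even though batch-job arrived first, both pods with the higher value move ahead of it in this sketch.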
By default, Kubernetes ships with two priority classes:
- system-node-critical: This class has a value of 2000001000. Pods like etcd, kube-apiserver, and the controller manager use this priority class.
- system-cluster-critical: This class has a value of 2000000000. Add-on pods like coredns, the calico controller, metrics server, etc. use this priority class.
Let's get hands-on:
To test priority classes, I am using minikube.
Let's start a minikube cluster:
minikube start --memory 1700 --cpus 2
Here I am starting minikube with a memory limit of 1700MB, so the total usable memory for pods is only around 1Gi. This is done on purpose to simulate a resource crunch, so that we can see high-priority pods being favored over other pods.
Below I have created a PriorityClass named high-priority:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  # PriorityClass is cluster-scoped, so no namespace is set here
  name: high-priority
value: 1000000
globalDefault: false
description: "Used for High Priority Pods"
Note: The value determines the priority. It can be 1,000,000,000 (one billion) or lower. The larger the number, the higher the priority.
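The range rule in the note can be written down as a small check. This is only a sketch: the is_valid_user_priority helper is a hypothetical name, and it assumes the documented range for user-defined classes, which is any 32-bit integer up to one billion. Values above that are reserved for the built-in system classes.

```python
MAX_USER_PRIORITY = 1_000_000_000  # upper bound for user-defined PriorityClass values
INT32_MIN = -2_147_483_648         # values are 32-bit integers, so this is the floor


def is_valid_user_priority(value):
    """Return True if value is an integer within the user-defined range."""
    return isinstance(value, int) and INT32_MIN <= value <= MAX_USER_PRIORITY


high_priority_ok = is_valid_user_priority(1_000_000)        # the value used in this post
system_value_ok = is_valid_user_priority(2_000_000_000)     # system-cluster-critical's value
print(high_priority_ok, system_value_ok)
```

The value used for high-priority passes the check, while a value in the system range does not.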
Now let's create three pods on a node with roughly 1Gi (1024Mi) of memory available in total.
- no-priority-app (memory: 500Mi): 1024Mi - 500Mi = 524Mi remaining.
- high-priority-app1 (memory: 500Mi): 524Mi - 500Mi = 24Mi remaining.
- high-priority-app2 (memory: 500Mi): will cause preemption as not enough memory is available.
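The memory arithmetic for the three pods can be checked with a few lines of Python. This is a rough sketch that assumes exactly 1Gi (1024Mi) is available to pods; the real figure on a minikube node will differ somewhat, and the variable names are made up for illustration.

```python
allocatable_mi = 1024  # assumption: about 1Gi of memory is available for pods

# (pod name, memory request in Mi), in creation order
requests = [
    ("no-priority-app", 500),
    ("high-priority-app1", 500),
    ("high-priority-app2", 500),
]

free = allocatable_mi
results = []
for name, request_mi in requests:
    fits = request_mi <= free
    if fits:
        free -= request_mi  # the pod is placed and its request is reserved
    results.append((name, fits, free))
    print(f"{name}: fits={fits}, free after={free}Mi")
```

The third pod does not fit in the remaining 24Mi, which is the situation that triggers preemption.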
Creating the first pod, no-priority-app:
apiVersion: v1
kind: Pod
metadata:
  name: no-priority-app
  namespace: default
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: 500m
        memory: 500Mi
      limits:
        cpu: 500m
        memory: 500Mi
Creating the second pod, high-priority-app1:
apiVersion: v1
kind: Pod
metadata:
  name: high-priority-app1
  namespace: default
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: 500m
        memory: 500Mi
      limits:
        cpu: 500m
        memory: 500Mi
  priorityClassName: high-priority
In the above cases, we created two pods: one has a PriorityClass assigned and the other has none. Each requests 500Mi of memory, so the node's memory is now close to being exhausted. Now let's try to create another pod, high-priority-app2.
Creating the third pod, high-priority-app2:
apiVersion: v1
kind: Pod
metadata:
  name: high-priority-app2
  namespace: default
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: 500m
        memory: 500Mi
      limits:
        cpu: 500m
        memory: 500Mi
  priorityClassName: high-priority
In this example, a resource shortage occurs as soon as we attempt to create the new pod (high-priority-app2), because it requests 500Mi of memory and only about 24Mi is free. However, because it is a high-priority pod, the scheduler preempts a pod with lower or no priority (here no-priority-app, which defaults to priority 0) to make room, and high-priority-app2 then gets the memory it needs to run.
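The outcome can be reproduced with a deliberately simplified Python model of preemption on a single node. The real scheduler weighs more factors when picking victims, so treat this only as a sketch: the preempt_for function is hypothetical, it looks at memory alone, and it assumes 1024Mi of capacity and a priority of 0 for the pod without a PriorityClass.

```python
def preempt_for(incoming, running, capacity_mi):
    """Return names of pods to evict so `incoming` fits, or None if it cannot fit.

    Pods are (name, priority, memory_mi) tuples. Only pods with a strictly
    lower priority than the incoming pod are considered, lowest first.
    """
    _, priority, need = incoming
    free = capacity_mi - sum(mem for _, _, mem in running)
    candidates = sorted((p for p in running if p[1] < priority), key=lambda p: p[1])
    victims = []
    for victim_name, _, victim_mem in candidates:
        if free >= need:
            break  # enough room already, stop evicting
        victims.append(victim_name)
        free += victim_mem
    return victims if free >= need else None


running = [
    ("no-priority-app", 0, 500),
    ("high-priority-app1", 1_000_000, 500),
]
victims = preempt_for(("high-priority-app2", 1_000_000, 500), running, 1024)
print(victims)
```

In this model high-priority-app1 is never a candidate, because its priority equals that of the incoming pod.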
Thanks for reading. Follow more blogs on Cloudsbaba.
References:
https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/