Now that our service is deployed in the K8s cluster, we want to deploy a monitoring service that automatically watches the resources in the cluster. This blog is divided into two parts. The first part uses Prometheus to monitor the services in our K8s cluster: when the monitoring service finds a problem with a node or pod, it sends an alert as soon as possible so that our deployer can fix the issue quickly. The second part is a set of simple shell scripts that test the cluster's network.
For the K8s cluster, the monitoring is split into three parts: Node, Namespace, and Pod.
I. Node Monitoring
On the node side, we mainly monitor memory, CPU, disk, and inode usage, sending an alert when usage is too high. We also need to cover the NodeNotReady situation.
1. NodeMemorySpaceFillingUp
Monitoring the node’s memory usage, sending an alert when usage > 80%.
alert: NodeMemorySpaceFillingUp
expr: ((1 - (node_memory_MemAvailable_bytes{job="node-exporter"} / node_memory_MemTotal_bytes{job="node-exporter"}) * on(instance) group_left(nodename) (node_uname_info) > 0.8) * 100)
for: 5m
labels:
  cluster: critical
  type: node
annotations:
  description: Memory usage on `{{$labels.nodename}}`({{ $labels.instance }}) up to {{ printf "%.2f" $value }}%.
  summary: Node memory will be exhausted.
2. NodeCpuUtilisationHigh
Monitoring the node’s CPU usage, sending an alert when usage > 80%.
A typical expression derives utilisation from the node-exporter metric node_cpu_seconds_total (the share of time spent in non-idle modes):

alert: NodeCpuUtilisationHigh
expr: ((1 - avg by(instance) (irate(node_cpu_seconds_total{job="node-exporter",mode="idle"}[5m]))) * on(instance) group_left(nodename) (node_uname_info) > 0.8) * 100
for: 5m
labels:
  cluster: critical
  type: node
annotations:
  description: CPU utilisation on `{{$labels.nodename}}`({{ $labels.instance }}) up to {{ printf "%.2f" $value }}%.
  summary: Node CPU utilisation high.
3. NodeFilesystemAlmostOutOfSpace
Monitoring the node's disk usage, sending an alert when less than 10% of the space is left.
alert: NodeFilesystemAlmostOutOfSpace
expr: ((node_filesystem_avail_bytes{fstype!="",job="node-exporter"} / node_filesystem_size_bytes{fstype!="",job="node-exporter"} * 100 < 10 and node_filesystem_readonly{fstype!="",job="node-exporter"} == 0) * on(instance) group_left(nodename) (node_uname_info))
for: 5m
labels:
  cluster: critical
  type: node
annotations:
  description: Filesystem on `{{ $labels.device }}` at `{{$labels.nodename}}`({{ $labels.instance }}) has only {{ printf "%.2f" $value }}% available space left.
  summary: Node filesystem has less than 10% space left.
4. NodeFilesystemAlmostOutOfFiles
Monitoring the node's inode usage, sending an alert when less than 10% of the inodes are left.
alert: NodeFilesystemAlmostOutOfFiles
expr: ((node_filesystem_files_free{fstype!="",job="node-exporter"} / node_filesystem_files{fstype!="",job="node-exporter"} * 100 < 10 and node_filesystem_readonly{fstype!="",job="node-exporter"} == 0) * on(instance) group_left(nodename) (node_uname_info))
for: 5m
labels:
  cluster: critical
  type: node
annotations:
  description: Filesystem on `{{ $labels.device }}` at `{{$labels.nodename}}`({{ $labels.instance }}) has only {{ printf "%.2f" $value }}% available inodes left.
  summary: Node filesystem has less than 10% inodes left.
5. KubeNodeNotReady
Monitoring the node's state, sending an alert when a node is not ready.
alert: KubeNodeNotReady
expr: (kube_node_status_condition{condition="Ready",job="kube-state-metrics",status="true"} == 0)
for: 5m
labels:
  cluster: critical
  type: node
annotations:
  description: "{{ $labels.node }} has been unready for more than 15 minutes."
  summary: Node is not ready.
6. KubeNodePodsTooMuch
Monitoring the number of pods on each node. The maximum number of pods per node is 110, so we send an alert when usage > 80%.
alert: KubeNodePodsTooMuch
expr: (sum by(node) (kube_pod_info) * 100 / 110 > 80)
for: 5m
labels:
  cluster: critical
  type: node
annotations:
  description: Pods usage on `{{$labels.node}}` up to {{ printf "%.2f" $value }}%.
  summary: Node pods too much.
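All of the rules above live in an ordinary Prometheus rule group. A minimal sketch of what the surrounding file might look like, assuming rules are loaded via rule_files in prometheus.yml (the file and group names here are made up; with the prometheus-operator the same rules would go into a PrometheusRule resource instead):

# node-alerts.rules.yaml (hypothetical file name)
groups:
  - name: node.alerts
    rules:
      - alert: KubeNodeNotReady
        expr: kube_node_status_condition{condition="Ready",job="kube-state-metrics",status="true"} == 0
        for: 5m
        labels:
          cluster: critical
          type: node
        annotations:
          description: "{{ $labels.node }} has been unready for more than 15 minutes."
          summary: Node is not ready.
      # ...the other node alerts follow the same pattern.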
II. Namespace Monitoring
For each namespace there are three numbers to track: the limit, the request, and the actual usage. Usage comes from the following queries:

cpu: sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate) by (namespace)

memory: sum(node_namespace_pod_container:container_memory_working_set_bytes) by (namespace)
We can define memory and CPU limits for a namespace. I have not yet figured out how to read those configured values from Prometheus, so this needs more attention; a possible approach is sketched below. We need to increase the requests when request/limit > 80%.
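I have not verified this on our cluster yet, but kube-state-metrics exposes the configured requests and limits as metrics, so queries along these lines should return the per-namespace values (metric and label names assume kube-state-metrics v2):

# Configured CPU limits and requests per namespace (kube-state-metrics v2)
sum(kube_pod_container_resource_limits{resource="cpu"}) by (namespace)
sum(kube_pod_container_resource_requests{resource="cpu"}) by (namespace)

# Same idea for memory (values in bytes)
sum(kube_pod_container_resource_limits{resource="memory"}) by (namespace)
sum(kube_pod_container_resource_requests{resource="memory"}) by (namespace)

The namespace_cpu:kube_pod_container_resource_limits:sum recording rule used in the alerts below is essentially the first of these queries.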
1. NamespaceCpuUtilisationHigh
Monitoring the namespace’s CPU usage, sending an alert when usage > 90%.
alert: NamespaceCpuUtilisationHigh
expr: (sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate) by (namespace) / sum(namespace_cpu:kube_pod_container_resource_limits:sum) by (namespace) * 100 > 90)
for: 5m
labels:
  cluster: critical
  type: namespace
annotations:
  description: CPU utilisation on `{{$labels.namespace}}` up to {{ printf "%.2f" $value }}%.
  summary: Namespace CPU utilisation high.
2. NamespaceCpuUtilisationLow
Monitoring the namespace’s CPU usage, sending an alert when usage < 10%.
alert: NamespaceCpuUtilisationLow
expr: (sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate) by (namespace) / sum(namespace_cpu:kube_pod_container_resource_limits:sum) by (namespace) * 100 < 10)
for: 5m
labels:
  cluster: critical
  type: namespace
annotations:
  description: CPU utilisation on `{{$labels.namespace}}` as low as {{ printf "%.2f" $value }}%.
  summary: Namespace CPU underutilization.
3. NamespaceMemorySpaceFillingUp
Monitoring the namespace’s memory usage, sending an alert when usage > 90%.
alert: NamespaceMemorySpaceFillingUp
expr: (sum(node_namespace_pod_container:container_memory_working_set_bytes) by (namespace) / sum(namespace_memory:kube_pod_container_resource_limits:sum) by (namespace) * 100 > 90)
for: 5m
labels:
  cluster: critical
  type: namespace
annotations:
  description: Memory usage on `{{$labels.namespace}}` up to {{ printf "%.2f" $value }}%.
  summary: Namespace memory will be exhausted.
4. NamespaceMemorySpaceLow
Monitoring the namespace’s memory usage, sending an alert when usage < 10%.
alert: NamespaceMemorySpaceLow
expr: (sum(node_namespace_pod_container:container_memory_working_set_bytes) by (namespace) / sum(namespace_memory:kube_pod_container_resource_limits:sum) by (namespace) * 100 < 10)
for: 5m
labels:
  cluster: critical
  type: namespace
annotations:
  description: Memory usage on `{{$labels.namespace}}` as low as {{ printf "%.2f" $value }}%.
  summary: Under-utilized namespace memory.
5. KubePodNotReady
Monitoring the pods' states, sending an alert if a pod stays in a not-ready state for more than 15 minutes.
alert: KubePodNotReady
expr: (sum by(namespace, pod) (max by(namespace, pod) (kube_pod_status_phase{job="kube-state-metrics",namespace=~".*",phase=~"Pending|Unknown"}) * on(namespace, pod) group_left(owner_kind) topk by(namespace, pod) (1, max by(namespace, pod, owner_kind) (kube_pod_owner{owner_kind!="Job"}))) > 0)
for: 5m
labels:
  cluster: critical
  type: namespace
annotations:
  description: Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-ready state for longer than 15 minutes.
  summary: Pod has been in a non-ready state for more than 15 minutes.
6. KubeContainerWaiting
Monitoring the pods' containers, sending an alert if a container stays in the waiting state for more than 15 minutes.
alert: KubeContainerWaiting
expr: (sum by(namespace, pod, container) (kube_pod_container_status_waiting_reason{job="kube-state-metrics",namespace=~".*"}) > 0)
for: 5m
labels:
  cluster: critical
  type: namespace
annotations:
  description: Pod {{ $labels.namespace }}/{{ $labels.pod }} container {{$labels.container}} has been in waiting state for longer than 15 minutes.
  summary: Pod container waiting longer than 15 minutes.
7. PodRestart
This monitor sends an alert if any pod in the kube-system namespace restarts.
Pods in the kube-system namespace are cluster-level components, such as our log collection tools, CoreDNS, and so on. So if any pod in this namespace restarts, we need to pay close attention, because the cluster may have an issue.
alert: PodRestart
expr: (floor(increase(kube_pod_container_status_restarts_total{namespace="kube-system"}[1m])) > 0)
for: 5m
labels:
  cluster: critical
  type: namespace
annotations:
  description: Pod {{ $labels.namespace }}/{{ $labels.pod }} restarted {{ $value }} times in the last 1 minute.
  summary: Pod restarted in the last 1 minute.
8. PrometheusOom
Prometheus itself can go down, so we add a monitor for Prometheus: if its memory usage reaches 90% of its limit, something may be wrong, so we send an alert.
alert: PrometheusOom
expr: (container_memory_working_set_bytes{container="prometheus"} / container_spec_memory_limit_bytes{container="prometheus"} * 100 > 90)
for: 5m
labels:
  cluster: critical
  type: namespace
annotations:
  description: Memory usage on `Prometheus` up to {{ printf "%.2f" $value }}%.
  summary: Prometheus will be OOM.
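Before loading any of these rule files into Prometheus, it is worth validating them. Assuming the promtool binary that ships with Prometheus is available (the file name below is just a placeholder):

# Validate rule file syntax before (re)loading Prometheus
promtool check rules node-alerts.rules.yaml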
III. Network Monitoring
We have run into cases where something low-level on a machine goes wrong: the host's network is normal, but every pod on that machine has no network. To cover this case we added some simple monitoring to keep the service stable; I will cover the details of this monitoring in the next blog.