Monitoring a K8s Cluster with Prometheus

All of our services currently run on a K8s cluster, and we wanted a monitoring service that keeps an automatic eye on the cluster's resources. This article has two parts: the first uses Prometheus to monitor the K8s cluster and alert as soon as a node or container misbehaves, so problems are found and fixed quickly; the second presents a simple shell script for testing the cluster network.

For a K8s cluster, we monitor three dimensions: Node, Namespace, and Pod.

I. Node Monitoring

At the node level we mainly monitor memory usage, CPU usage, and disk and inode usage, alerting when any of them runs too high. We also watch for nodes going NotReady.

1. NodeMemorySpaceFillingUp

Monitor node memory usage and alert when it exceeds 80%. (Multiplying by `node_uname_info` is only a label join: that metric's value is 1, and `on(instance) group_left(nodename)` copies the `nodename` label onto the result; the same trick appears in the other node alerts below.)

alert: NodeMemorySpaceFillingUp
expr: ((1 - (node_memory_MemAvailable_bytes{job="node-exporter"} / node_memory_MemTotal_bytes{job="node-exporter"}) * on(instance) group_left(nodename) (node_uname_info) > 0.8) * 100)
for: 5m
labels:
  cluster: critical
  type: node
annotations:
  description: Memory usage on `{{$labels.nodename}}`({{ $labels.instance }}) up to {{ printf "%.2f" $value }}%.
  summary: Node memory will be exhausted.

2. NodeCpuUtilisationHigh

Monitor node CPU usage and alert when it exceeds 80%.

alert: NodeCpuUtilisationHigh
expr: ((1 - avg by(instance) (rate(node_cpu_seconds_total{job="node-exporter",mode="idle"}[5m])) * on(instance) group_left(nodename) (node_uname_info) > 0.8) * 100)
for: 5m
labels:
  cluster: critical
  type: node
annotations:
  description: CPU usage on `{{$labels.nodename}}`({{ $labels.instance }}) up to {{ printf "%.2f" $value }}%.
  summary: Node CPU utilisation high.

3. NodeFilesystemAlmostOutOfSpace

Monitor node disk usage and alert when less than 10% of space remains.

alert: NodeFilesystemAlmostOutOfSpace
expr: ((node_filesystem_avail_bytes{fstype!="",job="node-exporter"} / node_filesystem_size_bytes{fstype!="",job="node-exporter"} * 100 < 10 and node_filesystem_readonly{fstype!="",job="node-exporter"} == 0) * on(instance) group_left(nodename) (node_uname_info))
for: 5m
labels:
  cluster: critical
  type: node
annotations:
  description: Filesystem on `{{ $labels.device }}` at `{{$labels.nodename}}`({{ $labels.instance }}) has only {{ printf "%.2f" $value }}% available space left.
  summary: Node filesystem has less than 10% space left.

4. NodeFilesystemAlmostOutOfFiles

Monitor node inode usage and alert when less than 10% of inodes remain.

alert: NodeFilesystemAlmostOutOfFiles
expr: ((node_filesystem_files_free{fstype!="",job="node-exporter"} / node_filesystem_files{fstype!="",job="node-exporter"} * 100 < 10 and node_filesystem_readonly{fstype!="",job="node-exporter"} == 0) * on(instance) group_left(nodename) (node_uname_info))
for: 5m
labels:
  cluster: critical
  type: node
annotations:
  description: Filesystem on `{{ $labels.device }}` at `{{$labels.nodename}}`({{ $labels.instance }}) has only {{ printf "%.2f" $value }}% available inodes left.
  summary: Node filesystem has less than 10% inodes left.

5. KubeNodeNotReady

Monitor node status and alert when a node stays NotReady.

alert: KubeNodeNotReady
expr: (kube_node_status_condition{condition="Ready",job="kube-state-metrics",status="true"} == 0)
for: 15m
labels:
  cluster: critical
  type: node
annotations:
  description: '{{ $labels.node }} has been unready for more than 15 minutes.'
  summary: Node is not ready.

6. KubeNodePodsTooMuch

Monitor the number of pods on each node. We configure each node to run at most 110 pods (the kubelet default `maxPods`), and alert when usage of that cap exceeds 80%. A variant that reads the capacity from a metric instead of hard-coding it is sketched after the rule.

alert: KubeNodePodsTooMuch
expr: (sum by(node) (kube_pod_info) * 100 / 110 > 80)
for: 5m
labels:
  cluster: critical
  type: node
annotations:
  description: Pods usage on `{{$labels.node}}` up to {{ printf "%.2f" $value }}%.
  summary: Node pods too much.
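
As a sketch only (not what we run): if your kube-state-metrics is recent enough to expose `kube_node_status_capacity`, the pod capacity can come from that metric instead of the hard-coded 110. The metric name and its `resource` label vary across kube-state-metrics versions (older ones used `kube_node_status_capacity_pods`), so verify before use:

expr: (sum by(node) (kube_pod_info) * 100 / sum by(node) (kube_node_status_capacity{resource="pods"}) > 80)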

II. Namespace Monitoring

For a namespace, CPU and memory each come with three values: limit, request, and usage.

My understanding of them:

  • limit: the most resources the pods may claim (Pod dimension)

    cpu: namespace_cpu:kube_pod_container_resource_limits:sum

    memory: namespace_memory:kube_pod_container_resource_limits:sum

  • request: the resources the pods have claimed (Pod dimension)

    cpu: namespace_cpu:kube_pod_container_resource_requests:sum

    memory: namespace_memory:kube_pod_container_resource_requests:sum

  • usage: the resources actually in use (Pod dimension)

    cpu: sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate) by (namespace)

    memory: sum(node_namespace_pod_container:container_memory_working_set_bytes) by (namespace)

In Kuboard we can set a namespace's memory and CPU limits; we haven't yet figured out how to read those values from Prometheus and will keep looking into it.
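
One avenue worth checking (an assumption, not something we have verified): when namespace limits are backed by a ResourceQuota object, kube-state-metrics exposes them through `kube_resourcequota`, along these lines:

# hard CPU request quota per namespace; label values are assumptions to verify
kube_resourcequota{job="kube-state-metrics", type="hard", resource="requests.cpu"}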

When request/limit > 80%, the namespace is probably running short on resources and should be enlarged; a sketch of such an alert follows.
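
We do not run this rule ourselves; it is a minimal sketch of the request/limit check built from the recording rules listed above, with a hypothetical alert name and the 80% threshold from the previous paragraph (CPU shown; the memory variant is analogous):

alert: NamespaceCpuRequestsNearLimits
expr: (sum(namespace_cpu:kube_pod_container_resource_requests:sum) by (namespace) / sum(namespace_cpu:kube_pod_container_resource_limits:sum) by (namespace) * 100 > 80)
for: 5m
labels:
  cluster: critical
  type: namespace
annotations:
  description: CPU requests on `{{$labels.namespace}}` have reached {{ printf "%.2f" $value }}% of limits.
  summary: Namespace requests are close to limits.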

1. NamespaceCpuUtilisationHigh

Monitor namespace CPU usage and alert when it exceeds 90%.

alert: NamespaceCpuUtilisationHigh
expr: (sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate) by (namespace) / sum(namespace_cpu:kube_pod_container_resource_limits:sum) by (namespace) * 100 > 90)
for: 5m
labels:
  cluster: critical
  type: namespace
annotations:
  description: CPU utilisation on `{{$labels.namespace}}` up to {{ printf "%.2f" $value }}%.
  summary: Namespace CPU utilisation high.

2. NamespaceCpuUtilisationLow

Monitor namespace CPU usage and alert when it falls below 10%.

alert: NamespaceCpuUtilisationLow
expr: (sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate) by (namespace) / sum(namespace_cpu:kube_pod_container_resource_limits:sum) by (namespace) * 100 < 10)
for: 5m
labels:
  cluster: critical
  type: namespace
annotations:
  description: CPU utilisation on `{{$labels.namespace}}` as low as {{ printf "%.2f" $value }}%.
  summary: Namespace CPU underutilization.

3. NamespaceMemorySpaceFillingUp

Monitor namespace memory usage and alert when it exceeds 90%.

alert: NamespaceMemorySpaceFillingUp
expr: (sum(node_namespace_pod_container:container_memory_working_set_bytes) by (namespace) / sum(namespace_memory:kube_pod_container_resource_limits:sum) by (namespace) * 100 > 90)
for: 5m
labels:
  cluster: critical
  type: namespace
annotations:
  description: Memory usage on `{{$labels.namespace}}` up to {{ printf "%.2f" $value }}%.
  summary: Namespace memory will be exhausted.

4. NamespaceMemorySpaceLow

Monitor namespace memory usage and alert when it falls below 10%.

alert: NamespaceMemorySpaceLow
expr: (sum(node_namespace_pod_container:container_memory_working_set_bytes) by (namespace) / sum(namespace_memory:kube_pod_container_resource_limits:sum) by (namespace) * 100 < 10)
for: 5m
labels:
  cluster: critical
  type: namespace
annotations:
  description: Memory usage on `{{$labels.namespace}}` as low as {{ printf "%.2f" $value }}%.
  summary: Under-utilized namespace memory.

5. KubePodNotReady

Monitor pod status and alert when a pod stays not-ready for 15 minutes.

alert: KubePodNotReady
expr: (sum by(namespace, pod) (max by(namespace, pod) (kube_pod_status_phase{job="kube-state-metrics",namespace=~".*",phase=~"Pending|Unknown"}) * on(namespace, pod) group_left(owner_kind) topk by(namespace, pod) (1, max by(namespace, pod, owner_kind) (kube_pod_owner{owner_kind!="Job"}))) > 0)
for: 15m
labels:
  cluster: critical
  type: namespace
annotations:
  description: Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-ready state for longer than 15 minutes.
  summary: Pod has been in a non-ready state for more than 15 minutes.

6. KubeContainerWaiting

Monitor container status and alert when a container stays in the waiting state for 15 minutes.

alert: KubeContainerWaiting
expr: (sum by(namespace, pod, container) (kube_pod_container_status_waiting_reason{job="kube-state-metrics",namespace=~".*"}) > 0)
for: 15m
labels:
  cluster: critical
  type: namespace
annotations:
  description: Pod {{ $labels.namespace }}/{{ $labels.pod }} container {{$labels.container}} has been in waiting state for longer than 15 minutes.
  summary: Pod container waiting longer than 15 minutes.

7. PodRestart

This alert fires when any pod in the kube-system namespace restarts.

kube-system hosts many cluster-level pods, such as our log collector Fluentd and CoreDNS, so a container restarting there is a warning sign that the cluster itself may be in trouble.

alert: PodRestart
expr: (floor(increase(kube_pod_container_status_restarts_total{namespace="kube-system"}[1m])) > 0)
for: 5m
labels:
  cluster: critical
  type: namespace
annotations:
  description: Pod {{ $labels.namespace }}/{{ $labels.pod }} restarted {{ $value }} times in the last minute.
  summary: Pod restarted in the last minute.

8. PrometheusOom

Prometheus itself can fall over, so we also watch Prometheus: if its memory usage reaches 90% of its limit, it may be about to fail, and we send an alert.

alert: PrometheusOom
expr: (container_memory_working_set_bytes{container="prometheus"} / container_spec_memory_limit_bytes{container="prometheus"} * 100 > 90)
for: 5m
labels:
  cluster: critical
  type: namespace
annotations:
  description: Memory usage on `Prometheus` up to {{ printf "%.2f" $value }}%.
  summary: Prometheus will be OOM.

III. Network Monitoring

We once had CoreDNS die on a single machine in production: the machine's own network was fine, yet every pod on it lost network connectivity. To guard against this we added some simple monitoring as well; the rest of this article describes it in detail.
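
Before the script-based check described below, a Prometheus-side guard can also help (a sketch, not the approach this article settles on). The `job` label value depends on how your ServiceMonitor names the CoreDNS target, so treat `coredns` as an assumption, as is the `type: network` label:

alert: CoreDnsDown
expr: (up{job="coredns"} == 0)
for: 5m
labels:
  cluster: critical
  type: network
annotations:
  description: CoreDNS instance {{ $labels.instance }} has been down for more than 5 minutes.
  summary: CoreDNS is down.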

