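Monitoring the resources in a Kubernetes cluster lets us see how the cluster is running at any moment and act on it, for example by scaling out or scaling in. In Kubernetes, resource monitoring is done mainly with Heapster in combination with a few other components.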

1. Architecture

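Heapster is part of the Kubernetes project. Its main goal is basic monitoring of a Kubernetes cluster, and combined with tools such as cAdvisor it covers cluster monitoring well. cAdvisor is already built into the kubelet binary; once a Kubernetes cluster is deployed, cAdvisor listens on port 4194 by default and provides a simple UI: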

cAdvisor.png

Resource usage of a single container:

cAdvisor-2.png

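Like any ordinary application, Heapster runs in the cluster as a Pod. It obtains each Node's usage data from the kubelet process, and the kubelet in turn gets that data from its built-in cAdvisor. Heapster then groups and aggregates the data by Pod and the associated labels, and finally writes it to a storage backend. The backend is configurable; the most popular choice in the open-source community is InfluxDB (with Grafana as the front end), and the other supported backends are listed in the Heapster sink configuration documentation. The overall monitoring architecture looks like this: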

monitoring-architecture.png
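
The group-and-aggregate step described above can be illustrated with a small Python sketch. This is not Heapster's actual code: the ContainerSample type, its field names, and both helper functions are hypothetical, chosen only to show how per-container readings could be summed per Pod and per label value before being written to a backend.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerSample:
    """One per-container reading (hypothetical shape, for illustration only)."""
    node: str
    pod: str
    container: str
    cpu_millicores: int
    memory_bytes: int
    labels: tuple = ()  # Pod labels as (key, value) pairs


def _sum_by(samples, key_fn):
    """Sum CPU and memory over all samples that share the same key."""
    totals = defaultdict(lambda: {"cpu_millicores": 0, "memory_bytes": 0})
    for sample in samples:
        key = key_fn(sample)
        if key is None:
            continue  # sample does not belong to any group for this key
        totals[key]["cpu_millicores"] += sample.cpu_millicores
        totals[key]["memory_bytes"] += sample.memory_bytes
    return dict(totals)


def aggregate_by_pod(samples):
    """Roll container readings up into one record per Pod."""
    return _sum_by(samples, lambda s: s.pod)


def aggregate_by_label(samples, label_key):
    """Roll container readings up per value of one Pod label, e.g. k8s-app."""
    return _sum_by(samples, lambda s: dict(s.labels).get(label_key))
```

Calling aggregate_by_label(samples, "k8s-app") on readings from several Pods that carry the same k8s-app label yields one total per application, which is the kind of view a dashboard typically plots.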

2. Deployment in Practice

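We use the most popular combination, Heapster + InfluxDB + Grafana, for this walkthrough. Most of what is needed, including the YAML files for deployment, can be found in the Heapster project. If you can reach Google's registry (gcr.io), the whole exercise is fairly easy; otherwise pulling the images may be troublesome.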

2.1 Deploying Heapster

heapster-deployment.yaml:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: heapster
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: heapster
    spec:
      containers:
      - name: heapster
        image: gcr.io/google_containers/heapster-amd64:v1.3.0-beta.1
        imagePullPolicy: IfNotPresent
        command:
        - /heapster
        - --source=kubernetes:https://kubernetes.default
        - --sink=influxdb:http://monitoring-influxdb:8086

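One thing to watch for here: if your Kubernetes cluster was deployed without certificates and only serves plain HTTP, Heapster fails to start because it cannot find the certificates. In that case, change the --source flag above to: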

- --source=kubernetes:http://192.168.56.101:8080/?inClusterConfig=false

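That is, point it at your HTTP address (replace the IP and port above with the address and port of your own cluster, i.e. the address printed by kubectl cluster-info). Alternatively, you can generate certificates yourself; see https://github.com/kubernetes/heapster/blob/master/docs/source-configuration.md#current-sources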
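
To make the two forms of the --source flag concrete, here is a minimal Python sketch that assembles the flag from an API server address. The function name heapster_source_flag and its parameters are hypothetical helpers, not part of Heapster; only the flag format itself comes from the examples above.

```python
from urllib.parse import urlencode, urlsplit


def heapster_source_flag(api_server, in_cluster_config=True):
    """Build the Heapster --source flag for a given API server address.

    api_server        -- e.g. "https://kubernetes.default" or "http://192.168.56.101:8080"
    in_cluster_config -- False for a plain-HTTP cluster without certificates
    """
    parts = urlsplit(api_server)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an http(s) address: {api_server!r}")
    base = f"{parts.scheme}://{parts.netloc}"
    if in_cluster_config:
        # Default case: Heapster uses the in-cluster service account and certificates.
        return f"--source=kubernetes:{base}"
    # Certificate-less cluster: tell Heapster not to look for in-cluster credentials.
    query = urlencode({"inClusterConfig": "false"})
    return f"--source=kubernetes:{base}/?{query}"


if __name__ == "__main__":
    print(heapster_source_flag("https://kubernetes.default"))
    print(heapster_source_flag("http://192.168.56.101:8080", in_cluster_config=False))
```

The address passed in should be the one reported by kubectl cluster-info for your own cluster; the IP used here is just the example value from this article.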

heapster-service.yaml:

apiVersion: v1
kind: Service
metadata:
  labels:
    task: monitoring
    # For use as a Cluster add-on (https://github.com/kubernetes/kubernetes/tree/master/cluster/addons)
    # If you are NOT using this as an addon, you should comment out this line.
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: Heapster
  name: heapster
  namespace: kube-system
spec:
  ports:
  - port: 80
    targetPort: 8082
  selector:
    k8s-app: heapster

2.2 Deploying InfluxDB

influxdb-deployment.yaml:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: monitoring-influxdb
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: influxdb
    spec:
      containers:
      - name: influxdb
        image: gcr.io/google_containers/heapster-influxdb-amd64:v1.1.1
        volumeMounts:
        - mountPath: /data
          name: influxdb-storage
      volumes:
      - name: influxdb-storage
        emptyDir: {}

influxdb-service.yaml:

apiVersion: v1
kind: Service
metadata:
  labels:
    task: monitoring
    # For use as a Cluster add-on (https://github.com/kubernetes/kubernetes/tree/master/cluster/addons)
    # If you are NOT using this as an addon, you should comment out this line.
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: monitoring-influxdb
  name: monitoring-influxdb
  namespace: kube-system
spec:
  ports:
  - port: 8086
    targetPort: 8086
  selector:
    k8s-app: influxdb

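InfluxDB listens on port 8086 by default, and both the username and the password are "root". Because InfluxDB runs in its own Pod in this setup, Grafana reaches it through the Service name, at http://monitoring-influxdb:8086 (the INFLUXDB_HOST value in the Grafana deployment below); http://localhost:8086 only works when Grafana and InfluxDB run in the same Pod.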
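
As an illustration of talking to InfluxDB on port 8086, the sketch below builds a URL for the InfluxDB 1.x HTTP /query endpoint. Several things here are assumptions rather than facts from this article: the helper name influx_query_url is hypothetical, the database name k8s and the measurement cpu/usage_rate are what Heapster's InfluxDB sink commonly uses but should be checked against your own instance, and the code only constructs the URL without sending a request.

```python
from urllib.parse import urlencode


def influx_query_url(base_url, database, query, username=None, password=None):
    """Return a GET URL for the InfluxDB 1.x /query endpoint.

    The query-string parameters are db (database), q (InfluxQL statement),
    and optionally u / p (username / password).
    """
    params = {"db": database, "q": query}
    if username is not None:
        params["u"] = username
        params["p"] = password or ""
    return f"{base_url.rstrip('/')}/query?{urlencode(params)}"


if __name__ == "__main__":
    # Assumed database and measurement names; verify them with SHOW DATABASES
    # and SHOW MEASUREMENTS on your own InfluxDB before relying on them.
    url = influx_query_url(
        "http://monitoring-influxdb:8086",
        "k8s",
        'SELECT mean("value") FROM "cpu/usage_rate" WHERE time > now() - 5m',
        username="root",
        password="root",
    )
    print(url)
```

The resulting URL can be fetched from inside the cluster with any HTTP client; from outside the cluster the Service would first have to be exposed or port-forwarded.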

2.3 Deploying Grafana

grafana-deployment.yaml:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: monitoring-grafana
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: grafana
    spec:
      containers:
      - name: grafana
        image: gcr.io/google_containers/heapster-grafana-amd64:v4.0.2
        ports:
          - containerPort: 3000
            protocol: TCP
        volumeMounts:
        - mountPath: /var
          name: grafana-storage
        env:
        - name: INFLUXDB_HOST
          value: monitoring-influxdb
        - name: GRAFANA_PORT
          value: "3000"
          # The following env variables are required to make Grafana accessible via
          # the kubernetes api-server proxy. On production clusters, we recommend
          # removing these env variables, setup auth for grafana, and expose the grafana
          # service using a LoadBalancer or a public IP.
        - name: GF_AUTH_BASIC_ENABLED
          value: "false"
        - name: GF_AUTH_ANONYMOUS_ENABLED
          value: "true"
        - name: GF_AUTH_ANONYMOUS_ORG_ROLE
          value: Admin
        - name: GF_SERVER_ROOT_URL
          # If you're only using the API Server proxy, set this value instead:
          # value: /api/v1/proxy/namespaces/kube-system/services/monitoring-grafana/
          value: /
      volumes:
      - name: grafana-storage
        emptyDir: {}

grafana-service.yaml:

apiVersion: v1
kind: Service
metadata:
  labels:
    # For use as a Cluster add-on (https://github.com/kubernetes/kubernetes/tree/master/cluster/addons)
    # If you are NOT using this as an addon, you should comment out this line.
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: monitoring-grafana
  name: monitoring-grafana
  namespace: kube-system
spec:
  # In a production setup, we recommend accessing Grafana through an external Loadbalancer
  # or through a public IP.
  # type: LoadBalancer
  # You could also use NodePort to expose the service at a randomly-generated port
  type: NodePort
  ports:
  - port: 80
    targetPort: 3000
  selector:
    k8s-app: grafana

The upstream Grafana service file uses LoadBalancer by default; if your cluster does not support it, you can use NodePort instead, as in the file above.

Once all three applications are deployed, Grafana can be opened in a web browser:

Grafana.png

If you have kubernetes-dashboard installed, you will notice that the dashboard now shows resource usage graphs as well.

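In practice we often do not use a display tool such as Grafana at all; instead we fetch the monitoring data through the InfluxDB API and show it in our own system. A later article will show how to fetch this data through the Heapster API.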
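
For the approach of reading monitoring data from the InfluxDB API and displaying it in your own system, the following sketch flattens an InfluxDB 1.x /query JSON response into plain rows. The function name rows_from_influx_response is hypothetical, and the sample payload in the demo is made up to match the documented response layout (results, series, columns, values); it is not real output from a cluster.

```python
import json


def rows_from_influx_response(body):
    """Flatten an InfluxDB 1.x /query JSON body into a list of dict rows.

    Each row maps column names to values and also carries the measurement
    name and the series tags, so it can be fed directly to a UI table.
    """
    payload = json.loads(body)
    rows = []
    for result in payload.get("results", []):
        if "error" in result:
            raise RuntimeError(f"InfluxDB query failed: {result['error']}")
        for series in result.get("series", []):
            columns = series["columns"]
            for values in series.get("values", []):
                row = dict(zip(columns, values))
                row["measurement"] = series["name"]
                row["tags"] = series.get("tags", {})
                rows.append(row)
    return rows


if __name__ == "__main__":
    # Illustrative payload only; the numbers and tag values are invented.
    sample = json.dumps({
        "results": [{
            "series": [{
                "name": "cpu/usage_rate",
                "tags": {"pod_name": "heapster-demo"},
                "columns": ["time", "mean"],
                "values": [["2017-01-01T00:00:00Z", 12.5]],
            }]
        }]
    })
    for row in rows_from_influx_response(sample):
        print(row)
```

Keeping the tags in a separate key avoids clashes when a tag and a column happen to share a name.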

References:

  1. Resource Usage Monitoring
  2. https://github.com/kubernetes/heapster