1. Prometheus 简介

以下是网上看到的简介:

Prometheus 是一套开源的系统监控报警框架。它启发于 Google 的 borgmon 监控系统,由工作在 SoundCloud 的 google 前员工在 2012 年创建,作为社区开源项目进行开发,并于 2015 年正式发布。2016 年,Prometheus 正式加入 Cloud Native Computing Foundation,成为受欢迎度仅次于 Kubernetes 的项目。

作为新一代的监控框架,Prometheus 具有以下特点:

  • 强大的多维度数据模型:
  • 时间序列数据通过 metric 名和键值对来区分。
  • 所有的 metrics 都可以设置任意的多维标签。
  • 数据模型更随意,不需要刻意设置为以点分隔的字符串。
  • 可以对数据模型进行聚合,切割和切片操作。
  • 支持双精度浮点类型,标签可以设为全 unicode。
  • 灵活而强大的查询语句(PromQL):在同一个查询语句,可以对多个 metrics 进行乘法、加法、连接、取分数位等操作。
  • 易于管理: Prometheus server 是一个单独的二进制文件,可直接在本地工作,不依赖于分布式存储。
  • 高效:平均每个采样点仅占 3.5 bytes,且一个 Prometheus server 可以处理数百万的 metrics。
  • 使用 pull 模式采集时间序列数据,这样不仅有利于本机测试而且可以避免有问题的服务器推送坏的 metrics。
  • 可以采用 push gateway 的方式把时间序列数据推送至 Prometheus server 端。
  • 可以通过服务发现或者静态配置去获取监控的 targets。
  • 有多种可视化图形界面。
  • 易于伸缩。
    需要指出的是,由于数据采集可能会有丢失,所以 Prometheus 不适用对采集数据要 100% 准确的情形。但如果用于记录时间序列数据,Prometheus 具有很大的查询优势,此外,Prometheus 适用于微服务的体系架构。

2. Prometheus部署

先生成prometheus的命名空间和prometheus配置文件:
prometheus_configmap.yaml:

---
apiVersion: v1
kind: Namespace
metadata:
  name: prometheus

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: prometheus
data:
  prometheus.yml: |
    global:
      scrape_interval:     15s
      evaluation_interval: 15s
    scrape_configs:

    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https

    - job_name: 'kubernetes-nodes'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics

    - job_name: kubernetes-node-exporters
      honor_timestamps: true
      scrape_interval: 30s
      scrape_timeout: 30s
      metrics_path: /metrics
      scheme: http
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - separator: ;
        regex: __meta_kubernetes_node_label_(.+)
        replacement: $1
        action: labelmap
      - source_labels: [__meta_kubernetes_node_name]
        separator: ;
        regex: (.+)
        target_label: __address__
        replacement: ${1}:9100
        action: replace

    - job_name: 'kubernetes-cadvisor'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name

    - job_name: 'kubernetes-services'
      kubernetes_sd_configs:
      - role: service
      metrics_path: /probe
      params:
        module: [http_2xx]
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.example.com:9115
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        target_label: kubernetes_name

    - job_name: 'kubernetes-ingresses'
      kubernetes_sd_configs:
      - role: ingress
      relabel_configs:
      - source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
        regex: (.+);(.+);(.+)
        replacement: ${1}://${2}${3}
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.example.com:9115
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_ingress_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_ingress_name]
        target_label: kubernetes_name

    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name

创建prometheus所需的SA和RABC,Deployment等配置文件:
prometheus.yaml:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-serviceaccount
  namespace: prometheus

---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]

---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus-serviceaccount
  namespace: prometheus

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    name: prometheus-deployment
  name: prometheus
  namespace: prometheus
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus-serviceaccount
      containers:
      - image: prom/prometheus:v2.13.1
        name: prometheus
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/prometheus/conf/prometheus.yml"
        - "--storage.tsdb.path=/prometheus/data"
        - "--storage.tsdb.retention=30d"
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: "/prometheus/data"
          name: data
        - mountPath: "/prometheus/conf"
          name: config-volume
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 5000m
            memory: 2500Mi
      volumes:
      - emptyDir: {}
        name: data
      - configMap:
          name: prometheus-config
        name: config-volume

---
apiVersion: v1
kind: Service
metadata:
  annotations:
  name: prometheus
  namespace: prometheus
  labels:
    name: prometheus-service
spec:
  ports:
  - name: http
    port: 9090
    protocol: TCP
    targetPort: 9090
  selector:
    app: prometheus
  sessionAffinity: None
  type: NodePort

部署prometheus:

[root@sh-saas-k8stest-master-dev-01 yaml]# kubectl apply -f prometheus_configmap.yaml 
namespace "prometheus" created
configmap "prometheus-config" created
[root@sh-saas-k8stest-master-dev-01 yaml]# kubectl apply -f prometheus.yaml 
serviceaccount "prometheus-serviceaccount" created
clusterrole.rbac.authorization.k8s.io "prometheus" created
clusterrolebinding.rbac.authorization.k8s.io "prometheus" created
deployment.extensions "prometheus" created
service "prometheus" created

部署成功,可以看看运行有没有什么问题:

[root@sh-saas-k8stest-master-dev-01 yaml]# kubectl get pod -n prometheus 
NAME                          READY     STATUS    RESTARTS   AGE
prometheus-77fd78b574-mrrz6   1/1       Running   0          10s
[root@sh-saas-k8stest-master-dev-01 yaml]# kubectl get pod -n prometheus  -o wide
NAME                          READY     STATUS    RESTARTS   AGE       IP            NODE
prometheus-77fd78b574-mrrz6   1/1       Running   0          17s       10.248.1.12   10.19.0.21
[root@sh-saas-k8stest-master-dev-01 yaml]# kubectl logs -n prometheus prometheus-77fd78b574-mrrz6 
level=warn ts=2019-11-01T09:27:44.501481603Z caller=main.go:295 deprecation_notice="\"storage.tsdb.retention\" flag is deprecated use \"storage.tsdb.retention.time\" instead."
level=info ts=2019-11-01T09:27:44.501537053Z caller=main.go:302 msg="Starting Prometheus" version="(version=2.7.1, branch=HEAD, revision=62e591f928ddf6b3468308b7ac1de1c63aa7fcf3)"
level=info ts=2019-11-01T09:27:44.501559636Z caller=main.go:303 build_context="(go=go1.11.5, user=root@f9f82868fc43, date=20190131-11:16:59)"
level=info ts=2019-11-01T09:27:44.501578409Z caller=main.go:304 host_details="(Linux 4.4.194-1.el7.elrepo.x86_64 #1 SMP Sat Sep 21 09:30:26 EDT 2019 x86_64 prometheus-77fd78b574-mrrz6 (none))"
level=info ts=2019-11-01T09:27:44.501597155Z caller=main.go:305 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2019-11-01T09:27:44.501615568Z caller=main.go:306 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2019-11-01T09:27:44.502652402Z caller=main.go:620 msg="Starting TSDB ..."
level=info ts=2019-11-01T09:27:44.502696381Z caller=web.go:416 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2019-11-01T09:27:44.507526594Z caller=main.go:635 msg="TSDB started"
level=info ts=2019-11-01T09:27:44.507566584Z caller=main.go:695 msg="Loading configuration file" filename=/prometheus/conf/prometheus.yml
level=info ts=2019-11-01T09:27:44.508687279Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-11-01T09:27:44.509311487Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-11-01T09:27:44.509814197Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-11-01T09:27:44.510303946Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-11-01T09:27:44.510798593Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-11-01T09:27:44.511231031Z caller=main.go:722 msg="Completed loading of configuration file" filename=/prometheus/conf/prometheus.yml
level=info ts=2019-11-01T09:27:44.511242928Z caller=main.go:589 msg="Server is ready to receive web requests."

找到node port:

[root@sh-saas-k8stest-master-dev-01 yaml]# kubectl get node -o wide
NAME         STATUS    ROLES     AGE       VERSION    EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME
10.19.0.11   Ready     master    14d       v1.10.13   <none>        CentOS Linux 7 (Core)   4.4.194-1.el7.elrepo.x86_64   docker://18.6.2
10.19.0.12   Ready     master    14d       v1.10.13   <none>        CentOS Linux 7 (Core)   4.4.194-1.el7.elrepo.x86_64   docker://18.6.2
10.19.0.21   Ready     node      14d       v1.10.13   <none>        CentOS Linux 7 (Core)   4.4.194-1.el7.elrepo.x86_64   docker://18.6.2
10.19.0.22   Ready     node      14d       v1.10.13   <none>        CentOS Linux 7 (Core)   4.4.194-1.el7.elrepo.x86_64   docker://18.6.2
[root@sh-saas-k8stest-master-dev-01 yaml]# kubectl get -n prometheus service prometheus -o wide
NAME         TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE       SELECTOR
prometheus   NodePort   10.248.255.13   <none>        9090:31304/TCP   6m        app=prometheus

现在可以通过http://10.19.0.21:31304/来访问了:

这个时候我们会发现,pods,ingress,deployment等监控数据都没有进来。这是因为这些监控是依赖另一个独立的应用:kube-state-metrics

3.部署kube-state-metrics

部署kube-state-metrics需要注意一下版本的兼容性:

我的k8s版本是1.10,这个版本可以使用kube-state-metrics 1.5的版本:

yaml被墙,从github上下载不下来,我干脆clone下来,clone可以,然后换到tag v1.5.0:

[root@sh-saas-k8stest-master-dev-01 yaml]# git clone https://github.com/kubernetes/kube-state-metrics.git
Cloning into 'kube-state-metrics'...
remote: Enumerating objects: 30, done.
remote: Counting objects: 100% (30/30), done.
remote: Compressing objects: 100% (22/22), done.
Receiving objects: 100% (16155/16155), 14.14 MiB | 194.00 KiB/s, done.
remote: Total 16155 (delta 10), reused 14 (delta 4), pack-reused 16125
Resolving deltas: 100% (9962/9962), done.
[root@sh-saas-k8stest-master-dev-01 yaml]# cd kube-state-metrics/
[root@sh-saas-k8stest-master-dev-01 kube-state-metrics]# git checkout 1.5.0
error: pathspec '1.5.0' did not match any file(s) known to git.
[root@sh-saas-k8stest-master-dev-01 kube-state-metrics]# git checkout v1.5.0
Note: checking out 'v1.5.0'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b new_branch_name

HEAD is now at c888603... Merge branch 'master' into release-1.5
[root@sh-saas-k8stest-master-dev-01 kube-state-metrics]# cd kubernetes/
[root@sh-saas-k8stest-master-dev-01 kubernetes]# ls
kube-state-metrics-cluster-role-binding.yaml  kube-state-metrics-deployment.yaml    kube-state-metrics-role.yaml             kube-state-metrics-service.yaml
kube-state-metrics-cluster-role.yaml          kube-state-metrics-role-binding.yaml  kube-state-metrics-service-account.yaml
[root@sh-saas-k8stest-master-dev-01 kubernetes]# 

k8s官方镜像库也被墙了,需要换成一个镜像地址,镜像在docker官方的仓库里,地址为:https://hub.docker.com/r/mirrorgooglecontainers. 这是k8s官方仓库的一个镜像.
修改后的kube-state-metrics-deployment.yaml文件如下:

apiVersion: apps/v1
# Kubernetes versions after 1.9.0 should use apps/v1
# Kubernetes versions before 1.8.0 should use apps/v1beta1 or extensions/v1beta1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: kube-state-metrics
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: mirrorgooglecontainers/kube-state-metrics:v1.5.0
        ports:
        - name: http-metrics
          containerPort: 8080
        - name: telemetry
          containerPort: 8081
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
      - name: addon-resizer
        image: mirrorgooglecontainers/addon-resizer:1.8.3
        resources:
          limits:
            cpu: 150m
            memory: 50Mi
          requests:
            cpu: 150m
            memory: 50Mi
        env:
          - name: MY_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: MY_POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        command:
          - /pod_nanny
          - --container=kube-state-metrics
          - --cpu=100m
          - --extra-cpu=1m
          - --memory=100Mi
          - --extra-memory=2Mi
          - --threshold=5
          - --deployment=kube-state-metrics

然后再批量部署,就安装完成了:

[root@sh-saas-k8stest-master-dev-01 kubernetes]# kubectl apply -f ./
clusterrolebinding.rbac.authorization.k8s.io "kube-state-metrics" created
clusterrole.rbac.authorization.k8s.io "kube-state-metrics" created
deployment.apps "kube-state-metrics" created
rolebinding.rbac.authorization.k8s.io "kube-state-metrics" created
role.rbac.authorization.k8s.io "kube-state-metrics-resizer" created
serviceaccount "kube-state-metrics" created
service "kube-state-metrics" created

过一会就可以发现之前没有的监控指标都有了。

4. k8s节点node exporter部署

生成下面的daemonset配置文件:
vim node-exporter-ds.yml

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    k8s-app: node-exporter
    kubernetes.io/cluster-service: "true"
    version: v0.18.1
  name: node-exporter
  namespace: kube-system
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: node-exporter
      version: v0.18.1
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: node-exporter
        version: v0.18.1
    spec:
      containers:
      - args:
        - --log.level=info
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --path.rootfs=/rootfs
        - --collector.vmstat
        - --collector.vmstat.fields=.*
        - --collector.netstat
        - --collector.netstat.fields=.*
        - --collector.filesystem.ignored-mount-points=^/(proc|tmpfs|shm|sys|var/lib/docker/.+)($|/)
        - --collector.filesystem.ignored-fs-types=^(autofs|tmpfs|shm|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
        image: prom/node-exporter:v0.18.1
        imagePullPolicy: IfNotPresent
        name: prometheus-node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          name: metrics
          protocol: TCP
        resources:
          limits:
            memory: 50Mi
          requests:
            cpu: 100m
            memory: 50Mi
        securityContext:
          privileged: true
          runAsUser: 0
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /host/proc
          name: proc
          readOnly: true
        - mountPath: /host/sys
          name: sys
          readOnly: true
        - mountPath: /rootfs
          name: root
          readOnly: true
      dnsPolicy: ClusterFirst
      hostIPC: true
      hostNetwork: true
      hostPID: true
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: kubernetes.io/role
        value: master
      volumes:
      - hostPath:
          path: /proc
          type: ""
        name: proc
      - hostPath:
          path: /sys
          type: ""
        name: sys
      - hostPath:
          path: /
          type: ""
        name: root
  templateGeneration: 16
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate

创建daemonset:

[root@sh-saas-k8stest-master-dev-01 prometheus]# kubectl apply -f node-exporter-ds.yml 
daemonset.extensions "node-exporter" created

过一会就能看到node exporter的监控信息了。