跳转至

监控方案

监控方案

监控方案实现, 监控方案主要分以下几点来设计;监控指标采集; 监控数据收集; 监控数据存储; 指标查询

监控指标采集

在不同环境, 不通阶段采集需要的监控指标;

Node-Exportor

github 地址: https://github.com/prometheus/node_exporter

Node-Exportor 是使用Go 语言开发的 采集器, 可以通过 Node-Exportor 来采集硬件信息和系统指标;

安装

1、 linux 用户, 通过二进制安装

# NOTE: Replace the URL with one from the above mentioned "downloads" page.
# <VERSION>, <OS>, and <ARCH> are placeholders.
$ wget https://github.com/prometheus/node_exporter/releases/download/v<VERSION>/node_exporter-<VERSION>.<OS>-<ARCH>.tar.gz
$ tar xvfz node_exporter-*.*-amd64.tar.gz
$ cd node_exporter-*.*-amd64
$ ./node_exporter
INFO[0000] Starting node_exporter (version=0.16.0, branch=HEAD, revision=d42bd70f4363dced6b77d8fc311ea57b63387e4f)  source="node_exporter.go:82"
INFO[0000] Build context (go=go1.9.6, user=root@a67a9bc13a69, date=20180515-15:53:28)  source="node_exporter.go:83"
INFO[0000] Enabled collectors:                           source="node_exporter.go:90"
INFO[0000]  - boottime                                   source="node_exporter.go:97"
...
INFO[0000] Listening on :9100                            source="node_exporter.go:111"

运行成功之后, 可以访问9100 端口, 获取采集器采集到的指标信息

$ curl http://localhost:9100/metrics

# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 3.8996e-05
go_gc_duration_seconds{quantile="0.25"} 4.5926e-05
go_gc_duration_seconds{quantile="0.5"} 5.846e-05
# etc.

过滤出node 相关的信息

$ curl http://localhost:9100/metrics | grep "node_"

2、 通过Ansible 部署 Node-Exportor

参考链接:https://github.com/prometheus-community/ansible

3、 通过docker 容器部署, 这种可以应用于k8s 集群, 采集节点的监控指标; 需要将根磁盘挂载进容器的/host目录

docker run -d \
  --net="host" \
  --pid="host" \
  -v "/:/host:ro,rslave" \
  quay.io/prometheus/node-exporter:latest \
  --path.rootfs=/host

docker-compose写法

---
version: '3.8'

services:
  node_exporter:
    image: quay.io/prometheus/node-exporter:latest
    container_name: node_exporter
    command:
      - '--path.rootfs=/host'
    network_mode: host
    pid: host
    restart: unless-stopped
    volumes:
      - '/:/host:ro,rslave'

Prometheus 采集收集指标方式, 采集本地 localhost:9100 的数据

### $ cat prometheus.yml
global:
  scrape_interval: 15s # 拉取周期, 15s

scrape_configs:
- job_name: node       # 任务名字
  static_configs:
  - targets: ['localhost:9100']

下载prometheus 的二进制, 安装并运行

$ wget https://github.com/prometheus/prometheus/releases/download/v*/prometheus-*.*-amd64.tar.gz
$ tar xvf prometheus-*.*-amd64.tar.gz
$ cd prometheus-*.*

直接指定prometheus 的配置文件并且运行 Prometheus ;

$ ./prometheus --config.file=./prometheus.yml

Nginx-exportor

针对Nginx 服务, 用于采集nginx 的监控指标信息, 实现原理是通过nginx 暴露一个访问端口出来, 供其Prometheus 采集端拉取;

下面类似是openresty 的指标采集方案的实现

安装

安装此库, 需要安装ngx_lua 模块, 可以简单的安装 libnginx-mod-http-lua

库文件 - prometheus.lua - 需要在 中 LUA_PATH 可用。如果这是您使用的唯一 Lua 库,则只需指向 lua_package_path 签出此 git 存储库的目录(请参阅下面的示例)。

openresty 在 opm 中可用, 他可以通过luarocks.获取

$ opm get knyar/nginx-lua-prometheus
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = "en_US.UTF-8",
        LC_ALL = "en_US.UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
* Fetching knyar/nginx-lua-prometheus  
  Downloading https://opm.openresty.org/api/pkg/tarball/knyar/nginx-lua-prometheus-0.20230607.opm.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 19265  100 19265    0     0   7056      0  0:00:02  0:00:02 --:--:--  7056
Package knyar/nginx-lua-prometheus 0.20230607 installed successfully under /usr/local/openresty/site/ .

查看文件是否存在:

$ ls /usr/local/openresty/site/lualib/prometheus.lua
/usr/local/openresty/site/lualib/prometheus.lua
配置使用

1、 先在nginx.conf 中的 http 模块中加入如下配置

    lua_shared_dict prometheus_metrics 10M; 
    lua_package_path "/usr/local/openresty/site/lualib/?.lua;;";

    init_worker_by_lua_block {
        prometheus = require("prometheus").init("prometheus_metrics")

        metric_requests = prometheus:counter(
          "nginx_http_requests_total", "Number of HTTP requests", {"host", "status", "request_uri", "request_method"})
        metric_latency = prometheus:histogram(
          "nginx_http_request_duration_seconds", "HTTP request latency", {"host", "status", "request_uri", "request_method"})
        metric_connections = prometheus:gauge(
          "nginx_http_connections", "Number of HTTP connections", {"state"})
    }

    log_by_lua_block {
      metric_requests:inc(1, {ngx.var.server_name, ngx.var.status, ngx.var.request_uri, ngx.var.request_method})
      metric_latency:observe(tonumber(ngx.var.request_time), {ngx.var.server_name, ngx.var.status, ngx.var.request_uri, ngx.var.request_method})
    } # https://cdn.console.aliyun.com/domain/detail/cdn.bigquant.com/backSrc

2、配置监控使用的server

server {
  listen 9145;
  server_name _;
  allow 10.160.0.0/16;
  allow 10.96.0.0/12;
  deny all;
  location /metrics {
    content_by_lua_block {
      metric_connections:set(ngx.var.connections_reading, {"reading"})
      metric_connections:set(ngx.var.connections_waiting, {"waiting"})
      metric_connections:set(ngx.var.connections_writing, {"writing"})
      prometheus:collect()
    }
  }
}

Mysql-Exportor

Redis-Exportor

Kubernetes 中组件的Metrics

参考链接:

https://github.com/prometheus-operator/kube-prometheus

https://github.com/prometheus-operator/prometheus-operator