监控方案
监控方案¶
监控方案实现, 监控方案主要分以下几点来设计;监控指标采集; 监控数据收集; 监控数据存储; 指标查询
监控指标采集¶
在不同环境, 不通阶段采集需要的监控指标;
Node-Exportor¶
github 地址: https://github.com/prometheus/node_exporter
Node-Exportor 是使用Go 语言开发的 采集器, 可以通过 Node-Exportor 来采集硬件信息和系统指标;
安装¶
1、 linux 用户, 通过二进制安装
# NOTE: Replace the URL with one from the above mentioned "downloads" page.
# <VERSION>, <OS>, and <ARCH> are placeholders.
$ wget https://github.com/prometheus/node_exporter/releases/download/v<VERSION>/node_exporter-<VERSION>.<OS>-<ARCH>.tar.gz
$ tar xvfz node_exporter-*.*-amd64.tar.gz
$ cd node_exporter-*.*-amd64
$ ./node_exporter
INFO[0000] Starting node_exporter (version=0.16.0, branch=HEAD, revision=d42bd70f4363dced6b77d8fc311ea57b63387e4f) source="node_exporter.go:82"
INFO[0000] Build context (go=go1.9.6, user=root@a67a9bc13a69, date=20180515-15:53:28) source="node_exporter.go:83"
INFO[0000] Enabled collectors: source="node_exporter.go:90"
INFO[0000] - boottime source="node_exporter.go:97"
...
INFO[0000] Listening on :9100 source="node_exporter.go:111"
运行成功之后, 可以访问9100 端口, 获取采集器采集到的指标信息
$ curl http://localhost:9100/metrics
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 3.8996e-05
go_gc_duration_seconds{quantile="0.25"} 4.5926e-05
go_gc_duration_seconds{quantile="0.5"} 5.846e-05
# etc.
过滤出node 相关的信息
$ curl http://localhost:9100/metrics | grep "node_"
2、 通过Ansible 部署 Node-Exportor
参考链接:https://github.com/prometheus-community/ansible
3、 通过docker 容器部署, 这种可以应用于k8s 集群, 采集节点的监控指标; 需要将根磁盘挂载进容器的/host目录
docker run -d \
--net="host" \
--pid="host" \
-v "/:/host:ro,rslave" \
quay.io/prometheus/node-exporter:latest \
--path.rootfs=/host
docker-compose写法
---
version: '3.8'
services:
node_exporter:
image: quay.io/prometheus/node-exporter:latest
container_name: node_exporter
command:
- '--path.rootfs=/host'
network_mode: host
pid: host
restart: unless-stopped
volumes:
- '/:/host:ro,rslave'
Prometheus 采集收集指标方式, 采集本地 localhost:9100
的数据
### $ cat prometheus.yml
global:
scrape_interval: 15s # 拉取周期, 15s
scrape_configs:
- job_name: node # 任务名字
static_configs:
- targets: ['localhost:9100']
下载prometheus 的二进制, 安装并运行
$ wget https://github.com/prometheus/prometheus/releases/download/v*/prometheus-*.*-amd64.tar.gz
$ tar xvf prometheus-*.*-amd64.tar.gz
$ cd prometheus-*.*
直接指定prometheus 的配置文件并且运行 Prometheus ;
$ ./prometheus --config.file=./prometheus.yml
Nginx-exportor¶
针对Nginx 服务, 用于采集nginx 的监控指标信息, 实现原理是通过nginx 暴露一个访问端口出来, 供其Prometheus 采集端拉取;
下面类似是openresty 的指标采集方案的实现
安装¶
安装此库, 需要安装ngx_lua 模块, 可以简单的安装 libnginx-mod-http-lua
库文件 - prometheus.lua
- 需要在 中 LUA_PATH
可用。如果这是您使用的唯一 Lua 库,则只需指向 lua_package_path
签出此 git 存储库的目录(请参阅下面的示例)。
openresty 在 opm 中可用, 他可以通过luarocks.获取
$ opm get knyar/nginx-lua-prometheus
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = "en_US.UTF-8",
LC_ALL = "en_US.UTF-8",
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
* Fetching knyar/nginx-lua-prometheus
Downloading https://opm.openresty.org/api/pkg/tarball/knyar/nginx-lua-prometheus-0.20230607.opm.tar.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 19265 100 19265 0 0 7056 0 0:00:02 0:00:02 --:--:-- 7056
Package knyar/nginx-lua-prometheus 0.20230607 installed successfully under /usr/local/openresty/site/ .
查看文件是否存在:
$ ls /usr/local/openresty/site/lualib/prometheus.lua
/usr/local/openresty/site/lualib/prometheus.lua
配置使用¶
1、 先在nginx.conf 中的 http 模块中加入如下配置
lua_shared_dict prometheus_metrics 10M;
lua_package_path "/usr/local/openresty/site/lualib/?.lua;;";
init_worker_by_lua_block {
prometheus = require("prometheus").init("prometheus_metrics")
metric_requests = prometheus:counter(
"nginx_http_requests_total", "Number of HTTP requests", {"host", "status", "request_uri", "request_method"})
metric_latency = prometheus:histogram(
"nginx_http_request_duration_seconds", "HTTP request latency", {"host", "status", "request_uri", "request_method"})
metric_connections = prometheus:gauge(
"nginx_http_connections", "Number of HTTP connections", {"state"})
}
log_by_lua_block {
metric_requests:inc(1, {ngx.var.server_name, ngx.var.status, ngx.var.request_uri, ngx.var.request_method})
metric_latency:observe(tonumber(ngx.var.request_time), {ngx.var.server_name, ngx.var.status, ngx.var.request_uri, ngx.var.request_method})
} # https://cdn.console.aliyun.com/domain/detail/cdn.bigquant.com/backSrc
2、配置监控使用的server
server {
listen 9145;
server_name _;
allow 10.160.0.0/16;
allow 10.96.0.0/12;
deny all;
location /metrics {
content_by_lua_block {
metric_connections:set(ngx.var.connections_reading, {"reading"})
metric_connections:set(ngx.var.connections_waiting, {"waiting"})
metric_connections:set(ngx.var.connections_writing, {"writing"})
prometheus:collect()
}
}
}
Mysql-Exportor¶
Redis-Exportor¶
Kubernetes 中组件的Metrics¶
参考链接:
https://github.com/prometheus-operator/kube-prometheus
https://github.com/prometheus-operator/prometheus-operator