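Prometheus is currently (as of 2022) arguably the most popular monitoring and alerting solution. It focuses on:
- metric collection
- aggregation and analysis
- alerting
It is not a good fit for log storage, because it does not store every individual event; it only keeps aggregated values, sampled as periodic snapshots. Its design favors reliability (no distributed storage; each server is autonomous), but it does not aim for 100% data accuracy.
The official Prometheus documentation is very concise and well worth reading.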
Features
- a multi-dimensional data model with time series data identified by metric name and key/value pairs
- PromQL, a flexible query language to leverage this dimensionality
- no reliance on distributed storage; single server nodes are autonomous
- time series collection happens via a pull model over HTTP
- pushing time series is supported via an intermediary gateway
- targets are discovered via service discovery or static configuration
- multiple modes of graphing and dashboarding support
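To make the first feature concrete, here is a minimal sketch of how a time series is identified by a metric name plus key/value labels, rendered in the Prometheus text exposition format. The helper `format_sample` and the sample values are hypothetical and only for illustration; they are not part of any Prometheus library.

```python
def format_sample(name, labels, value):
    """Render one sample as a line of the Prometheus text exposition format."""
    if not labels:
        return f"{name} {value}"

    def escape(label_value):
        # Label values escape backslash, double quote and newline.
        return (
            label_value.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
        )

    pairs = ",".join(f'{key}="{escape(val)}"' for key, val in labels.items())
    return f"{name}{{{pairs}}} {value}"


# Two distinct time series of the same metric: only the labels differ.
samples = [
    ("http_requests_total", {"method": "GET", "code": "200"}, 1027),
    ("http_requests_total", {"method": "POST", "code": "500"}, 3),
]
for name, labels, value in samples:
    print(format_sample(name, labels, value))
```

Each distinct combination of label values is its own series, which is the dimensionality that PromQL queries filter and aggregate over.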
Architecture
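In the architecture diagram, the orange parts are the components provided by Prometheus.
Prometheus uses a pull model. How metrics get exposed depends on the type of target:
- Applications whose code can be modified to integrate a Prometheus SDK (client library): once integrated, they only need to expose an HTTP endpoint, usually /metrics (the Jobs in the diagram)
- Short-lived jobs, or jobs that would rather push than be pulled, can push their metrics to the Pushgateway
- Applications or scenarios where the code cannot be changed, such as collecting machine metrics or HAProxy metrics, need a dedicated exporter to generate the metrics. These exporters act as a bridge: they read the relevant information (for example machine stats or HAProxy logs) and convert it into metrics that can be scraped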
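As a rough illustration of the pull model described above, the sketch below exposes a /metrics endpoint using only the Python standard library. It is not the official client library; the metric name `demo_requests_total`, the helper names, and the loopback address are assumptions made for the example. A real application would normally use a Prometheus client library instead of formatting the text by hand.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-process counter guarded by a lock.
_lock = threading.Lock()
_requests_seen = 0


def record_request():
    """Increment the demo counter (called by application code)."""
    global _requests_seen
    with _lock:
        _requests_seen += 1


def render_metrics():
    """Return the current metrics in the text exposition format."""
    with _lock:
        count = _requests_seen
    return (
        "# HELP demo_requests_total Requests handled by this demo app.\n"
        "# TYPE demo_requests_total counter\n"
        f"demo_requests_total {count}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; the Prometheus server pulls from it.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def make_server(port=0):
    """Bind the metrics server; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), MetricsHandler)
```

Calling `make_server(8000).serve_forever()` would serve the endpoint at `http://127.0.0.1:8000/metrics`, which a scrape job could then be pointed at. An exporter follows the same shape, except that `render_metrics` would read its values from the system or service it bridges.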
The remaining components should be fairly self-explanatory.
Components
The Prometheus ecosystem consists of multiple components, many of which are optional:
- the main Prometheus server which scrapes and stores time series data
- client libraries for instrumenting application code
- a push gateway for supporting short-lived jobs
- special-purpose exporters for services like HAProxy, StatsD, Graphite, etc.
- an alertmanager to handle alerts
- various support tools
Most Prometheus components are written in Go, making them easy to build and deploy as static binaries.
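For the push gateway listed among the components, a short-lived job typically pushes its metrics over HTTP before it exits. The sketch below only builds such a request with the standard library and does not send it. The gateway address, job name, and metric are hypothetical; the `/metrics/job/<job>` path and the use of PUT follow the Pushgateway's documented HTTP API as I understand it, so check them against the version you run.

```python
import urllib.request
from urllib.parse import quote


def build_push_request(gateway_url, job, metrics_text):
    """Build (but do not send) a PUT request that replaces one job's metrics."""
    if "/" in job:
        # Job names containing '/' need special encoding; out of scope here.
        raise ValueError("this sketch assumes a job name without '/'")
    if not metrics_text.endswith("\n"):
        metrics_text += "\n"  # the text format expects a trailing newline
    url = f"{gateway_url.rstrip('/')}/metrics/job/{quote(job, safe='')}"
    return urllib.request.Request(
        url,
        data=metrics_text.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4; charset=utf-8"},
    )


request = build_push_request(
    "http://localhost:9091",  # assumed Pushgateway address
    "nightly_backup",  # hypothetical job name
    "backup_last_success_timestamp_seconds 1650000000",
)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request, timeout=5)
```

The Prometheus server then scrapes the Pushgateway like any other target, so the pull model is preserved even for jobs that do not live long enough to be scraped directly.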