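Monitoring

Kitex has built-in support for monitoring, but the framework itself emits no metrics. Instrumentation is added through an extension interface.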

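Custom Monitoring

The framework provides the Tracer interface. Users can implement this interface as needed and inject the concrete implementation with the WithTracer option.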

// Tracer is executed at the start and finish of an RPC.
type Tracer interface {
    Start(ctx context.Context) context.Context
    Finish(ctx context.Context)
}

For details, see the Monitoring Extension section.

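Using Extension Libraries

kitex-contrib also provides two monitoring extensions, monitor-prometheus and obs-opentelemetry, which integrate Prometheus and OpenTelemetry respectively. The former fits the Prometheus ecosystem more closely and is simpler to use, while the latter is more flexible.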

Prometheus

The monitor-prometheus extension library provides the monitoring extension for Prometheus.

Usage

Client

import (
	kClient "github.com/cloudwego/kitex/client"
	prometheus "github.com/kitex-contrib/monitor-prometheus"
)

...
	// Expose client metrics at address ":9091" under the path "/kitexclient".
	client, _ := testClient.NewClient(
		"DestServiceName",
		kClient.WithTracer(prometheus.NewClientTracer(":9091", "/kitexclient")))

	resp, _ := client.Send(ctx, req)
...

Server

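import (
	"log"

	kServer "github.com/cloudwego/kitex/server"
	prometheus "github.com/kitex-contrib/monitor-prometheus"
)

func main() {
...
	// Expose server metrics at address ":9092" under the path "/kitexserver".
	svr := xxxservice.NewServer(
		&myServiceImpl{},
		kServer.WithTracer(prometheus.NewServerTracer(":9092", "/kitexserver")))
	// Run blocks until the server exits; report the error instead of dropping it.
	if err := svr.Run(); err != nil {
		log.Println("server stopped with error:", err)
	}
...
}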

Metrics

Client

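Name	Unit	Tags	Description
`kitex_client_throughput`	-	type, caller, callee, method, status	Total number of requests handled by the client
`kitex_client_latency_us`	us	type, caller, callee, method, status	Client-side request latency (time the response was received minus time the request was sent, in us)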

Server

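Name	Unit	Tags	Description
`kitex_server_throughput`	-	type, caller, callee, method, status	Total number of requests handled by the server
`kitex_server_latency_us`	us	type, caller, callee, method, status	Server-side request latency (time the request finished processing minus time the request was received, in us)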

More complex monitoring can be built on top of these metrics. For usage examples, see Useful Examples.

Runtime Metrics

This library depends on prometheus/client_golang and supports the runtime metrics that ship with it. For details, see WithGoCollectorRuntimeMetrics.

OpenTelemetry

The obs-opentelemetry extension library provides the monitoring extension for OpenTelemetry.

Usage

For how to use obs-opentelemetry, see the tracing section.

Metrics

Server

Name	Metric Data Model	Unit	Unit (UCUM)	Description
`rpc.server.duration`	Histogram	milliseconds	`ms`	Measures the duration of inbound RPC requests

Client

Name	Metric Data Model	Unit	Unit (UCUM)	Description
`rpc.client.duration`	Histogram	milliseconds	`ms`	Measures the duration of outbound RPC requests

Further service metrics, such as R.E.D (Rate, Errors, Duration), can be derived from rpc.server.duration. See the linked example for details.

Runtime Metrics

Based on opentelemetry-go, the following runtime metrics are supported:

Name	Metric Data Model	Unit	Unit (UCUM)	Description
`process.runtime.go.cgo.calls`	Sum	-	-	Number of cgo calls made by the current process
`process.runtime.go.gc.count`	Sum	-	-	Number of completed GC cycles
`process.runtime.go.gc.pause_ns`	Histogram	nanosecond	`ns`	Nanoseconds spent in a GC stop-the-world pause
`process.runtime.go.gc.pause_total_ns`	Histogram	nanosecond	`ns`	Cumulative nanoseconds spent in GC stop-the-world pauses since the program started
`process.runtime.go.goroutines`	Gauge	-	-	Number of goroutines
`process.runtime.go.lookups`	Sum	-	-	Number of pointer lookups performed by the runtime
`process.runtime.go.mem.heap_alloc`	Gauge	bytes	`bytes`	Bytes of allocated heap objects
`process.runtime.go.mem.heap_idle`	Gauge	bytes	`bytes`	Idle (unused) heap memory
`process.runtime.go.mem.heap_inuse`	Gauge	bytes	`bytes`	Heap memory in use
`process.runtime.go.mem.heap_objects`	Gauge	-	-	Number of allocated heap objects
`process.runtime.go.mem.live_objects`	Gauge	-	-	Number of live objects (Mallocs - Frees)
`process.runtime.go.mem.heap_released`	Gauge	bytes	`bytes`	Heap memory returned to the operating system
`process.runtime.go.mem.heap_sys`	Gauge	bytes	`bytes`	Heap memory obtained from the operating system
`runtime.uptime`	Sum	ms	`ms`	Milliseconds since the application was initialized

Last modified June 26, 2024: chore: add construct logo to cloudwego user (#1099) (3db8967)