Go语言：pprof性能分析实战

最近在日历系统上线后做性能调优时大量使用了 Go 的 pprof 工具。这篇整理一下 pprof 的完整使用方法和实际优化案例。

接入 pprof

对于 HTTP 服务，最简单的方式是导入 net/http/pprof：

import _ "net/http/pprof"

func main() {
    // 如果已有 HTTP 服务，pprof 自动注册到 DefaultServeMux
    // 如果用了自定义 mux，需要单独起一个端口
    go func() {
        log.Println(http.ListenAndServe(":6060", nil))
    }()
    
    // ... 正常业务代码
}

之后访问 http://localhost:6060/debug/pprof/ 就能看到所有可用的 profile。

CPU Profile

CPU profile 是最常用的，用来找出哪些函数消耗了最多 CPU 时间。

采集 30 秒的 CPU profile：

go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30

进入交互界面后常用命令：

(pprof) top 20          # 显示CPU占用最高的20个函数
(pprof) top -cum 20     # 按累积时间排序
(pprof) list funcName   # 查看某函数的逐行CPU消耗
(pprof) web             # 在浏览器打开调用图（需要graphviz）

内存分析

# 查看当前堆内存使用
go tool pprof http://localhost:6060/debug/pprof/heap

# 查看累积内存分配（包括已释放的）
go tool pprof http://localhost:6060/debug/pprof/allocs

heap 和 allocs 的区别：

heap：当前活跃的内存分配，用来排查内存泄漏
allocs：从程序启动以来所有的分配，用来排查 GC 压力

(pprof) top -inuse_space     # 按当前占用内存排序
(pprof) top -alloc_space     # 按累积分配量排序
(pprof) top -alloc_objects   # 按分配次数排序

Goroutine 分析

go tool pprof http://localhost:6060/debug/pprof/goroutine

goroutine profile 用来排查：

goroutine 泄漏（数量持续增长）
死锁或长时间阻塞的 goroutine

在交互界面输入 top 可以看到 goroutine 最多的调用栈。也可以直接访问浏览器看文本版：

http://localhost:6060/debug/pprof/goroutine?debug=1  # 简要信息
http://localhost:6060/debug/pprof/goroutine?debug=2  # 完整栈信息

火焰图

pprof 内置了火焰图支持，不需要额外装工具：

go tool pprof -http=:8080 http://localhost:6060/debug/pprof/profile?seconds=30

-http 参数会启动一个 Web UI，提供交互式火焰图、调用图等多种可视化。这是最直观的分析方式。

火焰图阅读方法：

x 轴是采样数量（不是时间线），宽度代表 CPU 占比
y 轴是调用栈深度，从下到上是调用关系
重点关注宽的色块——它们是 CPU 热点

实际优化案例

案例1：JSON 序列化热点

CPU profile 显示 encoding/json.Marshal 占了 30% 的 CPU。原因是每次 API 响应都在序列化大量事件对象。

优化方案：换用 github.com/json-iterator/go，并对频繁序列化的结构体做缓存。效果：CPU 占比从 30% 降到 12%。

案例2：字符串拼接导致的 GC 压力

allocs profile 显示一个日志格式化函数分配了大量小对象。代码中用 fmt.Sprintf 做字符串拼接：

// 优化前：每次调用分配新字符串
msg := fmt.Sprintf("[%s] %s: %s", time.Now().Format(time.RFC3339), level, message)

// 优化后：用 strings.Builder 复用 buffer
var buf strings.Builder
buf.WriteString("[")
buf.WriteString(time.Now().Format(time.RFC3339))
buf.WriteString("] ")
buf.WriteString(level)
buf.WriteString(": ")
buf.WriteString(message)
msg := buf.String()

更进一步，用 sync.Pool 复用 Builder：

var builderPool = sync.Pool{
    New: func() interface{} { return &strings.Builder{} },
}

func formatLog(level, message string) string {
    buf := builderPool.Get().(*strings.Builder)
    defer func() {
        buf.Reset()
        builderPool.Put(buf)
    }()
    buf.WriteString("[")
    buf.WriteString(time.Now().Format(time.RFC3339))
    buf.WriteString("] ")
    buf.WriteString(level)
    buf.WriteString(": ")
    buf.WriteString(message)
    return buf.String()
}

优化后 GC 频率降了约 40%。

案例3：Goroutine 泄漏

上线后发现 goroutine 数量在缓慢增长，从几百涨到了几千。goroutine profile 显示大量 goroutine 卡在 time.Sleep。

排查发现是提醒系统的一个旧代码路径——某个错误分支里起了 goroutine 做重试但没有退出条件。加上 context 超时后修复。

小结

pprof 是 Go 性能优化的瑞士军刀，CPU/内存/goroutine 分析一站式搞定。建议所有生产服务都接入 pprof（在独立端口上，不要暴露到公网），出问题时能快速定位。