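Go 1.13 has been officially released, and the release notes say that defer is now about 30% faster in most cases. Where does that 30% come from?
As we know, a defer statement used to be compiled into two runtime calls, deferproc and deferreturn, as described here.
The deferproc step now has a new variant, deferprocStack, and the compiler chooses between deferproc and deferprocStack. Since the release notes say that most uses were optimized, most defer statements presumably end up compiled to deferprocStack.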
// All other fields can contain junk.
// The defer record must be immediately followed in memory by
// the arguments of the defer.
// Nosplit because the arguments on the stack won't be scanned
// until the defer record is spliced into the gp._defer list.
//go:nosplit
func deferprocStack(d *_defer) {
	gp := getg()
	if gp.m.curg != gp {
		// go code on the system stack can't defer
		throw("defer on system stack")
	}
	// siz and fn are already set.
	// The other fields are junk on entry to deferprocStack and
	// are initialized here.
	d.started = false
	d.heap = false
	d.sp = getcallersp()
	d.pc = getcallerpc()
	// The lines below implement:
	//   d.panic = nil
	//   d.link = gp._defer
	//   gp._defer = d
	// But without write barriers. The first two are writes to
	// the stack so they don't need a write barrier, and furthermore
	// are to uninitialized memory, so they must not use a write barrier.
	// The third write does not require a write barrier because we
	// explicitly mark all the defer structures, so we don't need to
	// keep track of pointers to them with a write barrier.
	*(*uintptr)(unsafe.Pointer(&d._panic)) = 0
	*(*uintptr)(unsafe.Pointer(&d.link)) = uintptr(unsafe.Pointer(gp._defer))
	*(*uintptr)(unsafe.Pointer(&gp._defer)) = uintptr(unsafe.Pointer(d))

	return0()
	// No code can go here - the C return register has
	// been set and must not be clobbered.
}
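The three assignments described in the comment above (clear the panic pointer, link the record to the current list head, make the record the new head) amount to pushing onto an intrusive singly linked list. Here is a minimal user-level sketch of that idea. The names deferRecord, goroutine, pushStack and runDefers are made up for illustration and are not runtime APIs, and the sketch ignores write barriers, arguments, sp/pc and panics entirely.

```go
package main

import "fmt"

// deferRecord is a simplified, hypothetical stand-in for the runtime's _defer.
type deferRecord struct {
	fn   func()
	heap bool
	link *deferRecord
}

// goroutine is a hypothetical stand-in for the runtime's g;
// only the head of the defer list is modeled.
type goroutine struct {
	deferHead *deferRecord
}

// pushStack mirrors the linking done by deferprocStack: the caller
// provides the storage for the record, and we only splice it in.
func (gp *goroutine) pushStack(d *deferRecord) {
	d.heap = false
	d.link = gp.deferHead
	gp.deferHead = d
}

// runDefers pops records newest-first and runs them,
// which is the order deferred calls execute in.
func (gp *goroutine) runDefers() {
	for gp.deferHead != nil {
		d := gp.deferHead
		gp.deferHead = d.link
		d.fn()
	}
}

func main() {
	gp := &goroutine{}
	var order []int

	// The records are declared by the caller, standing in for the
	// slot the compiler reserves for a defer record.
	var d1, d2 deferRecord
	d1.fn = func() { order = append(order, 1) }
	d2.fn = func() { order = append(order, 2) }

	gp.pushStack(&d1)
	gp.pushStack(&d2)
	gp.runDefers()

	fmt.Println(order) // prints [2 1]
}
```

The point of the sketch is that pushStack never allocates: it only writes a few fields into memory that already exists, which is roughly why this path can be cheaper than one that has to obtain a record first.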
A quick check:
package main

func main() {
	defer println(1)
}
0x003a 00058 (deferstack.go:4) LEAQ ""..autotmp_1+8(SP), AX
0x003f 00063 (deferstack.go:4) PCDATA $0, $0
0x003f 00063 (deferstack.go:4) MOVQ AX, (SP)
0x0043 00067 (deferstack.go:4) CALL runtime.deferprocStack(SB)
0x0048 00072 (deferstack.go:4) TESTL AX, AX
0x004a 00074 (deferstack.go:4) JNE 92
0x004c 00076 (deferstack.go:5) XCHGL AX, AX
The original deferproc still exists, so the _defer struct has to record whether a given defer record was allocated on the stack or on the heap:
type _defer struct {
	siz     int32 // includes both arguments and results
	started bool
	heap    bool    // newly added field
	sp      uintptr // sp at time of defer
	pc      uintptr
	fn      *funcval
	_panic  *_panic // panic that is running defer
	link    *_defer
}
Before deferprocStack existed, every defer went through deferproc. There is a deferpool, but when it runs out, the runtime inevitably ends up doing this:
d = (*_defer)(mallocgc(total, deferType, true))
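The "pool first, allocate when the pool is empty" behavior can be modeled with a small free list. This is only a sketch under simplifying assumptions: the pool and record types and the allocs counter are hypothetical, and the real deferpool and its allocation path are more involved than a single slice.

```go
package main

import "fmt"

// record is a hypothetical stand-in for a heap-allocated defer record.
type record struct {
	heap bool
	link *record
}

// pool is a simplified stand-in for a cache of free defer records.
type pool struct {
	free   []*record
	allocs int // how many times we had to allocate a fresh record
}

// get returns a cached record when one is available and
// falls back to a fresh allocation otherwise.
func (p *pool) get() *record {
	if n := len(p.free); n > 0 {
		d := p.free[n-1]
		p.free = p.free[:n-1]
		return d
	}
	p.allocs++
	return &record{heap: true} // stands in for the mallocgc call
}

// put clears the record's link and returns it to the cache for reuse.
func (p *pool) put(d *record) {
	d.link = nil
	p.free = append(p.free, d)
}

func main() {
	p := &pool{}

	// Three records are live at the same time and nothing is cached yet,
	// so each get has to allocate.
	a, b, c := p.get(), p.get(), p.get()
	p.put(a)
	p.put(b)
	p.put(c)

	// This request is served from the cache without allocating.
	_ = p.get()

	fmt.Println(p.allocs) // prints 3
}
```

Even when the cache hits, this path still does bookkeeping that the stack-allocated path skips, and when it misses it pays for an allocation.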
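People in the community have long complained that defer is slow, so this change is effectively the Go team responding to that feedback.
Why weren't all defer calls optimized into stack allocation?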
case ODEFER:
	d := callDefer
	if n.Esc == EscNever {
		d = callDeferStack
	}
	s.call(n.Left, d)
n.Esc is the escape analysis result recorded on the compiler's AST node. It is set to EscNever mainly by the following code:
case ODEFER:
	if e.loopdepth == 1 { // top level
		n.Esc = EscNever // force stack allocation of defer record (see ssa.go)
		break
	}
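How should loopdepth be understood? Roughly speaking, it probably goes up by one for each enclosing for loop. Following that idea, we can construct an example in which the defer record is still allocated on the heap:
package main

import "fmt"

func main() {
	for i := 0; i < 10; i++ {
		defer func() { // defer inside a for loop
			for {
				var a = make([]int, 128)
				fmt.Println(a)
			}
		}()
	}
}
Output of go tool compile -S: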
0x0043 00067 (deferproc.go:7) PCDATA $0, $0
0x0043 00067 (deferproc.go:7) MOVQ AX, 8(SP)
0x0048 00072 (deferproc.go:7) CALL runtime.deferproc(SB)
0x004d 00077 (deferproc.go:7) TESTL AX, AX
0x004f 00079 (deferproc.go:7) JNE 83
0x0051 00081 (deferproc.go:7) JMP 33
0x0053 00083 (deferproc.go:7) XCHGL AX, AX
Yep, the familiar deferproc call is back.
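Putting the two compiler snippets together, the decision can be modeled as a walk over the statements that tracks loop depth. The following is a toy sketch, not the compiler's code: the stmt type and the classify function are invented for illustration, and it assumes, as guessed above, that loopdepth starts at 1 and goes up by one per enclosing for loop.

```go
package main

import "fmt"

// stmt is a hypothetical, tiny statement tree: a "for" with a body, or a "defer".
type stmt struct {
	kind string
	body []stmt
}

// classify walks the statements and records which runtime call each
// defer would use under the assumed rule: deferprocStack only when the
// defer sits at loop depth 1, deferproc otherwise.
func classify(stmts []stmt, loopdepth int, out *[]string) {
	for _, s := range stmts {
		switch s.kind {
		case "for":
			classify(s.body, loopdepth+1, out)
		case "defer":
			if loopdepth == 1 {
				*out = append(*out, "deferprocStack")
			} else {
				*out = append(*out, "deferproc")
			}
		}
	}
}

func main() {
	prog := []stmt{
		{kind: "defer"}, // top level, like the first example
		{kind: "for", body: []stmt{{kind: "defer"}}}, // inside a loop, like the second example
	}

	var calls []string
	classify(prog, 1, &calls)

	fmt.Println(calls) // prints [deferprocStack deferproc]
}
```

One plausible reading of the rule: a defer inside a loop can execute any number of times before the function returns, so a single record reserved in the stack frame cannot hold all the pending defers, while a top-level defer runs at most once per call.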
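After digging through all of this, though, I realized there was no need to go to so much trouble. You can just read the official tests, haha: here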