Changes to defer in Go 1.13

Go 1.13 has been officially released, and the release notes say that defer is now about 30% faster in most cases. Where does that 30% come from?

As we know, a defer func used to be translated into two steps, deferproc and deferreturn; see here.

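The deferproc step now has a new sibling, deferprocStack, and the compiler chooses between deferproc and deferprocStack. Since the release notes say that most uses were optimized, most defer statements should end up compiled to deferprocStack.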

// All other fields can contain junk.
// The defer record must be immediately followed in memory by
// the arguments of the defer.
// Nosplit because the arguments on the stack won't be scanned
// until the defer record is spliced into the gp._defer list.
//go:nosplit
func deferprocStack(d *_defer) {
	gp := getg()
	if gp.m.curg != gp {
		// go code on the system stack can't defer
		throw("defer on system stack")
	}
	// siz and fn are already set.
	// The other fields are junk on entry to deferprocStack and
	// are initialized here.
	d.started = false
	d.heap = false
	d.sp = getcallersp()
	d.pc = getcallerpc()
	// The lines below implement:
	//   d.panic = nil
	//   d.link = gp._defer
	//   gp._defer = d
	// But without write barriers. The first two are writes to
	// the stack so they don't need a write barrier, and furthermore
	// are to uninitialized memory, so they must not use a write barrier.
	// The third write does not require a write barrier because we
	// explicitly mark all the defer structures, so we don't need to
	// keep track of pointers to them with a write barrier.
	*(*uintptr)(unsafe.Pointer(&d._panic)) = 0
	*(*uintptr)(unsafe.Pointer(&d.link)) = uintptr(unsafe.Pointer(gp._defer))
	*(*uintptr)(unsafe.Pointer(&gp._defer)) = uintptr(unsafe.Pointer(d))

	return0()
	// No code can go here - the C return register has
	// been set and must not be clobbered.
}
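
The three pointer writes at the end of deferprocStack amount to pushing a record onto the head of a singly linked list that hangs off the goroutine. Here is a minimal sketch of that idea in ordinary Go. The types `record` and `goroutine` and the functions `push`, `runAll`, and `runOrder` are hypothetical names made up for illustration, not the runtime's, and the sketch ignores arguments, panics, and write barriers entirely.

```go
package main

import "fmt"

// record is a simplified, hypothetical stand-in for the runtime's _defer struct.
type record struct {
	fn   func()
	link *record // next (older) record in the list
}

// goroutine is a simplified, hypothetical stand-in for the runtime's g struct.
type goroutine struct {
	deferHead *record // plays the role of gp._defer
}

// push mirrors "d.link = gp._defer; gp._defer = d".
func (g *goroutine) push(d *record) {
	d.link = g.deferHead
	g.deferHead = d
}

// runAll pops records from the head and runs them, newest first.
func (g *goroutine) runAll() {
	for g.deferHead != nil {
		d := g.deferHead
		g.deferHead = d.link
		d.fn()
	}
}

// runOrder registers n records and returns the order in which they ran.
func runOrder(n int) []int {
	var g goroutine
	var order []int
	for i := 0; i < n; i++ {
		i := i // capture the current value of the loop variable
		g.push(&record{fn: func() { order = append(order, i) }})
	}
	g.runAll()
	return order
}

func main() {
	fmt.Println(runOrder(3)) // prints [2 1 0]: the newest record runs first
}
```

Because every push goes to the head, the last registered record is the first to run, which matches the LIFO order of defer.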

A quick check:

package main

func main() {
	defer println(1)
}

	0x003a 00058 (deferstack.go:4)	LEAQ	""..autotmp_1+8(SP), AX
	0x003f 00063 (deferstack.go:4)	PCDATA	$0, $0
	0x003f 00063 (deferstack.go:4)	MOVQ	AX, (SP)
	0x0043 00067 (deferstack.go:4)	CALL	runtime.deferprocStack(SB)
	0x0048 00072 (deferstack.go:4)	TESTL	AX, AX
	0x004a 00074 (deferstack.go:4)	JNE	92
	0x004c 00076 (deferstack.go:5)	XCHGL	AX, AX

The original deferproc still exists, so the _defer struct needs to record whether a given defer record was allocated on the stack or on the heap:

type _defer struct {
	siz     int32 // includes both arguments and results
	started bool
	heap    bool // newly added field
	sp      uintptr // sp at time of defer
	pc      uintptr
	fn      *funcval
	_panic  *_panic // panic that is running defer
	link    *_defer
}
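
The heap flag matters when a record is released: a heap record can go back to a pool for reuse, while a stack record lives in the caller's frame and has to be left alone. The sketch below models only that branch. `record`, `pool`, `release`, and `pooledAfter` are hypothetical names, and the idea that the release path simply skips stack records is an assumption here, not something shown in the runtime code quoted in this post.

```go
package main

import "fmt"

// record is a simplified, hypothetical stand-in for _defer;
// only the heap flag is modeled.
type record struct {
	heap bool
}

// pool is a toy free list standing in for the runtime's defer pool.
type pool struct {
	free []*record
}

// release returns heap records to the pool and ignores stack records,
// because a stack record's memory belongs to the frame that created it.
func (p *pool) release(d *record) {
	if !d.heap {
		return
	}
	p.free = append(p.free, d)
}

// pooledAfter releases the given records and reports how many were pooled.
func pooledAfter(records ...*record) int {
	var p pool
	for _, d := range records {
		p.release(d)
	}
	return len(p.free)
}

func main() {
	onStack := &record{heap: false}
	onHeap := &record{heap: true}
	fmt.Println(pooledAfter(onStack, onHeap)) // prints 1
}
```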

Before deferprocStack existed, everything went through deferproc. There is a deferpool, but when the pool runs dry an allocation like this one is unavoidable:

d = (*_defer)(mallocgc(total, deferType, true))

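People in the community have long complained that defer is slow, slow, slow. So this time the Go team is, in effect, answering popular demand.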

Why weren't all defer calls optimized into stack allocation?

	case ODEFER:
		d := callDefer
		if n.Esc == EscNever {
			d = callDeferStack
		}
		s.call(n.Left, d)

n.Esc is the escape analysis result for the AST node. It is set to EscNever mainly by the following code:

	case ODEFER:
		if e.loopdepth == 1 { // top level
			n.Esc = EscNever // force stack allocation of defer record (see ssa.go)
			break
		}
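
Taken together, the two compiler fragments above boil down to a single decision, which can be written as a toy model. The function `chooseDeferCall` and the `deferCall` type with its string constants are hypothetical. The real compiler works on AST nodes and escape analysis results, and this sketch assumes loop depth is the only input that matters.

```go
package main

import "fmt"

// deferCall names the runtime entry point a defer statement compiles to.
// The type and its string values are illustrative only.
type deferCall string

const (
	callDefer      deferCall = "deferproc"
	callDeferStack deferCall = "deferprocStack"
)

// chooseDeferCall models the combined effect of the two fragments:
// a defer at loop depth 1 (outside any loop) is marked EscNever and
// gets a stack record; anything deeper falls back to the heap path.
func chooseDeferCall(loopDepth int) deferCall {
	if loopDepth == 1 {
		return callDeferStack
	}
	return callDefer
}

func main() {
	fmt.Println(chooseDeferCall(1)) // prints deferprocStack
	fmt.Println(chooseDeferCall(2)) // prints deferproc
}
```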
		

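How should we understand loopdepth? Roughly speaking, it goes up by one for each enclosing for loop. Following that idea, we can construct an example where the defer record is still allocated on the heap: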

package main

import "fmt"

func main() {
	for i := 0; i < 10; i++ {
		defer func() {
			for {
				var a = make([]int, 128)
				fmt.Println(a)
			}
		}()
	}
}

go tool compile -S

	0x0043 00067 (deferproc.go:7)	PCDATA	$0, $0
	0x0043 00067 (deferproc.go:7)	MOVQ	AX, 8(SP)
	0x0048 00072 (deferproc.go:7)	CALL	runtime.deferproc(SB)
	0x004d 00077 (deferproc.go:7)	TESTL	AX, AX
	0x004f 00079 (deferproc.go:7)	JNE	83
	0x0051 00081 (deferproc.go:7)	JMP	33
	0x0053 00083 (deferproc.go:7)	XCHGL	AX, AX

Yep, the same familiar flavor.
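
To get a feel for the cost difference between the two paths, a rough benchmark can be driven from an ordinary program with testing.Benchmark. This is only a sketch: `topLevelDefer`, `loopDefer`, `measure`, and `sink` are made-up names, and the numbers depend heavily on the toolchain (Go 1.14 and later add open-coded defers, which change the top-level case again), so no expected output is given.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the deferred closures from being optimized away.
var sink int

// topLevelDefer has its defer outside any loop.
//
//go:noinline
func topLevelDefer() {
	defer func() { sink++ }()
}

// loopDefer has its defer inside a for loop that runs exactly once,
// so both functions execute one deferred call per invocation.
//
//go:noinline
func loopDefer() {
	for i := 0; i < 1; i++ {
		defer func() { sink++ }()
	}
}

// measure runs f under the testing package's benchmark driver.
func measure(f func()) testing.BenchmarkResult {
	testing.Init() // registers testing flags; no effect if already called
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			f()
		}
	})
}

func main() {
	fmt.Println("top-level defer:", measure(topLevelDefer))
	fmt.Println("defer in a loop:", measure(loopDefer))
}
```

Each line of output shows the iteration count and the time per operation for one of the two functions.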

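After digging through all of this, though, I realized there was no need to go to so much trouble: you can just read the official test, haha: here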

Xargin
