livelock

最近在看的 《concurrency in go》 里有这么一个例子:

package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

func main() {
	cadence := sync.NewCond(&sync.Mutex{})
	go func() {
		for range time.Tick(1 * time.Millisecond) {
			cadence.Broadcast()
		}
	}()

	takeStep := func() {
		cadence.L.Lock()
		cadence.Wait()
		cadence.L.Unlock()
	}

	tryDir := func(name string, dirName string, dir *int32) bool {
		fmt.Printf(name + " " + dirName + "\n")
		atomic.AddInt32(dir, 1) // <2>
		takeStep()              // <3>
		if atomic.LoadInt32(dir) == 1 {
			fmt.Printf("success")
			return true
		}
		takeStep()
		atomic.AddInt32(dir, -1) // <4>
		return false
	}

	var left, right int32
	tryLeft := func(name string) bool { return tryDir(name, "left", &left) }
	tryRight := func(name string) bool { return tryDir(name, "right", &right) }
	walk := func(wg *sync.WaitGroup, name string) {
		defer wg.Done()
		fmt.Printf("%v is trying to scoot:", name)
		for i := 0; i < 5; i++ { // <1>
			if tryLeft(name) || tryRight(name) { // <2>
				return
			}
		}
		fmt.Printf("\n%v tosses her hands up in exasperation!", name)
	}

	var wg sync.WaitGroup // <3>
	wg.Add(2)
	go walk(&wg, "Alice")
	go walk(&wg, "Barbara")
	wg.Wait()
}

书上的例子写的比较恶心,这里的代码我已经简化过了,结果输出:

Barbara is trying to scoot:Barbara left
Alice is trying to scoot:Alice left
Barbara right
Alice right
Barbara left
Alice left
Alice right
Barbara right
Alice left
Barbara left
Alice right
Barbara right
Alice left
Barbara left
Alice right
Barbara right
Alice left
Barbara left
Alice right
Barbara right

Barbara tosses her hands up in exasperation!
Alice tosses her hands up in exasperation!

模拟的是两个线程 or 协程反复抢两把锁的过程。其实这东西就类似于只能两排的华容道,两个迎面来的人,如果没有默认的规则(比如都走右边,或者其中一方您先请)的话,就可能会发生活锁。活锁的场景下服务看似正常运行,但实际上会有不少资源消耗在加锁解锁上。

作者这里为了能表现出这种“加锁失败,放弃锁,然后重新加锁”的过程,在代码里加了一个等待一段时间后唤醒的逻辑:

	go func() {
		for range time.Tick(1 * time.Millisecond) {
			cadence.Broadcast()
		}
	}()

两个 goroutine 会阻塞在

	takeStep := func() {
		cadence.L.Lock()
		cadence.Wait()
		cadence.L.Unlock()
	}

中的 cadence.Wait 这一步上。

如果没有这个东西的话,还真不太好复现。。活锁产生的原因上面也说了,就是并发场景下大家没有协调好加锁顺序,能产生活锁的另一个前提是获取锁用的是 trylock,所谓 trylock,就是 try 一下。。如果没 lock 住,就返回加锁失败。如果不是 trylock 的话,不一致的加锁顺序会直接导致死锁。死锁应该大多数人比较熟悉了,如果满足 Coffman's Rules,那么就会产生死锁,在 OS 课上应该都接触过:

Coffman identified four condition must hold simultaneously for deadlock to occur:-

1) Mutual Exclusion:-The resources involved are non-shareable.

2) Hold and Wait condition:-  processes already holding resources may request new resources

3) No-Preemptive Condition:- Processes that are allocated resources cannot be preempted.

4) Circular wait condition: two or more processes form a circular chain where each process waits for a resource that the next process in the chain holds

在 Go 里 sync 包没有可以使用的 trylock 的 API,如果程序员有 trylock 这种需求,我目前看到的解决方案是用 select 来实现:

lockChan := make(chan struct{}, 1)
select {
    case lockChan <- struct{}{}:
        return lockSucc
    default:
        return lockFail
}

如果已经发现了活锁导致的问题,解决手段很简单,只要规定好加锁顺序,并且大家都按同样的顺序去加锁就可以了。活锁比较麻烦的是难以发现,因为在活锁状态下的程序实际上看起来很正常,只是性能表现会稍微差一些。到现在为止,我是还没遇到过活锁导致的问题啊哈哈。。

当然了,这也是个比较经典的并发问题了,具体参见 wikipedia:
livelock

Xargin

Xargin

If you don't keep moving, you'll quickly fall behind
Beijing