分享
  1. 首页
  2. 文章

剖析golang map的实现

123archu · · 3103 次点击 · · 开始浏览
这是一个创建于 的文章,其中的信息可能已经有所发展或是发生改变。

[TOC]


本文参考的是golang 1.10源码实现。

golang中map是一个kv对集合。底层使用hash table,用链表来解决冲突,通过编译器配合runtime,所有的map对象都是共用一份代码。

对比其他语言
c++使用红黑树组织,性能稍低但是稳定性很好。使用模版在编译期生成代码,好处是效率高,但是缺点是代码膨胀、编译时间也会变长。

java使用的是hash table+链表/红黑树,当bucket内元素超过某个阈值时,该bucket的链表会转换成红黑树。java为了所有map共用一份代码,规定了只有Object的子类才能使用作为map的key,缺点是基础数据类型必须使用object包装一下才能使用map。

1. 函数选择

hash函数,有加密型和非加密型。加密型的一般用于加密数据、数字摘要等,典型代表就是md5、sha1、sha256、aes256这种;非加密型的一般就是查找。在map的应用场景中,用的是查找。选择hash函数主要考察的是两点:性能、碰撞概率。

具体hash函数的性能比较可以看:http://aras-p.info/blog/2016/08/09/More-Hash-Function-Tests/

golang使用的hash算法根据硬件选择,如果cpu支持aes,那么使用aes hash,否则使用memhash,memhash是参考xxhash、cityhash实现的,性能炸裂。

把hash值映射到buckte时,golang会把bucket的数量规整为2的次幂,而有m=2b,则n%m=n&(m-1),用位运算规避mod的昂贵代价。

2. 结构组成

首先我们看下map的结构:

// A header for a Go map.
type hmap struct {
 // Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.
 // Make sure this stays in sync with the compiler's definition.
 count int // # live cells == size of map. Must be first (used by len() builtin)
 flags uint8
 B uint8 // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
 noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
 hash0 uint32 // hash seed
 buckets unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
 oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
 nevacuate uintptr // progress counter for evacuation (buckets less than this have been evacuated)
 extra *mapextra // optional fields
}
// mapextra holds fields that are not present on all maps.
type mapextra struct {
 // If both key and value do not contain pointers and are inline, then we mark bucket
 // type as containing no pointers. This avoids scanning such maps.
 // However, bmap.overflow is a pointer. In order to keep overflow buckets
 // alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
 // overflow and oldoverflow are only used if key and value do not contain pointers.
 // overflow contains overflow buckets for hmap.buckets.
 // oldoverflow contains overflow buckets for hmap.oldbuckets.
 // The indirection allows to store a pointer to the slice in hiter.
 overflow *[]*bmap
 oldoverflow *[]*bmap
 // nextOverflow holds a pointer to a free overflow bucket.
 nextOverflow *bmap
}
// A bucket for a Go map.
type bmap struct {
 // tophash generally contains the top byte of the hash value
 // for each key in this bucket. If tophash[0] < minTopHash,
 // tophash[0] is a bucket evacuation state instead.
 tophash [bucketCnt]uint8
 // Followed by bucketCnt keys and then bucketCnt values.
 // NOTE: packing all the keys together and then all the values together makes the
 // code a bit more complicated than alternating key/value/key/value/... but it allows
 // us to eliminate padding which would be needed for, e.g., map[int64]int8.
 // Followed by an overflow pointer.
}

一个map主要是由三个结构构成:

  1. hmap --- map的最外层的数据结构,包括了map的各种基础信息、如大小、bucket。
  2. mapextra --- 记录map的额外信息,例如overflow bucket。
  3. bmap --- 代表bucket,每一个bucket最多放8个kv,最后由一个overflow字段指向下一个bmap,注意key、value、overflow字段都不显示定义,而是通过maptype计算偏移获取的。
hmap.001.png

其中hmap.extra.nextOverflow指向的是预分配的overflow bucket,预分配的用完了那么值就变成nil。
hmap.noverflow是overflow bucket的数量,当B小于16时是准确值,大于等于16时是大概的值。
hmap.count是当前map的元素个数,也就是len()返回的值。

2.1 设计原理

介绍完结构,我们就细说一下这么设计的原因。

2.1.1 bmap细节

在golang map中出现冲突时,不是每一个key都申请一个结构通过链表串起来,而是以bmap为最小粒度挂载,一个bmap可以放8个kv。这样减少对象数量,减轻管理内存的负担,利于gc。
如果插入时,bmap中key超过8,那么就会申请一个新的bmap(overflow bucket)挂在这个bmap的后面形成链表,优先用预分配的overflow bucket,如果预分配的用完了,那么就malloc一个挂上去。注意golang的map不会shrink,内存只会越用越多,overflow bucket中的key全删了也不会释放

hash值的高8位存储在bucket中的tophash字段。每个桶最多放8个kv对,所以tophash类型是数组[8]uint8。把高八位存储起来,这样不用完整比较key就能过滤掉不符合的key,加快查询速度。实际上当hash值的高八位小于常量minTopHash时,会加上minTopHash,区间[0, minTophash)的值用于特殊标记。查找key时,计算hash值,用hash值的高八位在tophash中查找,有tophash相等的,再去比较key值是否相同。

????? 这里我不太清楚,1.为啥小于minTopHash才加 2.为什么不是位运算而用加。 刚好top在[0,minHash),或着加上minHash之后溢出到这个区间,岂不是可能误判?

// tophash calculates the tophash value for hash.
func tophash(hash uintptr) uint8 {
 top := uint8(hash >> (sys.PtrSize*8 - 8))
 if top < minTopHash {
 top += minTopHash
 }
 return top
}

bmap中所有key存在一块,所有value存在一块,这样做方便内存对齐。
当key大于128字节时,bucket的key字段存储的会是指针,指向key的实际内容;value也是一样。

我们还知道golang中没有范型,为了支持map的范型,golang定义了一个maptype类型,定义了这类key用什么hash函数、bucket的大小、怎么比较之类的,通过这个变量来实现范型。

2.1.2 扩容设计

bcuket挂接的链表越来越长,性能会退化,那么就要进行扩容,扩大bucket的数量。

当元素个数/bucket个数大于等于6.5时,就会进行扩容,把bucket数量扩成原本的两倍,当hash表扩容之后,需要将那些老数据迁移到新table上(源代码中称之为evacuate), 数据搬迁不是一次性完成,而是逐步的完成(在insert和remove时进行搬移),这样就分摊了扩容的耗时。同时为了避免有个bucket一直访问不到导致扩容无法完成,还会进行一个顺序扩容,每次因为写操作搬迁对应bucket后,还会按顺序搬迁未搬迁的bucket,所以最差情况下n次写操作,就保证搬迁完大小为n的map。

扩容会建立一个大小是原来2倍的新的表,将旧的bucket搬到新的表中之后,并不会将旧的bucket从oldbucket中删除,而是加上一个已删除的标记。

只有当所有的bucket都从旧表移到新表之后,才会将oldbucket释放掉。 如果扩容过程中,阈值又超了呢?如果正在扩容,那么不会再进行扩容。

总体思路描述完,就看源码创建、查询、赋值、删除的具体实现。

3. 源码实现

3.1 创建

// makemap implements Go map creation for make(map[k]v, hint).
// If the compiler has determined that the map or the first bucket
// can be created on the stack, h and/or bucket may be non-nil.
// If h != nil, the map can be created directly in h.
// If h.buckets != nil, bucket pointed to can be used as the first bucket.
func makemap(t *maptype, hint int, h *hmap) *hmap {
 if hint < 0 || hint > int(maxSliceCap(t.bucket.size)) {
 hint = 0
 }
 // initialize Hmap
 if h == nil {
 h = new(hmap)
 }
 h.hash0 = fastrand()
 // find size parameter which will hold the requested # of elements
 B := uint8(0)
 for overLoadFactor(hint, B) {
 B++
 }
 h.B = B
 // allocate initial hash table
 // if B == 0, the buckets field is allocated lazily later (in mapassign)
 // If hint is large zeroing this memory could take a while.
 if h.B != 0 {
 var nextOverflow *bmap
 h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
 if nextOverflow != nil {
 h.extra = new(mapextra)
 h.extra.nextOverflow = nextOverflow
 }
 }
 return h
}

hint是一个启发值,启发初建map时创建多少个bucket,如果hint是0那么就先不分配bucket,lazy分配。大概流程就是设置一下hash seed、bucket数量、实际申请bucket之类的,流程很简单。

然后我们在看下申请bucket实际干了啥:

// makeBucketArray initializes a backing array for map buckets.
// 1<<b is the minimum number of buckets to allocate.
// dirtyalloc should either be nil or a bucket array previously
// allocated by makeBucketArray with the same t and b parameters.
// If dirtyalloc is nil a new backing array will be alloced and
// otherwise dirtyalloc will be cleared and reused as backing array.
func makeBucketArray(t *maptype, b uint8, dirtyalloc unsafe.Pointer) (buckets unsafe.Pointer, nextOverflow *bmap) {
 base := bucketShift(b)
 nbuckets := base
 // For small b, overflow buckets are unlikely.
 // Avoid the overhead of the calculation.
 if b >= 4 {
 // Add on the estimated number of overflow buckets
 // required to insert the median number of elements
 // used with this value of b.
 nbuckets += bucketShift(b - 4)
 sz := t.bucket.size * nbuckets
 up := roundupsize(sz)
 if up != sz {
 nbuckets = up / t.bucket.size
 }
 }
 if dirtyalloc == nil {
 buckets = newarray(t.bucket, int(nbuckets))
 } else {
 // dirtyalloc was previously generated by
 // the above newarray(t.bucket, int(nbuckets))
 // but may not be empty.
 buckets = dirtyalloc
 size := t.bucket.size * nbuckets
 if t.bucket.kind&kindNoPointers == 0 {
 memclrHasPointers(buckets, size)
 } else {
 memclrNoHeapPointers(buckets, size)
 }
 }
 if base != nbuckets {
 // We preallocated some overflow buckets.
 // To keep the overhead of tracking these overflow buckets to a minimum,
 // we use the convention that if a preallocated overflow bucket's overflow
 // pointer is nil, then there are more available by bumping the pointer.
 // We need a safe non-nil pointer for the last overflow bucket; just use buckets.
 nextOverflow = (*bmap)(add(buckets, base*uintptr(t.bucketsize)))
 last := (*bmap)(add(buckets, (nbuckets-1)*uintptr(t.bucketsize)))
 last.setoverflow(t, (*bmap)(buckets))
 }
 return buckets, nextOverflow
}

默认创建2b个bucket,如果b大于等于4,那么就预先额外创建一些overflow bucket。除了最后一个overflow bucket,其余overflow bucket的overflow指针都是nil,最后一个overflow bucket的overflow指针指向bucket数组第一个元素,作为哨兵,说明到了到结尾了.

创建简单流程

3.2 查询

// mapaccess1 returns a pointer to h[key]. Never returns nil, instead
// it will return a reference to the zero object for the value type if
// the key is not in the map.
// NOTE: The returned pointer may keep the whole map live, so don't
// hold onto it for very long.
func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
 if raceenabled && h != nil {
 callerpc := getcallerpc()
 pc := funcPC(mapaccess1)
 racereadpc(unsafe.Pointer(h), callerpc, pc)
 raceReadObjectPC(t.key, key, callerpc, pc)
 }
 if msanenabled && h != nil {
 msanread(key, t.key.size)
 }
 if h == nil || h.count == 0 {
 return unsafe.Pointer(&zeroVal[0])
 }
 if h.flags&hashWriting != 0 {
 throw("concurrent map read and map write")
 }
 alg := t.key.alg
 hash := alg.hash(key, uintptr(h.hash0))
 m := bucketMask(h.B)
 b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
 if c := h.oldbuckets; c != nil {
 if !h.sameSizeGrow() {
 // There used to be half as many buckets; mask down one more power of two.
 m >>= 1
 }
 oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
 if !evacuated(oldb) {
 b = oldb
 }
 }
 top := tophash(hash)
 for ; b != nil; b = b.overflow(t) {
 for i := uintptr(0); i < bucketCnt; i++ {
 if b.tophash[i] != top {
 continue
 }
 k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
 if t.indirectkey {
 k = *((*unsafe.Pointer)(k))
 }
 if alg.equal(key, k) {
 v := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
 if t.indirectvalue {
 v = *((*unsafe.Pointer)(v))
 }
 return v
 }
 }
 }
 return unsafe.Pointer(&zeroVal[0])
}
  1. 先定位出bucket,如果正在扩容,并且这个bucket还没搬到新的hash表中,那么就从老的hash表中查找。

  2. 在bucket中进行顺序查找,使用高八位进行快速过滤,高八位相等,再比较key是否相等,找到就返回value。如果当前bucket找不到,就往下找overflow bucket,都没有就返回零值。

这里我们可以看到,访问的时候,并不进行扩容的数据搬迁。并且并发有写操作时抛异常

这里要注意的是,t.bucketsize并不是bmap的size,而是bmap加上存储key、value、overflow指针,所以查找bucket的时候时候用的不是bmap的szie。


查询简单流程

3.3 赋值

// Like mapaccess, but allocates a slot for the key if it is not present in the map.
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
 if h == nil {
 panic(plainError("assignment to entry in nil map"))
 }
 if raceenabled {
 callerpc := getcallerpc()
 pc := funcPC(mapassign)
 racewritepc(unsafe.Pointer(h), callerpc, pc)
 raceReadObjectPC(t.key, key, callerpc, pc)
 }
 if msanenabled {
 msanread(key, t.key.size)
 }
 if h.flags&hashWriting != 0 {
 throw("concurrent map writes")
 }
 alg := t.key.alg
 hash := alg.hash(key, uintptr(h.hash0))
 // Set hashWriting after calling alg.hash, since alg.hash may panic,
 // in which case we have not actually done a write.
 h.flags |= hashWriting
 if h.buckets == nil {
 h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
 }
again:
 bucket := hash & bucketMask(h.B)
 if h.growing() {
 growWork(t, h, bucket)
 }
 b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + bucket*uintptr(t.bucketsize)))
 top := tophash(hash)
 var inserti *uint8
 var insertk unsafe.Pointer
 var val unsafe.Pointer
 for {
 for i := uintptr(0); i < bucketCnt; i++ {
 if b.tophash[i] != top {
 if b.tophash[i] == empty && inserti == nil {
 inserti = &b.tophash[i]
 insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
 val = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
 }
 continue
 }
 k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
 if t.indirectkey {
 k = *((*unsafe.Pointer)(k))
 }
 if !alg.equal(key, k) {
 continue
 }
 // already have a mapping for key. Update it.
 if t.needkeyupdate {
 typedmemmove(t.key, k, key)
 }
 val = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
 goto done
 }
 ovf := b.overflow(t)
 if ovf == nil {
 break
 }
 b = ovf
 }
 // Did not find mapping for key. Allocate new cell & add entry.
 // If we hit the max load factor or we have too many overflow buckets,
 // and we're not already in the middle of growing, start growing.
 if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
 hashGrow(t, h)
 goto again // Growing the table invalidates everything, so try again
 }
 if inserti == nil {
 // all current buckets are full, allocate a new one.
 newb := h.newoverflow(t, b)
 inserti = &newb.tophash[0]
 insertk = add(unsafe.Pointer(newb), dataOffset)
 val = add(insertk, bucketCnt*uintptr(t.keysize))
 }
 // store new key/value at insert position
 if t.indirectkey {
 kmem := newobject(t.key)
 *(*unsafe.Pointer)(insertk) = kmem
 insertk = kmem
 }
 if t.indirectvalue {
 vmem := newobject(t.elem)
 *(*unsafe.Pointer)(val) = vmem
 }
 typedmemmove(t.key, insertk, key)
 *inserti = top
 h.count++
done:
 if h.flags&hashWriting == 0 {
 throw("concurrent map writes")
 }
 h.flags &^= hashWriting
 if t.indirectvalue {
 val = *((*unsafe.Pointer)(val))
 }
 return val
}
  1. hash表如果正在扩容,并且这次要操作的bucket还没搬到新hash表中,那么先进行搬迁(扩容细节下面细说)。

  2. 在buck中寻找key,同时记录下第一个空位置,如果找不到,那么就在空位置中插入数据;如果找到了,那么就更新对应的value;

  3. 找不到key就看下需不需要扩容,需要扩容并且没有正在扩容,那么就进行扩容,然后回到第一步。

  4. 找不到key,不需要扩容,但是没有空slot,那么就分配一个overflow bucket挂在链表结尾,用新bucket的第一个slot放存放数据。

3.4 删除

func mapdelete(t *maptype, h *hmap, key unsafe.Pointer) {
 if raceenabled && h != nil {
 callerpc := getcallerpc()
 pc := funcPC(mapdelete)
 racewritepc(unsafe.Pointer(h), callerpc, pc)
 raceReadObjectPC(t.key, key, callerpc, pc)
 }
 if msanenabled && h != nil {
 msanread(key, t.key.size)
 }
 if h == nil || h.count == 0 {
 return
 }
 if h.flags&hashWriting != 0 {
 throw("concurrent map writes")
 }
 alg := t.key.alg
 hash := alg.hash(key, uintptr(h.hash0))
 // Set hashWriting after calling alg.hash, since alg.hash may panic,
 // in which case we have not actually done a write (delete).
 h.flags |= hashWriting
 bucket := hash & bucketMask(h.B)
 if h.growing() {
 growWork(t, h, bucket)
 }
 b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
 top := tophash(hash)
search:
 for ; b != nil; b = b.overflow(t) {
 for i := uintptr(0); i < bucketCnt; i++ {
 if b.tophash[i] != top {
 continue
 }
 k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
 k2 := k
 if t.indirectkey {
 k2 = *((*unsafe.Pointer)(k2))
 }
 if !alg.equal(key, k2) {
 continue
 }
 // Only clear key if there are pointers in it.
 if t.indirectkey {
 *(*unsafe.Pointer)(k) = nil
 } else if t.key.kind&kindNoPointers == 0 {
 memclrHasPointers(k, t.key.size)
 }
 v := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
 if t.indirectvalue {
 *(*unsafe.Pointer)(v) = nil
 } else if t.elem.kind&kindNoPointers == 0 {
 memclrHasPointers(v, t.elem.size)
 } else {
 memclrNoHeapPointers(v, t.elem.size)
 }
 b.tophash[i] = empty
 h.count--
 break search
 }
 }
 if h.flags&hashWriting == 0 {
 throw("concurrent map writes")
 }
 h.flags &^= hashWriting
}
  1. 如果正在扩容,并且操作的bucket还没搬迁完,那么搬迁bucket。

  2. 找出对应的key,如果key、value是包含指针的那么会清理指针指向的内存,否则不会回收内存。

3.5 扩容

首先通过赋值、删除流程,我们可以知道,触发扩容的是赋值、删除操作,具体判断要不要扩容的代码片段如下:


// overLoadFactor reports whether count items placed in 1<<B buckets is over loadFactor.
func overLoadFactor(count int, B uint8) bool {
 return count > bucketCnt && uintptr(count) > loadFactorNum*(bucketShift(B)/loadFactorDen)
}
// tooManyOverflowBuckets reports whether noverflow buckets is too many for a map with 1<<B buckets.
// Note that most of these overflow buckets must be in sparse use;
// if use was dense, then we'd have already triggered regular map growth.
func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
 // If the threshold is too low, we do extraneous work.
 // If the threshold is too high, maps that grow and shrink can hold on to lots of unused memory.
 // "too many" means (approximately) as many overflow buckets as regular buckets.
 // See incrnoverflow for more details.
 if B > 15 {
 B = 15
 }
 // The compiler doesn't see here that B < 16; mask B to generate shorter shift code.
 return noverflow >= uint16(1)<<(B&15)
}
{
 ....
 // If we hit the max load factor or we have too many overflow buckets,
 // and we're not already in the middle of growing, start growing.
 if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
 hashGrow(t, h)
 goto again // Growing the table invalidates everything, so try again
 }
 ....
}

翻译一下代码,意思就是:

func overLoadFactor(countint, Buint8) bool {
 // return count>bucketCnt&&uintptr(count) >loadFactorNum*(bucketShift(B)/loadFactorDen)
 return 元素个数>8 && count>bucket数量*6.5
 其中loadFactorNum是常量13,loadFactorDen是常量2,所以是6.5
 bucket数量不算overflow bucket
}
​
func tooManyOverflowBuckets(noverflowuint16, Buint8) bool{
 if B > 15 {
 B=15
 }
 // The compiler doesn't see here that B < 16; mask B to generate shorter shift code.
 return noverflow>=uint16(1)<<(B&15)
}
if (不是正在扩容 && (元素个数/bucket数超过某个值 || 太多overflow bucket)) {
 进行扩容
}

判断完扩容后,如果需要扩容,那么第一步需要做的,就是对hash表进行扩容:

//仅对hash表进行扩容,这里不进行搬迁
func hashGrow(t *maptype, h *hmap) {
 // If we've hit the load factor, get bigger.
 // Otherwise, there are too many overflow buckets,
 // so keep the same number of buckets and "grow" laterally.
 bigger := uint8(1)
 if !overLoadFactor(h.count+1, h.B) {
 bigger = 0
 h.flags |= sameSizeGrow
 }
 oldbuckets := h.buckets
 newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)
 flags := h.flags &^ (iterator | oldIterator)
 if h.flags&iterator != 0 {
 flags |= oldIterator
 }
 // commit the grow (atomic wrt gc)
 h.B += bigger
 h.flags = flags
 h.oldbuckets = oldbuckets
 h.buckets = newbuckets
 h.nevacuate = 0
 h.noverflow = 0
 if h.extra != nil && h.extra.overflow != nil {
 // Promote current overflow buckets to the old generation.
 if h.extra.oldoverflow != nil {
 throw("oldoverflow is not nil")
 }
 h.extra.oldoverflow = h.extra.overflow
 h.extra.overflow = nil
 }
 if nextOverflow != nil {
 if h.extra == nil {
 h.extra = new(mapextra)
 }
 h.extra.nextOverflow = nextOverflow
 }
 // the actual copying of the hash table data is done incrementally
 // by growWork() and evacuate().
}

扩容的函数hashGrow其实仅仅是进行一些空间分配,字段的初始化,实际的搬迁操作是在growWork函数中

func growWork(t *maptype, h *hmap, bucket uintptr) {
 // make sure we evacuate the oldbucket corresponding
 // to the bucket we're about to use
 evacuate(t, h, bucket&h.oldbucketmask())
 // evacuate one more oldbucket to make progress on growing
 if h.growing() {
 evacuate(t, h, h.nevacuate)
 }
}

evacuate是进行具体搬迁某个bucket的函数,可以看出growWork会搬迁两个bucket,一个是入参bucket;另一个是h.nevacuate。这个nevacuate是一个顺序累加的值。可以想想如果每次仅仅搬迁进行写操作(赋值/删除)的bucket,那么有可能某些bucket就是一直没有机会访问到,那么扩容就一直没法完成,总是在扩容中的状态,因此会额外进行一次顺序迁移,理论上,有N个old bucket,最多N次写操作,那么必定会搬迁完。

然后我们再看下evacuate具体的实现

func evacuate(t *maptype, h *hmap, oldbucket uintptr) {
 b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
 newbit := h.noldbuckets()
 if !evacuated(b) {
 // TODO: reuse overflow buckets instead of using new ones, if there
 // is no iterator using the old buckets. (If !oldIterator.)
 // xy contains the x and y (low and high) evacuation destinations.
 var xy [2]evacDst
 x := &xy[0]
 x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
 x.k = add(unsafe.Pointer(x.b), dataOffset)
 x.v = add(x.k, bucketCnt*uintptr(t.keysize))
 if !h.sameSizeGrow() {
 // Only calculate y pointers if we're growing bigger.
 // Otherwise GC can see bad pointers.
 y := &xy[1]
 y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
 y.k = add(unsafe.Pointer(y.b), dataOffset)
 y.v = add(y.k, bucketCnt*uintptr(t.keysize))
 }
 for ; b != nil; b = b.overflow(t) {
 k := add(unsafe.Pointer(b), dataOffset)
 v := add(k, bucketCnt*uintptr(t.keysize))
 for i := 0; i < bucketCnt; i, k, v = i+1, add(k, uintptr(t.keysize)), add(v, uintptr(t.valuesize)) {
 top := b.tophash[I]
 if top == empty {
 b.tophash[i] = evacuatedEmpty
 continue
 }
 if top < minTopHash {
 throw("bad map state")
 }
 k2 := k
 if t.indirectkey {
 k2 = *((*unsafe.Pointer)(k2))
 }
 var useY uint8
 if !h.sameSizeGrow() {
 // Compute hash to make our evacuation decision (whether we need
 // to send this key/value to bucket x or bucket y).
 hash := t.key.alg.hash(k2, uintptr(h.hash0))
 if h.flags&iterator != 0 && !t.reflexivekey && !t.key.alg.equal(k2, k2) {
 // If key != key (NaNs), then the hash could be (and probably
 // will be) entirely different from the old hash. Moreover,
 // it isn't reproducible. Reproducibility is required in the
 // presence of iterators, as our evacuation decision must
 // match whatever decision the iterator made.
 // Fortunately, we have the freedom to send these keys either
 // way. Also, tophash is meaningless for these kinds of keys.
 // We let the low bit of tophash drive the evacuation decision.
 // We recompute a new random tophash for the next level so
 // these keys will get evenly distributed across all buckets
 // after multiple grows.
 useY = top & 1
 top = tophash(hash)
 } else {
 if hash&newbit != 0 {
 useY = 1
 }
 }
 }
 if evacuatedX+1 != evacuatedY {
 throw("bad evacuatedN")
 }
 b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY
 dst := &xy[useY] // evacuation destination
 if dst.i == bucketCnt {
 dst.b = h.newoverflow(t, dst.b)
 dst.i = 0
 dst.k = add(unsafe.Pointer(dst.b), dataOffset)
 dst.v = add(dst.k, bucketCnt*uintptr(t.keysize))
 }
 dst.b.tophash[dst.i&(bucketCnt-1)] = top // mask dst.i as an optimization, to avoid a bounds check
 if t.indirectkey {
 *(*unsafe.Pointer)(dst.k) = k2 // copy pointer
 } else {
 typedmemmove(t.key, dst.k, k) // copy value
 }
 if t.indirectvalue {
 *(*unsafe.Pointer)(dst.v) = *(*unsafe.Pointer)(v)
 } else {
 typedmemmove(t.elem, dst.v, v)
 }
 dst.i++
 // These updates might push these pointers past the end of the
 // key or value arrays. That's ok, as we have the overflow pointer
 // at the end of the bucket to protect against pointing past the
 // end of the bucket.
 dst.k = add(dst.k, uintptr(t.keysize))
 dst.v = add(dst.v, uintptr(t.valuesize))
 }
 }
 // Unlink the overflow buckets & clear key/value to help GC.
 if h.flags&oldIterator == 0 && t.bucket.kind&kindNoPointers == 0 {
 b := add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))
 // Preserve b.tophash because the evacuation
 // state is maintained there.
 ptr := add(b, dataOffset)
 n := uintptr(t.bucketsize) - dataOffset
 memclrHasPointers(ptr, n)
 }
 }
 if oldbucket == h.nevacuate {
 advanceEvacuationMark(h, t, newbit)
 }
}

在advanceEvacuationMark中进行nevacuate的累加,遇到已经迁移的bucket会继续累加,一次最多加1024。


有疑问加站长微信联系(非本文作者)

本文来自:简书

感谢作者:123archu

查看原文:剖析golang map的实现

入群交流(和以上内容无关):加入Go大咖交流群,或添加微信:liuxiaoyan-s 备注:入群;或加QQ群:692541889

关注微信
3103 次点击
暂无回复
添加一条新回复 (您需要 后才能回复 没有账号 ?)
  • 请尽量让自己的回复能够对别人有帮助
  • 支持 Markdown 格式, **粗体**、~~删除线~~、`单行代码`
  • 支持 @ 本站用户;支持表情(输入 : 提示),见 Emoji cheat sheet
  • 图片支持拖拽、截图粘贴等方式上传

用户登录

没有账号?注册
(追記) (追記ここまで)

今日阅读排行

    加载中
(追記) (追記ここまで)

一周阅读排行

    加载中

关注我

  • 扫码关注领全套学习资料 关注微信公众号
  • 加入 QQ 群:
    • 192706294(已满)
    • 731990104(已满)
    • 798786647(已满)
    • 729884609(已满)
    • 977810755(已满)
    • 815126783(已满)
    • 812540095(已满)
    • 1006366459(已满)
    • 692541889

  • 关注微信公众号
  • 加入微信群:liuxiaoyan-s,备注入群
  • 也欢迎加入知识星球 Go粉丝们(免费)

给该专栏投稿 写篇新文章

每篇文章有总共有 5 次投稿机会

收入到我管理的专栏 新建专栏