剖析golang slice的实现

123archu · · 1678 次点击 · · 开始浏览

这是一个创建于的文章,其中的信息可能已经有所发展或是发生改变。

本文基于golang 1.10版本分析。

slice 结构

slice实际就是一个struct,在runtime/slice.go中的定义如下:

type slice struct {
 array unsafe.Pointer
 len int 
 cap int 
}
// An notInHeapSlice is a slice backed by go:notinheap memory.
type notInHeapSlice struct {
 array *notInHeap
 len int 
 cap int 
}

由定义可以看出slice底层是基于数组,本质是对数组的封装。由三部分组成:

指针指向第一个slice元素对应的底层数组元素地址。
长度 slice中元素的数目
容量 slice开始位置到底层数据的结尾

内置函数len和cap,分别返回slice的长度和容量。slice使用下标不能超过len,向后扩展不能超过cap。多个不同slice之间可以共享底层的数据,起始地址、长度都可以不同,所以slice第一个元素未必是数组的第一个元素。

image.png

使用切片

Slice代表变长的序列,序列中每个元素都有相同的类型,属于引用类型,一般这么声明:

var 变量名 []类型 //这样没有初始化赋值,仅仅是引用,没分配底层数组。
var 变量名 = []类型{置集合} //会分配底层数组,len、cap都是置集合大小
var 变量名 []类型 = make([]类型,len,cap) //这样会分配底层数组

注意,slice类型是不能比较的,对于字节型的slice,标准库有bytes.Equal函数用于比较,但是其他类型的slice,需要自行展开比较。

对于数组和slice,除了使用声明,还可以使用[begin:end:cap]来创建新的切片,注意这是个左闭右开区间,slice的容量是cap-begin,底层数据共享。
对于string类型,进行如[:]的操作,返回的是一个string,而不是切片,这个新的string和原来的string未必是一块内存,看编译器优化。
另外给切片从后面追加数据,可以用buildin函数append来实现。

func f(a []int) {
 a = append(a, 8)
 fmt.Printf("f cap:%d addr:%p value:%v\n", cap(a), &a[0], a)
}
func main() {
 var slice1 = []int{1, 2, 3, 4, 5, 6}
 fmt.Printf("%v %d %d\n", slice1, len(slice1), cap(slice1))
 slice2 := slice1[2:2:3]
 fmt.Printf("%v %d %d\n", slice2, len(slice2), cap(slice2))
 var a [5]int = [5]int{1, 2, 3, 4, 5} //先定义了一个数组
 array_slice := a[:]
 fmt.Printf("cap:%d a_addr:%p slice_addr:%p slice_type:%T\n", cap(array_slice), &a, &array_slice[0], array_slice)
 array_slice[1] = 9
 fmt.Printf("cap:%d addr:%p value:%v\n", cap(array_slice), &array_slice[0], a)
 array_slice = append(array_slice, 6)
 fmt.Printf("cap:%d addr:%p value:%v\n", cap(array_slice), &array_slice[0], array_slice)
 array_slice = append(array_slice, 7)
 fmt.Printf("cap:%d addr:%p value:%v\n", cap(array_slice), &array_slice[0], array_slice)
 f(array_slice)
 fmt.Printf("cap:%d addr:%p value:%v\n", cap(array_slice), &array_slice[0], array_slice)
 array_slice = array_slice[:8]
 fmt.Printf("cap:%d addr:%p value:%v\n", cap(array_slice), &array_slice[0], array_slice)
 var s string = "abcdefg"
 string_slice := s[0:5]
 fmt.Printf("%p %T\n", &s, string_slice)
}

输出:
[1 2 3 4 5 6] 6 6
[] 0 1
cap:5 a_addr:0xc042086090 slice_addr:0xc042086090 slice_type:[]int
cap:5 addr:0xc042086090 value:[1 9 3 4 5]
cap:10 addr:0xc04209a000 value:[1 9 3 4 5 6]
cap:10 addr:0xc04209a000 value:[1 9 3 4 5 6 7]
f cap:10 addr:0xc04209a000 value:[1 9 3 4 5 6 7 8]
cap:10 addr:0xc04209a000 value:[1 9 3 4 5 6 7]
cap:10 addr:0xc04209a000 value:[1 9 3 4 5 6 7 8]
0xc04205c1c0 string

可以看到返回的切片,底层数据是一样的,修改切片中某个元素的值,就是修改原数据的值。但是对切片进行append的时候,如果底层空间足够就使用原来的空间,如果底层空间不够,那么就会申请新的空间。函数传递切片的时候,也是值传递,不是引用传递,传递的是slice结构体那三个字段的值,所以不会复制slice的实际内容,在函数内append,那么在cap足够的时候,修改的仅仅是函数中slice的len,外面的slice len不会发生变化。

nil值、空值

slice有2个特殊的值,大家要注意分辨一下

var s []int //nil值 
var t = []int{} //空值
var u = make([]int, 3)[3:] //空值
fmt.Printf("value of s: %#v\n", s) // value of s: []int(nil)
fmt.Printf("value of t: %#v\n", t) // value of t: []int{}
fmt.Printf("value of u: %#v\n", u) //value of u: []int{}
fmt.Printf("s is nil? %v\n", s == nil) //true
fmt.Printf("t is nil? %v\n", t == nil) //false
fmt.Printf("u is nil? %v\n", u == nil) //false

区别是,nil slice的底层数组指针是nil,empty slice底层数组指针指向一个长度为0的数组。
所以测试一个slice是否有数据,使用len(s) == 0来判断,而不应用s == nil来判断。
一般的用法是nil slice表示数组不存在,empty slice表示集合为空。序列化json的时候,nil slice会变成null,empty是[]

源码分析

创建slice

// maxElems is a lookup table containing the maximum capacity for a slice.
// The index is the size of the slice element.
var maxElems = [...]uintptr{
 ^uintptr(0),
 maxAlloc / 1, maxAlloc / 2, maxAlloc / 3, maxAlloc / 4,
 maxAlloc / 5, maxAlloc / 6, maxAlloc / 7, maxAlloc / 8,
 maxAlloc / 9, maxAlloc / 10, maxAlloc / 11, maxAlloc / 12,
 maxAlloc / 13, maxAlloc / 14, maxAlloc / 15, maxAlloc / 16,
 maxAlloc / 17, maxAlloc / 18, maxAlloc / 19, maxAlloc / 20,
 maxAlloc / 21, maxAlloc / 22, maxAlloc / 23, maxAlloc / 24,
 maxAlloc / 25, maxAlloc / 26, maxAlloc / 27, maxAlloc / 28,
 maxAlloc / 29, maxAlloc / 30, maxAlloc / 31, maxAlloc / 32,
}
// maxSliceCap returns the maximum capacity for a slice.
func maxSliceCap(elemsize uintptr) uintptr {
 if elemsize < uintptr(len(maxElems)) {
 return maxElems[elemsize]
 }
 return maxAlloc / elemsize
}
func makeslice(et *_type, len, cap int) slice {
 // NOTE: The len > maxElements check here is not strictly necessary,
 // but it produces a 'len out of range' error instead of a 'cap out of range' error
 // when someone does make([]T, bignumber). 'cap out of range' is true too,
 // but since the cap is only being supplied implicitly, saying len is clearer.
 // See issue 4085.
 maxElements := maxSliceCap(et.size)
 if len < 0 || uintptr(len) > maxElements {
 panicmakeslicelen()
 }
 if cap < len || uintptr(cap) > maxElements {
 panicmakeslicecap()
 }
 p := mallocgc(et.size*uintptr(cap), et, true)
 return slice{p, len, cap}
}

可以看到创建slice的流程非常简单,根据类型的大小,算出最多能申请多少个元素,然后检查一下参数,不对就panic,就用malloc申请空间,赋值到一个slice结构体中,返回。

扩容

// growslice handles slice growth during append.
// It is passed the slice element type, the old slice, and the desired new minimum capacity,
// and it returns a new slice with at least that capacity, with the old data
// copied into it.
// The new slice's length is set to the old slice's length,
// NOT to the new requested capacity.
// This is for codegen convenience. The old slice's length is used immediately
// to calculate where to write new values during an append.
// TODO: When the old backend is gone, reconsider this decision.
// The SSA backend might prefer the new length or to return only ptr/cap and save stack space.
func growslice(et *_type, old slice, cap int) slice {
 if raceenabled {
 callerpc := getcallerpc()
 racereadrangepc(old.array, uintptr(old.len*int(et.size)), callerpc, funcPC(growslice))
 }
 if msanenabled {
 msanread(old.array, uintptr(old.len*int(et.size)))
 }
 if et.size == 0 {
 if cap < old.cap {
 panic(errorString("growslice: cap out of range"))
 }
 // append should not create a slice with nil pointer but non-zero len.
 // We assume that append doesn't need to preserve old.array in this case.
 return slice{unsafe.Pointer(&zerobase), old.len, cap}
 }
 newcap := old.cap
 doublecap := newcap + newcap
 if cap > doublecap {
 newcap = cap
 } else {
 if old.len < 1024 {
 newcap = doublecap
 } else {
 // Check 0 < newcap to detect overflow
 // and prevent an infinite loop.
 for 0 < newcap && newcap < cap {
 newcap += newcap / 4
 }
 // Set newcap to the requested cap when
 // the newcap calculation overflowed.
 if newcap <= 0 {
 newcap = cap
 }
 }
 }
 var overflow bool
 var lenmem, newlenmem, capmem uintptr
 // Specialize for common values of et.size.
 // For 1 we don't need any division/multiplication.
 // For sys.PtrSize, compiler will optimize division/multiplication into a shift by a constant.
 // For powers of 2, use a variable shift.
 switch {
 case et.size == 1:
 lenmem = uintptr(old.len)
 newlenmem = uintptr(cap)
 capmem = roundupsize(uintptr(newcap))
 overflow = uintptr(newcap) > maxAlloc
 newcap = int(capmem)
 case et.size == sys.PtrSize:
 lenmem = uintptr(old.len) * sys.PtrSize
 newlenmem = uintptr(cap) * sys.PtrSize
 capmem = roundupsize(uintptr(newcap) * sys.PtrSize)
 overflow = uintptr(newcap) > maxAlloc/sys.PtrSize
 newcap = int(capmem / sys.PtrSize)
 case isPowerOfTwo(et.size):
 var shift uintptr
 if sys.PtrSize == 8 {
 // Mask shift for better code generation.
 shift = uintptr(sys.Ctz64(uint64(et.size))) & 63
 } else {
 shift = uintptr(sys.Ctz32(uint32(et.size))) & 31
 }
 lenmem = uintptr(old.len) << shift
 newlenmem = uintptr(cap) << shift
 capmem = roundupsize(uintptr(newcap) << shift)
 overflow = uintptr(newcap) > (maxAlloc >> shift)
 newcap = int(capmem >> shift)
 default:
 lenmem = uintptr(old.len) * et.size
 newlenmem = uintptr(cap) * et.size
 capmem = roundupsize(uintptr(newcap) * et.size)
 overflow = uintptr(newcap) > maxSliceCap(et.size)
 newcap = int(capmem / et.size)
 }
 // The check of overflow (uintptr(newcap) > maxSliceCap(et.size))
 // in addition to capmem > _MaxMem is needed to prevent an overflow
 // which can be used to trigger a segfault on 32bit architectures
 // with this example program:
 //
 // type T [1<<27 + 1]int64
 //
 // var d T
 // var s []T
 //
 // func main() {
 // s = append(s, d, d, d, d)
 // print(len(s), "\n")
 // }
 if cap < old.cap || overflow || capmem > maxAlloc {
 panic(errorString("growslice: cap out of range"))
 }
 var p unsafe.Pointer
 if et.kind&kindNoPointers != 0 {
 p = mallocgc(capmem, nil, false)
 memmove(p, old.array, lenmem)
 // The append() that calls growslice is going to overwrite from old.len to cap (which will be the new length).
 // Only clear the part that will not be overwritten.
 memclrNoHeapPointers(add(p, newlenmem), capmem-newlenmem)
 } else {
 // Note: can't use rawmem (which avoids zeroing of memory), because then GC can scan uninitialized memory.
 p = mallocgc(capmem, et, true)
 if !writeBarrier.enabled {
 memmove(p, old.array, lenmem)
 } else {
 for i := uintptr(0); i < lenmem; i += et.size {
 typedmemmove(et, add(p, i), add(old.array, i))
 }
 }
 }
 return slice{p, old.len, newcap}
}

扩容时,先计算需要扩多少个,算法是这样的:

如果申请的容量(cap)是老容量(old.cap)的两倍以上,那么就扩成cap
否则,如果老容量old.cap小于1024,那么就扩成old.cap x 2
再否则,newcap初始为old.cap,一直循环newcap += newcap/4,直到不小于cap,newcap就是最终扩成的大小,注意这里还有个溢出保护,如果溢出了,那么newcap=cap。

计算完需要申请的元素个数大小之后,就计算内存位置,进行复制,这里不细说。

需要注意的地方是,扩容之后可能还是原来的数组,因为可能底层数组还有空间。

slice copy

func slicecopy(to, fm slice, width uintptr) int {
 if fm.len == 0 || to.len == 0 {
 return 0
 }
 n := fm.len
 if to.len < n {
 n = to.len
 }
 if width == 0 {
 return n
 }
 if raceenabled {
 callerpc := getcallerpc()
 pc := funcPC(slicecopy)
 racewriterangepc(to.array, uintptr(n*int(width)), callerpc, pc)
 racereadrangepc(fm.array, uintptr(n*int(width)), callerpc, pc)
 }
 if msanenabled {
 msanwrite(to.array, uintptr(n*int(width)))
 msanread(fm.array, uintptr(n*int(width)))
 }
 size := uintptr(n) * width
 if size == 1 { // common case worth about 2x to do here
 // TODO: is this still worth it with new memmove impl?
 *(*byte)(to.array) = *(*byte)(fm.array) // known to be a byte pointer
 } else {
 memmove(to.array, fm.array, size)
 }
 return n
}

这是常规的copy,比较两个slice的len,选取小的len进行复制,把fm的内容复制to中,使用memmove对array进行内存拷贝。我们可以看到因为使用的是较小的len,所以slice to中的cap不需要改变。如果fm的len较小,那么就覆盖to中的前len个位置,其余不变。eg:

func main() {
 var slice1 = []int{1, 2, 3, 4, 5, 6}
 var slice2 = []int{8,9,10,11,12,13,14,15}
 copy(slice2, slice1)
 fmt.Printf("len:%d cap:%d %#v\n", len(slice2), cap(slice2), slice2)
}

输出
len:8 cap:8 []int{1, 2, 3, 4, 5, 6, 14, 15}

有疑问加站长微信联系(非本文作者)

本文来自:简书

感谢作者:123archu

查看原文:剖析golang slice的实现

入群交流(和以上内容无关):加入Go大咖交流群,或添加微信:liuxiaoyan-s 备注:入群;或加QQ群:692541889

关注微信

1678 次点击

上一篇:Golang学习:为基本类型添加方法

下一篇:[golang] json序列化时自动将id加密

slice 切片函数变量

0 回复

暂无回复

添加一条新回复 (您需要后才能回复没有账号 ?)

请尽量让自己的回复能够对别人有帮助
支持 Markdown 格式, **粗体**、~~删除线~~、`单行代码`
支持 @ 本站用户;支持表情(输入 : 提示),见 Emoji cheat sheet
图片支持拖拽、截图粘贴等方式上传

用户登录

Go今日面试题

(追記) (追記ここまで)

今日阅读排行

加载中

(追記) (追記ここまで)

一周阅读排行

加载中

关注我

扫码关注领全套学习资料关注微信公众号
加入 QQ 群:
- 192706294(已满)
- 731990104(已满)
- 798786647(已满)
- 729884609(已满)
- 977810755(已满)
- 815126783(已满)
- 812540095(已满)
- 1006366459(已满)
- 692541889
关注微信公众号
加入微信群:liuxiaoyan-s,备注入群
也欢迎加入知识星球 Go粉丝们(免费)

给该专栏投稿写篇新文章

每篇文章有总共有 5 次投稿机会

剖析golang slice的实现

slice 结构

使用切片

nil值、空值

源码分析

创建slice

扩容

slice copy

用户登录

今日阅读排行

一周阅读排行

关注我

给该专栏投稿 写篇新文章

收入到我管理的专栏 新建专栏

slice 结构

使用切片

nil值、空值

源码分析

创建slice

扩容

slice copy

给该专栏投稿写篇新文章

收入到我管理的专栏新建专栏