.. _algorithms:
用python实现基本数据结构和算法
=====================================================================
1章:ADT抽象数据类型,定义数据和其操作
--------------------------------------
什么是ADT: 抽象数据类型,学过数据结构的应该都知道。
How to select datastructures for ADT
1. Dose the data structure provide for the storage requirements as specified by the domain of the ADT?
2. Does the data structure provide the data access and manipulation functionality to fully implement the ADT?
3. Effcient implemention? based on complexity analysis.
下边代码是个简单的示例,比如实现一个简单的Bag类,先定义其具有的操作,然后我们再用类的magic
method来实现这些方法:
::
class Bag:
"""
constructor: 构造函数
size
contains
append
remove
iter
"""
def __init__(self):
self._items = list()
def __len__(self):
return len(self._items)
def __contains__(self, item):
return item in self._items
def add(self, item):
self._items.append(item)
def remove(self, item):
assert item in self._items, 'item must in the bag'
return self._items.remove(item)
def __iter__(self):
return _BagIterator(self._items)
class _BagIterator:
""" 注意这里实现了迭代器类 """
def __init__(self, seq):
self._bag_items = seq
self._cur_item = 0
def __iter__(self):
return self
def __next__(self):
if self._cur_item < len(self._bag_items): item = self._bag_items[self._cur_item] self._cur_item += 1 return item else: raise StopIteration b = Bag() b.add(1) b.add(2) for i in b: # for使用__iter__构建,用__next__迭代 print(i) """ # for 语句等价于 i = b.__iter__() while True: try: item = i.__next__() print(item) except StopIteration: break """ -------------- 2章:array vs list ------------------ array: 定长,操作有限,但是节省内存;貌似我的生涯中还没用过,不过python3.5中我试了确实有array类,可以用import array直接导入 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ list: 会预先分配内存,操作丰富,但是耗费内存。我用sys.getsizeof做了实验。我个人理解很类似C++ STL里的vector,是使用最频繁的数据结构。 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - list.append: 如果之前没有分配够内存,会重新开辟新区域,然后复制之前的数据,复杂度退化 - list.insert: 会移动被插入区域后所有元素,O(n) - list.pop: pop不同位置需要的复杂度不同pop(0)是O(1)复杂度,pop()首位O(n)复杂度 - list[]: slice操作copy数据(预留空间)到另一个list 来实现一个array的ADT: :: import ctypes class Array: def __init__(self, size): assert size> 0, 'array size must be> 0'
self._size = size
PyArrayType = ctypes.py_object * size
self._elements = PyArrayType()
self.clear(None)
def __len__(self):
return self._size
def __getitem__(self, index):
assert index>= 0 and index < len(self), 'out of range' return self._elements[index] def __setitem__(self, index, value): assert index>= 0 and index < len(self), 'out of range' self._elements[index] = value def clear(self, value): """ 设置每个元素为value """ for i in range(len(self)): self._elements[i] = value def __iter__(self): return _ArrayIterator(self._elements) class _ArrayIterator: def __init__(self, items): self._items = items self._idx = 0 def __iter__(self): return self def __next__(self): if self._idex < len(self._items): val = self._items[self._idx] self._idex += 1 return val else: raise StopIteration Two-Demensional Arrays ~~~~~~~~~~~~~~~~~~~~~~ :: class Array2D: """ 要实现的方法 Array2D(nrows, ncols): constructor numRows() numCols() clear(value) getitem(i, j) setitem(i, j, val) """ def __init__(self, numrows, numcols): self._the_rows = Array(numrows) # 数组的数组 for i in range(numrows): self._the_rows[i] = Array(numcols) @property def numRows(self): return len(self._the_rows) @property def NumCols(self): return len(self._the_rows[0]) def clear(self, value): for row in self._the_rows: row.clear(value) def __getitem__(self, ndx_tuple): # ndx_tuple: (x, y) assert len(ndx_tuple) == 2 row, col = ndx_tuple[0], ndx_tuple[1] assert (row>= 0 and row < self.numRows and col>= 0 and col < self.NumCols) the_1d_array = self._the_rows[row] return the_1d_array[col] def __setitem__(self, ndx_tuple, value): assert len(ndx_tuple) == 2 row, col = ndx_tuple[0], ndx_tuple[1] assert (row>= 0 and row < self.numRows and col>= 0 and col < self.NumCols) the_1d_array = self._the_rows[row] the_1d_array[col] = value The Matrix ADT, m行,n列。这个最好用还是用pandas处理矩阵,自己实现比较\*疼 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ :: class Matrix: """ 最好用pandas的DataFrame Matrix(rows, ncols): constructor numCols() getitem(row, col) setitem(row, col, val) scaleBy(scalar): 每个元素乘scalar transpose(): 返回transpose转置 add(rhsMatrix): size must be the same subtract(rhsMatrix) multiply(rhsMatrix) """ def __init__(self, numRows, numCols): self._theGrid = Array2D(numRows, numCols) self._theGrid.clear(0) @property def numRows(self): return self._theGrid.numRows @property def NumCols(self): return self._theGrid.numCols def __getitem__(self, ndxTuple): return self._theGrid[ndxTuple[0], ndxTuple[1]] def __setitem__(self, ndxTuple, scalar): self._theGrid[ndxTuple[0], ndxTuple[1]] = scalar def scaleBy(self, scalar): for r in range(self.numRows): for c in range(self.numCols): self[r, c] *= scalar def __add__(self, rhsMatrix): assert (rhsMatrix.numRows == self.numRows and rhsMatrix.numCols == self.numCols) newMartrix = Matrix(self.numRows, self.numCols) for r in range(self.numRows): for c in range(self.numCols): newMartrix[r, c] = self[r, c] + rhsMatrix[r, c] -------------- 3章:Sets and Maps ------------------ 除了list之外,最常用的应该就是python内置的set和dict了。 sets ADT ~~~~~~~~ A set is a container that stores a collection of unique values over a given comparable domain in which the stored values have no particular ordering. :: class Set: """ 使用list实现set ADT Set() length() contains(element) add(element) remove(element) equals(element) isSubsetOf(setB) union(setB) intersect(setB) difference(setB) iterator() """ def __init__(self): self._theElements = list() def __len__(self): return len(self._theElements) def __contains__(self, element): return element in self._theElements def add(self, element): if element not in self: self._theElements.append(element) def remove(self, element): assert element in self, 'The element must be set' self._theElements.remove(element) def __eq__(self, setB): if len(self) != len(setB): return False else: return self.isSubsetOf(setB) def isSubsetOf(self, setB): for element in self: if element not in setB: return False return True def union(self, setB): newSet = Set() newSet._theElements.extend(self._theElements) for element in setB: if element not in self: newSet._theElements.append(element) return newSet Maps or Dict: 键值对,python内部采用hash实现。 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ :: class Map: """ Map ADT list implemention Map() length() contains(key) add(key, value) remove(key) valudOf(key) iterator() """ def __init__(self): self._entryList = list() def __len__(self): return len(self._entryList) def __contains__(self, key): ndx = self._findPosition(key) return ndx is not None def add(self, key, value): ndx = self._findPosition(key) if ndx is not None: self._entryList[ndx].value = value return False else: entry = _MapEntry(key, value) self._entryList.append(entry) return True def valueOf(self, key): ndx = self._findPosition(key) assert ndx is not None, 'Invalid map key' return self._entryList[ndx].value def remove(self, key): ndx = self._findPosition(key) assert ndx is not None, 'Invalid map key' self._entryList.pop(ndx) def __iter__(self): return _MapIterator(self._entryList) def _findPosition(self, key): for i in range(len(self)): if self._entryList[i].key == key: return i return None class _MapEntry: # or use collections.namedtuple('_MapEntry', 'key,value') def __init__(self, key, value): self.key = key self.value = value The multiArray ADT, 多维数组,一般是使用一个一维数组模拟,然后通过计算下标获取元素 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ :: class MultiArray: """ row-major or column-marjor ordering, this is row-major ordering MultiArray(d1, d2, ...dn) dims(): the number of dimensions length(dim): the length of given array dimension clear(value) getitem(i1, i2, ... in), index(i1,i2,i3) = i1*(d2*d3) + i2*d3 + i3 setitem(i1, i2, ... in) 计算下标:index(i1,i2,...in) = i1*f1 + i2*f2 + ... + i(n-1)*f(n-1) + in*1 """ def __init__(self, *dimensions): # Implementation of MultiArray ADT using a 1-D # array,数组的数组的数组。。。 assert len(dimensions)> 1, 'The array must have 2 or more dimensions'
self._dims = dimensions
# Compute to total number of elements in the array
size = 1
for d in dimensions:
assert d> 0, 'Dimensions must be> 0'
size *= d
# Create the 1-D array to store the elements
self._elements = Array(size)
# Create a 1-D array to store the equation factors
self._factors = Array(len(dimensions))
self._computeFactors()
@property
def numDims(self):
return len(self._dims)
def length(self, dim):
assert dim> 0 and dim < len(self._dims), 'Dimension component out of range' return self._dims[dim-1] def clear(self, value): self._elements.clear(value) def __getitem__(self, ndxTuple): assert len(ndxTuple) == self.numDims, 'Invalid # of array subscripts' index = self._computeIndex(ndxTuple) assert index is not None, 'Array subscript out of range' return self._elements[index] def __setitem__(self, ndxTuple, value): assert len(ndxTuple) == self.numDims, 'Invalid # of array subscripts' index = self._computeIndex(ndxTuple) assert index is not None, 'Array subscript out of range' self._elements[index] = value def _computeIndex(self, ndxTuple): # using the equation: i1*f1 + i2*f2 + ... + in*fn offset = 0 for j in range(len(ndxTuple)): if ndxTuple[j] < 0 or ndxTuple[j]>= self._dims[j]:
return None
else:
offset += ndexTuple[j] * self._factors[j]
return offset
--------------
4章:Algorithm Analysis
------------------------------------
一般使用大O标记法来衡量算法的平均时间复杂度, 1 < log(n) < n < nlog(n) < n^2 < n^3 < a^n。 了解常用数据结构操作的平均时间复杂度有利于使用更高效的数据结构,当然有时候需要在时间和空间上进行衡量,有些操作甚至还会退化,比如list的append操作,如果list空间不够,会去开辟新的空间,操作复杂度退化到O(n),有时候还需要使用均摊分析(amortized) -------------- 5章:Searching and Sorting ------------------------------------ 排序和查找是最基础和频繁的操作,python内置了in操作符和bisect二分操作模块实现查找,内置了sorted方法来实现排序操作。二分和快排也是面试中经常考到的,本章讲的是基本的排序和查找。 :: def binary_search(sorted_seq, val): """ 实现标准库中的bisect.bisect_left """ low = 0 high = len(sorted_seq) - 1 while low <= high: mid = (high + low) // 2 if sorted_seq[mid] == val: return mid elif val < sorted_seq[mid]: high = mid - 1 else: low = mid + 1 return low def bubble_sort(seq): # O(n^2), n(n-1)/2 = 1/2(n^2 + n) n = len(seq) for i in range(n-1): for j in range(n-1-i): # 这里之所以 n-1 还需要 减去 i 是因为每一轮冒泡最大的元素都会冒泡到最后,无需再比较 if seq[j]> seq[j+1]:
seq[j], seq[j+1] = seq[j+1], seq[j]
def select_sort(seq):
"""可以看作是冒泡的改进,每次找一个最小的元素交换,每一轮只需要交换一次"""
n = len(seq)
for i in range(n-1):
min_idx = i # assume the ith element is the smallest
for j in range(i+1, n):
if seq[j] < seq[min_idx]: # find the minist element index min_idx = j if min_idx != i: # swap seq[i], seq[min_idx] = seq[min_idx], seq[i] def insertion_sort(seq): """ 每次挑选下一个元素插入已经排序的数组中,初始时已排序数组只有一个元素""" n = len(seq) for i in range(1, n): value = seq[i] # save the value to be positioned # find the position where value fits in the ordered part of the list pos = i while pos> 0 and value < seq[pos-1]: # Shift the items to the right during the search seq[pos] = seq[pos-1] pos -= 1 seq[pos] = value def merge_sorted_list(listA, listB): """ 归并两个有序数组 """ new_list = list() a = b = 0 while a < len(listA) and b < len(listB): if listA[a] < listB[b]: new_list.append(listA[a]) a += 1 else: new_list.append(listB[b]) b += 1 while a < len(listA): new_list.append(listA[a]) a += 1 while b < len(listB): new_list.append(listB[b]) b += 1 return new_list 6章: Linked Structure ------------------------ list是最常用的数据结构,但是list在中间增减元素的时候效率会很低,这时候linked list会更适合,缺点就是获取元素的平均时间复杂度变成了O(n) :: # 单链表实现 class ListNode: def __init__(self, data): self.data = data self.next = None def travsersal(head, callback): curNode = head while curNode is not None: callback(curNode.data) curNode = curNode.next def unorderdSearch(head, target): curNode = head while curNode is not None and curNode.data != target: curNode = curNode.next return curNode is not None # Given the head pointer, prepend an item to an unsorted linked list. def prepend(head, item): newNode = ListNode(item) newNode.next = head head = newNode # Given the head reference, remove a target from a linked list def remove(head, target): predNode = None curNode = head while curNode is not None and curNode.data != target: # 寻找目标 predNode = curNode curNode = curNode.data if curNode is not None: if curNode is head: head = curNode.next else: predNode.next = curNode.next -------------- 7章:Stacks ----------- 栈也是计算机里用得比较多的数据结构,栈是一种后进先出的数据结构,可以理解为往一个桶里放盘子,先放进去的会被压在地下,拿盘子的时候,后放的会被先拿出来。 :: class Stack: """ Stack ADT, using a python list Stack() isEmpty() length() pop(): assert not empty peek(): assert not empty, return top of non-empty stack without removing it push(item) """ def __init__(self): self._items = list() def isEmpty(self): return len(self) == 0 def __len__(self): return len(self._items) def peek(self): assert not self.isEmpty() return self._items[-1] def pop(self): assert not self.isEmpty() return self._items.pop() def push(self, item): self._items.append(item) class Stack: """ Stack ADT, use linked list 使用list实现很简单,但是如果涉及大量push操作,list的空间不够时复杂度退化到O(n) 而linked list可以保证最坏情况下仍是O(1) """ def __init__(self): self._top = None # top节点, _StackNode or None self._size = 0 # int def isEmpty(self): return self._top is None def __len__(self): return self._size def peek(self): assert not self.isEmpty() return self._top.item def pop(self): assert not self.isEmpty() node = self._top self.top = self._top.next self._size -= 1 return node.item def _push(self, item): self._top = _StackNode(item, self._top) self._size += 1 class _StackNode: def __init__(self, item, link): self.item = item self.next = link -------------- 8章:Queues ----------- 队列也是经常使用的数据结构,比如发送消息等,celery可以使用redis提供的list实现消息队列。 本章我们用list和linked list来实现队列和优先级队列。 :: class Queue: """ Queue ADT, use list。list实现,简单但是push和pop效率最差是O(n) Queue() isEmpty() length() enqueue(item) dequeue() """ def __init__(self): self._qList = list() def isEmpty(self): return len(self) == 0 def __len__(self): return len(self._qList) def enquue(self, item): self._qList.append(item) def dequeue(self): assert not self.isEmpty() return self._qList.pop(0) from array import Array # Array那一章实现的Array ADT class Queue: """ circular Array ,通过头尾指针实现。list内置append和pop复杂度会退化,使用 环数组实现可以使得入队出队操作时间复杂度为O(1),缺点是数组长度需要固定。 """ def __init__(self, maxSize): self._count = 0 self._front = 0 self._back = maxSize - 1 self._qArray = Array(maxSize) def isEmpty(self): return self._count == 0 def isFull(self): return self._count == len(self._qArray) def __len__(self): return len(self._count) def enqueue(self, item): assert not self.isFull() maxSize = len(self._qArray) self._back = (self._back + 1) % maxSize # 移动尾指针 self._qArray[self._back] = item self._count += 1 def dequeue(self): assert not self.isFull() item = self._qArray[self._front] maxSize = len(self._qArray) self._front = (self._front + 1) % maxSize self._count -= 1 return item class _QueueNode: def __init__(self, item): self.item = item class Queue: """ Queue ADT, linked list 实现。为了改进环型数组有最大数量的限制,改用 带有头尾节点的linked list实现。 """ def __init__(self): self._qhead = None self._qtail = None self._qsize = 0 def isEmpty(self): return self._qhead is None def __len__(self): return self._count def enqueue(self, item): node = _QueueNode(item) # 创建新的节点并用尾节点指向他 if self.isEmpty(): self._qhead = node else: self._qtail.next = node self._qtail = node self._qcount += 1 def dequeue(self): assert not self.isEmpty(), 'Can not dequeue from an empty queue' node = self._qhead if self._qhead is self._qtail: self._qtail = None self._qhead = self._qhead.next # 前移头节点 self._count -= 1 return node.item class UnboundedPriorityQueue: """ PriorityQueue ADT: 给每个item加上优先级p,高优先级先dequeue 分为两种: - bounded PriorityQueue: 限制优先级在一个区间[0...p) - unbounded PriorityQueue: 不限制优先级 PriorityQueue() BPriorityQueue(numLevels): create a bounded PriorityQueue with priority in range [0, numLevels-1] isEmpty() length() enqueue(item, priority): 如果是bounded PriorityQueue, priority必须在区间内 dequeue(): 最高优先级的出队,同优先级的按照FIFO顺序 - 两种实现方式: 1.入队的时候都是到队尾,出队操作找到最高优先级的出队,出队操作O(n) 2.始终维持队列有序,每次入队都找到该插入的位置,出队操作是O(1) (注意如果用list实现list.append和pop操作复杂度会因内存分配退化) """ from collections import namedtuple _PriorityQEntry = namedtuple('_PriorityQEntry', 'item, priority') # 采用方式1,用内置list实现unbounded PriorityQueue def __init__(self): self._qlist = list() def isEmpty(self): return len(self) == 0 def __len__(self): return len(self._qlist) def enqueue(self, item, priority): entry = UnboundedPriorityQueue._PriorityQEntry(item, priority) self._qlist.append(entry) def deque(self): assert not self.isEmpty(), 'can not deque from an empty queue' highest = self._qlist[0].priority for i in range(len(self)): # 出队操作O(n),遍历找到最高优先级 if self._qlist[i].priority < highest: highest = self._qlist[i].priority entry = self._qlist.pop(highest) return entry.item class BoundedPriorityQueue: """ BoundedPriorityQueue ADT,用linked list实现。上一个地方提到了 BoundedPriorityQueue 但是为什么需要 BoundedPriorityQueue呢? BoundedPriorityQueue 的优先级限制在[0, maxPriority-1] 对于 UnboundedPriorityQueue,出队操作由于要遍历寻找优先级最高的item,所以平均 是O(n)的操作,但是对于 BoundedPriorityQueue,用队列数组实现可以达到常量时间, 用空间换时间。比如要弹出一个元素,直接找到第一个非空队列弹出 元素就可以了。 (小数字代表高优先级,先出队) qlist [0] -> ["white"]
[1]
[2] -> ["black", "green"]
[3] -> ["purple", "yellow"]
"""
# Implementation of the bounded Priority Queue ADT using an array of #
# queues in which the queues are implemented using a linked list.
from array import Array # 第二章定义的ADT
def __init__(self, numLevels):
self._qSize = 0
self._qLevels = Array(numLevels)
for i in range(numLevels):
self._qLevels[i] = Queue() # 上一节讲到用linked list实现的Queue
def isEmpty(self):
return len(self) == 0
def __len__(self):
return len(self._qSize)
def enqueue(self, item, priority):
assert priority>= 0 and priority < len(self._qLevels), 'invalid priority' self._qLevel[priority].enquue(item) # 直接找到 priority 对应的槽入队 def deque(self): assert not self.isEmpty(), 'can not deque from an empty queue' i = 0 p = len(self._qLevels) while i < p and not self._qLevels[i].isEmpty(): # 找到第一个非空队列 i += 1 return self._qLevels[i].dequeue() -------------- 9章:Advanced Linked Lists -------------------------- 之前曾经介绍过单链表,一个链表节点只有data和next字段,本章介绍高级的链表。 Doubly Linked List,双链表,每个节点多了个prev指向前一个节点。双链表可以用来编写文本编辑器的buffer。 :: class DListNode: def __init__(self, data): self.data = data self.prev = None self.next = None def revTraversa(tail): curNode = tail while cruNode is not None: print(curNode.data) curNode = curNode.prev def search_sorted_doubly_linked_list(head, tail, probe, target): """ probing technique探查法,改进直接遍历,不过最坏时间复杂度仍是O(n) searching a sorted doubly linked list using the probing technique Args: head (DListNode obj) tail (DListNode obj) probe (DListNode or None) target (DListNode.data): data to search """ if head is None: # make sure list is not empty return False if probe is None: # if probe is null, initialize it to first node probe = head # if the target comes before the probe node, we traverse backward, otherwise # traverse forward if target < probe.data: while probe is not None and target <= probe.data: if target == probe.dta: return True else: probe = probe.prev else: while probe is not None and target>= probe.data:
if target == probe.data:
return True
else:
probe = probe.next
return False
def insert_node_into_ordered_doubly_linekd_list(value):
""" 最好画个图看,链表操作很容易绕晕,注意赋值顺序"""
newnode = DListNode(value)
if head is None: # empty list
head = newnode
tail = head
elif value < head.data: # insert before head newnode.next = head head.prev = newnode head = newnode elif value> tail.data: # insert after tail
newnode.prev = tail
tail.next = newnode
tail = newnode
else: # insert into middle
node = head
while node is not None and node.data < value: node = node.next newnode.next = node newnode.prev = node.prev node.prev.next = newnode node.prev = newnode 循环链表 :: def travrseCircularList(listRef): curNode = listRef done = listRef is None while not None: curNode = curNode.next print(curNode.data) done = curNode is listRef # 回到遍历起始点 def searchCircularList(listRef, target): curNode = listRef done = listRef is None while not done: curNode = curNode.next if curNode.data == target: return True else: done = curNode is listRef or curNode.data> target
return False
def add_newnode_into_ordered_circular_linked_list(listRef, value):
""" 插入并维持顺序
1.插入空链表;2.插入头部;3.插入尾部;4.按顺序插入中间
"""
newnode = ListNode(value)
if listRef is None: # empty list
listRef = newnode
newnode.next = newnode
elif value < listRef.next.data: # insert in front newnode.next = listRef.next listRef.next = newnode elif value> listRef.data: # insert in back
newnode.next = listRef.next
listRef.next = newnode
listRef = newnode
else: # insert in the middle
preNode = None
curNode = listRef
done = listRef is None
while not done:
preNode = curNode
preNode = curNode.next
done = curNode is listRef or curNode.data> value
newnode.next = curNode
preNode.next = newnode
利用循环双端链表我们可以实现一个经典的缓存失效算法,lru:
::
# -*- coding: utf-8 -*-
class Node(object):
def __init__(self, prev=None, next=None, key=None, value=None):
self.prev, self.next, self.key, self.value = prev, next, key, value
class CircularDoubleLinkedList(object):
def __init__(self):
node = Node()
node.prev, node.next = node, node
self.rootnode = node
def headnode(self):
return self.rootnode.next
def tailnode(self):
return self.rootnode.prev
def remove(self, node):
if node is self.rootnode:
return
else:
node.prev.next = node.next
node.next.prev = node.prev
def append(self, node):
tailnode = self.tailnode()
tailnode.next = node
node.next = self.rootnode
self.rootnode.prev = node
class LRUCache(object):
def __init__(self, maxsize=16):
self.maxsize = maxsize
self.cache = {}
self.access = CircularDoubleLinkedList()
self.isfull = len(self.cache)>= self.maxsize
def __call__(self, func):
def wrapper(n):
cachenode = self.cache.get(n)
if cachenode is not None: # hit
self.access.remove(cachenode)
self.access.append(cachenode)
return cachenode.value
else: # miss
value = func(n)
if not self.isfull:
tailnode = self.access.tailnode()
newnode = Node(tailnode, self.access.rootnode, n, value)
self.access.append(newnode)
self.cache[n] = newnode
self.isfull = len(self.cache)>= self.maxsize
return value
else: # full
lru_node = self.access.headnode()
del self.cache[lru_node.key]
self.access.remove(lru_node)
tailnode = self.access.tailnode()
newnode = Node(tailnode, self.access.rootnode, n, value)
self.access.append(newnode)
self.cache[n] = newnode
return value
return wrapper
@LRUCache()
def fib(n):
if n <= 2: return 1 else: return fib(n - 1) + fib(n - 2) for i in range(1, 35): print(fib(i)) -------------- 10章:Recursion -------------------------------------- Recursion is a process for solving problems by subdividing a larger problem into smaller cases of the problem itself and then solving the smaller, more trivial parts. 递归函数:调用自己的函数 :: # 递归函数:调用自己的函数,看一个最简单的递归函数,倒序打印一个数 def printRev(n): if n> 0:
print(n)
printRev(n-1)
printRev(3) # 从10输出到1
# 稍微改一下,print放在最后就得到了正序打印的函数
def printInOrder(n):
if n> 0:
printInOrder(n-1)
print(n) # 之所以最小的先打印是因为函数一直递归到n==1时候的最深栈,此时不再
# 递归,开始执行print语句,这时候n==1,之后每跳出一层栈,打印更大的值
printInOrder(3) # 正序输出
Properties of Recursion: 使用stack解决的问题都能用递归解决
- A recursive solution must contain a base case; 递归出口,代表最小子问题(n == 0退出打印)
- A recursive solution must contain a recursive case; 可以分解的子问题
- A recursive solution must make progress toward the base case. 递减n使得n像递归出口靠近
Tail Recursion: occurs when a function includes a single recursive call
as the last statement of the function. In this case, a stack is not
needed to store values to te used upon the return of the recursive call
and thus a solution can be implemented using a iterative loop instead.
::
# Recursive Binary Search
def recBinarySearch(target, theSeq, first, last):
# 你可以写写单元测试来验证这个函数的正确性
if first> last: # 递归出口1
return False
else:
mid = (first + last) // 2
if theSeq[mid] == target:
return True # 递归出口2
elif theSeq[mid]> target:
return recBinarySearch(target, theSeq, first, mid - 1)
else:
return recBinarySearch(target, theSeq, mid + 1, last)
--------------
11章:Hash Tables
--------------------
基于比较的搜索(线性搜索,有序数组的二分搜索)最好的时间复杂度只能达到O(logn),利用hash可以实现O(1)查找,python内置dict的实现方式就是hash,你会发现dict的key必须要是实现了 ``__hash__`` 和 ``__eq__`` 方法的。
Hashing: hashing is the process of mapping a search a key to a limited
range of array indeices with the goal of providing direct access to the
keys.
hash方法有个hash函数用来给key计算一个hash值,作为数组下标,放到该下标对应的槽中。当不同key根据hash函数计算得到的下标相同时,就出现了冲突。解决冲突有很多方式,比如让每个槽成为链表,每次冲突以后放到该槽链表的尾部,但是查询时间就会退化,不再是O(1)。还有一种探查方式,当key的槽冲突时候,就会根据一种计算方式去寻找下一个空的槽存放,探查方式有线性探查,二次方探查法等,cpython解释器使用的是二次方探查法。还有一个问题就是当python使用的槽数量大于预分配的2/3时候,会重新分配内存并拷贝以前的数据,所以有时候dict的add操作代价还是比较高的,牺牲空间但是可以始终保证O(1)的查询效率。如果有大量的数据,建议还是使用bloomfilter或者redis提供的HyperLogLog。
如果你感兴趣,可以看看这篇文章,介绍c解释器如何实现的python dict对象: `Python dictionary implementation