
Commit 6051796

update cnn
1 parent cedd192 commit 6051796

6 files changed: +265, -2 lines changed

‎4_Convoltional_Neural_Networks_LeNet_卷积神经网络.md‎

Lines changed: 5 additions & 2 deletions
@@ -202,7 +202,7 @@ With ignore_border set to False:
![full_model](/images/4_full_model_1.png)

The lower layers of the model are built from alternating convolution and max-pooling layers, while the higher layers form a fully-connected, conventional MLP (hidden layer + logistic regression). The input of the first fully-connected layer is the set of all feature maps of the layer below.

In the implementation shown above, the operations in the lower layers work on 4D tensors. Their output then has to be flattened into a 2D matrix of feature maps so that it fits the MLP implementation that follows.
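A minimal sketch of that flattening step, in the spirit of the tutorial's convolutional_mlp.py; the names `layer1` and `layer2_input` are illustrative, not taken from this commit:

```Python
# Illustrative: layer1 stands for the last convolution/pooling layer.
# layer1.output is a 4D tensor of shape (batch size, n feature maps, height, width).
# flatten(2) keeps the first (batch) dimension and collapses the remaining ones,
# giving a matrix of shape (batch size, n feature maps * height * width)
# that a fully-connected hidden layer can consume.
layer2_input = layer1.output.flatten(2)
```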

@@ -444,7 +444,10 @@ The code for file convolutional_mlp.py ran for 32.52m
#####Size of the max-pooling
Typical values are 2*2, or no max-pooling at all. Very large input images may warrant 4*4 pooling in the lower layers. Keep in mind, however, that while this reduces the dimension of the signal by a factor of 16, it may also discard a large amount of detail in the signal.

#####Tips
If you want to try this model on a new dataset, the following tricks may help you get better results:
* Whiten the data (for example, with PCA); a small sketch follows this list.
* Decay the learning rate at every epoch.
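A minimal sketch of PCA whitening with plain numpy; the function name `pca_whiten` and the small epsilon are our own choices for illustration, not part of the tutorial:

```Python
import numpy

def pca_whiten(X, eps=1e-5):
    """Illustrative helper: PCA-whiten a (n_examples, n_features) data matrix.

    Centers the data, rotates it onto its principal components and rescales
    every component to (approximately) unit variance.
    """
    X = X - X.mean(axis=0)                    # center each feature
    cov = numpy.dot(X.T, X) / X.shape[0]      # feature covariance matrix
    eigval, eigvec = numpy.linalg.eigh(cov)   # eigendecomposition (symmetric matrix)
    # project onto the principal axes and divide by the square root of the eigenvalues
    return numpy.dot(X, eigvec) / numpy.sqrt(eigval + eps)
```

For the second tip, one common choice is simply to multiply the learning rate by a constant factor (for example 0.95) after every epoch.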
Lines changed: 260 additions & 0 deletions
@@ -0,0 +1,260 @@
Denoising Autoencoders
====================================

This section assumes the reader has already read [Classifying MNIST using Logistic Regression](https://github.com/Syndrome777/DeepLearningTutorial/blob/master/2_Classifying_MNIST_using_LR_逻辑回归进行MNIST分类.md) and [Multilayer Perceptron](https://github.com/Syndrome777/DeepLearningTutorial/blob/master/3_Multilayer_Perceptron_多层感知机.md). If you want to run the code on a GPU, you should also read [GPU](http://deeplearning.net/software/theano/tutorial/using_gpu.html).

All of the code for this section can be downloaded [here](http://deeplearning.net/tutorial/code/dA.py).

Denoising autoencoders are an extension of the classical autoencoder. They were introduced in [Vincent08](http://deeplearning.net/tutorial/references.html#vincent08) as a building block for deep networks. We begin this tutorial with a short discussion of [autoencoders](http://deeplearning.net/tutorial/dA.html#autoencoders).
###Autoencoders

Section 4.6 of [Bengio09](http://deeplearning.net/tutorial/references.html#bengio09) gives a short introduction to autoencoders. An autoencoder takes a d-dimensional input vector x in [0,1]^d and first maps it (with an encoder) to a hidden representation y in [0,1]^{d'} through a deterministic mapping:

![y_mapping](/images/5_autoencoders_1.png)

where s is a non-linearity such as the sigmoid. The latent representation y, or code, is then mapped back (with a decoder) into a reconstruction z of the same shape as x, through a similar transformation:

![z_mapping](/images/5_autoencoders_2.png)

(Here the prime symbol does not indicate matrix transposition.) Given the code y, z should be seen as a prediction of x. Optionally, the weight matrix W' of the reverse mapping can be constrained to be the transpose of the forward mapping, ![transpose](/images/5_autoencoders_3.png); this is referred to as tied weights. The parameters of the model (W, b, b' and, if tied weights are not used, also W') are trained by minimizing the average reconstruction error.

The reconstruction error can be measured in many ways, depending on the distributional assumptions made about the input. The traditional squared error L(x,z)=||x-z||^2 can be used. If the input is interpreted as bit vectors or vectors of bit probabilities, the reconstruction cross-entropy can be used:

![cross-entropy](/images/5_autoencoders_4.png)
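For reference, the formulas in the images above, written out in LaTeX (they also appear as equations (2) to (4) in the class docstring below):

```latex
y = s(W x + b)

z = s(W' y + b')            % with tied weights: W' = W^T

L_H(x, z) = - \sum_{k=1}^{d} \left[ x_k \log z_k + (1 - x_k) \log(1 - z_k) \right]
```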
The hope is that the code y is a distributed representation that captures the coordinates along the main factors of variation in the data. This is similar to the way the projection onto the principal components captures the main factors of variation in the data. Indeed, if there is a single linear hidden layer (the code) and the mean squared error criterion is used to train the network, then the k hidden units learn to project the input onto the span of the first k principal components of the data. If the hidden layer is non-linear, the autoencoder behaves differently from PCA and can capture multi-modal aspects of the input distribution. The departure from PCA becomes even more important when we consider stacking multiple encoders (as done in [Hinton06](http://deeplearning.net/tutorial/references.html#hinton06) to build a deep autoencoder).

Because y is viewed as a lossy compression of x, it cannot be a good (small-loss) compression for all x. Optimization makes it a good compression for the training examples, and hopefully for other inputs as well, but not for arbitrary inputs. That is the sense in which an autoencoder generalizes: it gives low reconstruction error on test examples drawn from the same distribution as the training examples, but generally high reconstruction error on samples randomly chosen from the input space.

We want to implement the autoencoder in Theano as a class, so that it can later be used to build a stacked autoencoder. The first step is to create shared variables for the parameters of the autoencoder (W, b and b').
```Python
# Imports needed by the code in this section (see the dA.py file linked above).
import numpy

import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams


class dA(object):
    """Denoising Auto-Encoder class (dA)

    A denoising autoencoder tries to reconstruct the input from a corrupted
    version of it by projecting it first in a latent space and reprojecting
    it afterwards back in the input space. Please refer to Vincent et al., 2008
    for more details. If x is the input then equation (1) computes a partially
    destroyed version of x by means of a stochastic mapping q_D. Equation (2)
    computes the projection of the input into the latent space. Equation (3)
    computes the reconstruction of the input, while equation (4) computes the
    reconstruction error.

    .. math::

        \tilde{x} ~ q_D(\tilde{x}|x)                                    (1)

        y = s(W \tilde{x} + b)                                          (2)

        x = s(W' y + b')                                                (3)

        L(x,z) = -sum_{k=1}^d [x_k \log z_k + (1-x_k) \log( 1-z_k)]     (4)

    """

    def __init__(
        self,
        numpy_rng,
        theano_rng=None,
        input=None,
        n_visible=784,
        n_hidden=500,
        W=None,
        bhid=None,
        bvis=None
    ):
        """
        Initialize the dA class by specifying the number of visible units (the
        dimension d of the input), the number of hidden units (the dimension
        d' of the latent or hidden space) and the corruption level. The
        constructor also receives symbolic variables for the input, weights and
        bias. Such symbolic variables are useful when, for example, the input
        is the result of some computations, or when weights are shared between
        the dA and an MLP layer. When dealing with SdAs this always happens:
        the dA on layer 2 gets as input the output of the dA on layer 1,
        and the weights of the dA are used in the second stage of training
        to construct an MLP.

        :type numpy_rng: numpy.random.RandomState
        :param numpy_rng: numpy random number generator used to generate weights

        :type theano_rng: theano.tensor.shared_randomstreams.RandomStreams
        :param theano_rng: Theano random generator; if None is given one is
                           generated based on a seed drawn from `rng`

        :type input: theano.tensor.TensorType
        :param input: a symbolic description of the input or None for
                      standalone dA

        :type n_visible: int
        :param n_visible: number of visible units

        :type n_hidden: int
        :param n_hidden: number of hidden units

        :type W: theano.tensor.TensorType
        :param W: Theano variable pointing to a set of weights that should be
                  shared between the dA and another architecture; if the dA
                  should be standalone set this to None

        :type bhid: theano.tensor.TensorType
        :param bhid: Theano variable pointing to a set of bias values (for
                     hidden units) that should be shared between the dA and
                     another architecture; if the dA should be standalone set
                     this to None

        :type bvis: theano.tensor.TensorType
        :param bvis: Theano variable pointing to a set of bias values (for
                     visible units) that should be shared between the dA and
                     another architecture; if the dA should be standalone set
                     this to None

        """
        self.n_visible = n_visible
        self.n_hidden = n_hidden

        # create a Theano random generator that gives symbolic random values
        if not theano_rng:
            theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))

        # note : W' was written as `W_prime` and b' as `b_prime`
        if not W:
            # W is initialized with `initial_W`, which is uniformly sampled
            # from -4*sqrt(6./(n_visible+n_hidden)) and
            # 4*sqrt(6./(n_hidden+n_visible)); the output of uniform is
            # converted using asarray to dtype theano.config.floatX so that
            # the code is runnable on GPU
            initial_W = numpy.asarray(
                numpy_rng.uniform(
                    low=-4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                    high=4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                    size=(n_visible, n_hidden)
                ),
                dtype=theano.config.floatX
            )
            W = theano.shared(value=initial_W, name='W', borrow=True)

        if not bvis:
            bvis = theano.shared(
                value=numpy.zeros(
                    n_visible,
                    dtype=theano.config.floatX
                ),
                borrow=True
            )

        if not bhid:
            bhid = theano.shared(
                value=numpy.zeros(
                    n_hidden,
                    dtype=theano.config.floatX
                ),
                name='b',
                borrow=True
            )

        self.W = W
        # b corresponds to the bias of the hidden units
        self.b = bhid
        # b_prime corresponds to the bias of the visible units
        self.b_prime = bvis
        # tied weights, therefore W_prime is W transpose
        self.W_prime = self.W.T
        self.theano_rng = theano_rng
        # if no input is given, generate a variable representing the input
        if input is None:
            # we use a matrix because we expect a minibatch of several
            # examples, each example being a row
            self.x = T.dmatrix(name='input')
        else:
            self.x = input

        self.params = [self.W, self.b, self.b_prime]
```
Note that we pass the symbolic `input` to the autoencoder as a parameter. This lets us chain autoencoders to build a deep network: the output y of layer k becomes the input of layer k+1 (a small illustration of this follows the next code listing).
Now we can express the computation of the latent representation and of the reconstructed signal:

```Python
    def get_hidden_values(self, input):
        """ Computes the values of the hidden layer """
        return T.nnet.sigmoid(T.dot(input, self.W) + self.b)

    def get_reconstructed_input(self, hidden):
        """Computes the reconstructed input given the values of the
        hidden layer

        """
        return T.nnet.sigmoid(T.dot(hidden, self.W_prime) + self.b_prime)
```
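As an illustration of the stacking idea mentioned above, here is a minimal sketch of wiring two autoencoders together. The variable names (`x`, `rng`, `da1`, `da2`) and the layer sizes are our own; the tutorial's Stacked Denoising Autoencoders chapter does this properly:

```Python
import numpy
import theano.tensor as T

# Illustrative sketch, assuming the dA class defined above.
x = T.matrix('x')  # symbolic minibatch, one flattened 28*28 image per row
rng = numpy.random.RandomState(123)

# first-layer autoencoder works directly on the raw input
da1 = dA(numpy_rng=rng, input=x, n_visible=28 * 28, n_hidden=500)

# the code y of layer 1 becomes the symbolic input of the layer-2 autoencoder
da2 = dA(numpy_rng=rng, input=da1.get_hidden_values(x),
         n_visible=500, n_hidden=250)
```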
Using these functions, we can now compute the cost and the updates for one step of stochastic gradient descent:

```Python
    def get_cost_updates(self, corruption_level, learning_rate):
        """ This function computes the cost and the updates for one training
        step of the dA """

        tilde_x = self.get_corrupted_input(self.x, corruption_level)
        y = self.get_hidden_values(tilde_x)
        z = self.get_reconstructed_input(y)
        # note : we sum over the size of a datapoint; if we are using
        #        minibatches, L will be a vector, with one entry per
        #        example in minibatch
        L = - T.sum(self.x * T.log(z) + (1 - self.x) * T.log(1 - z), axis=1)
        # note : L is now a vector, where each element is the
        #        cross-entropy cost of the reconstruction of the
        #        corresponding example of the minibatch. We need to
        #        compute the average of all these to get the cost of
        #        the minibatch
        cost = T.mean(L)

        # compute the gradients of the cost of the `dA` with respect
        # to its parameters
        gparams = T.grad(cost, self.params)
        # generate the list of updates
        updates = [
            (param, param - learning_rate * gparam)
            for param, gparam in zip(self.params, gparams)
        ]

        return (cost, updates)
```
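`get_cost_updates` calls a `get_corrupted_input` method that this commit does not include yet. In the dA.py file linked at the top of this section it applies binomial masking noise; a sketch of that method, to be placed inside the dA class:

```Python
    def get_corrupted_input(self, input, corruption_level):
        """Zero out a randomly chosen fraction `corruption_level` of the
        input entries, leaving the rest untouched (binomial masking noise).
        Sketched from the dA.py code linked above."""
        return self.theano_rng.binomial(size=input.shape, n=1,
                                        p=1 - corruption_level,
                                        dtype=theano.config.floatX) * input
```

With these pieces in place, a training function can be compiled in the usual Theano way; the values 0.3 and 0.1 below are illustrative, not prescribed by this commit:

```Python
# assumes the imports above and the `x` / `da1` objects from the earlier sketch
cost, updates = da1.get_cost_updates(corruption_level=0.3, learning_rate=0.1)
train_da = theano.function(inputs=[x], outputs=cost, updates=updates)
# one SGD step on a minibatch (a numpy array with one flattened image per row):
# minibatch_cost = train_da(minibatch)
```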

‎images/5_autoencoders_1.png‎

549 Bytes

‎images/5_autoencoders_2.png‎

597 Bytes

‎images/5_autoencoders_3.png‎

368 Bytes

‎images/5_autoencoders_4.png‎

1.31 KB
