## Multi-Layer Perceptron (MLP) in Transformers

The Multi-Layer Perceptron (MLP) is a key component of the Transformer architecture, responsible for refining the representation of each token through a non-linear transformation. Here's the mathematical intuition behind the MLP in Transformers:

### Mathematical Formulation

The MLP in Transformers operates across the features of each token, applying the same non-linear transformation to every token independently. Given the output of the self-attention layer $y^{(m)}_n$ for token $n$ at layer $m$, the MLP computes:

$$
x^{(m+1)}_n = \text{MLP}_\theta(y^{(m)}_n)
$$

where $\theta$ denotes the parameters of the MLP, which are shared across all tokens.
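
For example, with a single hidden layer and a ReLU non-linearity (the form used in the NumPy example below; the weight names $W_1, b_1, W_2, b_2$ correspond to `W1`, `b1`, `W2`, `b2` in that code and are otherwise illustrative), the MLP works out to:

$$
\text{MLP}_\theta\left(y^{(m)}_n\right) = W_2 \, \max\left(0,\; W_1 \, y^{(m)}_n + b_1\right) + b_2,
\qquad \theta = \{W_1, b_1, W_2, b_2\}
$$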

The MLP typically consists of one or two hidden layers whose width is at least the number of features `D` (in the original Transformer it is `4 * D`). Because the same MLP is applied to each of the `N` tokens independently, the computational cost of this step is roughly `N * D * D` (up to the constant factor set by the hidden width), where `N` is the sequence length.
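
As a rough sanity check on that scaling (a minimal sketch: the sequence length, feature dimension, and hidden width below are assumed example values, not taken from the text), one can count the multiply-accumulates in the two matrix products:

```python
# Back-of-the-envelope cost of the MLP in one Transformer layer.
# N, D and H are assumed example values, not prescribed by the text.
N = 1024      # sequence length
D = 768       # feature dimension
H = 4 * D     # hidden width (4 * D, as in the original Transformer)

# Two matrix multiplies over the whole sequence:
#   (N, D) @ (D, H) and (N, H) @ (H, D),
# each costing about 2 * N * D * H floating-point operations.
flops = 2 * N * D * H + 2 * N * H * D   # = 4 * N * D * H, i.e. proportional to N * D * D
print(f"Approximate MLP FLOPs per layer: {flops:.2e}")  # about 9.7e+09
```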

### Example Implementation in Python and NumPy

Here's a simple example of implementing the MLP component of a Transformer using Python and NumPy:

```python
import numpy as np

np.random.seed(0)  # fix the seed so the random weights are reproducible

# Define MLP dimensions
D = 4            # number of features per token
hidden_size = 8  # width of the hidden layer

# Sample input from the self-attention layer: one row per token
y = np.array([[1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 1., 1., 1.]])

# Initialize MLP weights and biases (the same parameters are used for every token)
W1 = np.random.rand(D, hidden_size)
b1 = np.random.rand(1, hidden_size)
W2 = np.random.rand(hidden_size, D)
b2 = np.random.rand(1, D)

# Compute the MLP output, applied row-wise (i.e. independently per token)
h = np.maximum(0, y @ W1 + b1)  # ReLU activation in the hidden layer
x = h @ W2 + b2                 # linear output layer

print("Input from self-attention layer:\n", y)
print("Output of the MLP:\n", x)
```

In this example:

1. We define the MLP dimensions: the number of features `D` and the size of the hidden layer.

2. We create a sample input `y` from the self-attention layer, with one row per token.

3. We initialize the weights and biases of the MLP randomly; the same parameters are shared by every token.

4. We compute the output of the MLP in two steps:
   - Compute the hidden-layer activation using a ReLU non-linearity.
   - Apply the output-layer weights and biases to obtain the final output.

5. Finally, we print the input from the self-attention layer and the output of the MLP.

The MLP in a Transformer acts as a non-linear feature extractor, processing the output of the self-attention layer independently for each token (see the short check below). It helps capture complex interactions between features and refine the representations learned by the self-attention mechanism.
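
One way to make this token-wise independence concrete is the following check (a minimal sketch: the `mlp` helper and its random weights are assumptions for illustration, not part of the example above). Permuting the tokens before the MLP and permuting them after gives the same result:

```python
import numpy as np

rng = np.random.default_rng(0)
D, hidden_size = 4, 8

# Toy MLP weights, analogous to W1, b1, W2, b2 in the example above
W1, b1 = rng.standard_normal((D, hidden_size)), rng.standard_normal((1, hidden_size))
W2, b2 = rng.standard_normal((hidden_size, D)), rng.standard_normal((1, D))

def mlp(y):
    """Apply the same two-layer MLP to every token (row) of y."""
    h = np.maximum(0, y @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2              # linear output layer

y = rng.standard_normal((3, D))  # three tokens from the attention layer
perm = [2, 0, 1]                 # reorder the tokens

# Because the MLP acts on each token independently with shared weights,
# permuting the input rows simply permutes the output rows.
assert np.allclose(mlp(y[perm]), mlp(y)[perm])
print("Permuting tokens before or after the MLP gives the same output.")
```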