Releases: jshn9515/deep-learning-notes

June 2026 Release

11 Jun 05:02

@jshn9515 jshn9515

v2026.06.11

f50e94c

This commit was signed with the committer’s verified signature.

jshn9515 jshn9515

GPG key ID: FF343E42C7DF25DD

Verified

This release significantly expands the project with new chapters on optimization algorithms, Vision Transformers (ViT), and additional PyTorch fundamentals. The accompanying dnnl library has also been extended with new neural network components and model implementations.

New Notebooks

Chapter 3: Multi-Layer Perceptron: From Single Layer to Deep Nonlinear Modeling

3.1 From Linear Classifiers to MLPs: Why We Need Hidden Layers
3.2 Activation Functions: Adding Nonlinearity to Neural Networks
3.3 Softmax and Cross Entropy: From Logits to Classification Loss
3.4 Forward and Backward Propagation of Linear Layers
3.5 Building a Complete MLP with NumPy
3.6 Train MLP on MNIST with NumPy
3.7 Backward Propagation Check: Using Numerical Gradients to Verify Handwritten Backward
3.8 Reimplementing MLP with PyTorch nn.Module

Chapter 4: Optimization Algorithms: How Neural Networks Update Parameters

4.1 From Gradient Descent to SGD
4.2 Momentum and Nesterov Momentum
4.3 Adagrad: Adapting the Learning Rate for Each Parameter
4.4 RMSprop and Adadelta: Fixing Adagrad's Learning-rate Decay
4.5 Adam: Combining Momentum and Adaptive Scaling
4.6 AdamW: Decoupling Weight Decay from Adam
4.7 Muon: Orthogonalizing Matrix Updates
4.8 Optimizer Map: When to Use Which Optimization Algorithm
4.9 Learning Rate Schedulers: How the Learning Rate Changes During Training

Chapter 11: Vision Transformer: From Image Classification to Visual Sequence Modeling

11.1 From CNN to Vision Transformer: Treating Images as Sequences
11.2 Patch Embedding: Cutting Images into Tokens
11.3 Class Token and Positional Embedding: Letting a Sequence Represent the Whole Image
11.4 ViT Encoder: Letting Patch Tokens Exchange Information
11.5 ViT Backbone: Pretraining and Fine-tuning

`dnnl` Package Updates

Added NumPy-based implementations of common neural network building blocks, including linear layers, activation functions, loss functions, normalization layers, and optimizers.
Added a complete NumPy MLP implementation with forward propagation, backpropagation, gradient checking, and MNIST training examples.
Added Vision Transformer (ViT) components, including patch embedding, class tokens, positional embeddings, Transformer encoders, and classification heads.
Expanded Transformer-related modules and improved interoperability between educational examples and reusable library code.
Added optimizer implementations including SGD, momentum, Nesterov momentum, Adagrad, RMSprop, Adam, AdamW, and Muon.
Added learning rate scheduler and optimizer-related utilities.
Improved package organization and documentation across neural network, optimization, and vision-related modules.
Expanded test coverage and examples for newly introduced models and optimization algorithms.
Updated package metadata, dependencies, CI workflows, and development tooling.
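
The gradient checking mentioned in the list above can be sketched as follows. This is a minimal NumPy illustration, not the `dnnl` implementation: the function names (`linear_forward`, `linear_backward`, `numerical_grad`) are hypothetical, and the loss is an arbitrary choice that keeps the example short. A hand-written backward pass for a linear layer is compared against central-difference numerical gradients.

```python
import numpy as np


def linear_forward(x, w, b):
    # y = x @ w + b, with x: (batch, in), w: (in, out), b: (out,)
    return x @ w + b


def linear_backward(x, w, grad_out):
    # Hand-derived gradients of a scalar loss w.r.t. x, w, and b
    grad_x = grad_out @ w.T
    grad_w = x.T @ grad_out
    grad_b = grad_out.sum(axis=0)
    return grad_x, grad_w, grad_b


def numerical_grad(f, param, eps=1e-6):
    # Central differences: perturb one entry at a time, in place
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        old = param[idx]
        param[idx] = old + eps
        plus = f()
        param[idx] = old - eps
        minus = f()
        param[idx] = old  # restore the original value
        grad[idx] = (plus - minus) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))
w = rng.standard_normal((3, 2))
b = rng.standard_normal(2)


def loss():
    # Illustrative loss: 0.5 * sum(y^2), so dL/dy = y
    return 0.5 * np.sum(linear_forward(x, w, b) ** 2)


out = linear_forward(x, w, b)
_, grad_w, grad_b = linear_backward(x, w, out)

err_w = np.max(np.abs(grad_w - numerical_grad(loss, w)))
err_b = np.max(np.abs(grad_b - numerical_grad(loss, b)))
print(f"max abs error: w={err_w:.2e}, b={err_b:.2e}")
```

If the analytic backward pass is correct, both errors should be many orders of magnitude smaller than the gradients themselves; a large mismatch usually points to a transposed matrix or a missing sum over the batch axis.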

CI Updates

Migrated GitHub Actions workflows to use GitHub Artifact Attestations for build provenance and artifact verification.
Replaced Quarto _freeze caching with GitHub Actions cache to reduce repository size and improve CI performance.
Improved workflow reliability and build reproducibility across documentation and package pipelines.

Merged Pull Requests

Bump numpy from 2.4.5 to 2.4.6 by @dependabot[bot] in #5
Update transformers requirement from ~=5.8.0 to ~=5.9.0 by @dependabot[bot] in #8
Fix view operations for q, k, v in multi-head attention by @kbyy123 in #11
Fix typo in decoder explanation by @kbyy123 in #12
[en] Fix formula rendering issues in ch1.3 based on CN version by @wqpwqp1222 in #13
Update dependency gdown to >=6.1.0,<6.2.0 by @renovate[bot] in #16
Update dependency scikit-learn to >=1.9.0,<1.10.0 by @renovate[bot] in #17
Remove extra 'not' in zero_grad example code for both zh and en versions by @wqpwqp1222 in #18
Update dependency transformers to >=5.10.1,<5.11.0 by @renovate[bot] in #19
Update dependency datasets to v5 by @renovate[bot] in #20
Update dependency diffusers to >=0.38.0,<0.39.0 by @renovate[bot] in #24
Update dependency transformers to >=5.11.0,<5.12.0 by @renovate[bot] in #25

New Contributors

@kbyy123 made their first contribution in #11
@wqpwqp1222 made their first contribution in #13
@renovate[bot] made their first contribution in #16

Note

This project continues to be maintained in both Chinese and English through a Quarto-based structure, as an open and continuously growing collection of deep learning study notes.

Full Changelog: v2026.05.09...v2026.06.11

Contributors

renovate, dependabot, and 2 other contributors

May 2026 Release

10 May 03:44

@jshn9515 jshn9515

v2026.05.09

e8336b4

This commit was signed with the committer’s verified signature.

jshn9515 jshn9515

GPG key ID: FF343E42C7DF25DD

Verified

This release completes the Attention and Transformers chapter, adds English versions for all Chinese content, improves notebook packaging and formatting, and introduces a rewritten dnnl package with tests and CI support.

New Notebooks

Chapter 1: Introduction to Deep Learning

1.1 Neural Networks: A Learnable Function

Chapter 8: Attention and Transformers: From Fixed-Length Encoding to Dynamic Context Modeling

8.1 Bahdanau Attention: From Information Compression to Dynamic Retrieval
8.2 Cross-Attention: One Sequence Querying Another Sequence
8.3 Self-Attention: Internal Information Interaction within a Sequence
8.4 Multi-Head Attention: From Single Perspective to Multiple Perspectives
8.5 Positional Encoding: Adding Positional Information to Attention
8.6 Transformer Encoder: Stacking Self-Attention Layers
8.7 Transformer Decoder: Masked Self-Attention and Cross-Attention
8.8 Encoder-Decoder Transformer: Connecting Encoder and Decoder
8.9 KV Cache: Why We Don't Recompute the Past During Inference
8.10 Three Different Transformer Architectures: Understanding, Generation, and Input-Output Conversion
8.11 Hugging Face Transformers API: From Structure to Calls

Repository and Publishing Updates

Added English versions for all Chinese content.
Packaged notebooks now include images.
Refined page navigation, code output wrapping, Open Graph descriptions, blockquote emphasis, and plaintext code block styling.
Regular version bump.

`dnnl` Package Updates

Completely rewrote dnnl around a PyTorch-like API, with module classes under dnnl.nn and stateless helpers under dnnl.nn.functional.
Removed the old chapter-based package layout, including dnnl.ch8, dnnl.ch10, dnnl.ch13, and dnnl.ch14.
Reorganized dnnl into reusable neural-network components instead of chapter-specific modules.
Added attention, FlashAttention, positional encoding, Transformer, AE/VAE, diffusion, and UNet-related code.
Improved attention and Transformer APIs to better align with PyTorch behavior.
Updated projection bias handling, causal masks, attention weights, and functional interfaces.
Added unit tests for attention, FlashAttention, AE/VAE, diffusion, Transformer, and PyTorch compatibility checks.
Added a dedicated GitHub Actions workflow for testing and building dnnl.
Updated the dnnl version, package metadata, dependencies, and package-specific Ruff configuration.
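
The split between module classes and stateless helpers described above follows the convention of `torch.nn` and `torch.nn.functional`. The sketch below illustrates that pattern in plain NumPy; it is not the actual `dnnl` code. The names `linear`, `relu`, `Linear`, and `ReLU` and their signatures are assumptions chosen for illustration, and the real `dnnl.nn` and `dnnl.nn.functional` interfaces may differ.

```python
import numpy as np


# Stateless helpers (the role a "functional" namespace would play):
# all parameters are passed in explicitly.
def linear(x, weight, bias=None):
    # weight has shape (out_features, in_features), as in PyTorch
    out = x @ weight.T
    if bias is not None:
        out = out + bias
    return out


def relu(x):
    return np.maximum(x, 0.0)


# Module classes (the role an "nn" namespace would play):
# they own their parameters and delegate the math to the helpers.
class Linear:
    def __init__(self, in_features, out_features, seed=0):
        rng = np.random.default_rng(seed)
        self.weight = rng.standard_normal((out_features, in_features)) * 0.01
        self.bias = np.zeros(out_features)

    def __call__(self, x):
        return linear(x, self.weight, self.bias)


class ReLU:
    def __call__(self, x):
        return relu(x)


layer = Linear(3, 2)
x = np.ones((4, 3))
y = ReLU()(layer(x))
print(y.shape)
```

Keeping the math in stateless functions means the same code path can be tested directly with hand-picked weights, while the module classes only manage parameter storage.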

Note

This project continues to be maintained in both Chinese and English through a Quarto-based structure, as an open and continuously growing collection of deep learning study notes.

Full Changelog: v2026.04.21...v2026.05.09

April 2026 Release

21 Apr 05:06

@jshn9515 jshn9515

v2026.04.21

bf2f4a5

This commit was signed with the committer’s verified signature.

jshn9515 jshn9515

GPG key ID: FF343E42C7DF25DD

Verified

This first release introduces the initial public version of these notes, covering topics from deep learning fundamentals to modern architectures and generative models.

New Notebooks

Chapter 1: Introduction to Deep Learning

1.3 Forward Propagation, Backpropagation, and Computation Graphs

Chapter 2: Getting Started with PyTorch

2.1 Automatic Differentiation in PyTorch
2.2 Gradient Recording and Control in PyTorch

Chapter 10: FlashAttention: Efficient Implementation of Attention Mechanism

10.1 Why Attention is IO-Bound
10.2 Flash Attention v1: Eliminating the IO Bottleneck in Attention Mechanisms

Chapter 12: GAN: Generative Adversarial Networks

12.1 GANs: The Basics of Generative Adversarial Networks

Chapter 13: VAE: Variational Autoencoders

13.1 Autoencoder: Starting with Compression and Reconstruction
13.2 VAE: Probabilistic Modeling and the Reparameterization Trick
13.3 ELBO: Where Does the VAE's Objective Function Come From?
13.4 VAE Training Phenomena and Latent Space Intuition
13.5 VAE: Advantages, Limitations, and Future Developments

Chapter 14: Diffusion Models: From Denoising to Generation

14.1 DDPM: From Denoising to Generation
14.2 The Forward Process of DDPM: From Image to Noise
14.3 DDPM's Reverse Denoising Process and Training Objective
14.4 DDPM Network Architecture and Sampling Process
14.5 DDPM from a Variational Perspective: Where Does the ELBO Come From?

Chapter 15: CLIP: Multimodal Models Integrating Vision and Language

15.1 CLIP: Connecting Images and Language with Contrastive Learning

Repository and Publishing Updates

Added GitHub Actions workflows for packaging and publishing Quarto notebooks.
Updated Giscus configuration.
Added a _freeze folder for caching.

`dnnl` Package Updates

Added notes and setup-related updates around the dnnl package.
Added chapter-based implementations in dnnl to support examples and code used across individual chapters.
Added a dedicated GitHub Actions workflow for dnnl packaging.

Note

This project continues to be maintained in both Chinese and English through a Quarto-based structure, as an open and continuously growing collection of deep learning study notes.

Full Changelog: https://github.com/jshn9515/deep-learning-notes/commits/v2026.04.21
