Transformers for Natural Language Processing and Computer Vision, Third Edition: Take Generative AI and LLMs to the next level with Hugging Face, Google Vertex AI, ChatGPT, GPT-4V, and DALL-E 3

by Denis Rothman
This repo is continually updated and upgraded.
Last updated: August 14, 2025
📝 For details on updates and improvements, see the Changelog.
🚩If you see anything that doesn't run as expected, raise an issue, and we'll work on it!
Look for 🐬 to explore new bonus notebooks, such as DeepSeek-R1 and OpenAI o1 reasoning models, Midjourney's API, Google Vertex AI Gemini's API, and OpenAI asynchronous batch API calls!
Look for 🎏 to explore existing notebooks for the latest model or platform releases, such as OpenAI's latest models (GPT-4o and o1).
Look for 🛠 to run existing notebooks with new dependency versions and platform API constraints and tweaks.
This is the code repository for Transformers for Natural Language Processing and Computer Vision, published by Packt.
Explore Generative AI and Large Language Models with Hugging Face, ChatGPT, GPT-4V, and DALL-E 3
Transformers for Natural Language Processing and Computer Vision, Third Edition, explores Large Language Model (LLM) architectures, applications, and various platforms (Hugging Face, OpenAI, and Google Vertex AI) used for Natural Language Processing (NLP) and Computer Vision (CV).
Dive into generative vision transformers and multimodal model architectures and build applications, such as image and video-to-text classifiers. Go further by combining different models and platforms and learning about AI agent replication.
- Learn how to pretrain and fine-tune LLMs
- Learn how to work with multiple platforms, such as Hugging Face, OpenAI, and Google Vertex AI
- Learn about different tokenizers and the best practices for preprocessing language data
- Implement Retrieval Augmented Generation (RAG) and rule bases to mitigate hallucinations
- Visualize transformer model activity for deeper insights using BertViz, LIME, and SHAP
- Create and implement cross-platform chained models, such as HuggingGPT
- Go in-depth into vision transformers with CLIP, DALL-E 2, DALL-E 3, and GPT-4V
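The full notebooks in the chapters cover these topics in depth. As a minimal, library-free sketch of the Retrieval Augmented Generation idea from the list above (all function names are illustrative, and the bag-of-words "embedding" is a toy stand-in for a real embedding model):

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding": word -> count.
    # A real pipeline would call an embedding model instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=1):
    # Rank documents by similarity to the query and keep the top k
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents):
    # Ground the model's answer in retrieved context to mitigate hallucinations
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."

docs = [
    "The transformer architecture relies on self-attention.",
    "RoBERTa is pretrained with dynamic masking.",
]
prompt = build_prompt("What does the transformer rely on?", docs)
```

The resulting prompt would then be sent to an LLM (e.g., via the OpenAI or Vertex AI APIs covered in the book); grounding the question in retrieved context is what distinguishes RAG from plain generation.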
- What Are Transformers?
- Getting Started with the Architecture of the Transformer Model
- Emergent vs Downstream Tasks: The Unseen Depths of Transformers
- Advancements in Translations with Google Trax, Google Translate, and Gemini
- Diving into Fine-Tuning through BERT
- Pretraining a Transformer from Scratch through RoBERTa
- The Generative AI Revolution with ChatGPT
- Fine-Tuning OpenAI GPT Models
- Shattering the Black Box with Interpretable Tools
- Investigating the Role of Tokenizers in Shaping Transformer Models
- Leveraging LLM Embeddings as an Alternative to Fine-Tuning
- Toward Syntax-Free Semantic Role Labeling with ChatGPT and GPT-4
- Summarization with T5 and ChatGPT
- Exploring Cutting-Edge LLMs with Vertex AI and PaLM 2
- Guarding the Giants: Mitigating Risks in Large Language Models
- Beyond Text: Vision Transformers in the Dawn of Revolutionary AI
- Transcending the Image-Text Boundary with Stable Diffusion
- Hugging Face AutoTrain: Training Vision Models without Coding
- On the Road to Functional AGI with HuggingGPT and its Peers
- Beyond Human-Designed Prompts with Generative Ideation
- Appendix: Answers to the Questions
You can run the notebooks directly from the table below:
| Chapter | Colab | Kaggle | Gradient | StudioLab |
|---|---|---|---|---|
| Part I: The Foundations of Transformer Models | | | | |
| Chapter 1: What Are Transformers? | | | | |
| | Open In Colab Open In Colab | Kaggle Kaggle | Gradient Gradient | Open In SageMaker Studio Lab Open In SageMaker Studio Lab |
| Getting started with DeepSeek-R1 reasoning models, integrated into the Hugging Face Hub and Together | | | | |
| | Open In Colab | Kaggle | Gradient | Open In SageMaker Studio Lab |
| Chapter 2: Getting Started with the Architecture of the Transformer Model | | | | |
| | Open In Colab Open In Colab | Kaggle Kaggle | Gradient Gradient | Open In SageMaker Studio Lab Open In SageMaker Studio Lab |
| Explaining DeepSeek's training innovations, Part 1: RL | | | | |
| | Open In Colab | Kaggle | Gradient | Open In SageMaker Studio Lab |
| Explaining DeepSeek's training innovations, Part 2: RoPE | | | | |
| | Open In Colab | Kaggle | Gradient | Open In SageMaker Studio Lab |
| Chapter 3: Emergent vs Downstream Tasks: The Unseen Depths of Transformers | | | | |
| | Open In Colab Open In Colab | Kaggle Kaggle | Gradient Gradient | Open In SageMaker Studio Lab Open In SageMaker Studio Lab |
| Chapter 4: Advancements in Translations with Google Trax, Google Translate, and Google Bard | | | | |
| | Open In Colab Open In Colab | Kaggle Kaggle | Gradient Gradient | Open In SageMaker Studio Lab Open In SageMaker Studio Lab |
| Chapter 5: Diving into Fine-Tuning through BERT | | | | |
| | Open In Colab | Kaggle | Gradient | Open In SageMaker Studio Lab |
| Chapter 6: Pretraining a Transformer from Scratch through RoBERTa | | | | |
| | Open In Colab Open In Colab | Kaggle Kaggle | Gradient Gradient | Open In SageMaker Studio Lab Open In SageMaker Studio Lab |
| Part II: The Rise of Suprahuman NLP | | | | |
| Chapter 7: The Generative AI Revolution with ChatGPT | | | | |
| | Open In Colab Open In Colab Open In Colab Open In Colab | Kaggle Kaggle Kaggle Kaggle | Gradient Gradient Gradient Gradient | Open In SageMaker Studio Lab Open In SageMaker Studio Lab Open In SageMaker Studio Lab Open In SageMaker Studio Lab |
| OpenAI reasoning models: the o1 API | | | | |
| | Open In Colab | Kaggle | Gradient | Open In SageMaker Studio Lab |
| OpenAI reasoning models: the o1-preview API | | | | |
| | Open In Colab | Kaggle | Gradient | Open In SageMaker Studio Lab |
| Chapter 8: Fine-Tuning OpenAI Models | | | | |
| | Open In Colab Open In Colab | Kaggle Kaggle | Gradient Gradient | Open In SageMaker Studio Lab Open In SageMaker Studio Lab |
| Fine-Tuning GPT-4.1 | | | | |
| | Open In Colab | Kaggle | Gradient | Open In SageMaker Studio Lab |
| 🐬 RAG as an alternative to fine-tuning: Building Scalable Knowledge-Based RAG-Driven Generative AI | | | | |
| Click here to access an open-source library to implement RAG | | | | |
| Chapter 9: Shattering the Black Box with Interpretable Tools | | | | |
| | Open In Colab Open In Colab | Kaggle Kaggle | Gradient Gradient | Open In SageMaker Studio Lab Open In SageMaker Studio Lab |
| Chapter 10: Investigating the Role of Tokenizers in Shaping Transformer Models | | | | |
| | Open In Colab Open In Colab Open In Colab | Kaggle Kaggle Kaggle | Gradient Gradient Gradient | Open In SageMaker Studio Lab Open In SageMaker Studio Lab Open In SageMaker Studio Lab |
| Chapter 11: Leveraging LLM Embeddings as an Alternative to Fine-Tuning | | | | |
| | Open In Colab Open In Colab Open In Colab | Kaggle Kaggle Kaggle | Gradient Gradient Gradient | Open In SageMaker Studio Lab Open In SageMaker Studio Lab Open In SageMaker Studio Lab |
| Chapter 12: Toward Syntax-Free Semantic Role Labeling with BERT and OpenAI's ChatGPT | | | | |
| | Open In Colab | Kaggle | Gradient | Open In SageMaker Studio Lab |
| Chapter 13: Summarization with T5 and ChatGPT | | | | |
| | Open In Colab Open In Colab | Kaggle Kaggle | Gradient Gradient | Open In SageMaker Studio Lab Open In SageMaker Studio Lab |
| Chapter 14: Exploring Cutting-Edge NLP with Google Vertex AI (PaLM and 🐬 Gemini with gemini-1.5-flash-001) | | | | |
| | Open In Colab Open In Colab | Kaggle Kaggle | Gradient Gradient | Open In SageMaker Studio Lab Open In SageMaker Studio Lab |
| Gemini 2.5 Flash showcase of generative AI tasks | | | | |
| | Open In Colab | Kaggle | Gradient | Open In SageMaker Studio Lab |
| Chapter 15: Guarding the Giants: Mitigating Risks in Large Language Models | | | | |
| | Open In Colab Open In Colab Open In Colab Open In Colab Open In Colab Open In Colab | Kaggle Kaggle Kaggle Kaggle Kaggle Kaggle | Gradient Gradient Gradient Gradient Gradient Gradient | Open In SageMaker Studio Lab Open In SageMaker Studio Lab Open In SageMaker Studio Lab Open In SageMaker Studio Lab Open In SageMaker Studio Lab Open In SageMaker Studio Lab |
| Part III: Generative Computer Vision: A New Way to See the World | | | | |
| Chapter 16: Vision Transformers in the Dawn of Revolutionary AI | | | | |
| | Open In Colab Open In Colab Open In Colab | Kaggle Kaggle Kaggle | Gradient Gradient Gradient | Open In SageMaker Studio Lab Open In SageMaker Studio Lab Open In SageMaker Studio Lab |
| Chapter 17: Transcending the Image-Text Boundary with Stable Diffusion | | | | |
| | Open In Colab Open In Colab Open In Colab Open In Colab Open In Colab | Kaggle Kaggle Kaggle Kaggle Kaggle | Gradient Gradient Gradient Gradient Gradient | Open In SageMaker Studio Lab Open In SageMaker Studio Lab Open In SageMaker Studio Lab Open In SageMaker Studio Lab Open In SageMaker Studio Lab |
| Stable Diffusion with Hugging Face | | | | |
| | Open In Colab | Kaggle | Gradient | Open In SageMaker Studio Lab |
| Chapter 18: Automated Vision Transformer Training | | | | |
| | Open In Colab | Kaggle | Gradient | Open In SageMaker Studio Lab |
| Chapter 19: On the Road to Functional AGI with HuggingGPT and its Peers | | | | |
| | Open In Colab | Kaggle | Gradient | Open In SageMaker Studio Lab |
| Chapter 20: Generative AI Ideation with Vertex AI, LangChain, and Stable Diffusion | | | | |
| | Open In Colab Open In Colab Open In Colab Open In Colab | Kaggle Kaggle Kaggle Kaggle | Gradient Gradient Gradient Gradient | Open In SageMaker Studio Lab Open In SageMaker Studio Lab Open In SageMaker Studio Lab Open In SageMaker Studio Lab |
If you encounter a problem in the notebooks, create an issue in this repository. We will be glad to provide support!
If you feel this book is for you, get your copy today!
Know more on the Discord server

You can join the community for the latest updates and discussions on the Discord server at Discord.
Download a free PDF

If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost. Simply click on the link to claim your free PDF.
We also provide a PDF file with color images of the screenshots and diagrams used in this book at ColorImages.
Denis Rothman graduated from Sorbonne University and Paris-Cité University, designing one of the first patented encoding and embedding systems and teaching at Paris-I Panthéon-Sorbonne. He authored one of the first patented word-encoding systems and AI bots/robots. He began his career delivering a Natural Language Processing (NLP) chatbot for Moët et Chandon (LVMH) and an AI tactical defense optimizer for Airbus (formerly Aerospatiale). Denis then authored an AI optimizer for IBM and luxury brands, leading to an Advanced Planning and Scheduling (APS) solution used worldwide. LinkedIn