
attentif


A toy implementation of "Attention Is All You Need"

[Figure: matplotlib plot of training loss vs. step]

Demo

BERT

[Screenshot: JupyterLab solving a fill-mask task with BERT]

GPT2

[Screenshot: JupyterLab solving a text-generation task with GPT2]
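
To make the fill-mask task concrete, here is a minimal sketch of the decoding step. The random logits are a stand-in for what attentif's BertModel (with a language-modeling head) would produce; the project's actual demo code may differ.

```python
import torch
from transformers import AutoTokenizer

# attentif borrows tokenizers from transformers, so we can use one here.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
input_ids = tokenizer("The capital of France is [MASK].", return_tensors="pt")["input_ids"]

# Stand-in for model output: real logits would have shape (batch, seq_len, vocab_size).
logits = torch.randn(1, input_ids.size(1), tokenizer.vocab_size)

# Fill-mask means taking the argmax over the vocabulary at the [MASK] position.
mask_pos = (input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
print(tokenizer.decode([logits[0, mask_pos].argmax().item()]))
```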

Motivation

I made this project to gain a deeper understanding of the Transformer architecture and the BERT, RoBERTa, T5, and GPT models. We often rely on existing Transformer implementations such as Hugging Face Transformers when we need to train a model. However, I wanted to see whether I could implement them from scratch, referring only to the paper.

This project does include:

  • torch.nn.Module
  • torch.nn.Parameter
  • Existing tokenizer implementations from transformers
  • And other primitive functions offered by PyTorch

This project does not include:

  • Any models from transformers
  • nn.Transformer
  • nn.MultiheadAttention
  • nn.Embedding
  • nn.LayerNorm
  • nn.functional.softmax
  • And other existing modules that play an essential role in the Transformer architecture
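
To illustrate what ruling these modules out means in practice: even softmax has to be rebuilt from primitive tensor operations. A minimal sketch (not necessarily the exact code in src/layers):

```python
import torch

def softmax(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Subtract the per-slice max before exponentiating for numerical stability;
    # softmax is shift-invariant, so the result is unchanged.
    shifted = x - x.max(dim=dim, keepdim=True).values
    exp = shifted.exp()
    return exp / exp.sum(dim=dim, keepdim=True)
```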

Features

We have implemented the following features so far. You can find the layers and functions in src/layers, and the models in src/models.

Functions

  • dropout
  • softmax
  • gelu
  • positional_encoding
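
As an example, here is a sketch of positional_encoding following Section 3.5 of the paper, assuming an even d_model (the actual implementation in src/layers may differ):

```python
import torch

def positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    # PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    # PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)            # (seq_len, 1)
    div_term = torch.pow(10000.0, torch.arange(0, d_model, 2).float() / d_model)  # (d_model/2,)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position / div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position / div_term)  # odd dimensions
    return pe
```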

Layers

  • MultiHeadAttention
  • FeedForwardNetwork
  • LayerNorm
  • TokenEmbedding
  • TransformerEncoder
  • TransformerEncoderBlock
  • TransformerDecoder
  • TransformerDecoderBlock
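
The core of MultiHeadAttention is scaled dot-product attention, Equation (1) of the paper. A sketch of that building block, reusing the from-scratch softmax above (again, not necessarily the exact code in src/layers):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (..., seq_q, seq_k)
    if mask is not None:
        # Masked positions become -inf so they get zero weight after softmax
        # (e.g. the causal mask used in TransformerDecoderBlock).
        scores = scores.masked_fill(mask == 0, float("-inf"))
    # `softmax` is the from-scratch version sketched earlier in this README.
    return softmax(scores, dim=-1) @ v
```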

Models

  • BertModel
  • GPT2Model
  • T5Model
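
A hypothetical usage sketch follows. The import path and constructor arguments are illustrative assumptions (mirroring BERT-base hyperparameters), not the project's actual API:

```python
import torch
from src.models import BertModel  # assumed import path

# Hypothetical constructor signature; the real one may differ.
model = BertModel(vocab_size=30522, d_model=768, num_heads=12, num_layers=12)

input_ids = torch.randint(0, 30522, (1, 128))  # a batch of one 128-token sequence
hidden_states = model(input_ids)               # expected shape: (1, 128, 768)
```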

Schedulers

We use the schedulers from transformers for now, but plan to implement them from scratch in the future.

  • AdamW
  • CrossEntropy
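
Here is a sketch of how these pieces might fit together in a training step, assuming AdamW and CrossEntropy refer to torch.optim.AdamW and torch.nn.CrossEntropyLoss, with model and dataloader as placeholders:

```python
import torch
from transformers import get_linear_schedule_with_warmup

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=1_000, num_training_steps=100_000
)
criterion = torch.nn.CrossEntropyLoss()

for input_ids, labels in dataloader:  # placeholder DataLoader of token ids
    logits = model(input_ids)         # (batch, seq_len, vocab_size)
    loss = criterion(logits.view(-1, logits.size(-1)), labels.view(-1))
    loss.backward()
    optimizer.step()
    scheduler.step()                  # advance the warmup/decay schedule
    optimizer.zero_grad()
```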

References

  • Vaswani et al., "Attention Is All You Need", 2017. https://arxiv.org/abs/1706.03762
  • Devlin et al., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", 2018. https://arxiv.org/abs/1810.04805
  • Radford et al., "Language Models are Unsupervised Multitask Learners", 2019.
  • Raffel et al., "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer", 2019. https://arxiv.org/abs/1910.10683
