Is it correct that the embeddings for token and position have the same input size, equal to vocab_size? It sounds strange to me that a pos_embedding would depend on the total vocabulary size:
token_embedding_layer = torch.nn.Embedding(vocab_size, output_dim)
pos_embedding_layer = torch.nn.Embedding(vocab_size, output_dim)
Good question. It should be equal to the maximum context length, which is usually smaller than the vocabulary size. E.g., for GPT-2 that would be 1024, but for modern LLMs it's usually somewhere above 2048. I think in the recent GPT-4 models it's >100k now.
I will modify this using a separate parameter to make it clearer. E.g.,
token_embedding_layer = torch.nn.Embedding(vocab_size, output_dim)
pos_embedding_layer = torch.nn.Embedding(context_len, output_dim)
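For reference, here is a minimal runnable sketch of how the two layers end up being used together (the sizes below are just placeholders, not the chapter's exact values). It also shows why the positional layer only needs context_len rows: its inputs are position indices 0..seq_len-1, not token IDs.

import torch

vocab_size = 50257   # placeholder: GPT-2 BPE vocabulary size
context_len = 1024   # placeholder: GPT-2 maximum context length
output_dim = 256     # placeholder embedding dimension

token_embedding_layer = torch.nn.Embedding(vocab_size, output_dim)
pos_embedding_layer = torch.nn.Embedding(context_len, output_dim)

# Toy batch of token IDs with shape (batch_size=8, seq_len=4); seq_len <= context_len
token_ids = torch.randint(0, vocab_size, (8, 4))

token_embeddings = token_embedding_layer(token_ids)   # shape: (8, 4, 256)
pos_ids = torch.arange(token_ids.shape[1])            # position indices 0, 1, 2, 3
pos_embeddings = pos_embedding_layer(pos_ids)         # shape: (4, 256)

# Broadcasting adds the same positional vectors to every sequence in the batch
input_embeddings = token_embeddings + pos_embeddings  # shape: (8, 4, 256)
print(input_embeddings.shape)                         # torch.Size([8, 4, 256])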
Exactly what I meant. Thanks, and congratulations on the good work!
Thanks! I saw I had already used block_size in the chapter 2 file, so I adjusted the other files accordingly.
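For reference, a minimal sketch of the corresponding definitions with the block_size naming (the values are placeholders, not the exact settings from the chapter):

import torch

vocab_size = 50257  # placeholder vocabulary size
block_size = 1024   # placeholder maximum context length
output_dim = 256    # placeholder embedding dimension

token_embedding_layer = torch.nn.Embedding(vocab_size, output_dim)
pos_embedding_layer = torch.nn.Embedding(block_size, output_dim)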