pos_embedding_layer question #11

Answered by rasbt
nicolaleo asked this question in Q&A

Is it correct that the embeddings for token and position have the same input size, equal to vocab_size?
It sounds strange to me that a pos_embedding would be tied to the total vocabulary size:

token_embedding_layer = torch.nn.Embedding(vocab_size, output_dim)
pos_embedding_layer = torch.nn.Embedding(vocab_size, output_dim)


Good question. It should be equal to the maximum context length, which is usually smaller than the vocabulary size. E.g., for GPT-2 that would be 1024, but for modern LLMs it's usually somewhere above 2048. I think in the recent GPT-4 model it's >100k now.

I will modify this to use a separate parameter to make it clearer. E.g.,

token_embedding_layer = torch.nn.Embedding(vocab_size, output_dim)
pos_embedding_layer = torch.nn.Embedding(context_len, output_dim)
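A minimal sketch of why the position embedding only needs `context_len` rows: token IDs index into the vocabulary table, while position indices only ever range over `0..seq_len-1`, so the position table never needs more rows than the maximum context length. The concrete sizes below (GPT-2-like values) and the batch shape are illustrative assumptions, not from the thread.

```python
import torch

# Illustrative sizes (roughly GPT-2-like; chosen for this sketch).
vocab_size = 50257   # number of distinct tokens
context_len = 1024   # maximum sequence length -> rows in the position table
output_dim = 256     # embedding dimension

token_embedding_layer = torch.nn.Embedding(vocab_size, output_dim)
pos_embedding_layer = torch.nn.Embedding(context_len, output_dim)

# A batch of token IDs: shape (batch_size=8, seq_len=4)
token_ids = torch.randint(0, vocab_size, (8, 4))

token_embeddings = token_embedding_layer(token_ids)    # (8, 4, 256)

# Position indices 0..seq_len-1, shared across the batch
pos_embeddings = pos_embedding_layer(torch.arange(4))  # (4, 256)

# Broadcasting adds the same positional vector to every batch item
input_embeddings = token_embeddings + pos_embeddings   # (8, 4, 256)
print(input_embeddings.shape)  # torch.Size([8, 4, 256])
```

Note that only positions up to `seq_len` are looked up, so `seq_len` must not exceed `context_len`, which is exactly the constraint the maximum context length encodes.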

Replies: 1 comment 2 replies

2 replies

Exactly what I meant. Thanks, and congratulations on the good work.


Thanks! I saw that I had already used block_size in the chapter 2 file. I adjusted it accordingly in the other files.

Answer selected by rasbt
