Reproducing Humaneval Benchmark Results #11

Open
@YSLIU627

Description

Hi, we re-ran the training phase and evaluated the trained model with the evaluation script in your repo. However, we find that there is a performance gap on the HumanEval benchmark between our trained model and the score released in the paper. We also evaluated the released model on Hugging Face and report the results as follows.
[Screenshot: evaluation results table (20240806-135618)]
Here the first column is the score released in the paper, the second column is the evaluation result of the released model, and the last column is the evaluation result of our re-trained model. We did not modify any hyper-parameters before training, and the loss curve of our re-trained model is identical to the one you posted in the other issue (issue #6). We are not sure whether you evaluated the model saved at the end of training or an intermediate checkpoint (for example, checkpoint-2000). We would greatly appreciate any help you could offer!
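For reference, this is the kind of procedure we used to evaluate a specific checkpoint: a minimal sketch (not your repo's official evaluation script) that loads a checkpoint directory with Hugging Face transformers, generates HumanEval completions greedily, and scores them with the OpenAI human-eval harness. The checkpoint path, generation settings, and prompt handling here are our assumptions, so please let us know if your setup differs.

```python
# Minimal evaluation sketch, assuming a local checkpoint directory such as
# "./output/checkpoint-2000" (placeholder path) and greedy decoding.
# This is NOT the repo's official script; it only illustrates our procedure.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from human_eval.data import read_problems, write_jsonl

checkpoint = "./output/checkpoint-2000"  # or the final checkpoint directory
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16, device_map="auto"
)

problems = read_problems()
samples = []
for task_id, problem in problems.items():
    # Feed the raw HumanEval prompt and keep only the newly generated tokens.
    inputs = tokenizer(problem["prompt"], return_tensors="pt").to(model.device)
    out = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
    completion = tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    samples.append({"task_id": task_id, "completion": completion})

write_jsonl("samples.jsonl", samples)
# Scored afterwards with: evaluate_functional_correctness samples.jsonl
```

If you used different generation parameters (sampling, temperature, number of samples per task) or a different prompt format, that alone could explain part of the gap.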
