@mstallone
I am trying to reproduce StarCoder2-Instruct-v0.1; however, the model produced by the command provided in the README (copied below) does not match the evaluation results of the StarCoder2-Instruct-v0.1 checkpoint released on HF.
There is a sizable discrepancy between the two models' evaluations: HumanEval for your HF checkpoint is 7 points higher than for my reproduced model (I evaluated both models locally in the same environment).
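For context, this is roughly how I scored the two checkpoints: I generate completions with my own script and then score the resulting sample files with EvalPlus. The file names (and the choice of EvalPlus) come from my local setup, not from this repo, so please treat this as a sketch rather than your official pipeline:

# Scoring step from my local setup; the sample JSONL files come from my own generation script
evalplus.evaluate --dataset humaneval --samples samples_hf_starcoder2_instruct.jsonl
evalplus.evaluate --dataset humaneval --samples samples_my_reproduction.jsonl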
MODEL_KEY=bigcode/starcoder2-15b LR=1e-5 EPOCH=4 SEQ_LEN=1280 WARMUP_RATIO=0.05 \
OUTPUT_DIR=/path/to/output_model DATASET_FILE=/path/to/50k-dataset.jsonl \
accelerate launch -m star_align.train \
    --model_key $MODEL_KEY \
    --model_name_or_path $MODEL_KEY \
    --use_flash_attention True \
    --datafile_paths $DATASET_FILE \
    --output_dir $OUTPUT_DIR \
    --bf16 True \
    --num_train_epochs $EPOCH \
    --max_training_seq_length $SEQ_LEN \
    --pad_to_max_length False \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 64 \
    --group_by_length False \
    --ddp_find_unused_parameters False \
    --logging_steps 1 \
    --log_level info \
    --optim adafactor \
    --max_grad_norm -1 \
    --warmup_ratio $WARMUP_RATIO \
    --learning_rate $LR \
    --lr_scheduler_type linear
Are the parameters in the README correct for the released model? Are you adding anything in your accelerate config, e.g. model wrappers or other plugins?
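For reference, this is roughly how I am invoking the launcher on my side: a plain multi-GPU bf16 setup with no DeepSpeed or FSDP plugin. The GPU count and launcher flags are assumptions from my environment, not something taken from the README:

# How I invoke the launcher (assumed: single node, 8 GPUs, bf16, no DeepSpeed/FSDP plugin)
accelerate launch \
    --multi_gpu \
    --num_machines 1 \
    --num_processes 8 \
    --mixed_precision bf16 \
    -m star_align.train \
    --model_key $MODEL_KEY  # ...plus the remaining training arguments exactly as in the README command above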
For the data, I just ran:
>>> from datasets import load_dataset
>>> load_dataset("bigcode/self-oss-instruct-sc2-exec-filter-50k", split="train").to_json("/path/to/50k-dataset.jsonl", lines=True)
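As a quick sanity check on my side (not something from the README), I also verify the row count and columns of the split before exporting:

# Sanity check on the dataset before exporting/training (my own check, not from the README)
from datasets import load_dataset

ds = load_dataset("bigcode/self-oss-instruct-sc2-exec-filter-50k", split="train")
print(len(ds))          # expecting roughly 50K examples
print(ds.column_names)  # these columns are written as-is to the JSONL file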
Do you have any ideas on how I can reproduce your model? Thanks!