CoderUJB

This is the official repository for CoderUJB: An Executable and Unified Java Benchmark for Practical Programming Scenarios, accepted to the ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA) 2024.

CoderUJB (Unified Java Benchmark) is a new benchmark designed to evaluate LLMs across diverse Java programming tasks that are executable and reflective of actual development scenarios, acknowledging Java's prevalence in real-world software production.

Contents

  • Install
  • CodeUJB
  • QuickStart Scripts

Install

  1. Install codeujb.

    # create a new conda environment
    conda create -n ujb python=3.10
    conda activate ujb
    # clone and install codeujb
    git clone https://github.com/ZZR0/ISSTA24-CoderUJB.git
    cd ISSTA24-CoderUJB
    pip install -r requirements.txt
    pip install -e .
    

    For detailed package versions, please refer to requirements.txt.

  2. Refer to the defects4j repository (https://github.com/rjust/defects4j) to install the execution environment, which is required for evaluating generated answers.
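
    A minimal sketch following the defects4j README (requires Java, Git, and Perl's cpanm; adjust paths as needed):

    # clone and initialize defects4j
    git clone https://github.com/rjust/defects4j.git
    cd defects4j
    cpanm --installdeps .
    ./init.sh
    # make the defects4j command available on your PATH
    export PATH=$PATH:$(pwd)/framework/bin
    # sanity check: print metadata for the Lang project
    defects4j info -p Lang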

CodeUJB

Evaluate a model on CodeUJB

Step 1. Generate model answers to CodeUJB questions

We support three backbones for generating CodeUJB answers: hf (Hugging Face transformers), openai (OpenAI API), and tgi (Text Generation Inference).

# generate answers with huggingface `transformers` backbone.
python code_ujb/generate_hf.py \
 --model-path $model_name_or_path \
 --model-id $run_id \
 --gen-mode $gen_mode \
 --bench-name $dataset \
 --num-samples $num_samples \
 --save-generations-path ./log/$run_id/$dataset/generations-$gen_mode.json 
# generate answers with openai API backbone.
export OPENAI_API_BASE=''
export OPENAI_API_KEY=''
python code_ujb/generate_api.py \
 --model-path $run_id \
 --model-id $run_id \
 --gen-mode $gen_mode \
 --bench-name $dataset \
 --num-samples $num_samples \
 --parallel 8 \
 --save-generations-path ./log/$run_id/$dataset/generations-$gen_mode.json 
# If `model-id` is not in the OpenAI model list, `generate_api.py` will generate answers with the Text Generation Inference (TGI) backbone.
# Please refer to Text Generation Inference (https://github.com/huggingface/text-generation-inference) to deploy your TGI server first.
export TGI_API_URL_${run_id//-/_}=http://127.0.0.1:8081,http://127.0.0.1:8082 # The Text Generation Inference API URL.
python code_ujb/generate_api.py \
 --model-path $run_id \
 --model-id $run_id \
 --gen-mode $gen_mode \
 --bench-name $dataset \
 --num-samples $num_samples \
 --parallel 32 \
 --save-generations-path ./log/$run_id/$dataset/generations-$gen_mode.json 
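
The tgi backbone expects a running TGI server at the URL(s) set in TGI_API_URL_*. As a sketch, the official Docker image can serve a local model on port 8081 to match the URL above (the GPU flags, port, and model path are assumptions to adapt):

# serve a local model with the text-generation-inference Docker image
docker run --gpus all --shm-size 1g -p 8081:80 \
 -v $HOME/models:/data \
 ghcr.io/huggingface/text-generation-inference:latest \
 --model-id /data/deepseekcoder-instruct-7b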

Arguments:

  • [model-path] is the path to the model weights, which can be a local folder or a Hugging Face repo ID. If you are using generate_api.py, it should be the same as the model ID.
  • [model-id] is a name you give to the model.
  • [gen-mode] has two options: complete for models without instruction fine-tuning and chat for instruction-fine-tuned models.
  • [bench-name] is the name of the dataset you want to evaluate. There are five datasets in CodeUJB: codeujbrepair, codeujbcomplete, codeujbtestgen, codeujbtestgenissue, and codeujbdefectdetection.
  • [num-samples] is the number of samples for each coding question you want to generate.
  • [save-generations-path] is the path to save the generated answer.
  • [parallel] is the number of parallel API calls.

e.g.,

python code_ujb/generate_api.py --model-path gpt-3.5-turbo --model-id gpt-3.5-turbo --gen-mode chat --bench-name codeujbcomplete --num-samples 10 --save-generations-path log/gpt-3.5-turbo/codeujbcomplete/generations-chat.jsonl

The answers will be saved to log/gpt-3.5-turbo/codeujbcomplete/generations-chat.jsonl.
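
To generate answers for all five tasks in one pass, a simple shell loop over the dataset names works (a sketch reusing the flags documented above):

# loop over every CodeUJB task with the OpenAI API backbone
for dataset in codeujbcomplete codeujbrepair codeujbtestgen codeujbtestgenissue codeujbdefectdetection; do
 python code_ujb/generate_api.py \
  --model-path gpt-3.5-turbo --model-id gpt-3.5-turbo \
  --gen-mode chat --bench-name $dataset \
  --num-samples 10 --parallel 8 \
  --save-generations-path log/gpt-3.5-turbo/$dataset/generations-chat.jsonl
done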

Step 2. Evaluate model answers of CodeUJB

Please make sure you have installed defects4j first.

python3 code_ujb/evaluate.py \
 --model-path $model_name_or_path \
 --model-id $run_id \
 --gen-mode $gen_mode \
 --bench-name $dataset \
 --num-samples $num_samples \
 --load-generations-path ./log/$run_id/$dataset/generations-$gen_mode.json \
 --eval-output-path ./log/$run_id/$dataset/evaluation-$gen_mode.json

Arguments:

  • [load-generations-path] is the path to the generated answer.
  • [eval-output-path] is the path to save the evaluation results.

e.g.,

python code_ujb/evaluate.py --model-path gpt-3.5-turbo --model-id gpt-3.5-turbo --gen-mode chat --bench-name codeujbcomplete --num-samples 10 --load-generations-path log/gpt-3.5-turbo/codeujbcomplete/generations-chat.jsonl --eval-output-path ./log/gpt-3.5-turbo/codeujbcomplete/evaluation-chat.json

The evaluation results will be saved to ./log/gpt-3.5-turbo/codeujbcomplete/evaluation-chat.json.
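
The results are plain JSON, so they can be inspected with standard tools (the exact fields depend on the task):

# pretty-print the evaluation results
python -m json.tool ./log/gpt-3.5-turbo/codeujbcomplete/evaluation-chat.json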

QuickStart Scripts

# generate and evaluate with the OpenAI API; please set the OpenAI API key first.
# export OPENAI_API_BASE=''
# export OPENAI_API_KEY=''
./scripts/run_code_ujb.sh api_gen chat multiplepython gpt-3.5-turbo gpt-3.5-turbo
./scripts/run_code_ujb.sh eval chat multiplepython gpt-3.5-turbo gpt-3.5-turbo
# generate with ray inference
./scripts/run_code_ujb.sh local_gen chat multiplepython $HOME/models/deepseekcoder-instruct-7b deepseekcoder-instruct-7b
./scripts/run_code_ujb.sh eval chat multiplepython $HOME/models/deepseekcoder-instruct-7b deepseekcoder-instruct-7b
# generate with tgi inference
./scripts/run_code_ujb.sh tgi_gen chat multiplepython $HOME/models/deepseekcoder-instruct-7b deepseekcoder-instruct-7b
./scripts/run_code_ujb.sh eval chat multiplepython $HOME/models/deepseekcoder-instruct-7b deepseekcoder-instruct-7b
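
Judging from the examples above, run_code_ujb.sh appears to take five positional arguments:

# <mode> is one of: api_gen | local_gen | tgi_gen | eval
./scripts/run_code_ujb.sh <mode> <gen-mode> <bench-name> <model-path> <model-id>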
