CoderUJB

This is the official repository for CoderUJB: An Executable and Unified Java Benchmark for Practical Programming Scenarios, accepted to the ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA) 2024.

CoderUJB (Unified Java Benchmark) is a new benchmark designed to evaluate LLMs across diverse Java programming tasks that are executable and reflective of actual development scenarios, acknowledging Java's prevalence in real-world software production.

Contents

  • Install
  • CodeUJB
  • QuickStart Scripts

Install

  1. Install codeujb.

    # create a new conda environment
    conda create -n ujb python=3.10
    conda activate ujb
    # clone and install codeujb
    git clone https://github.com/ZZR0/ISSTA24-CoderUJB.git
    cd ISSTA24-CoderUJB
    pip install -r requirements.txt
    pip install -e .
    

    For detailed package versions, please refer to requirements.txt.

  2. Refer to the defects4j repository (https://github.com/rjust/defects4j) to install the execution environment, which is required for evaluating generated answers.
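
    A minimal sketch following the defects4j README (requires Java, Git, and Perl's cpanm; adjust paths as needed):

    # clone and initialize defects4j
    git clone https://github.com/rjust/defects4j.git
    cd defects4j
    cpanm --installdeps .
    ./init.sh
    # make the defects4j command available on your PATH
    export PATH=$PATH:$(pwd)/framework/bin
    # sanity check: print metadata for the Lang project
    defects4j info -p Lang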

CodeUJB

Evaluate a model on CodeUJB

Step 1. Generate model answers to CodeUJB questions

We support three backbones for generating CodeUJB answers: hf (Hugging Face transformers), openai (OpenAI API), and tgi (Text Generation Inference).

# generate answers with huggingface `transformers` backbone.
python code_ujb/generate_hf.py \
 --model-path $model_name_or_path \
 --model-id $run_id \
 --gen-mode $gen_mode \
 --bench-name $dataset \
 --num-samples $num_samples \
 --save-generations-path ./log/$run_id/$dataset/generations-$gen_mode.json 
# generate answers with openai API backbone.
export OPENAI_API_BASE=''
export OPENAI_API_KEY=''
python code_ujb/generate_api.py \
 --model-path $run_id \
 --model-id $run_id \
 --gen-mode $gen_mode \
 --bench-name $dataset \
 --num-samples $num_samples \
 --parallel 8 \
 --save-generations-path ./log/$run_id/$dataset/generations-$gen_mode.json 
# If `model-id` is not in the OpenAI model list, `generate_api.py` will generate answers with the Text Generation Inference (TGI) backbone.
# Please refer to Text Generation Inference (https://github.com/huggingface/text-generation-inference) to deploy your TGI server first.
export TGI_API_URL_${run_id//-/_}=http://127.0.0.1:8081,http://127.0.0.1:8082 # The Text Generation Inference API URL.
python code_ujb/generate_api.py \
 --model-path $run_id \
 --model-id $run_id \
 --gen-mode $gen_mode \
 --bench-name $dataset \
 --num-samples $num_samples \
 --parallel 32 \
 --save-generations-path ./log/$run_id/$dataset/generations-$gen_mode.json 
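
The tgi backbone expects a running TGI server at the URL(s) set in TGI_API_URL_*. As a sketch, the official Docker image can serve a local model on port 8081 to match the URL above (the GPU flags, port, and model path are assumptions to adapt):

# serve a local model with the text-generation-inference Docker image
docker run --gpus all --shm-size 1g -p 8081:80 \
 -v $HOME/models:/data \
 ghcr.io/huggingface/text-generation-inference:latest \
 --model-id /data/deepseekcoder-instruct-7b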

Arguments:

  • [model-path] is the path to the model weights, which can be a local folder or a Hugging Face repo ID. If you are using generate_api.py, it should be the same as the model ID.
  • [model-id] is a name you give to the model.
  • [gen-mode] has two options: complete for models without instruction fine-tuning and chat for instruction-fine-tuned models.
  • [bench-name] is the name of the dataset you want to evaluate. There are five datasets in CodeUJB: codeujbrepair, codeujbcomplete, codeujbtestgen, codeujbtestgenissue, and codeujbdefectdetection.
  • [num-samples] is the number of samples for each coding question you want to generate.
  • [save-generations-path] is the path to save the generated answer.
  • [parallel] is the number of parallel API calls.

e.g.,

python code_ujb/generate_api.py --model-path gpt-3.5-turbo --model-id gpt-3.5-turbo --gen-mode chat --bench-name codeujbcomplete --num-samples 10 --save-generations-path log/gpt-3.5-turbo/codeujbcomplete/generations-chat.jsonl

The answers will be saved to log/gpt-3.5-turbo/codeujbcomplete/generations-chat.jsonl.
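
To generate answers for all five tasks in one pass, a simple shell loop over the dataset names works (a sketch reusing the flags documented above):

# loop over every CodeUJB task with the OpenAI API backbone
for dataset in codeujbcomplete codeujbrepair codeujbtestgen codeujbtestgenissue codeujbdefectdetection; do
 python code_ujb/generate_api.py \
  --model-path gpt-3.5-turbo --model-id gpt-3.5-turbo \
  --gen-mode chat --bench-name $dataset \
  --num-samples 10 --parallel 8 \
  --save-generations-path log/gpt-3.5-turbo/$dataset/generations-chat.jsonl
done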

Step 2. Evaluate model answers of CodeUJB

Please make sure you have installed defects4j first.

python3 code_ujb/evaluate.py \
 --model-path $model_name_or_path \
 --model-id $run_id \
 --gen-mode $gen_mode \
 --bench-name $dataset \
 --num-samples $num_samples \
 --load-generations-path ./log/$run_id/$dataset/generations-$gen_mode.json \
 --eval-output-path ./log/$run_id/$dataset/evaluation-$gen_mode.json

Arguments:

  • [load-generations-path] is the path to the generated answer.
  • [eval-output-path] is the path to save the evaluation results.

e.g.,

python code_ujb/evaluate.py --model-path gpt-3.5-turbo --model-id gpt-3.5-turbo --gen-mode chat --bench-name codeujbcomplete --num-samples 10 --load-generations-path log/gpt-3.5-turbo/codeujbcomplete/generations-chat.jsonl --eval-output-path ./log/gpt-3.5-turbo/codeujbcomplete/evaluation-chat.json

The evaluation results will be saved to ./log/gpt-3.5-turbo/codeujbcomplete/evaluation-chat.json.
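
The results are plain JSON, so they can be inspected with standard tools (the exact fields depend on the task):

# pretty-print the evaluation results
python -m json.tool ./log/gpt-3.5-turbo/codeujbcomplete/evaluation-chat.json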

QuickStart Scripts

# generate and evaluate with the OpenAI API; please set the OpenAI API key first.
# export OPENAI_API_BASE=''
# export OPENAI_API_KEY=''
./scripts/run_code_ujb.sh api_gen chat multiplepython gpt-3.5-turbo gpt-3.5-turbo
./scripts/run_code_ujb.sh eval chat multiplepython gpt-3.5-turbo gpt-3.5-turbo
# generate with ray inference
./scripts/run_code_ujb.sh local_gen chat multiplepython $HOME/models/deepseekcoder-instruct-7b deepseekcoder-instruct-7b
./scripts/run_code_ujb.sh eval chat multiplepython $HOME/models/deepseekcoder-instruct-7b deepseekcoder-instruct-7b
# generate with tgi inference
./scripts/run_code_ujb.sh tgi_gen chat multiplepython $HOME/models/deepseekcoder-instruct-7b deepseekcoder-instruct-7b
./scripts/run_code_ujb.sh eval chat multiplepython $HOME/models/deepseekcoder-instruct-7b deepseekcoder-instruct-7b
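
Judging from the examples above, run_code_ujb.sh appears to take five positional arguments:

# <mode> is one of: api_gen | local_gen | tgi_gen | eval
./scripts/run_code_ujb.sh <mode> <gen-mode> <bench-name> <model-path> <model-id>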
