
SWE-Synth: Synthesizing Verifiable Bug-Fix Data to Enable Large Language Models in Resolving Real-World Bugs

About

Automated program repair (APR) aims to autonomously fix software bugs, yet its effectiveness is hampered by the lack of diverse, real-world bug datasets essential for model training. Although combining large-scale mining with human effort can yield such datasets, the associated costs limit scalability. To address this, we introduce a novel, scalable synthetic data pipeline that leverages large language models (LLMs) to generate synthetic bugs through targeted LLM-based code rewriting. Our pipeline also synthesizes valuable intermediate repair steps, enriching the training signal toward correct fixes. Using our method, we create SWE-Synth, a large and contextually rich dataset of bug-fix pairs that are natural, scalable, automatically verifiable, and contain intermediate repair steps. Training LLMs on our synthetic dataset yields context-aware repair strategies that achieve repair accuracy on par with models trained on manually curated datasets from GitHub such as SWE-Gym, while delivering superior scalability through effortless bug synthesis, as demonstrated on popular benchmarks (SWE-Bench and BugsInPy).

Source code and data are available at the following links:

Source code: https://github.com/FSoft-AI4Code/SWE-Synth

Data: https://huggingface.co/swesynth

Survey link: https://survey.swesynth.com

Setup

Install

  1. Clone the repository (see the commands below)

  2. Install the dependencies
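A typical way to carry out step 1, using the repository URL listed above (the target directory name is simply git's default):

# step 1: clone the repository and enter its root directory
git clone https://github.com/FSoft-AI4Code/SWE-Synth.git
cd SWE-Synth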

# step 2: create an isolated Python 3.12 environment and install swesynth in editable mode
conda create --name swesynth -y python=3.12
conda activate swesynth
pip install -e .
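To confirm the editable install succeeded, a quick sanity check (a minimal sketch; the top-level package name swesynth is inferred from the repository layout) is to import the package from the new environment:

# should print the package name without raising ImportError
python -c "import swesynth; print(swesynth.__name__)"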

Hardware Requirements

  • Minimum 10 GB RAM; 32 GB or more recommended
  • More than 150 GB of disk space

Software Requirements

  • Linux OS (tested on Ubuntu 20.04)
  • Docker (tested on version 27.5.1)

Usage

For detailed instructions on generating the synthetic dataset, please refer to swesynth/experiments/synthetic_dataset/README.md.

The synthetic dataset used in the paper is released on Hugging Face: https://huggingface.co/swesynth

Statistics of the SWE-Synth dataset can be found in the notebook swesynth/experiments/synthetic_dataset/statistics/data-statistic.ipynb.
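For convenience, the released data can also be fetched directly from the Hugging Face Hub. The sketch below makes an assumption: the dataset id swesynth/SWE-Synth is a placeholder, and the actual repository name under https://huggingface.co/swesynth may differ.

# minimal sketch: download the dataset with the Hugging Face CLI
# NOTE: "swesynth/SWE-Synth" is an assumed dataset id; check https://huggingface.co/swesynth for the actual name
pip install -U huggingface_hub
huggingface-cli download swesynth/SWE-Synth --repo-type dataset --local-dir ./swe-synth-data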

For detailed instructions on running the benchmarks, please refer to swesynth/experiments/benchmarks/README.md.

The benchmarking dataset used in the paper has also been released.

Reproduce experiments

For detailed instructions on the comparison with SWE-Gym, please refer to swesynth/experiments/swegym_comparison/README.md.

This covers the experiments for research questions RQ1, RQ2, and RQ3 in the paper:

  • RQ1. Model performance comparison on synthetic and manual data: How do manual training data and the synthetic training data in SWE-Synth influence model performance when the training data is controlled to have either (a) the same total number of variants or (b) the same total number of trajectories?

  • RQ2. Synthetic Data Scaling: How does increasing the number of synthetic training instances affect model performance?

  • RQ3. Human Study: How well can human subjects distinguish SWE-Synth's results from real-world, manually collected bugs?

Note that readers are encouraged to take our survey (RQ3), available at https://survey.swesynth.com

For detailed instructions on the model size comparison, please refer to swesynth/experiments/model_size_comparison/README.md.

This covers the experiments for research question RQ4 in the paper:

  • RQ4. Model Performance with Different Model Sizes: How does model performance vary across model sizes when fine-tuned on our synthetic training data in SWE-Synth?

For detailed instructions on the data pipeline ablation, please refer to swesynth/experiments/data_pipeline_ablation/README.md.

This covers the experiments for research questions RQ5, RQ6, RQ7, and RQ8 in the paper:

  • RQ5. Impact of component granularity: How do different component granularities affect the trained models' performance?
  • RQ6. Impact of component selection: How do different component selection strategies affect the trained models' performance?
  • RQ7. Impact of model size: How does the size of the model used for component rewriting affect the trained models' performance?
  • RQ8. Ground-truth extraction strategies: How well does the model perform when trained on reverse patch diffs compared to when trained on SWE-Synth with rollouts?
