πŸ€–βœ¨ Awesome Repository-Level Code Generation βœ¨πŸ€–

🌟 A curated list of awesome repository-level code generation research papers and resources. If you want to contribute to this list (please do), feel free to send me a pull request. πŸš€ If you have any further questions, feel free to contact Yuling Shi or Xiaodong Gu (SJTU).

πŸ“š Contents

  • πŸ’₯ Repo-Level Issue Resolution
  • πŸ€– Repo-Level Code Completion
  • πŸ”„ Repo-Level Code Translation
  • πŸ§ͺ Repo-Level Unit Test Generation
  • πŸ” Repo-Level Code QA
  • πŸ‘©β€πŸ’» Repo-Level Issue Task Synthesis
  • πŸ“Š Datasets and Benchmarks

πŸ’₯ Repo-Level Issue Resolution

  • SWE-Exp: Experience-Driven Software Issue Resolution [2025-07-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution [2025-07-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • LIVE-SWE-AGENT: Can Software Engineering Agents Self-Evolve on the Fly? [2025-11-arXiv] [πŸ“„ paper]

  • Understanding Code Agent Behaviour: An Empirical Study of Success and Failure Trajectories [2025-10-arXiv] [πŸ“„ paper]

  • BugPilot: Complex Bug Generation for Efficient Learning of SWE Skills [2025-10-arXiv] [πŸ“„ paper]

  • Where LLM Agents Fail and How They can Learn From Failures [2025-09-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • SWE-Effi: Re-Evaluating Software AI Agent System Effectiveness Under Resource Constraints [2025-09-arXiv] [πŸ“„ paper]

  • Diffusion is a code repair operator and generator [2025-08-arXiv] [πŸ“„ paper]

  • The SWE-Bench Illusion: When State-of-the-Art LLMs Remember Instead of Reason [2025-06-arXiv] [πŸ“„ paper]

  • Agent-RLVR: Training Software Engineering Agents via Guidance and Environment Rewards [2025-06-arXiv] [πŸ“„ paper]

  • EXPEREPAIR: Dual-Memory Enhanced LLM-based Repository-Level Program Repair [2025-06-arXiv] [πŸ“„ paper]

  • Coding Agents with Multimodal Browsing are Generalist Problem Solvers [2025-06-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • CoRet: Improved Retriever for Code Editing [2025-05-arXiv] [πŸ“„ paper]

  • Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents [2025-05-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development [2025-05-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • Putting It All into Context: Simplifying Agents with LCLMs [2025-05-arXiv] [πŸ“„ paper]

  • SkyRL-v0: Train Real-World Long-Horizon Agents via Reinforcement Learning [2025-05-arXiv] [πŸ“„ blog] [πŸ”— repo]

  • AEGIS: An Agent-based Framework for General Bug Reproduction from Issue Descriptions [2025-FSE] [πŸ“„ paper]

  • Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [2025-03-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • Enhancing Repository-Level Software Repair via Repository-Aware Knowledge Graphs [2025-03-arXiv] [πŸ“„ paper]

  • CoSIL: Software Issue Localization via LLM-Driven Code Repository Graph Searching [2025-03-arXiv] [πŸ“„ paper]

  • SEAlign: Alignment Training for Software Engineering Agent [2025-03-arXiv] [πŸ“„ paper]

  • DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal [2025-03-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • LocAgent: Graph-Guided LLM Agents for Code Localization [2025-03-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning [2025-02-arXiv] [πŸ“„ paper]

  • SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution [2025-02-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution [2025-01-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • CodeMonkeys: Scaling Test-Time Compute for Software Engineering [2025-01-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • Training Software Engineering Agents and Verifiers with SWE-Gym [2024-12-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • CODEV: Issue Resolving with Visual Data [2024-12-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • LLMs as Continuous Learners: Improving the Reproduction of Defective Code in Software Issues [2024-11-arXiv] [πŸ“„ paper]

  • Globant Code Fixer Agent Whitepaper [2024-11] [πŸ“„ paper]

  • MarsCode Agent: AI-native Automated Bug Fixing [2024-11-arXiv] [πŸ“„ paper]

  • Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement [2024-11-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement [2024-10-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • AutoCodeRover: Autonomous Program Improvement [2024-09-ISSTA] [πŸ“„ paper] [πŸ”— repo]

  • SpecRover: Code Intent Extraction via LLMs [2024-08-arXiv] [πŸ“„ paper]

  • OpenHands: An Open Platform for AI Software Developers as Generalist Agents [2024-07-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • AGENTLESS: Demystifying LLM-based Software Engineering Agents [2024-07-arXiv] [πŸ“„ paper]

  • RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph [2024-07-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • CodeR: Issue Resolving with Multi-Agent and Task Graphs [2024-06-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration [2024-06-arXiv] [πŸ“„ paper]

  • SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering [2024-NeurIPS] [πŸ“„ paper] [πŸ”— repo]
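
Many of the systems above, from AGENTLESS to the agentic SWE-agent family, share a three-stage backbone: localize suspicious files from the issue text, prompt a model for a candidate patch, and validate the patch against the repository's test suite. A minimal sketch of that loop is below; the lexical localizer, the `llm_generate` stub, and the pytest/git commands are illustrative assumptions, not any specific paper's implementation.

```python
import subprocess
from pathlib import Path


def llm_generate(prompt: str) -> str:
    """Placeholder for a model call; plug in your own LLM client here."""
    raise NotImplementedError


def localize(repo: Path, issue: str, top_k: int = 3) -> list[Path]:
    """Rank Python files by naive lexical overlap with the issue text.
    Real systems use embeddings, code graphs, or agentic search instead."""
    terms = set(issue.lower().split())

    def score(f: Path) -> int:
        text = f.read_text(errors="ignore").lower()
        return sum(t in text for t in terms)

    return sorted(repo.rglob("*.py"), key=score, reverse=True)[:top_k]


def propose_patch(issue: str, files: list[Path]) -> str:
    """Prompt the model with the issue plus the suspect files; expect a diff."""
    context = "\n\n".join(f"# {f}\n{f.read_text(errors='ignore')}" for f in files)
    return llm_generate(f"Issue:\n{issue}\n\nFiles:\n{context}\n\nReturn a unified diff.")


def validate(repo: Path, diff: str) -> bool:
    """Apply the candidate diff, run the test suite, roll back on failure."""
    subprocess.run(["git", "apply", "-"], input=diff, text=True, cwd=repo, check=True)
    ok = subprocess.run(["python", "-m", "pytest", "-q"], cwd=repo).returncode == 0
    if not ok:
        subprocess.run(["git", "checkout", "--", "."], cwd=repo, check=True)
    return ok
```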

πŸ€– Repo-Level Code Completion

  • Enhancing Project-Specific Code Completion by Inferring Internal API Information [2025-07-TSE] [πŸ“„ paper] [πŸ”— repo]

  • CodeRAG: Supportive Code Retrieval on Bigraph for Real-World Code Generation [2025-04-arXiv] [πŸ“„ paper]

  • CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases [2025-04-NAACL] [πŸ“„ paper]

  • RTLRepoCoder: Repository-Level RTL Code Completion through the Combination of Fine-Tuning and Retrieval Augmentation [2025-04-arXiv] [πŸ“„ paper]

  • Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs [2025-04-AAAI] [πŸ“„ paper] [πŸ”— repo]

  • What to Retrieve for Effective Retrieval-Augmented Code Generation? An Empirical Study and Beyond [2025-03-arXiv] [πŸ“„ paper]

  • REPOFILTER: Adaptive Retrieval Context Trimming for Repository-Level Code Completion [2025-04-OpenReview] [πŸ“„ paper]

  • Improving FIM Code Completions via Context & Curriculum Based Learning [2024-12-arXiv] [πŸ“„ paper]

  • ContextModule: Improving Code Completion via Repository-level Contextual Information [2024-12-arXiv] [πŸ“„ paper]

  • A^3-CodGen: A Repository-Level Code Generation Framework for Code Reuse With Local-Aware, Global-Aware, and Third-Party-Library-Aware [2024-12-TSE] [πŸ“„ paper]

  • RepoGenReflex: Enhancing Repository-Level Code Completion with Verbal Reinforcement and Retrieval-Augmented Generation [2024-09-arXiv] [πŸ“„ paper]

  • RAMBO: Enhancing RAG-based Repository-Level Method Body Completion [2024-09-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • RLCoder: Reinforcement Learning for Repository-Level Code Completion [2024-07-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • STALL+: Boosting LLM-based Repository-level Code Completion with Static Analysis [2024-06-arXiv] [πŸ“„ paper]

  • GraphCoder: Enhancing Repository-Level Code Completion via Code Context Graph-based Retrieval and Language Model [2024-06-arXiv] [πŸ“„ paper]

  • Enhancing Repository-Level Code Generation with Integrated Contextual Information [2024-06-arXiv] [πŸ“„ paper]

  • R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models [2024-06-arXiv] [πŸ“„ paper]

  • Natural Language to Class-level Code Generation by Iterative Tool-augmented Reasoning over Repository [2024-05-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback [2024-03-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • Repoformer: Selective Retrieval for Repository-Level Code Completion [2024-03-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • RepoHyper: Search-Expand-Refine on Semantic Graphs for Repository-Level Code Completion [2024-03-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • RepoMinCoder: Improving Repository-Level Code Generation Based on Information Loss Screening [2024-07-Internetware] [πŸ“„ paper]

  • CodePlan: Repository-Level Coding using LLMs and Planning [2024-07-FSE] [πŸ“„ paper] [πŸ”— repo]

  • DraCo: Dataflow-Guided Retrieval Augmentation for Repository-Level Code Completion [2024-05-ACL] [πŸ“„ paper] [πŸ”— repo]

  • RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation [2023-10-EMNLP] [πŸ“„ paper] [πŸ”— repo]

  • Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context [2023-09-NeurIPS] [πŸ“„ paper] [πŸ”— repo]

  • RepoFusion: Training Code Models to Understand Your Repository [2023-06-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • Repository-Level Prompt Generation for Large Language Models of Code [2023-06-ICML] [πŸ“„ paper] [πŸ”— repo]

  • Fully Autonomous Programming with Large Language Models [2023-06-GECCO] [πŸ“„ paper] [πŸ”— repo]
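
A recipe that recurs throughout this section (RepoCoder, RLCoder, RAMBO, GraphCoder, and others) is retrieval-augmented completion: index the repository in chunks, retrieve chunks similar to the unfinished code, prepend them to the prompt, and optionally iterate using the previous draft as the next query. A minimal sketch of that iterative loop follows; the fixed-line chunking, the Jaccard retriever, and the `llm_complete` stub are simplifying assumptions rather than any paper's actual method.

```python
def llm_complete(prompt: str) -> str:
    """Placeholder for a code LLM call; substitute your own client."""
    raise NotImplementedError


def chunk_repo(files: dict[str, str], window: int = 20) -> list[str]:
    """Index the repository as fixed-size line windows."""
    chunks = []
    for text in files.values():
        lines = text.splitlines()
        chunks += ["\n".join(lines[i:i + window]) for i in range(0, len(lines), window)]
    return chunks


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity: a cheap sparse retriever."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(len(ta | tb), 1)


def complete(files: dict[str, str], prefix: str, rounds: int = 2, k: int = 4) -> str:
    """Iterative retrieve-then-generate: round one retrieves with the
    unfinished code, later rounds retrieve with the previous draft."""
    index = chunk_repo(files)
    query, draft = prefix, ""
    for _ in range(rounds):
        ctx = sorted(index, key=lambda c: jaccard(query, c), reverse=True)[:k]
        prompt = "\n# ---- retrieved context ----\n".join(ctx) + "\n# ----\n" + prefix
        draft = llm_complete(prompt)
        query = prefix + draft  # next round retrieves with the draft appended
    return draft
```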

πŸ”„ Repo-Level Code Translation

  • A Systematic Literature Review on Neural Code Translation [2025-05-arXiv] [πŸ“„ paper]

  • EVOC2RUST: A Skeleton-guided Framework for Project-Level C-to-Rust Translation [2025-08-arXiv] [πŸ“„ paper]

  • Lost in Translation: A Study of Bugs Introduced by Large Language Models while Translating Code [2024-04-ICSE] [πŸ“„ paper] [πŸ”— repo]

  • Enhancing LLM-based Code Translation in Repository Context via Triple Knowledge-Augmented [2025-03-arXiv] [πŸ“„ paper]

  • C2SaferRust: Transforming C Projects into Safer Rust with NeuroSymbolic Techniques [2025-01-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • Scalable, Validated Code Translation of Entire Projects using Large Language Models [2025-06-PLDI] [πŸ“„ paper]

  • Syzygy: Dual Code-Test C to (safe) Rust Translation using LLMs and Dynamic Analysis [2024-12-arXiv] [πŸ“„ paper] [πŸ•ΈοΈ website]

  • RustRepoTrans: Repository-level Code Translation Benchmark Targeting Rust [2024-11-arXiv] [πŸ“„ paper] [πŸ”— repo]

πŸ§ͺ Repo-Level Unit Test Generation

  • Execution-Feedback Driven Test Generation from SWE Issues [2025-08-arXiv] [πŸ“„ paper]

  • AssertFlip: Reproducing Bugs via Inversion of LLM-Generated Passing Tests [2025-07-arXiv] [πŸ“„ paper]

  • Mystique: Automated Vulnerability Patch Porting with Semantic and Syntactic-Enhanced LLM [2025-06-arXiv] [πŸ“„ paper]

  • Issue2Test: Generating Reproducing Test Cases from Issue Reports [2025-03-arXiv] [πŸ“„ paper]

  • Agentic Bug Reproduction for Effective Automated Program Repair at Google [2025-02-arXiv] [πŸ“„ paper]

  • LLMs as Continuous Learners: Improving the Reproduction of Defective Code in Software Issues [2024-11-arXiv] [πŸ“„ paper]
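
A common validity check in the issue-reproduction papers above (and in SWT-Bench under Datasets and Benchmarks) is the fail-to-pass criterion: a generated test reproduces the bug only if it fails on the buggy tree and passes once a fix is applied. A minimal sketch, assuming a pytest project and a unified diff for the fix:

```python
import subprocess


def reproduces(repo: str, test_file: str, fix_diff: str) -> bool:
    """A generated test reproduces the issue iff it fails on the buggy
    tree and passes once a known fix is applied (fail-to-pass)."""

    def tests_pass() -> bool:
        return subprocess.run(["python", "-m", "pytest", "-q", test_file],
                              cwd=repo).returncode == 0

    fails_before = not tests_pass()
    subprocess.run(["git", "apply", "-"], input=fix_diff, text=True,
                   cwd=repo, check=True)
    passes_after = tests_pass()
    # Roll back the applied fix (simplified: assumes it only edits tracked files).
    subprocess.run(["git", "checkout", "--", "."], cwd=repo, check=True)
    return fails_before and passes_after
```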

πŸ” Repo-Level Code QA

  • SWE-QA: Can Language Models Answer Repository-level Code Questions? [2025-09-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • Decompositional Reasoning for Graph Retrieval with Large Language Models [2025-06-arXiv] [πŸ“„ paper]

  • LongCodeBench: Evaluating Coding LLMs at 1M Context Windows [2025-05-arXiv] [πŸ“„ paper]

  • LocAgent: Graph-Guided LLM Agents for Code Localization [2025-03-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • CoReQA: Uncovering Potentials of Language Models in Code Repository Question Answering [2025-01-arXiv] [πŸ“„ paper]

  • RepoChat Arena [2025-Blog] [πŸ”— repo]

  • RepoChat: An LLM-Powered Chatbot for GitHub Repository Question-Answering [2025-MSR] [πŸ”— repo]

  • CodeQueries: A Dataset of Semantic Queries over Code [2022-09-arXiv] [πŸ“„ paper]

πŸ‘©β€πŸ’» Repo-Level Issue Task Synthesis

πŸ“Š Datasets and Benchmarks

  • SWE-Bench++: A Framework for the Scalable Generation of Software Engineering Benchmarks from Open-Source Repositories [2025-12-arXiv] [πŸ“„ paper]

  • Multi-Docker-Eval: A 'Shovel of the Gold Rush' Benchmark on Automatic Environment Building for Software Engineering? [2025-12-arXiv] [πŸ“„ paper]

  • CodeClash: Benchmarking Goal-Oriented Software Engineering [2025-11-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads? [2025-11-arXiv] [πŸ“„ paper]

  • SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models [2025-11-arXiv] [πŸ“„ paper]

  • SWE-Sharp-Bench: A Reproducible Benchmark for C# Software Engineering Tasks [2025-11-arXiv] [πŸ“„ paper]

  • ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases [2025-10-arXiv] [πŸ“„ paper]

  • SWE-QA: Can Language Models Answer Repository-level Code Questions? [2025-09-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • SR-Eval: Evaluating LLMs on Code Generation under Stepwise Requirement Refinement [2025-10-arXiv] [πŸ“„ paper]

  • RECODE-H: A Benchmark for Research Code Development with Interactive Human Feedback [2025-09-arXiv] [πŸ“„ paper]

  • BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions [2025-ICLR Oral] [πŸ“„ paper] [πŸ”— repo]

  • Vibe Checker: Aligning Code Evaluation with Human Preference [2025-10-arXiv] [πŸ“„ paper]

  • MULocBench: A Benchmark for Localizing Code and Non-Code Issues in Software Projects [2025-09-arXiv] [πŸ“„ paper] [πŸ•ΈοΈ website]

  • SecureAgentBench: Benchmarking Secure Code Generation under Realistic Vulnerability Scenarios [2025-09-arXiv] [πŸ“„ paper]

  • SWE-bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? [2025-09] [πŸ“„ paper] [πŸ”— repo]

  • AutoCodeBench: Large Language Models are Automatic Code Benchmark Generators [2025-08-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • LiveRepoReflection: Turning the Tide: Repository-based Code Reflection [2025-07-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? [2025-07-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code [2025-06-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks [2025-06-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench [2025-ACL] [πŸ“„ paper]

  • SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner [2025-ICML] [πŸ“„ paper] [πŸ”— repo]

  • AgentIssue-Bench: Can Agents Fix Agent Issues? [2025-08-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • CodeAssistBench (CAB): Dataset & Benchmarking for Multi-turn Chat-Based Code Assistance [2025-07-arXiv] [πŸ“„ paper]

  • OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution [2025-05-ISSTA] [πŸ“„ paper] [πŸ”— repo]

  • CodeFlowBench: A Multi-turn, Iterative Benchmark for Complex Code Generation [2025-04-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • SWE-Smith: Scaling Data for Software Engineering Agents [2025-04-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • SWE-Synth: Synthesizing Verifiable Bug-Fix Data to Enable Large Language Models in Resolving Real-World Bugs [2025-04-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • Are "Solved Issues" in SWE-bench Really Solved Correctly? An Empirical Study [2025-03-arXiv] [πŸ“„ paper]

  • Unveiling Pitfalls: Understanding Why AI-driven Code Agents Fail at GitHub Issue Resolution [2025-03-arXiv] [πŸ“„ paper]

  • ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments [2025-02-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • Evaluating Agent-based Program Repair at Google [2025-01-arXiv] [πŸ“„ paper]

  • SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents [2025-05-arXiv] [πŸ“„ paper] [πŸ•ΈοΈ website]

  • SWE-bench-Live: A Live Benchmark for Repository-Level Issue Resolution [2025-05-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation [2025-05-ACL] [πŸ“„ paper] [πŸ”— repo]

  • SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents [2025-04-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving [2025-04-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • LibEvolutionEval: A Benchmark and Study for Version-Specific Code Generation [2025-04-NAACL] [πŸ“„ paper] [πŸ•ΈοΈ website]

  • SWEE-Bench & SWA-Bench: Automated Benchmark Generation for Repository-Level Coding Tasks [2025-03-arXiv] [πŸ“„ paper]

  • ProjectEval: A Benchmark for Programming Agents Automated Evaluation on Project-Level Code Generation [2025-ACL-Findings] [πŸ“„ paper] [πŸ”— repo]

  • REPOST-TRAIN: Scalable Repository-Level Coding Environment Construction with Sandbox Testing [2025-03-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • Loc-Bench: Graph-Guided LLM Agents for Code Localization [2025-03-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? [2025-02-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • SolEval: Benchmarking Large Language Models for Repository-level Solidity Code Generation [2025-02-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • HumanEvo: An Evolution-aware Benchmark for More Realistic Evaluation of Repository-level Code Generation [2025-ICSE] [πŸ“„ paper] [πŸ”— repo]

  • RepoExec: On the Impacts of Contexts on Repository-Level Code Generation [2025-NAACL] [πŸ“„ paper] [πŸ”— repo]

  • SWE-Gym: Training Software Engineering Agents and Verifiers with SWE-Gym [2024-12-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • RepoTransBench: A Real-World Benchmark for Repository-Level Code Translation [2024-12-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • Visual SWE-bench: Issue Resolving with Visual Data [2024-12-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • ExecRepoBench: Multi-level Executable Code Completion Evaluation [2024-12-arXiv] [πŸ“„ paper] [πŸ”— site]

  • REPOCOD: Can Language Models Replace Programmers? REPOCOD Says 'Not Yet' [2024-10-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • M2RC-Eval: Massively Multilingual Repository-level Code Completion Evaluation [2024-10-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • SWE-bench+: Enhanced Coding Benchmark for LLMs [2024-10-arXiv] [πŸ“„ paper]

  • SWE-bench Multimodal: Multimodal Software Engineering Benchmark [2024-10-arXiv] [πŸ“„ paper] [πŸ”— site]

  • Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [2024-10-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents [2024-06-arXiv] [πŸ“„ paper] [πŸ•ΈοΈ website]

  • CodeRAG-Bench: Can Retrieval Augment Code Generation? [2024-06-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • R2C2-Bench: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models [2024-06-arXiv] [πŸ“„ paper]

  • RepoClassBench: Class-Level Code Generation from Natural Language Using Iterative, Tool-Enhanced Reasoning over Repository [2024-05-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • DevEval: Evaluating Code Generation in Practical Software Projects [2024-ACL-Findings] [πŸ“„ paper] [πŸ”— repo]

  • CodeAgentBench: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges [2024-ACL] [πŸ“„ paper]

  • RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems [2024-ICLR] [πŸ“„ paper] [πŸ”— repo]

  • SWE-bench: Can Language Models Resolve Real-World GitHub Issues? [2024-ICLR] [πŸ“„ paper] [πŸ”— repo]

  • CrossCodeLongEval: Repoformer: Selective Retrieval for Repository-Level Code Completion [2024-ICML] [πŸ“„ paper] [πŸ”— repo]

  • R2E-Eval: Turning Any GitHub Repository into a Programming Agent Test Environment [2024-ICML] [πŸ“„ paper] [πŸ”— repo]

  • RepoEval: Repository-Level Code Completion Through Iterative Retrieval and Generation [2023-EMNLP] [πŸ“„ paper] [πŸ”— repo]

  • CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion [2023-NeurIPS] [πŸ“„ paper] [πŸ”— site]

  • Skeleton-Guided-Translation: A Benchmarking Framework for Code Repository Translation with Fine-Grained Quality Evaluation [2025-01-arXiv] [πŸ“„ paper] [πŸ”— repo]

  • SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development [2025-05-arXiv] [πŸ“„ paper] [πŸ”— repo]
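
Most issue-resolution benchmarks in this section follow SWE-bench's instance format: a repository snapshot, an issue description, a gold patch, and tests that flip from failing to passing. A short sketch of inspecting one instance with the Hugging Face `datasets` library; the dataset id and field names below match SWE-bench as published, but check the benchmark's own repo before relying on them.

```python
from datasets import load_dataset

# SWE-bench test split as published on the Hugging Face Hub.
ds = load_dataset("princeton-nlp/SWE-bench", split="test")

inst = ds[0]
print(inst["instance_id"])                # task id derived from repo + PR
print(inst["repo"], inst["base_commit"])  # repository snapshot to check out
print(inst["problem_statement"][:300])    # the GitHub issue text
# "patch" holds the gold fix; "test_patch" adds the tests that must flip to passing.
```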

Star History

[Star History Chart]
