Just as vast amounts of data on the web enabled Big Data applications, large repositories of programs (e.g. open source code on GitHub) now enable a new class of applications that leverage these repositories of "Big Code". Using "Big Code" means automatically learning from existing code in order to solve tasks such as predicting program bugs, predicting program behavior, predicting identifier names, or automatically creating new code. The topic spans interdisciplinary research in Machine Learning (ML), Programming Languages (PL) and Software Engineering (SE). This website lists some of the state-of-the-art techniques in the area.
Have a look at the current challenges to be solved by "Big Code".
Download or try amazing tools that leverage "Big Code".
Do research and download some of the existing datasets to compare your solution to the state of the art.
If you would like to contribute, fork this repository, make your edits, and create a pull request on GitHub.