This is a list of useful libraries for developing new "Big Code" tools.
Add your library by creating a pull request here.
codemining-*
codemining-* is a suite of Java-based tools for tokenizing and parsing Java code. The repository also contains code to analyze Git-based repositories.
- codemining-core contains code for tokenizing Java, JavaScript, Python, C and C++ on the JVM.
- codemining-treelm contains Java AST parsing and tree-level language models.
- commitmining-tools contains tools for traversing a Git repository, its history and possibly its files.
Tags: #codeanalysis
bigcode-tools
bigcode-tools is a suite of tools to fetch, parse and process source code. It also contains a utility to generate vector embeddings from source code. It currently supports Python 2 and 3, Java and JavaScript. The tools are designed to be compatible with the py150 and js150 datasets.
Tags: #codeanalysis #embeddings
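
To give a rough idea of what the "parse" step in tools like these produces, here is a minimal sketch using only Python's standard `ast` module. It does not use bigcode-tools or its API; the helper name `node_types` and the sample snippet are made up for illustration.

```python
import ast
from typing import List


def node_types(source: str) -> List[str]:
    """Parse Python source and return the type name of every AST node.

    Hypothetical helper for illustration only; nodes are listed in
    whatever order ast.walk yields them.
    """
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


# A tiny sample program to parse (illustrative input, not from any dataset).
snippet = "def add(a, b):\n    return a + b\n"
types = node_types(snippet)
print(types)
```

A sequence of node types like this is one simple representation that downstream steps (for example, learning embeddings) could consume; real tools typically keep the full tree structure as well.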