CodeOntology
CodeOntology is a building block of the Web of Code, an attempt to leverage code in a semantic framework.
Our framework is composed of three components:
The Ontology
The ontology is designed to model the domain of object-oriented programming languages. It is written in OWL 2 and is mainly focused on the Java programming language, but it can easily be reused to represent other languages. The modelling process behind the ontology was guided by common competency questions that typically arise during the software development process, and was inspired by a re-engineering of the Java abstract syntax. The ontology is available on Zenodo under the CC BY 4.0 license.
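To make the modelling idea concrete, the following Python sketch emits a tiny OWL-style schema fragment as N-Triples: a few classes arranged in an rdfs:subClassOf hierarchy, plus object properties with domains and ranges. It is illustrative only. The namespace http://example.org/woc/ and every term name (Type, Executable, hasMethod, extends, and so on) are hypothetical placeholders, not the published CodeOntology vocabulary.

```python
# Minimal sketch of an OWL-style vocabulary for object-oriented code.
# ASSUMPTION: WOC and all term names are hypothetical placeholders,
# not the published CodeOntology IRIs.
from typing import Dict, List, Optional, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
WOC = "http://example.org/woc/"  # hypothetical namespace

# Direct superclass of each class (hypothetical terms).
SUBCLASS_OF: Dict[str, str] = {
    "Class": "Type",
    "Interface": "Type",
    "Method": "Executable",
    "Constructor": "Executable",
}

# Object properties: name -> (domain, range) (hypothetical terms).
PROPERTIES: Dict[str, Tuple[str, str]] = {
    "hasMethod": ("Type", "Method"),
    "hasConstructor": ("Class", "Constructor"),
    "extends": ("Class", "Class"),
}


def triple(s: str, p: str, o: str) -> str:
    """Format one N-Triples statement from three absolute IRIs."""
    return f"<{s}> <{p}> <{o}> ."


def schema_triples() -> List[str]:
    """Serialize the vocabulary above as N-Triples lines."""
    out = []
    classes = sorted(set(SUBCLASS_OF) | set(SUBCLASS_OF.values()))
    for c in classes:
        out.append(triple(WOC + c, RDF + "type", OWL + "Class"))
    for sub, sup in sorted(SUBCLASS_OF.items()):
        out.append(triple(WOC + sub, RDFS + "subClassOf", WOC + sup))
    for prop, (dom, rng) in sorted(PROPERTIES.items()):
        out.append(triple(WOC + prop, RDF + "type", OWL + "ObjectProperty"))
        out.append(triple(WOC + prop, RDFS + "domain", WOC + dom))
        out.append(triple(WOC + prop, RDFS + "range", WOC + rng))
    return out


def is_subclass(sub: str, sup: str) -> bool:
    """Reflexive, transitive rdfs:subClassOf check over SUBCLASS_OF."""
    current: Optional[str] = sub
    while current is not None:
        if current == sup:
            return True
        current = SUBCLASS_OF.get(current)
    return False
```

A competency question such as "is a constructor a kind of executable member?" then reduces to is_subclass("Constructor", "Executable") over this hypothetical hierarchy.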
The Parser
The parser analyzes Java code to serialize it into RDF triples. It is able to extract structural information common to all object-oriented programming languages, such as class hierarchies, methods, and constructors. Optionally, it can also serialize all statements and expressions into RDF triples, thereby providing a complete RDF-ization of the source code.
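As a rough illustration of structural extraction, the sketch below turns an already-parsed Java class into N-Triples describing its superclass, methods, and constructors. It assumes a hypothetical in-memory JavaClass model (parsing Java itself is out of scope) and hypothetical namespaces and property names. Statement- and expression-level serialization is not shown.

```python
# Minimal sketch: serialize structural facts about one Java class as N-Triples.
# ASSUMPTION: JavaClass, both namespaces and all property names are
# hypothetical; parsing the Java source itself is out of scope.
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import quote

WOC = "http://example.org/woc/"    # hypothetical vocabulary namespace
CODE = "http://example.org/code/"  # hypothetical namespace for code entities
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


@dataclass
class JavaClass:
    name: str                          # fully qualified, e.g. "com.acme.Dog"
    superclass: Optional[str] = None   # fully qualified superclass name
    methods: List[str] = field(default_factory=list)       # e.g. "bark(int)"
    constructors: List[str] = field(default_factory=list)  # e.g. "(java.lang.String)"


def entity(name: str) -> str:
    """IRI for a code entity; spaces and other unsafe characters are percent-encoded."""
    return f"<{CODE}{quote(name, safe='().,$')}>"


def term(local: str) -> str:
    """IRI for a vocabulary term."""
    return f"<{WOC}{local}>"


def serialize(cls: JavaClass) -> List[str]:
    """Emit type, hierarchy, method and constructor triples for one class."""
    subj = entity(cls.name)
    out = [f"{subj} {RDF_TYPE} {term('Class')} ."]
    if cls.superclass:
        out.append(f"{subj} {term('extends')} {entity(cls.superclass)} .")
    for sig in cls.methods:
        m = entity(f"{cls.name}.{sig}")
        out.append(f"{m} {RDF_TYPE} {term('Method')} .")
        out.append(f"{subj} {term('hasMethod')} {m} .")
    for params in cls.constructors:
        c = entity(f"{cls.name}{params}")
        out.append(f"{c} {RDF_TYPE} {term('Constructor')} .")
        out.append(f"{subj} {term('hasConstructor')} {c} .")
    return out
```

For example, serialize(JavaClass("com.acme.Dog", "com.acme.Animal", ["bark()"], ["(java.lang.String)"])) yields six triples. Method signatures are folded into the entity IRI so that overloads receive distinct identifiers. That is a design choice of this sketch.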
CodeOntology currently supports both Maven and Gradle projects natively.
The RDF serialization of a Java project proceeds in three steps: first, the project is analyzed to download all of its dependencies and load them into the class path; then, an abstract syntax tree of the source code and its dependencies is built; finally, the tree is processed to extract a set of RDF triples.
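The three steps can be sketched as a small Python pipeline. Only build-system detection is concrete. It looks for pom.xml (Maven) or build.gradle / build.gradle.kts (Gradle), the conventional build files of those tools. Dependency resolution, AST construction, and triple extraction are passed in as hypothetical hooks, since their real implementation lies outside this sketch.

```python
# Minimal sketch of the three-step RDF-ization pipeline.
# ASSUMPTION: the three hook functions are hypothetical stand-ins for the
# real dependency resolution, AST building and triple extraction.
import os
from typing import Callable, Iterable, List


def detect_build_system(project_dir: str) -> str:
    """Tell Maven and Gradle projects apart by their build files."""
    if os.path.isfile(os.path.join(project_dir, "pom.xml")):
        return "maven"
    for name in ("build.gradle", "build.gradle.kts"):
        if os.path.isfile(os.path.join(project_dir, name)):
            return "gradle"
    raise ValueError(f"no Maven or Gradle build file in {project_dir!r}")


def rdfize_project(
    project_dir: str,
    resolve_classpath: Callable[[str, str], List[str]],  # hypothetical hook
    build_ast: Callable[[str, List[str]], object],       # hypothetical hook
    extract_triples: Callable[[object], Iterable[str]],  # hypothetical hook
) -> List[str]:
    build = detect_build_system(project_dir)
    # Step 1: download the dependencies and put them on the class path.
    classpath = resolve_classpath(project_dir, build)
    # Step 2: build an AST of the sources, resolving types against the class path.
    ast = build_ast(project_dir, classpath)
    # Step 3: walk the AST and emit RDF triples.
    return list(extract_triples(ast))
```

The detector checks pom.xml first, so a directory containing both kinds of build file is treated as Maven. This precedence is an arbitrary choice of the sketch.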
The parser, along with a tutorial on how to use it to extract a knowledge base from any Java project, is available on GitHub.
Datasets
We are currently applying the parser to analyze repositories from GitHub, retrieved automatically through the GitHub API. We have also applied the parser to extract RDF triples from the OpenJDK 8 source code. The resulting dataset is available for download on Zenodo and can be queried through our remote SPARQL endpoint.
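A SPARQL endpoint can be queried over plain HTTP. The SPARQL 1.1 Protocol allows a query to be sent as the URL-encoded query parameter of a GET request. The sketch below only builds such a request with the Python standard library and does not send it. The endpoint URL and the woc: vocabulary are hypothetical placeholders, to be replaced with the actual endpoint address and CodeOntology IRIs.

```python
# Minimal sketch: build a SPARQL 1.1 Protocol "query via GET" request.
# ASSUMPTION: ENDPOINT and the woc: prefix are hypothetical placeholders.
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://sparql.example.org/sparql"  # hypothetical endpoint URL

QUERY = """PREFIX woc: <http://example.org/woc/>
SELECT ?cls ?method WHERE {
  ?cls woc:hasMethod ?method .
}
LIMIT 10"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Encode the query as the 'query' URL parameter and ask for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(
        url,
        headers={"Accept": "application/sparql-results+json"},
        method="GET",
    )
```

If the endpoint supports the SPARQL JSON results format, the request could be sent with urllib.request.urlopen and the response body decoded with json.load.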
People
Who made it possible
Maurizio Atzori
Maurizio Atzori was born in 1978 in Italy. He graduated summa cum laude in Computer Science (Informatica) from the University of Pisa in 2002, and obtained his PhD in Computer Science from the School for Graduate Studies "Galileo Galilei", University of Pisa, in 2006. He has been a member of ISTI-CNR, holding a research fellowship from CNR of Pisa, and a member of the Knowledge Discovery and Delivery Laboratory. He has been a visiting scholar at Purdue University (Indiana, USA), working with Prof. Christopher W. Clifton and his research team, and a visiting researcher at Sabanci University (Istanbul, Turkey), working with Prof. Yucel Saygin, and at UCLA, collaborating with Prof. Carlo Zaniolo. Since December 2010, he has been an Assistant Professor (Ricercatore Universitario, Professore Aggregato) at the Department of Mathematics and Computer Science of the University of Cagliari (Italy). His major research interests include databases and dataspaces, data mining, knowledge graphs, and privacy-preserving algorithms for data management.
Mattia Atzeni
Mattia Atzeni received his Bachelor of Science in Computer Science summa cum laude from the University of Cagliari in 2016, with a thesis entitled "CodeOntology: RDF-ization of Source Code". His research currently focuses on the design of ontology-based approaches to modelling software architectures and programming languages. He is studying toward his Master of Science in Computer Science at the University of Cagliari, and his main research interests include Semantic Technologies, the Semantic Web, Data Mining, Sentiment Analysis, and Knowledge Representation.
Mattia Setzu
Mattia Setzu received his Bachelor of Science in Computer Science from the University of Cagliari in 2016. He is currently studying toward his Master of Science in Computer Science at the University of Pisa. He developed an initial version of CodeOntology and is a Semantic Web enthusiast.