rankr
rankr is a platform for aggregating the results of different academic rankings.
rankr crawls university ranking tables and stores the results in .csv files, after some pre-processing. At the moment, it supports QS, Shanghai (ARWU), and Times Higher Education (THE) for both their world university rankings and subject rankings.
To match academic institutions across ranking systems (to aggregate their results), it is necessary to have an identification system that assigns unique IDs to institutions. After some research, the GRID ID system was chosen. It provides a free database of about 100,000 research institutions; for example, the GRID ID for Sharif University of Technology is grid.412553.4.
rankr first stores the entire GRID dataset (release 2020-06-29) in a database (using SQLAlchemy), then iterates through each crawled ranking table and tries to match institutions with their respective GRID counterparts.
To achieve this, several matching methods are employed, based on:
- Institution profile URL in each ranking table
- Institution name & country
- Fuzzy matching of institutions
- Manual matching
It should be noted that the metadata from the GRID database is preferred in case of any discrepancy. For example, if a ranking website lists an institution under Country A, but the GRID database records it under Country B, the latter will be selected.
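The matching steps above can be illustrated with a minimal sketch. This is not rankr's actual implementation: the function names, the record layout, the second (made-up) GRID record, and the similarity threshold are all assumptions, and the fuzzy step uses the standard library's `difflib` rather than whatever library rankr relies on. It tries an exact name and country match first, falls back to fuzzy name matching, leaves anything else for manual matching, and lets GRID metadata win on discrepancies.

```python
from difflib import SequenceMatcher

# Hypothetical, heavily simplified GRID records (the real tables have more fields).
GRID_RECORDS = [
    {"grid_id": "grid.412553.4", "name": "Sharif University of Technology", "country": "Iran"},
    {"grid_id": "grid.example.1", "name": "Example University", "country": "Country B"},  # made up
]


def normalize(text):
    """Lowercase and collapse whitespace so comparisons ignore trivial differences."""
    return " ".join(text.lower().split())


def match_institution(name, country, grid_records, threshold=0.9):
    """Return (record, method), or (None, None) if the row needs manual matching."""
    target = normalize(name)
    # Step 1: exact (normalized) name and identical country.
    for rec in grid_records:
        if normalize(rec["name"]) == target and rec["country"] == country:
            return rec, "name_country"
    # Step 2: fuzzy name matching; the threshold is an arbitrary assumption.
    best, best_score = None, 0.0
    for rec in grid_records:
        score = SequenceMatcher(None, target, normalize(rec["name"])).ratio()
        if score > best_score:
            best, best_score = rec, score
    if best is not None and best_score >= threshold:
        return best, "fuzzy_name"
    return None, None


def resolve(row, grid_records):
    """Attach a GRID ID to a ranking row, preferring GRID metadata on conflicts."""
    rec, method = match_institution(row["name"], row["country"], grid_records)
    if rec is None:
        return None  # left for manual matching
    return {**row, "grid_id": rec["grid_id"], "name": rec["name"],
            "country": rec["country"], "method": method}
```

For instance, `resolve({"name": "Sharif University of Technology", "country": "Country A"}, GRID_RECORDS)` returns a row whose country is replaced by the one recorded in GRID.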
- Clone the repo:
  git clone https://github.com/pmsoltani/rankr.git
- Switch to the repo directory:
  cd rankr
- Make sure Docker is running.
- Create a .env file in the root directory (a sample is given further below).
- Create a data directory:
  mkdir backend/data
- Download the GRID database (from the link above) and extract it inside the new data directory.
- Start the application:
  docker-compose up -d
- Crawl the ranking tables:
  docker-compose exec backend rankr crawl rankings
- Initialize the database for the first time:
  docker-compose exec backend rankr db reset --confirm
- And you're done! Visit the following URL in your browser: http://0.0.0.0:8000
Please note that the project is still in a pre-alpha stage and is not ready for production. As of now, the instructions above will fail because of a small bug in the GRID database (release 2020-06-29): line 62833 of the file addresses.csv inside the grid/full_tables directory has an extra space character at the end of the country name: "Bonaire, Saint Eustatius and Saba ". Hopefully, this will be corrected in the database's next release. Until then, the file must be corrected manually (see the TODO section).
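One way to apply the manual correction is a small script that strips stray whitespace from every field, so it does not depend on the exact line number or column name. This is only a sketch: the full path below is an assumption pieced together from the directories named in this README, and the script is not part of rankr. Note that rewriting the file through the `csv` module may also normalize quoting and line endings.

```python
import csv
import io
from pathlib import Path


def strip_csv_fields(src, dst):
    """Copy CSV rows from src to dst, stripping leading/trailing whitespace
    from every field. Returns the number of rows that were changed."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    changed = 0
    for row in reader:
        cleaned = [field.strip() for field in row]
        if cleaned != row:
            changed += 1
        writer.writerow(cleaned)
    return changed


if __name__ == "__main__":
    # Assumed location of the extracted GRID file; adjust to your setup.
    path = Path("backend/data/grid/full_tables/addresses.csv")
    if path.exists():
        # Read everything first so the file can be safely rewritten in place.
        with path.open(newline="", encoding="utf-8") as f:
            original = io.StringIO(f.read())
        with path.open("w", newline="", encoding="utf-8") as f:
            count = strip_csv_fields(original, f)
        print(f"Rows corrected: {count}")
```

Keeping a backup copy of addresses.csv before running it is a sensible precaution.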
For obvious security reasons, the project's environment variables file (the .env file) is not included in the repo, but here is a short, working version of it:
COMPOSE_PROJECT_NAME=rankr
INSTALL_PATH=/home/rankr
APP_ENV=development
POETRY_VERSION=1.0.10
APP_NAME=rankr
APP_HOST=0.0.0.0
APP_PORT=8000
USER_AGENT=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36
DIALECT=postgresql
POSTGRESQL_DRIVER=psycopg2
# database host is the docker compose service name
POSTGRESQL_HOST=postgres
POSTGRESQL_PORT=5432
POSTGRESQL_NAME=rankr
POSTGRESQL_USER=rankr
POSTGRESQL_PASS=postgres_super_secret_password
ADMINER_HOST=0.0.0.0
ADMINER_PORT=5050
ADMINER_DRIVER=pgsql
ADMINER_SERVER=postgres
ADMINER_DB=rankr
ADMINER_USERNAME=rankr
ADMINER_PASSWORD=postgres_super_secret_password
The stack currently has three containers:
- postgres, used to store the data persistently.
- adminer (optional), serves as a GUI for managing the database. It can be removed by commenting out or deleting the adminer section of the docker-compose.yml file.
- backend, hosts the API server.
- Create a web server and an API.
- Design and develop a dashboard.
- Finish the documentation.
- Make countries.csv a public file.
- Update the config.py module to reflect the recent changes (i.e., the new rankr CLI).
- Add more functionalities to the CLI (e.g., starting the webserver and running the tests).
- Dockerize the app.
- Add subject ranking tables
All contributions (suggestions, PRs, ...) are welcome.