This repo contains our CLIP-based multi-modal classifiers for the Kaggle 'Predict Geographic Context from Landscape Photographs' challenge on the Geograph dataset. It provides scripts to:
- Download and preprocess training and test sets
- Train MLP and linear classifiers on CLIP image, title, and location embeddings, alone or in combination (a training sketch follows this list)
- Evaluate model performance and generate Kaggle-ready submission files (`.csv.zip`; a submission-writing sketch also follows below)
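As a rough illustration of the embed-then-classify approach, the sketch below extracts CLIP image and title embeddings with Hugging Face `transformers`, concatenates them, and fits a scikit-learn MLP. This is not the repo's exact pipeline: the model checkpoint, file paths, stand-in training data, and single-label setup are all placeholder assumptions.

```python
# A minimal sketch of embed-then-classify, not the repo's exact pipeline.
# Model name, paths, and labels below are illustrative placeholders.
import numpy as np
import torch
from PIL import Image
from sklearn.neural_network import MLPClassifier
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(image_path: str, title: str) -> np.ndarray:
    """Concatenated CLIP image + title embedding for one photograph."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=[title], images=[image],
                       return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        img = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])
    return torch.cat([img, txt], dim=-1).squeeze(0).numpy()

# Stand-in training data: with ViT-B/32 each modality is 512-d, so the
# concatenated feature vector is 1024-d. Replace with real Geograph embeddings.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 1024)).astype(np.float32)
y_train = rng.integers(0, 5, size=200)  # single-label here for simplicity

clf = MLPClassifier(hidden_layer_sizes=(512,), max_iter=100, random_state=0)
clf.fit(X_train, y_train)
```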
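And a hypothetical example of writing a Kaggle-ready `.csv.zip` submission: pandas can write a zipped CSV directly via its `compression` option. The column names here are placeholders; use the schema from the challenge's sample submission.

```python
# Write a zipped CSV in one step; the archive_name sets the inner file name.
import pandas as pd

submission = pd.DataFrame({"id": [101, 102], "tag": ["Canals", "Railways"]})
submission.to_csv(
    "submission.csv.zip",
    index=False,
    compression={"method": "zip", "archive_name": "submission.csv"},
)
```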
🖼️ The paper is published in [Remote Sensing Applications: Society and Environment](https://doi.org/10.1016/j.rsase.2025.101824).

📄 The preprint is available on [arXiv](https://arxiv.org/pdf/2506.12214).

✍️ Authors: Ilya Ilyankou\*, Natchapon Jongwiriyanurak\*, Tao Cheng, and James Haworth

\*Equal contribution
We suggest running the notebooks in a separate virtual environment. Using Miniconda:

```bash
# Navigate to the project folder
cd ClipTheLandscape

# Create a new virtual environment
conda env create -f environment.yml

# Activate that new virtual environment
conda activate clip-the-landscape

# Run Jupyter (will open in your default browser) or use VSCode instead
jupyter lab
```
This section illustrates the subjectivity of labelling; our model's predicted tags are often as appropriate as (or even more appropriate than) the original annotations. Tags like Canals, Air transport, Railways, and Burial ground, which represent distinct and objective features, achieve high prediction performance.
If you find our work useful, please cite:

```bibtex
@article{clip-the-landscape,
  title   = {CLIP the landscape: Automated tagging of crowdsourced landscape images},
  journal = {Remote Sensing Applications: Society and Environment},
  volume  = {41},
  pages   = {101824},
  year    = {2026},
  issn    = {2352-9385},
  doi     = {10.1016/j.rsase.2025.101824},
  url     = {https://www.sciencedirect.com/science/article/pii/S2352938525003775},
  author  = {Ilya Ilyankou and Natchapon Jongwiriyanurak and Tao Cheng and James Haworth}
}
```
The code is released under the MIT license. The Geograph images are available under the CC BY-SA 2.0 license.