Data science competition hosted by Solafune, focused on training a machine learning model to detect tree canopies in diverse urban environments using aerial and satellite imagery.

In accordance with the Solafune Non-Disclosure Agreement, all raw competition data is confidential. This project cannot be fully reproduced without access to the original training data via the Solafune competition page.
Instructions for downloading the Solafune raw data (in line with their terms and conditions) are provided with this project.
This project is my submission to Solafune’s Tree Canopy Detection data science competition. I built a geospatial pipeline for tree canopy segmentation using Solafune’s annotated Sentinel-2 imagery and a YOLO-based segmentation model. Using the ground-truth labels provided, I trained and validated models that run both locally and in Google Colab.
My most successful training runs used data augmentation (colour grading, image scaling, rotation, etc.), which improved the model’s performance on the competition metric.
Visit the Solafune Tree Canopy Segmentation page to view my current leaderboard position and the competition's guidelines.
Solafune develops satellite and geospatial data analysis technologies by hosting worldwide data science competitions. This particular competition aims to address the problem of inaccurate vegetation segmentation in mixed urban environments. Poor segmentation can hinder infrastructure planning and maintenance, as well as efforts to classify and monitor vegetation health. Solafune’s goal is to encourage innovative machine learning and geospatial AI approaches to improve this process.
My best submission scored 0.338. I used a hold-out validation split to select augmentation strategies and model hyperparameters, iterating on visual augmentation parameters such as rotation, hue and saturation, and image scaling. I also duplicated annotations to give the YOLO-based model more exposure to them during training.
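As an illustration of the hold-out idea only, here is a minimal sketch of a reproducible train/validation split over image IDs. The function name `holdout_split`, the 20% validation fraction, the seed, and the `tile_…` IDs are assumptions made for this example and are not taken from the repository.

```python
import random


def holdout_split(image_ids, val_fraction=0.2, seed=42):
    """Shuffle image IDs reproducibly and split off a validation subset.

    Hypothetical helper: the fraction and seed are illustrative defaults.
    """
    ids = sorted(image_ids)  # sort first so the result does not depend on input order
    rng = random.Random(seed)  # local RNG, leaves the global random state untouched
    rng.shuffle(ids)
    n_val = max(1, round(len(ids) * val_fraction))
    return ids[n_val:], ids[:n_val]  # (train, validation)


# Made-up image IDs, purely for demonstration.
all_ids = [f"tile_{i:03d}" for i in range(50)]
train_ids, val_ids = holdout_split(all_ids)
print(len(train_ids), len(val_ids))
```

Fixing the seed keeps the validation set identical across runs, so score changes can be attributed to the augmentation or hyperparameter change rather than to a different split.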
For evaluation, I tracked precision and recall for both classes (“Individual Trees” and “Group of Trees”) so that I could tune confidence thresholds and IoU settings to optimise the competition metric.
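The sketch below shows one way per-class precision and recall can be computed while sweeping a confidence threshold. It is a simplification under stated assumptions: it matches axis-aligned boxes rather than segmentation polygons, uses greedy one-to-one matching, and all names (`box_iou`, `precision_recall`) and sample values are hypothetical. It does not reproduce the competition's own scoring.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds, truths, conf_thr, iou_thr=0.5):
    """preds: list of (confidence, box); truths: list of boxes, all of ONE class."""
    kept = sorted((p for p in preds if p[0] >= conf_thr),
                  key=lambda p: p[0], reverse=True)
    unmatched = list(truths)
    tp = 0
    for _, box in kept:
        # Greedy matching: take the best remaining ground truth for this prediction.
        best = max(unmatched, key=lambda t: box_iou(box, t), default=None)
        if best is not None and box_iou(box, best) >= iou_thr:
            unmatched.remove(best)  # each ground truth can be matched only once
            tp += 1
    precision = tp / len(kept) if kept else 0.0  # convention: 0 when nothing is predicted
    recall = tp / len(truths) if truths else 0.0
    return precision, recall


# Made-up data for a single class.
truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0.9, (0, 0, 10, 10)), (0.6, (21, 21, 30, 30)), (0.3, (50, 50, 60, 60))]
for thr in (0.25, 0.5, 0.8):
    print(thr, precision_recall(preds, truths, conf_thr=thr))
```

Running the sweep separately per class makes the trade-off visible: a lower threshold keeps more true detections but also more false positives, and the best operating point may differ between the two classes.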
For future iterations and projects, I plan to apply stratified k-fold cross-validation (grouped by source tile to prevent leakage and stratified by class presence) for more reliable performance estimates.
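To make the grouping and stratification concrete, here is a small dependency-free sketch that assigns whole tiles to folds while balancing how many minority-class images each fold receives. It is a simplified greedy heuristic, not the planned implementation. The function name, the `(image_id, tile_id, has_minority)` input format, and the sample data are assumptions. scikit-learn's `StratifiedGroupKFold` offers a maintained implementation of the same idea.

```python
from collections import defaultdict


def grouped_stratified_folds(samples, n_folds=5):
    """samples: list of (image_id, tile_id, has_minority_class).

    Returns {image_id: fold_index}. All images from one tile share a fold,
    which prevents the same tile appearing in both training and validation.
    """
    by_tile = defaultdict(list)
    for image_id, tile_id, has_minority in samples:
        by_tile[tile_id].append((image_id, has_minority))

    def n_pos(tile):
        return sum(1 for _, m in by_tile[tile] if m)

    # Place tiles with the most minority-class images first, then larger tiles.
    tiles = sorted(by_tile, key=lambda t: (-n_pos(t), -len(by_tile[t]), str(t)))
    pos = [0] * n_folds   # minority-class images per fold
    size = [0] * n_folds  # total images per fold
    assignment = {}
    for tile in tiles:
        # Greedy: choose the fold with the fewest minority images, then fewest images.
        fold = min(range(n_folds), key=lambda f: (pos[f], size[f]))
        for image_id, _ in by_tile[tile]:
            assignment[image_id] = fold
        pos[fold] += n_pos(tile)
        size[fold] += len(by_tile[tile])
    return assignment


# Made-up data: six tiles with two images each; tiles A-C contain the minority class.
samples = [(f"{tile}_{i}", tile, tile in "ABC") for tile in "ABCDEF" for i in range(2)]
folds = grouped_stratified_folds(samples, n_folds=3)
print(folds)
```

Each fold then serves once as the validation set while the remaining folds are used for training.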
To address class imbalance (here, between “Individual Trees” and “Group of Trees”), I’ll explore oversampling and targeted augmentation of the minority class, and tune class-specific confidence/IoU thresholds.
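One simple form of oversampling is to repeat images that contain the rarer class in the training list. The sketch below illustrates that idea only. The function name, the `max_repeat` cap, the class-name strings, and the sample data are all hypothetical, and which class is actually the minority in the competition data is not assumed here.

```python
from collections import Counter


def oversample_minority(image_classes, max_repeat=4):
    """image_classes: {image_id: set of class names present in that image}.

    Returns (minority_class, repeat_factor, training_list), where images that
    contain the rarest class appear repeat_factor times in the training list.
    """
    # Count, per class, how many images contain it.
    counts = Counter(c for classes in image_classes.values() for c in classes)
    minority = min(counts, key=counts.get)
    ratio = max(counts.values()) / counts[minority]
    repeat = min(max_repeat, max(1, round(ratio)))  # cap to limit overfitting on duplicates
    train_list = []
    for image_id in sorted(image_classes):
        n = repeat if minority in image_classes[image_id] else 1
        train_list.extend([image_id] * n)
    return minority, repeat, train_list


# Made-up example: one of four images contains the rarer class.
image_classes = {
    "a": {"individual_tree"},
    "b": {"individual_tree"},
    "c": {"individual_tree", "group_of_trees"},
    "d": {"individual_tree"},
}
minority, repeat, train_list = oversample_minority(image_classes)
print(minority, repeat, train_list)
```

Because repeated images are exact copies, this works best when combined with augmentation so that the model does not see identical pixels each time.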
For those interested in implementation details, the GitHub repo includes: