Solafune Tree Canopy Detection Capstone
Project Overview
This project builds a geospatial ML pipeline that detects tree canopies in Sentinel-2 imagery via image segmentation. For the Solafune Tree Canopy Detection challenge, I imported and converted segmentation labels, trained a segmentation model, and produced a competition-ready submission.
This pipeline runs on both local environments and Google Colab with minimal path changes, enabling access to GPU acceleration for faster training.
Motivation
Accurate tree canopy mapping supports urban planning, biodiversity conservation, and climate modeling. Participating in this challenge built strong skills in handling GIS data and in geospatial machine learning, while producing a practical, real-world output.
This project was also my first application of YOLO-based image segmentation and geospatial data processing, providing valuable experience working with Earth Observation data formats.
Results
My current submission model, as of September 24th 2025, places in the top 10 of 271 competitors on the Solafune leaderboard, against the assessment criterion of >75% mean IoU on the prediction dataset.
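Mean IoU, the assessment metric above, averages the intersection-over-union of predicted and ground-truth masks across instances. A minimal illustration, with masks represented as sets of pixel coordinates (a simplification of how the competition scores polygon predictions):

```python
def mask_iou(pred, truth):
    """Intersection-over-union of two binary masks,
    each given as a set of (row, col) pixel coordinates."""
    union = len(pred | truth)
    return len(pred & truth) / union if union else 0.0

# Two 2-pixel masks overlapping in one pixel: IoU = 1 / 3
iou = mask_iou({(0, 0), (0, 1)}, {(0, 1), (0, 2)})
```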
Lessons Learned
- JSON label formatting and format conversion (COCO ↔ YOLO)
- Consistent directory structures for large-scale ML projects across local and cloud (Google Drive) environments
- Benefits of augmentation techniques such as HSV hue rotation and color grading
- Visual data formatting
- Convolutions and convolutional neural networks
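The COCO ↔ YOLO label conversion above amounts to flattening each polygon and normalising coordinates by image size. A minimal sketch of the COCO → YOLO direction (function and field names are illustrative, not the repo's actual converter):

```python
def coco_polygon_to_yolo(segmentation, img_w, img_h, class_id=0):
    """Convert one COCO-style polygon [x1, y1, x2, y2, ...] (pixels)
    into a YOLO segmentation label line with coordinates in [0, 1]."""
    coords = []
    for i in range(0, len(segmentation), 2):
        coords.append(segmentation[i] / img_w)      # normalise x
        coords.append(segmentation[i + 1] / img_h)  # normalise y
    return " ".join([str(class_id)] + [f"{c:.6f}" for c in coords])

# Example: a triangle in a 1024x1024 tile
line = coco_polygon_to_yolo([512, 0, 1024, 1024, 0, 1024], 1024, 1024)
# → "0 0.500000 0.000000 1.000000 1.000000 0.000000 1.000000"
```

The reverse (YOLO → COCO) multiplies each coordinate back out by the image dimensions.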
Tools
- YOLO
  - Machine learning model used for image segmentation
- Python 3.10
  - Base language for building the pipeline and running scripts
- PyTorch
  - Deep learning framework used to train segmentation models
- Google Colab
  - Alternative notebook environment for GPU access
- JSON / COCO JSON
  - Retrieval and submission of labels
- YAML
  - Organised model parameter files
- GIS Toolkits
Installation
git clone https://github.com/Mitch-P-Analyst/solafune-canopy-capstone.git
cd solafune-canopy-capstone
pip install -r requirements.txt
Repo Directory Structure
├── configurations/ # YAML configs (data + overrides)
│ ├── model_data-seg.yaml # dataset paths & class names
│ ├── train_model_overrides.yaml # training parameters
│ ├── val_model_overrides.yaml # validation parameters
│ └── predict_model_overrides.yaml # prediction parameters
│
├── data/ # Downloaded satellite imagery and mosaics
│ ├── processed/
│ │ ├── images/
│ │ │ ├── predict/ # Unlabeled (no Ground Truth) data for prediction
│ │ │ ├── train/ # Ground Truth Data Split for model training
│ │ │ ├── val/ # Ground Truth Data Split for model validation
│ │ │ └── test/ # Ground Truth Data Split for model testing
│ │ ├── labels/
│ │ │ ├── train/ # Ground Truth Labels Split for model training
│ │ │ ├── val/ # Ground Truth Labels Split for model validation
│ │ │ └── test/ # Ground Truth Labels Split for model testing
│ │ └── JSONs/ # Converted JSON files
│ ├── raw/
│ │ ├── zips/ # Raw Data ZIP files | **Restricted by NDA**
│ │ └── JSONs/
│ └── temp/
│
├── notebooks/
│ ├── 01_data_preparation.ipynb # Convert JSONs, Unzip, Split Data
│ └── 04_test_model_evaluations.ipynb # **Optional** In-depth model evaluations
│
├── scripts/
│ ├── 02_train_model.py # Train YOLO Model
│ ├── 03_test_model.py # Test/Validate YOLO Model on GT Data
│ ├── 05_predict_model.py # Create predictions with trained YOLO Model on no GT Data
│ └── 06_export_submission.py # Convert prediction outputs into Solafune JSON format
│
├── runs/segments/ # All model training/validation/prediction results
├── exports/ # JSON Submission files
├── README.md # This file
└── requirements.txt # Package requirements
Process
Data
- Data not shareable under the Solafune Non-Disclosure Agreement.
Run Order
- 01_data_preparation.ipynb
↓
- 02_train_model.py
↓
- 03_test_model.py
↓
- (Optional) 04_test_model_evaluations.ipynb
↓
- 05_predict_model.py
↓
- 06_export_submission.py
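The final step, 06_export_submission.py, converts YOLO-format predictions back into a submission JSON. A hedged sketch of that round trip: the output schema below is a placeholder, since Solafune's actual submission format is defined by the competition, and the helper names are illustrative.

```python
import json

def yolo_line_to_polygon(line, img_w, img_h):
    """Convert one YOLO segmentation line ('cls x1 y1 x2 y2 ...',
    normalised to [0, 1]) back to a pixel-space polygon."""
    parts = line.split()
    cls = int(parts[0])
    coords = [float(v) for v in parts[1:]]
    # Even indices are x (scale by width), odd indices are y (by height)
    poly = [round(c * (img_w if i % 2 == 0 else img_h), 2)
            for i, c in enumerate(coords)]
    return cls, poly

def export_submission(predictions, img_w, img_h, path):
    """predictions: {image filename: [YOLO label lines]}.
    Writes a JSON file; the schema here is illustrative only and
    must be replaced by Solafune's required submission format."""
    images = []
    for fname, lines in predictions.items():
        polys = [yolo_line_to_polygon(l, img_w, img_h)[1] for l in lines]
        images.append({"file_name": fname, "polygons": polys})
    with open(path, "w") as f:
        json.dump({"images": images}, f, indent=2)
```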
Files
Optional Google Colab
Relative pathways and a dedicated Google Colab .ipynb notebook let the pipeline utilise Google Colab GPUs.
To take advantage of Google Colab's free or paid GPUs, follow the Google Drive directory structure below and use the Google Colab Jupyter notebook.
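With Drive mounted in Colab (`from google.colab import drive; drive.mount('/content/drive')`), a small runtime check keeps the same scripts working in both environments. A sketch assuming the Drive layout below and that local runs start from the repository root:

```python
from pathlib import Path

try:
    import google.colab  # noqa: F401  # only importable inside Colab
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

# The Drive path mirrors the required directory structure in this README;
# locally, paths are relative to the repository root.
BASE = (Path("/content/drive/MyDrive/Datasets/solafune-tree-canopy")
        if IN_COLAB else Path("."))
DATA_DIR = BASE / "data" / "processed"
RUNS_DIR = BASE / "runs" / "segments"
```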
Required Google Drive Directory Structure
├── drive/ # Drive Base
│ ├── MyDrive/ # Google Drive
│ │ ├── Datasets/ # Host of all Datasets
│ │ │ ├── solafune-tree-canopy/
│ │ │ │ ├── data/ # Data Folder Structure
│ │ │ │ │ ├── processed/
│ │ │ │ │ │ ├── labels/
│ │ │ │ │ │ │ ├── train/
│ │ │ │ │ │ │ ├── test/
│ │ │ │ │ │ │ └── val/
│ │ │ │ │ │ └── images/
│ │ │ │ │ │ ├── train/
│ │ │ │ │ │ ├── test/
│ │ │ │ │ │ └── val/
│ │ │ │ │ │
│ │ │ │ │ ├── temp/
│ │ │ │ │ └── raw/
│ │ │ │ │ ├── zips/
│ │ │ │ │ └── JSONs/
│ │ │ │ │
│ │ │ │ ├── runs/
│ │ │ │ │ ├── segments/ # All model training/validation/prediction results
│ │ │ │ │ └── notebooks/
│ │ │ │ │
│ │ │ │ └── outputs/
│ │ │ │ └── exports/
License
MIT License