Solafune Tree Canopy Detection Capstone
Project Overview
This project involved building a geospatial ML pipeline that detects tree canopies in Sentinel-2 imagery via image segmentation. For the Solafune Tree Canopy Detection challenge, I managed the import and conversion of segmentation label data, trained a segmentation model, and produced a competition-ready submission.
This pipeline runs on both local environments and Google Colab with minimal path changes, enabling access to GPU acceleration for faster training.
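One lightweight way to support both environments is to resolve the data root at runtime. A minimal sketch follows; the Drive path shown is a hypothetical placeholder, not necessarily the repo's actual layout:

```python
from pathlib import Path
import importlib.util


def resolve_data_root() -> Path:
    """Pick the dataset root depending on where the code is running.

    The Drive path below is a placeholder; adjust it to wherever the
    project folder lives in your Google Drive.
    """
    running_in_colab = importlib.util.find_spec("google.colab") is not None
    if running_in_colab:
        # Colab: data lives on a mounted Google Drive
        return Path("/content/drive/MyDrive/solafune-canopy-capstone/data")
    # Local: data sits inside the cloned repo
    return Path.cwd() / "data"
```

Scripts can then build every image/label path from `resolve_data_root()` instead of hard-coding environment-specific prefixes.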
Motivation
Accurate tree canopy mapping supports urban planning, biodiversity conservation, and climate modeling. Participating in this challenge helped me develop strong GIS data skills and build a practical foundation in geospatial machine learning.
This project was also my first application of YOLO-based image segmentation and geospatial data processing, providing valuable experience working with Earth Observation data formats.
Results
My current submission, as of September 24th 2025, places in the top 10 of 271 competitors on the Solafune leaderboard, against the assessment criterion of >75% mean IoU on the prediction dataset.
My data pipeline uses a hold-out validation split to select augmentation and model hyperparameters.
In future iterations, to improve my score, I will move to stratified k-fold cross-validation (grouped by source tile to prevent leakage, stratified by class presence) for more reliable estimates. To address the class imbalance between “Individual Trees” and “Group of Trees,” I’ll implement oversampling and augmentation of the minority class and tune class-specific confidence/IoU thresholds.
Results will be reported as mean IoU and per-class IoU averaged across folds.
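The grouped, class-aware split described above can be sketched without any ML libraries. This is an illustrative helper (the tile-ID and count inputs are hypothetical, not the repo's actual data structures): sorting tiles by minority-class count before round-robin assignment keeps every crop from a tile in one fold while roughly balancing class presence across folds.

```python
from collections import defaultdict


def grouped_folds(minority_counts, n_folds=5):
    """Assign source tiles to folds so all crops from one tile share a fold.

    minority_counts: dict mapping tile_id -> number of minority-class
    ("Individual Trees") instances on that tile. Sorting by that count
    before round-robin assignment roughly stratifies class presence.
    """
    folds = defaultdict(list)
    ranked = sorted(minority_counts, key=minority_counts.get, reverse=True)
    for i, tile in enumerate(ranked):
        folds[i % n_folds].append(tile)
    return dict(folds)
```

A dedicated splitter such as scikit-learn's `StratifiedGroupKFold` would be the production choice; this sketch only shows the grouping idea.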
Lessons Learned
- JSON label formatting and format conversion (COCO ↔ YOLO)
- Consistent directory structure for large-scale ML projects across local and cloud storage
- Benefits of augmentation techniques such as HSV hue rotation and color grading
- Formatting and visualising image data
- Convolutions and convolutional neural networks.
- YOLO machine learning models
Tools
- Python 3.10
  - Base language for building the pipeline and running scripts
- PyTorch
  - Deep learning framework used to train segmentation models
- Google Colab
  - Notebook environment providing GPU access for training
- JSON / COCO JSON
  - Retrieval and submission of labels
- YAML
  - Organised model parameter files
- GIS toolkits
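The COCO ↔ YOLO label conversion mentioned in the lessons learned boils down to normalising absolute-pixel polygons into the 0–1 label lines YOLO expects. A minimal one-polygon sketch (the helper name is illustrative; the repo's data-preparation notebook handles full annotation files):

```python
def coco_poly_to_yolo_line(class_id, polygon, img_w, img_h):
    """Convert one COCO segmentation polygon (absolute pixel coords,
    flattened as [x1, y1, x2, y2, ...]) into a YOLO-seg label line:
    the class id followed by x y pairs normalised to image size."""
    parts = [str(class_id)]
    for i in range(0, len(polygon), 2):
        parts.append(f"{polygon[i] / img_w:.6f}")      # x normalised by width
        parts.append(f"{polygon[i + 1] / img_h:.6f}")  # y normalised by height
    return " ".join(parts)
```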
Installation
```
git clone https://github.com/Mitch-P-Analyst/solafune-canopy-capstone.git
cd solafune-canopy-capstone
pip install -r requirements.txt
```
Repo Directory Structure
```
├── configurations/                     # YAML configs (data + overrides)
│   ├── model_data-seg.yaml             # dataset paths & class names
│   ├── train_model_overrides.yaml      # training parameters
│   ├── val_model_overrides.yaml        # validation parameters
│   └── predict_model_overrides.yaml    # prediction parameters
│
├── data/                               # Downloaded satellite imagery and mosaics
│   ├── processed/
│   │   ├── images/
│   │   │   ├── predict/                # Unlabeled (no ground truth) data for prediction
│   │   │   ├── train/                  # Ground-truth data split for model training
│   │   │   ├── val/                    # Ground-truth data split for model validation
│   │   │   └── test/                   # Required by YOLO structure
│   │   ├── labels/
│   │   │   ├── train/                  # Ground-truth labels split for model training
│   │   │   ├── val/                    # Ground-truth labels split for model validation
│   │   │   └── test/                   # Required by YOLO structure
│   │   └── JSONs/                      # Converted JSON files
│   ├── raw/
│   │   ├── zips/                       # Raw data ZIP files | **Restricted by NDA**
│   │   └── JSONs/
│   └── temp/
│
├── notebooks/
│   ├── 01_data_preparation.ipynb       # Convert JSONs, unzip, split data
│   ├── 02_train_model_colab.ipynb      # Google Colab notebook for model training
│   └── 04_test_model_evaluations.ipynb # **Optional** in-depth model evaluations
│
├── scripts/
│   ├── 02_train_model.py               # Train YOLO model
│   ├── 03_val_model.py                 # Validate YOLO model on GT data
│   ├── 05_predict_model.py             # Create predictions with trained YOLO model on non-GT data
│   └── 06_export_submission.py         # Convert prediction outputs into Solafune JSON format
│
├── runs/segments/                      # All model training/validation/prediction results
├── exports/                            # JSON submission files
├── README.md                           # This file
├── README.html                         # README in HTML format for digital portfolio
└── requirements.txt                    # Package requirements
```
Process
Data
- Data not shareable under the Solafune non-disclosure agreement.
Files & Run Order
Data Preparation
- Notebook:
  - notebooks/01_data_preparation.ipynb
Model Training
- You can train on a Local Device or on Google Colab (GPU).
- Local Device
  - Script:
    - scripts/02_train_model.py
- Google Colab
  - Notebook:
    - notebooks/02_train_model_colab.ipynb
- Use Colab’s GPU and follow the in-notebook instructions to mount Drive, set paths, train the model, and export the model weights.
Model Validation
- Script:
  - scripts/03_val_model.py
- Configure hyperparameters & trained model weights
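For illustration, a validation override file might look like the following. The parameter names follow the Ultralytics YOLO convention, but the exact keys and values here are assumptions, not the repo's actual config:

```yaml
# val_model_overrides.yaml (illustrative values only)
model: runs/segments/train/weights/best.pt  # trained weights to validate
imgsz: 1024       # inference image size
batch: 8          # validation batch size
conf: 0.25        # confidence threshold
iou: 0.6          # IoU threshold used during NMS
```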
Metric Testing (Optional)
- Notebook:
  - notebooks/04_test_model_evaluations.ipynb
Predictions
- Script:
  - scripts/05_predict_model.py
- Configure hyperparameters & trained model weights
Export
- Script:
  - scripts/06_export_submission.py
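Conceptually, the export step serialises predicted polygons into a JSON submission with the standard library. The field names below are hypothetical placeholders; the repo's export script writes the actual Solafune schema:

```python
import json


def write_submission(predictions, out_path):
    """Serialise predictions into a JSON submission file.

    predictions: list of dicts with hypothetical keys
    {"image", "class", "polygon"}; the real Solafune schema differs.
    """
    payload = {"annotations": predictions}
    with open(out_path, "w") as f:
        json.dump(payload, f)
    return payload
```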
License
MIT License