A data science competition hosted by Solafune, in which competitors train a machine learning model to detect tree canopies across multiple urban environments from aerial and satellite imagery.

Skills

Geospatial Machine Learning, Image Segmentation (YOLO11n), Model Training & Evaluation, Annotation Format Conversion (COCO ↔ YOLO)

Tools

Python, Google Colab, PyTorch, YAML, JSON, COCOJson, GIS Toolkits (e.g. rasterio, geemap)

Data Non-Disclosure

In accordance with the Solafune Non-Disclosure Agreement, all raw data and information provided by Solafune is confidential and for competition purposes only. My project abides by these terms and therefore cannot be reproduced without access to the training data.

Project Overview

In this project, I developed a geospatial pipeline for tree canopy segmentation from Sentinel‑2 imagery using the YOLO machine learning model. Utilising Solafune's ground-truth-labelled satellite imagery, I completed model training and validation to produce a working tree canopy segmentation model that runs on both local devices and Google Colab. The most recent successful training runs have benefited from data augmentation (colour grading, image scaling, image rotation, etc.), and further development is ongoing within the competition timeline.

Solafune

Visit the Solafune Tree Canopy Segmentation page to view my current leaderboard position and the competition's guidelines.


Repository README


Solafune Tree Canopy Detection Capstone

Project Overview

The project involved building a geospatial ML pipeline that uses Sentinel-2 imagery to detect tree canopies via image segmentation. For the Solafune Tree Canopy Detection challenge, I managed the import of segmentation annotations, trained a segmentation model, and produced a competition-ready submission.

This pipeline runs on both local environments and Google Colab with minimal path changes, enabling access to GPU acceleration for faster training.

Motivation

Accurate tree canopy mapping supports urban planning, biodiversity conservation, and climate modeling. Participating in this challenge helped me develop strong skills in GIS data and geospatial machine learning while working on a problem with practical impact.

This project was also my first application of YOLO-based image segmentation and geospatial data processing, providing valuable experience working with Earth Observation data formats.

Results

My current submission model, as of September 24th 2025, places in the top 10 of 271 competitors on the Solafune leaderboard, against the assessment criterion of >75% mean IoU on the prediction dataset.

My data pipeline uses a hold-out validation split to select augmentation and model hyperparameters.
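A hold-out split of this kind can be sketched with the standard library. This is an illustration rather than the code in `notebooks/01_data_preparation.ipynb`: the helper name `holdout_split`, the 20% validation fraction and the seed are assumptions, and it assumes each image and its label file share a file stem.

```python
import random
from pathlib import Path


def holdout_split(image_paths, val_fraction=0.2, seed=42):
    """Split image file stems into reproducible train and validation lists.

    Hypothetical helper: assumes each image and its YOLO label share a stem.
    """
    if not 0 < val_fraction < 1:
        raise ValueError("val_fraction must be between 0 and 1")
    stems = sorted(Path(p).stem for p in image_paths)  # sort so input order does not matter
    rng = random.Random(seed)  # local generator; global random state is untouched
    rng.shuffle(stems)
    n_val = max(1, round(len(stems) * val_fraction))  # keep at least one validation image
    return stems[n_val:], stems[:n_val]  # (train, val)


if __name__ == "__main__":
    train, val = holdout_split([f"tile_{i:03d}.tif" for i in range(10)])
    print(len(train), len(val))  # 8 2
```

Fixing the seed keeps the split identical between runs, so augmentation and hyperparameter changes are compared on the same validation images.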

In future iterations, to improve my score, I will move to stratified k-fold cross-validation (grouped by source tile to prevent leakage, stratified by class presence) for more reliable estimates. To address the class imbalance between “Individual Trees” and “Group of Trees”, I’ll implement oversampling and augmentation of the minority class and tune class-specific confidence/IoU thresholds.
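The grouped, stratified k-fold plan could look roughly like the sketch below. It is a simplified standard-library stand-in, not scikit-learn's `StratifiedGroupKFold` (which offers a more thorough library implementation): every image from a source tile lands in the same fold, and tiles are dealt round-robin into folds within each stratum so that each fold receives a similar mix. The function name, the input dictionaries and the stratum labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def grouped_stratified_folds(image_tiles, image_strata, n_folds=5, seed=0):
    """Assign images to folds so that no source tile is split across folds.

    image_tiles:  {image_id: source_tile_id}  (the grouping that prevents leakage)
    image_strata: {image_id: stratum}          (e.g. which tree classes are present)
    Returns {image_id: fold_index}.
    """
    if n_folds < 2:
        raise ValueError("n_folds must be at least 2")
    # Give each tile the most common stratum among its images (ties -> smallest label).
    votes = defaultdict(Counter)
    for image_id, tile in image_tiles.items():
        votes[tile][image_strata[image_id]] += 1
    tile_stratum = {t: min(c, key=lambda s: (-c[s], s)) for t, c in votes.items()}

    # Within each stratum, shuffle its tiles and deal them round-robin into folds.
    rng = random.Random(seed)
    tile_fold, dealt = {}, 0
    for stratum in sorted(set(tile_stratum.values())):
        tiles = sorted(t for t, s in tile_stratum.items() if s == stratum)
        rng.shuffle(tiles)
        for tile in tiles:
            tile_fold[tile] = dealt % n_folds
            dealt += 1  # carry on from where the previous stratum stopped
    return {image_id: tile_fold[tile] for image_id, tile in image_tiles.items()}
```

Each fold k then serves once as the validation set while the remaining folds are used for training.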

Results will be reported as mean IoU and per-class IoU averaged across folds.
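Per-class and mean IoU across folds can be computed from integer label masks as sketched below, using IoU_c = |P_c ∩ T_c| / |P_c ∪ T_c| for the predicted (P) and ground-truth (T) pixels of class c. This assumes rasterised masks in which, for example, 1 and 2 encode the two tree classes (a hypothetical encoding); Solafune's official scoring may match polygons differently, so treat this as a local estimate.

```python
import numpy as np


def per_class_iou(pred, target, class_ids):
    """IoU per class between two integer label masks of the same shape.

    IoU_c = |pred == c AND target == c| / |pred == c OR target == c|.
    A class absent from both masks gets NaN so it does not bias averages.
    """
    pred, target = np.asarray(pred), np.asarray(target)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    ious = np.full(len(class_ids), np.nan)
    for i, c in enumerate(class_ids):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union:
            ious[i] = np.logical_and(p, t).sum() / union
    return ious


def summarise_folds(fold_ious):
    """Average per-class IoU across folds, then take the mean over classes."""
    per_class = np.nanmean(np.vstack(fold_ious), axis=0)
    return per_class, float(np.nanmean(per_class))
```

Averaging per class before taking the overall mean keeps the minority class visible instead of letting the majority class dominate the headline number.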

Lessons Learned

JSON label formatting and format conversion (COCO ↔ YOLO)
Consistent directory structure for large-scale ML projects across local and cloud (Google Drive) directories.
Benefits of augmentation techniques like HSV rotation and color grading
Visual data formatting.
Convolutions and convolutional neural networks.

Tools

YOLO Machine Learning Model
- image segmentation
Python 3.10
- base language for building the pipeline and running scripts
- Tasks
  - geospatial + ML pipeline
PyTorch
- deep learning framework used to train segmentation models
Google Colab
- Alternative notebook environment with access to Google Colab's GPUs
JSON, COCOJson
- Retrieval and submission of labels.
YAML
- Organised model parameter files
GIS Toolkits
- e.g. rasterio, geemap

Installation

git clone https://github.com/Mitch-P-Analyst/solafune-canopy-capstone.git
cd solafune-canopy-capstone
pip install -r requirements.txt

Repo Directory Structure

├── configurations/                     # YAML configs (data + overrides)
│   ├── model_data-seg.yaml                 # dataset paths & class names
│   ├── train_model_overrides.yaml          # training parameters
│   ├── val_model_overrides.yaml            # validation parameters
│   └── predict_model_overrides.yaml        # prediction parameters
│
├── data/                               # Downloaded satellite imagery and mosaics
│   ├── processed/                       
│   │ ├── images/
│   │ │  ├── predict/                     # Unlabeled (no Ground Truth) data for prediction 
│   │ │  ├── train/                       # Ground Truth Data Split for model training  
│   │ │  ├── val/                         # Ground Truth Data Split for model validation
│   │ │  └── test/                        # Required by YOLO structure
│   │ ├── labels/
│   │ │  ├── train/                       # Ground Truth Labels Split for model training  
│   │ │  ├── val/                         # Ground Truth Labels Split for model validation
│   │ │  └── test/                        # Required by YOLO structure
│   │ └── JSONs/                          # Converted JSON files
│   ├── raw/
│   │ ├── zips/                           # Raw Data ZIP files | **Restricted by NDA**
│   │ └── JSONs/
│   └── temp/
│
├── notebooks/                          
│   ├── 01_data_preparation.ipynb           # Convert JSONs, Unzip, Split Data
│   ├── 02_train_model_colab.ipynb          # Google Colab notebook for model training
│   └── 04_test_model_evaluations.ipynb     # **Optional** In-depth model evaluations
│
├── scripts/                                
│   ├── 02_train_model.py                   # Train YOLO Model
│   ├── 03_val_model.py                     # Validate YOLO Model on GT Data
│   ├── 05_predict_model.py                 # Create predictions with trained YOLO Model on no GT Data  
│   └── 06_export_submission.py             # Convert prediction outputs into Solafune JSON format
│
├── runs/segment/                           # All model training/validation/prediction results
├── exports/                                # JSON Submission files
├── README.md                               # This file
├── README.html                             # README in HTML format for digital portfolio
└── requirements.txt                        # Package requirements
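A quick consistency check for this layout can be sketched with the standard library. YOLO pairs each image in `images/<split>/` with the label file of the same stem in `labels/<split>/`, and an image with no label file is treated as containing no objects, so a mismatch quietly turns annotated images into empty ones. The function name and the set of accepted image suffixes are assumptions.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".tif", ".tiff", ".png", ".jpg", ".jpeg"}  # assumed image formats


def find_unpaired(processed_dir, split):
    """Return (images_without_labels, labels_without_images) for one split.

    Assumes the layout <processed_dir>/images/<split>/ and
    <processed_dir>/labels/<split>/, with image and label sharing a stem.
    """
    root = Path(processed_dir)
    images = {
        p.stem
        for p in (root / "images" / split).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    }
    labels = {p.stem for p in (root / "labels" / split).glob("*.txt")}
    return sorted(images - labels), sorted(labels - images)
```

For example, `find_unpaired("data/processed", "train")` should return two empty lists once the data preparation notebook has run cleanly.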

Process

Data

Data is not shareable under the Solafune Non-Disclosure Agreement.
- To access the data, visit the Solafune competition webpage, Tree Canopy Detection.

Files & Run Order

Data Preparation

Notebook:
- 'notebooks/01_data_preparation.ipynb'
  - JSON conversion
  - Solafune format -> COCO format
  - COCO format -> YOLO format
  - Unpacking Raw Data
    - Extract ZIP files
      - Training
      - Prediction
  - Data Split Images & Annotations
    - Training
    - Validation
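The COCO → YOLO step can be sketched as follows. YOLO segmentation labels hold one object per line as `class_id x1 y1 x2 y2 ...` with coordinates normalised by image width and height, while COCO stores absolute-pixel polygons in `annotations[].segmentation`. The Solafune → COCO step is not reproduced here; the function name and the choice to remap COCO category ids to 0-based class ids in sorted order are assumptions, not necessarily what the notebook does.

```python
import json
from collections import defaultdict
from pathlib import Path


def coco_to_yolo_seg(coco_json_path, labels_dir):
    """Write one YOLO segmentation label file per image from a COCO JSON file.

    Only polygon segmentations are converted; RLE (crowd) masks are skipped.
    Returns the {coco_category_id: yolo_class_id} mapping that was used.
    """
    coco = json.loads(Path(coco_json_path).read_text())
    cat_ids = sorted(c["id"] for c in coco["categories"])
    class_map = {cat_id: i for i, cat_id in enumerate(cat_ids)}  # contiguous 0-based ids
    images = {img["id"]: img for img in coco["images"]}

    rows = defaultdict(list)
    for ann in coco["annotations"]:
        seg = ann.get("segmentation")
        if not isinstance(seg, list):
            continue  # RLE dict rather than a list of polygons
        img = images[ann["image_id"]]
        w, h = img["width"], img["height"]
        for poly in seg:  # one annotation may contain several polygons
            if len(poly) < 6 or len(poly) % 2:
                continue  # need at least 3 (x, y) points
            coords = [v / (w if i % 2 == 0 else h) for i, v in enumerate(poly)]
            coords = [min(max(v, 0.0), 1.0) for v in coords]  # clip to the image
            cls = class_map[ann["category_id"]]
            rows[ann["image_id"]].append(" ".join([str(cls)] + [f"{v:.6f}" for v in coords]))

    out = Path(labels_dir)
    out.mkdir(parents=True, exist_ok=True)
    for image_id, img in images.items():
        lines = rows[image_id]  # empty for images without annotations
        content = ("\n".join(lines) + "\n") if lines else ""
        (out / f"{Path(img['file_name']).stem}.txt").write_text(content)
    return class_map
```

Writing an empty file for unannotated images keeps every image paired with a label file.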

Model Training

You can train on a Local Device or on Google Colab (GPU).
- Local Device
  - Script:
    - scripts/02_train_model.py
  - Configure hyperparameters
    - configurations/train_model_overrides.yaml
  - Choose the pretrained YOLO weights near the top of the script (line ~21):
```
from ultralytics import YOLO

model = YOLO('yolo11s-seg.pt')  # options: yolo11n-seg.pt, yolo11s-seg.pt, yolo11x-seg.pt, yolov8s-seg.pt
```
- Google Colab
  - Notebook:
    - notebooks/02_train_model_colab.ipynb
    - Use Colab’s GPU and follow the in-notebook instructions to mount Drive, set paths, train model and export model weights.
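For a scripted run, the training step might look like the sketch below. It assumes the Ultralytics package (which provides the `YOLO` class) and uses its `cfg` argument to load `train_model_overrides.yaml`; the helper `build_train_args`, the default run name and the decision to pass `data` explicitly are illustrative assumptions rather than the contents of `scripts/02_train_model.py`.

```python
from pathlib import Path


def build_train_args(data_yaml, overrides_yaml, run_name):
    """Collect keyword arguments for Ultralytics' Model.train().

    `cfg` points Ultralytics at the YAML overrides file; `data` and `name`
    are passed explicitly so they take precedence over values in that file.
    """
    for path in (data_yaml, overrides_yaml):
        if not Path(path).is_file():
            raise FileNotFoundError(path)
    return {"data": str(data_yaml), "cfg": str(overrides_yaml), "name": run_name}


def train(weights="yolo11s-seg.pt",
          data_yaml="configurations/model_data-seg.yaml",
          overrides_yaml="configurations/train_model_overrides.yaml",
          run_name="train_canopy"):  # hypothetical default run name
    from ultralytics import YOLO  # imported here so the helper works without ultralytics

    model = YOLO(weights)  # official pretrained weights are downloaded on first use
    return model.train(**build_train_args(data_yaml, overrides_yaml, run_name))
```

Calling `train()` from the repository root writes results under `runs/segment/<run name>/`; Ultralytics appends a number if that folder already exists.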

Model Validation

Script:
- 'scripts/03_val_model.py'
Configure hyperparameters & trained model weights
- 'configurations/val_model_overrides.yaml'
  - Select Model Weights from trained YOLO model
```
weights: 'runs/segment/train_Yolo11s_canopy_832_adamW__20251101-0151/weights/best.pt'  # example
```
  - Modify the validation parameters in the YAML file to fine-tune model evaluation
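One way to consume an overrides file like this is sketched below. It assumes the file holds a `weights` path plus standard Ultralytics validation arguments (for example `data`, `imgsz`, `conf`); because `weights` is not one of Ultralytics' own arguments, it is split off before the rest is passed to `model.val()`. The function names are hypothetical, and the mask mAP it prints is Ultralytics' default metric rather than the competition's IoU.

```python
def split_weights(overrides):
    """Separate the `weights` entry from the remaining Ultralytics arguments."""
    args = dict(overrides)  # copy so the caller's dict is left untouched
    try:
        weights = args.pop("weights")
    except KeyError:
        raise KeyError("overrides file must define 'weights'") from None
    return weights, args


def validate(overrides_yaml="configurations/val_model_overrides.yaml"):
    import yaml  # PyYAML, installed as an Ultralytics dependency
    from ultralytics import YOLO

    with open(overrides_yaml) as f:
        weights, args = split_weights(yaml.safe_load(f))
    metrics = YOLO(weights).val(**args)
    print(f"mask mAP50-95: {metrics.seg.map:.3f}  mask mAP50: {metrics.seg.map50:.3f}")
    return metrics
```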

Metric Testings (Optional)

Notebook:
- 'notebooks/04_test_model_evaluations.ipynb'
  - Optional, unfinished notebook containing in-depth measures for analysing validation-split results.

Predictions

Script:
- 'scripts/05_predict_model.py'
Configure hyperparameters & trained model weights
- 'configurations/predict_model_overrides.yaml'
  - Select Model Weights from trained YOLO model
```
weights: 'runs/segment/train_Yolo11s_canopy_832_adamW__20251101-0151/weights/best.pt'  # example
```
  - Modify the prediction parameters in the YAML file for final model deployment
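Prediction label files are only written when the predict arguments enable `save_txt` (and `save_conf` to append each polygon's confidence); Ultralytics then stores them in `<run>/labels/`. The hypothetical helper below finds the most recently modified `pred_*` run under `runs/segment/`, which can help when choosing the labels path for the export step; the `pred_` prefix is taken from the example run name in this README and may not match every run.

```python
from pathlib import Path


def latest_labels_dir(runs_dir="runs/segment", prefix="pred_"):
    """Return the labels/ folder of the most recently modified prediction run."""
    candidates = [
        run / "labels"
        for run in Path(runs_dir).glob(f"{prefix}*")
        if (run / "labels").is_dir()  # runs without save_txt have no labels/ folder
    ]
    if not candidates:
        raise FileNotFoundError(f"no {prefix}* run with a labels/ folder in {runs_dir}")
    return max(candidates, key=lambda p: p.stat().st_mtime)
```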

Export

Script:

'scripts/06_export_submission.py'

Select the prediction annotations on line 16 from the runs/segment/ folder for the submission file.

# Prediction Annotations
labels = REPO_ROOT / "runs/segment/pred_train_Yolo11s_canopy_832_adamW__20251101-0151_2025-11-01 10:33" / "labels" # example
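Before the labels can be written to Solafune's JSON format, each YOLO prediction file has to be read back into pixel coordinates. The sketch below covers that step only; it assumes the image width and height are known (for example from rasterio), and the output dictionary keys are hypothetical. The submission schema itself is defined in the competition guidelines and is not reproduced here.

```python
from pathlib import Path


def read_yolo_seg_labels(label_path, width, height):
    """Parse one YOLO segmentation label file into pixel-coordinate polygons.

    Each line is `class x1 y1 ... xn yn`, optionally followed by a confidence
    when predictions were saved with save_conf=True. With 2n coordinates the
    value count after the class is even without a confidence and odd with one.
    """
    polygons = []
    for line in Path(label_path).read_text().splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        cls, values = int(tokens[0]), [float(v) for v in tokens[1:]]
        conf = values.pop() if len(values) % 2 == 1 else None
        points = [(x * width, y * height) for x, y in zip(values[0::2], values[1::2])]
        polygons.append({"class_id": cls, "confidence": conf, "points": points})
    return polygons
```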

License

MIT License



