Tree Canopy Segmentation

A data science competition hosted by Solafune: train a machine learning model capable of detecting tree canopies across multiple urban environments from aerial and satellite imagery.

Github Repository
Visit Website
Skills
Geospatial Machine Learning, Image Segmentation (YOLO11n), Model Training & Evaluation, Annotation Format Conversion (COCO ↔ YOLO)
Tools
Python, Google Colab, PyTorch, YAML, JSON, COCO JSON, GIS Toolkits (e.g. rasterio, geemap)

Data Non-Disclosure

In accordance with the Solafune Non-Disclosure Agreement, all raw data and information provided by Solafune are confidential to the competition. This project abides by those terms and therefore cannot be reproduced without access to the training data.

Project Overview

In this project, I developed a geospatial pipeline for tree canopy segmentation from Sentinel‑2 imagery using the YOLO family of segmentation models. Using Solafune's ground-truth-labelled satellite imagery, I trained and validated a model that runs on both local devices and Google Colab. The most recent successful training runs have benefited from data augmentation (colour grading, image scaling, image rotation, etc.), and further development is ongoing within the competition timeline.
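
The augmentation settings mentioned above map naturally onto Ultralytics YOLO training overrides. The values below are illustrative assumptions for a YAML overrides file (in the spirit of `configurations/train_model_overrides.yaml`), not the exact competition configuration:

```yaml
# Illustrative augmentation overrides (values are assumptions, not the competition settings)
hsv_h: 0.015    # hue shift fraction (colour grading)
hsv_s: 0.7      # saturation shift fraction
hsv_v: 0.4      # value/brightness shift fraction
degrees: 10.0   # random rotation range (+/- degrees)
scale: 0.5      # random scaling gain (image scaling)
fliplr: 0.5     # probability of horizontal flip
```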

Solafune

Visit the Solafune Tree Canopy Segmentation page to view my current leaderboard position and the competition's guidelines.

Repository README


Solafune Tree Canopy Detection Capstone

Project Overview

The project involved building a geospatial ML pipeline that detects tree canopies in Sentinel-2 imagery via image segmentation. For the Solafune Tree Canopy Detection challenge, I managed the import and conversion of segmentation annotations, trained a segmentation model, and produced a competition-ready submission.

This pipeline runs on both local environments and Google Colab with minimal path changes, enabling access to GPU acceleration for faster training.
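
One way to achieve that portability is to resolve the repository root from a list of candidate paths; this is a minimal sketch, and the Colab Drive path is a hypothetical example rather than the repository's actual logic:

```python
from pathlib import Path

def pick_root(candidates):
    """Return the first candidate path that exists, else the current working directory."""
    for candidate in candidates:
        path = Path(candidate)
        if path.exists():
            return path
    return Path.cwd()

# Hypothetical Colab Drive mount checked first; falls back to a local checkout
REPO_ROOT = pick_root(["/content/drive/MyDrive/solafune-canopy-capstone"])
```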

Motivation

Accurate tree canopy mapping supports urban planning, biodiversity conservation, and climate modeling. Participating in this challenge developed strong skills in handling GIS data and geospatial machine learning while yielding practical impact.

This project was also my first application of YOLO-based image segmentation and geospatial data processing, providing valuable experience working with Earth Observation data formats.

Results

My current submission model, as of September 24th 2025, places in the top 10 of 271 competitors on the Solafune leaderboard, against the assessment criterion of >75% mean IoU on the prediction dataset.

My data pipeline uses a hold-out validation split to select augmentation and model hyperparameters.
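
A deterministic hold-out split over image filenames can be sketched as follows (the function name and fraction are illustrative, not the pipeline's actual code):

```python
import random

def holdout_split(items, val_fraction=0.2, seed=42):
    """Deterministically shuffle items and split into (train, val) lists."""
    rng = random.Random(seed)
    shuffled = sorted(items)   # sort first so the shuffle is reproducible across runs
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]
```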

For future iterations to improve my score, I will move to stratified k-fold cross-validation (grouped by source tile to prevent leakage, stratified by class presence) for more reliable estimates. To address the class imbalance between “Individual Trees” and “Group of Trees”, I’ll implement oversampling and augmentation of the minority class and tune class-specific confidence/IoU thresholds.

Results will be reported as mean IoU and per-class IoU averaged across folds.
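
For reference, mean IoU and per-class IoU over binary class masks can be computed as in this minimal NumPy sketch (the function names are mine, not Solafune's scoring code):

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection-over-Union of two binary masks; empty-vs-empty counts as 1.0."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0
    return np.logical_and(pred, gt).sum() / union

def mean_iou(per_class_preds, per_class_gts):
    """Average IoU across classes, given one binary mask per class."""
    ious = [mask_iou(p, g) for p, g in zip(per_class_preds, per_class_gts)]
    return ious, float(np.mean(ious))
```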

Lessons Learned

  • JSON label formatting and format conversion (COCO ↔ YOLO)
  • Consistent directory structure for large-scale ML projects across local and cloud directories.
  • Benefits of augmentation techniques such as HSV colour shifts and colour grading
  • Visual data formatting.
  • Convolutions and convolutional neural networks.

Tools

  • YOLO Machine Learning Model
    • Image segmentation
  • Python 3.10
    • Base language for building the geospatial + ML pipeline and running scripts
  • PyTorch
    • deep learning framework used to train segmentation models
  • Google Colab
    • Hosted notebook environment providing GPU acceleration for training
  • JSON, COCO JSON
    • Retrieval and submission of labels.
  • YAML
    • Organised model parameter files
  • GIS Toolkits
    • e.g. rasterio, geemap for raster I/O and geospatial visualisation

Installation

git clone https://github.com/Mitch-P-Analyst/solafune-canopy-capstone.git
cd solafune-canopy-capstone
pip install -r requirements.txt

Repo Directory Structure

├── configurations/                     # YAML configs (data + overrides)
│   ├── model_data-seg.yaml                 # dataset paths & class names
│   ├── train_model_overrides.yaml          # training parameters
│   ├── val_model_overrides.yaml            # validation parameters
│   └── predict_model_overrides.yaml        # prediction parameters
│
├── data/                               # Downloaded satellite imagery and mosaics
│   ├── processed/                       
│   │ ├── images/
│   │ │  ├── predict/                     # Unlabeled (no Ground Truth) data for prediction 
│   │ │  ├── train/                       # Ground Truth Data Split for model training  
│   │ │  ├── val/                         # Ground Truth Data Split for model validation
│   │ │  └── test/                        # Required by YOLO structure
│   │ ├── labels/
│   │ │  ├── train/                       # Ground Truth Labels Split for model training  
│   │ │  ├── val/                         # Ground Truth Labels Split for model validation
│   │ │  └── test/                        # Required by YOLO structure
│   │ └── JSONs/                          # Converted JSON files
│   ├── raw/
│   │ ├── zips/                           # Raw Data ZIP files | **Restricted by NDA**
│   │ └── JSONs/
│   └── temp/
│
├── notebooks/                          
│   ├── 01_data_preparation.ipynb           # Convert JSONs, Unzip, Split Data
│   ├── 02_train_model_colab.ipynb          # Google Colab notebook for model training
│   └── 04_test_model_evaluations.ipynb     # **Optional** In-depth model evaluations
│
├── scripts/                                
│   ├── 02_train_model.py                   # Train YOLO Model
│   ├── 03_val_model.py                     # Validate YOLO Model on GT Data
│   ├── 05_predict_model.py                 # Create predictions with trained YOLO Model on unlabeled (no GT) data
│   └── 06_export_submission.py             # Convert prediction outputs into Solafune JSON format
│
├── runs/segments/                          # All model training/validation/prediction results
├── exports/                                # JSON Submission files
├── README.md                               # This file
├── README.html                             # README in HTML format for digital portfolio
└── requirements.txt                        # Package requirements

Process

Data

  • Raw data not shareable under the Solafune Non-Disclosure Agreement.

Files & Run Order

Data Preparation

  • Notebook:
    • 'notebooks/01_data_preparation.ipynb'
      • JSON conversion
        • Solafune format -> COCO format
        • COCO format -> YOLO format
      • Unpacking Raw Data
        • Extract ZIP files
          • Training
          • Prediction
      • Data Split Images & Annotations
        • Training
        • Validation
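
The COCO -> YOLO step above amounts to normalising pixel polygons into YOLO segmentation label lines; a minimal sketch (the function name is illustrative, not the notebook's actual code):

```python
def coco_poly_to_yolo(polygon, img_w, img_h, class_id):
    """Convert a COCO pixel polygon [x1, y1, x2, y2, ...] into one
    YOLO segmentation label line: 'class x1 y1 x2 y2 ...' (normalised 0-1)."""
    coords = []
    for x, y in zip(polygon[0::2], polygon[1::2]):
        coords.append(f"{x / img_w:.6f}")
        coords.append(f"{y / img_h:.6f}")
    return " ".join([str(class_id)] + coords)
```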

Model Training

  • You can train on a Local Device or on Google Colab (GPU).
    • Local Device
      • Script:
        • 'scripts/02_train_model.py'
    • Google Colab
      • Notebook:
        • 'notebooks/02_train_model_colab.ipynb'
          •  Open in Colab ▶
        • Use Colab’s GPU and follow the in-notebook instructions to mount Drive, set paths, train the model, and export model weights.

Model Validation

  • Script:
    • 'scripts/03_val_model.py'
  • Configure hyperparameters & trained model weights
    • 'configurations/val_model_overrides.yaml'
      • Select Model Weights from trained YOLO model
        weights: 'runs/segment/train_Yolo11s_canopy_832_adamW__20251101-0151/weights/best.pt'  # example
        
      • Modify the validation parameters in the YAML file to fine-tune the model

Metric Testing (Optional)

  • Notebook:
    • 'notebooks/04_test_model_evaluations.ipynb'
      • **Optional** in-depth model evaluations

Predictions

  • Script:
    • 'scripts/05_predict_model.py'
      • Create predictions with the trained YOLO model on unlabeled (no GT) data

Export

  • Script:
    • 'scripts/06_export_submission.py'
      • Select Prediction Annotations on Line 16 from the /runs/segment/ folder for the Submission file
        # Prediction Annotations
        labels = REPO_ROOT / "runs/segment/pred_train_Yolo11s_canopy_832_adamW__20251101-0151_2025-11-01 10:33" / "labels" # example
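
Building the Solafune JSON from YOLO predictions involves mapping normalised label coordinates back to pixels; a minimal sketch of that inverse step (the function name is illustrative, not the script's actual code):

```python
def yolo_line_to_pixels(line, img_w, img_h):
    """Parse one YOLO segmentation label line ('class x1 y1 x2 y2 ...',
    normalised 0-1) back into (class_id, [(x_px, y_px), ...])."""
    parts = line.split()
    class_id = int(parts[0])
    vals = list(map(float, parts[1:]))
    points = [(round(x * img_w, 2), round(y * img_h, 2))
              for x, y in zip(vals[0::2], vals[1::2])]
    return class_id, points
```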
        

License

MIT License
