Salifort Motors Attrition Rates

A capstone project for the Google Advanced Data Analytics certification, this exploration uses classification models to investigate the trends and variable associations behind employee departures at the fictional Salifort Motors company.

Github Repository
Visit Link
Skills
Machine Learning Models, Exploratory Data Analysis, PACE Model framework, Data Cleaning
Tools
XGBoost, Logistic Regression, EDA, Feature Engineering, Visualisations, Executive Summary

Project Overview

As the capstone for the Google Advanced Data Analytics certificate, this project uses a fictional human-resources dataset from Kaggle to investigate attrition at Salifort Motors.

With recent turnover at Salifort Motors, I was tasked with investigating the drivers of attrition and providing data-driven recommendations using machine-learning models to help mitigate future departures. If Salifort could predict whether an employee will leave the company, and discover the reasons behind their departure, they could better understand the problem and develop a solution. The goal is to understand what makes an employee likely to leave.

Executive Summary

I analyzed a 15k-row human resources dataset for Salifort Motors to understand drivers of attrition. Data visualisation showed associations between key working environment variables and employee departure. I produced an Executive Summary for stakeholder communication to summarize the project's discoveries.

Key Findings

Employees with more projects, higher monthly hours, longer tenure, and lower satisfaction were much more likely to leave. This was first identified through visualisations of associated influencing variables, followed by machine learning models to predict employee attrition.

Visualisations

Employees who left report substantially lower satisfaction than those who stayed, suggesting self-rated satisfaction is a strong early signal of attrition risk.

Departed employees cluster heavily at very low satisfaction scores, with a smaller group around moderate satisfaction and only a handful of high-satisfaction leavers.

Two clear high-risk clusters emerge: overworked, very dissatisfied leavers assigned to 6–7 projects with high monthly hours, and under-utilised leavers assigned to 2 projects with low monthly hours.

Attrition is lowest for employees with 3–4 projects and moderate hours, but spikes for under-loaded staff with only 2 projects and for heavily loaded staff on 6–7 projects, where attrition reaches 100%.

Key Observations

  • Three identifiable clusters of employees who departed Salifort Motors
    • Low Satisfaction Cluster (x <= 0.12)
    • Mid Satisfaction Cluster (0.31 <= x <= 0.48)
    • High Satisfaction Cluster (0.70 <= x)
  • The majority of departed employees in the Low Satisfaction Cluster were assigned to 6 or 7 projects.
  • Employee satisfaction level is visibly associated with the number of projects: strongly for employees assigned to 6 or 7 projects and moderately for those assigned to 2 projects.
  • Tenure is an additional factor, moderately associated with employee satisfaction level at 3 and 4 years of tenure.
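
The cluster boundaries listed above can be written as a simple labelling rule. This is a minimal sketch rather than the notebook's actual code: the function name `satisfaction_cluster` and the sample scores are hypothetical, and only the three thresholds come from the observations above.

```python
from typing import Optional


def satisfaction_cluster(score: float) -> Optional[str]:
    """Label a leaver's satisfaction score with one of the three observed clusters.

    Returns None when the score falls between the cluster ranges.
    """
    if score <= 0.12:
        return "low"
    if 0.31 <= score <= 0.48:
        return "mid"
    if score >= 0.70:
        return "high"
    return None


# Hypothetical satisfaction scores, for illustration only (not project data).
sample_scores = [0.10, 0.40, 0.85, 0.60]
labels = [satisfaction_cluster(s) for s in sample_scores]
print(labels)
```

Scores that fall in the gaps between clusters return `None`, which keeps the rule faithful to the three ranges instead of forcing every leaver into a group.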

Prediction Models

Using logistic regression as a baseline and a tuned XGBoost model for comparison, I constructed a best-fitting prediction model of employee attrition, validating the variable associations seen in the visualisations.

The XGBoost model highlights satisfaction level, number of projects, and tenure as the strongest drivers of attrition, while department membership has only a minor influence.

Compared with the baseline logistic regression, the tuned XGBoost model achieves far higher recall and F1-score while also improving accuracy, making it much better at correctly flagging at-risk employees.
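
Recall matters most here because it measures the share of actual leavers that a model flags. The sketch below shows how accuracy, precision, recall, and F1 are derived from predictions. The labels and both prediction lists are hypothetical toy values chosen for illustration, not the project's results, and `classification_metrics` is a name introduced for this example.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = left)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # leavers correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # stayers wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # leavers missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # stayers correctly cleared

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy labels and predictions (1 = employee left).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
baseline_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]  # stands in for a weaker baseline
tuned_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]     # stands in for a stronger model

baseline = classification_metrics(y_true, baseline_pred)
tuned = classification_metrics(y_true, tuned_pred)
print(baseline)
print(tuned)
```

In this toy case the weaker model misses three of the four leavers, so its recall is low even though most of its predictions are correct, which is why accuracy alone is a poor guide for an attrition model.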

Impact & Recommendations

Employees with a higher number of projects, higher monthly hours, longer tenure, and lower satisfaction are much more likely to leave.

Departed employees with the lowest satisfaction levels were assigned to 6 or 7 different projects. Across all employees in the data provided, past and present, those assigned to 6 or 7 projects have the highest average monthly hours and a tenure of approximately 4 years.

Summary metrics by project load show that attrition is lowest for employees handling 3–4 projects with moderate hours, but jumps sharply for under-utilised staff with only 2 projects and for overloaded staff on 6–7 projects, where satisfaction drops and attrition reaches 45–100% alongside higher hours and longer tenure.
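
A summary by project load of this kind can be produced by grouping employee records on the number of projects. The following is a minimal sketch using only the standard library. The records are hypothetical toy rows, not rows from the Salifort dataset, and the tuple layout and the name `summarise_by_projects` are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy rows: (number_of_projects, average_monthly_hours, left_company)
records = [
    (2, 150, 1), (2, 145, 1), (2, 160, 0),
    (3, 200, 0), (3, 195, 0),
    (4, 205, 0), (4, 210, 1),
    (6, 270, 1), (6, 260, 0),
    (7, 290, 1), (7, 300, 1),
]


def summarise_by_projects(rows):
    """Return headcount, mean monthly hours and attrition rate per project count."""
    groups = defaultdict(list)
    for projects, hours, left in rows:
        groups[projects].append((hours, left))

    summary = {}
    for projects in sorted(groups):
        members = groups[projects]
        n = len(members)
        summary[projects] = {
            "employees": n,
            "mean_hours": sum(h for h, _ in members) / n,
            "attrition_rate": sum(left for _, left in members) / n,
        }
    return summary


summary = summarise_by_projects(records)
for projects, stats in summary.items():
    print(projects, stats)
```

The same grouping could be extended with mean satisfaction and mean tenure per group, which are the other two metrics discussed above.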

Recommendations

Reflecting upon the above visualisations and numbers presented in Project Statistics, this investigation recommends targeting the variables most strongly associated with employee departure.

  • Rebalance Project Assignment
    • Employees with the highest mean satisfaction levels are assigned to 4 or 5 projects. Restrict employees to a maximum of 5 projects.
  • Restrict Employee Hours
    • Restrict Average Monthly Hours to the range observed among the most satisfied employees, with a maximum of 230 hours.
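
The two recommended limits can be combined into a simple workload check. This is a sketch under stated assumptions: the employee records, field names, and the function `exceeds_recommended_workload` are hypothetical, and only the two limits (5 projects, 230 monthly hours) come from the recommendations above.

```python
MAX_PROJECTS = 5          # recommended project cap
MAX_MONTHLY_HOURS = 230   # recommended cap on average monthly hours


def exceeds_recommended_workload(projects: int, monthly_hours: float) -> bool:
    """Return True when an employee is over either recommended limit."""
    return projects > MAX_PROJECTS or monthly_hours > MAX_MONTHLY_HOURS


# Hypothetical employees, for illustration only.
employees = [
    {"id": "A", "projects": 4, "monthly_hours": 210},
    {"id": "B", "projects": 6, "monthly_hours": 225},
    {"id": "C", "projects": 5, "monthly_hours": 245},
    {"id": "D", "projects": 5, "monthly_hours": 230},
]

flagged = [
    e["id"] for e in employees
    if exceeds_recommended_workload(e["projects"], e["monthly_hours"])
]
print(flagged)
```

Both limits are treated as inclusive maximums, so an employee with exactly 5 projects and 230 hours is not flagged.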

Limitations & Next Steps

Further steps to improve this investigation may include:

  • Sourcing information on Type of Employee Departure.
    • Fired
    • Redundant
    • Resignation

The data shows that a significant number of employees departed with medium and high satisfaction levels, primarily among employees assigned to 2 projects and in the sales and human resources departments.

Distinguishing how employees left Salifort may help identify where employee resources could be reassigned to support over-worked employees assigned to 6 or 7 projects.

  • Assess Employee Previous Satisfaction Levels
    The XGBoost model identifies Previous Satisfaction Levels as a key indicator for predicting an employee's eventual departure. Analysing satisfaction thresholds could support attrition risk management by monitoring the work environment and flagging when satisfaction falls below a sufficient level.

Repository & Technical Details

For those interested, the GitHub repo includes Jupyter notebooks for EDA, prediction models, and figure generation.

Github Repository
Visit Website