Salifort Motors Attrition Rates

A capstone project for the Google Advanced Data Analytics certification, this exploration investigates trends and variable associations through classification models between employee departures at the fictional Salifort Motors company.

Github Repository
Visit Website
Arrow
Skills
Machine Learning Models, Exploratory Data Analysis, PACE Model framework, Data Cleaning,
Tools
XGBoost, Logistics Regression, EDA, Feature Engineering, Visualisations, Executive Summary

Project Overview

As the capstone for the Google Advanced Data Analytics certificate, this project uses a fictional human-resources dataset from Kaggle to investigate attrition at Salifort Motors.

With recent turnover at Salifort Motors, I was tasked with investigating the drivers of attrition and providing data-driven recommendations using machine-learning models to help mitigate future departures.

Repository README

README

Employment Attrition Analysis of Salifort Motors

Executive Summary

Executive Summary Document

I analyzed a 15k-row HR dataset for Salifort Motors to understand drivers of attrition. Employees with more projects, higher monthly hours, longer tenure, and lower satisfaction were much more likely to leave.

Recommendations:

  • Limit project assignments at ≤5
  • Limit average monthly hours ≤230
  • Rebalance workload for 3–4 year employees
  • Monitor satisfaction thresholds for predicting employees at risk of attrition.

Best Model:

  • XGBoost classifier (cross-validated)
    • Achieved strong metrics
      • Precision: 0.98
      • Recall: 0.91
      • F1: 0.94
      • Accuracy: 0.98
    • Ordered top Feature Importance by Gain (Gain is the average reduction in log-loss from splits using that feature (averaged across all trees). i.e Higher gain = a feature split reduced log-loss more.)
      • Satisfaction: 44.0%
      • Number of Projects: 23.3%
      • Tenure: 13.8%
      • Previous Satisfaction: 9.6%
      • Average Monthly Hours: 6.9%

Impact:

  • Reduces risk among the highest-attrition groups.

Next Steps:

  • Address limitation of turnover between voluntary vs involuntary departures
    • Source information on Type of Employee Departure to further analysis attrition.
      • Fired
      • Redundant
      • Resignation

Overview

Google Advanced Data Analytics Capstone project. Built around the fictional french Salifort Motors organisation, this repository utilises skills taught through Google's Advanced Data Analytics course such as project proposals, EDA, machine learning model building, statistical testing, PACE planning and stakeholder summaries to analyse and predict employee attrition.

Provided Context

About the company

Salifort Motors is a fictional French-based alternative energy vehicle manufacturer. Its global workforce of over 100,000 employees research, design, construct, validate, and distribute electric, solar, algae, and hydrogen-based vehicles. Salifort’s end-to-end vertical integration model has made it a global leader at the intersection of alternative energy and automobiles.

Dataset

A fictional datasets sourced from Kaggle, hosting 15,000 rows and 10 columns with the structure listed below:

Kaggle Hr Analytics Job Prediction.

Variable Description
satisfaction_level Employee-reported job satisfaction level [0–1]
last_evaluation Score of employee's last performance review [0–1]
number_project Number of projects employee contributes to
average_monthly_hours Average number of hours employee worked per month
time_spend_company How long the employee has been with the company (years)
Work_accident Whether or not the employee experienced an accident while at work
left Whether or not the employee left the company
promotion_last_5years Whether or not the employee was promoted in the last 5 years

Scenario

Currently, there is a high rate of turnover among Salifort employees. (Note: In this context, turnover data includes both employees who choose to quit their job and employees who are let go). Salifort’s senior leadership team is concerned about how many employees are leaving the company. Salifort strives to create a corporate culture that supports employee success and professional development. Further, the high turnover rate is costly in the financial sense. Salifort makes a big investment in recruiting, training, and upskilling its employees.

If Salifort could predict whether an employee will leave the company, and discover the reasons behind their departure, they could better understand the problem and develop a solution. Therefore, discover what’s likely to make the employee leave the company.

As a first step, the leadership team asks Human Resources to survey a sample of employees to learn more about what might be driving turnover.

Next, the leadership team asks you to analyze the survey data and come up with ideas for how to increase employee retention. To help with this, they suggest you design a model that predicts whether an employee will leave the company based on their job title, department, number of projects, average monthly hours, and any other relevant data points. A good model will help the company increase retention and job satisfaction for current employees, and save money and time training new employees.

As a specialist in data analysis, the leadership team leaves it up to you to choose an approach for building the most effective model to predict employee departure. For example, you could build and evaluate a statistical model such as logistic regression. Or, you could build and evaluate machine learning models such as decision tree, random forest, and XGBoost. Or, you could choose to deploy both statistical and machine learning models.

For any approach, you’ll need to analyze the key factors driving employee turnover, build an effective model, and share recommendations for next steps with the leadership team.

Deliverables

For this deliverable, you are asked to choose a method to approach this data challenge based on your prior course work.

  • Select either a regression model or a tree-based machine learning model to predict whether an employee will leave the company.

  • Identify the top features that impact employees to leave the company

Stakeholders

  • Salifort Leadership team & Executive team

    • High overview of project and summaries
  • Human Resources Department

    • Coworkers & Management
      • Mid-level overview if required

Reproduce

cd clone https://github.com/Mitch-P-Analyst/Salifort-Motors.git
pip install -r requirements.txt
python notebooks/01_notebook.ipynb

Repository Structure

├─ Data/                    # Raw Data
├─ Models/                  # saved XGBoost Model
├─ notebooks/               # Singular notebook
│  ├─ 01_notebook.ipynb
├─ Outputs/                 # Chart and CSV Outputs
├─ .gitignore
├─ Executive_Summary.pdf    # Stakeholder One-Page Summary    
└─ README.md                # This file

Project Process

PACE: Planning

The provided Kaggle dataset contain 15,000 rows across 10 variable columns. Initial analysis identified necessary data structure, types and values to further investigate trends.

  • Exploratory Data Analysis (EDA)
    • Cleaned provided dataset, identifying structural and information limitations
      • Duplicates / Null Values / Datatypes / Outliers

PACE: Analyze

Visual analysis of variable trends was conducted with Matplotlib and Seaborn over a series of 5 figures. Observations were conducted between independent variables and target variable left, cross-variable relationships, satisfaction_level distributions and further associations.

Inference from these visualisations in the Analyze stage show departed employees with low levels of satisfaction were assigned to 6 or 7 projects. The majority of whom are employees with 4 years of tenure, working higher average monthly hours. These observed associations became a focal point for further analysis as I discover what factors are leading to employee attrition.

PACE: Construct

To identify variable associations with the target variable left, two classification prediction models were produced to identify potential relationships and assess utility for future Human Resources assessments on at-risk employees.

  • Clear baseline model

    • Logistic Regression
      • Poor prediction results
      • Identified strong odds-ratios for independent variables
        • Satisfaction level increase of 0.1 is associated with ≈33% lower odds of leaving
        • Each additional year of tenure is associated with ≈30% higher odds of leaving
        • A promotion in the last five years reduces the odds of leaving by roughly ≈76%.
  • Descriptive precise metrics

    • XGBoost
      • Cross Validation performed to assess best hyperparameter metrics
        • Very strong prediction results
      • Identified strong Feature Importance assessed by Gain. (Gain is the average reduction in log loss from splits using a feature (averaged over all trees). Higher gain ⇒ feature splits reduce log loss more.)
        • Satisfaction Level
          • 44.0%
        • Number of Projects
          • 23.3%
        • Tenure
          • 13.8%
        • Previous Satisfaction Level
          • 9.6%
        • Average Monthly Hours
          • 6.9%
        • All Department Types
          • ≈2.5%

Interpretation: These percentages are not effect sizes; they show how often and how usefully a feature was used across the boosted trees to reduce log-loss.

  • Prediction Evaluation Metrics

model_predicition_metrics

As seen in model metrics, the XGBoost Classification model produced extremely high evaluation metrics. Higher than the Logistic Regression model across metrics Precision, Recall, F1, Accuracy.

Given this consistent improvement across all metrics, the XGBoost is the preferred model for predicting future employee attrition at Salifort Motors.

PACE: Execute

Conclusion

To conclude, after combining perspectives from these models, I identified a consistent pattern.

Employees working on higher number of projects, a workload with higher monthly hours, longer tenure, lower satisfaction, are much more likely to leave.

Departed employees with the lowest satisfaction levels were assigned to 6 or 7 different projects. Examining all employees, past and present, from data provided, employees assigned to 6 or 7 projects are associated with the highest average monthly hours and are tenured employees around ≈4 years.

Employees assigned to 7 projects have an attrition rate of 100%

Project_statistics

Conclusion_Charts

These results point to clear opportunities for improving attrition rates by managing workload of tenured employees of ~3 - 4 years to address satisfaction levels.

Recommendations

Reflecting upon the above visualisations and numbers presented in Project Statistics, this investigation recommends addressing the key variables that are supported by analysis to associate with employee departure.

  • Rebalance Project Assignment
    • Employees with the highest levels of mean Satisfaction Levels are assigned to 4 or 5 projects. Restrict employees to a maximum of 5 projects.
  • Restrict Employee Hours
    • Restrict Average Monthly Hours for employees to the ranges presented by most satisfied employees, a maximum of 230 hours.

Limitations & Next Steps

Further steps to improve this investigation may include

  • Sourcing information on Type of Employee Departure.
    • Fired
    • Redundant
    • Resignation

The data shows significant quantities of employees departed with medium and high satisfaction levels, primarily from collection of employees assigned to 2 project and in the departments of sales and human resources.

Distinguishing between how employees left Salifort may assist in analyzing where employee resources may be reassigned to compensate over-worked employees working on 6 or 7 projects.

  • Assess Employee Previous Satisfaction Levels
    The XGBoost model identifies Previous Satisfaction Levels as a key indicator towards predicting an employees eventual departure. Analysis of threshold upon satisfaction could provide potential risk management of employee attrition to monitor work environment and maintain sufficient satisfaction levels.
Github Repository
Visit Website
Arrow