BackOrder Prediction Model

Posted Aug 12, 2021 Updated Oct 20, 2023

By Chinmay Maganur 2 min read

Goal: Classify whether each product will go into backorder (Yes or No), based on historical data of 1.9M records with 24 features from inventory, supply chain, and sales.

Introduction

Backorders are unavoidable, but by anticipating which items will be backordered, planning can be streamlined at several levels, preventing unexpected strain on production, logistics, and transportation. ERP systems generate a lot of (mostly structured) data and hold a large amount of historical data; if this data is properly utilized, a predictive model can be built to forecast backorders so the business can plan accordingly. The task is to classify products as going into backorder (Yes or No) based on past inventory, supply chain, and sales data.

Proposed Solution

The proposed solution is to use machine learning algorithms to predict backorders. Given features such as inventory quantity, previous performance, minimum_balance, forecast_sales, and actual_sales as inputs from the web app, the trained classification model predicts whether a product will go into backorder. Here, we used a Random Forest Classifier for prediction. However, establishing a baseline model first is important, since it tells us how much better the other models perform compared to it.
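To illustrate why a baseline matters, here is a minimal sketch that compares a majority-class baseline with a Random Forest. It assumes scikit-learn and uses synthetic data from `make_classification` with roughly 10% positives as a stand-in for the real backorder data; the dataset sizes and `random_state` values are arbitrary choices, not the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the backorder data (~10% positives).
X, y = make_classification(n_samples=5000, n_features=10, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

def f1_of(model):
    # Fit on the training split and score F1 for the positive (backorder) class.
    model.fit(X_train, y_train)
    return f1_score(y_test, model.predict(X_test), zero_division=0)

# Baseline: always predicts the majority class ("No backorder").
baseline_f1 = f1_of(DummyClassifier(strategy="most_frequent"))
rf_f1 = f1_of(RandomForestClassifier(n_estimators=100, random_state=42))
print(f"baseline F1={baseline_f1:.3f}  random forest F1={rf_f1:.3f}")
```

On data this imbalanced, the majority-class baseline would reach roughly 90% accuracy while its F1 on the backorder class is 0, which is why a baseline helps calibrate how good a model's score really is.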

Tech Stack Used

Design Details

Process Flow

Deployment Flow

Experiments

Data Cleaning

Our dataset contained 1.9M rows and 24 columns describing the sales forecast for months 1, 3, 6, and 9, average sales performance, transit time, stock keeping unit (SKU), etc.
We checked for null values and found that the Lead Time column had 11K missing values. Its distribution was skewed, so we opted for median imputation; for columns with a normal distribution, we used mean imputation.
The Inventory column had some negative values; since inventory cannot be negative, we replaced them with 0.
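A minimal pandas sketch of these cleaning steps. The column names `lead_time`, `sales` and `inventory` are hypothetical, and the rule "skewness above 1 counts as skewed" is an assumed threshold, not one stated by the project.

```python
import numpy as np
import pandas as pd

def clean(df: pd.DataFrame, skew_threshold: float = 1.0) -> pd.DataFrame:
    """Impute missing numeric values and clip negative inventory (sketch)."""
    df = df.copy()
    for col in df.select_dtypes(include="number").columns:
        if df[col].isna().any():
            # Skewed column -> median (robust to the long tail); otherwise -> mean.
            skewed = abs(df[col].skew()) > skew_threshold
            fill = df[col].median() if skewed else df[col].mean()
            df[col] = df[col].fillna(fill)
    # Inventory cannot be negative, so negative values are replaced with 0.
    # "inventory" is a hypothetical column name.
    df["inventory"] = df["inventory"].clip(lower=0)
    return df

raw = pd.DataFrame({
    "lead_time": [2.0, 4.0, np.nan, 8.0, 52.0],   # right-skewed -> median
    "sales": [10.0, 20.0, 30.0, np.nan, 40.0],    # symmetric -> mean
    "inventory": [10, -3, 0, 25, -1],
})
cleaned = clean(raw)
print(cleaned)
```

Here the skewed `lead_time` gap is filled with the median (6.0) rather than the mean (16.5), which the single large value pulls upward.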

EDA

We wanted to know how the target column was distributed. On analyzing the data, we found that the dataset was heavily imbalanced: less than 10% of the records were backordered and more than 90% were not.
As we analyzed the data further, we found some interesting relationships, such as sales vs. forecast for the backorder and not_backorder classes. The sales and forecast columns appear to be directly proportional for the not_backorder data.
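A small sketch of both checks: the class shares of the target and the sales/forecast correlation computed separately per class. The column names `went_on_backorder`, `sales_3_month` and `forecast_3_month` are assumptions, and the toy frame is made up purely for illustration.

```python
import pandas as pd

def class_share(df, target="went_on_backorder"):
    # Fraction of rows per class; an imbalanced target shows one dominant class.
    return df[target].value_counts(normalize=True)

def sales_forecast_corr_by_class(df, target="went_on_backorder",
                                 sales="sales_3_month", forecast="forecast_3_month"):
    # Pearson correlation between sales and forecast, computed within each class.
    return {cls: grp[sales].corr(grp[forecast]) for cls, grp in df.groupby(target)}

toy = pd.DataFrame({
    "went_on_backorder": ["No"] * 8 + ["Yes"] * 2,
    "forecast_3_month": [10, 20, 30, 40, 50, 60, 70, 80, 50, 10],
    "sales_3_month":    [20, 40, 60, 80, 100, 120, 140, 160, 5, 40],
})
shares = class_share(toy)
corrs = sales_forecast_corr_by_class(toy)
print(shares.to_dict())
print(corrs)
```

A correlation close to 1 within the not_backorder class is what "directly proportional" looks like numerically; a scatter plot per class shows the same relationship visually.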

We found that the data had extreme outliers. To handle them, we applied the 3-standard-deviation rule, which, for normally distributed data, retains about 99.7% of the values.
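A minimal sketch of the 3-standard-deviation rule as a row filter, assuming pandas; `sales_3_month` is a hypothetical column name and the data are synthetic.

```python
import numpy as np
import pandas as pd

def drop_3sigma_outliers(df, cols, k=3.0):
    """Keep only rows whose values in `cols` all lie within mean +/- k * std."""
    keep = pd.Series(True, index=df.index)
    for col in cols:
        mu, sigma = df[col].mean(), df[col].std()
        keep &= (df[col] - mu).abs() <= k * sigma
    return df[keep]

rng = np.random.default_rng(0)
# 1000 "normal" values around 100 plus one extreme outlier.
values = np.append(rng.normal(loc=100, scale=5, size=1000), 10_000.0)
data = pd.DataFrame({"sales_3_month": values})
filtered = drop_3sigma_outliers(data, ["sales_3_month"])
print(len(data), "->", len(filtered))
```

Note that the mean and standard deviation are computed with the outliers included, so an extreme value inflates sigma; the rule is therefore most reliable when outliers are rare, and in a train/test setup the statistics would normally come from the training split only.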
Since we had 24 columns, we wanted to know whether any columns were correlated with one another, so we plotted a correlation heatmap. We found that the sales and forecast columns were highly correlated. Keeping both types of columns adds complexity without adding much information, so we decided to use only one of them.
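The heatmap decision can also be automated. This sketch greedily drops the second column of every pair whose absolute correlation exceeds a threshold; the 0.9 threshold, the "keep the first column" rule and the column names are assumptions, not the project's documented choices.

```python
import pandas as pd

def drop_correlated(df, threshold=0.9):
    """Drop one column from each highly correlated pair (greedy sketch)."""
    corr = df.corr().abs()
    cols = list(corr.columns)
    to_drop = set()
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            # Keep the earlier column `a`, drop the later column `b`.
            if a not in to_drop and b not in to_drop and corr.loc[a, b] > threshold:
                to_drop.add(b)
    dropped = sorted(to_drop)
    return df.drop(columns=dropped), dropped

df = pd.DataFrame({
    "sales": [1, 2, 3, 4, 5],
    "forecast": [2.1, 3.9, 6.2, 8.0, 9.9],  # nearly 2x sales
    "lead_time": [8, 2, 6, 1, 9],           # unrelated
})
reduced, dropped = drop_correlated(df)
print(dropped, list(reduced.columns))
```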

Statistical Tests

VIF: To check for multicollinearity and choose the final numerical columns, we used the Variance Inflation Factor.
Chi2: To check whether each categorical column has an effect on the target. If it does, including it in the model will help the model.
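Both tests can be sketched in a few lines, assuming numpy, pandas and SciPy. The VIF of column j is 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the other columns; this equals the j-th diagonal entry of the inverse correlation matrix, which is what the sketch uses. The chi-squared test uses `scipy.stats.chi2_contingency` on a category-by-target table. All column names and data are made up.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def vif(numeric: pd.DataFrame) -> pd.Series:
    # VIF_j = 1 / (1 - R_j^2) = j-th diagonal entry of the inverse correlation matrix.
    corr = np.corrcoef(numeric.to_numpy(dtype=float), rowvar=False)
    return pd.Series(np.diag(np.linalg.inv(corr)), index=numeric.columns)

def chi2_pvalue(df: pd.DataFrame, feature: str, target: str) -> float:
    # Contingency table of category vs. target, then the chi-squared independence test.
    table = pd.crosstab(df[feature], df[target])
    _, p_value, _, _ = chi2_contingency(table)
    return float(p_value)

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
numeric = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.1, size=500),  # nearly a copy of x1
    "x3": rng.normal(size=500),                   # independent
})
vifs = vif(numeric)

cats = pd.DataFrame({
    "related": ["A"] * 50 + ["B"] * 50,
    "unrelated": ["C", "D"] * 50,
    "target": ["Yes"] * 45 + ["No"] * 5 + ["No"] * 45 + ["Yes"] * 5,
})
p_related = chi2_pvalue(cats, "related", "target")
p_unrelated = chi2_pvalue(cats, "unrelated", "target")
print(vifs.round(1).to_dict(), p_related, p_unrelated)
```

A common rule of thumb flags VIF values above 5 or 10 as problematic; for the chi-squared test, a small p-value (e.g. below 0.05) suggests the categorical column is associated with the target.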

Modeling

We used Random Forest, Support Vector Classifier, Decision Tree, and XGBoost (XGB) Classifier in our experiments.
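One way to run such a comparison is cross-validated F1 for each candidate. This sketch assumes scikit-learn and synthetic imbalanced data; `XGBClassifier` from the `xgboost` package follows the scikit-learn estimator interface and could be added to `candidates` as another entry, but it is left out here so the sketch runs with scikit-learn alone.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the backorder data.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=0)

candidates = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svc": make_pipeline(StandardScaler(), SVC()),  # SVC is sensitive to feature scale
    "decision_tree": DecisionTreeClassifier(random_state=0),
}
# Stratified folds keep the ~10% positive rate in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {name: cross_val_score(model, X, y, cv=cv, scoring="f1").mean()
          for name, model in candidates.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:15s} mean F1 = {score:.3f}")
```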

We chose XGB as the final model and tuned its hyperparameters.
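A minimal grid-search sketch of the tuning step. To keep it runnable with scikit-learn alone, it uses `GradientBoostingClassifier` as a stand-in for XGB; both expose `n_estimators`, `max_depth` and `learning_rate`, and `XGBClassifier` can be passed to `GridSearchCV` the same way. The grid values and synthetic data are illustrative assumptions, not the project's search space.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

X, y = make_classification(n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=1)

# Stand-in for XGBClassifier: these three parameter names exist in both.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=1),
    param_grid,
    scoring="f1",  # optimize F1 rather than accuracy on imbalanced data
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=1),
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

For larger grids, `RandomizedSearchCV` samples a fixed number of combinations instead of trying all of them.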

Results

After tuning, the model achieved an F1-score of 95%, a recall of 93%, and a precision of 97%.
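As a quick consistency check, the F1-score is the harmonic mean of precision P and recall R, and the reported precision and recall reproduce the reported F1:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.97 \times 0.93}{0.97 + 0.93}
    = \frac{1.8042}{1.90}
    \approx 0.950
```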

Heroku Deployment

We built a Flask API and deployed it on the Heroku platform. Heroku has since removed its free tier, so the app may no longer be working.
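A minimal sketch of what such a Flask prediction endpoint could look like. The `/predict` route, the `FEATURES` list and the `create_app` factory are hypothetical names; the model is passed in already trained (for example, loaded from a pickle file), and it is assumed to return 1 for "backorder".

```python
from flask import Flask, jsonify, request

# Hypothetical feature order expected by the trained model.
FEATURES = ["national_inv", "lead_time", "forecast_3_month", "sales_3_month", "min_bank"]

def create_app(model):
    """Build a Flask app that serves predictions from an already-trained model."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(force=True)
        missing = [f for f in FEATURES if f not in payload]
        if missing:
            return jsonify(error=f"missing fields: {missing}"), 400
        # One row in the model's expected feature order.
        row = [[float(payload[f]) for f in FEATURES]]
        label = int(model.predict(row)[0])
        return jsonify(backorder="Yes" if label == 1 else "No")

    return app
```

On Heroku, such an app is typically started with gunicorn through a `Procfile` line like `web: gunicorn app:app`, assuming the module is `app.py` and exposes a module-level `app = create_app(model)`.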

Projects, Machine Learning

This post is licensed under CC BY 4.0 by the author.