BackOrder Prediction Model

Goal: Classify whether each product will go on backorder (Yes or No), using 1.9M rows of historical data with 24 features drawn from inventory, supply chain, and sales.

Working Model

Introduction

Backorders are unavoidable, but by anticipating which items will be backordered, planning can be streamlined at several levels, preventing unexpected strain on production, logistics, and transportation. ERP systems generate a lot of data (mainly structured) and also hold a lot of historical data. If this data is used properly, a predictive model can be built to forecast backorders and plan accordingly. The task here is to classify products as going into backorder (Yes or No), based on past data from inventory, supply chain, and sales.

Proposed Solution

The proposed solution is to use machine learning algorithms to predict backorders. Taking features such as inventory quantity, previous performance, minimum_balance, forecast_sales, and actual_sales as inputs from the web app, the classification model predicts whether a product will be backordered. Here, we used a Random Forest Classifier for prediction. Establishing a baseline model is also important, since it shows how well the other models perform by comparison.

Tech Stack Used

[Figure: tech stack]

Design Details

Process Flow

[Figure: process flow]

Deployment Flow

[Figure: deployment process]

Experiments

Data Cleaning

  • We had a dataset of 1.9M rows and 24 columns, describing the sales forecast for months 1, 3, 6, and 9, average sales performance, transit time, stock keeping unit (SKU), etc.
  • We checked for null values and found that the Lead Time column had 11K missing values. When we checked its distribution, we found it was skewed, so we used median imputation. For columns with a normal distribution, we used mean imputation.
  • The Inventory column had some negative numbers. Since inventory cannot be negative, we replaced negative values with 0.
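
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration on a toy frame, not the project's actual code: the column names (`lead_time`, `perf_avg`, `inventory`) and the skewness cutoff of 1.0 are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Toy frame; column names are hypothetical stand-ins for the real dataset.
df = pd.DataFrame({
    "lead_time": [2.0, 8.0, np.nan, 2.0, 52.0, np.nan],   # skewed, has gaps
    "perf_avg": [0.90, np.nan, 0.80, 0.85, 0.95, 0.90],   # roughly symmetric
    "inventory": [10, -5, 0, 300, -1, 42],                # has negatives
})

def clean(frame: pd.DataFrame, skew_threshold: float = 1.0) -> pd.DataFrame:
    """Impute missing values and clip negative inventory to zero."""
    out = frame.copy()
    for col in out.columns[out.isna().any()]:
        # Median for skewed columns, mean for roughly symmetric ones.
        # The threshold is an assumed rule of thumb, not from the post.
        if abs(out[col].skew()) > skew_threshold:
            fill = out[col].median()
        else:
            fill = out[col].mean()
        out[col] = out[col].fillna(fill)
    # Inventory cannot be negative, so negative values become 0.
    out["inventory"] = out["inventory"].clip(lower=0)
    return out

cleaned = clean(df)
print(cleaned)
```

The median is less sensitive to the long tail of a skewed column, which is why it is the safer fill value there.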

EDA

  • We wanted to know how the target column was distributed. On analyzing the data, we found that the dataset was heavily imbalanced: less than 10% of rows were backordered and more than 90% were not. [Figure: back_order_cnt]

  • As we analyzed the data, we found some interesting relations, such as sales vs. forecast for backorder and not_backorder rows. The sales and forecast columns appear to be directly proportional for not_backorder data.

[Figure: forecast_vs_sales]

  • We found that the data had extreme outliers. To handle them, we applied the 3 standard deviation rule, which keeps about 99.7% of the data when a column is normally distributed. [Figure: outlier_det]

  • Since we had 24 columns, we wanted to know whether any columns were correlated with each other, so we plotted a heatmap. We found that the Sales and Forecast columns were highly correlated. Keeping both types of column adds redundant complexity, so we decided to use only one of them. [Figure: heatmap]
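
As a rough sketch of the last two points, the snippet below applies a 3 standard deviation filter and then drops one column from each highly correlated pair. The synthetic data, the column names, and the 0.9 correlation threshold are all assumptions for illustration; the post does not state the threshold that was actually used.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
sales = rng.normal(100, 10, n)

# Synthetic stand-in data; column names are hypothetical.
df = pd.DataFrame({
    "sales_1_month": sales,
    "forecast_3_month": sales * 3 + rng.normal(0, 1, n),  # tracks sales closely
    "lead_time": rng.normal(8, 2, n),
})
df.loc[0, "lead_time"] = 500.0  # planted extreme outlier

def within_three_sigma(frame: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Keep rows where every listed column is within 3 std devs of its mean."""
    z = (frame[cols] - frame[cols].mean()) / frame[cols].std()
    return frame[(z.abs() <= 3).all(axis=1)]

def drop_correlated(frame: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop the later column of each pair whose |correlation| exceeds threshold."""
    corr = frame.corr().abs()
    # Keep only the strict upper triangle so each pair is examined once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > threshold).any()]
    return frame.drop(columns=to_drop)

filtered = within_three_sigma(df, list(df.columns))
reduced = drop_correlated(filtered)
print(len(df), len(filtered), list(reduced.columns))
```

Note that the 3 standard deviation rule assumes roughly bell-shaped columns; on heavily skewed features it can remove more or fewer rows than the nominal 0.3%.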

Statistical Tests

  • VIF: To check for multicollinearity and choose the final numerical columns, we used the Variance Inflation Factor. [Figure: vif]

  • Chi2: To check whether a categorical column has any effect on the target. If it does, including it should help the model. [Figure: chi2]
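
Both tests can be sketched in a few lines. The example below is illustrative only: the data is synthetic and the names (`x1`, `x2`, `x3`, `risk_flag`) are hypothetical. VIF is computed here from the inverse of the correlation matrix, whose diagonal equals 1 / (1 - R²) for each column; the post does not say which implementation was used. The chi-square test uses `scipy.stats.chi2_contingency` on a contingency table.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
n = 2000

# Synthetic numeric features: x2 is nearly a copy of x1, x3 is independent.
x1 = rng.normal(0, 1, n)
numeric = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(0, 0.1, n),
    "x3": rng.normal(0, 1, n),
})

def vif(frame: pd.DataFrame) -> pd.Series:
    """VIF per column, read off the diagonal of the inverse correlation matrix."""
    corr = frame.corr().to_numpy()
    return pd.Series(np.diag(np.linalg.inv(corr)), index=frame.columns)

# Synthetic target and a categorical flag that depends on it.
target = rng.random(n) < 0.1
p_yes = np.where(target, 0.8, 0.1)
risk_flag = pd.Series(np.where(rng.random(n) < p_yes, "Yes", "No"))

def chi2_pvalue(feature: pd.Series, label: pd.Series) -> float:
    """p-value of the chi-square test of independence between two categoricals."""
    table = pd.crosstab(feature, label)
    _, p_value, _, _ = chi2_contingency(table)
    return p_value

vif_scores = vif(numeric)
p_flag = chi2_pvalue(risk_flag, pd.Series(target))
print(vif_scores.round(2))
print(p_flag)
```

A common rule of thumb treats a VIF above 5 or 10 as a sign of multicollinearity, and a chi-square p-value below 0.05 as evidence that the categorical column is associated with the target.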

Modeling

  • We experimented with Random Forest, Support Vector Classifier, Decision Tree, and XGB Classifier. [Figures: base_model_code, result]
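
A comparison of this kind might look like the sketch below. It is not the project's code: the data comes from scikit-learn's `make_classification` with an assumed 90/10 class split, the majority-class `DummyClassifier` is an assumed choice of baseline, and XGB is left out so the example depends only on scikit-learn (an `XGBClassifier` from the `xgboost` package could be added to the same dictionary).

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data standing in for the backorder dataset.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.9, 0.1], class_sep=1.5, random_state=0,
)
# Stratify so both splits keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0,
)

models = {
    "baseline (most frequent)": DummyClassifier(strategy="most_frequent"),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "support vector classifier": SVC(),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # F1 on the minority (backorder) class; accuracy would be misleading here.
    scores[name] = f1_score(y_test, model.predict(X_test), zero_division=0)

for name, score in scores.items():
    print(f"{name}: F1 = {score:.3f}")
```

On imbalanced data the majority-class baseline scores high accuracy but an F1 of zero on the minority class, which is why F1, precision, and recall are the more informative metrics.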

Hyperparameter Tuning for XGB as the Final Model

[Figures: Hyperparameter, Best_para, Final Results]
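
The search procedure and parameter grid used for XGB are only shown in the figures, so the following is just a sketch of how such tuning is commonly done. It uses scikit-learn's `GradientBoostingClassifier` as a stand-in for XGB to avoid an extra dependency, and the parameter ranges, the randomized search, and the F1 scoring are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Small synthetic imbalanced dataset so the search runs quickly.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)

# Hypothetical search space; not the values from the post.
param_distributions = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3, 4],
    "learning_rate": [0.05, 0.1, 0.3],
    "subsample": [0.8, 1.0],
}

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,                 # sample 5 combinations instead of the full grid
    scoring="f1",             # optimize for the minority class
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
    random_state=0,
)
search.fit(X, y)

best_params = search.best_params_
print(best_params, round(search.best_score_, 3))
```

Stratified folds keep the class ratio stable across splits, which matters when positives are rare.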

Results

After tuning the model, we achieved an F1-score of 95%, recall of 93%, and precision of 97%.
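
As a quick consistency check, the F1-score is the harmonic mean of precision and recall, so the three reported numbers can be verified against each other:

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

f1 = f1_from(precision=0.97, recall=0.93)
print(f"{f1:.3f}")  # 0.950
```

With precision 0.97 and recall 0.93, the formula gives roughly 0.95, matching the reported F1-score.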

Heroku Deployment

  • We built a Flask API and deployed it on the Heroku platform. Heroku has since removed its free tier, so the app may no longer be working.
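
The app's source is not shown here, so the following is only a minimal sketch of what a Flask prediction endpoint could look like. The `/predict` route, the feature names, and the `predict_backorder` rule are all hypothetical; in a real app the placeholder rule would be replaced by the trained classifier loaded from disk.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical feature names; the real app's inputs may differ.
FEATURES = ["inventory", "lead_time", "forecast_sales"]

def predict_backorder(features: dict) -> str:
    """Placeholder rule standing in for the trained model's predict call."""
    return "Yes" if features["inventory"] < features["forecast_sales"] else "No"

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True) or {}
    missing = [name for name in FEATURES if name not in payload]
    if missing:
        # Reject incomplete requests instead of guessing values.
        return jsonify({"error": f"missing features: {missing}"}), 400
    features = {name: float(payload[name]) for name in FEATURES}
    return jsonify({"backorder": predict_backorder(features)})
```

Assuming the file is saved as `app.py`, the development server can be started with `flask --app app run`, and the endpoint accepts a JSON body such as `{"inventory": 5, "lead_time": 8, "forecast_sales": 40}`.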

[Figure: App_working]

This post is licensed under CC BY 4.0 by the author.