# Lean Six Sigma with Python — Chi-Squared Test

## Perform a **Chi-Squared Test** to explain a **shortage of drivers** impacting your **transportation network**

Lean Six Sigma is a method that can be defined as a stepwise approach to process improvements.

In a previous article, we used the **Kruskal-Wallis Test** to verify the hypothesis that a specific training positively impacts operators' **Inbound VAS productivity**.

In this article, we will implement the **Chi-Squared Test** with Python to understand if **transportation delays** are due to a **bad allocation of drivers**.

**SUMMARY**

**I. Problem Statement**

Are transportation delays due to driver allocation issues?

**II. Data Analysis**

**1. Exploratory Data Analysis**

Explore a sample of historical shipment records with Python.

**2. Perform Cross Tabulation**

Summarise the relationship between several categorical variables.

**3. Pearson’s Chi-Square Test**

Validate that your results are significant and not due to random fluctuation

**III. Conclusion**

# I. Problem Statement

## 1. Scenario

You are the **Inbound Transportation Manager** of a **small factory** in the United States.

Your transportation network is simple; you have two routes:

- **Route 1:** coming from your northern regional hub *(with difficult road conditions and busy traffic)*
- **Route 2:** coming from your southern regional hub *(with no traffic and a beautiful modern road)*

Transportation is managed by an external service provider with a fleet of three trucks (with three different drivers: D1, D2, D3).

**Replenishment Process**

- The factory sends a replenishment order to your ERP
- The southern regional hub receives the order first
- **If** the stock in the southern hub is too low, **then** the order is transferred to the northern hub
- The **ERP sends a pick-up request** to the transportation service provider *(from the selected hub to the factory)*
- **The first driver accepting the request** delivers the raw materials to the factory
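The replenishment steps above can be sketched as two small routing rules. This is a hypothetical illustration, not the provider's actual system: the function names, the hub labels, and the assumption that "stock too low" means the southern hub cannot cover the full order quantity are all invented here.

```python
# Hypothetical sketch of the replenishment routing described above.
# Assumption: "stock too low" means the southern hub cannot cover the full order quantity.


def select_hub(south_stock: int, order_qty: int) -> str:
    """Return the hub that ships the replenishment order."""
    # The southern regional hub is always checked first
    if south_stock >= order_qty:
        return "SOUTH HUB"
    # Otherwise the order is transferred to the northern regional hub
    return "NORTH HUB"


def first_accepting_driver(acceptance_delays: dict[str, float]) -> str:
    """Return the driver who accepted the pick-up request first.

    `acceptance_delays` maps a driver ID (D1, D2, D3) to the time, in minutes,
    that driver took to accept the request (hypothetical input).
    """
    return min(acceptance_delays, key=acceptance_delays.get)


hub = select_hub(south_stock=50, order_qty=80)
driver = first_accepting_driver({"D1": 12.0, "D2": 45.0, "D3": 30.0})
print(hub, driver)  # NORTH HUB D1
```

Since the customer has no visibility into driver allocation, the second function only models the visible outcome (whoever accepts first delivers), not how the provider dispatches requests.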

*P.S.: As a customer, we have no visibility into the driver allocation process.*

**Problem:** When an order is allocated to the northern regional hub, the lead time to get the pick-up request accepted is 35% higher than for the southern hub.

**Question:** Are some drivers avoiding being allocated to the north route as much as possible?

**Experiment:** We analyzed the shipments of **the last 18 months** to build a sample of **269 records**.

# II. Data Analysis

## 1. Exploratory Data Analysis

## 2. Perform Cross Tabulation

A cross-tabulation of the data can provide insights and help us discover a potential pattern in how drivers are allocated to each hub.

Example: 82.65% of the shipments handled by Driver 1 are from the SOUTH HUB (percentages computed per driver).

Example: 38.89% of the shipments from the SOUTH HUB are handled by Driver 1 (percentages computed per hub).
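The two example percentages above come from two different normalizations of the same cross-tabulation. Here is a minimal pandas sketch, assuming a DataFrame with one row per shipment and hypothetical columns `DRIVER` and `HUB`. The toy rows are invented for illustration and are not the article's 269 records.

```python
import pandas as pd

# Toy shipment log for illustration only (NOT the article's 269 records).
# Column names "DRIVER" and "HUB" are assumptions about the dataset layout.
shipments = pd.DataFrame({
    "DRIVER": ["D1", "D1", "D1", "D2", "D2", "D3", "D3", "D3"],
    "HUB": ["SOUTH HUB", "SOUTH HUB", "NORTH HUB", "SOUTH HUB",
            "NORTH HUB", "SOUTH HUB", "SOUTH HUB", "NORTH HUB"],
})

# Raw counts: one row per driver, one column per hub, plus "All" totals
counts = pd.crosstab(shipments["DRIVER"], shipments["HUB"], margins=True)

# Share of each driver's shipments coming from each hub (each row sums to 100 %)
by_driver = pd.crosstab(shipments["DRIVER"], shipments["HUB"], normalize="index") * 100

# Share of each hub's shipments handled by each driver (each column sums to 100 %)
by_hub = pd.crosstab(shipments["DRIVER"], shipments["HUB"], normalize="columns") * 100

print(counts)
print(by_driver.round(2))
print(by_hub.round(2))
```

`normalize="index"` answers questions like the first example (what share of Driver 1's shipments come from the SOUTH HUB), while `normalize="columns"` answers questions like the second (what share of the SOUTH HUB's shipments Driver 1 handles).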

**Minitab**

Menu: Stat > Tables > Cross Tabulation and Chi-Square

## 3. Pearson’s Chi-Squared Test

The first table is also called a contingency table. In statistics, contingency tables are used to summarise the relationship between several categorical variables.

We’ll use the Chi-Squared Test to calculate the **p-value** and determine whether the relationship between the two variables (driver and hub) is statistically significant.
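For reference, the standard Pearson statistic compares each observed count $O_{ij}$ with the count $E_{ij}$ expected if driver and hub were independent, where $R_i$ and $C_j$ are the row and column totals of the contingency table and $N$ is the total number of shipments:

$$
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\text{df} = (r - 1)(c - 1)
$$

With $r = 2$ hubs and $c = 3$ drivers, $\text{df} = (2 - 1)(3 - 1) = 2$. Under the null hypothesis of independence, $\chi^2$ approximately follows a chi-squared distribution with 2 degrees of freedom, and the p-value is the probability of obtaining a value at least as large as the observed one.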

`p-value is 0.410`

**Conclusion**

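Because the p-value (0.410) is greater than the 0.05 significance level, we cannot reject the null hypothesis that driver and hub are independent: there is no significant evidence that driver allocation is linked to the hub.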
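The same decision can be phrased with the test statistic. Assuming the table is exactly 2 hubs × 3 drivers, the test has 2 degrees of freedom, and for 2 degrees of freedom the chi-squared upper-tail probability has the closed form $P(X \ge x) = e^{-x/2}$. This lets us recover the critical value at α = 0.05 and the statistic implied by the reported (rounded) p-value without any statistics library. The sketch below uses only the standard library:

```python
import math

# Assumption: a 2 x 3 table (2 hubs x 3 drivers) -> (2-1)*(3-1) = 2 degrees of freedom.
# For 2 degrees of freedom, P(X >= x) = exp(-x / 2), so x = -2 * ln(p).

ALPHA = 0.05
P_VALUE = 0.410  # p-value reported by the test (rounded)

critical_value = -2 * math.log(ALPHA)    # value with an upper-tail area of ALPHA
chi2_statistic = -2 * math.log(P_VALUE)  # statistic implied by the rounded p-value

print(f"critical value (alpha={ALPHA}): {critical_value:.3f}")
print(f"implied chi-squared statistic: {chi2_statistic:.3f}")

if P_VALUE < ALPHA:
    print("Reject H0: driver allocation depends on the hub")
else:
    print("Fail to reject H0: no significant link between driver and hub")
```

The two checks are equivalent: the p-value exceeds 0.05 exactly when the statistic (about 1.78 here) falls below the critical value (about 5.99).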

**Code**
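A minimal sketch of this test with SciPy's `chi2_contingency`, which runs Pearson's chi-squared test of independence on a table of observed counts. The counts below are invented for illustration and are not the article's data, so the resulting p-value differs from the 0.410 reported above.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical observed counts (NOT the article's data):
# rows = hubs, columns = drivers D1, D2, D3.
observed = np.array([
    [40, 35, 45],  # SOUTH HUB
    [20, 25, 15],  # NORTH HUB
])

# Pearson's chi-squared test of independence between hub and driver.
# The default Yates continuity correction only applies when dof == 1,
# so it has no effect on this 2 x 3 table.
chi2_stat, p_value, dof, expected = chi2_contingency(observed)

ALPHA = 0.05
print(f"chi2 = {chi2_stat:.3f}, dof = {dof}, p-value = {p_value:.3f}")
print("Expected counts under independence:")
print(expected.round(2))
print("Reject H0" if p_value < ALPHA else "Fail to reject H0")
```

With these toy counts, every expected count is 40 for the south row and 20 for the north row, giving χ² = 3.75 with 2 degrees of freedom (p ≈ 0.153), so this toy table would not be significant at 0.05 either.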

# III. Conclusion

This analysis did not support our initial feeling that some drivers deliberately avoid the northern hub.

Therefore, we need to perform a deeper root cause analysis to understand why we have a longer lead time to find a driver for replenishment from this hub.

