Step-by-Step Case Study: Supply Chain Risk Analysis in Engineering Using Logistic Regression in Python
Managing supply chain risk is critical for maintaining operational resilience and keeping logistics networks efficient. This case study outlines a practical approach to Supply Chain Risk Analysis, an engineering challenge, using supervised machine learning with logistic regression. It shows how Python can be used to assess risks and, ultimately, to optimize supply chain logistics.
Understanding the Problem
Supply Chain Risk Analysis involves identifying vulnerabilities in the flow of materials, information, and finances across the network of suppliers, manufacturers, and distributors. Potential risks include supplier delays, demand fluctuations, geopolitical issues, and natural disasters. The goal is to predict where risks are likely to occur so that disruptions can be minimized.
Step 1: Defining Objectives and Collecting Data
The objective is to develop a predictive model that classifies supply chain nodes or routes as 'High Risk' or 'Low Risk' based on historical operational data.
Sample Dataset
| SupplierID | Avg_Delay_Days | Delivery_Reliability (%) | Geopolitical_Risk_Score | Inventory_Level | RiskLabel |
|---|---|---|---|---|---|
| S1 | 5 | 90 | 3 | 150 | 0 |
| S2 | 10 | 70 | 8 | 80 | 1 |
| S3 | 2 | 95 | 2 | 200 | 0 |
| S4 | 15 | 60 | 9 | 50 | 1 |
| S5 | 7 | 85 | 5 | 120 | 0 |
RiskLabel: 0 = Low Risk, 1 = High Risk
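To make the later snippets runnable, the sample table can be loaded into a pandas DataFrame. This is a minimal sketch: the variable name `data` is chosen to match the code in the following steps, and five rows are only enough to illustrate the workflow, not to train a reliable model.

```python
import pandas as pd

# The five sample rows from the table above; a real analysis needs far more data.
data = pd.DataFrame({
    'SupplierID': ['S1', 'S2', 'S3', 'S4', 'S5'],
    'Avg_Delay_Days': [5, 10, 2, 15, 7],
    'Delivery_Reliability (%)': [90, 70, 95, 60, 85],
    'Geopolitical_Risk_Score': [3, 8, 2, 9, 5],
    'Inventory_Level': [150, 80, 200, 50, 120],
    'RiskLabel': [0, 1, 0, 1, 0],  # 0 = Low Risk, 1 = High Risk
})

# Quick check of the class balance before modeling
print(data['RiskLabel'].value_counts())
```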
Step 2: Data Preprocessing and Feature Engineering
Handle missing values, scale the numeric features, and encode any categorical fields as needed with pandas and NumPy.
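One possible version of these preprocessing steps is sketched below. The `raw` frame, its missing value, and the `Region` column are hypothetical additions for illustration (the sample dataset has no missing values or categorical fields), and median imputation and standardization are only one reasonable choice among several.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical raw frame with one missing value and one categorical column
raw = pd.DataFrame({
    'Avg_Delay_Days': [5, 10, np.nan, 15, 7],
    'Inventory_Level': [150, 80, 200, 50, 120],
    'Region': ['EU', 'APAC', 'EU', 'APAC', 'AMER'],
})
numeric_cols = ['Avg_Delay_Days', 'Inventory_Level']

# 1. Fill missing numeric values with the column median
filled = raw[numeric_cols].fillna(raw[numeric_cols].median())

# 2. Standardize numeric features to zero mean and unit variance
scaled = pd.DataFrame(
    StandardScaler().fit_transform(filled),
    columns=numeric_cols,
    index=raw.index,
)

# 3. One-hot encode the categorical column and combine everything
encoded = pd.get_dummies(raw['Region'], prefix='Region')
clean = pd.concat([scaled, encoded], axis=1)

print(clean.head())
```

In a real pipeline the scaler would be fitted on the training split only and then applied to the test split, so that no information from the test data leaks into training.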
Step 3: Splitting Data for Model Training and Testing
We split the dataset into training and testing sets:
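```python
from sklearn.model_selection import train_test_split

# `data` is a pandas DataFrame with the columns of the sample dataset
X = data[['Avg_Delay_Days', 'Delivery_Reliability (%)',
          'Geopolitical_Risk_Score', 'Inventory_Level']]
y = data['RiskLabel']

# Hold out 30% of the rows for testing; the fixed seed makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
```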
Step 4: Building the Logistic Regression Model
Fit the logistic regression classifier:
```python
from sklearn.linear_model import LogisticRegression

# Default settings apply L2 regularization with C=1.0
model = LogisticRegression()
model.fit(X_train, y_train)
```
Logistic Regression Equation
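The logistic regression predicts the probability of High Risk using:

$$P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4)}}$$

where
$x_1$ = Avg_Delay_Days
$x_2$ = Delivery_Reliability
$x_3$ = Geopolitical_Risk_Score
$x_4$ = Inventory_Level
$\beta_0, \beta_1, \ldots, \beta_4$ are model coefficients learned during training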
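The link between the fitted model and the logistic function can be checked numerically. The sketch below fits a classifier on the five sample rows (far too few for a meaningful model, used here only for illustration) and recomputes the probabilities by hand from `intercept_` and `coef_`. The names `clf`, `p_manual`, and `p_sklearn` are assumptions of this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Feature rows from the sample dataset:
# delay (days), reliability (%), geopolitical score, inventory level
X = np.array([
    [5, 90, 3, 150],
    [10, 70, 8, 80],
    [2, 95, 2, 200],
    [15, 60, 9, 50],
    [7, 85, 5, 120],
], dtype=float)
y = np.array([0, 1, 0, 1, 0])

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Linear score z = intercept + sum of coefficient * feature,
# then the logistic function P(High Risk) = 1 / (1 + exp(-z))
z = clf.intercept_[0] + X @ clf.coef_[0]
p_manual = 1.0 / (1.0 + np.exp(-z))

# Column 1 of predict_proba is the probability of class 1 (High Risk)
p_sklearn = clf.predict_proba(X)[:, 1]
print(np.allclose(p_manual, p_sklearn))
```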
Step 5: Evaluating Model Performance with Results Table
| Metric | Value |
|---|---|
| Accuracy | 0.85 |
| Precision | 0.80 |
| Recall | 0.75 |
| F1-Score | 0.77 |
Confusion Matrix:
| | Predicted Low Risk | Predicted High Risk |
|---|---|---|
| Actual Low Risk | 40 | 5 |
| Actual High Risk | 7 | 18 |
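The standard metrics can be recomputed directly from the confusion matrix counts, which is a useful sanity check. The sketch below treats High Risk as the positive class. The counts give values slightly below the rounded figures in the metrics table, so the two tables are best read as illustrative rather than as output of the same run.

```python
# Counts read from the confusion matrix (positive class = High Risk)
tn, fp = 40, 5   # actual Low Risk row
fn, tp = 7, 18   # actual High Risk row

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # share of predicted High Risk that is truly High Risk
recall = tp / (tp + fn)      # share of actual High Risk that the model catches
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
# accuracy=0.829 precision=0.783 recall=0.720 f1=0.750
```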
Step 6: Visualization - ROC Curve
```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# Predicted probability of the positive class (High Risk)
y_probs = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_probs)
roc_auc = auc(fpr, tpr)

plt.figure()
plt.plot(fpr, tpr, color='blue', lw=2, label=f'ROC curve (area = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='grey', lw=1, linestyle='--')  # chance-level diagonal
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic - Supply Chain Risk Model')
plt.legend(loc='lower right')
plt.show()
```
Interpretation:
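The ROC curve shows the trade-off between sensitivity (True Positive Rate) and specificity (1 - False Positive Rate) as the classification threshold varies.
An AUC (Area Under the Curve) of 0.85 means that a randomly chosen high-risk node receives a higher predicted risk than a randomly chosen low-risk node about 85% of the time, which indicates strong predictive performance (0.5 corresponds to random guessing).
At that level, the model distinguishes effectively between high-risk and low-risk supply chain nodes.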
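AUC can also be read as a ranking probability: the chance that a randomly chosen high-risk node is scored above a randomly chosen low-risk node. The sketch below verifies this on a small set of hypothetical labels and scores (not output from the model in this case study) by comparing a pairwise count with scikit-learn's `roc_auc_score`.

```python
from itertools import product
from sklearn.metrics import roc_auc_score

# Hypothetical labels (1 = High Risk) and predicted risk scores
y_true = [0, 0, 0, 1, 1]
scores = [0.1, 0.4, 0.6, 0.5, 0.9]

pos = [s for s, label in zip(scores, y_true) if label == 1]
neg = [s for s, label in zip(scores, y_true) if label == 0]

# Fraction of (high-risk, low-risk) pairs ranked correctly; ties count as half
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
pairwise_auc = wins / (len(pos) * len(neg))

print(f"{pairwise_auc:.3f}")                    # 0.833
print(f"{roc_auc_score(y_true, scores):.3f}")   # 0.833
```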
Step 7: Interpreting Model Coefficients
Suppose the logistic regression coefficients are:
| Feature | Coefficient ($\beta$) |
|---|---|
| Intercept ($\beta_0$) | -4.0 |
| Avg_Delay_Days | 0.3 |
| Delivery_Reliability | -0.05 |
| Geopolitical_Risk_Score | 0.8 |
| Inventory_Level | -0.01 |
Interpretation:
Higher average delay and geopolitical risk increase the probability of high risk.
Higher delivery reliability and inventory levels decrease risk.
These insights help prioritize mitigation strategies focusing on delay reduction and geopolitical contingencies.
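To make the coefficients concrete, they can be converted to odds ratios and applied to individual suppliers. The sketch below uses the illustrative coefficients from the table above, not fitted values, and scores suppliers S1 and S2 from the sample dataset. The helper name `high_risk_probability` is an assumption of this example.

```python
import math

# Illustrative coefficients from the table above (not fitted values)
intercept = -4.0
coefs = {
    'Avg_Delay_Days': 0.3,
    'Delivery_Reliability': -0.05,
    'Geopolitical_Risk_Score': 0.8,
    'Inventory_Level': -0.01,
}

# Odds ratio: multiplicative change in the odds of High Risk
# for a one-unit increase in the feature, holding the others fixed
odds_ratios = {name: math.exp(beta) for name, beta in coefs.items()}

def high_risk_probability(features):
    """Apply the logistic function to one supplier's linear score."""
    z = intercept + sum(coefs[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

s1 = {'Avg_Delay_Days': 5, 'Delivery_Reliability': 90,
      'Geopolitical_Risk_Score': 3, 'Inventory_Level': 150}
s2 = {'Avg_Delay_Days': 10, 'Delivery_Reliability': 70,
      'Geopolitical_Risk_Score': 8, 'Inventory_Level': 80}

print(f"{high_risk_probability(s1):.3f}")  # 0.002
print(f"{high_risk_probability(s2):.3f}")  # 0.750
```

With these values, each additional day of average delay multiplies the odds of High Risk by about 1.35, and each additional point of geopolitical risk score by about 2.23. The computed probabilities agree with the sample labels: S1 is Low Risk and S2 is High Risk.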
Step 8: Implementing Risk-aware Logistics Optimization
Using risk predictions:
Route shipments avoiding high-risk suppliers or regions.
Adjust inventory buffers dynamically in risky supply nodes.
Schedule audits or supplier development programs based on risk scores.
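The actions above could be driven by a simple rule that maps each predicted probability to a response. The sketch below is one hypothetical way to do this: the thresholds (0.7 and 0.3), the action names, and the example scores are all assumptions, and in practice they would be tuned to the cost of a disruption versus the cost of mitigation.

```python
def recommend_action(risk_probability, high=0.7, medium=0.3):
    """Map a predicted High Risk probability to a logistics action.

    The thresholds are hypothetical placeholders, not recommended values.
    """
    if risk_probability >= high:
        return "reroute shipments and raise inventory buffer"
    if risk_probability >= medium:
        return "schedule supplier audit"
    return "no action"

# Hypothetical risk scores, e.g. taken from model.predict_proba(X)[:, 1]
scores = {'S1': 0.02, 'S2': 0.75, 'S5': 0.35}

# Review the riskiest suppliers first
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
plan = {supplier: recommend_action(p) for supplier, p in ranked}

for supplier, action in plan.items():
    print(supplier, "->", action)
```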
Conclusion
This case study demonstrates how logistic regression with Python can effectively perform supply chain risk analysis in engineering. The structured approach from data preprocessing to predictive modeling and interpretation provides actionable insights to optimize logistics, reduce disruptions, and strengthen supply chain resilience.