A Robust Machine Learning Model for Patient Outcome Forecasting
Liver cirrhosis is a progressive disease with a high mortality rate. Predicting patient survival status, that is, whether a patient will die (D), be censored due to a liver transplant (C), or be censored without a transplant (O), is crucial for timely clinical intervention and resource allocation. This project applies classification techniques to clinical data to provide early prognostic assessments.
| Metric | Value | Description |
|---|---|---|
| Best Classifier | LogReg | Logistic Regression, Balanced |
| Best CV Score | 0.7436 | Cross-Validation Performance |
| Test Accuracy | 69.05% | Final Classification Performance |
| Test F1-Score | 0.693 | Harmonic mean of Precision/Recall |
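Six classifier variants (Logistic Regression, SVM, and Ridge Classifier, each in an original and a balanced configuration) were benchmarked with cross-validation. The balanced variants are intended to mitigate the effects of class imbalance, while cross-validation gives a more robust performance estimate than a single train/test split. The models were ranked by their F1-weighted cross-validation score.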
| Rank | Model Variant | Best CV Score (F1-Weighted) |
|---|---|---|
| 1 | Logistic Regression (Balanced) | 0.7436 |
| 2 | Logistic Regression (Original) | 0.7377 |
| 3 | SVM (Original) | 0.7275 |
| 4 | SVM (Balanced) | 0.7185 |
| 5 | Ridge Classifier (Original) | 0.7180 |
| 6 | Ridge Classifier (Balanced) | 0.7180 |
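The benchmark above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: it assumes scikit-learn, assumes that "Balanced" means `class_weight="balanced"`, uses synthetic data from `make_classification` in place of the clinical dataset, and omits any hyperparameter search. The helper names `build_variants` and `rank_variants` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the clinical data: 3 imbalanced classes, 19 features.
X, y = make_classification(
    n_samples=400, n_features=19, n_informative=8, n_classes=3,
    weights=[0.55, 0.35, 0.10], random_state=0,
)


def build_variants():
    """Return the six named estimators.

    Assumption: 'Balanced' is modelled here as class_weight='balanced'.
    """
    variants = {}
    for label, cw in (("Original", None), ("Balanced", "balanced")):
        variants[f"Logistic Regression ({label})"] = LogisticRegression(
            max_iter=1000, class_weight=cw
        )
        variants[f"SVM ({label})"] = SVC(class_weight=cw)
        variants[f"Ridge Classifier ({label})"] = RidgeClassifier(class_weight=cw)
    return variants


def rank_variants(X, y, n_splits=5):
    """Score each variant with stratified CV and sort by mean weighted F1."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = {}
    for name, estimator in build_variants().items():
        # Scaling lives inside the pipeline so it is refit on each training fold.
        pipe = make_pipeline(StandardScaler(), estimator)
        fold_scores = cross_val_score(pipe, X, y, cv=cv, scoring="f1_weighted")
        scores[name] = fold_scores.mean()
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranking = rank_variants(X, y)
for rank, (name, score) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {score:.4f}")
```

Stratified folds keep the class proportions roughly constant across splits, which matters when one outcome class is rare. The scores printed by this sketch come from synthetic data and will not match the table.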
Justification for Selection:
The Logistic Regression (Balanced) model was chosen because it achieved the **highest F1-weighted cross-validation score (0.7436)**, narrowly ahead of the original Logistic Regression (0.7377). The F1-weighted metric averages the per-class F1 scores in proportion to each class's support, so all three imbalanced classes contribute to the score, although the larger classes carry the most weight.
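To make the metric concrete, the sketch below computes a weighted F1 by hand and compares it with scikit-learn's `f1_score`. The labels are a small hypothetical example, not project data, and the helper `weighted_f1` is introduced only for illustration.

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical true and predicted outcomes for ten patients (not project data).
labels = ["C", "D", "O"]
y_true = np.array(["C", "C", "C", "C", "C", "D", "D", "D", "D", "O"])
y_pred = np.array(["C", "C", "C", "C", "D", "D", "D", "D", "O", "C"])


def weighted_f1(y_true, y_pred, labels):
    """Average the per-class F1 scores, weighting each class by its support."""
    per_class = f1_score(
        y_true, y_pred, labels=labels, average=None, zero_division=0
    )
    support = np.array([(y_true == c).sum() for c in labels])
    return float(np.sum(per_class * support) / support.sum())


manual = weighted_f1(y_true, y_pred, labels)
library = f1_score(y_true, y_pred, average="weighted", zero_division=0)
macro = f1_score(y_true, y_pred, average="macro", zero_division=0)

print(f"weighted F1 (manual):  {manual:.4f}")
print(f"weighted F1 (sklearn): {library:.4f}")
print(f"macro F1:              {macro:.4f}")
```

In this example the rare class is never predicted correctly, so the macro average drops sharply while the weighted average stays closer to the scores of the two larger classes. Reporting both is one way to check how a model handles a minority class.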
Commonly Important Features (Both Methods):
Method-Specific Insights:
Conclusion: Agreement between statistical (ANOVA) and model-driven (permutation) importance is only moderate, which may reflect feature interactions or correlations that a univariate test does not capture. 17 of the 19 features were statistically significant (p < 0.05).
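A comparison of this kind could be set up as in the sketch below. It is an illustration under stated assumptions rather than the project's analysis: it assumes scikit-learn's `f_classif` for the ANOVA F-test and `permutation_importance` for the model-driven ranking, uses synthetic data, and measures agreement with a Spearman rank correlation and a top-5 overlap, both of which are choices made here for illustration.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the clinical data: 3 imbalanced classes, 19 features.
X, y = make_classification(
    n_samples=400, n_features=19, n_informative=8, n_classes=3,
    weights=[0.55, 0.35, 0.10], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Statistical view: one-way ANOVA F-test per feature (univariate).
f_stats, p_values = f_classif(X_train, y_train)
n_significant = int((p_values < 0.05).sum())

# Model-driven view: drop in weighted F1 when a feature is shuffled.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=1000, class_weight="balanced"),
)
model.fit(X_train, y_train)
perm = permutation_importance(
    model, X_test, y_test, scoring="f1_weighted", n_repeats=20, random_state=0
)

# Agreement between the two rankings.
rho, _ = spearmanr(f_stats, perm.importances_mean)
top_k = 5
top_anova = set(np.argsort(f_stats)[::-1][:top_k])
top_perm = set(np.argsort(perm.importances_mean)[::-1][:top_k])
common = sorted(int(i) for i in top_anova & top_perm)

print(f"significant features (p < 0.05): {n_significant}/{X.shape[1]}")
print(f"Spearman rank correlation: {rho:.3f}")
print(f"features in both top-{top_k} lists: {common}")
```

The ANOVA test scores each feature in isolation, whereas permutation importance depends on what the fitted model actually uses, so correlated or interacting features can rank differently under the two methods.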