Fairness Score: A Composite Metric for Fairness and Performance Evaluation
algorithmic fairness; credit risk; fairness metrics; imbalanced data; machine learning; simulation; German Credit
The rapid adoption of machine learning in critical domains has intensified concerns about algorithmic bias and the fairness-performance trade-off. While the existing literature often focuses on mitigation techniques, model evaluation still tends to assess performance and fairness in isolation. This study proposes a unified evaluation framework centered on a Fairness Score (FS) on a common [0, 1] scale. The framework builds on the Joint Fairness-Performance Index (JFPI), which combines predictive performance (F1-score) with a fairness composite derived from the Demographic Parity Ratio, the Equalized Odds Ratio, and Predictive Rate Parity. The framework then applies structural adjustments that penalize extreme imbalance and enforce a minimum justice requirement only when a model violates it. We treat the minimum justice threshold as an exogenous policy input and illustrate it with the Four-Fifths Rule. Given this requirement, the framework calibrates the weighting parameter α endogenously from the evaluated model set, rather than treating it as a discretionary preference. We evaluate the framework through controlled Monte Carlo experiments on synthetic credit-risk datasets and assess external validity on the German Credit benchmark. The empirical analysis shows that high-capacity models, especially neural networks, dominate the fairness-performance Pareto frontier in high-bias regimes, outperforming linear baselines. Among mitigation strategies, pre-processing interventions prove the most robust, improving fairness while preserving model stability and predictive integrity. The proposed framework offers a practical tool for responsible and reliable automated decision-making under explicit institutional fairness requirements.
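The abstract names the ingredients of the FS but not its exact functional form, which is developed in the main text. The minimal Python sketch below is therefore illustrative only: the convex combination of F1 and the fairness composite, the geometric-mean aggregation of the three parity ratios, the hard-zero penalty on violation, and the defaults alpha = 0.5 and tau = 0.8 (the Four-Fifths Rule floor) are all expository assumptions, and the equalized-odds ratio is simplified here to its true-positive-rate component.

```python
import numpy as np

def group_ratio(values):
    """Min/max ratio across group-wise rates; 1.0 means perfect parity."""
    values = np.asarray(values, dtype=float)
    return float(values.min() / values.max()) if values.max() > 0 else 0.0

def rates_by_group(y_true, y_pred, group):
    """Per-group selection rate, TPR, and PPV for binary predictions in {0, 1}."""
    sel, tpr, ppv = [], [], []
    for g in np.unique(group):
        m = group == g
        yt, yp = y_true[m], y_pred[m]
        sel.append(yp.mean())
        tpr.append(yp[yt == 1].mean() if (yt == 1).any() else 0.0)
        ppv.append(yt[yp == 1].mean() if (yp == 1).any() else 0.0)
    return sel, tpr, ppv

def f1_score(y_true, y_pred):
    """Standard F1 for binary labels in {0, 1}."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def fairness_score(y_true, y_pred, group, alpha=0.5, tau=0.8):
    """Illustrative FS on [0, 1]: a weighted blend of F1 and a fairness
    composite (geometric mean of the three parity ratios), gated by a
    minimum justice floor tau (Four-Fifths Rule -> tau = 0.8)."""
    sel, tpr, ppv = rates_by_group(y_true, y_pred, group)
    dpr = group_ratio(sel)   # Demographic Parity Ratio
    eor = group_ratio(tpr)   # TPR side of the Equalized Odds Ratio (simplified)
    prp = group_ratio(ppv)   # Predictive Rate Parity ratio
    fairness = (dpr * eor * prp) ** (1.0 / 3.0)
    if fairness < tau:       # minimum justice requirement violated:
        return 0.0           # hard-zero penalty; the paper's rule may differ
    return alpha * f1_score(y_true, y_pred) + (1.0 - alpha) * fairness

# Toy usage with two groups and random predictions
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 200)
y_pred = rng.integers(0, 2, 200)
group = rng.integers(0, 2, 200)
print(fairness_score(y_true, y_pred, group))
```

Under this reading, the endogenous calibration described above would select alpha from the evaluated model set, given the policy floor tau, rather than fixing it at the 0.5 default used in this sketch.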