How to make sure results from AI lending models hold up over time
Lenders everywhere are switching from legacy credit scoring to AI-driven underwriting models. Why? AI-based models produce faster decisions 24/7 and generate more good loans and fewer bad ones. We’ve recently written about how to compare the statistical outperformance of AI-driven models over traditional models and how to translate those improvements into revenue and profit gains for your business.
Knowing that an AI model outperforms a benchmark model is nice, but you want to ensure that this statistical superiority holds up over time. The way to do that is to track the AI model’s KS score month by month and compare its AUC to your benchmark model’s across the entire period you used to test the model, using the same target definition for both (KS, the Kolmogorov-Smirnov statistic, and AUC, the area under the ROC curve, are common measures of how well a model separates good loans from bad ones). Slight variation between months is okay, but wide variability could indicate an overfitted model.
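As a concrete illustration, here is a minimal sketch of that month-by-month check in Python. It assumes a pandas DataFrame of test-period loans (called loans here) with hypothetical columns for the month, the observed outcome, and each model’s score, and it assumes scores are oriented so that a higher score means higher predicted risk:

```python
import pandas as pd
from scipy.stats import ks_2samp
from sklearn.metrics import roc_auc_score

def monthly_stability(df: pd.DataFrame, score_col: str) -> pd.DataFrame:
    """Compute KS and AUC for one score column, month by month."""
    rows = []
    for month, grp in df.groupby("month"):
        goods = grp.loc[grp["defaulted"] == 0, score_col]
        bads = grp.loc[grp["defaulted"] == 1, score_col]
        # KS: maximum separation between the good and bad score distributions.
        ks = ks_2samp(goods, bads).statistic
        # AUC: how reliably the model ranks a random bad loan as riskier than a random good one.
        auc = roc_auc_score(grp["defaulted"], grp[score_col])
        rows.append({"month": month, "ks": ks, "auc": auc})
    return pd.DataFrame(rows)

# Compare the AI model and the benchmark over the same test window.
ai_stats = monthly_stability(loans, "ai_score")
benchmark_stats = monthly_stability(loans, "benchmark_score")
comparison = ai_stats.merge(benchmark_stats, on="month", suffixes=("_ai", "_benchmark"))
print(comparison)
```

Plotting the resulting monthly KS and AUC series side by side makes any drift or instability easy to spot.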
Here’s an example of what the outcome of a successful stability test looks like:
The top chart shows excellent stability over a six-month test period for all three product models; score accuracy is holding up well. The second chart compares the AUC of the AI/ML models (aggregated) against benchmark generic scores over a similar test period. Again, the AI model consistently outperforms the benchmark over time.
You should always ask about model and score stability over time when you’re considering or reviewing an internal or vendor model. Zest AI’s technology builds these kinds of powerful and durable models, even with a limited set of performance data, by using a proprietary reject inference method: a rules-based approach that augments the training data with both the funded and unfunded populations. The result: resiliency that reduces your need for costly refits or rebuilds.
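Zest AI’s actual reject inference method is proprietary, so the snippet below is only a generic, simplified illustration of the underlying idea: use simple rules to assign inferred outcomes to unfunded applicants so the training data reflects the full through-the-door population rather than just funded loans. The column names and rule thresholds here are hypothetical.

```python
import pandas as pd

def augment_with_inferred_rejects(apps: pd.DataFrame) -> pd.DataFrame:
    """Combine funded loans (observed outcomes) with unfunded applicants (inferred outcomes)."""
    funded = apps[apps["funded"] == 1].copy()
    unfunded = apps[apps["funded"] == 0].copy()

    # Example rule set: treat unfunded applicants with very high debt-to-income
    # or repeated prior delinquencies as inferred "bads", the rest as inferred "goods".
    inferred_bad = (unfunded["dti"] > 0.6) | (unfunded["prior_delinquencies"] >= 2)
    unfunded["defaulted"] = inferred_bad.astype(int)

    # Track label provenance so inferred records can be weighted or audited later.
    funded["label_source"] = "observed"
    unfunded["label_source"] = "inferred"

    # The combined set is what the underwriting model would be trained on.
    return pd.concat([funded, unfunded], ignore_index=True)
```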
One small note: accuracy and performance are not the end of your analysis. You should also be thinking about the cost of documenting your model and how you will set up performance monitoring to ensure the model keeps working as intended.
Regarding responsible monitoring, any AI/ML modeling solution you’re looking to buy or build should come with multivariate input monitoring to ensure that the distribution of features in production matches expectations, as well as output monitoring to ensure that the score distribution stays consistent with what you saw in training and that your score cut-off decisions are still valid. You’ll also want performance monitoring to highlight the ongoing economic and technical performance of the model.
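One common, vendor-neutral way to implement the input and output checks described above is the Population Stability Index (PSI), which compares production distributions against the training baseline. The sketch below is illustrative only; the DataFrames, feature list, and score arrays are assumed placeholders, and the 0.25 threshold is a widely used rule of thumb rather than a universal standard.

```python
import numpy as np
import pandas as pd

def psi(expected: pd.Series, actual: pd.Series, bins: int = 10) -> float:
    """Population Stability Index between a training baseline and a production sample."""
    # Bin edges come from the baseline distribution's quantiles;
    # np.unique guards against duplicate edges for discrete-valued features.
    edges = np.unique(np.quantile(expected, np.linspace(0, 1, bins + 1)))
    # Clip production values into the baseline's range so every value lands in a bin.
    actual_clipped = np.clip(actual, edges[0], edges[-1])
    exp_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    act_pct = np.histogram(actual_clipped, bins=edges)[0] / len(actual)
    # A small floor avoids log(0) and division by zero in sparse bins.
    exp_pct = np.clip(exp_pct, 1e-6, None)
    act_pct = np.clip(act_pct, 1e-6, None)
    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))

# Input monitoring: one PSI per feature, training baseline vs. production.
feature_drift = {col: psi(train_df[col], prod_df[col]) for col in feature_cols}

# Output monitoring: PSI on the score distribution itself, which also signals
# whether existing score cut-offs still split the population as expected.
score_drift = psi(pd.Series(train_scores), pd.Series(prod_scores))

# Flag anything above the common 0.25 rule-of-thumb threshold for review.
alerts = {name: value for name, value in feature_drift.items() if value > 0.25}
```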