Here’s how ML underwriting fits within Federal Model Risk Management guidelines
Federal banking regulators have yet to issue official rules about the use of AI and machine learning (ML) in credit underwriting, but on Monday the Office of the Comptroller of the Currency (OCC) explicitly called out the use of AI and ML in credit for the first time in its semi-annual report on the key issues facing the national banking system.
A small stone, with a big ripple
The financial services industry has been increasing its adoption of ML across a range of applications.
ML models are powerful outcome predictors because they make use of more data than traditional models and apply mathematical techniques that capture many variables and the relationships among them. ML technologies have the potential to bring more unbanked and underbanked consumers into the financial system, enhance access to fair credit, and contribute positively to the overall safety of the economic ecosystem.
However, increased predictive power comes with increased model risk and complexity. In its report this week, the OCC wrote: “Bank management should be aware of the potential fair lending risk with the use of AI or alternative data in their efforts to increase efficiencies and effectiveness of underwriting. It is important to understand and monitor underwriting and pricing models to identify potential disparate impact and other fair lending issues. New technology and systems for evaluating and determining creditworthiness, such as machine learning, may add complexity while limiting transparency. Bank management should be able to explain and defend underwriting and modeling decisions.”
Zest AI couldn’t agree more about the need to “explain and defend” complicated ML models.
Several lenders are already using ML for credit underwriting, but many more are waiting for more clarity about how the current regulatory framework applies to AI credit decisioning. The OCC and Federal Reserve last issued supervisory guidance around managing model risk in 2011, largely before ML was used in financial services, and thus the guidance does not explicitly address ML models.
To help bridge this gap, and facilitate the industry’s transition to ML, we’ve developed the following FAQ to address questions about how ML fits within the existing Model Risk Management (MRM) guidance, especially for AI credit underwriting.
An overview of Model Risk Management (MRM)
Model development, implementation, and use
Is it acceptable to use ML models in high stakes financial services decision-making?
Yes. Nothing in the guidance precludes the use of ML models. The guidance applies to a financial institution’s use of any “model,” which it defines as a “quantitative method, system, or approach that applies statistical, economic, financial, or mathematical theories, techniques, and assumptions to process input data into quantitative estimates.” ML fits squarely within this definition.
Fairness, anti-discrimination, and safety and soundness goals tend to support the use of more predictive models, including ML models. ML models can make more accurate and fair credit risk assessments using better math and data insights. Consequently, ML’s enhanced predictive power has the potential to safely expand access to credit while reducing losses and systemic risk.
However, ML-based credit risk models must be validated, documented, and monitored using methods appropriate to the modeling approach selected in order to comply with the principles articulated in the guidance.
Can you use as many variables as desired in a model?
Yes. The guidance does not address or limit the number of variables that may be used in a model, and nothing in it suggests that using fewer variables necessarily decreases risk. ML models can consider many more variables than traditional methods, which is a key reason they often provide greater predictive power than traditional models.
The same data review and documentation practices outlined in the guidance still apply to ML models even though ML models consider many more variables than traditional models.
Can model developers analyze vastly more variables and still comply with the guidance?
Yes. As the guidance states: “Developers should be able to demonstrate that such data and information are suitable for the model and that they are consistent with the theory behind the approach and with the chosen methodology. If data proxies are used, they should be carefully identified, justified, and documented.”
Variables should be reviewed for unexpected and/or inconsistent distributions, mappings, and other data degradation issues. The guidance calls for documenting these review methods, along with the assumptions and theoretical basis for each variable's use, which also helps ensure that fair lending laws are being followed. Because ML models consider hundreds or even thousands of variables, it may be impractical to review all of them manually; automated variable review is a helpful way to support comprehensive analysis and documentation of the data and the model.
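As a concrete illustration, here is a minimal sketch of what automated variable review might look like: each numeric input's recent distribution is compared with its training distribution, and large shifts are flagged for review. The population stability index (PSI) metric, the 0.25 threshold, and the function names are illustrative assumptions, not requirements drawn from the guidance.

```python
import numpy as np
import pandas as pd

def population_stability_index(expected, actual, bins=10):
    """Compare two samples of one variable: bin the expected (training)
    values by quantile, then measure how the actual (recent) values
    redistribute across those same bins."""
    cuts = np.quantile(expected, np.linspace(0, 1, bins + 1))
    cuts[0], cuts[-1] = -np.inf, np.inf
    e = np.clip(np.histogram(expected, bins=cuts)[0] / len(expected), 1e-6, None)
    a = np.clip(np.histogram(actual, bins=cuts)[0] / len(actual), 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

def review_variables(train_df: pd.DataFrame, recent_df: pd.DataFrame, threshold=0.25):
    """Flag numeric variables whose recent distribution has drifted
    materially from the training distribution."""
    report = {}
    for col in train_df.select_dtypes(include="number").columns:
        psi = population_stability_index(train_df[col].dropna(), recent_df[col].dropna())
        report[col] = {"psi": round(psi, 4), "flagged": psi > threshold}
    return report
```

A report like this can be generated on a schedule and archived alongside the model documentation described in the guidance.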
Model validation
The guidance does not prescribe any specific method for validating any model, including an ML model. Nonetheless, the guidance sets out a core framework for effective model validation: evaluation of conceptual soundness, ongoing monitoring, and outcomes analysis.
What methods are permissible for assessing the soundness of an ML model?
Effective ML model evaluation techniques should be efficient and tractable, and designed to test how ML models actually work. Because ML models evaluate multivariate interactions, evaluation techniques should also assess the impact of those interactions. Appropriate methods of evaluating ML models include techniques derived from game theory, multivariate calculus, and probabilistic simulation.
Certain conventional evaluation methods described in the guidance would, if applied to ML models, be ineffective and likely produce misleading results. For example, one of the testing methods identified by the guidance is sensitivity analysis. Common implementations include exhaustive search, which explores all combinations of inputs, and univariate permutation, which perturbs inputs one at a time to gauge each variable's influence on model scores. Exhaustive search is computationally infeasible for most ML models. Univariate permutation, while more tractable, yields incorrect results precisely because ML models capture and evaluate multivariate interactions.
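To make the univariate-permutation pitfall concrete, here is a small, self-contained illustration (our own example, not one taken from the guidance). The toy model's score depends only on the interaction of two inputs; perturbing one input at a time from a baseline suggests neither input matters, while an exact two-feature Shapley computation, a game-theory technique of the kind referenced above, attributes the score change correctly.

```python
import itertools
from math import factorial

# Toy model whose score depends only on the interaction of two inputs.
def model(x1, x2):
    return x1 * x2

baseline = {"x1": 0.0, "x2": 0.0}   # reference point for the analysis
instance = {"x1": 1.0, "x2": 1.0}   # applicant being explained

# Univariate sensitivity: vary one input at a time from the baseline.
# Both inputs appear irrelevant, even though together they drive the score.
for name in instance:
    probe = dict(baseline)
    probe[name] = instance[name]
    print(name, "univariate effect:", model(**probe) - model(**baseline))
# -> x1 univariate effect: 0.0
# -> x2 univariate effect: 0.0

# Exact Shapley values over the two features: average each feature's
# marginal contribution across all orders in which it can be added.
def shapley(feature):
    others = [f for f in instance if f != feature]
    total, n = 0.0, len(instance)
    for size in range(len(others) + 1):
        for subset in itertools.combinations(others, size):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            with_f = {k: (instance[k] if k in subset + (feature,) else baseline[k]) for k in instance}
            without_f = {k: (instance[k] if k in subset else baseline[k]) for k in instance}
            total += weight * (model(**with_f) - model(**without_f))
    return total

print({f: shapley(f) for f in instance})   # -> {'x1': 0.5, 'x2': 0.5}
```

Real underwriting models have far more features, so exact enumeration is replaced in practice by efficient approximations, but the failure mode of one-at-a-time analysis is the same.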
How do the guidance’s monitoring standards apply to ML models?
The guidance calls for ongoing model monitoring: “Such monitoring confirms that the model is appropriately implemented and is being used and is performing as intended…” The guidance further states: “Many of the tests employed as part of model development should be included in ongoing monitoring and be conducted on a regular basis to incorporate additional information as it becomes available.”
A thorough approach for monitoring ML models should include:
- Input distribution monitoring – Recent model input data may be compared with the model's training data to determine whether incoming credit applications differ significantly from the population the model was trained on. The more that live data differs from training data, the less accurate the model is likely to be.
- Missing input data monitoring – A complete model monitoring program should track missing input data and trigger alerts to monitors and validators when the rate of missing data, or its impact on model outputs and downstream business outcomes, exceeds pre-defined thresholds (a minimal sketch combining this check with output distribution monitoring appears after this list).
- Output distribution monitoring – Model outputs should be monitored by comparing distributions of model scores over time. Monitoring systems should compute statistics that establish the degree to which the score distribution has shifted from the scores generated by the model in prior periods such as those contained in training and validation data sets.
- Execution failure monitoring – Error and warning alerts generated during model execution can indicate flaws in model code that may affect model outputs. The causes of such alerts should be investigated and identified, and appropriate remediation should be implemented where necessary.
- Latency monitoring – Model response times should be monitored to ensure model execution code and infrastructure meet the latency requirements of applications and workflows that rely on model outputs. Establishing clear latency objectives and pre-defined alert thresholds should be part of a comprehensive model monitoring management program.
- Economic performance monitoring – A complete ML model monitoring solution should include business dashboards that enable analysts to configure or pre-define alert triggers on key performance indicators such as default rate, approval rate, and volumes. Substantial changes in these indicators can signal operational issues with model execution and, at a minimum, should be investigated and understood in order to manage risk.
- Reason code stability – Reason codes explain the key drivers of a model’s score. Reason code distributions should be monitored because material changes to the distributions can indicate a change in the character of the applicant population or even in the decision-making logic of the ML model.
- Fair lending analysis – Machine learning models can develop unintended biases for a variety of reasons. To ensure that all applicants are treated fairly and in a non-discriminatory manner, it is important to monitor loan approvals, declines, and default rates across protected classes. Because of the possibility of bias and the close predictive fit of ML models, this monitoring should occur in real time.
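As one way to operationalize a couple of the checks above, the sketch below (referenced in the missing-data item) batches recent traffic and raises alerts when a field's missing rate or the shift in the score distribution crosses a pre-defined threshold. The thresholds, the Kolmogorov-Smirnov test, and the Alert structure are illustrative assumptions; a production program would cover all of the checks listed and route alerts to monitors and validators.

```python
from dataclasses import dataclass
import pandas as pd
from scipy.stats import ks_2samp

@dataclass
class Alert:
    check: str
    detail: str

def run_monitoring_checks(
    recent_inputs: pd.DataFrame,
    recent_scores: pd.Series,
    reference_scores: pd.Series,
    missing_rate_threshold: float = 0.05,
    score_shift_pvalue: float = 0.01,
):
    """Run two of the checks described above on a batch of recent traffic."""
    alerts = []

    # Missing input data: alert when any field's missing rate exceeds the threshold.
    missing_rates = recent_inputs.isna().mean()
    for field, rate in missing_rates.items():
        if rate > missing_rate_threshold:
            alerts.append(Alert("missing_input", f"{field}: {rate:.1%} missing"))

    # Output distribution: compare recent scores with a reference (e.g. validation) set.
    stat, pvalue = ks_2samp(recent_scores.dropna(), reference_scores.dropna())
    if pvalue < score_shift_pvalue:
        alerts.append(Alert("score_shift", f"KS statistic {stat:.3f}, p-value {pvalue:.2g}"))

    return alerts
```

In practice, each check would be scheduled to run automatically alongside model execution, which is also how the automation expectation discussed in the next question is typically met.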
Should model monitoring include automation?
Yes. The guidance states: “monitoring should continue periodically over time, with a frequency appropriate to the nature of the model.” Given the complexity of ML models, automated model monitoring, which can run concurrently with model operations, is essential to meet the expectations set by the guidance, especially when combined with multivariate input monitoring and alerts.
Governance, policies, and controls
How do the guidance's documentation requirements apply to ML models?
As the guidance states, “documentation of model development and validation should be sufficiently detailed so that parties unfamiliar with a model can understand how the model operates, its limitations, and its key assumptions.”
In the case of ML models, documenting how a model operates, its limitations, and its key assumptions requires explainability techniques that accurately reveal how and why the model reached its decisions. Lenders should verify that the explainability methods they adopt actually provide that level of accuracy.
The most commonly used explainability methods are unable to provide accurate explanations. For example, some methods, such as leave-one-covariate-out (LOCO), local interpretable model-agnostic explanations (LIME), and permutation impact (PI), look only at model inputs and outputs rather than at the internal structure of the model. Probing a model only externally in this way is an imperfect process that invites mistakes and inaccuracies. Similarly, methods that analyze refitted and/or proxy models (e.g. LOCO and LIME), rather than the actual final model, have limited accuracy. Methods built on "drop one" or permutation techniques (e.g. LOCO and PI) rely on univariate analysis, which fails to properly capture feature interactions and correlation effects. Finally, methods that rely on subjective judgment (e.g. LIME) produce explanations that are difficult to reproduce and overly dependent on that initial judgment. These errors in explanation undermine model quality: even slight inaccuracies can lead to models that discriminate against protected classes, are unstable, and/or produce high default rates.
Appropriate explainability methods rely on mathematical analyses of the underlying model itself, including its high-order interactions, and do not depend on subjective judgment.
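For lenders evaluating tooling, the open-source shap package's TreeExplainer is one widely used implementation of the game-theoretic approach described above: it analyzes the fitted trees themselves rather than a refitted proxy, and its attributions account for feature interactions. The model, feature names, and data below are hypothetical; this is only an illustration of the technique, not the specific method any lender is required to use.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical training data for a small credit risk model.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "utilization": rng.uniform(0, 1, 1000),
    "months_since_delinquency": rng.integers(0, 120, 1000),
    "inquiries_last_6m": rng.integers(0, 10, 1000),
})
y = (X["utilization"] + 0.01 * X["inquiries_last_6m"] + rng.normal(0, 0.1, 1000) > 0.7).astype(int)

model = GradientBoostingClassifier().fit(X, y)

# TreeExplainer analyzes the fitted trees directly (not a surrogate model)
# and attributes each applicant's score, in the model's log-odds space,
# to individual features.
explainer = shap.TreeExplainer(model)
attributions = explainer.shap_values(X)

# Candidate reason codes for one applicant: the features pushing the score
# most toward the higher-risk class, ranked by attribution.
applicant = 0
ranked = pd.Series(attributions[applicant], index=X.columns).sort_values(ascending=False)
print(ranked.head(3))
```

Whatever tooling is used, the key property is the one described above: the explanation is computed from the model that actually makes the decisions, not from an external approximation of it.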
Should model documentation include automation?
Yes. Although the guidance is silent on whether model documentation may be generated automatically, automated model documentation is the most practical solution for ML models.
ML model development is complex, and operationalizing and monitoring ML models is even harder. It is not feasible for a human, unaided, to keep track of all that was done to ensure proper model development, testing and validation. There are tools to automate model documentation for review by model developers, compliance teams, and other stakeholders in the model risk governance process. Given the number of variables in ML models, automated documentation is likely to provide a higher degree of accuracy and completeness than manual documentation. In general, participants in model risk management should not rely upon manually generated documentation for ML models.
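As a rough sketch of what automated documentation generation can look like, the snippet below assembles a documentation stub from metadata captured during development. The field names, the model described, and the output format are hypothetical; real tooling would draw these values directly from the development pipeline and cover validation and monitoring artifacts as well.

```python
import json
from datetime import date

def render_model_documentation(metadata: dict) -> str:
    """Assemble a documentation stub from artifacts captured during model
    development, so the write-up is produced by the pipeline rather than
    reconstructed by hand after the fact."""
    lines = [
        f"Model documentation: {metadata['model_name']}",
        f"Generated: {date.today().isoformat()}",
        "",
        "Business rationale:",
        metadata["business_rationale"],
        "",
        "Training data:",
        metadata["training_data_summary"],
        "",
        "Variables:",
        *[f"  - {v}" for v in metadata["variables"]],
        "",
        "Validation results:",
        json.dumps(metadata["validation_metrics"], indent=2),
        "",
        "Known limitations:",
        metadata["limitations"],
    ]
    return "\n".join(lines)

print(render_model_documentation({
    "model_name": "underwriting_v2 (hypothetical)",
    "business_rationale": "Replace legacy scorecard to expand approvals at a constant expected loss rate.",
    "training_data_summary": "1.2M applications, Jan 2016 through Dec 2018, excluding withdrawn applications.",
    "variables": ["utilization", "months_since_delinquency", "inquiries_last_6m"],
    "validation_metrics": {"auc": 0.71, "ks": 0.32},
    "limitations": "Not validated for applicants with no credit file.",
}))
```

Because the inputs come from the pipeline's own records, regenerating the document after a retraining or recalibration is a matter of re-running the script rather than rewriting prose by hand.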
Are there other best practices for ML Model Risk Management?
Yes. The guidance makes clear that the quality of a bank’s model development, testing, and validation process turns in large part on “the extent and clarity of documentation.” Therefore, model documentation should be clear, comprehensive, and complete so that others can quickly and accurately revise or reproduce the model and verification steps. Documentation should explain the business rationale for adopting a model and enable validation of its regulatory compliance.
Records of model development decisions and data artifacts should be kept together so that a model may be more easily adjusted, recalibrated, or redeveloped when conditions change. Such artifacts include development data, data transformation code, modeling notebooks, source code and development files, the final model code, model verification testing code, and documentation.