WIRED held a great virtual event called RE:WIRED where esteemed machine learning thought leader Kai-Fu Lee spoke about the future of AI and its potential. The discussion covered many aspects of machine learning in practice, including privacy, transparency, and bias.
One topic covered was the "explainability" of AI. Mr. Lee says that adoption of AI will increase in areas where there is less concern, such as enterprise business AI. But in applications where there is far more concern about the underlying "rationale" behind AI decisions, such as healthcare, there is a growing expectation that AI be completely explainable.
One interesting point he made was that if AI were easy to explain, it wouldn't be so powerful. Rather than attempt to explain every single feature and calculation involved, Mr. Lee suggests the industry should forgo efforts to explain thousands of different parameters and instead provide insight that is easy for a human to understand and as "reasonably accurate" as possible. Any requirement that the "black box" be explained perfectly before it can be used is impractical. Instead, the bar should be similar to how a human might explain their own decision-making process, which isn't perfect either.
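To make the idea of a simplified, "reasonably accurate" explanation concrete, here is a minimal, purely illustrative sketch (the function name and the feature contributions are invented for this example): rather than exposing thousands of parameters, it surfaces only the handful of features that contributed most to a decision.

```python
def summarize_explanation(contributions, top_k=3):
    """Reduce many per-feature contribution scores to a short,
    human-readable summary of the most influential features."""
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [(name, round(score, 3)) for name, score in ranked[:top_k]]

# Hypothetical per-feature contributions for one decision
contribs = {"income": 0.42, "age": -0.05, "zip_code": 0.31,
            "tenure": 0.12, "clicks": -0.02}
print(summarize_explanation(contribs))
# → [('income', 0.42), ('zip_code', 0.31), ('tenure', 0.12)]
```

The summary is lossy by design: it trades completeness for something a human can actually read, which is the trade-off Mr. Lee describes.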
Certainly, one way to address the issue is to have humans verify the results of AI, an approach commonly referred to as "human-in-the-loop". Sampling and reviewing output on a statistical basis is a realistic way to verify system fidelity; incredible as it may seem, rather than sampling statistically, a large number of organizations using intelligent document processing (IDP) verify all results. This is in large part to overcome a general lack of reliability in legacy, and even modern, systems' ability to discern between accurate and erroneous data.
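Statistical sampling for human review can be sketched in a few lines. This is a hypothetical illustration (the function name, sample rate, and batch structure are invented, not from any IDP product): instead of routing every document to a reviewer, a fixed fraction of each batch is randomly selected.

```python
import random

def sample_for_review(documents, sample_rate=0.05, seed=42):
    """Select a random subset of IDP outputs for human review,
    instead of verifying every result."""
    rng = random.Random(seed)  # seeded for reproducible audits
    k = max(1, round(len(documents) * sample_rate))
    return rng.sample(documents, k)

# Usage: review 5% of a 200-document batch instead of all 200
batch = [{"doc_id": i, "extracted_total": 100 + i} for i in range(200)]
to_review = sample_for_review(batch)
print(len(to_review))  # → 10
```

In practice the sample rate would be chosen from the confidence level and margin of error the organization wants, but even a simple fixed rate replaces 100% review with a bounded workload.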
Presuming that system reliability is significantly improved, another problem still faced within IDP arises when a combination of AI and human-centered automation surfaces disparities between the decisions staff would make and those made by AI. We have seen this misalignment between AI-based and human-based decisions in a variety of applications.
One is vote-by-mail, where using AI to automatically verify signatures on ballots is increasingly popular. Yet there is an expectation that either the AI makes no mistakes or it cannot be used. For example, most states using signature verification in elections rely solely upon humans, where errors are almost certainly being made. Where states have moved to adopt AI, some require the AI to make zero errors: if the system outputs a result that differs from a human election judge's, the system is shut down. By this logic, even if the system's error rate were under 1% while the human error rate was well over 10%, the system would still be shut down in favor of a completely manual process.
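The arithmetic behind that comparison is worth making explicit. Using the rates from the scenario above (1% for the system, 10% for humans) and a hypothetical ballot count, the expected error totals differ by an order of magnitude:

```python
def expected_errors(n_items, error_rate):
    """Expected number of errors given a volume and an error rate."""
    return n_items * error_rate

n_ballots = 100_000  # hypothetical ballot volume for illustration
ai_errors = expected_errors(n_ballots, 0.01)     # 1% error rate
human_errors = expected_errors(n_ballots, 0.10)  # 10% error rate

print(ai_errors, human_errors)  # → 1000.0 10000.0
```

Shutting down the system in this scenario trades roughly 1,000 expected errors for roughly 10,000, which is the impracticality the zero-error expectation creates.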
As Mr. Lee states, the right approach is for the industry to help set appropriate expectations for how AI performs, especially compared with humans, and to provide better insight into why these systems behave the way they do. Even in less-critical document automation, trust in AI will remain an issue without a change in approach.