Recent research from the University of California San Diego and University of Wisconsin–Madison reveals critical insights about public expectations for algorithmic decision-making in high-stakes contexts. The study, presented at the 2025 ACM CHI conference, explored how ordinary people react when multiple high-accuracy machine learning models reach different conclusions for identical applications. The findings challenge both current industry practices and academic assumptions about fair automated decision-making, with direct implications for Thailand’s rapidly expanding use of AI systems in financial services, employment, and government programs.
The research addresses “multiplicity” in machine learning—the phenomenon where many different models can achieve comparable accuracy while producing different predictions for individual cases. This technical reality creates ethical dilemmas when organizations must choose which model’s decision to implement, particularly for consequential decisions affecting loans, jobs, or social services. The study’s significance for Thai readers stems from the Bank of Thailand’s active development of AI risk management guidelines for financial service providers, signaling regulatory attention to exactly these fairness concerns.
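To make multiplicity concrete, the sketch below (a minimal illustration on synthetic data, not the study's code) trains three comparably accurate classifiers and counts the test cases whose outcome depends on which model an institution happens to deploy.

```python
# Minimal illustration of multiplicity on synthetic data: comparably
# accurate models can still disagree on individual cases.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(random_state=0),
    "boosting": GradientBoostingClassifier(random_state=0),
}
predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
    print(f"{name}: accuracy = {model.score(X_test, y_test):.3f}")

# Applicants whose approval or denial depends purely on model choice.
stacked = np.vstack(list(predictions.values()))
disputed = stacked.min(axis=0) != stacked.max(axis=0)
print(f"Cases where the models disagree: {disputed.sum()} of {len(y_test)}")
```

On a typical run the three models score within a point or two of one another yet still split on a noticeable share of individual cases, which is exactly the kind of disagreement the study asked participants to evaluate.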
Public Preferences Challenge Industry Standards
For the CHI study, the researchers ran multiple experiments with thousands of participants across decision-making scenarios including loan applications, hiring decisions, and university admissions, and found three critical patterns in public expectations. First, participants strongly rejected the current industry practice of selecting a single “best” model without explanation when multiple equally accurate models disagree about individual cases. This challenges the standard machine learning development workflow, in which teams typically choose one model based on cross-validation metrics and deploy it without considering alternatives.
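The workflow participants objected to looks roughly like the following sketch, which is an assumption about common practice rather than code from the paper: rank candidates by cross-validation score, deploy the winner, and discard the near-equivalent alternatives.

```python
# A hedged sketch of the standard "pick the single best model" workflow.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
candidates = {
    "logistic": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(random_state=0),
}
# Rank by mean cross-validation accuracy and keep only the top scorer.
cv_scores = {name: cross_val_score(model, X, y, cv=5).mean()
             for name, model in candidates.items()}
deployed = max(cv_scores, key=cv_scores.get)
print(cv_scores, "-> deploying only:", deployed)  # alternatives are discarded
```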
Second, study participants overwhelmingly rejected simple randomization, essentially a coin flip among equally performing models, as an acceptable tie-breaking mechanism in high-stakes contexts. This finding contradicts some academic proposals suggesting that randomization provides neutral arbitration when models perform equivalently. Participants viewed randomization as an abdication of institutional responsibility rather than a fair resolution of algorithmic disagreement.
Third, participants strongly supported remedies requiring organizational accountability and transparency. These included searching across broader sets of models to identify those better aligned with fairness objectives, and involving human decision-makers to adjudicate disagreements rather than leaving outcomes to opaque algorithmic choices. The study’s lead researcher noted that these preferences “contrast with standard practice in ML development and philosophy research on fair practices,” highlighting the gap between technical approaches and public expectations.
Implications for Thailand’s Financial Sector
Thailand’s financial sector increasingly relies on automated scoring and digital lending platforms that use machine learning to accelerate decisions, expand access, and reduce costs. However, if different vendors or internal teams could choose alternative models that yield opposite outcomes for the same loan application, Thai consumers may face arbitrary differences that undermine trust in digital financial services. The Bank of Thailand’s draft risk management guidelines emphasize governance, transparency, and human oversight for high-risk AI applications, elements directly aligned with the CHI study’s recommendations.
The cultural context makes these findings particularly relevant for Thailand. Thai society places high value on perceived fairness, community harmony, and relational accountability, and institutional arbitrariness often erodes trust and carries informal reputational consequences. In public services, citizens expect clear reasoning for decisions, and opaque automated rejections can amplify perceptions of unfairness. The traditional practice of appealing decisions through social networks and media to pressure institutions for accountability fits naturally with the study’s finding that people prefer human review and transparent adjudication over sealed algorithmic processes.
Thai customers experiencing unfair treatment by banks or government offices frequently turn to social media and community networks to voice grievances, creating reputational pressure on institutions. The CHI study’s findings suggest this cultural pattern reflects deeper expectations about institutional accountability that automated systems must accommodate rather than bypass.
Technical Implementation and Practical Solutions
The researchers recommend several practical measures that Thai institutions can implement immediately to address multiplicity concerns. First, organizations should expand model search procedures beyond training a single “best” model, exploring a broader set of possibilities and assessing whether different models systematically disadvantage particular demographic groups. This approach requires additional computational resources but yields crucial fairness information.
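One way such a search could look in practice is sketched below: among candidate models within a small accuracy tolerance of the best, prefer the one with the smallest approval-rate gap between two demographic groups. The synthetic data, the one-point tolerance, and the gap metric are all illustrative assumptions, not the study's procedure.

```python
# Hypothetical expanded model search: pick the least-disparate model
# among near-equally accurate candidates. Synthetic data only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=3000, n_features=8, random_state=1)
group = np.random.RandomState(1).randint(0, 2, size=len(y))  # synthetic demographic flag
X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(X, y, group, random_state=1)

candidates = [RandomForestClassifier(n_estimators=100, random_state=seed)
              for seed in range(10)]
results = []
for model in candidates:
    model.fit(X_tr, y_tr)
    acc = model.score(X_te, y_te)
    pred = model.predict(X_te)
    gap = abs(pred[g_te == 0].mean() - pred[g_te == 1].mean())  # approval-rate gap
    results.append((acc, gap, model))

best_acc = max(r[0] for r in results)
tolerance = 0.01  # accept models within one point of the best accuracy
viable = [r for r in results if r[0] >= best_acc - tolerance]
chosen = min(viable, key=lambda r: r[1])  # least disparity among near-equals
print(f"chosen model: accuracy={chosen[0]:.3f}, approval-rate gap={chosen[1]:.3f}")
```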
Second, institutions should introduce multiplicity audits into their development pipelines, measuring outcome variability across different models and identifying the cases where model selection alone determines the final decision. These audits help organizations understand how much their modeling choices affect individual applicants, enabling informed decisions about when human review becomes necessary.
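A simple audit metric, sometimes called ambiguity in the multiplicity literature, is the share of applicants whose predicted outcome flips across a set of near-equivalent models. The sketch below estimates it by bootstrapping the training data; that resampling scheme is one plausible way to generate the model set, not a prescribed method.

```python
# Sketch of a multiplicity audit: what fraction of applicants would
# receive a different outcome under some near-equivalent model?
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=10, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)

# Bootstrap the training data to obtain a family of near-equivalent models.
rng = np.random.RandomState(2)
all_preds = []
for _ in range(20):
    idx = rng.choice(len(X_tr), size=len(X_tr), replace=True)
    model = LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx])
    all_preds.append(model.predict(X_te))

all_preds = np.array(all_preds)  # shape: (n_models, n_applicants)
ambiguous = all_preds.min(axis=0) != all_preds.max(axis=0)
print(f"Ambiguity rate: {ambiguous.mean():.1%} of applicants")  # audit headline number
```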
Third, organizations should require human adjudication for high-stakes or borderline decisions where models disagree. This ensures disputed cases receive transparent, accountable resolution rather than arbitrary algorithmic selection. Such processes must include clear guidelines for human reviewers and accountability mechanisms that keep reviewers from introducing new bias.
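In code, such a routing rule can be as simple as the following hypothetical function: decide automatically only when every model in the candidate set agrees, and escalate everything else to a reviewer queue. The function name and labels are illustrative assumptions.

```python
# Hypothetical routing rule: automate only on model consensus,
# escalate disputed cases to a human reviewer.
from typing import Sequence

def route_application(model_votes: Sequence[int]) -> str:
    """Return an automated outcome on consensus, else flag for human review."""
    if all(vote == model_votes[0] for vote in model_votes):
        return "approve" if model_votes[0] == 1 else "deny"
    return "human_review"  # disputed case: transparent, accountable adjudication

print(route_application([1, 1, 1]))  # approve
print(route_application([1, 0, 1]))  # human_review
```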
Fourth, institutions should document and disclose decision-making processes to affected individuals, including information about whether multiple models were considered and how disagreements were resolved. This transparency enables affected parties to understand and potentially appeal decisions while building public trust in automated systems.
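A decision record supporting that kind of disclosure might carry fields like the following; the schema is a hypothetical sketch, not one drawn from the study or any regulator's guidance.

```python
# Hypothetical schema for a disclosed decision record; field names are
# illustrative, not taken from the study or the Bank of Thailand guidance.
from dataclasses import dataclass
from typing import List

@dataclass
class DecisionRecord:
    applicant_id: str
    outcome: str                  # e.g. "approve" or "deny"
    models_considered: List[str]  # every near-equivalent model evaluated
    models_disagreed: bool        # did the candidate set split on this case?
    resolution_method: str        # "consensus", "human_review", ...
    reviewer_id: str = ""         # filled in when a human adjudicated
    rationale: str = ""           # plain-language reason given to the applicant

record = DecisionRecord(
    applicant_id="A-1024",
    outcome="approve",
    models_considered=["logistic", "forest", "boosting"],
    models_disagreed=True,
    resolution_method="human_review",
    reviewer_id="R-07",
    rationale="Stable income history outweighed a short credit file.",
)
print(record)
```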
Consumer Rights and Advocacy
For Thai consumers and applicants, the study suggests concrete actions when facing automated decisions. Applicants who receive automated denials for loans, employment, university admission, or government benefits should ask providers whether algorithms were used and how disagreements among models are handled. Requesting human review and a clear explanation of the decision’s reasoning is both a consumer right and a quality-assurance mechanism.
Under the Bank of Thailand’s draft guidance, financial institutions should prepare to explain AI risk management practices and provide human oversight for high-risk decisions. Citizens and consumer advocacy groups should utilize public consultation processes to advocate for explicit protections against multiplicity-based arbitrariness. Civil society organizations and media can pressure banks and platforms to publish multiplicity audit results and avoid black-box deployment in sensitive domains.
Consumer education campaigns could help applicants understand their rights and the limitations of algorithmic decision-making. Teaching people to recognize when they might benefit from human review, and how to request explanations, creates an informed consumer base capable of demanding accountability from automated systems.
Regulatory and Policy Development
The interaction between public expectations, technical research, and Thai regulatory development will shape how multiplicity is handled across the country. The Bank of Thailand’s draft guidance represents a promising beginning; if the final rules emphasize transparency, governance, and human oversight for high-risk applications while requiring documentation of model selection and multiplicity audits, Thai financial institutions could establish regional leadership in responsible AI deployment.
Technology vendors and data scientists will need to adapt development practices to include multiplicity metrics and provide interfaces enabling human adjudicators to interpret and resolve algorithmic disagreements. This may require substantial changes to current development workflows, but could improve both fairness outcomes and institutional accountability.
Policymakers should consider extending multiplicity protections beyond financial services to employment, education, and government benefit decisions. Consistent standards across sectors would provide clearer expectations for both organizations and citizens while preventing regulatory arbitrage where problematic practices migrate to less regulated domains.
Research and Development Priorities
Future research priorities important for Thailand include piloting multiplicity audits and human-oversight programs in local institutional settings to determine whether transparency and review measures reduce complaints, improve satisfaction, or enhance decision quality. Studies should also compare the cost-effectiveness of current single-model deployment against expanded multiplicity management approaches.
Research should also investigate cultural adaptation of fairness concepts, examining whether Thai cultural values around community harmony, institutional respect, and collective decision-making suggest different optimal approaches to algorithmic accountability than those developed in Western contexts. Such research could inform policy development that leverages rather than conflicts with existing social expectations.
International collaboration becomes valuable as countries develop regulatory responses to similar technological challenges. Thailand could participate in comparative studies examining different approaches to algorithmic accountability, contributing to global knowledge while adapting international best practices to local contexts.
Conclusion
The CHI 2025 study reveals fundamental expectations about accountability and fairness when algorithms disagree, with direct relevance for Thailand’s expanding use of AI systems across finance, employment, and government services. The research demonstrates that people expect institutional responsibility rather than randomness or opacity when equally accurate models produce different outcomes for individual decisions.
For Thailand, implementing these insights requires coordinated action across regulatory development, industry practices, and consumer education. Regulators should incorporate multiplicity auditing and human oversight requirements into AI governance frameworks. Financial institutions and other organizations should expand model search procedures, implement transparency measures, and develop human adjudication processes. Consumers should be empowered to demand explanations and human review for important automated decisions.
Successfully integrating these approaches could position Thailand as a regional leader in responsible AI deployment, building public trust in automated systems while maintaining accountability and fairness standards that reflect Thai cultural values and social expectations.