MIT is developing a testing framework to help detect when AI makes unfair decisions. Photo: Midjourney

Artificial intelligence is increasingly being applied to optimize decisions in critical contexts. For example, an autonomous system can suggest the most cost-effective power distribution plan while maintaining voltage stability.

However, is a “technically optimal” solution truly fair? What happens if a low-cost strategy makes low-income areas more vulnerable to power outages than wealthier areas?

To help stakeholders detect ethical risks before deployment, the MIT research team developed an automated assessment method that balances quantitative indicators (such as cost and reliability) with qualitative values (such as fairness).

The system separates objective evaluation from user-defined human values, and uses a large language model (LLM) as a “representative” for humans to capture and integrate stakeholder priorities.

The adaptive evaluation framework selects the most important scenarios for further analysis, streamlining a process that would be costly and time-consuming if done manually. These scenarios can show where an AI system aligns with human values, as well as where it fails to meet ethical criteria.
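To make the idea concrete, here is a minimal Python sketch of the general pattern described above: score each scenario on objective metrics, score it separately on a stakeholder value, and flag the scenarios where the two disagree most. It is an illustration under stated assumptions, not the MIT team's implementation. The `Scenario` fields, the equal weighting, the 10-point fairness threshold, and the `fairness_score` function (a simple stand-in for the LLM-based stakeholder judgment) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    cost: float                     # total cost of the plan (lower is better)
    reliability: float              # fraction of demand served, 0..1
    outage_risk_low_income: float   # outage probability in low-income areas
    outage_risk_high_income: float  # outage probability in wealthier areas


def objective_score(s: Scenario, max_cost: float) -> float:
    """Quantitative part only: cheaper and more reliable scores higher."""
    return 0.5 * (1 - s.cost / max_cost) + 0.5 * s.reliability


def fairness_score(s: Scenario) -> float:
    """Hypothetical stand-in for the stakeholder-value judgment.

    Assumption: a gap of 10 percentage points or more in outage risk
    between the two groups counts as fully unfair (score 0).
    """
    gap = abs(s.outage_risk_low_income - s.outage_risk_high_income)
    return 1.0 - min(gap / 0.10, 1.0)


def select_for_review(scenarios: list[Scenario], top_k: int = 1) -> list[str]:
    """Return names of scenarios that look good on paper but score poorly on fairness."""
    max_cost = max(s.cost for s in scenarios)
    ranked = sorted(
        scenarios,
        key=lambda s: objective_score(s, max_cost) - fairness_score(s),
        reverse=True,  # largest tension between the two scores first
    )
    return [s.name for s in ranked[:top_k]]


# Made-up example data, for illustration only.
plans = [
    Scenario("cheapest", 100, 0.99, 0.08, 0.01),
    Scenario("balanced", 110, 0.98, 0.03, 0.02),
    Scenario("equal-risk", 120, 0.97, 0.02, 0.02),
]
flagged = select_for_review(plans, top_k=1)
print(flagged)
```

With this made-up data, the cheapest plan is the one flagged for closer review, because its low cost comes with a much higher outage risk for low-income areas. Keeping the two scoring functions separate mirrors the article's point that objective evaluation and human values are assessed independently.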

According to Chuchu Fan (MIT), simply setting rules or "guardrails" for AI is insufficient, because these only prevent risks that humans can foresee. A systematic approach is therefore needed to detect "unknown risks" before they cause harm.

Ethical evaluation in complex systems

In large systems like power grids, assessing the ethical appropriateness of AI-generated proposals is challenging, especially when multiple objectives must be considered simultaneously.