Ethical assessment of autonomous systems

MIT researchers have developed a testing framework to help identify situations where AI decision-making support systems fail to ensure fairness for individuals and communities.

VietNamNet•23/04/2026

MIT is developing a testing framework to help detect AI making unfair decisions. Photo: Midjourney

Artificial intelligence is increasingly being applied to optimize decisions in critical contexts. For example, an autonomous system can suggest the most cost-effective power distribution plan while maintaining voltage stability.

However, is a “technically optimal” solution truly fair? What happens if a low-cost strategy makes low-income areas more vulnerable to power outages than wealthier areas?

To help stakeholders detect ethical risks before deployment, the MIT research team developed an automated assessment method that balances quantitative indicators (such as cost and reliability) with qualitative values (such as fairness).

The system separates objective evaluation from user-defined human values and uses a large language model (LLM) as a proxy for human evaluators, capturing and integrating stakeholder priorities.

The adaptive evaluation framework will select the most important scenarios for further analysis, simplifying a process that would be costly and time-consuming if done manually. These scenarios can indicate when an AI system aligns with human values, as well as when it fails to meet ethical criteria.

According to Chuchu Fan (MIT), simply setting rules or "guardrails" for AI is insufficient, because these only prevent risks that humans can foresee. A systematic approach is therefore needed to detect "unknown risks" before they cause harm.

Ethical evaluation in complex systems

In large systems like power grids, assessing the ethical appropriateness of AI-generated proposals is challenging, especially when multiple objectives must be considered simultaneously.

Current methods often rely on readily available data, but data labeled according to ethical criteria is rare. At the same time, ethical values and AI systems are constantly changing, quickly rendering static evaluation methods obsolete.

The research team developed an experimental design framework called SEED-SET, which consists of two parts:

- Objective model: evaluates performance based on measurable indicators (such as cost)
- Subjective model: reflects human judgment (such as perceived fairness)

This approach identifies scenarios that meet both technical criteria and human values, as well as scenarios that do not.
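
As a rough illustration of this two-part split, the sketch below checks hypothetical power-distribution scenarios with two separate functions and labels each one. Everything in it is an assumption made for illustration: the `Scenario` fields, the cost budget, and the gap-based fairness check are invented here, and the article does not describe how SEED-SET's models are actually built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical power-distribution plan (all fields are illustrative)."""
    name: str
    cost: float                     # total operating cost, arbitrary units
    outage_risk_low_income: float   # outage probability in [0, 1]
    outage_risk_high_income: float  # outage probability in [0, 1]


def objective_ok(s: Scenario, budget: float = 100.0) -> bool:
    """Objective stand-in: passes when the measurable cost is within budget."""
    return s.cost <= budget


def subjective_ok(s: Scenario, max_gap: float = 0.05) -> bool:
    """Subjective stand-in: passes when low-income areas are not much
    more exposed to outages than high-income areas."""
    gap = s.outage_risk_low_income - s.outage_risk_high_income
    return gap <= max_gap


def classify(s: Scenario) -> str:
    """Combine the two independent checks into one label."""
    obj, subj = objective_ok(s), subjective_ok(s)
    if obj and subj:
        return "meets both"
    if obj:
        return "technically sound but unfair"
    if subj:
        return "fair but over budget"
    return "fails both"


cheap = Scenario("cheap", cost=80.0,
                 outage_risk_low_income=0.20, outage_risk_high_income=0.02)
balanced = Scenario("balanced", cost=95.0,
                    outage_risk_low_income=0.04, outage_risk_high_income=0.03)

for s in (cheap, balanced):
    print(s.name, "->", classify(s))
```

Keeping the two checks in separate functions mirrors the idea that the measurable side can stay fixed while the value judgment is swapped out for a different stakeholder group.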

In particular, SEED-SET does not require pre-existing evaluation data and can adapt to a wide range of objectives. For example, in an electricity system, different user groups (such as rural communities and data centers) may have different ethical priorities despite both desiring affordable and stable electricity.

Modeling subjective factors

To evaluate subjective factors, the system uses an LLM as a proxy for the human evaluator. The preferences of each stakeholder group are encoded as natural-language statements.

The LLM compares scenarios and selects the more appropriate option according to the stated ethical criteria. This avoids the fatigue and inconsistency that affect human evaluators when they must judge hundreds or thousands of scenarios.
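
The comparison step can be pictured as a pairwise tournament. The sketch below is only an assumption about how such a step might be wired up: `stub_judge` is a hypothetical placeholder for the LLM call, and the scenario dictionaries, the `risk_gap` field, and the win-count ranking are invented for illustration.

```python
from itertools import combinations
from typing import Callable

# A judge receives a natural-language preference and two scenarios, and
# returns the scenario it prefers. A real system would call an LLM here.
Judge = Callable[[str, dict, dict], dict]


def stub_judge(preference: str, a: dict, b: dict) -> dict:
    """Hypothetical stand-in for the LLM: ignores the preference text and
    simply prefers the scenario with the smaller outage-risk gap."""
    return a if a["risk_gap"] <= b["risk_gap"] else b


def rank_by_pairwise_wins(scenarios: list[dict], preference: str,
                          judge: Judge = stub_judge) -> list[str]:
    """Compare every pair once and rank scenario names by number of wins."""
    wins = {s["name"]: 0 for s in scenarios}
    for a, b in combinations(scenarios, 2):
        wins[judge(preference, a, b)["name"]] += 1
    return sorted(wins, key=wins.get, reverse=True)


scenarios = [
    {"name": "A", "risk_gap": 0.18},
    {"name": "B", "risk_gap": 0.01},
    {"name": "C", "risk_gap": 0.07},
]
preference = "Rural communities should not bear more outages than other areas."
ranking = rank_by_pairwise_wins(scenarios, preference)
print(ranking)  # ['B', 'C', 'A']
```

Passing the judge in as a parameter means the placeholder can be replaced by a real model call without changing the ranking logic.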

SEED-SET then uses the selected scenarios to simulate the system (e.g., power distribution strategy) and continues searching for new scenarios with higher evaluation value.
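
The simulate-then-search loop described above can be sketched as follows. This is not SEED-SET's method: a plain hill-climbing search is swapped in to show the shape of the loop, and the toy simulator, the single `peak_share` parameter, and the scoring formula are all assumptions made for illustration.

```python
import random


def simulate(peak_share: float) -> dict:
    """Toy simulator (assumed). peak_share is the fraction of peak-hour
    supply routed to high-income areas, in [0, 1]."""
    cost = 100.0 - 30.0 * peak_share        # assumed: cheaper as the share grows
    risk_gap = max(0.0, peak_share - 0.5)   # assumed: other areas lose out
    return {"cost": cost, "risk_gap": risk_gap}


def evaluation_value(outcome: dict) -> float:
    """Assumed score: higher for scenarios that are cheap yet unfair,
    the kind of case a reviewer would want surfaced."""
    return 100.0 * outcome["risk_gap"] - outcome["cost"]


def adaptive_search(rounds: int = 20, candidates_per_round: int = 5,
                    seed: int = 0) -> tuple[float, list[float]]:
    """Repeatedly propose nearby scenarios, simulate them, and keep the
    one with the highest evaluation value found so far."""
    rng = random.Random(seed)
    best_param = rng.random()
    best_value = evaluation_value(simulate(best_param))
    history = [best_value]
    for _ in range(rounds):
        for _ in range(candidates_per_round):
            # Propose a nearby scenario, clipped to the valid range [0, 1].
            cand = min(1.0, max(0.0, best_param + rng.gauss(0.0, 0.1)))
            value = evaluation_value(simulate(cand))
            if value > best_value:
                best_param, best_value = cand, value
        history.append(best_value)
    return best_param, history


best_param, history = adaptive_search()
print(f"best peak_share: {best_param:.2f}, rounds recorded: {len(history)}")
```

The recorded history never decreases because a candidate only replaces the current best when it scores strictly higher.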

The end result is a set of representative scenarios that lets users analyze the performance of the AI system and adjust their strategy as needed.

For example, the system could detect instances where power distribution prioritizes high-income areas during peak hours, making disadvantaged areas more vulnerable to power outages.

Effectiveness and future development

When tested on real-world systems such as smart grids or urban traffic management, SEED-SET generates twice as many optimal scenarios as traditional methods, while also detecting more situations that other methods miss.

Notably, when user preferences change, the scenarios generated by the system also change significantly, demonstrating a high degree of adaptability to human values.

In the future, the research team plans to conduct studies with real users to assess the system's usefulness in the decision-making process. They also aim to extend the methodology to more complex problems, such as evaluating the decisions of large language models.

This research was partially funded by the U.S. Defense Advanced Research Projects Agency (DARPA).

(According to MIT News)

Source: https://vietnamnet.vn/danh-gia-dao-duc-cua-cac-he-thong-tu-hanh-2508477.html