Justice as Fairness in Artificial Intelligence: Aligning Neural Architecture Search with Rawlsian Ethical Principles
- Don Hilborn
- Jan 8
- 77 min read
Abstract
Artificial intelligence (AI) systems increasingly make high-stakes decisions in areas ranging from finance and employment to criminal justice. Ensuring these algorithmic decisions are fair and ethical has become a pressing concern for policymakers and technologists alike. This Article explores a novel interdisciplinary approach to algorithmic fairness by combining Neural Architecture Search (NAS) technology with John Rawls’s conception of justice as fairness. NAS, an automated method for designing AI model architectures, allows engineers to optimize models for multiple objectives—including not just accuracy or efficiency, but ethical criteria such as fairness. Rawls’s A Theory of Justice provides a philosophical foundation for defining those ethical criteria, emphasizing that social rules (or in this case, algorithmic decision rules) should be chosen under a “veil of ignorance” to ensure impartiality and protect the least-advantaged. By integrating Rawlsian principles into the objectives and constraints of NAS, we argue that AI systems can be designed to uphold fairness norms in a principled way. This Article provides a comprehensive analysis of how Rawls’s two principles of justice—equal basic liberties and the difference principle (maximizing benefits to the worst-off under conditions of fair equality of opportunity)—can guide the design and regulation of ethical AI. We examine the technical feasibility of encoding these principles in machine learning models (for instance, by maximizing the minimum utility across protected groups), survey recent research on fairness-aware NAS frameworks, and discuss how this approach aligns with broader efforts in AI ethics and law. In doing so, we engage with the works of thinkers from Aristotle and Kant to modern computer scientists, illustrating how enduring philosophical ideas can inform cutting-edge technology. We address potential counterarguments, such as conflicts between fairness and accuracy and the limits of technical solutions, and propose policy measures to ensure that Rawlsian ethical design becomes part of standard AI development and governance. Through this interdisciplinary approach, the Article shows that AI actions can be rendered ethical when viewed objectively by anyone impacted—essentially passing Rawls’s test of fairness from an original position. We conclude that a Rawlsian NAS paradigm offers a promising path toward AI that is not only intelligent but also just, and we outline regulatory frameworks to support this vision in practice.
Introduction
Lady Justice symbolizes the aspiration for impartial, fair decision-making—an ideal increasingly expected of AI systems. The rapid integration of artificial intelligence into society has raised urgent questions about how to ensure these systems act ethically and fairly toward all people affected by their decisions. From algorithms deciding who is offered a job interview or a loan, to machine learning models used in criminal sentencing and healthcare prioritization, AI’s impact on human lives is profound. Yet numerous incidents have shown that AI systems can inadvertently reproduce or even amplify biases and injustices present in their training data or design. The results can be troubling: credit algorithms offering lower credit lines to women or minorities, policing tools disproportionately flagging individuals from certain neighborhoods, or facial recognition systems that perform better on lighter skin tones than darker ones[1][2]. These examples underscore that algorithmic decisions, if left purely to computational optimization, may conflict with social notions of justice.
How can we ensure that AI models act in ways that any reasonable person impacted by their actions would consider ethical and just? This Article argues that part of the answer lies in marrying modern AI design techniques with classical principles of justice. In particular, we explore the convergence of Neural Architecture Search (NAS)—a cutting-edge technology for autonomously discovering high-performing neural network designs—with the moral philosophy of John Rawls, especially his famous framework of justice as fairness from A Theory of Justice (1971). Rawls’s theory provides a robust, objective standpoint (the “original position” behind a veil of ignorance) from which to evaluate fairness[3][4]. If an AI model’s decisions would be deemed fair by any person who did not know their own position in society, then that model’s actions meet Rawls’s criterion of justice. Our core proposal is that Rawlsian fairness principles can be explicitly incorporated into the design objectives of AI models via NAS, so that the resulting algorithms inherently strive to treat individuals fairly and protect the least advantaged in outcomes[5][6].
The approach we outline is inherently interdisciplinary. It requires translating philosophical ideals into technical design constraints and, conversely, using technical tools to operationalize ethics. This demands engagement with diverse intellectual traditions. The concept of fairness has deep roots: Aristotle discussed distributive justice and the idea of giving each their due, Confucius taught the Golden Rule of not imposing on others what you would not want for yourself, and Immanuel Kant emphasized the intrinsic dignity of individuals and the need for universal moral laws. These perspectives set the stage for modern theories like Rawls’s, which synthesizes liberty and equality through a rational choice thought experiment. In parallel, mathematicians and scientists from Euclid and Archimedes (masters of logical reasoning and early computation) to Ada Lovelace and Alan Turing (pioneers of algorithmic thought) built the foundations for today’s AI. Indeed, contemporary AI is a product of both philosophical insight and mathematical rigor—a fact symbolized by the legacy of figures like Isaac Newton, Carl Friedrich Gauss, Leonhard Euler, David Hilbert, Emmy Noether, Srinivasa Ramanujan, and Terence Tao, whose contributions to logic, calculus, symmetry, and problem-solving underpin the algorithms we now deploy. By invoking this wide cast of thinkers, we emphasize that ensuring AI models behave ethically is not a task isolated to computer science or law alone, but a grand collaboration across human knowledge.
This Article proceeds in seven parts. Part I (Background) provides an overview of the relevant philosophical and technical foundations: it summarizes Rawls’s theory of justice and its relevance to algorithmic decision-making, reviews the current challenges of AI bias and fairness, and explains what NAS technology is and how it can be used to optimize machine learning models.
Part II (Rawlsian Justice and AI Ethics) delves deeper into how Rawls’s principles—particularly the equal liberty principle and the combination of fair equality of opportunity and the difference principle—can serve as normative criteria for ethical AI. We discuss how these principles might translate into measurable fairness metrics or constraints for AI systems, ensuring that AI’s “decisions” align with what a just planner behind the veil of ignorance would choose[3][7].
Part III (Neural Architecture Search for Fairness) explores the technical integration of those criteria into NAS. We describe how NAS algorithms can be configured to search for neural network architectures that maximize not only accuracy or efficiency but also fairness measures—yielding models that Pareto-dominate others on both accuracy and fairness[8]. We highlight recent research successes, such as fairness-aware NAS frameworks that have discovered architectures reducing bias in facial recognition and medical diagnosis without sacrificing performance[9][10].
Part IV (Argument: A Rawlsian Framework for Ethical AI Design) synthesizes the philosophical and technical insights into a coherent framework. We propose practical methods to enforce Rawlsian principles in AI (for example, by using a maximin objective to improve the worst-off group’s outcomes[11][5] or by imposing fair equality of opportunity constraints so that algorithms do not unjustly favor any demographic). This part also illustrates the approach with examples and hypothetical case studies, showing how a Rawls-guided AI might handle scenarios like loan approvals or resource allocations differently from a conventional AI.
Part V (Counterarguments and Challenges) addresses potential objections. We consider the tension between fairness and other objectives like accuracy or efficiency, questioning whether a Rawlsian approach could unduly hamper performance or innovation. We discuss the limits of technical solutions to moral problems—acknowledging that algorithmic fixes alone cannot resolve all issues of justice[12]—and engage with alternative ethical theories (such as utilitarianism or deontological ethics), explaining why we focus on Rawls and how other perspectives can complement our framework.
Part VI (Policy Proposals) examines the legal and regulatory implications. Drawing on emerging AI governance regimes (like the EU’s AI Act) and scholarship on algorithmic accountability, we propose policies to embed Rawlsian NAS principles into AI development norms. These include mandating fairness impact assessments, requiring that AI systems demonstrate compliance with equal opportunity standards, and ensuring transparency and avenues for individuals to challenge AI-driven decisions (a due process analog in algorithmic governance)[13][14]. We argue that regulators should encourage or require the use of tools like NAS to achieve these ethical standards, effectively bridging the gap between high-level principles and engineering practice.
Part VII (Conclusion) reflects on the broader significance of aligning AI with justice.
As AI becomes ever more powerful, the need for it to reflect our best ethical values becomes not only a technical challenge but a defining societal project. By showing how Rawls’s vision of a just society can find new life in the code and architecture of AI systems, we aim to inspire both optimism and concrete action toward AI that is objectively fair when viewed by all stakeholders. Ultimately, the union of Rawlsian theory and NAS technology exemplifies how ancient questions of justice can—and must—be addressed anew in the age of algorithms, ensuring that technological progress does not come at the expense of our fundamental moral commitments.
Background
Philosophical Foundations: From Aristotle to Rawls
The quest for ethical and just decision-making is as old as civilization. Philosophers across cultures have wrestled with questions of fairness, morality, and the proper distribution of benefits and burdens in society. A brief journey through this intellectual history provides context for Rawls’s contribution and why it is particularly suited to modern AI dilemmas.
Aristotle (4th century BCE) offered one of the earliest systematic accounts of justice. In his Nicomachean Ethics and Politics, Aristotle distinguished between distributive justice (how honor or wealth should be allocated among individuals in a polis) and rectificatory justice (how to correct wrongs or unequal transactions)[15]. For Aristotle, justice meant giving each person their “due” in proportion to their merit or need; it was a virtue aimed at achieving balance and harmony in society. This emphasis on proportional fairness echoes today’s concerns that algorithms not arbitrarily favor one group over another. Aristotle also believed the lawgiver’s role was crucial in crafting rules that cultivate virtue and the common good. This resonates with the idea that we might “code” certain virtues or principles (like fairness) into AI systems so that they inherently promote just outcomes.
Plato, Aristotle’s teacher, famously explored justice in The Republic. There, justice was conceived as a principle of social order—each class performing its proper role and not encroaching on others. While Plato’s vision was more about macro-level social structure than individual decision processes, his notion that a well-ordered system is one where each part is aligned with the good of the whole can inform thinking about AI in society. We might analogize that each algorithm (or each component of a complex AI system) should play its part in sustaining a fair social order, rather than undermining it with unchecked bias or arbitrary behavior.
Moving eastward, Confucius (6th–5th century BCE) in ancient China taught ethical principles emphasizing harmony, respect, and reciprocity. One of Confucius’s maxims, often called the Silver Rule, advises: “Do not impose on others what you do not wish for yourself.” This rule captures a spirit of impartiality and empathy that aligns with later Western concepts of the Golden Rule and even Rawls’s veil of ignorance. It suggests that a decision-maker should abstract away from personal interest and consider the perspective of others—a philosophical stance strikingly similar to Rawls’s requirement that just principles are those one would choose without knowing one’s own position[3][4]. In a sense, Confucius anticipated the importance of moral neutrality in decision-making, a lesson we can apply to AI by designing systems that do not privilege specific individuals or groups unfairly, just as a Confucian sage would counsel a ruler to be even-handed.
Immanuel Kant (18th century) provided another pillar in ethical theory, one that directly influences modern thinking about ethics in technology. Kant’s categorical imperative commands that we act only according to maxims that we could will to be universal laws, and that we treat humanity “never merely as a means to an end, but always at the same time as an end.” These ideas underscore respect for persons and the need for rules that hold universally (without favoritism or exception). In algorithmic terms, a Kantian might say an AI decision rule should be non-arbitrary and respect the autonomy and dignity of each person—for instance, not by making judgments based on characteristics that ought to be morally irrelevant (such as race or gender) and not using individuals instrumentally. Rawls was influenced by Kantian ethics; his framework can be seen as an attempt to formulate principles that any rational, free, and equal persons would agree to as the basis for social cooperation, which is very much in the spirit of Kant’s idea of laws acceptable to all rational beings[16][17]. Rawls’s original position is in part a procedural rendition of Kantian impartiality: it forces us to consider universality by stripping away particulars of identity.
In the modern era, John Rawls (1921–2002) stands as a central figure, and his work A Theory of Justice (1971) revitalized and refined the social contract tradition of political philosophy. Rawls asked: What principles of justice would free and rational people choose if they were placed in an original position of equality, behind a veil of ignorance that hides their own characteristics and social status? By depriving decision-makers of information about whether they themselves would be rich or poor, talented or less talented, belonging to a majority or minority group, etc., Rawls argued that we can identify principles of justice that are truly fair and impartial[3][4]. Because no one wants to end up disadvantaged, the chosen principles would ensure a fair treatment of all positions in society. The thought experiment builds on earlier ideas (as noted, it has antecedents in the Golden Rule, in Kant, and even in utilitarian economist John Harsanyi’s analyses), but Rawls’s specific outcome was distinctive: rather than pure utilitarianism, which might allow sacrificing the few for the many, the veil-of-ignorance rational choosers in Rawls’s account adopt two key principles of justice.
Rawls’s first principle is the Principle of Equal Basic Liberties: “Each person is to have an equal right to the most extensive scheme of equal basic liberties compatible with a similar scheme of liberties for others.”[18][6] In other words, fundamental rights and freedoms (freedom of speech, conscience, personal property, due process, etc.) must be guaranteed equally to all; liberty can only be limited for the sake of liberty (i.e., one person’s freedom can be curtailed only to protect the equal freedoms of others). This principle takes lexical priority in Rawls’s system—meaning no trade-offs can be allowed that infringe basic liberties for the sake of social or economic gains[19]. In the context of AI, this principle alerts us that some values (like privacy, freedom from discrimination, and the right to an explanation or appeal when decisions are made about us) may be inviolable constraints on algorithm design. For instance, an AI system that maximizes accuracy but does so by unjustly surveilling individuals or by denying people any chance to contest decisions would violate this Rawlsian principle. As scholars have noted, the use of algorithms in ways that obscure reasoning or deny recourse threatens individuals’ basic liberties and rights[20][21]. A Rawlsian perspective would demand that AI governance ensures transparency and accountability such that people can challenge and understand algorithmic decisions—effectively extending due process into the algorithmic realm[13][14].
Rawls’s second principle addresses social and economic inequalities. It has two parts, often referred to as (a) the difference principle and (b) the principle of fair equality of opportunity[18][22]. Rawls states this second principle as: “Social and economic inequalities are to be arranged so that they are both (a) to the greatest benefit of the least advantaged, and (b) attached to offices and positions open to all under conditions of fair equality of opportunity.”[6] Part (b) requires that if there are opportunities (jobs, education, etc.), they must be genuinely accessible to all—this goes beyond mere formal non-discrimination (it’s not enough that jobs are legally open to all; there must be real, substantive opportunity for those with the same talent and ambition to attain them, regardless of their social background). Part (a), the difference principle, is especially distinctive: it permits departures from strict equality only if those departures improve the lot of the least advantaged group in society. In effect, Rawlsian justice doesn’t insist on equal outcomes, but it does insist that any inequalities work to everyone’s advantage, especially the worst-off[23]. For example, higher pay for doctors is justifiable if it leads to better healthcare that raises the standard of the poorest, but any inequality that simply makes the rich richer without helping the poor would be unjust. This is a maximin rule (maximize the minimum), focusing on elevating the floor of society rather than the average or the total sum of welfare[11][5].
The second principle also has a lexical priority: Rawls specifies an order—first the Liberty principle, then equality of opportunity, then the difference principle[19]. That means we cannot violate equal rights for the sake of opportunity or distribution, nor can we violate fair opportunity for the sake of economic gain for the worst-off. All must be satisfied, but in cases of conflict, earlier principles outweigh later ones.
Rawls’s principles have been hugely influential not only in political philosophy but also in allied fields—economics, law, public policy. They present a conceptual test for fairness that we can attempt to apply to institutions and rules: Are we comfortable that a given rule would be chosen by rational people who do not know whether they personally will be on the winning or losing side of it? Rawls thought that his two principles would indeed be chosen over utilitarian maximization of average welfare, because behind the veil of ignorance people are averse to the risk of being the big loser in an unequal society[24][25]. (Notably, some economists like John Harsanyi disagreed, arguing a rational person behind the veil might use expected utility and choose utilitarianism, but Rawls maintained that the uncertainty in his scenario was of a special kind that would lead to a maximin choice favoring the worst-off[26][25]. This debate aside, the spirit of Rawls’s construction is widely embraced: fairness means not designing rules to favor oneself.)
The veil of ignorance idea has direct analogues in how we might approach AI fairness. It suggests that if we can get an AI to act as if it doesn’t know who it’s dealing with in terms of social identities or arbitrary traits, it could make more impartial decisions. We will later see that some technical approaches literally try to implement a “veil of ignorance” by removing sensitive attributes from data or otherwise preventing the model from using them[27][28]. The veil also highlights the moral intuition that anyone impacted by an AI’s action should, if they step back and view the decision objectively, find it acceptable and justifiable. This is essentially the user’s request phrased in Rawlsian terms: we want AI actions that would be considered ethical if viewed by any person affected, with an objective lens. Rawls provides the lens.
Before moving on, it’s useful to note that Rawls’s ideas, while powerful, are not the only moral theories relevant to AI. Utilitarianism (proposed by philosophers like Jeremy Bentham and John Stuart Mill) would urge us to evaluate AI by its consequences—perhaps allowing some bias if it leads to greater overall good. By contrast, deontological ethics (like Kant’s) focuses on duties and principles, aligning with the idea that certain things (like violating rights or human dignity) are wrong regardless of outcome. Virtue ethics, inspired by Aristotle, would ask what a “virtuous” AI or AI developer would do (e.g. show fairness, compassion, honesty). And non-Western philosophies (like Ubuntu in African ethics or Buddhist principles) might stress community harmony or compassion as metrics of ethical AI. This Article focuses on Rawls not to the exclusion of these perspectives, but because his framework of justice as fairness offers a particularly salient and structured way to think about equality and ethics in automated decisions. Notably, Rawls directly addresses how to handle inequality and impartiality, which are central problems in algorithmic bias debates. Moreover, Rawlsian thinking has begun to permeate discussions of AI ethics and law, as scholars recognize that current fairness metrics in machine learning echo themes from distributive justice theory[29]. By using Rawls’s principles as a guide, we can ground the nebulous concept of “ethical AI” in a well-established theory of justice that has objective criteria.
AI Ethics and Algorithmic Fairness: Challenges in Machine Decisions
Even as philosophers developed ideals of justice, societies have always struggled to put those ideals into practice. Today’s arena for such struggle is increasingly the realm of algorithms. AI systems—especially those based on machine learning—learn patterns from historical data and then apply those patterns to make decisions or predictions. While powerful, this approach carries an inherent risk: any biases or inequities present in past data can be ingested and perpetuated by the AI. Furthermore, even a seemingly neutral algorithm can yield unequal outcomes due to complex interactions with social reality (for example, a hiring algorithm might favor applicants from neighborhoods associated with higher income, indirectly disadvantaging certain racial or ethnic groups due to longstanding residential segregation).
Over the past decade, numerous studies and real-world cases have documented algorithmic bias and raised alarms about fairness:
Employment and Credit: One high-profile example was the case of the Apple Card’s credit algorithm, which in 2019 was reported to offer women significantly lower credit limits than men with similar financial profiles. This led to public outcry and regulatory scrutiny; it appeared the algorithm learned from historical credit data that reflected gender inequalities, thus perpetuating them. Such incidents underscore how “black box” algorithms can unintentionally encode discriminatory rules—rules that would likely not pass a Rawlsian impartiality test, since no one behind the veil of ignorance would agree to a financial system that consistently offers one gender worse terms[30]. Similar issues have arisen in hiring, where AI screening tools disfavored female applicants by picking up on subtle correlations between gender and past hiring decisions, or in online advertising where ads for high-paying jobs were shown more to men than women.
Criminal Justice: The COMPAS algorithm, used in parts of the United States to predict recidivism risk and inform bail or sentencing decisions, was found to have significant racial bias. An investigative report showed that Black defendants were more likely to be falsely labeled as high risk than white defendants, whereas white defendants were more often incorrectly labeled low risk – even when actual reoffense rates did not justify such a disparity. This kind of bias in predictive policing or judicial tools clearly conflicts with the ideal of equal treatment under the law, a cornerstone of justice. It illustrates how an algorithm can violate both equal liberty (if it affects people’s liberty through detention decisions) and fair opportunity (if it labels individuals in a biased way, affecting their chance for release or rehabilitation). A Rawlsian would object that no rational person behind a veil would accept a justice system that works to the systematic disadvantage of one race. The COMPAS example also sparked a debate about competing definitions of fairness (e.g., equal false positive rates vs. equal predictive value across races), demonstrating that defining fairness in practice is complex, and different metrics capture different conceptions of what is just[30].
Healthcare and Public Services: AI models used to allocate healthcare interventions or public resources have shown biases too. One health algorithm was found to systematically underestimate the health needs of Black patients compared to white patients at the same level of illness, because it used healthcare spending as a proxy for need—and historically, less money is spent on Black patients, reflecting unequal access to care. Such a system would direct resources away from populations that actually needed them more, failing Rawls’s difference principle (since it worsened the position of a disadvantaged group rather than improving it). The error here was subtle: the objective was mis-specified (spending ≠ need), leading to an unfair outcome. It is a cautionary tale that the choice of objective function in AI (what the algorithm is optimizing for) is crucial to fairness[31].
Facial Recognition and Biometrics: Studies by researchers (like Joy Buolamwini and Timnit Gebru in Gender Shades) found that commercial facial recognition systems had much higher error rates for darker-skinned and female faces than for lighter-skinned males. This bias stemmed from training data that was skewed toward lighter-skinned subjects and from models that perhaps had design choices not robust across demographics. If such technology is used for important tasks (like identifying suspects, or even unlocking smartphones for payments), it could lead to systematically worse service or higher risk of false accusations for certain groups—another clear failure of equal treatment. Furthermore, the very deployment of facial recognition raises liberty concerns, potentially conflicting with privacy and freedom if done without consent or oversight (Rawls’s first principle would insist that such basic rights not be infringed without compelling justification).
These examples collectively define the problem that algorithmic fairness as a field seeks to address: how can we detect, measure, and mitigate bias in AI systems to ensure more equitable outcomes? In response, a rich body of research in Fairness, Accountability, and Transparency (FAT) in AI has emerged. Researchers have proposed various formal definitions of fairness, such as:
Demographic Parity: The model’s decisions (e.g., positive loan approvals) should be statistically independent of protected attributes like race or gender. In other words, the percentage of positive outcomes should be the same across groups (unless justified by job-related or business necessity factors). This corresponds to a notion of “equality of outcome” in a narrow sense.
Equal Opportunity / Equalized Odds: Introduced by Hardt et al. (2016), these criteria require that the model’s error rates be equal across groups. Equal opportunity requires equal true positive rates; the stricter equalized odds requires equal true positive and false positive rates. For example, in a hiring context, the true positive rate (qualified candidates accepted) should be equal regardless of group, ensuring fair equality of opportunity in effect[32]. This ties to the idea of horizontal equity — treat equals equally. It resonates with Rawls’s focus on fair opportunity: those who are similarly qualified should have similar chances, no matter their demographic.
Calibration: Another metric says that for individuals with a given risk score, the probability of the outcome should be equal across groups. This is more about interpretability and consistency of probabilistic predictions.
Maximin Fairness: Some approaches explicitly take a Rawlsian angle by trying to maximize the minimum utility or accuracy across groups[33]. This aligns with Rawls’s difference principle. It doesn’t necessarily demand equal outcomes, but it ensures no group is left too far behind. If one group has significantly worse performance (say a higher error rate), improving that becomes the priority, even if it means a small sacrifice in the performance of a better-off group. This is arguably the most direct translation of Rawls into a fairness criterion.
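To make these definitions concrete, here is a minimal sketch (not a reference implementation) of how three of the corresponding gap statistics might be computed; calibration would additionally require binning predicted scores. The array names y_true, y_pred, and g (binary labels, binary predictions, and group membership) are hypothetical.

```python
import numpy as np

def group_stats(y_true, y_pred, g):
    """Per-group selection rate, true positive rate, and accuracy."""
    stats = {}
    for grp in np.unique(g):
        m = g == grp
        sel = y_pred[m].mean()                    # share of positive decisions
        tpr = y_pred[m & (y_true == 1)].mean()    # qualified individuals accepted
        acc = (y_pred[m] == y_true[m]).mean()     # group-level accuracy
        stats[grp] = (sel, tpr, acc)
    return stats

def fairness_gaps(y_true, y_pred, g):
    sel, tpr, acc = zip(*group_stats(y_true, y_pred, g).values())
    return {
        "demographic_parity_gap": max(sel) - min(sel),  # parity of outcomes
        "equal_opportunity_gap": max(tpr) - min(tpr),   # parity of TPRs
        "worst_group_accuracy": min(acc),               # maximin criterion
    }
```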
Each of these definitions captures a different aspect of “fairness,” and a crucial insight (often called the impossibility theorem in algorithmic fairness) is that you cannot satisfy all fairness definitions at once except in trivial cases. For example, it’s been shown that except under special circumstances, you cannot have perfect calibration and equalized odds simultaneously if base rates (incident rates of the outcome) differ by group. This means trade-offs are inevitable[30]. It echoes the philosophical point that there are different conceptions of equality (equality of opportunity vs. equality of outcome vs. procedural fairness) that might conflict. What Rawls offers in this morass is a coherent prioritization: basic rights first, then genuine opportunity, then helping the worst-off. We can see the fairness definitions above in that light: some correspond to equal opportunity (equalizing true positive rates, for instance), while others correspond to outcomes or to overall parity. A Rawlsian perspective might favor those interventions that ensure a fair baseline for the disadvantaged (e.g., focus on maximin fairness or equal opportunity which protect those who might otherwise have lower success rates due to structural inequities). Importantly, Rawls also reminds us not to trample basic rights in the process. For AI, that means, for example, we shouldn’t impose fairness by violating privacy (like collecting intrusive personal data to adjust decisions) because that could infringe on fundamental liberties.
The law and policy landscape is beginning to respond to these challenges. Anti-discrimination law (like the Equal Credit Opportunity Act or employment laws) already prohibits certain disparate treatment by algorithms just as by human decision-makers. But many argue this is not enough, since current laws don’t always recognize disparate impact (unintentional bias) or require proactive bias testing. Some jurisdictions have begun adopting algorithmic accountability acts, requiring assessments of automated decision systems for bias. The EU’s draft Artificial Intelligence Act (AIA) is a notable example of a comprehensive regulatory approach. It classifies certain AI uses as “high-risk” (including many that affect people’s rights) and would mandate risk assessments and some level of transparency/fairness. However, as one Rawlsian analysis of the EU AIA points out, compliance alone may not ensure justice: the AIA sets baseline requirements but doesn’t fully address whether AI actually honors justice as fairness in outcomes[34]. That analysis suggests we need to go beyond compliance and embed ethical reflection (like Rawls’s principles) into the AI development lifecycle[34]. We will pick up this point in our policy discussion.
In summary, AI ethics and fairness is about diagnosing when algorithms are treating people unfairly (by various definitions) and finding ways to prevent or correct that. Solutions fall into three broad categories:
1. Pre-processing: modify the input data to remove bias (e.g., balance the training dataset, or “mask” sensitive attributes).
2. In-processing: change the learning algorithm itself to penalize unfair outcomes (e.g., add a term to the model’s objective function that measures disparity, or constrain the model to equalize certain metrics).
3. Post-processing: adjust the model’s outputs to mitigate bias (e.g., calibrate scores differently for different groups to equalize outcomes).
Each method has pros and cons. Importantly for this Article, the idea of using NAS intersects mostly with the in-processing approach: we are talking about changing how we design the model from the ground up.
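As a minimal illustration of the in-processing route (a sketch only, assuming a binary classifier trained in PyTorch; scores, labels, and groups are hypothetical tensors), a disparity term can be added directly to the training loss:

```python
import torch
import torch.nn.functional as F

def demographic_parity_penalty(scores, groups):
    # Differentiable proxy for the parity gap: difference in mean predicted
    # probability of a positive outcome between the two groups.
    p = torch.sigmoid(scores)
    return (p[groups == 0].mean() - p[groups == 1].mean()).abs()

def fair_loss(scores, labels, groups, lam=1.0):
    # In-processing: ordinary task loss plus a weighted fairness penalty.
    task_loss = F.binary_cross_entropy_with_logits(scores, labels)
    return task_loss + lam * demographic_parity_penalty(scores, groups)
```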
Neural Architecture Search (NAS): Engineering AI with Objectives
Neural Architecture Search is a relatively recent innovation in machine learning that represents a significant step towards automating the design of AI models. Traditionally, human engineers and researchers craft the architecture of a neural network—deciding how many layers to use, what type (convolutional, recurrent, transformer, etc.), how they connect, how many neurons, and so forth. This process can be as much an art as a science, often relying on expert intuition or trial-and-error. NAS turns this process over to an algorithm, effectively having the computer “design itself” (within a predefined search space) by trying out many possible architectures and selecting the best.
Here’s how NAS generally works: First, one defines a search space of possible architectures. This might be all neural networks of a certain form (say, convolutional networks up to 20 layers, where each layer can be one of a few types, and connections can vary). Next, one defines a search strategy—how to explore the space. This could be using evolutionary algorithms (mutating and recombining network designs), reinforcement learning (an RNN controller that generates architectures as sequences of actions and gets rewarded if the architecture performs well), or more recently, gradient-based methods (like DARTS, which relaxes the search to a continuous optimization problem). Finally, one needs an evaluation metric to judge which architectures are better. In early NAS work, the metric was usually the validation accuracy on a task (like image classification) – essentially the same metric we train the network for. The NAS algorithm will propose architectures, train them (or a proxy for them), see how well they do on the metric, and use that feedback to propose better architectures over time[35][10].
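Schematically, the loop can be reduced to a few lines. The sketch below uses plain random search over a toy search space (evolutionary, reinforcement-learning, or gradient-based strategies would replace the sampling step); the train_and_evaluate callback, which trains a candidate and returns its validation metric, is assumed rather than shown.

```python
import random

# A toy search space: depth, width, and block type of a small network.
SEARCH_SPACE = {
    "num_layers": [4, 8, 12],
    "width": [32, 64, 128],
    "block": ["conv", "depthwise", "residual"],
}

def sample_architecture():
    # Search strategy: plain random search here; evolutionary, RL-based,
    # or gradient-based methods (e.g., DARTS) would replace this step.
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def run_nas(train_and_evaluate, budget=50):
    """train_and_evaluate(arch) -> validation score (the evaluation metric)."""
    best_arch, best_score = None, float("-inf")
    for _ in range(budget):
        arch = sample_architecture()
        score = train_and_evaluate(arch)  # train the candidate, measure the metric
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score
```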
NAS has been remarkably successful in some domains. For example, Google’s NASNet project auto-discovered convolutional networks that rivaled the best human-designed models for image recognition. It also has been applied to designing hardware-efficient models (finding small networks suitable for mobile devices), to natural language processing, and more. The advantage of NAS is that it can often discover non-intuitive architectures and handle the enormous combinatorial design space more systematically than a human could.
Crucially, NAS is not limited to optimizing for accuracy alone. Because we have control over the evaluation metric, we can make it a multi-objective or composite metric. Researchers have done NAS to optimize for accuracy and inference speed (so the found model is both accurate and fast) by combining metrics or setting constraints (like “maximize accuracy subject to latency < X ms”). They have done NAS for accuracy and model size (to fit on small devices). The same principle extends to accuracy and fairness: we can define a metric that includes how fair the model’s predictions are, or includes a penalty for disparate performance, etc.
For instance, suppose we have a fairness metric $F$ (which could be something like negative disparity between groups, so higher is fairer) and accuracy metric $A$. We could set up a joint objective: $J = A - \lambda \times \text{BiasPenalty}$, where $\lambda$ is a trade-off parameter that determines how much we prioritize fairness relative to accuracy. NAS could then attempt to find an architecture that maximizes $J$. Alternatively, we might use a Pareto optimization approach: search for architectures that lie on the Pareto frontier of the two objectives, meaning you can’t improve fairness without hurting accuracy and vice versa. One study’s NAS framework explicitly sought to output “a suite of models which Pareto-dominate all other competitive architectures in terms of accuracy and fairness”[8].
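Both options can be sketched briefly. The weighted score mirrors $J = A - \lambda \times \text{BiasPenalty}$, and the Pareto filter keeps every candidate that no other candidate beats on both accuracy and fairness; the example models and numbers below are purely illustrative.

```python
def joint_score(accuracy, bias_penalty, lam=0.5):
    # J = A - lambda * BiasPenalty: higher is better.
    return accuracy - lam * bias_penalty

def pareto_front(candidates):
    """candidates: list of (name, accuracy, fairness), higher is better for both.
    Returns the candidates not dominated by any other candidate."""
    front = []
    for name, acc, fair in candidates:
        dominated = any(
            (a >= acc and f >= fair) and (a > acc or f > fair)
            for _, a, f in candidates
        )
        if not dominated:
            front.append((name, acc, fair))
    return front

# Example: model B is dominated by A; A and C sit on the frontier.
models = [("A", 0.92, 0.85), ("B", 0.91, 0.80), ("C", 0.89, 0.93)]
print(pareto_front(models))  # -> [('A', 0.92, 0.85), ('C', 0.89, 0.93)]
```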
A concrete example of fairness-aware NAS is the FaHaNa framework (short for Fairness- and Hardware-aware Neural Architecture search) described by Sheng et al. (2022)[1][9]. This research was motivated by the observation that many existing neural networks had worse performance (higher error rates) on minority skin types in a dermatology diagnosis task – larger models tended to be more fair (smaller gap between light and dark skin accuracy) but were too heavy for mobile devices, whereas small models were efficient but had larger fairness gaps[36][37]. The authors identified an opportunity: use NAS to find a sweet spot – an architecture that remains lightweight yet has far better fairness than off-the-shelf small models. FaHaNa’s search used a multi-component objective: it considered accuracy, fairness, and hardware efficiency (latency on a device) simultaneously[38][10]. The search algorithm (which in their case involved an RNN controller suggesting blocks for a CNN, plus a training and evaluation loop[35]) found architectures that indeed improved fairness without compromising accuracy, all while reducing model size[10]. In their dermatology case, the resulting network (dubbed “FaHaNa-fair”) had a significantly smaller fairness gap between lighter and darker skin patient outcomes, yet was as accurate as larger networks and could run on a Raspberry Pi[39]. This demonstrates the potential of NAS to discover novel architectures that are inherently more fair.
Another study by Sukthanker et al. (2023) took a similar approach for face recognition networks[40][41]. They first analyzed a variety of existing architectures and hyperparameter settings to see their impacts on standard fairness metrics (like differences in false match rates across demographics). They observed that the conventional practice of just picking the model with highest overall accuracy was not yielding the fairest model—some architectures had slightly lower accuracy but significantly better fairness[8]. Based on this, they launched what they call the first neural architecture search for fairness, which jointly searches architectures and training hyperparameters to optimize both accuracy and fairness measures[8]. The outcome was a set of face recognition models that outperformed all known architectures in the accuracy–fairness trade-off sense (Pareto-dominating them). Moreover, these models generalized well: they remained fairer on other datasets without needing retraining[41]. This is an encouraging sign that certain architectural features (maybe the number of layers, or how features are learned in later layers[42]) can reduce bias inherently.
Why might architecture affect fairness at all? It’s an intriguing question, because one might think any sufficiently expressive neural network could just learn a biased or unbiased solution depending on data and training. However, architecture sets the representational and learning biases of a model. For example, a network that is too small might not be able to capture subtle patterns needed to avoid mistaking background correlations for real features, which could lead to higher error on minority groups if the majority pattern dominates. Larger or differently connected networks might learn a more nuanced decision boundary that serves all groups better (this was the observed trend in some cases: larger networks = more fairness[43]). Architecture also interacts with training dynamics; some architectures might overfit to majority group data or might amplify noise. So, exploring architectures is like exploring different solution spaces where some solutions are fairer.
NAS technology typically involves some heavy compute, since it may require training hundreds or thousands of candidate models. There are techniques to make it efficient (like weight sharing among candidates, lower fidelity evaluations, etc.). But an important point for our discussion is that NAS allows explicitly encoding human objectives into the search. If society decides “fairness toward protected groups” is a priority, NAS provides a systematic way to enforce that priority in model design: we give the NAS algorithm a definition of fairness and tell it to consider that in picking the best model. The algorithm might then surprise us with a design we wouldn’t have guessed that achieves a better balance than any off-the-shelf model. It’s akin to telling an automated architect “build me a courthouse that is not only tall (accuracy) but also accessible to people in wheelchairs (fairness)”—the resulting blueprint should meet both specs, possibly in a creative way (like adding ramps in a clever configuration).
In summary, NAS is a powerful tool for multi-objective optimization in AI design. Traditionally used for performance metrics, it can be steered towards ethical metrics too. This is a key enabler for our Rawlsian approach: we can think of Rawls’s principles as high-level objectives (or constraints) and ask NAS to find AI models that uphold those. Of course, doing so requires that we translate “uphold Rawls’s principles” into something measurable, which is a challenge we address next. Nonetheless, the existence of NAS means we are not limited to just tweaking existing models for fairness; we can proactively search the vast design space for models that are “fair by construction.”
With these foundations laid—the moral philosophy context, the nature of the AI fairness problem, and the technological capability that NAS provides—we can now turn to the heart of the inquiry: How do we integrate Rawls’s Theory of Justice with NAS to ensure AI behaves ethically?
Rawlsian Justice and Ethical AI: A Theory of Justice for Algorithms
John Rawls’s influence on discussions of algorithmic fairness has grown as scholars and practitioners grapple with aligning AI systems with societal values[29]. Rawls’s framework offers both conceptual clarity and normative guidance: it clarifies what we mean by a fair outcome and guides us toward how we might achieve it. In this part, we examine how each of Rawls’s two principles of justice can be interpreted in the context of AI systems, and how those interpretations can serve as ethical design goals.
Equal Basic Liberties in the Digital Realm
Rawls’s first principle, ensuring equal basic liberties for all, might seem at first to lie outside the scope of a machine learning model’s design—after all, liberties like free speech or freedom of thought are typically guaranteed by constitutions and laws, not algorithms. However, when AI systems mediate important decisions, they can either enhance or impede individuals’ effective enjoyment of their rights. A Rawlsian perspective compels us to ask: Does the deployment of a given AI system respect and reinforce the equal liberties of those affected, or does it undermine them?
One immediate application is in the arena of privacy and data rights. Privacy can be seen as a basic liberty in modern societies; individuals have a right to control information about themselves and not be subject to arbitrary surveillance. If an AI system requires invasive data collection or constant monitoring of individuals, a Rawlsian might question if that system is justifiable—would we agree to live under such surveillance from behind the veil of ignorance? Only if the surveillance were strictly necessary for greater liberty or fairness might it pass muster (for example, one could argue certain data collection is needed to ensure others’ rights, but the bar is high). The principle of equal liberty would suggest AI designers follow a privacy-preserving approach: use the minimal necessary data, employ techniques like differential privacy, and avoid encroaching on individual autonomy unless absolutely justified. In our context of NAS and model design, this could mean including privacy as a constraint or part of the objective when searching for architectures. For example, a NAS might be configured to prefer architectures that allow for on-device processing (keeping data local) or that degrade gracefully when certain sensitive inputs are removed—thus favoring designs that do not rely on violating privacy to achieve accuracy.
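Purely as an illustration of how such preferences could enter a search objective (the parameter budget as a stand-in for on-device processing, and the masked evaluation as a stand-in for not relying on sensitive inputs, are both hypothetical), a scoring rule might look like this:

```python
def privacy_aware_score(evaluate, arch, param_count, max_params=1_000_000):
    """evaluate(arch, mask_sensitive) -> validation accuracy (placeholder).
    Reject architectures too large for on-device use; score the rest by how
    well they perform with sensitive inputs masked out."""
    if param_count > max_params:          # hard constraint: fits on device
        return float("-inf")
    # Prefer designs that do not depend on sensitive data to be accurate.
    return evaluate(arch, mask_sensitive=True)
```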
Another aspect of basic liberty is freedom from discrimination, which overlaps with fairness but is also a right in many legal systems (like the Equal Protection Clause in the U.S. or human rights law internationally). If an AI system discriminates against a protected class (say in lending or employment), it’s infringing on an individual’s right to equal treatment. Rawls’s first principle doesn’t explicitly list non-discrimination (it was formulated with political rights in mind), but in a modern interpretation, equal citizenship and basic rights certainly include the right not to be treated as second-class due to innate characteristics. For AI, this means that explicitly using protected attributes (like race, gender) in a way that harms a group’s interests would be a red line. One could design algorithms that simply exclude such features. However, as is well known, even without those features, algorithms can find proxies (e.g., ZIP code might proxy for race). This is why more nuanced fairness measures are needed. Ensuring respect for equal basic liberties might require algorithmic transparency and accountability measures as well: people should have the ability to know when AI is affecting their rights and to challenge those decisions. From a Rawlsian law review standpoint, one can argue that an algorithmic decision-making system that lacks a mechanism for an individual to contest or understand the decision is fundamentally at odds with the idea of citizens having equal status and rights[13][20]. Such an AI becomes an unaccountable authority over individuals, which we would not agree to under principles of justice. Therefore, any just AI framework might include a requirement for explainability or appeal processes, echoing the legal principles of due process and administrative justice.
Interestingly, these considerations have been explicitly raised in some policy contexts. For instance, the EU General Data Protection Regulation (GDPR) includes provisions often interpreted as a “right to explanation” or at least a right to be informed about automated decisions and to object to them. One can connect this to Rawls’s idea of public reason and understanding: in a well-ordered society, citizens have a common knowledge of what is just and unjust and can see that their institutions are just[44]. If algorithms are inscrutable and unchallengeable, we lack that public understanding. In Rawls’s words, “there is also a public understanding as to what is just and unjust” in a just society[44]. Applied to AI, we can interpret that as a mandate for algorithmic decisions to be explainable enough that people see (and believe) they are being treated fairly, and if not, can demonstrably call it out.
Thus, while the first principle might not translate into a single mathematical metric for NAS to optimize, it sets side-constraints and contextual requirements for ethical AI:
- No algorithm should violate basic rights (e.g., an AI content filter shouldn’t censor political speech beyond what a human-run system could justify under law; a face recognition surveillance system shouldn’t be deployed in a way that crushes freedom of association or privacy without due process).
- Algorithms should facilitate, or at least not hinder, accountability and transparency.
In practice, when using NAS to design models, this might lead us to consider architectures that are more interpretable (some researchers have worked on NAS for interpretable models as well), or simpler models where appropriate, or at least to ensure the overall system design has a human-in-the-loop or appeals mechanism.
For example, if we imagine using NAS to build a decision system for government benefits eligibility, a Rawlsian design criterion would be: the system must include an explanation facility where any denial can be explained in understandable reasons, and the applicant has an opportunity to provide additional info or appeal. While NAS might only design the model, we might extend the concept to architecture of the system as a whole. Perhaps one could even encode something like “for any decision threshold, ensure some slack for human reconsideration” as a parameter. That might be outside of NAS’s typical scope, but conceptually, the entire pipeline’s “architecture” can be treated with similar optimization thinking.
In summary, Rawls’s first principle in AI ethics translates to ensuring AI does not compromise fundamental rights and provides mechanisms for equal participation and challenge. From a regulatory perspective, one can incorporate this by requiring algorithmic systems to undergo liberty-impact assessments just as they undergo privacy or bias assessments. For our purposes, we keep in mind that any fairness optimization we do with NAS must not come at the cost of violating core rights (for example, we wouldn’t accept a “fair” algorithm that achieved parity by leveling down—denying everyone a beneficial service equally to avoid inequality, which Rawls criticized unless it was necessary to preserve liberty[19]).
Fair Equality of Opportunity: Leveling the Playing Field
The second part of Rawls’s second principle demands “offices and positions open to all under conditions of fair equality of opportunity.” This goes beyond formal non-discrimination. In the context of algorithmic decision-making, this principle is especially pertinent to systems that allocate opportunities: such as hiring algorithms, school admissions algorithms, lending (credit offers opportunity to start a business or get education), and so forth. Fair equality of opportunity (FEO) means that two individuals with similar talents and ambitions should have equal chances of success, regardless of their starting point in society.
For an AI system, achieving FEO implies that the system’s decisions should not be swayed by factors that are irrelevant to merit or qualifications. In a hiring algorithm, for example, the only things that should matter are the candidate’s abilities and fit for the job, not extraneous attributes correlated with demographic background. But because AI learns from historical data, we might see, say, that historically fewer women were software engineers, so female applicants get lower “fit scores” by the model’s logic—not because they are less skilled, but because the model is pattern-matching to past hiring which was biased. Rawlsian ethics would demand a correction here: we should alter the algorithm (or the data, or the interpretation of the output) to restore fair opportunity. In concrete terms, one might enforce a constraint that for equally qualified candidates, the selection rate should be the same across groups (this is essentially one definition of equal opportunity in fair ML: “predictive parity for qualified individuals”).
There have been attempts to formalize Rawls’s FEO in algorithmic terms. One approach is counterfactual fairness: ensuring that if one were to “swap” the protected attribute of an individual (imagine the same resume but with a male vs female name), the decision would remain the same[27]. This ties to the idea that one’s demographic group should not disadvantage one if all else is equal. Another approach is focusing on the accuracy for each group in qualification-based decisions. For instance, in a loan context, FEO might mean if someone is credit-worthy, they should be approved at equal rates whether they’re in group A or B (equal true positive rate). Similarly, if someone is not qualified, they should be rejected at equal rates (equal true negative rate). Ensuring both is equalized odds.
From a NAS perspective, we could encode fair opportunity by using such metrics in the search objective. We might, for example, add a term to penalize any disparity in true positive rates between groups. A simplified metric could be: $$ \text{FEO\_gap} = \max_{g, g'} \left| \text{TPR}_g - \text{TPR}_{g'} \right|, $$ and we’d try to make that gap as small as possible (zero ideally)[27]. A NAS objective might then be: maximize accuracy minus $\lambda \times \text{FEO\_gap}$. In practice, care is needed because making groups equal in one metric can affect others. But at least the search would actively try to find model structures that naturally equalize performance.
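A sketch of that objective, reusing the hypothetical y_true, y_pred, and group arrays from the earlier fairness-metric example, might look like this:

```python
import numpy as np

def feo_gap(y_true, y_pred, g):
    """Largest pairwise difference in true positive rates across groups."""
    tprs = [y_pred[(g == grp) & (y_true == 1)].mean() for grp in np.unique(g)]
    return max(tprs) - min(tprs)

def nas_score(accuracy, y_true, y_pred, g, lam=1.0):
    # Candidate architectures are ranked by accuracy minus a weighted
    # fair-equality-of-opportunity penalty.
    return accuracy - lam * feo_gap(y_true, y_pred, g)
```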
One interesting possibility is that certain model architectures may inherently satisfy something like equal opportunity. A model that learns separate representations for each group and then combines them might overfit or underfit differently across groups, while a model that enforces a shared representation for all individuals could reduce disparate performance, or the reverse could hold. It remains an open research question in fairness whether it is better to learn jointly across groups (so that the stronger group’s data helps the weaker one) or to train separate models: some studies find that separate models serve each group better, but separation can also encode bias.
Fair equality of opportunity also suggests interventions outside of the model. In society, Rawls’s FEO acknowledges that ensuring true equal opportunity might require compensatory measures (like education for the less advantaged, or affirmative action in some cases, to counterbalance unearned advantages). For algorithms, this could translate to algorithmic affirmative action: techniques like giving a slight boost to under-represented groups to counter historical bias, or setting group-specific thresholds that equalize outcomes. Indeed, some companies have tried interventions like adjusting credit scoring algorithms to boost traditionally underserved applicants (in a regulated, careful way) or adjusting image generation outputs to be demographically balanced[45]. These interventions can be controversial—critics call it “quota systems” or worry about reverse discrimination. But Rawls’s framework, particularly the difference principle, provides a justification for certain forms of affirmative action: if it helps the least advantaged and doesn’t violate equal rights, it could be just. In algorithmic terms, if a slight calibration in the model helps the disadvantaged group get more opportunities without stopping the advantaged group from also getting what they deserve, a Rawlsian would likely endorse it.
Derek Leben, in proposing an algorithmic justice theory, emphasized principles like “equal opportunity” and “equal impact” in AI[46]. He argues these should guide AI design alongside a requirement of a minimally acceptable accuracy (because a “fair” but totally inaccurate system helps no one)[47]. This nicely complements Rawls: we can’t sacrifice so much accuracy in the name of fairness that the system stops being useful (since that could hurt everyone, including the disadvantaged). Rawls wouldn’t want to make everyone worse off (the famous “leveling down” critique), and requiring a decent level of accuracy ensures the system actually provides some utility to distribute. Leben’s point about avoiding irrelevant attributes is essentially a call for fairness: algorithms should ignore features that are not justifiable to use[46]—again, a fairness-through-unawareness approach unless those features are causally relevant in a non-biased way.
Summing up this section: Fair equality of opportunity in AI means that an algorithm gives individuals a fair shot based on their relevant qualifications, not on arbitrary characteristics. We can strive for this by measuring and enforcing parity in error rates or selections, by designing models to exclude or neutralize biases, and by potentially using corrective measures to offset historical disadvantages. Rawls gives us the ethical backing to do so, telling us that any inequality of opportunity is unjust unless it’s structured to benefit those with less opportunity. That can validate policies like extra outreach to underrepresented groups in data collection or using decision thresholds that account for systemic bias.
The Difference Principle: Maximizing the Minimum (and AI’s Role)
The difference principle is Rawls’s second principle (part a) and perhaps the most distinctive element of his theory. It tells us to arrange inequalities such that they are “to the greatest benefit of the least advantaged”[48]. In plainer terms, when making decisions that affect distribution of benefits or burdens, we should focus on improving the well-being of whoever is worst off in the outcome.
In algorithmic decisions, one can interpret “least advantaged” in multiple ways. It could mean the historically disadvantaged demographic group (like a racial minority that has less wealth on average). Or it could mean individuals who, in a particular context, have the worst outcomes (e.g., those predicted to have the lowest credit scores or highest risk). Rawls originally conceived it at the level of basic social institutions and classes, but the ethos can be applied more granularly: don’t design a system that simply maximizes total utility—design it such that those on the bottom are as well-off as possible.
If we take it in the demographic sense, the difference principle would urge us to evaluate an AI system by looking at how it treats the worst-off group in society (say, a marginalized community). Does the system help or hurt them compared to other possible systems? For instance, imagine two versions of a job recommendation algorithm: Algorithm A might increase overall hires at a company by being very efficient but ends up primarily benefiting already overrepresented groups, doing little for minority candidates. Algorithm B might result in slightly fewer total hires (maybe a tad less accuracy in matching), but it significantly improves outreach and hiring of minority candidates who otherwise had faced disadvantage. A Rawlsian analysis might say Algorithm B is more just, because it improves the position of the least advantaged group (the minority candidates) even if the overall number of hires (sum of utility) is a bit lower. This counters a pure utilitarian approach which might pick Algorithm A for maximizing total hires or profit.
In practice, applying the difference principle could involve a maximin fairness objective for the AI: for each relevant outcome metric (like hiring rate, loan approval rate, error rate), focus on maximizing the metric for the worst-performing group. Some fairness researchers explicitly formulate the problem this way[33]. For example, one could define: $$ U_g = \text{utility (or accuracy) for group } g, $$ and the goal is $$ \max_{\text{model}} \min_{g \in G} U_g. $$ This is a mathematical encoding of Rawls's difference principle into the model selection problem. It ensures the model is chosen not because it's best on average, but because it's best for lifting the worst-off group's performance to as high a level as possible. In many cases, optimizing this will naturally narrow gaps between groups, because to improve the minimum you have to bring up whoever is behind. One study in late 2025 (Chen et al.) implemented a Rawlsian "veil of ignorance" approach by making the classifier ignorant of demographic attributes and then optimizing performance in a way that ended up nearly equalizing outcomes—and interestingly, they found this improved overall accuracy slightly as well[49][28]. This is a hopeful finding: it suggests that sometimes helping the worst-off group can actually push the algorithm out of a local optimum that was bad for everyone. By focusing on the under-served, you might discover features or strategies that generalize better overall, yielding a Pareto improvement (they reported collapsing a demographic parity gap from 15.7% to 0.1% while boosting accuracy by 2%[28]).
However, one must be cautious: maximizing the minimum could, if done naively, produce a degenerate solution in which the model tailors itself only to the worst-off group and ignores everyone else. That is why these approaches often still strike a balance (Rawls himself framed the difference principle as improving the least advantaged as much as possible without making others worse off than is necessary). In an algorithmic sense, we might implement it lexicographically: first ensure the worst-off group's outcome is maximized, then, among models that achieve this, maximize some secondary objective. Or we could simply optimize a weighted sum in which the weight on the worst-off group is very high.
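One minimal way to express this lexicographic variant in code (the per-group utilities below are hypothetical) is to rank candidates first by their worst-group utility and only then by overall accuracy:

```python
# Hypothetical per-group utilities (e.g., accuracy) and overall accuracy for candidates
candidates = {
    "arch_1": {"overall": 0.86, "group_utility": {"A": 0.90, "B": 0.50}},
    "arch_2": {"overall": 0.80, "group_utility": {"A": 0.80, "B": 0.80}},
    "arch_3": {"overall": 0.82, "group_utility": {"A": 0.84, "B": 0.80}},
}

def rawlsian_key(result):
    """Rank primarily by the worst-off group's utility, then by overall accuracy."""
    return (min(result["group_utility"].values()), result["overall"])

best = max(candidates, key=lambda name: rawlsian_key(candidates[name]))
print(best)  # arch_3: ties arch_2 on the worst-off group (0.80) but has higher overall accuracy
```

This selection rule prefers the candidates that hold the worst-off group at 0.80 over the one that reaches 0.90 for group A at group B's expense, mirroring the 80-80 versus 90-50 comparison discussed below.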
An example in content distribution might be: say an algorithm recommends educational opportunities. Rawlsian design might ensure that the kids in the lowest-performing schools get recommendations that boost their learning resources even if that means the kids in top schools get slightly less optimized suggestions—because the latter are already well-off educationally. So long as we don’t violate anyone’s rights or completely neglect the others, focusing on the bottom can be justified.
Critics of the difference principle in AI might argue that it could lead to "biasing" the algorithm in favor of certain groups. For instance, some would call it unfair if a credit scoring AI were tuned to give disadvantaged minorities relatively better scores than their data alone would indicate, effectively redistributing credit. In law, this touches on debates about disparate impact and whether remediation is allowed. But Rawls provides a strong ethical counter: if that redistribution is the only way to ensure the financial system benefits those who have least, then it is justified—provided, of course, that no one's fundamental rights are infringed (i.e., you're not denying credit to others for arbitrary reasons, you're just possibly accepting a tiny bit more default risk as a trade-off to extend opportunity).
To ground this in legal terms, affirmative action has always been controversial but Rawls’s theory supports it under conditions. In algorithms, an analog might be “affirmative algorithms” that actively seek to uplift marginalized group outcomes. Indeed, the concept of algorithmic affirmative action has been discussed[50]. It might involve adding a constant boost for minority group predictions (some classifiers do that in thresholding), or oversampling those groups in training, or using group-specific models that are more finely tuned. These interventions can achieve something like the difference principle in effect.
We should also mention that Rawls’s difference principle is not simply making everyone equal. It allows inequality, but demands those at the bottom do as well as possible. For AI, this means we don’t necessarily have to force equal performance if it’s not possible or if doing so worsens the bottom line. For instance, consider an AI that allocates organ transplants. If we purely equalize some metric, we might do things like randomize regardless of need or outcome, which could reduce the total saved lives and possibly even hurt the worst-off (if they needed more resources or special consideration). A Rawlsian approach would instead ensure that policies are in place that those who are worst off (perhaps the sickest patients or those from poorer backgrounds) are benefiting as much as possible—perhaps via some priority scheme—while still maintaining efficiency. It’s a nuanced balance of equity and effectiveness.
Translating difference principle into a regulatory or design requirement might look like: “When evaluating an AI system’s impact, examine the outcomes for the most disadvantaged group and ensure that no alternative design (that doesn’t violate other principles) could improve their situation further.” This is a high bar and somewhat theoretical, but it encourages continuous improvement and not settling for an average-good outcome that masks a terrible result for a minority. In a sense, it’s a call to pay attention to the tails of distribution, not just the center.
By aligning NAS with the difference principle, we effectively instruct the NAS to search for architectures that give the best worst-case group performance. If two architectures have the same overall accuracy but one yields, say, 80% accuracy for both Group X and Y, while another yields 90% for Y but 50% for X, the Rawlsian NAS would prefer the 80-80 model over the 90-50 model. Traditional accuracy optimization might pick the latter because average accuracy (weighted by group proportion maybe) could be higher, but Rawlsian fairness rejects that as unjust. Notably, the Rawlsian choice also avoids egregious unfairness (the 50% group is clearly disadvantaged). So, in effect, it safeguards against one group being sacrificed.
To illustrate with numbers: imagine an algorithm for loan approvals. Group A historically has high incomes, Group B historically lower. A utilitarian algorithm might approve 90% of A and 30% of B (say mostly giving loans to A because they are safer bets), achieving perhaps a certain profit. A Rawlsian approach might find a model that approves say 70% of A and 50% of B, raising B’s approval significantly at the cost of a little more risk with A. If that means Group B’s communities get more investment and lift themselves out of poverty (the least advantaged are benefiting more), Rawls would favor that outcome. Over time, this could also reduce inequality. AI models could be part of such redistribution of opportunity—though it’s important this be done under guidance of policy, since companies on their own might not choose to do that without a mandate or incentive.
Finally, we should acknowledge one caveat raised by Rawlsian scholars: Rawls’s theory was meant for the basic structure of society—institutions like the legal system, economy, etc. An AI algorithm might be seen as one component within those institutions, not the whole show. One might question whether it’s appropriate to directly apply Rawls’s societal principles to a single algorithm. Our stance is that when an algorithm takes on a major decision-making role traditionally held by a human institution (like lending or sentencing), it effectively becomes part of the basic structure, thus legitimately appraised by those justice principles[51][52]. For example, if judges are expected to uphold justice and follow principles (e.g., equal protection), then an AI risk assessment tool used by judges should likewise be evaluated for how it aligns with justice principles. So we’re extending Rawls to this micro level by proxy: ensuring each algorithmic component is just helps ensure the overall system stays just.
To sum up, the difference principle guides us to use AI in a way that explicitly helps those who are worst off in whatever context the AI operates. It urges designers and policymakers to ask, “How does this algorithm affect the people at the bottom of the outcome distribution, and can we tweak it to improve their outcomes even if it means the privileged don’t gain as much?” In doing so, it provides a morally compelling answer to the common tension: should we optimize for efficiency or for equity? Rawls says equity for the worst-off, as long as we still respect liberties and opportunities.
With Rawls’s principles thus interpreted for AI, we next consider how the technology of NAS can concretely be used to fulfill these principles in designing actual AI systems.
Neural Architecture Search for Fairness: Merging Ethical Objectives with Design
Incorporating Rawlsian principles into AI design is an interdisciplinary challenge: it requires translating high-level ethical requirements into engineering targets. Neural Architecture Search (NAS) provides a promising toolkit for this translation. In this part, we discuss strategies for encoding fairness objectives (like those derived from Rawls’s principles) into NAS and review evidence that doing so yields tangible improvements in model ethics. We also consider practical constraints and how NAS might be deployed in real-world development pipelines to ensure ethical outcomes.
Encoding Fairness Metrics into NAS Objectives
The first step in leveraging NAS for fairness is to define a quantitative objective or constraint that represents our fairness or ethical goals. NAS can optimize only what it's given, so we must be precise about what "ethical AI" means in a computable sense. From the Rawlsian analysis above, potential targets include:
- Minimizing disparity between groups on certain performance measures (equal opportunity).
- Maximizing the worst-off group's performance (difference principle).
- Maintaining a floor on accuracy or other utility, so that fairness doesn't come at the cost of an unusably bad model (the "minimally acceptable accuracy" constraint[47]).
- Where relevant, including terms for interpretability or simplicity (to assist with the transparency and accountability considerations of equal liberty).
One straightforward approach is a multi-objective optimization formulation. For example, consider two objectives: $A$ (overall accuracy) and $D$ (a fairness disparity measure, which we want to minimize). We can use methods like:
- Pareto optimization: generate a set of architectures that offer different trade-offs between $A$ and $D$, allowing a human to then choose the one that best meets policy preferences (a good balance, or the fairest model that still meets some accuracy threshold).
- Scalarization: combine the objectives into a single one, such as $J = A - \lambda D$ (assuming higher $A$ is better and higher $D$ is worse). The constant $\lambda$ reflects how much fairness matters relative to accuracy. Setting $\lambda$ is tricky—it might involve trial and error, or even an outer optimization loop to find a $\lambda$ that yields an acceptable trade-off. But it gives a dial to tune Rawlsian emphasis: $\lambda \to \infty$ would mean we care only about minimizing disparity (Rawls's extreme lexicographic priority to justice after basic utility), whereas $\lambda = 0$ reduces to standard accuracy optimization (which is what was historically done and often led to bias).
- Constraint-based optimization: for example, maximize accuracy such that $D \le \epsilon$ for some small $\epsilon$ (meaning near-equal fairness is achieved), or conversely, minimize $D$ such that accuracy remains above some threshold. These constraints can be incorporated into NAS either by a penalty method or by explicit rejection of architectures that don't meet them.
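The following sketch illustrates the Pareto and constraint-based options on hypothetical (accuracy, disparity) measurements for four candidate architectures; the numbers and the ε threshold are illustrative:

```python
# (accuracy A, disparity D) measured on a validation set for hypothetical candidates
results = {
    "arch_1": (0.91, 0.15),
    "arch_2": (0.89, 0.02),
    "arch_3": (0.90, 0.06),
    "arch_4": (0.87, 0.05),  # dominated by arch_2, which is both more accurate and fairer
}

def pareto_front(results):
    """Keep candidates that no other candidate beats on both accuracy and disparity."""
    front = {}
    for name, (a, d) in results.items():
        dominated = any(a2 >= a and d2 <= d and (a2 > a or d2 < d)
                        for other, (a2, d2) in results.items() if other != name)
        if not dominated:
            front[name] = (a, d)
    return front

def constrained_best(results, epsilon=0.10):
    """Maximize accuracy subject to the hard fairness constraint D <= epsilon."""
    feasible = {n: (a, d) for n, (a, d) in results.items() if d <= epsilon}
    return max(feasible, key=lambda n: feasible[n][0]) if feasible else None

print(pareto_front(results))      # arch_1, arch_2, arch_3 survive; arch_4 is dominated
print(constrained_best(results))  # arch_3: most accurate candidate with disparity <= 0.10
```

A Pareto front leaves the final trade-off to human (or regulatory) judgment, while the constrained variant encodes that judgment up front as a hard fairness cap.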
A concrete fairness metric for $D$ could be chosen per application. For classification tasks, one might use difference in false positive rates, or in true positive rates, or in predictive values. Some research uses normalized metrics like average odds difference (the average difference in false positive and false negative rates between groups). Others might use information-theoretic measures of unfairness or even the coefficient of determination of outputs with protected attributes.
Since we are discussing Rawls, a compelling choice is: $$ D_{\text{min}} = - \min_{g \in G} U_g, $$ where $U_g$ is (for instance) the accuracy or utility for group $g$. Minimizing $D_{\text{min}}$ is then equivalent to maximizing the minimum group utility, a directly Rawlsian objective. We could incorporate that term by itself or as part of a multi-term objective.
As an example, consider designing a hiring model via NAS. Let’s say we have historical data with outcomes, but we suspect bias. We can decide on a fairness metric like “qualified selection rate” differences. If group A and B have qualification labels (past performance etc.), the model’s true positive rate for each should be equal ideally. A fairness metric could be $D = |\text{TPR}_A - \text{TPR}_B|$. NAS would then try to find an architecture that naturally yields $D$ close to 0 while still getting high accuracy in predicting who will perform well. If the search finds that certain architecture features (like including interaction terms or certain layer normalizations) help equalize TPRs, it will favor those architectures.
Importantly, fairness objectives might be non-differentiable or noisy (since they depend on evaluation on a validation set). NAS algorithms based on evolutionary search or reinforcement learning can still handle such objectives by treating them as a reward to maximize. Differentiable NAS (DARTS and its ilk) usually needs continuous proxies; one could use a smooth approximation of the fairness metric or a bilevel optimization (with an outer loop for fairness). These are technical details, but they are surmountable given current research.
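As an illustration of how a non-differentiable fairness signal can be folded into an evolutionary search, here is a toy sketch; the architecture encoding, the mutation operator, and especially the evaluate function are placeholders for what a real NAS pipeline (training and validating each candidate) would supply:

```python
import random

def random_architecture():
    """Placeholder encoding: an architecture is just a tuple of three layer widths."""
    return tuple(random.choice([32, 64, 128]) for _ in range(3))

def mutate(arch):
    """Randomly change one layer width."""
    new = list(arch)
    new[random.randrange(len(new))] = random.choice([32, 64, 128])
    return tuple(new)

def evaluate(arch):
    """Placeholder: a real pipeline would train the architecture and measure
    accuracy and the worst-case group gap on a validation set."""
    rng = random.Random(hash(arch))
    return rng.uniform(0.80, 0.95), rng.uniform(0.00, 0.20)

def fitness(arch, lam=2.0):
    accuracy, gap = evaluate(arch)
    return accuracy - lam * gap  # reward accuracy, penalize group disparity

population = [random_architecture() for _ in range(10)]
for _ in range(20):                                   # a few generations of search
    population.sort(key=fitness, reverse=True)
    survivors = population[:5]                        # keep the best half by penalized fitness
    population = survivors + [mutate(random.choice(survivors)) for _ in range(5)]

print("Best architecture found:", max(population, key=fitness))
```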
One potential caution is overfitting to fairness on a specific dataset. If NAS tailors an architecture to minimize bias as measured on one dataset, will it generalize? There’s evidence in [7] that fairness-optimized architectures did generalize to other data[53], which is encouraging. Likely because the architecture that is fairer might be picking up more robust features that aren’t just dataset-specific quirks. But it’s prudent to evaluate fairness on multiple scenarios if possible, or include in the search evaluation multiple data splits or fairness under slight shifts (robust fairness, essentially).
Case Study: Fairness-Aware NAS in Action
To make this more concrete, let’s revisit the FaHaNa case[9][10] and analyze it from a Rawlsian standpoint. In FaHaNa, the goal was to serve dermatology diagnosis on a mobile app. The authors realized that if the model was unfair (worse for dark skin patients), it would be unjust and also harm adoption in diverse populations. They defined fairness in terms of the model’s sensitivity across skin tone groups. Essentially, their fairness metric was something like the difference in diagnostic accuracy between light and dark skin categories[2]. They then had a multi-objective: maximize accuracy, maximize fairness (minimize difference), and also minimize computational cost (which is more of a practical objective). Using NAS, they found that a certain architecture, which perhaps allocated more capacity to later layers and froze early layers to handle basic features, gave a good balance[42]. Notably, they found earlier layers affect all groups similarly, but later layers can introduce bias, so focusing NAS search on the “tail” of the network (later layers) while keeping the “head” fixed helped navigate fairness[42]. This is an interesting insight: fairness issues might often arise in the high-level feature or decision layers, so one strategy is to enforce some constraints or search specifically on those.
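A rough sketch of that head/tail strategy (not the FaHaNa authors' actual code; the layer names and options below are invented) is to hold the early layers fixed and enumerate or search only the later, bias-prone layers:

```python
import itertools

# Fixed "head": early layers that, per the FaHaNa observation, affect all groups similarly
FIXED_HEAD = [("conv", 32), ("conv", 64)]

# Searchable "tail": later layers, where bias is more likely to be introduced
TAIL_CHOICES = {
    "block_3": [("conv", 128), ("conv", 256), ("depthwise_conv", 128)],
    "block_4": [("conv", 256), ("attention", 256)],
    "classifier_width": [128, 256, 512],
}

def tail_candidates():
    """Enumerate architecture specs: the head is held fixed, only the tail varies."""
    keys = list(TAIL_CHOICES)
    for combo in itertools.product(*(TAIL_CHOICES[k] for k in keys)):
        yield {"head": FIXED_HEAD, "tail": dict(zip(keys, combo))}

print(sum(1 for _ in tail_candidates()))  # 3 * 2 * 3 = 18 candidate architectures to evaluate
```

Restricting the search to the tail shrinks the space dramatically while concentrating effort where group disparities are most likely to arise.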
From a Rawls perspective, FaHaNa’s outcome improved the worst-off group (dark skin patients) by improving their diagnosis accuracy significantly, without hurting (and even improving slightly) the accuracy for light skin group[54]. This is a Pareto improvement—a win-win—which is the ideal scenario: justice achieved without cost. But even if there had been a slight trade-off (say light skin accuracy dropped a bit but dark skin accuracy rose more), a Rawlsian would likely accept that if it meant dark-skinned patients, previously disadvantaged in diagnostic reliability, now got much better service. That aligns with difference principle thinking. And importantly, it was done without violating any patient’s rights (no one’s being denied diagnosis, just the model is tuned to be equitable). It’s a technical fix that yields an ethical benefit, illustrating how Rawlsian goals can guide engineering.
Another hypothetical case: Suppose we use NAS to design a college admissions algorithm (some universities use algorithms to score applicants). A Rawlsian approach might incorporate an objective that the admitted class should have a certain diversity or that the selection process is fair in the sense of equal opportunity. If the NAS is allowed, for instance, to select features or transformations, it might choose to de-emphasize features that correlate with privilege (like expensive extracurriculars) because those hurt fairness metrics, and emphasize more meritocratic features (like actual grades or achievements). The architecture might even split into sub-networks that evaluate different aspects to ensure no single biased factor dominates. The result could be an AI that picks students more equitably. However, one must be careful: if, say, we encoded a hard constraint like “the demographic mix of admits must reflect the applicant pool,” the NAS might sacrifice academic quality too much. We might instead aim for “maximize average admitted student success, subject to the constraint that each demographic’s admission rate relative to application is above some threshold” – a kind of reservation. That’s a complex constraint, but not impossible to encode. And it would likely push the algorithm to find signals of talent in underrepresented applicants that otherwise might be missed if the data is biased (for example, maybe standardized test scores—known to correlate with income—get down-weighted in favor of recommendation letters or personal essays which might be more revealing of potential).
These case studies show the versatility of NAS: it’s like having a small army of virtual engineers each proposing solutions under given ethical guidelines, and we pick the best one. It’s trial-and-error guided by a principle.
Limitations and Considerations
While NAS is powerful, it’s not a silver bullet. There are practical and ethical considerations in relying on NAS for fairness:
Computational Cost: NAS can be computationally expensive. Adding fairness evaluation (which may require larger or multiple validation sets to reliably measure group performance) increases the load. However, with improved techniques and hardware, this is becoming more tractable. Also, one might restrict the search space to moderate size or use surrogate modeling to predict performance of architectures without fully training each.
Dynamic or Shifting Data: If the deployment environment changes (say the demographics of users shifts or their behavior changes), a model that was Pareto-optimal may no longer be so. One might need continuous or periodic NAS re-evaluation. This raises questions of governance: do we allow models to automatically reconfigure (which could drift in fairness)? Or do we occasionally re-run NAS offline and redeploy a new model? The latter might be safer and subject to review.
Fairness Definition Choice: We’ve discussed how Rawls provides a guiding star, but in coding it, choices must be made. Those choices have moral and legal dimensions. For example, optimizing for demographic parity (equal outcomes) vs equal opportunity (equal conditional outcomes) vs difference principle (max-min). A Rawlsian might lean towards equal opportunity and difference principle metrics. But one can imagine disagreements. Ideally, policymakers or ethicists should set the fairness criteria which then engineers implement via NAS. It shouldn’t just be up to a random choice by a programmer. Thus, one needs a governance structure where the fairness objective in NAS is chosen through a deliberative process reflecting social values (perhaps even involving stakeholders or those impacted, aligning with the idea of a “social contract” for AI design).
Data Bias and NAS: NAS doesn’t remove bias in data; it just tries to not exacerbate it or to counteract it. If data is severely lacking or poisoned, any model might struggle to be fair. NAS might choose an architecture that slightly mitigates, but data collection and pre-processing (like ensuring representation in the dataset, or re-weighting samples) is still crucial. Rawls’s approach to opportunity might also imply we need to invest in better data for underrepresented groups – e.g., gather more training examples for them – analogous to investing in education for the less advantaged in society to truly have equal opportunity.
Interpretability vs Complexity: NAS often finds somewhat complex architectures, sometimes more complex than a human might design. There’s a risk that these highly optimized structures are even harder to interpret than standard models. If we value transparency (as argued under equal liberty), there’s a possible tension: the “fair” model might be a complex tangle of layers that we don’t fully understand, whereas a simpler (but maybe less fair) model could be more interpretable. There is ongoing research on interpretable NAS or including interpretability as an objective. A Rawlsian might say fairness for the worst-off is more important than simplicity, but not at the total expense of explanation. So perhaps a balance must be struck: one could include a term for model complexity in the NAS objective or set a constraint (like no more than X parameters, or certain architecture patterns favored) to avoid outlandish designs that cannot be explained to stakeholders. The law might even require that decisions are explainable, in which case that’s a hard constraint on acceptable architectures.
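One simple way to respect such limits, purely as an illustration, is to treat the complexity budget as a hard filter and, optionally, as a light penalty in the score (the cap, weights, and candidate numbers below are arbitrary placeholders):

```python
MAX_PARAMS = 5_000_000  # illustrative cap chosen for explainability/auditability

def admissible(candidate):
    """Hard constraint: reject architectures that exceed the complexity budget."""
    return candidate["n_params"] <= MAX_PARAMS

def score(candidate, lam_fair=2.0, lam_size=1e-8):
    """Accuracy, penalized by group disparity and (lightly) by parameter count."""
    return (candidate["accuracy"]
            - lam_fair * candidate["disparity"]
            - lam_size * candidate["n_params"])

candidates = [
    {"name": "arch_1", "accuracy": 0.91, "disparity": 0.03, "n_params": 12_000_000},
    {"name": "arch_2", "accuracy": 0.89, "disparity": 0.02, "n_params": 3_000_000},
]
feasible = [c for c in candidates if admissible(c)]
print(max(feasible, key=score)["name"])  # arch_2: the only candidate under the size cap
```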
Verification and Validation: Suppose NAS finds a model that on the validation data is fair and accurate. We deploy it. We must still continuously monitor it in the real world to ensure it behaves as intended. If any drift or unexpected bias arises, we need mechanisms to catch and correct it. One advantage of our approach is that if such issues are found, we can always re-run a NAS with updated data or constraints, so it’s adaptive. But oversight is needed. Perhaps independent auditors or regulatory agencies will want to see the fairness metrics and even the architectures (though reading a neural architecture might not be as meaningful to them as seeing results).
Human-in-the-Loop and Rawls: Rawls’s theory is about a social choice made by hypothetical humans. One might argue we should keep humans involved in algorithmic decision-making for important matters (like a judge reviewing a risk score, or a loan officer having discretion to override). That can add a layer of fairness that pure automation might miss (for example, a human can catch an obviously unjust outcome in a particular case and correct it, something a rigid algorithm wouldn’t). Combining NAS-designed models with human oversight could be ideal: the model handles routine equitable processing, and humans handle exceptional cases and maintain accountability. Policy could mandate this human involvement for certain decisions (which some jurisdictions are considering – “meaningful human review” clauses). This doesn’t negate our approach; it just means the algorithm is one part of a socio-technical system aiming for justice.
Interdisciplinary Collaboration
One theme that emerges is that no single field can do this alone. We need:
- Ethicists and philosophers to define what fairness means in a given context (how to trade off different groups, which historical injustices to account for, etc.).
- Legal scholars and policymakers to incorporate those ethical goals into guidelines or regulations and to ensure they align with existing law (e.g., affirmative algorithmic treatments might run afoul of certain anti-discrimination laws unless those laws evolve or are interpreted in light of fairness goals – some laws allow affirmative action, others are stricter).
- Computer scientists to implement the objectives and innovate NAS techniques that effectively find fair solutions.
- Domain experts (e.g., in healthcare or finance) to provide context and help interpret results, ensuring that what we optimize doesn't inadvertently cause harm by missing domain-specific nuances (fairness in healthcare, for instance, might require different treatment for different groups because of genuine differences in needs, so an equal outcome might not always mean an identical procedure).
Encouragingly, we see efforts like the one referenced by Grace and Bamford (2020) who advocated using Rawlsian approaches to legislate on machine learning in government[55][52]. They, and others, suggest that principles like Rawls’s can guide public sector AI deployment. For example, they note that a Rawlsian approach would require high transparency and avenues for citizens to challenge algorithmic decisions[14][56], aligning with what we discussed. Such insights from legal academia can feed into technical requirements.
On the technical side, as we’ve seen, research at places like ICLR and NeurIPS (major ML conferences) is already moving toward fairness-aware model design, which is in harmony with what law scholars like Solon Barocas or Joshua Kroll argue (that fairness can be partly addressed through design, not just policy).
When Rawls published A Theory of Justice, he couldn’t have foreseen AI specifically, but he provided a method to think about any system of rules. We are essentially treating an AI’s decision rules as part of society’s rules. The original position thought experiment could be conceptually applied by AI designers or regulators: if you didn’t know your demographics or background, would you trust this algorithm to make decisions about you? If not, redesign it. This is a human check that complements the NAS approach: we might, after NAS delivers a candidate model, do a kind of “veil of ignorance audit” where we simulate being different kinds of individuals through the model and see if any of those roles gets a raw deal.
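A rudimentary version of such an audit, assuming a generic predict function and hypothetical applicant profiles, could flip only the protected attribute and flag any profile whose outcome changes:

```python
def veil_of_ignorance_audit(predict, profiles, protected_key, protected_values):
    """For each profile, vary only the protected attribute and record whether the
    model's decision changes. Any flip is a prima facie fairness concern."""
    flagged = []
    for profile in profiles:
        decisions = {predict({**profile, protected_key: value})
                     for value in protected_values}
        if len(decisions) > 1:
            flagged.append(profile)
    return flagged

# Hypothetical stand-in for a trained model's decision function (deliberately biased
# here so the audit has something to catch)
def predict(applicant):
    return "approve" if applicant["income"] > 40_000 and applicant["gender"] != "F" else "deny"

profiles = [{"income": 55_000, "credit_history": "good", "gender": None},
            {"income": 30_000, "credit_history": "fair", "gender": None}]
flags = veil_of_ignorance_audit(predict, profiles, "gender", ["F", "M"])
print(f"{len(flags)} profile(s) whose outcome depends on gender alone")  # 1
```

A real audit would use many more profiles, multiple protected attributes and their intersections, and probabilistic outputs rather than a single hard decision.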
All of these ideas show that combining Rawls and NAS is not just about a one-time engineering hack, but about establishing a framework and process where ethical principles continuously inform the technical development.
Having laid out how Rawls’s justice principles align with NAS techniques and what considerations come into play, we will now turn to addressing some possible counterarguments and challenges to this Rawlsian NAS approach. It’s important to examine these to strengthen our proposal and clarify any misconceptions.
Counterarguments and Challenges
No proposal to embed ethical principles into AI goes without critique. In this section, we consider a range of counterarguments—from technical objections to philosophical challenges—and provide responses grounded in both theory and empirical evidence. The goal is to demonstrate that while challenges exist, the Rawls + NAS approach is a robust and justifiable path forward, or at least that its benefits outweigh the concerns.
1. “Fairness vs. Accuracy Trade-off: Does Justice Diminish Utility?”
One immediate challenge often raised is the concern that focusing on fairness will significantly reduce accuracy or overall performance of AI systems. Companies deploying AI might worry that imposing Rawlsian constraints means their model won’t be as profitable or efficient. In other words, is there an inherent zero-sum trade-off between fairness and utility?
Response: It’s true that unconstrained optimization would put all resources into whatever metric you set (typically accuracy or profit), and adding constraints or additional objectives (like fairness) means the solution may not reach the same peak on the original metric. However, this doesn’t necessarily mean a dramatic loss of utility; sometimes it means finding a more balanced optimum that is only slightly below the unconstrained maximum. Empirical results suggest that with smart approaches, fairness can be improved with minimal accuracy loss, and occasionally with no loss or even a gain[28]. The aforementioned Chen et al. study found a fairness intervention that actually improved accuracy for all by avoiding overfitting to biases[28]. Similarly, the fairness-aware architectures discovered by NAS managed to be as accurate as (or within a hair of) the best baseline while being much fairer[54][41].
The key point is that traditional models were not Pareto-optimal; they might have been overspecialized to patterns that served the majority population well but left minority performance as an afterthought. By explicitly accounting for minority performance, we often guide the model to more robust solutions that generalize better across groups, which sometimes doesn’t hurt majority performance much or at all. There is a concept in learning called “empirical risk minimization under a fairness constraint” – solving that can yield near-optimal risk but fair outcomes. In practice, a small accuracy drop (say 1% relative) may be acceptable for significant fairness gains, especially when those gains translate into real social benefit (e.g., thousands more qualified minority applicants being hired or fewer wrongful denials of loans).
Even if some trade-off exists (and in some cases it will—e.g., if data for a group is really noisy, forcing equal performance could degrade others slightly), a Rawlsian framework justifies accepting a modest efficiency loss for justice. We already do this in other domains: we accept that due process rights (like appeals, legal counsel, etc.) make the justice system slower or occasionally let guilty people go free, because we prioritize fairness and rights over a purely utilitarian “maximize convictions” approach. Similarly, companies accept certain costs to comply with safety or anti-discrimination laws. Ethical AI should be seen in that light: as a necessary condition of responsible deployment, not simply an optional performance tweak.
Furthermore, with NAS, we might mitigate the trade-off by discovering clever model designs that handle the fairness constraints more gracefully than a human-adjusted model would. It’s possible that a naive fairness fix (like just adjusting thresholds) might hurt accuracy more than an integrated solution that NAS finds (like a better feature representation). So, using NAS might reduce the “cost of fairness” compared to other methods.
2. “Whose Fairness? The Subjectivity of Ethical Values”
Another argument is that fairness is not a single objective concept; different people or cultures might disagree on what is fair. Rawls’s theory is one take—albeit influential in Western philosophy—but others might value different principles (e.g., a strict meritocracy with no adjustments, or conversely, equality of outcome). By embedding Rawls, are we arbitrarily choosing one ethical stance? What if someone impacted doesn’t agree with Rawlsian fairness?
Response: It's true that fairness is multifaceted. This is precisely why existing AI fairness literature has many definitions. Rawls's theory, however, was an attempt to find principles of justice that free and rational people would agree on under fair conditions. It carries a certain philosophical weight as being a reasoned middle ground between total egalitarianism and libertarian laissez-faire, capturing both liberty and equality. While one might not universally impose Rawls's view, it is a widely respected framework for thinking about justice that has informed many democratic societies' ideals. That said, our approach is not dogmatic: the NAS can incorporate any well-specified fairness criterion. If a society decided on a different metric (say, a more utilitarian notion of fairness, or something like "maximize total welfare but with a constraint that no group falls below X"), NAS can optimize for that too. The methodology stands—choose ethical objectives, use NAS to achieve them. We focus on Rawls because (a) his conception of justice as fairness is the subject of this Article, and (b) it aligns with current concerns about protecting disadvantaged groups and impartiality. But the framework is adaptable.
In practice, which fairness definition to use should be a collective decision, possibly guided by law or regulation. For example, a jurisdiction might say, “We require equal opportunity (as defined by equal TPR) in lending, plus a focus on improving access in underserved communities.” That could be translated into Rawlsian terms or a specific metric. Our point is that such values can be encoded and NAS will follow suit. If the worry is about paternalism—imposing values via code—note that currently AI often unintentionally imposes the values of historical bias or the developer’s implicit choices. Making the values explicit (say in code or law) is actually more transparent and democratic. Moreover, a Rawlsian approach invites thinking from the perspective of any impacted person, which is an inclusive way to set values, arguably more so than just reflecting status quo data.
Additionally, Rawls’s concept of a reflective equilibrium could apply: we might fine-tune our fairness objectives by reflecting on results and adjusting until they match our considered moral judgments[57][58]. So, if stakeholders feel the outcome of a certain fairness criterion is unintuitive or unfair in a particular scenario, they can adjust the criterion (within the Rawlsian spirit or outside it) and redo the NAS. It’s an iterative, participatory process, not one static imposition.
3. “Technical Feasibility and Complexity”
Critics might point out that while NAS can incorporate multiple objectives, the complexity of searching over architectures with ethical constraints might be huge. There’s also a risk the optimizer will find some “hack” to satisfy fairness metrics in a trivial way that doesn’t actually solve the intended ethical problem (like trivially predicting the same outcome for everyone to equalize parity – which is fair but useless).
Response: The worry about trivial solutions is valid—if one only optimizes fairness, the model could, for instance, give everyone the same score, achieving parity by default but destroying utility. That’s why we emphasize maintaining accuracy or utility as a concurrent objective or constraint. By requiring minimally acceptable performance, we block the trivial “constant model” solution[47]. NAS will therefore have to find non-trivial architectures that thread the needle.
The search complexity can indeed grow with more objectives, but many NAS algorithms are already built for multi-objective use (there’s an entire field on multi-objective evolutionary NAS). The fairness metrics do add evaluation overhead. However, these are not insurmountable for modern computing clusters, especially if the search space is appropriately pruned. Techniques like one-shot NAS (where many architectures share weights) can massively speed up evaluation. There is active work on making NAS more efficient.
It’s worth noting that requiring fairness might even reduce the search space in some sense, because extremely imbalanced architectures might be ruled out by poor fairness scores, concentrating search on a smaller subset of fairer designs (this is speculative, but plausible). And even if the search takes more time, that’s an acceptable cost given the importance of the outcome. We should compare it to how much effort companies put into hyper-optimizing for accuracy or profit—surely we can invest effort to optimize for fairness too, as a societal goal.
Another technical challenge is measurement noise: fairness metrics might be noisy due to finite sample sizes, which could mislead the search. Researchers handle this by using larger validation sets or smoothing metrics. Also, one can incorporate uncertainty (like penalize only if a disparity is statistically significant). So the NAS doesn’t chase random fluctuations.
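For instance, a simple two-proportion z-test can gate the fairness penalty so the search reacts only to disparities that are statistically distinguishable from sampling noise; a minimal sketch with illustrative counts:

```python
import math

def significant_gap(hits_a, n_a, hits_b, n_b, z_crit=1.96):
    """Two-proportion z-test: is the observed gap between group rates significant?"""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    p_pool = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return False, 0.0
    return abs(p_a - p_b) / se > z_crit, abs(p_a - p_b)

def gated_penalty(hits_a, n_a, hits_b, n_b, lam=2.0):
    """Penalize a candidate only when the observed disparity clears the significance bar."""
    is_significant, gap = significant_gap(hits_a, n_a, hits_b, n_b)
    return lam * gap if is_significant else 0.0

# 70/100 approvals for group A vs. 10/20 for group B: a 20-point gap, but the small
# sample for B means it is not significant at the 5% level, so no penalty is applied
print(gated_penalty(70, 100, 10, 20))  # 0.0
```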
The general feasibility has been demonstrated by the research we cited: they indeed performed such multi-objective searches and got meaningful results[9][8]. So it’s not just theory; the technology is here. The next step is integrating it into practice, which might require user-friendly tools, standard datasets for fairness, etc. But these are engineering hurdles that can be overcome with the right prioritization.
4. “Over-reliance on Technical Fixes (‘Ethics Washing’)”
Some ethicists caution that focusing on technical solutions like fairness algorithms might lead to complacency or avoidance of deeper societal reforms. They use terms like “ethics washing” to describe when companies tweak algorithms a bit to appear fair without addressing root causes of inequality. Could Rawlsian NAS be just a band-aid, diverting attention from larger issues?
Response: This is an important critique. Indeed, algorithmic fairness fixes are not a panacea for social injustice. If, for example, certain communities have less access to quality education, a fairer college admissions algorithm can only do so much; the real solution includes improving schools, providing support—things beyond the algorithm’s scope. Rawls himself would acknowledge the limits of any one institution; his theory is about the basic structure as a whole. So, we should view Rawlsian AI design as one piece of a bigger puzzle. It can ensure AI doesn’t make things worse and ideally makes some things better, but it won’t eliminate inequality by itself.
However, dismissing technical fixes entirely is also problematic; algorithms are very influential today, and leaving them biased because we focus only on root causes means in the interim they perpetuate harm. We can walk and chew gum: push for broader social reforms (like diverse hiring pipelines, improved data representativeness, digital literacy) and make the algorithms as just as possible. In fact, technical interventions can buy time and reduce harm while we work on harder structural changes[59]. For example, if an AI hiring tool is shown to disfavor women due to historical bias, we should both fix the tool (so qualified women aren’t missed right now) and invest in encouraging women in fields where they’re underrepresented (long-term pipeline fix).
Transparency about what is done is key to avoid ethics washing. If a company uses Rawlsian NAS, it should openly report its fairness criteria, performance, and limitations. Regulators or auditors could verify these. That way it’s not just PR—there’s accountable evidence of fairness improvements. The Rawlsian approach could even highlight structural issues: if to satisfy fairness the algorithm has to contort a lot, that indicates deep input inequalities that society should address. If one finds that no model can be fair because the underlying data is too skewed, it points to needing better data collection or systemic change.
Finally, consider that laws might demand these technical fixes soon. For instance, anti-bias regulations for AI could require demonstrating that an AI was designed and tested for fairness. Rawlsian NAS is a way to comply with that proactively and rigorously, rather than just retrofitting an existing biased model. It’s part of ethical and legal diligence, not a cover-up.
5. “Rawlsian Theory Limitations: Is it fully applicable to AI?”
Some might argue that Rawls’s theory was meant for distributing rights and resources in society, not for individual automated decisions. They could ask: does it really make sense to talk about an algorithm “following Rawls’s principles” when those principles assumed a holistic societal perspective? Additionally, there have been critiques of Rawls from philosophers like Nozick (libertarian critique that difference principle violates freedom) or from multicultural perspectives (that Rawls’s original position abstracts too much from real-world identities). Could applying Rawls to AI inadvertently ignore important context?
Response: It’s true that Rawls’s original target was the basic structure of society. But AI systems, when deployed widely, become part of that basic structure, especially if governments use them or if they effectively control access to jobs, credit, information, etc. For example, a social media algorithm that dictates what news people see could affect political discourse—very much a basic structure concern. Therefore, applying justice principles to these systems is appropriate. Rawls’s principles are general enough (liberty, opportunity, helping least advantaged) that they can guide design at micro and macro levels. Think of them as constraints any component of society’s decision-making machinery should satisfy. Indeed, law review articles (like the one by Grace & Bamford) explicitly try applying Rawls to algorithmic governance and find it enlightening[55][60].
As for critiques of Rawls:
- Libertarian views (Nozick) would oppose deliberate bias corrections, viewing them as unfair to the individuals from groups that lose some advantage. In AI terms, if you adjust a loan model to approve slightly more disadvantaged minority loans, a libertarian might say you're unfairly denying some majority individuals who now might not get a loan they otherwise would have. This is a philosophical stance prioritizing procedural merit (even if the procedure was biased by history). A response is that algorithms have never been a neutral meritocracy to begin with; they were encoding historical privilege. Rawlsians argue that justice demands correcting for those unearned advantages to ensure genuine fairness. Also, many legal systems allow such measures under the banner of anti-discrimination or reparative justice, so long as they're reasonable and time-bound.
- Another critique is that Rawls's approach is very rational and idealized, and maybe real humans wouldn't all agree behind the veil due to risk attitudes or different utility functions (Harsanyi's utilitarian outcome argument). But even if not everyone would pick Rawls's exact difference principle, the veil of ignorance is a powerful tool that tends to push toward some form of fairness and equality, which any ethical AI should aim for. Rawls's specific principles are arguably one of the most ethically robust sets we have.
- Some argue Rawls doesn't explicitly cover issues like recognition or historical injustices except indirectly through the difference principle. For AI, one might think of fairness not only in distributive terms but in terms of representation and dignity (e.g., not stereotyping people). While Rawls doesn't discuss AI bias in images, his principle of equal respect for persons can be invoked. For example, an image generation AI that produced distorted or demeaning images of certain groups fails to respect equal dignity (even if distributionally it's "equal," maybe qualitatively it's not). We might need complementary ethical guidelines (like virtue ethics or specific anti-hate principles) to cover those cases. But that doesn't invalidate using Rawls for the many cases it does cover (like allocation of opportunities).
- One might also ask: do individuals or groups feel satisfied by Rawlsian fairness? Sometimes a group might prefer, say, a compensatory justice beyond the difference principle (like absolute parity as the only fair outcome). Rawls allowed that if inequalities are necessary to help them, okay, but if not, maybe strict equality is best. In AI, if we can achieve nearly equal outcomes, some will say why not equalize completely. Rawls's framework can accommodate that if such strict equality indeed maximizes the position of the worst-off (which in many cases it basically does, or it might be equivalent).
Thus, while Rawls may not answer everything, it provides a strong baseline. We can integrate other principles for aspects Rawls doesn’t explicitly address. Importantly, Rawls’s approach is procedural: it tells us how to evaluate fairness (impartially). Even if one plugs in slightly different values, the exercise of the veil and focusing on the least advantaged remains a valuable mode of thinking.
6. “Algorithmic Limitations: Context and Nuance”
Opponents might highlight that ethical decision-making can require context, judgment, and nuance that algorithms lack. For example, justice sometimes means treating cases differently, not just applying one formula (think equity vs equality debates). Can a Rawlsian NAS approach adapt to context, or is it a one-size-fits-all solution that might be blind to particular needs?
Response: This is where the collaboration of AI with human oversight is important. We envision Rawlsian NAS models as tools to assist and augment human decision-makers, not always fully automate them out. In cases where context is crucial, the AI could provide a fair baseline recommendation and humans can adjust for context. For instance, a fair lending model might flag an applicant as high risk normally, but a human loan officer could note an extraordinary circumstance (like a pandemic job loss with recovery in sight) and decide to approve anyway. As long as those human decisions themselves are checked for bias, this synergy can work well.
Another perspective is that if context and nuance can be described or learned, the model can incorporate them. NAS might, for example, include sub-networks that handle different contexts (maybe first classify the situation, then apply a different decision rule). If well-designed, that could be automated nuance. But some contexts defy enumeration.
It’s also possible to incorporate fairness at a group level and an individual level: Rawls’s principles ensure group fairness, but individual justice might demand cases be considered individually. A criticism of algorithmic fairness is it deals in averages. One person might say: “I was denied a loan even though I’m creditworthy, just because my group had a lower approval rate.” From Rawls’s view, if that person is truly equal in qualifications, then equal opportunity was violated—so our fairness metrics must catch that scenario (if our definitions are good, they should). But sometimes, randomness or unobserved factors mean individuals may be exceptions. No system is 100% fair to every single individual, just as even a fair legal system can have hard cases. The aim is to minimize those errors and have recourse (appeal). That’s why an appeals process or second review is a good safety valve.
7. “Complexity of Regulatory Implementation”
Finally, a pragmatic counterargument: even if Rawls + NAS works, how do we implement and enforce it in the real world? Will companies voluntarily adopt such complex objectives, or will regulators have the expertise to demand it? Perhaps it’s too academic and not practical for widespread use.
Response: Implementing these ideas will require developing standards and best practices. However, momentum is growing for responsible AI. Big tech firms and industries are already researching fairness and some are using AutoML techniques. If regulators start requiring proof of fairness, companies might actually find NAS-based approaches efficient to meet those requirements. It’s easier to tell an AutoML system “find me a fair model” than to have engineers manually trial-and-error to de-bias a model. So there’s an incentive: efficiency in compliance and bragging rights for having more ethical AI.
Of course, regulators themselves need knowledge. This calls for interdisciplinary bodies – e.g., maybe a federal AI fairness bureau that has both technologists and lawyers to set guidelines like “your model must satisfy X fairness metric within Y tolerance unless justified otherwise.” Already, the US FTC and EU bodies are discussing algorithmic audits. They could specify Rawlsian criteria (e.g., check that no demographic’s error rate is disproportionately worse unless it’s unavoidable and explain why). Our policy proposal section will detail ideas like requiring a “fairness impact assessment” akin to an environmental impact assessment for big AI systems.
Adoption will likely start in high-stakes domains: finance, healthcare, etc., where fairness is critical. As successful case studies emerge, best practices trickle down to other areas. It’s not trivial, but neither was introducing safety standards in automobiles or privacy rules in data – industries adapted over time with regulatory nudges and consumer expectations. Ethical AI could follow a similar trajectory, especially as public awareness of bias grows.
In conclusion, none of these counterarguments are fatal. They represent important considerations that shape how we implement Rawlsian NAS rather than reasons to abandon it. The trade-offs can be managed, the definitions refined collaboratively, and the technical challenges overcome with research. If anything, these critiques show that a purely technical solution is insufficient in isolation—it must be accompanied by legal, social, and human-in-the-loop measures. We fully acknowledge that, and our vision includes those complementary aspects.
Having addressed challenges, we now turn to how we can operationalize these ideas at the level of policy and institutional practice. After all, to truly ensure AI models act ethically and impartially, we need frameworks that make Rawlsian NAS not just an academic concept but a standard part of AI development and deployment.
Policy Proposals and Governance Frameworks
Translating the combination of NAS technology and Rawlsian ethics into real-world impact requires supportive policy and governance. We outline several proposals that regulators, lawmakers, and industry bodies could adopt to ensure AI models are built and deployed in line with justice as fairness. These proposals aim to embed the principles discussed above into the AI lifecycle—from design and training to testing and oversight. The overarching idea is to make ethical AI development (including Rawlsian NAS methods) a normative expectation much like safety testing or privacy compliance is today.
1. Mandate Algorithmic Fairness Impact Assessments
Before deploying high-impact AI systems (in areas like finance, employment, education, criminal justice, healthcare, etc.), organizations should be required to conduct an Algorithmic Fairness Impact Assessment (AFIA). Similar to environmental impact assessments for large projects, an AFIA would evaluate the system’s potential biases and disparate impacts on different groups, and detail what design choices were made to mitigate unfairness.
· The AFIA should explicitly reference fairness criteria aligned with Rawls’s principles: for example, assessing whether the system ensures equal opportunity (no qualified group is systematically disadvantaged) and whether any residual inequalities are justified by a benefit to less advantaged groups[23][14]. It should check that fundamental rights (privacy, due process) are not being violated by the AI’s operation.
· It should include simulation results or pilot studies showing the model’s performance broken down by demographic groups, highlighting the worst-off group’s outcomes. If, say, the false negative rate for a certain minority is higher, the assessment must discuss why and what’s being done about it.
· A section of the AFIA can document the use of fairness-aware techniques such as NAS. For instance, it could state: “We utilized a Neural Architecture Search with a multi-objective function prioritizing equalized opportunity between gender groups, resulting in a model that meets the fairness threshold X while maintaining accuracy above Y.” This demonstrates proactive steps.
· Regulators can publish guidelines on acceptable fairness thresholds for different contexts (akin to how OSHA might set exposure limits for toxins, here agencies could set, e.g., “loan approval disparity should not exceed Z% unless justified by credit risk factors not correlated with protected traits”). These guidelines can be informed by Rawlsian reasoning—i.e., set thresholds that ensure no group is egregiously disadvantaged.
· The AFIA should be reviewed by an independent auditor or the relevant regulator. Agencies like the CFPB (for credit), EEOC (for hiring algorithms), or a dedicated Digital Authority could have jurisdiction to approve or require modifications to the system before it goes live. This ensures accountability beyond self-assessment.
The concept of impact assessments is gaining traction (the EU AI Act will likely require something along these lines for high-risk AI). By formalizing it, we force organizations to think through fairness at design time, which likely means they’ll turn to methods like Rawlsian NAS to get a positive report.
2. Fairness-By-Design Standards and Certification
Industry standards bodies (like ISO/IEC or NIST in the US) could develop a Fairness-by-Design framework akin to privacy-by-design. This would provide a blueprint of steps and technical measures to bake fairness into AI systems from the ground up.
· Such a standard might encourage the use of automated tools (like NAS) for exploring fair model alternatives, recommending including a fairness term in model selection. It could include best practices such as “use diverse development datasets, test model on all subgroups, and optimize using multi-objective techniques for performance and fairness.”
· Certification: Companies could get a certification (e.g., “Fair AI Certified”) if they follow these practices and demonstrate their model meets certain fairness benchmarks. This could be overseen by an independent panel of AI ethics experts and perhaps even include stakeholder representatives (e.g., civil rights groups).
· The certification process might involve submitting the model (or API access to it) for evaluation on standardized fairness test suites. These suites would run the model on specially curated datasets or scenario analyses to probe for bias. For instance, one might test a hiring model on synthetic resumes that are identical except for gender or race signifiers to see if outputs differ – an embodiment of the veil of ignorance test in practice (a counterfactual-pair sketch of this idea follows this list).
· A Rawlsian twist in the standard could be requiring that for any identified disparity, the team must show either that (a) they’ve minimized it and can’t reduce it further without serious performance loss, or (b) the disparity actually works to benefit the disadvantaged (which is rare, but for example, an algorithm might purposely grant more leeway to historically marginalized applicants – an approved affirmative measure). Essentially, require a justification grounded in fairness principles for any inequality the model exhibits.
· Over time, these standards can become part of procurement requirements. Governments (a big buyer of AI services) could require that any AI system they procure has been certified for fairness. This would echo how governments often require contractors to have environmental or security certifications. If a police department wants to buy a predictive tool, it must show the tool has passed fairness certification. That leverages market power to spread ethical design.
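As a concrete illustration of the counterfactual test mentioned in the certification bullet above, the sketch below scores otherwise-identical applications that differ only in one protected signifier and reports the largest score gap observed. The scoring interface and field names are hypothetical; an actual test suite would define its own.

```python
# Illustrative counterfactual-pair test in the spirit of the check described above.
# `score_fn` stands in for whatever scoring interface a certification suite would
# use; the attribute/field names are hypothetical.
from copy import deepcopy

def counterfactual_gap(score_fn, applications, attribute, values):
    """Score otherwise-identical applications that differ only in one protected
    signifier, and return the largest score gap observed."""
    worst_gap = 0.0
    for app in applications:
        scores = []
        for value in values:
            variant = deepcopy(app)
            variant[attribute] = value          # swap only the protected signifier
            scores.append(score_fn(variant))
        worst_gap = max(worst_gap, max(scores) - min(scores))
    return worst_gap

# Example usage (hypothetical model and data): a gap near zero suggests the model
# behaves the same behind the "veil"; a large gap flags disparate treatment.
# gap = counterfactual_gap(model.score, synthetic_resumes, "name_signal", ["Emily", "Jamal"])
```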
3. Regulatory Sandboxes and Veil-of-Ignorance Testing
Regulators could set up AI regulatory sandboxes where companies can submit their AI systems for evaluation and feedback in a controlled environment without immediate penalty. One feature of these sandboxes would be Veil-of-Ignorance testing:
· In such a test, the regulator (or a collaborating research institution) would evaluate the AI by simulating decision outcomes for a variety of synthetic individuals across the spectrum of possible attributes, without regard to which profiles correspond to actual applicants[61]. This conceptually mirrors the impartial viewpoint: we are seeing what it would be like for any arbitrary person to face the algorithm. If certain profiles (like those combining multiple disadvantage factors) consistently fare worst, that flags an issue (a minimal sketch of such a scan appears after this list).
· The sandbox could allow sensitive attributes to be used for evaluation under secure conditions, even if the model itself never sees them, in order to assess fairness. E.g., a bank might not feed race to its model (per law), but in the sandbox the regulator can tag data by race to measure outcomes by race (a legally permissible proxy analysis).
· Feedback from the sandbox would guide the developer on how to adjust. For example, “Our tests indicate your model has a 15% lower approval rate for applicants from zip codes with median income below $40k, which doesn’t appear justified by default risk. Consider retraining with fairness constraints or using a NAS approach to improve this. Resubmit for another test.”
· The advantage is it’s collaborative and preventive, not just punitive after deployment. It helps organizations learn and apply better methods like fairness-aware NAS before any harm occurs.
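A minimal sketch of what such a veil-of-ignorance scan might look like follows, assuming the sandbox supplies an attribute grid and a decision function that returns a favorable-outcome score. Real sandbox tooling would be considerably more sophisticated (handling realistic joint distributions, statistical significance, and so on).

```python
# Minimal sketch of a veil-of-ignorance scan: enumerate synthetic profiles over a
# full attribute grid and surface the profiles that fare worst. The attribute grid
# and the `decide` function are placeholders for whatever the sandbox specifies.
from itertools import product

def veil_of_ignorance_scan(decide, attribute_space, n_worst=5):
    """Evaluate a decision function on every attribute combination and return the
    profiles with the lowest favorable-outcome scores (the 'least advantaged')."""
    keys = list(attribute_space)
    results = []
    for combo in product(*(attribute_space[k] for k in keys)):
        profile = dict(zip(keys, combo))
        results.append((decide(profile), profile))   # decide() assumed to return a score in [0, 1]
    results.sort(key=lambda pair: pair[0])
    return results[:n_worst]

# Example attribute grid (purely hypothetical):
# space = {"income_band": ["low", "mid", "high"], "zip_tier": [1, 2, 3], "age_band": ["18-30", "31-60", "60+"]}
# worst_profiles = veil_of_ignorance_scan(model_decision, space)
```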
Legislatively, enabling regulators to do this may require changes to allow them to handle sensitive data or to compel companies to provide models and data. Funding would also be needed for agencies to build such technical capabilities. But it could parallel what the FDA does with drugs in pre-approval clinical trials – here, pre-approval algorithm trials.
4. Incentives for Ethical AI R&D
Governments and foundations can encourage the development and adoption of fairness techniques through grants, challenges, or tax incentives.
· For instance, a grant program could fund research into better fairness metrics, better NAS algorithms that incorporate ethics, or the development of open-source fairness-aware AutoML tools that smaller companies can use. This reduces the barrier to entry: a company would not need a PhD-level research team to apply Rawlsian NAS; it could use a vetted toolkit.
· Challenges or prizes: Similar to how DARPA or NSF might hold a contest for “explainable AI,” they could hold one for “fair AI.” For example, a challenge might provide a dataset where bias is known to exist and ask participants to develop the fairest model (with some accuracy floor). Rawlsian approaches would likely shine here. Winners could be recognized and their methods promulgated as best practice.
· Tax or procurement incentives: A government might give, say, a tax break or fast-track procurement consideration to companies with strong ethical AI processes. While hard to quantify, if certification exists (see point 2), that could be a criterion. This carrot approach could motivate companies that otherwise might only do minimum compliance.
5. Transparency and Disclosure Requirements
To empower external oversight (academics, civil society, etc.), we should require a degree of transparency about algorithmic decision systems, within appropriate IP and privacy bounds.
· Companies should disclose, at least to regulators or auditors, the key design features of their models: what objectives were optimized (did they consider fairness?), what data was used, and what fairness outcomes were observed in testing. This can be summarized in something like a Model Fact Sheet or Nutrition Label for the AI, an idea floated by researchers.
· If NAS was used, the fact sheet might note, “This model was auto-generated using a multi-objective search balancing AUC and demographic parity. It considered 10,000 candidate architectures and selected one optimizing the trade-off at parity difference = 0.02.” This demystifies the process for trust-building[62] (a sketch of an auto-generated fact sheet follows this list).
· Also, if a model is found to have significant bias after deployment (say by journalists or researchers), having the documentation allows analysis of what went wrong (did they not test certain scenarios? Was their metric incomplete?). It could also enable pressure to recall or fix the model, analogous to a defective product recall.
· From a Rawlsian perspective, this transparency is crucial for the “public understanding” of justice that Rawls said is needed in a well-ordered society[44][20]. People should not feel algorithms are mysterious black boxes making decisions about their lives with no rationale. Transparency (in a form laypeople can grasp when needed) fosters legitimacy and trust, as well as enabling challenges and improvements.
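For illustration, a fact sheet of the kind described above could be generated automatically from a search log. The sketch below uses hypothetical field names and is not a standardized schema.

```python
# Sketch of an auto-generated fact sheet entry. The field names and the shape of
# `search_summary` are illustrative assumptions, not a standardized schema.
import json

def model_fact_sheet(search_summary: dict) -> str:
    """Render key design disclosures as a JSON fact sheet for regulators or auditors."""
    sheet = {
        "objectives_optimized": search_summary.get("objectives"),        # e.g. ["AUC", "demographic parity"]
        "search_method": search_summary.get("method"),                   # e.g. "multi-objective NAS"
        "candidates_evaluated": search_summary.get("candidates_evaluated"),
        "selected_tradeoff": search_summary.get("selected_tradeoff"),    # e.g. {"AUC": 0.81, "parity_gap": 0.02}
        "training_data_notes": search_summary.get("data_notes"),
        "subgroup_results": search_summary.get("subgroup_results"),
    }
    return json.dumps(sheet, indent=2)
```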
One might worry that companies will not want to share too much, citing trade secrets. But regulators could see full detail while the public receives summary statistics. And if an algorithm effectively becomes a public rule (think credit scoring formulas or sentencing guidelines), there is a strong argument that it, or at least its fairness properties, should be public.
6. Dynamic Monitoring and Enforcement
Even after deployment, continuous monitoring is needed (as bias can creep in, data can shift, etc.). Regulators could require periodic audits – say annually or when major updates happen.
· These audits could be conducted by certified third-party algorithmic auditors (an emerging profession). They would run tests similar to the regulatory sandbox, but on current data, and could use complaints or real-world outcomes as guidance (e.g., if many complaints come from one community, the audit can focus its analysis there).
· If an audit finds the AI is violating agreed fairness standards, the regulator can enforce corrective action. That might mean requiring retraining with stronger fairness constraints, or even discontinuing the system until fixed. In severe cases (willful negligence leading to discrimination), fines could apply under anti-discrimination laws or new AI-specific laws.
· Enforcement should be done in a way that encourages improvement, not just punishment. The point is to get the systems to be fair, not to penalize companies for the sake of it. So regulators might say: fix it in 90 days and show us the improvement (again possibly via NAS or other means), rather than immediately applying heavy sanctions unless harm was grievous or intentional.
A Rawlsian notion here is “guaranteeing the avenues for challenge”[13][56]. For individuals, one enforcement aspect could be giving them rights: e.g., the right to know an algorithm significantly impacted a decision about them, and the right to seek reconsideration by a human or a higher tier algorithm that is more fairness-focused. For instance, if a loan was denied, the person can appeal and the bank must either do a manual review or run a special fairness-optimized secondary model that takes a “second look” emphasizing inclusive criteria. This two-tier approach (first pass efficiency, second pass fairness for appeals) might be a pragmatic compromise in some cases.
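A minimal sketch of that two-tier flow, with the primary and fairness-focused scoring functions treated as assumed interfaces, might look like this:

```python
# Hypothetical sketch of the two-tier appeal flow just described: a primary
# (efficiency-oriented) model decides first, and on appeal a fairness-focused
# secondary review takes a "second look". Both scoring functions are assumed interfaces.
def decide_with_appeal(applicant, primary_score, fairness_score, threshold=0.5):
    if primary_score(applicant) >= threshold:
        return ("approve", "primary")
    # Denied applicants may appeal; the second pass emphasizes inclusive criteria
    # (or routes the case to a human reviewer).
    if fairness_score(applicant) >= threshold:
        return ("approve", "fairness_review")
    return ("deny_with_explanation", "fairness_review")
```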
Additionally, public reporting can be mandated. Large companies might have to include in annual reports metrics on their AI fairness (like how many decisions made automatically, results of bias audits, etc.). This kind of transparency can spur companies to do better for fear of reputational damage if they lag peers.
7. Education and Interdisciplinary Training
Finally, for all this to work, we need people who understand both technology and ethics/law to drive it. Policies should support developing that expertise.
· Universities and professional programs should integrate AI ethics (with Rawls’s theory as part of the curriculum) into data science and computer science degrees, and conversely, teach some technical aspects in law and public policy programs.
· The scope of licensed professions may also need to expand: in the future, there could be a certification for “Algorithmic Fairness Expert,” akin to the CPA for accountants, to ensure that those assessing these systems have standardized knowledge.
· Government regulators need funding to hire or train such experts so that they can meaningfully evaluate AFIAs and audits. Without that, any regulation could become a toothless or check-the-box exercise.
By implementing these policies, we essentially create an ecosystem that rewards and reinforces the development of AI that aligns with justice as fairness. Companies would have clarity on the rules of the road (which reduces their uncertainty and risk), individuals gain protections and trust in AI decisions, and society reaps the benefits of AI innovation without as much fear of the technology exacerbating injustice.
Notably, these proposals resonate with Rawls’s own notion that beyond formal rules, a culture of justice is needed. If everyone from developers to CEOs to regulators is habituated to ask “how does this algorithm affect the least advantaged?” or “would I accept this decision rule if I didn’t know who I’d be?”, then we have instilled a Rawlsian ethos in AI governance.
We now conclude by reflecting on the broader implications of uniting NAS technology with Rawls’s vision of justice, and how this approach positions us to face the future of AI with confidence that it can serve humanity’s highest ethical standards.
Conclusion
Artificial intelligence is often characterized as the defining technology of the 21st century—a powerful tool that will shape economies, influence social structures, and touch every aspect of our lives. With such power comes a profound responsibility: to ensure that AI acts in service of human values and social justice, rather than undermining them. In this Article, we have argued that a fruitful way to meet this responsibility is to combine cutting-edge AI design techniques (Neural Architecture Search) with enduring philosophical principles of justice (John Rawls’s justice as fairness). By doing so, we move toward AI systems that are not only intelligent and efficient, but also impartial, equitable, and respectful of human dignity.
Our analysis proceeded from abstract theory to concrete practice and policy. We began by rooting the discussion in a rich philosophical tradition—tracing how thinkers from Aristotle and Confucius to Kant and Rawls have sought to define fairness and moral conduct. This intellectual lineage provides a backdrop that legitimizes treating AI ethics as continuous with age-old questions of justice. AI may be a novel arena, but the core question it raises—“what is the right way to make decisions affecting others?”—is as old as philosophy itself. Rawls’s answer to that question, crafted for the realm of social institutions, turns out to be remarkably pertinent to algorithmic decision-makers: design rules as if you did not know what position you’d occupy, thereby protecting those who might be worst-off[3][5].
Rawls’s two principles of justice gave us normative targets: robust protection of basic rights and liberties, true equality of opportunity, and attention to the welfare of the least advantaged[6][19]. We translated these into the language of AI fairness metrics and objectives—showing, for example, how equal opportunity can be linked to requiring equal accuracy or error rates across groups, and how the difference principle aligns with maximizing the minimum group outcome or actively improving disadvantaged groups’ model performance[33][28]. This theoretical exercise was not mere idealism; it pointed to tangible criteria that AI developers can strive for and that regulators can enforce.
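As a purely illustrative companion to this translation, the short sketch below computes two such quantities for a binary classifier: the equal-opportunity gap (the spread in true positive rates across groups) and the worst-off group’s utility, using per-group accuracy as a stand-in. It assumes simple parallel lists of labels, predictions, and group identifiers and is not drawn from any of the cited frameworks.

```python
# Illustrative computation of the two criteria discussed above: the equal-opportunity
# gap (difference in true positive rates across groups) and the maximin objective
# (the minimum group utility, with per-group accuracy as a stand-in for utility).
def fairness_summary(y_true, y_pred, groups):
    by_group = {}
    for yt, yp, g in zip(y_true, y_pred, groups):
        by_group.setdefault(g, []).append((yt, yp))
    tpr, utility = {}, {}
    for g, pairs in by_group.items():
        positives = [yp for yt, yp in pairs if yt == 1]
        if positives:                                  # skip groups with no qualified members
            tpr[g] = sum(positives) / len(positives)
        utility[g] = sum(int(yt == yp) for yt, yp in pairs) / len(pairs)
    eo_gap = max(tpr.values()) - min(tpr.values())     # equal opportunity: drive toward 0
    worst_group_utility = min(utility.values())        # difference principle: maximize this
    return {"equal_opportunity_gap": eo_gap, "worst_group_utility": worst_group_utility}

# Example: fairness_summary([1, 1, 0, 1], [1, 0, 0, 0], ["A", "A", "B", "B"])
# -> equal_opportunity_gap of 0.5 and worst_group_utility of 0.5
```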
On the technical side, we explored Neural Architecture Search as a dynamic instrument to achieve multi-objective design goals. NAS exemplifies the creativity of AI: it can discover solutions humans might overlook. By steering that creativity with ethical objectives, we effectively program the search for not just “How can I best predict?” but “How can I best predict fairly?” The case studies and research findings we discussed demonstrate that this is feasible—one can integrate fairness into the very optimization loop that crafts AI models, yielding systems that dominate conventional models in the trade-off between accuracy and fairness[8][10]. In essence, NAS can serve as the engineer working tirelessly to satisfy the constraints of a Rawlsian social contract as applied to an algorithm.
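The selection logic at the heart of such a search can be sketched in a few lines. The helpers below (`sample_architecture`, `train_and_evaluate`) are hypothetical stand-ins for a real NAS framework’s search space and training pipeline; the point is only to show a maximin selection rule operating under an accuracy floor.

```python
# Greatly simplified sketch of a fairness-aware selection loop of the kind a NAS
# framework might run. `sample_architecture` and `train_and_evaluate` are
# hypothetical helpers standing in for a real search space and training pipeline;
# the selection rule is Rawlsian maximin subject to an overall accuracy floor.
def fairness_aware_search(sample_architecture, train_and_evaluate,
                          n_candidates=100, accuracy_floor=0.75):
    best = None
    for _ in range(n_candidates):
        arch = sample_architecture()                  # draw a candidate from the search space
        metrics = train_and_evaluate(arch)            # assumed to return overall and per-group accuracy
        if metrics["accuracy"] < accuracy_floor:
            continue                                  # respect the performance constraint
        worst_group = min(metrics["group_accuracy"].values())
        if best is None or worst_group > best[0]:     # keep the best worst-group outcome
            best = (worst_group, arch, metrics)
    return best
```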
The synergy of Rawlsian theory and NAS technology also offers a hopeful message: that we need not accept a future of “AI versus humanity” or “efficiency versus ethics.” Instead, we can envision and create a future of “AI for humanity,” where efficiency and ethics are aligned. By encoding fairness into design, we reduce the adversarial tension between what is algorithmically optimal and what is morally acceptable. The process can even uncover win-win scenarios (as we saw in some studies where fairness interventions improved overall performance[28]). It’s a reminder that biases and injustices are often themselves inefficiencies and distortions of a system, which, when corrected, allow everyone to benefit more justly.
We addressed counterarguments to show that this approach is neither naïve nor inflexible. Yes, there are trade-offs and philosophical debates. But none of these nullify the core imperative to pursue justice in AI. If anything, they underscore that implementing ethical AI is a continuous process of refinement, dialogue, and vigilance. Fairness is not a one-time checkbox but a constant commitment, much like democratic governance or the rule of law. There will be hard cases and imperfections; our aim is to keep improving, guided by principles that ensure we never lose sight of human well-being.
The policy proposals we outlined sketch a roadmap for institutionalizing these ideas. Law has a crucial role: it can crystallize societal values into enforceable norms, ensuring that competitive pressures or unconscious biases do not derail the ethical design of AI. Just as environmental laws have made sustainability a standard practice in industry, well-crafted AI regulations can make justice as fairness a standard design spec for algorithms. Importantly, policy can also democratize the benefits of approaches like NAS. While big tech firms have the resources to do AutoML, mandating fairness and providing tools and standards can help smaller players and public sector agencies to procure or develop ethical AI without needing superstar AI teams. This widens the impact: fairness in AI shouldn’t be a luxury good; it should be the default expectation.
One might ask, in a broader sense, what it means for an AI’s actions to be “considered ethical if viewed objectively by anyone impacted,” as we framed the goal at the outset. This essentially describes a state of reflective equilibrium where people, regardless of their stake or identity, see the decision-making process as justifiable[44][20]. Achieving this is ambitious—people have different perspectives—but Rawls’s veil of ignorance offers arguably the best proxy for “objective” we have: if a rule is acceptable to all when no one knows who they’ll be, it has a strong claim to fairness. In deploying Rawlsian AI, we strive for a world where, even if I end up being the denied loan applicant or the unsuccessful job candidate, I can acknowledge that the system treated me based on fair criteria and not because of irrelevant prejudices or structural biases. That doesn’t eliminate disappointment, but it preserves legitimacy and trust. People are more likely to accept outcomes, even unfavorable ones, if they believe the process was fair[63][50].
The stakes are high. AI is increasingly embedded in governance (smart cities, predictive policing), economics (automated trading, gig work platforms), and personal life (dating apps, news feeds). If these algorithms carry forward historical injustices or create new opaque inequities, they could deepen social divides, erode individual rights, and provoke justified backlash against technology. Conversely, if we succeed in aligning AI with ethical principles, we could harness AI to reduce injustice—making distribution of resources more equitable, decision processes more consistent and unbiased, and services more accessible. Imagine credit algorithms that extend credit in under-served areas responsibly, thus breaking cycles of poverty, or hiring algorithms that identify non-traditional talent and foster diversity, or medical AI that ensures treatments are effective across all ethnicities and not just the majority group. These are not utopian fantasies; they are within reach if we deliberately design for them. We cited real examples where when fairness was prioritized, innovations followed that benefited those who used to be left behind[39][41].
In closing, we recall the list of luminaries invoked at the outset—figures such as Euclid, Newton, Noether, and Turing. They each advanced human knowledge and capability, often by finding order and structure where none was obvious. In a way, what we propose is to introduce a moral structure into the frontier of AI: to shape the development of AI with the same rigor and aspiration that mathematicians use to prove theorems or scientists use to discover laws of nature. John Rawls can stand alongside those figures as a guide, helping ensure that our pursuit of technological advancement remains tethered to the pursuit of justice. As Rawls himself wrote, “Justice is the first virtue of social institutions” – and increasingly, AI algorithms are becoming social institutions in their own right. Therefore, they too must be guided by the virtue of justice[20][21].
By uniting the power of NAS with the wisdom of Rawls’s theory, we take a decisive step toward AI that anyone—regardless of race, gender, class, or creed—can view and say, “Yes, this is treating us fairly.” Achieving that level of objectivity and acceptance is a formidable challenge, but one worthy of our best efforts. The promise of AI is too great to be squandered by avoidable injustices. Because the guiding principles here are deliberately kept general, our hope is that the ideas herein are adaptable across AI domains and resilient to changes in technology. They sketch a general philosophy and approach to keep AI ethically aligned with human values in an objective, principled way.
To echo a Rawlsian sentiment in the context of AI: We should design the algorithms that govern us as if we ourselves could be the most adversely affected by them. If we hold true to that maxim, we will go a long way toward ensuring that AI serves as a tool of empowerment and justice, rather than oppression or unfairness. This comprehensive law review analysis has aimed to demonstrate not only why that goal is crucial, but also how, through interdisciplinary synthesis and innovative technology, it is practically attainable. In the final analysis, aligning AI with justice as fairness is both an ethical obligation and a technically achievable objective—one that, if realized, will stand as a milestone in the ongoing project of integrating our highest moral ideals with our most advanced technological capabilities.
Sources:
· Leben, Derek. AI Fairness: Designing Equal Opportunity Algorithms (MIT Press 2025). Rawlsian inspiration for algorithmic justice[46][30].
· Sukthanker, Rhea et al. “On the Importance of Architectures and Hyperparameters for Fairness in Face Recognition.” ICLR 2023. Demonstrating NAS finding fairer models[8][41].
· Sheng, Yi et al. “The Larger the Fairer? Small Neural Networks Can Achieve Fairness for Edge Devices.” Montreal AI Ethics Institute (Oct 30, 2022). Introducing fairness-aware NAS (FaHaNa) and its results[9][10].
· Chen, Lin. “Fairness-Aware Classification Based on Rawlsian Veil of Ignorance” (2025). Translating Rawls’s veil into ML, showing improved fairness and accuracy[27][28].
· Grace, Jamie & Bamford, Roxanne. “‘AI Theory of Justice’: Using Rawlsian Approaches to Legislate Better on Machine Learning in Government.” Amicus Curiae Series 2, Vol.1 (Spring 2020). Applying Rawls to algorithmic governance[23][56].
· Weidinger, Laura et al. “Using the Veil of Ignorance to Align AI Systems with Principles of Justice.” PNAS 120(18) (2023). Studies showing people choose worst-off prioritizing principles behind veil[29].
[1] [2] [9] [10] [35] [36] [37] [38] [39] [42] [43] [54] [62] The Larger The Fairer? Small Neural Networks Can Achieve Fairness for Edge Devices | Montreal AI Ethics Institute
[8] [40] [41] [53] On the Importance of Architectures and Hyperparameters for Fairness in Face Recognition | OpenReview
[12] [27] [28] [29] [32] [33] [49] [59] [61] Using the Veil of Ignorance to align AI systems with principles of justice | Request PDF
[30] [31] [45] [46] [47] [50] [63] In AI FAIRNESS, Dr. Derek Leben Proposes a Theory of Algorithmic Justice | Tepper School of Business
[34] Fairness in AI systems development: EU AI Act compliance and beyond - ScienceDirect