The Neural Architecture of Justice: Ensuring AI Fairness through Rawlsian Ethics and NAS
- Don Hilborn
Abstract
Artificial Intelligence (AI) systems increasingly shape decisions in employment, lending, healthcare, criminal justice, and other vital domains. These systems must not only be intelligent but also just – their actions should be ethical towards all human beings and defensible as fair by anyone affected. This article proposes a comprehensive framework for “ethical by design” AI by combining advances in Neural Architecture Search (NAS) technology with the philosophical vision of John Rawls’s A Theory of Justice. Drawing on classical ethical thought from Aristotle, Plato, Kant, and Confucius to modern perspectives from Rawls, we develop a theory of algorithmic justice inspired by Rawls’s principles of fairness[1]. We argue that NAS – an AutoML technique that automatically builds high-performing neural networks – can incorporate Rawlsian principles (like the “veil of ignorance” and the difference principle) as design criteria, yielding AI models that maximize accuracy and uphold fairness. Through this synthesis of moral philosophy and machine learning, AI models can be optimized to respect equal rights, equal opportunity, and benefit the least-advantaged, thereby ensuring their decisions would be considered ethical by all those impacted. After outlining the theoretical basis, we present a law review-style analysis: an Introduction to the problem of AI bias and the need for objective ethics; Background on Rawls’s theory and NAS technology; the core Argument for integrating Rawlsian ethics into NAS-driven AI design (with technical depth on fairness metrics and optimization); consideration of Counterarguments (including technical, philosophical, and legal challenges); and Policy Proposals for governance and oversight to operationalize this approach. We conclude that a “Rawlsian NAS” paradigm can transform AI development, embedding justice into algorithms from the ground up. This vision aligns with emerging standards and can guide regulators, courts, and developers toward trustworthy AI that is not only powerful but principled – a union of technological innovation and ethical rigor capable of meeting the Yale Law Review’s standards of depth and scholarship.
Introduction
What does it mean for an Artificial Intelligence to act “ethically” toward humans? As AI systems take on decisions once made by people – hiring an employee, approving a loan, diagnosing a patient, sentencing a defendant – society increasingly demands that these algorithmic decisions be fair and just. Concerns about AI bias and discrimination are now front-page news, from facial recognition systems that misidentify individuals of certain races to loan algorithms that disproportionately deny credit to marginalized groups. In essence, we face a pressing question: how can we ensure AI behaves ethically and treats each person fairly, especially when humans from all walks of life may be impacted by its actions? This question bridges technology, law, and moral philosophy. It calls for a framework that any affected person, looking objectively at the AI’s actions, would deem fair.
The pursuit of justice in human affairs is ancient. No idea in Western civilization has been more consistently linked to morality than the idea of justice,[2] and every major ethical tradition grapples with it. Plato’s Republic envisioned a just society where each individual plays the role most suited to their abilities, governed by wise philosopher-kings. Aristotle defined justice as giving each person what they deserve – “equals should be treated equally and unequals unequally”[3]. In his view, fairness meant treating like cases alike and different cases in proportion to relevant differences, a principle still echoed in modern anti-discrimination law. Confucius, in a parallel tradition, taught the Golden Rule of reciprocity: “Never impose on others what you would not choose for yourself”[4]. This ethos of empathy and impartiality appears across cultures as a guide to ethical conduct. Immanuel Kant, centuries later, formalized a test for moral action through his Categorical Imperative – act only on principles that you could will to be universal laws, and treat individuals always as ends in themselves, never merely as means. This Kantian demand for universality and respect for persons underlies the notion of procedural fairness in ethics[5][6]. Together, these thinkers laid a rich foundation: ethical action requires impartiality, respect, and a fair distribution of benefits and burdens.
Building on this foundation, John Rawls in 1971 reframed justice for a modern democratic society with his landmark work A Theory of Justice. Rawls asked: if we could design society’s rules from an objective standpoint – not knowing our own race, class, or gender – what principles of justice would we agree upon? He answered with his now-famous thought experiment of the “original position” behind a “veil of ignorance.” Behind this veil, Rawls argued, rational individuals stripped of personal biases would agree on fair principles for all[7]. Rawls’s thought experiment operationalizes the ancient Golden Rule and Kant’s universality in a novel way: by being ignorant of our own position, “we can more objectively consider how societies should operate,” free from self-interest[7]. The principles chosen under such conditions, Rawls contended, would constitute justice as fairness.
Rawls derived two fundamental principles. First, the Principle of Equal Basic Liberties: “Each person has an equal right to a fully adequate scheme of equal basic liberties, compatible with the same scheme for all,” ensuring fundamental rights and freedoms for every individual[8]. Second, the Principle of Social and Economic Inequalities, which has two parts: (a) Fair Equality of Opportunity – offices and positions must be open to all under conditions of fair opportunity (not just formal non-discrimination but genuine leveling of the playing field); and (b) the Difference Principle – inequalities are permissible only if they “are to the greatest benefit of the least advantaged members of society.”[8] In Rawls’s vision, a just society maximizes the position of those worst-off and guarantees equal fundamental rights and opportunities to all. By approaching hard social questions through this veil of ignorance and applying the two principles, “fairness, as Rawls and many others believe, is the essence of justice.”[9]
Translating these lofty philosophical ideas into the realm of AI, we arrive at a guiding intuition: an AI’s decisions should be considered fair and ethical if all persons affected – as if behind a veil of ignorance about who they will be – can agree to those decisions. In other words, if we would accept an AI’s decision-making process not knowing whether we’ll end up being the millionaire or the pauper, the accused or the victim, the majority or a minority, then that AI can be deemed just. This article argues that to achieve such impartial fairness, we should design and constrain AI from the start with Rawlsian principles in mind. Rather than relying on after-the-fact fixes or mere goodwill, we propose embedding fairness into the architecture of AI models using advanced machine learning techniques.
Enter Neural Architecture Search (NAS) – a cutting-edge technology in which algorithms automatically discover high-performing neural network architectures. NAS has already proven its prowess by designing neural networks that exceed human-designed models in accuracy for tasks like image recognition and natural language processing. Crucially, NAS can optimize multiple objectives, not just accuracy[10][11]. This means we can ask NAS to search for network architectures that perform well and meet ethical criteria – for example, minimizing bias or improving transparency – effectively baking our values into the model’s design. Recent research confirms that network architecture “profoundly influences fairness” in outcomes[12]. In fact, the first neural architecture search for fairness demonstrated that it is possible to automatically design neural networks that are Pareto-optimal on both accuracy and fairness metrics, outperforming traditional bias-mitigation methods[13][14]. In plainer terms, we do not always have to trade off fairness against performance – with clever multi-objective optimization, we can sometimes achieve both. This opens the door to ethical optimization: using NAS to find AI models that uphold fairness constraints by design.
Marrying Rawls with NAS yields what we might call “the neural architecture of justice.” Imagine an algorithm designer impartially considering model behavior from all perspectives (a Rawlsian original position for AI design), and then encoding fairness goals into an automated search process that explores countless model configurations to meet those goals. The end product would be an AI model not only evaluated for accuracy in its task, but also vetted for justice in its impact. For example, a lending AI could be designed to maximize predictive accuracy in assessing creditworthiness while also respecting a fairness rule that any predictive errors are evenly distributed across demographic groups (avoiding, say, higher false denial rates for minorities). Or an employment screening algorithm could be constrained to select candidates in a way that improves opportunities for historically disadvantaged groups – an echo of Rawls’s difference principle – while still hiring the most qualified. NAS can handle such multi-criteria goals by searching through architectures and hyperparameters, guided by a composite objective function or a constrained optimization approach that includes ethical measures.
The stakes for this approach are high. If successful, it could resolve some of the tension between AI efficiency and ethical legitimacy. It could provide a systematic, evidence-based method for building fairness into AI from the ground up, rather than retrofitting ethics afterwards. It might also offer a more rigorous standard for regulators and courts to evaluate AI systems. Indeed, regulators worldwide are moving in this direction. The U.S. National Institute of Standards and Technology (NIST), for example, now includes “Fairness with harmful bias managed” as one of the key characteristics of trustworthy AI, alongside safety, transparency, and privacy[15]. The European Union’s draft AI Act explicitly calls for AI systems (especially high-risk systems in hiring, credit, policing, etc.) to incorporate measures for bias detection, correction, and non-discrimination[16]. These policy developments underscore that fairness is no longer optional – it is becoming a legal and normative requirement for AI. Yet, principles alone, as ethicists caution, “cannot guarantee ethical AI” without concrete operationalization[17]. Our proposal responds to that challenge by suggesting an operational pathway: use the best of machine learning engineering (NAS and related techniques) guided by our best theories of justice (Rawls’s principles) to produce verifiably fair and objective outcomes.
Before we can fully develop this argument, a range of background topics must be clarified. We will first revisit key philosophical frameworks on ethics and fairness that inform our approach, from the ancients through Kant to Rawls (and we will see how Rawls synthesizes and innovates upon prior ideas). We will then explain the current landscape of AI ethics and “algorithmic fairness” – how biases manifest in AI, what technical definitions of fairness have been proposed, and why these often conflict or require value judgments. In tandem, we provide an accessible overview of Neural Architecture Search technology and its capabilities, since NAS is a central tool in our proposal. With this groundwork laid, the article’s central argument will be presented: how Rawlsian principles can be translated into design criteria for AI, and how NAS can implement those criteria to yield ethical AI models by design. We will explore illustrative scenarios and recent research findings that bolster this argument (including case studies of NAS-discovered fair models).
The article will also grapple with counterarguments and limitations. Can we really encode something as complex as Rawlsian justice into machine-learning objectives? What if different stakeholders disagree on what “fairness” means? Might a Rawlsian approach sacrifice too much accuracy, or conversely, could optimizing for certain fairness metrics produce perverse results? We address critiques from philosophers (e.g. Harsanyi and Nozick) who challenged Rawls, as well as practical challenges noted by computer scientists (such as the risk of “fairness gerrymandering” or the problem of unrepresented values). We also consider whether focusing on distributive fairness (Rawls’s forte) might overlook other moral dimensions like individual rights or virtue – perspectives championed by Kant or Aristotle – and how those might be integrated or at least not violated in our framework.
Finally, recognizing that technology alone is not a silver bullet, we turn to policy proposals. We outline how legislators, regulators, and industry standards bodies can promote the adoption of Rawlsian NAS principles. These include mandating fairness impact assessments for high-stakes AI (akin to an “ethical audit”), encouraging or requiring the use of multi-objective AutoML tools to search for fair solutions (particularly in domains like credit, employment, criminal justice where biases have well-documented harms), and creating certification programs for AI fairness (as some researchers have suggested, e.g. a seal indicating an AI system was independently audited and judged fair[18]). We also suggest governance structures – such as interdisciplinary ethics review boards within AI development teams – to simulate the impartial deliberation of Rawls’s original position when setting AI design goals. In short, we aim to bridge the gap between theory and practice, offering a roadmap for turning Rawls’s abstract principles into concrete requirements and tests that AI systems can pass.
This synthesis of ideas is ambitiously cross-disciplinary. It enlists the wisdom of philosophers like Rawls, Aristotle, Kant, and Confucius, and the ingenuity of mathematicians and scientists like Euclid, Newton, Gauss, Euler, Emmy Noether, Alan Turing, Srinivasa Ramanujan, and Terence Tao. Why invoke such a pantheon? Because ensuring AI’s ethical alignment is one of the great intellectual challenges of our time – a challenge that, like the grand problems tackled by those thinkers, demands both conceptual clarity and technical virtuosity. Euclid and Hilbert showed us the power of axiomatic systems; here we seek axioms for fair AI behavior. Newton, Gauss, and Euler built equations to describe nature’s laws; here we formulate objective functions to encode moral laws. Noether taught that symmetry yields conservation laws – by analogy, we seek symmetry in how an AI treats different groups, yielding a form of conserved fairness. Turing pioneered the idea of a test for machine intelligence (the Turing Test); we consider what a test for machine justice might entail. Ramanujan and Tao exemplify creative leaps in solving intractable problems – similarly, bridging human values and machine optimization may require new, ingenious solutions. By uniting perspectives from ethics, law, and computer science, we hope to contribute a comprehensive approach – one rigorous enough for academic scrutiny and practical enough for real-world impact.
In the pages that follow, we develop this approach systematically. The Background will ground readers in the necessary theory and technology. The Argument will then make the case for Rawlsian NAS in detail. We will support each step with current research and examples, citing academic literature and legal reasoning as appropriate (and preserving citations in the Yale Law Review format, with footnote-style references like this text[13] indicating source material). Counterarguments will be addressed candidly to acknowledge the approach’s limits and alternatives. We will then move to Policy Proposals that outline how this concept could be implemented in governance and practice. Finally, the Conclusion will reflect on the broader implications – how embedding justice into AI could transform our relationship with these powerful technologies, and why an interdisciplinary approach (such as that between Rawls’s philosophy and NAS’s technical capabilities) is essential for AI’s next evolution.
By aiming for generality in our discussion, we intend this article to be accessible and relevant to a wide readership – legal scholars, ethicists, technologists, and policymakers alike. Given the complexity of the topic, we prioritize clarity: sections are organized with informative headings, arguments are distilled into core points, and technical content (including mathematical explanations of fairness metrics or optimization algorithms) is included where it illuminates the issue, but without overwhelming readers who may not have a computer science background. The goal is a truly comprehensive law review article, of the caliber that could appear in the Yale Law Review, examining how we can ensure AI models act ethically – by design rather than by accident. In short, we explore how the combination of NAS technology and Rawls’s theory of justice provides a pathway to AI systems that anyone impacted, viewing objectively, would consider fair.
(The remainder of this Article proceeds as follows.)
Background
Philosophical Foundations of Justice and Fairness
Justice is a timeless concept in moral and political philosophy, intimately connected with the idea of fairness. Philosophers since antiquity have wrestled with defining justice and the ethical treatment of individuals within society. A brief survey of these ideas will help frame our approach to AI ethics:
Plato (428–348 BCE): In The Republic, Plato presents a vision of a perfectly just city (the “Kallipolis”). Justice, for Plato, is achieved when each class of citizens performs its proper role in harmony – rulers govern with wisdom, warriors protect with courage, and producers work with moderation – and no class usurps the role of another. At the individual level, justice mirrored this structure: a person is just when reason governs spirit and appetite within the soul. While Plato’s conception is not “justice as fairness” in the modern sense, it emphasizes order, balance, and the good of the whole. Notably, Plato considered justice a fundamental virtue, essential to a well-functioning society[2]. His work underscores that ethical governance requires impartial principles, albeit enforced by enlightened autocrats in his ideal city. In our context, one might say a just AI system should similarly balance competing objectives (like accuracy vs. fairness vs. utility) such that each objective has its proper place and no interest unfairly dominates – a kind of Platonic harmony among an AI’s priorities.
Aristotle (384–322 BCE): A student of Plato, Aristotle offered perhaps the earliest explicit definition of fairness. In the Nicomachean Ethics and Politics, he wrote that “justice means giving each person what he or she deserves”[19]. This notion is often paraphrased as “treat equals equally, and unequals unequally according to their relevant differences.” For example, Aristotle notes it is just to reward people in proportion to their contributions or merits, and unjust to base differences in treatment on irrelevant factors like lineage. He distinguishes between distributive justice (allocating benefits/burdens fairly across society, often proportionally) and rectificatory justice (remedying wrongs and enforcing voluntary transactions honestly). Importantly, Aristotle’s view implies a form of proportional equality – fairness isn’t always strict equality; rather, it’s aligned with reasoned criteria. This insight is embedded in many modern legal principles. For instance, paying two employees equally for equal work, regardless of gender, is just (gender being irrelevant to job performance), whereas difference in pay is justified if one works more hours or has higher output (effort or output being relevant differences)[20]. Aristotle also warned, “the worst form of inequality is to try to make unequal things equal,” cautioning against rigid egalitarianism that ignores relevant distinctions[21]. In AI ethics, Aristotle’s perspective reminds us that fairness may require nuanced treatment (e.g., algorithms might justly consider different contexts or needs – such as providing more resources to those with greater need, analogous to proportionality by need). However, when AI systems discriminate on bases like race or sex that are irrelevant to the task, Aristotle would deem that unjust – a point that aligns with current anti-discrimination norms[22]. We see then a continuity: an AI that, say, charges higher insurance premiums to individuals solely due to race (holding risk factors constant) would violate Aristotle’s basic fairness principle as well as legal standards.
Confucius (551–479 BCE) and Eastern Philosophy: While Western thought stresses abstract principles, Eastern ethical traditions, such as Confucianism, emphasize moral duties in context of relationships and virtues like benevolence (ren) and righteousness (yi). Confucius articulated a version of the Golden Rule centuries before Christ: “Do not do to others what you do not want done to yourself.”[4] This is essentially a principle of empathy and reciprocity. Unlike a structured social contract, Confucian fairness arises from cultivating virtue and proper conduct (propriety li) in each relationship (ruler-subject, parent-child, etc.). However, its implication for AI is significant: it suggests that AI should not treat users in ways that we (as designers or as users ourselves) would find objectionable if we were in their place. For example, if we would find it intolerable to be denied an opportunity by a biased algorithm, we ought not program algorithms to treat others that way. Confucian thought also values harmony and inclusion, implying that technologies should strive to avoid societal discord and should include all under heaven (tianxia) in their benefits. We see echoes of this in modern calls for inclusive design and avoiding biased harms. While Confucianism does not present formal principles like Rawls, it adds a perspective of relational ethics – focusing on the human impact and the cultivation of benevolent intent. An AI development process informed by Confucian ethics might emphasize the responsibility of developers (analogous to benevolent rulers) to care for users and those affected, ensuring the AI’s actions promote social harmony rather than division[23].
Immanuel Kant (1724–1804): Jumping forward, the Enlightenment gave us deontological (duty-based) ethics in the figure of Kant. Kant’s Categorical Imperative provides two formulations especially relevant to fairness: (1) Universalizability: “Act only according to that maxim whereby you can, at the same time, will that it should become a universal law.” If a rule for an AI’s behavior cannot be universalized (e.g., “the AI may lie to the user when convenient” fails because if all agents lied arbitrarily trust would collapse), then it’s not ethical. (2) Humanity as an end: “Act so that you treat humanity, whether in your own person or in another, always at the same time as an end, never merely as a means.”[24][25] In AI terms, this insists that individuals impacted by an AI must be treated with respect and dignity, not as mere data points or means to an end[6]. One can interpret this as a requirement for transparency and accountability – AI systems should not exploit or manipulate people (say, by maximizing engagement through dark patterns that treat users as means to profit), and they should be designed in a way that respects users’ autonomy (for instance, allowing users to contest decisions or understand the reasoning). Kantian ethics also undergirds the notion of procedural fairness: the idea that the process by which decisions are made should be one that everyone could accept as fair if applied universally. This resonates with Rawls’s original position, which is essentially a procedural test for fairness. Indeed, scholars often note Rawls’s philosophy has Kantian influences – the original position’s parties are rational and autonomous, concerned with rights and universality, much like Kantian moral agents[26][27]. In practice, a Kantian lens on AI leads to principles like non-deception, data privacy (treating personal information with respect for the individual, not exploiting it), and the right to explanation (since respecting someone as an end entails giving due account of decisions that significantly affect them). For example, the European Union’s AI ethics guidelines and data protection laws (GDPR) echo Kant by emphasizing human agency, oversight, and dignity in automated decisions.
The Link to Rawls – A Synthesis: Rawls can be seen as synthesizing elements of these prior views into a new framework for a democratic society. From Aristotle and classical republicanism, he retains the idea that justice is about fairness and reasoned principles for distribution. From the social contract tradition (Hobbes, Locke, Rousseau), Rawls borrows the idea of a hypothetical agreement as the basis of legitimate principles – but he crucially conditions it with the veil of ignorance to ensure true impartiality. From Kant, Rawls borrows the respect for persons and autonomy: the original position models free and equal rational beings choosing universal laws, much as Kant’s kingdom of ends imagines all individuals legislating moral law. From utilitarianism (Jeremy Bentham, John Stuart Mill), Rawls pointedly diverges – utilitarian ethics would have us maximize overall happiness, but Rawls notes this could permit severe inequalities or sacrifice of minorities if it raised the total sum of welfare. Rawls instead ensures a floor for the least advantaged (the difference principle) and inviolable equal basic liberties. In doing so, he attempts to correct what he saw as utilitarianism’s neglect of the distinction between persons (each individual’s rights and situation matter, not just the aggregate).
Rawls’s contribution, then, is a coherent theory of justice as fairness: a set of principles everyone could agree to in an impartial setting, guaranteeing equal rights and benefitting those in lesser positions. It is a contractarian theory (since it stems from a notional social contract) and a strongly egalitarian one, though not demanding strict equality of outcome (inequalities are allowed if they help those with less). It is also a principled (deontological) rather than purely consequentialist theory – it does not simply say “maximize good results” but rather “arrange society according to these principled constraints.” This has made Rawls’s theory profoundly influential in law, politics, and ethics. Many constitutions and international human rights charters reflect similar priorities of liberty, equality of opportunity, and concern for the marginalized.
Before moving on, it’s worth noting a point of convergence across these philosophies: the ethical viewpoint is often an impartial one. Plato’s philosopher seeks truth beyond personal bias; Aristotle’s judge applies proportional reasoning blind to irrelevant traits; Confucius’s rule of reciprocity swaps self and other; Kant’s imperative demands one consider one’s act as a universal act; Rawls’s veil explicitly blocks knowledge that could bias one’s judgment. This convergence on impartiality or objectivity is precisely what our guiding question highlights – the idea that an AI’s actions should be ethical “if viewed objectively by anyone impacted.” In philosophical terms, this is asking for a standpoint of the “impartial spectator” (to borrow Adam Smith’s term) or the “view from nowhere” in moral reasoning. Rawls’s original position is arguably the most developed version of that idea. Thus, it provides a very apt framework for evaluating AI decisions: we can ask, would all stakeholders accept this AI’s decision process if they did not know which stakeholder they would be? If yes, the process is fair. If not, there is a bias or unfairness to be addressed. This is the philosophical litmus test that will guide our technical and policy exploration.
The Rise of Algorithmic Bias and the Quest for Fair AI
Having surveyed the ethical foundations, we now turn to the problem domain in which we seek to apply them: contemporary AI systems and their propensity for bias or unfair outcomes. In recent years, a new field of research – often called “algorithmic fairness” or “fair ML (machine learning)” – has emerged, precisely because of numerous instances where AI systems behaved in ways that many would deem unjust[28]. To appreciate both the need for and the mechanics of a Rawlsian NAS approach, we must understand how and why AI can produce ethical problems, and what methods already exist to address them.
Bias in, bias out: A maxim often cited is “algorithmic bias is human bias in disguise.” Most AI systems, especially those using machine learning, learn patterns from historical data. If those data reflect societal biases or inequities, an unconstrained AI will often propagate or even amplify them[29][30]. For example, if a hiring algorithm is trained on years of past hiring decisions at a firm where managers (perhaps unconsciously) favored male candidates, the algorithm may learn to associate male gender with “successful hire” and end up discriminating against female applicants – even if gender is not an explicit input, it can use correlated factors as proxies. Similarly, predictive policing algorithms trained on crime data may direct more patrols to minority neighborhoods, not purely because of need, but partly because historical over-policing in those areas generated more arrests (a feedback loop). These are real concerns documented in studies[31][30]. In Rawlsian terms, such outcomes often violate fair equality of opportunity and may harm the least advantaged groups – exactly what we aim to prevent.
Bias can enter AI in various forms: training data bias (skewed samples, historical prejudice, measurement errors), model bias (certain model architectures or features create disparate impacts), and deployment bias (the context of use causes uneven effects). A notorious example was COMPAS, a criminal recidivism risk algorithm, found to have much higher false positive rates for Black defendants than white defendants (i.e. labeling Black individuals as high risk incorrectly more often) – raising serious fairness and due process concerns. Another example: face recognition systems developed primarily on lighter-skinned faces had far higher error rates on darker-skinned faces (in some cases failing to identify Black women up to 34% of the time while almost never misidentifying white men)[32][33]. Such disparities can translate to real harms – from wrongful arrests to denial of services.
Definitions of fairness: To tackle these issues, researchers have proposed formal fairness criteria for algorithms. These definitions attempt to mathematize what it means for an algorithm to be “fair.” A few key ones include: (a) Demographic Parity – the decision (like loan approval) is independent of protected attributes like race or gender; e.g., the loan approval rate should be equal for different groups (or at least not disproportionately low for one group). (b) Equalized Odds – the error rates are equal across groups; for instance, in a binary classification, the true positive rate and false positive rate for a favorable outcome (like being labeled “low risk”) should be equal across groups. This was one point of contention in the COMPAS case – one could demand equal false positive rates for Black and white defendants as a fairness criterion. (c) Equal Opportunity (a special case of equalized odds) – require only that true positive rates are equal (so qualified individuals of any group have equal chance to be correctly recognized as such)[34]. (d) Calibration fairness – each risk score or prediction means the same thing regardless of group (if the model says someone has a 10% default risk, it should empirically be ~10% for any demographic). Sometimes these criteria conflict: it’s been proven that except in special cases, one cannot satisfy all of them simultaneously if base rates differ between groups (known as the “impossibility theorem” in fair ML). This means trade-offs are inevitable – a key point where ethical and political judgment comes in[35] (which definition of fairness to prioritize?). Derek Leben’s work, for example, highlights that “it is often impossible to satisfy all [fairness metrics] at once; organizations must decide which definitions of fairness to prioritize”[36][37]. This echoes Aristotle’s caution and illustrates why a Rawlsian approach (or any principled approach) is needed to choose between fairness notions.
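To make these definitions concrete, the following is a minimal sketch (not drawn from any particular fairness library, and run on invented toy data) of how the demographic parity, equalized odds, and equal opportunity gaps might be computed with NumPy for binary decisions, binary labels, and a group label per example; the calibration check is omitted for brevity.

```python
# Sketch: fairness gaps for binary decisions y_hat, labels y, and group labels g.
import numpy as np

def group_rates(y, y_hat, g, group):
    """Selection rate, TPR, and FPR for one demographic group."""
    mask = (g == group)
    y_g, yh_g = y[mask], y_hat[mask]
    selection_rate = yh_g.mean()                                  # P(decision = 1 | group)
    tpr = yh_g[y_g == 1].mean() if (y_g == 1).any() else np.nan   # true positive rate
    fpr = yh_g[y_g == 0].mean() if (y_g == 0).any() else np.nan   # false positive rate
    return selection_rate, tpr, fpr

def fairness_gaps(y, y_hat, g, group_a="A", group_b="B"):
    sr_a, tpr_a, fpr_a = group_rates(y, y_hat, g, group_a)
    sr_b, tpr_b, fpr_b = group_rates(y, y_hat, g, group_b)
    return {
        "demographic_parity_gap": abs(sr_a - sr_b),               # (a) selection-rate parity
        "equalized_odds_gap": max(abs(tpr_a - tpr_b),             # (b) TPR and FPR both equal
                                  abs(fpr_a - fpr_b)),
        "equal_opportunity_gap": abs(tpr_a - tpr_b),              # (c) equal TPR only
    }

# Toy example with random decisions, labels, and group membership
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 1000)
y_hat = rng.integers(0, 2, 1000)
g = rng.choice(["A", "B"], 1000)
print(fairness_gaps(y, y_hat, g))
```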
Algorithmic fairness in practice: Alongside definitions, techniques for bias mitigation have been developed. These fall into three categories: pre-processing (altering the training data to remove bias, e.g. re-weighting or anonymizing), in-processing (changing the learning algorithm or adding regularization terms to penalize bias during training), and post-processing (adjusting model outputs to satisfy fairness criteria, like equalizing acceptance rates after initial predictions). Each approach has pros and cons. Pre-processing can ensure the model never sees biased information but may discard important context; post-processing can guarantee fairness criteria but might undermine accuracy or seem like an arbitrary quota if not done carefully. In-processing, such as adding a constraint that forces equalized odds, directly embeds fairness into the model’s objective function.
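As an illustration of the in-processing category, here is a minimal, NumPy-only sketch (the model, the synthetic data, and the regularizer weight are all hypothetical) of a logistic regression whose training objective adds a penalty on the gap in mean predicted score between two groups – a simple demographic-parity regularizer applied during learning rather than before or after it.

```python
# Sketch: fairness-regularized logistic regression trained by gradient descent.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_fair_logreg(X, y, group, lam=2.0, lr=0.1, epochs=500):
    """Minimize cross-entropy + lam * |mean score of group 0 - mean score of group 1|."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = sigmoid(X @ w)
        grad_ce = X.T @ (p - y) / len(y)                      # cross-entropy gradient
        a, b = (group == 0), (group == 1)
        gap = p[a].mean() - p[b].mean()                       # demographic-parity gap in scores
        dgap = (X[a].T @ (p[a] * (1 - p[a]))) / a.sum() - \
               (X[b].T @ (p[b] * (1 - p[b]))) / b.sum()       # gradient of the gap
        grad_fair = lam * np.sign(gap) * dgap                 # gradient of lam * |gap|
        w -= lr * (grad_ce + grad_fair)
    return w

# Toy data in which the outcome is correlated with group membership
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
group = rng.integers(0, 2, 400)
y = (X[:, 0] + 0.5 * group + rng.normal(scale=0.5, size=400) > 0).astype(float)
print(train_fair_logreg(X, y, group))
```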
It is here that our interest in Neural Architecture Search intersects. Most bias mitigation research focuses on adjusting data or training, keeping the model architecture fixed. However, recent work suggests that the very design of a neural network (its depth, layer connections, etc.) can affect how it generalizes and whether it picks up spurious correlations linked to sensitive attributes[38][39]. For instance, certain architectures might inadvertently encode more information about protected attributes in their internal representations (making it easier for downstream decisions to be biased). Conversely, one might find architectures that inherently learn more “robust” features that are less correlated with sensitive traits, thereby reducing bias. In 2022, researchers R. Sukthanker, Samuel Dooley et al. conducted what they called the first large-scale search for fair architectures in the context of facial recognition[12]. Their FairNAS approach (Fair Neural Architecture Search) jointly optimized model architecture and hyperparameters with a multi-objective: maximize face recognition accuracy and minimize bias between demographic groups. Strikingly, they found that some neural network architectures consistently showed lower bias (e.g., similar error rates across genders and races) compared to others at similar accuracy[10][14]. They even discovered architectures that Pareto-dominated conventional fairness strategies – meaning these architectures achieved both higher accuracy and lower bias than models produced by prior bias-mitigation techniques[14]. This suggests a profound insight: the structure of the model can itself be a vehicle for fairness. Rather than treating the model as a neutral vessel and only tweaking training, we can search for a model that is predisposed to fairness.
To illustrate, the FairNAS study reported that their NAS-discovered models, when transferred to new datasets and even new sensitive attributes, maintained superior fairness properties[40]. One reason, they found, was that these architectures learned internal representations where protected attributes (like gender) were less linearly separable – essentially harder to detect – in the latent features[41]. In plain terms, the models weren’t encoding “male vs female” or “Black vs white” in an easily exploitable way in their decision-making process, focusing instead on task-relevant features. This is analogous to enforcing a kind of symmetry or invariance: ideally, a fair model would treat two individuals the same regardless of, say, their gender, which means the internal representation of the problem shouldn’t carry gender information that could skew outcomes. Emmy Noether’s famous theorem in math links symmetry to conserved quantities; here, demanding symmetry (invariance) of predictions under swapping sensitive attributes can be seen as a constraint leading to “conserved fairness.” Our approach will leverage this idea by explicitly searching for models that exhibit such invariances.
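One simple way to probe the property just described is a linear "probe": train a linear classifier to predict the protected attribute from a model's latent features and check how far its accuracy sits above chance. The sketch below assumes you already have a feature matrix and protected-attribute labels (both invented here) and uses scikit-learn's logistic regression as the probe; it is an illustration of the idea, not the procedure used in the cited study.

```python
# Sketch: linear-probe separability of a protected attribute in latent features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def protected_attribute_separability(latent_features, protected_labels):
    """Cross-validated accuracy of a linear probe predicting the protected attribute."""
    probe = LogisticRegression(max_iter=1000)
    scores = cross_val_score(probe, latent_features, protected_labels, cv=5)
    return scores.mean()

# Toy example: random features should be near chance-level (~0.5) separability,
# which is roughly what a fairness-friendly representation aims for.
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 64))       # hypothetical latent features from some model
a = rng.integers(0, 2, 500)          # hypothetical binary protected attribute
print(protected_attribute_separability(Z, a))
```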
Finally, it’s worth noting that fairness in AI is not solely a technical issue but a socio-technical one. That means resolving it often requires combining technical measures with human and policy oversight. For example, simply balancing error rates might not suffice if the overall system is making decisions that lack transparency or violate human rights. Fairness is one aspect of what the AI ethics literature calls “trustworthy AI,” which also includes transparency, accountability, robustness, etc.[42][43]. Rawls’s first principle – equal basic liberties – reminds us that fairness cannot come at the expense of rights like privacy or freedom of expression. So, a fair algorithm shouldn’t, for instance, violate privacy to achieve statistical parity. Our framework thus situates fairness within a broader ethical context: an AI should first and foremost not transgress fundamental rights (a constraint aligning with Rawls’s priority of liberty), and within that space, it should be designed to promote equitable outcomes.
In summary, the problem backdrop is this: AI systems can encode and perpetuate social injustices unless explicitly guided otherwise. Multiple definitions of “fairness” exist, reflecting different value choices (equality of outcome vs equality of opportunity, etc.). Completely eliminating bias is often technically challenging and may involve trade-offs with accuracy or among fairness measures. The emergent consensus in research and policy is that some biases must be managed – hence the flurry of work in algorithmic fairness and the inclusion of fairness in frameworks like NIST’s and the EU’s. This is the arena in which we propose a Rawlsian-NAS solution: by clearly defining the kind of fairness we seek (inspired by Rawls’s impartial justice) and using NAS to find models that best achieve it, we aim to navigate the technical trade-offs in a principled way.
Neural Architecture Search (NAS): Automating Ethical Model Design
To appreciate how we can implement ethical principles in AI design, one must understand the tool we plan to use: Neural Architecture Search. NAS is a subfield of automated machine learning (AutoML) that focuses on discovering the optimal neural network architecture for a given task, without human trial-and-error in designing the model. In essence, NAS is like having a robot scientist that tries out myriad model designs and learns which one works best. Traditionally, machine learning experts would spend considerable effort hand-crafting neural network architectures (deciding the number of layers, how they connect, what type of layers – convolutional, recurrent, transformers, etc.). NAS automates this process, treating model design itself as a learning problem. A NAS method is conventionally described in terms of three components – a search space, a search strategy, and a performance estimation strategy – each outlined below.
Search Space: This defines what kinds of architectures can be explored. It could be all possible combinations of certain layer types up to a depth, or even more flexible representations (like a directed acyclic graph of operations). For example, a search space might allow different convolution filter sizes, various numbers of neurons, skip-connections, etc., covering millions (or more) possible networks. A larger search space increases the chance of finding a good architecture but is harder to search comprehensively[46].
Search Strategy: This is the algorithm that navigates the search space to propose new architectures to try. Common strategies include reinforcement learning (where a controller RNN generates architectures and is rewarded for good performance, gradually learning to generate better ones)[47], evolutionary algorithms (starting with a population of random architectures and iteratively applying “mutation” and “crossover” to combine them, keeping the better “survivors” each generation)[48], and gradient-based methods (relaxing the architecture choices into a continuous optimization problem and using gradient descent to find an optimal mix). There are also random or grid search approaches, but those are often inefficient given the astronomically large design spaces.
Performance Estimation Strategy: For each candidate architecture, the NAS must estimate its performance on the task – typically by training the model on data and evaluating on a validation set. This can be very expensive (training each model from scratch). So methods like weight sharing (train a single over-parameterized network that can simulate many architectures) or using a smaller proxy dataset/training time are employed to speed this up[49]. Advances here have made NAS much more practical than in its early days; once it might take thousands of GPU-hours, now more efficient NAS can run in hours or days on modest hardware.
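To make the three components tangible, here is a deliberately toy sketch in which each one appears as a named piece of a search loop. The search space, the random-sampling strategy, and the stand-in performance estimator are all illustrative assumptions, not a real NAS system.

```python
# Sketch: the three NAS components wired into a toy random-search loop.
import random

SEARCH_SPACE = {                      # 1. Search space: choices per architectural knob
    "num_layers": [2, 4, 8],
    "width": [64, 128, 256],
    "use_skip_connections": [True, False],
}

def sample_architecture():            # 2. Search strategy: here, plain random sampling
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def estimate_performance(arch):       # 3. Performance estimation: stand-in proxy score
    # In practice: train the candidate (or slice a weight-sharing supernet)
    # and evaluate on held-out validation data.
    return random.random()

best_arch, best_score = None, float("-inf")
for _ in range(20):                   # search budget
    arch = sample_architecture()
    score = estimate_performance(arch)
    if score > best_score:
        best_arch, best_score = arch, score
print(best_arch, best_score)
```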
The outcome of NAS is an architecture often on par with or superior to human-designed ones for the task at hand. For instance, NAS was famously used by Google to discover novel convolutional network architectures (like NASNet, AmoebaNet) that achieved state-of-the-art image classification accuracy. A recent trend is Multi-Objective NAS, where the search is not just for accuracy but also other metrics like model size (for efficiency), latency (for speed), or energy consumption. For example, one might search for the smallest network that still achieves at least 99% of some reference accuracy.
Our interest lies in making fairness or ethical criteria an objective in NAS. Conceptually, this is straightforward: one can encode a fairness metric (say, negative disparity between groups) into the reward function or selection criteria. For example, a multi-objective NAS might have two goals: maximize accuracy and minimize the difference in false positive rates between protected groups. The NAS would then treat an architecture as “better” if it achieves a good balance of these, potentially using Pareto-optimality: an architecture is Pareto-dominated if another architecture is no worse on all objectives and strictly better on at least one[50]. The goal is to find the set of Pareto-optimal architectures – those where you cannot improve fairness without reducing accuracy or vice versa. Among these, a human or policy preference could then pick the desired trade-off (for instance, the most fair model among those that meet a minimum accuracy).
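The Pareto logic described above can be made concrete with a short sketch: given hypothetical candidate architectures scored on accuracy (higher is better) and a fairness gap (lower is better), we keep only those not dominated by any other candidate. A human or policy preference would then select one point on the resulting frontier.

```python
# Sketch: extracting the Pareto-optimal set from scored candidate architectures.
def pareto_front(candidates):
    """candidates: list of dicts with 'accuracy' (maximize) and 'fairness_gap' (minimize)."""
    front = []
    for c in candidates:
        dominated = any(
            other["accuracy"] >= c["accuracy"]
            and other["fairness_gap"] <= c["fairness_gap"]
            and (other["accuracy"] > c["accuracy"]
                 or other["fairness_gap"] < c["fairness_gap"])
            for other in candidates
        )
        if not dominated:
            front.append(c)
    return front

candidates = [
    {"name": "arch1", "accuracy": 0.91, "fairness_gap": 0.08},
    {"name": "arch2", "accuracy": 0.89, "fairness_gap": 0.02},
    {"name": "arch3", "accuracy": 0.88, "fairness_gap": 0.05},  # dominated by arch2
]
print([c["name"] for c in pareto_front(candidates)])            # ['arch1', 'arch2']
```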
As referenced earlier, FairNAS (fairness-aware NAS) approaches have already been attempted in research[51]. A notable study reported that a joint NAS + Hyper-Parameter Optimization (HPO) search “indeed discovers simultaneously accurate and fair architectures” for face recognition[52]. These architectures, once found, were tested against standard bias mitigation techniques (like forcing the embedding to be similarity invariant across demographics) and were found to be superior in the accuracy-fairness trade-off[14]. Intriguingly, when these NAS-discovered architectures were further combined with traditional mitigation (e.g., adding a debiasing loss), the results improved even more[53] – implying that architecture search and other bias techniques are complementary and can be layered for greater effect.
The use of NAS for ethical aims is not limited to fairness. One could imagine NAS optimizing for transparency (e.g., searching for architectures that are more interpretable or amenable to explanation) or for robustness (to avoid disparate impact under adversarial conditions). In fact, a 2024 white paper by Hilborn suggests using NAS to promote multiple ethical dimensions in Large Language Models (LLMs) – it explores NAS for fair architectures, automated bias detection modules, and transparency optimization in models like GPT-3[54]. The idea is that NAS might design networks that inherently include sub-networks for monitoring bias or have structures that make the model’s decisions easier to interpret. While this is still speculative, it shows the broad potential of NAS as a tool for encoding ethical desiderata into AI, beyond just raw performance.
It is important to clarify that NAS is not magic – it will find correlations and structures based on the data and objectives we give it. If we misspecify the objective or provide biased data, NAS could still yield biased models (just as any ML could). The advantage NAS offers is flexibility and discovery: it might find non-intuitive architectures that a human designer wedded to conventional designs wouldn’t consider, yet that achieve a better fairness-performance mix. NAS essentially expands the solution space in the quest for fair AI. It may find, for example, that adding an auxiliary branch in the network that predicts a protected attribute and then subtracts its influence (an idea akin to adversarial debiasing) is beneficial – effectively re-discovering a known strategy, or perhaps an entirely new architectural motif that reduces bias. From a policy perspective, the use of NAS could also help rebut the notion that fairness always entails a huge cost to accuracy: by exhaustively searching, we can identify if there are “free lunches” – architectures that improve fairness with minimal accuracy penalty. The existence of such Pareto improvements, as demonstrated in some cases[14], is encouraging.
To ground this discussion, consider a simple mathematical framing of a NAS objective we might use: Suppose our task is binary classification (e.g., approving or denying a loan) and we have two demographic groups A and B in the data. We could define a utility function for NAS as:
$$ U(\text{architecture}) = \text{Accuracy}(\text{arch}) \;-\; \lambda \cdot |\text{FPR}_A(\text{arch}) - \text{FPR}_B(\text{arch})|, $$
where $|\text{FPR}_A - \text{FPR}_B|$ is the absolute difference in false positive rates between group A and B for the model of that architecture, and $\lambda$ is a tunable weight determining how strongly we care about fairness relative to accuracy. NAS will then seek to maximize $U$. A large $\lambda$ enforces near-equal FPRs (difference principle flavor) potentially at some accuracy cost; a small $\lambda$ prioritizes accuracy with only slight preference for fairness. By adjusting $\lambda$ or scanning across values, we could produce a family of architectures from which to choose. This is a simplified example – real setups might include multiple fairness metrics or constraints (e.g., ensure equal opportunity and also that neither group’s acceptance rate falls below a threshold). In multi-objective NAS, one often doesn’t combine them into one scalar but instead identifies the Pareto frontier of trade-offs. But conceptually, one can see how an ethical requirement becomes just another term in an optimization objective. What makes this powerful in NAS is that the search is over architectures and implicitly over learned parameters, so it might find an architecture that naturally balances these terms well for the given data distribution.
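As a sketch of how the scalarized objective above might be used (the candidate statistics are invented for illustration), the following ranks the same pool of candidates under different values of $\lambda$: a small weight favors the more accurate architecture, while a large one favors the architecture with the smaller false-positive-rate gap.

```python
# Sketch: U(arch) = Accuracy(arch) - lambda * |FPR_A(arch) - FPR_B(arch)|,
# evaluated on hypothetical validation statistics for two candidate architectures.
def rawls_aware_utility(accuracy, fpr_group_a, fpr_group_b, lam=1.0):
    return accuracy - lam * abs(fpr_group_a - fpr_group_b)

candidates = {
    "arch1": {"accuracy": 0.91, "fpr_a": 0.12, "fpr_b": 0.30},   # accurate but uneven errors
    "arch2": {"accuracy": 0.88, "fpr_a": 0.15, "fpr_b": 0.17},   # slightly less accurate, fairer
}

for lam in (0.1, 1.0, 5.0):
    best = max(candidates, key=lambda name: rawls_aware_utility(
        candidates[name]["accuracy"],
        candidates[name]["fpr_a"],
        candidates[name]["fpr_b"],
        lam))
    print(f"lambda={lam}: preferred architecture = {best}")
```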
One might wonder: could we not just train a single architecture with a fairness-regularized objective (like the one above) and get a fair model? Why bother searching architectures too? Indeed, training with fairness constraints (a form of in-processing) is a known approach. The difference is, a fixed architecture might have inherent limitations in achieving the best trade-off. NAS can adjust the model capacity, representation, and processing in tandem with those constraints. It may allocate more resources (neurons) to harder-to-fit populations, or find that a certain layer type alleviates bias. It’s like giving the learning algorithm more degrees of freedom to solve a constrained problem. To use an analogy: if we only adjust the recipe (training procedure) but keep the pan the same (architecture), we might be limiting ourselves; NAS allows changing the pan shape too, perhaps yielding a more evenly cooked outcome.
State of the art and feasibility: By 2025, NAS is becoming more accessible. There are open-source NAS frameworks and even cloud AutoML services that perform architecture search. Ensuring that NAS itself doesn’t become a “black box” is important – but researchers have worked on interpretable NAS and simplifying searched models for understanding. The computational cost, while still significant for very large tasks, has dropped with techniques like one-shot NAS (which trains a single “supernet” that encompasses all architectures and can estimate any sub-network’s performance quickly). This means incorporating fairness objectives is quite feasible; one just needs a way to measure fairness during search, which entails computing metrics on validation data for each candidate model. That adds overhead, but not a prohibitive amount – essentially, one evaluates not just accuracy but also group statistics for each candidate. With modern parallel computing, NAS can evaluate many models simultaneously, and computing fairness metrics is trivial compared to a full model training cycle.
In sum, NAS offers a powerful method to search the solution space of AI models for ones that meet multi-faceted criteria. It shifts some burden from human intuition to computational exploration. Our thesis is that this paradigm is exactly what we need to integrate complex ethical principles like Rawls’s into AI: we articulate the principles as measurable criteria and let NAS discover the best ways to fulfill them under the hood. This approach stands in contrast to ad-hoc fixes or overly simplistic rules (like “just remove sensitive attributes”), striving instead for an optimized, principle-guided design.
Having established both the philosophical underpinnings of fairness (impartial justice) and the technical means (NAS) at our disposal, we can now delve into the core argument: how combining NAS with Rawlsian theory ensures AI models act ethically toward human beings, with actions that would be considered ethical by any objectively situated observer.
Argument
Rawlsian Justice as a Design Specification for AI
We begin by translating Rawls’s abstract principles into concrete design goals for AI behavior. The question is: What would it mean for an AI system’s decisions to satisfy Rawls’s principles of justice? If we can answer that, we have the ethical target that our NAS procedure should aim for.
1. Equal Basic Liberties: Rawls’s first principle mandates that each person enjoy equal fundamental rights and liberties, such as freedom of speech, thought, property, and due process, consistent with everyone else having the same. In the AI context, this principle translates to a constraint: AI systems should not infringe on or discriminate in ways that violate individuals’ basic rights. For example, an AI content moderation system must respect freedom of expression and not silence voices arbitrarily or based on viewpoint discrimination (within the bounds of law). A criminal justice AI must respect due process – meaning its risk scores or recommendations shouldn’t effectively deny someone their liberty without a fair and transparent rationale. In practical design terms, this could mean that certain kinds of decisions are off-limits to automate (e.g., decisions that would violate privacy or bodily autonomy) or that any automated decision affecting such rights must have robust human oversight and appeal mechanisms. It also implies non-discrimination: traits like race, ethnicity, religion, gender, etc., which are tied to fundamental liberty and equality, should not be the basis for disadvantageous treatment by the AI. This principle is partly why fairness in AI is demanded – discriminatory AI can impair people’s ability to participate in society equally (consider job or credit denial systematically against a group – it curtails liberty in a broad sense). For NAS, ensuring this principle might involve hard constraints: for instance, forbidding the use of certain features (like race explicitly), or requiring that the model’s decisions pass tests for disparate impact within a tolerance. In NAS terms, we could set a constraint that any candidate architecture that yields more than X% disparity in, say, loan approval rates between groups is disqualified (not further optimized). By pruning the search space of architectures that can’t achieve near-equal treatment, we enforce a baseline of respect for equal rights. This echoes the idea of algorithmic impact assessments some laws propose – ensuring AI doesn’t violate rights. Rawls gave lexical priority to this principle (it comes before considerations of socio-economic inequality), suggesting our design must satisfy this first before tuning other aspects like performance or utility. Concretely, a hiring AI, no matter how efficient, would be unacceptable if it systematically violated civil rights laws by excluding qualified candidates of a certain race; our NAS search would treat such bias as a showstopper.
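A minimal sketch of the "showstopper" constraint just described follows, under assumed per-group approval rates and an illustrative 5-percentage-point tolerance: candidate architectures violating the constraint are removed from the search pool before any accuracy-driven optimization.

```python
# Sketch: Principle 1 as a hard disqualification filter on candidate architectures.
MAX_APPROVAL_RATE_DISPARITY = 0.05   # illustrative tolerance: 5 percentage points

def satisfies_basic_liberties_constraint(group_approval_rates):
    """group_approval_rates: dict mapping group name -> approval rate on validation data."""
    rates = list(group_approval_rates.values())
    return (max(rates) - min(rates)) <= MAX_APPROVAL_RATE_DISPARITY

candidates = {
    "arch1": {"A": 0.72, "B": 0.51},   # 21-point gap -> disqualified
    "arch2": {"A": 0.66, "B": 0.63},   # 3-point gap  -> kept
}
eligible = {name for name, rates in candidates.items()
            if satisfies_basic_liberties_constraint(rates)}
print(eligible)   # {'arch2'}
```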
2. Fair Equality of Opportunity: The first part of Rawls’s second principle says that offices and positions should be open to all under conditions of fair opportunity. In algorithmic terms, AI systems that allocate opportunities (jobs, education admissions, loans, etc.) should strive to give individuals an equal chance to be selected based on merit or relevant criteria, not skewed by arbitrary factors or unequal access. Fair equality of opportunity is stronger than formal equality (which would just say “no explicit discrimination”); it implies that efforts may be needed to counteract historical or systemic disadvantages. For instance, if an academic admissions algorithm only looks at SAT scores, but one group had worse access to test prep and thus lower scores, the outcome could be formally unbiased (it didn’t consider race) but still perpetuate inequality. Fair equality might require considering the context – maybe using a score that’s adjusted for socioeconomic background or providing alternate evaluation methods. In AI design, this could lead to approaches like calibrating scores for fairness or including fairness-aware features. For NAS, one approach is to incorporate an “equality of opportunity” metric – e.g., the true positive rate (TPR) for qualified individuals in each group should be equal[55]. If our AI is picking people for a promotion, fairness demands that among those truly qualified, the algorithm picks candidates at equal rates across demographics. In classification, this is sometimes called equal opportunity fairness (requiring $\text{TPR}_A = \text{TPR}_B$)[55]. We can embed this into the NAS objective or constraints. For example, if group A’s qualified-selection rate is 80% and group B’s is 60%, that’s a violation – NAS will be driven to find an architecture that narrows that gap (perhaps by better feature learning that identifies talent in group B that the initial model missed due to bias patterns). Another concept here is talent-neutrality: the AI’s ideal should be to evaluate individuals on their true qualifications without being clouded by stereotypes or spurious correlations. Rawls also spoke of preventing “careers open to talents” from being merely formal – society should genuinely enable all to compete. For AI, this could extend beyond the algorithm to the data: ensuring the training data is representative and not over-penalizing those from disadvantaged backgrounds. While NAS can’t change the data, it can encourage architectures that are robust to data biases (for instance, models that rely less on one problematic feature). In summary, fair opportunity in AI means designing systems that identify merit fairly and do not amplify existing inequities – a criterion we enforce through fairness metrics related to error rates and selections.
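An equal-opportunity term for the search objective could be as simple as the largest gap in true positive rates across groups, measured on validation data for each candidate; the sketch below uses hypothetical group rates and a hypothetical penalty weight.

```python
# Sketch: equal-opportunity penalty as the worst TPR gap across groups.
def equal_opportunity_penalty(tpr_by_group):
    """Largest pairwise gap in true positive rates across groups."""
    rates = list(tpr_by_group.values())
    return max(rates) - min(rates)

# Example: qualified members of group B are selected less often than those of group A.
print(equal_opportunity_penalty({"A": 0.80, "B": 0.60}))   # ~0.20
# A candidate's NAS reward could then be: accuracy - mu * equal_opportunity_penalty(...)
```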
3. Difference Principle (Maximin Fairness): The second part of Rawls’s second principle, the difference principle, is perhaps the most distinctive. It permits inequalities only if they benefit the least advantaged. How would an AI’s decisions satisfy this? It suggests a form of maximin fairness in outcomes: we should evaluate how the worst-off group fares under the AI, and prefer decision rules that improve their situation. In algorithmic fairness literature, this aligns with the idea of minimizing the disadvantage of the worst-affected group (sometimes called Rawlsian fairness). One way to implement this is through a welfare function: assign a utility value to outcomes for individuals, perhaps related to well-being or benefit, and then maximize the minimum expected utility across groups. For instance, consider an AI allocating healthcare resources. A utilitarian AI might allocate where it gets the biggest health improvement per dollar (which might favor already healthy or wealthy populations with better adherence); a Rawlsian AI would ask, which allocation improves the health of the sickest or most deprived group as much as possible[56]. As a metric, one might use something like the lowest group accuracy or benefit and maximize that. In classification terms, maybe ensure that the false negative rate (missed opportunities) for the most disadvantaged group is minimized above all – even if it means some loss of accuracy elsewhere. A concrete NAS objective could be: maximize overall accuracy subject to the constraint that the least advantaged group’s accuracy is above some threshold or as high as possible. Identifying “least advantaged” in data can be tricky – it could be a demographic or intersectional group known to have faced historical injustice (e.g., minority women in tech hiring). Or if not pre-defined, one could check which group (by some sensitive attribute) has the worst error rate or outcome rate and then try to improve that. This is a more normative choice: it encodes a priority to help those who are currently doing worst under the model. For example, if an AI loan model has approval rates of 80% for group X and 50% for group Y, a difference principle approach would prioritize improving group Y’s rate (assuming group Y is genuinely disadvantaged in relevant ways), rather than just equalizing (which could be done by raising Y or lowering X). Notably, this can conflict with simple parity – Rawls doesn’t say cut down the top to level (what philosophers call “leveling down”); he says raise the bottom. In algorithmic terms, if making things fair means reducing opportunities for a better-off group with no benefit to the worse-off, Rawls would reject that as purely leveling down (which yields no gain in justice)[57]. Instead, one should seek architectures or decision rules that improve the outcomes of the lagging group without unduly harming others, possibly accepting some inequality if it’s the price of that improvement. A practical implication: sometimes strict equalization might lower overall accuracy which could hurt everyone, including the worst-off (e.g., if an algorithm becomes so constrained it’s less accurate in detecting true positives for the disadvantaged group itself). A Rawlsian perspective might prefer a slightly unequal but higher-benefit scenario for the disadvantaged. This nuance can be encoded by carefully choosing the objective (maximin of group utilities rather than absolute parity).
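The maximin idea can be expressed directly in code: compare candidates by the utility of their worst-off group rather than by their average. In this illustrative sketch (per-group utilities are invented), the candidate with the higher average loses to the one under which the least-advantaged group fares better.

```python
# Sketch: maximin selection across candidate architectures.
def worst_group_utility(utilities_by_group):
    """Utility of the worst-off group under a candidate (e.g., its true positive rate)."""
    return min(utilities_by_group.values())

candidates = {
    "arch1": {"X": 0.90, "Y": 0.55},   # higher average, but group Y fares worse
    "arch2": {"X": 0.82, "Y": 0.68},   # lower average, better for the worst-off group
}
maximin_choice = max(candidates, key=lambda name: worst_group_utility(candidates[name]))
print(maximin_choice)   # arch2
```

Note that because the comparison looks only at the minimum, this rule never rewards lowering a better-off group's utility for its own sake; it favors a change only when the worst-off group actually gains, which is the leveling-down distinction drawn above.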
In sum, Rawls’s two principles give us a hierarchy of design goals: First, the AI must respect fundamental rights and not create outcomes that anyone would experience as an intolerable violation of their basic liberties or equal citizenship. Second, it should assure fair access to opportunities and resources, correcting for historical bias to the extent possible (equal opportunity). And third, it should focus on improving conditions for those who are worst off, ensuring that any gain in performance doesn’t come at their expense but rather accrues to their benefit if possible.
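Before formalizing these goals, a toy comparison may help make the maximin logic concrete. The following minimal sketch – with purely illustrative group names and utility numbers, not real data – contrasts a utilitarian average, strict parity, and a Rawlsian maximin choice among three hypothetical candidate models:

```python
# Hypothetical sketch: comparing candidate decision rules by Rawlsian maximin
# over group utilities versus a utilitarian average. All numbers are assumptions.

candidates = {
    # model_name: {group: expected utility (e.g., approval rate for qualified members)}
    "unconstrained": {"group_X": 0.90, "group_Y": 0.50},
    "strict_parity": {"group_X": 0.62, "group_Y": 0.62},
    "maximin_tuned": {"group_X": 0.72, "group_Y": 0.66},
}

def utilitarian_score(groups):
    # average utility across groups (ignores who is worst off)
    return sum(groups.values()) / len(groups)

def rawlsian_score(groups):
    # difference-principle proxy: the utility of the worst-off group
    return min(groups.values())

best_utilitarian = max(candidates, key=lambda m: utilitarian_score(candidates[m]))
best_rawlsian = max(candidates, key=lambda m: rawlsian_score(candidates[m]))

print(best_utilitarian)  # "unconstrained": highest average, but worst for group_Y
print(best_rawlsian)     # "maximin_tuned": lifts the worst-off without leveling down
```

Note that the maximin choice tolerates some residual inequality (0.72 versus 0.66) because it leaves the worst-off group better off than forced parity would – the “no leveling down” point made above.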
Now, how do we operationalize these as objective functions or constraints for NAS?
We have touched on this already; let’s formalize an approach:
- Define the sensitive attributes relevant to fairness (race, gender, etc., depending on context – or multiple attributes for intersectional groups).
- For each attribute group (or combination), measure the key performance metrics (accuracy, false positive rate, false negative rate, etc., depending on what fairness means in context).
- Impose constraints for Principle 1: for example, the model cannot use certain features (a blunt “no direct use of the protected attribute” rule is common practice but not sufficient alone), and the model’s decisions for any individual must pass a disparate treatment test (individuals who are similar in relevant qualifications should receive similar outcomes regardless of protected class). The latter is difficult to enforce strictly, but it can be approximated by requiring parity in error rates as a necessary condition. Also ensure transparency and human-in-the-loop review for rights-critical decisions as a design choice (NAS does not directly handle human oversight; that is a structural and procedural guarantee rather than an architectural one).
- Add fairness objectives for Principle 2: equal opportunity can be targeted by minimizing differences in TPR or FNR between groups. Suppose we pick equalized TPR (so qualified applicants are selected at equal rates). We can define a term in the NAS reward such as $- \sum_{g} \left| \text{TPR}_g - \overline{\text{TPR}} \right|$, which penalizes each group’s deviation from the mean true positive rate, or more directly $- \max_{g,g'} \left| \text{TPR}_g - \text{TPR}_{g'} \right|$, which penalizes the worst gap. This pushes the search toward architectures that treat groups similarly in finding positives.
- Add maximin objectives for Principle 3: define some measure of utility for individuals or groups. In classification tasks, utility might be correlated with true positives (benefits received) minus false positives (harms such as being wrongly flagged). We could focus on, say, the true positive rate of the most disadvantaged group and maximize that – or, in lending, the approval rate for a group with historically low access to credit (assuming sound lending is not compromised too much). Mathematically, a term like $\min_{g \in G} \left( \text{TPR}_g - \alpha \cdot \text{FPR}_g \right)$ could be maximized (with a weight $\alpha$ reflecting how harmful false positives are), which effectively lifts the weakest-performing group’s balance of errors. Alternatively, if groups have different base rates, one might incorporate the base rate into the utility – but that raises the philosophical question of whether we care about outcomes proportional to need or simple parity. Rawls’s difference principle does not require equal outcomes, only that whatever inequality exists helps the least advantaged. So one might allow a model that gives group A an 80% approval rate and group B 70% if (a) group B is better off than under any other arrangement (perhaps other models gave them 60%), and (b) lowering A to 70% would not help B (leveling down). This is a scenario where strict parity is relaxed to benefit B more. Such scenarios are complex to evaluate automatically; it may involve simulating different decision thresholds or policies and checking which yields an improvement specifically for B without too much cost to A. A minimal code sketch of such a scalarized reward follows below.
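To make the scalarized objective concrete, here is a minimal sketch, assuming per-group confusion-matrix counts have already been computed on a validation set. The group keys, the $\lambda$ weights, and the $\alpha$ trade-off are illustrative choices rather than prescriptions; a NAS controller would call something like `rawlsian_reward` once per candidate architecture.

```python
import numpy as np

def group_rates(counts):
    """counts: dict group -> {"TP", "FN", "FP", "TN"}. Returns per-group TPR/FPR."""
    rates = {}
    for g, c in counts.items():
        tpr = c["TP"] / max(c["TP"] + c["FN"], 1)
        fpr = c["FP"] / max(c["FP"] + c["TN"], 1)
        rates[g] = {"TPR": tpr, "FPR": fpr}
    return rates

def rawlsian_reward(accuracy, counts, lam_gap=1.0, lam_worst=1.0, alpha=0.5):
    rates = group_rates(counts)
    tprs = np.array([r["TPR"] for r in rates.values()])
    fprs = np.array([r["FPR"] for r in rates.values()])
    # Principle 2 proxy: penalize each group's deviation from the mean TPR
    opportunity_gap = np.abs(tprs - tprs.mean()).sum()
    # Principle 3 proxy: reward the worst-off group's error balance (maximin term)
    worst_group_utility = np.min(tprs - alpha * fprs)
    return accuracy - lam_gap * opportunity_gap + lam_worst * worst_group_utility

# Example call with made-up validation counts for two groups
counts = {
    "A": {"TP": 80, "FN": 20, "FP": 10, "TN": 90},
    "B": {"TP": 55, "FN": 45, "FP": 12, "TN": 88},
}
print(rawlsian_reward(accuracy=0.86, counts=counts))
```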
An important practical note: to apply Rawls’s difference principle, we need to define who is “least advantaged” in the context of the algorithm. That could be a specific demographic group known to face bias (e.g., a racial minority, or perhaps the intersection of race and gender). Or it could be dynamically determined: e.g., during training, identify which group is getting the worst outcomes and treat that as target for improvement. The latter might get dicey if many fine-grained groups exist – one might accidentally harm another group. Typically, one selects a few salient categories in advance (like protected classes recognized in law).
Another angle is individual fairness versus group fairness. Everything above has been group-oriented, as Rawls’s theory largely is when applied here: although the original position concerns individuals behind the veil, in practice we observe a model’s effects by group. Individual fairness is the idea that similar individuals (in qualifications) should get similar outcomes. Rawls’s original position yields group-agnostic rules, which usually translate into group fairness constraints, but individual fairness can be complementary: one can adopt a similarity metric and require that the model’s outputs not deviate too far from it. For NAS, implementing individual fairness might mean adding a regularization term that penalizes output differences for inputs that are similar in non-sensitive attributes. This could help catch arbitrary disparities within the same group. However, defining the similarity metric is itself a deep problem and is often domain-specific.
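As a rough illustration of that regularization idea – a sketch under assumed inputs, not a settled method – one could penalize prediction gaps between validation pairs that are close in their non-sensitive features:

```python
import numpy as np

def individual_fairness_penalty(X_nonsensitive, scores, threshold=0.5, L=1.0):
    """For pairs of inputs whose non-sensitive features are within `threshold`,
    penalize score differences exceeding a Lipschitz-style bound L * distance.
    O(n^2) over the validation sample; threshold and L are assumed hyperparameters."""
    n = len(scores)
    penalty = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dist = np.linalg.norm(X_nonsensitive[i] - X_nonsensitive[j])
            if dist < threshold:
                # similar individuals should receive similar scores
                penalty += max(0.0, abs(scores[i] - scores[j]) - L * dist)
    return penalty
```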
From Theory to Implementation: Let’s ground this with a hypothetical application example to illustrate how Rawlsian design would differ from a naive design:
Imagine a Hiring AI that screens candidates for a tech job. It uses an archive of past hiring data (resumes and outcomes) to learn what features predict “hireable.” Past bias: perhaps women and minority candidates were hired at lower rates due to bias, and maybe there is a pattern like candidates with certain names or backgrounds got fewer callbacks regardless of qualification. A standard ML model might pick up subtle correlations (like an applicant having participated in a “Women in Computing” club might correlate with fewer hires historically, and thus the model might score such resumes lower – a discriminatory outcome). How do we Rawls-ify this?
Equal basic liberties: No candidate should be summarily rejected due to a protected characteristic. So we instruct NAS that any architecture that directly encodes gender or race in its decision logic is invalid. More subtly, we might impose a constraint that if you swap the gender of a candidate in their resume text (names and pronouns changed) but keep the qualifications the same, the model’s output shouldn’t change drastically – a kind of counterfactual fairness test during evaluation (sketched below). Ensuring that might require adversarial training or special layers. NAS could be tasked with including a sub-network that predicts gender from the input and an adversarial component that minimizes that information in the main prediction (one known approach to enforcing invariance[41]). We’d include this in the search: some architectures might have an “adversarial debias” branch and some not, and NAS can choose the one that best balances fairness and accuracy.
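The swap test itself can be sketched in a few lines. Here `score_resume` stands in for a hypothetical black-box scoring function, and the token swap map is deliberately tiny and incomplete – a real audit would use a much richer substitution list.

```python
# Minimal counterfactual swap test for a hypothetical resume-scoring model.
SWAP = {"he": "she", "she": "he", "his": "her", "her": "his",
        "mr.": "ms.", "ms.": "mr."}

def gender_swap(text):
    """Swap a small set of gendered tokens, leaving qualifications untouched."""
    return " ".join(SWAP.get(tok, tok) for tok in text.lower().split())

def counterfactual_gap(score_resume, resumes, tolerance=0.05):
    """Return resumes whose score shifts more than `tolerance` when gendered
    tokens are swapped but everything else is held fixed."""
    flagged = []
    for text in resumes:
        base = text.lower()
        gap = abs(score_resume(base) - score_resume(gender_swap(base)))
        if gap > tolerance:
            flagged.append((text, gap))
    return flagged
```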
Fair opportunity: We want the model to recognize talent equally. So we measure TPR: among truly qualified candidates (which we might approximate by, say, those who later received high performance reviews in the data), what fraction did the model select, broken out by gender or race? We aim for parity. If women were historically overlooked despite equal or better qualifications, the training data may be skewed (fewer women labeled “hired” not because they lacked skill but because of bias), and an unconstrained model could simply reproduce that pattern. Under Rawlsian training, we add a penalty if $\text{TPR}_{\text{women}} < \text{TPR}_{\text{men}}$. NAS might find that certain features or layers help fix this (for example, placing more weight on objective skill indicators rather than on patterns correlated with gender). Perhaps it finds that a recurrent layer reading through resume text and capturing actual skill keywords outperforms a simpler bag-of-words model that was inadvertently picking up gendered language cues. Essentially, the architecture that better extracts relevant information and ignores irrelevant differences wins out. We might also include calibration – ensuring a score means the same thing across groups (so a candidate with score 0.8 is truly qualified 80% of the time regardless of group). NAS might discover an architecture with a group-specific calibration layer (such as separate output biases per group) that adjusts for uneven base rates while keeping the main model shared – a technique sometimes used to equalize performance; a small sketch of such a layer follows below.
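One way such a group-specific calibration layer could look – purely as a sketch of a searchable building block, with the backbone size, group count, and the `use_group_bias` switch as assumptions (and noting that explicit group-conditioning raises legal questions taken up with the counterarguments later) – is a shared network plus a learned per-group offset on the logit:

```python
import torch
import torch.nn as nn

class GroupCalibratedScorer(nn.Module):
    """Shared backbone plus an optional learned per-group bias on the output logit."""
    def __init__(self, in_features=64, num_groups=2, use_group_bias=True):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(in_features, 32), nn.ReLU(), nn.Linear(32, 1)
        )
        self.use_group_bias = use_group_bias
        if use_group_bias:
            # one scalar offset per group, learned jointly with the backbone
            self.group_bias = nn.Embedding(num_groups, 1)

    def forward(self, x, group_idx):
        logit = self.backbone(x)
        if self.use_group_bias:
            logit = logit + self.group_bias(group_idx)
        return torch.sigmoid(logit).squeeze(-1)
```

In a NAS run, `use_group_bias` would simply be one more binary choice in the search space, kept only if it improves the fairness-aware reward.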
Difference principle: Say in our context the least advantaged group is Black female candidates (who had the lowest hire rate historically). Rawls would say the model should be chosen to improve their employment prospects as much as possible (without violating the above principles). This might mean our objective specifically tries to maximize accuracy for Black women or at least ensure they are not the ones with highest error. If the data is very imbalanced, NAS might allocate more capacity (neurons or specialized features) to handle that group’s resumes effectively (like ensuring the model doesn’t undersample patterns common in their resumes). If needed, it might even learn a positive bias – maybe the threshold for an interview is slightly lowered for this group if that yields more hires of qualified individuals among them (this resembles affirmative action in algorithmic form, which is controversial legally, but Rawlsian ethics might endorse it if it’s to compensate for inequalities in education or opportunity). A concrete approach: include a term maximizing the recall (true positive rate) for Black female candidates specifically. If the model can raise that without too much cost to overall performance, it will. NAS could find architectures that cater to features prevalent in that group (for example, an architecture might have a component that recognizes alternate forms of experience or education that typical models ignore but are more common in historically marginalized groups’ resumes).
It is crucial to stress that applying Rawls does not mean we simply impose quotas or superficial parity. It means we design the decision process to be justifiable to all – including those who might be on the losing end of a decision. If an AI rejects someone’s loan application, can we say that this rule would be accepted by that person if roles were randomly assigned behind a veil? If the rule was “we reject applicants below credit score X, regardless of who they are,” that might be acceptable if credit score is a fair measure everyone had equal chance to build. But if the rule was “we implicitly favor those from certain neighborhoods because our data had bias,” that would not be acceptable behind the veil. Rawlsian design pushes us to formulate rules that we could defend in front of anyone who is affected, as if we didn’t know who in that audience we or they are.
Objectivity by Anyone Impacted: Let’s connect directly to the standard posed at the outset: an AI’s actions would be considered ethical if, viewed objectively by anyone impacted by the act, they would be judged fair. That is essentially Rawls’s original position reasoning – any stakeholder, not knowing whether they will be the benefited or harmed party, would see the decision as fair. For our AI, this means no one would say “if I were on the other side of this decision, I’d see it as illegitimate.” For instance, with a fair hiring AI, a selected candidate trusts the process was fair (and doesn’t feel it was tilted in their favor unjustly), and a rejected candidate, while disappointed, can acknowledge that the criteria were job-related and applied equally (rather than suspecting bias or arbitrary exclusion). Achieving that level of legitimacy likely requires both transparency (people know what the criteria are) and fair criteria (the criteria themselves align with ethical principles).
Transparency and explainability are worth a brief note here. A law review audience would appreciate that a black-box AI, even if statistically fair, might not satisfy procedural justice – people care not just about outcomes but that they are treated with respect and can understand decisions. While our focus is Rawls’s distributive justice, one could argue a Rawlsian might also want the “rules of the game” (the algorithm’s logic) to be publicly known or at least knowable, akin to public principles of justice. This resonates with modern AI ethics demands for explainable AI (XAI). NAS can also incorporate transparency as a factor – e.g., preferring simpler architectures or ones that allow attribution (some researchers include a term for model size or layer sparsity to implicitly prefer simpler models[44]). However, simpler might conflict with fairness; sometimes a more complex model can de-bias data better. This again is a trade-off that requires judgment. For this article’s scope, we assume the primary measure of ethical success is fairness of outcomes, but a full deployment should consider explainability too. Later in Policy, we will mention the need for transparency as part of governance.
In conclusion of this sub-section: Rawls’s theory provides a multi-faceted specification for ethical AI behavior. It demands respect for rights (non-discrimination, due process), fairness in opportunity (avoiding systematic advantage for some over others in equivalent positions), and a focus on uplifting those who would otherwise fare worst. These translate into measurable criteria like equality of error rates, constraints against use of protected info, and targeted improvements for disadvantaged groups. By treating these as explicit objectives/constraints, we set the stage for using NAS to fulfill them. The Rawlsian approach distinguishes itself from other ethical frameworks by its commitment to impartial evaluation (the veil of ignorance test) and prioritization of the least advantaged rather than just aggregate welfare. In practice, this likely leads to more aggressive measures to correct bias than, say, a pure utilitarian trade-off would (a utilitarian might accept some disparity if overall accuracy is high, whereas Rawls might sacrifice some overall accuracy to ensure the worst-off group isn’t left behind – a pattern we’ll see in experiments and presumably adopt in design).
Neural Architecture Search Meets Rawls: Building Fairness into the Model
With the ethical target defined, we turn to the engineering strategy: using Neural Architecture Search to meet that target. We argue that NAS is a uniquely well-suited approach to align complex ethical objectives with model training for several reasons: it provides flexibility in finding solutions, it can handle multi-objective optimization naturally, and it reduces human bias in model design by exploring a vast space (which is ironically analogous to removing bias by going behind the veil – the algorithm doesn’t carry the same prejudices, it just follows the objectives). We now detail how the NAS process can be configured to integrate Rawlsian criteria, and why this approach is more powerful than alternative methods (or at least a powerful complement).
Formulating the Multi-Objective Optimization: In a Rawlsian NAS, we have at least two (often conflicting) goals: maximizing task performance (accuracy, utility) and maximizing fairness (as encoded by the measures from Rawls’s principles). We set this up as a multi-objective optimization problem. There are a few ways to do this in practice:
- Use a weighted sum: combine the objectives into one scalar reward, as in the example given earlier. For instance, $Reward = Accuracy - \lambda_1 (\text{Disparity in TPR}) - \lambda_2 (\text{Disparity in FPR})$, where the weights $\lambda$ reflect how much fairness we demand relative to accuracy.
- Use a constraint approach: treat accuracy as the primary objective but add constraints such as $\text{Disparity} < \delta$. This is akin to lexicographic optimization (satisfy fairness to a threshold, then maximize accuracy).
- Use Pareto optimization: run a population-based NAS and seek architectures on the Pareto frontier of (Accuracy, FairnessScore), then let a human or a policy choose a point on that frontier based on the acceptable trade-off.
A Rawlsian approach might lean toward the constraint or lexicographic formulation – recall that Rawls held liberty to be lexically prior to the difference principle; similarly, one might say “ensure no severe unfairness, then maximize accuracy.” For instance, we might require that the model’s false positive and false negative rates differ by at most 5 percentage points between any two groups (a policy choice), and then, within that feasible set, pick the most accurate model. If no model meets the constraint, we might relax it slightly until at least one does (reflecting how strict we can realistically be given the data); a small selection sketch follows below.
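The constraint-then-relax selection can be written out directly over a list of already-evaluated candidates. The candidate tuples, the 5-point cap, and the relaxation step are illustrative assumptions:

```python
candidates = [
    # (name, validation accuracy, worst pairwise TPR/FPR disparity in percentage points)
    ("arch_1", 0.91, 9.0),
    ("arch_2", 0.89, 4.0),
    ("arch_3", 0.88, 2.5),
]

def lexicographic_select(cands, max_disparity=5.0, relax_step=1.0, cap_limit=20.0):
    """Pick the most accurate architecture within the disparity cap, relaxing
    the cap gradually (up to cap_limit) if no candidate qualifies."""
    cap = max_disparity
    while cap <= cap_limit:
        feasible = [c for c in cands if c[2] <= cap]
        if feasible:
            return max(feasible, key=lambda c: c[1]), cap
        cap += relax_step
    return None, cap  # nothing acceptable even after relaxation

best, used_cap = lexicographic_select(candidates)
print(best, used_cap)  # ("arch_2", 0.89, 4.0) selected at the original 5-point cap
```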
NAS can be adapted to either method. Many NAS algorithms are naturally suited to a single reward, so sometimes weighted sum is easier to implement (though one must try multiple weights to map out the trade-off curve). There are also evolutionary multi-objective NAS frameworks (e.g., NSGA-II algorithm can find Pareto-optimal sets of architectures).
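For the Pareto route, the core operation is a non-dominated filter over (accuracy, fairness) pairs; an NSGA-II-style NAS maintains such a set across generations, whereas this sketch simply filters a finished population with made-up scores:

```python
population = {
    "arch_1": (0.91, 0.70),
    "arch_2": (0.89, 0.85),
    "arch_3": (0.88, 0.83),   # dominated by arch_2 (worse on both objectives)
    "arch_4": (0.86, 0.93),
}

def dominates(a, b):
    """a dominates b if a is at least as good on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

pareto_front = {
    name: scores
    for name, scores in population.items()
    if not any(dominates(other, scores)
               for o_name, other in population.items() if o_name != name)
}
print(pareto_front)  # arch_1, arch_2, and arch_4 survive; arch_3 is dominated
```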
Search Space Considerations: We might adjust the search space to include fairness-enhancing building blocks:
- Adversarial branches: a common trick in fair ML is to have the network simultaneously try to predict the target and not predict the protected attribute (via an adversary). We can include a module in the search space – “adversary on attribute X: yes/no” – and NAS can choose to include an adversarial debiasing branch if it helps the multi-objective. If the data is strongly biased, including the adversary will likely improve fairness substantially at some cost to accuracy, and NAS can find the optimal balance.
- Group-specific layers: for example, a layer that processes inputs differently depending on the sensitive group. This might seem counterintuitive to fairness, but think of it as a reasonable accommodation – perhaps an extra calibration for one group’s data. The search might include an optional branch that triggers only for group G’s inputs to adjust the output score (adding a constant, or using a different feature extractor). If the model underfits one group, NAS might allocate more capacity (say, one more layer) to that group. This is analogous to human-designed pipelines that cluster by group and apply different strategies – done carefully so as not to violate fairness, but equitable if it improves that group’s accuracy without harming others.
- Layer types and regularizations known to affect generalization: NAS might choose more dropout (noise during training), which sometimes reduces overfitting to biases, or batch normalization, which can inadvertently cause problems when groups have different feature distributions. A Rawlsian NAS might discover that applying batch norm separately per subgroup yields fairer outcomes (because each group’s feature scaling is handled independently) – something a designer might not consider a priori, but NAS could find if allowed.
- Feature selection within the architecture: NAS can implicitly choose which input features to emphasize through connectivity. In a fairness context, removing a problematic feature is sometimes necessary: if a feature such as ZIP code is a proxy for race and introduces bias, an architecture that down-weights or ignores it may be favored by the fairness objective. NAS can effectively perform feature selection by not routing that input into subsequent layers (some search spaces include optional skip connections from input features). Over many trials, models that exclude ZIP code may do better on fairness and only slightly worse on accuracy, so they earn higher reward and propagate.
- Loss functions and training adjustments: the search might also cover loss functions or training regimes (some NAS formulations include hyperparameter search – learning rate, loss type, and so on). For fairness, a custom loss such as re-weighting underrepresented groups, or a higher cost for misclassifying a member of a disadvantaged group, can be part of the hyperparameter search. Hyperparameter optimization (HPO) combined with NAS can tune such things; the NeurIPS 2022 study performed NAS and HPO jointly for fairness[58][59], which is wise because architecture alone may not suffice – sometimes the training process itself (such as re-weighting examples) must be adjusted (a small search-space sketch follows after this list).
The architecture and training are interlinked in the final outcome.
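An illustrative way to expose these fairness-oriented choices as searchable options is a simple configuration space; the option names below mirror the list above and are assumptions, and a real NAS controller (reinforcement-learning, evolutionary, or differentiable) would sample and score such configurations rather than picking them by hand:

```python
import random

SEARCH_SPACE = {
    "backbone":             ["mlp_2layer", "mlp_4layer", "attention_encoder"],
    "adversarial_debias":   [True, False],    # adversary trying to predict the protected attribute
    "group_calibration":    [True, False],    # per-group output offset layer
    "per_group_batchnorm":  [True, False],    # normalize feature statistics per subgroup
    "dropout_rate":         [0.0, 0.2, 0.5],
    "drop_proxy_feature":   [True, False],    # e.g., exclude ZIP code if it proxies for race
    "minority_loss_weight": [1.0, 2.0, 4.0],  # re-weight errors on underrepresented groups
}

def sample_configuration(rng=random):
    """Draw one candidate configuration uniformly at random from the space."""
    return {key: rng.choice(options) for key, options in SEARCH_SPACE.items()}

config = sample_configuration()
print(config)
# The NAS loop would build and train a model from `config`, evaluate accuracy and
# the Rawlsian fairness terms, and feed the resulting score back to the controller.
```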
Why NAS instead of just training a single model with fairness constraints? We hinted at this: because the space of solutions (architectures and parameter combinations) that satisfy fairness is large and non-convex. A given architecture might not be able to achieve fairness no matter how you train it (for example, a small network might lack the capacity to represent a fair decision boundary if the data bias is complex). Or one architecture might inherently encode a bias (e.g., a certain arrangement might always use a feature in a certain way). NAS can find an architecture that better suits the fairness-constrained optimum. It’s akin to giving the model a chance to restructure itself to do a better job at fair classification than a fixed architecture possibly could.
One intuitive example: suppose distinguishing qualified vs unqualified applicants is easier for men than women in the data because women’s qualifications are signaled in more varied ways (perhaps due to different typical backgrounds). A standard model might tune to the majority (men) and do poorly on women, a phenomenon known as “distributional bias”. A specialized architecture might, say, have some attention mechanism that picks up multiple types of signals that help with women’s resumes. NAS could find that if it’s allowed to try an attention layer vs a simpler dense layer. The fairness objective indirectly pushes it to find something that works for both distributions. A human designer might not realize the need for that extra complexity or might not know how to design it exactly; NAS explores options.
Performance vs Fairness trade-off and Rawls’s stance: We must acknowledge that requiring fairness can sometimes reduce raw accuracy or utility. For example, if the historical data suggests a correlation that is “useful” for prediction but is unfair (like zip code correlating with loan default but redlining concerns make using it unfair), enforcing fairness might drop accuracy a bit. Rawls would say that’s acceptable if it protects rights or improves the situation of the disadvantaged. In implementing NAS, this means we might intentionally sacrifice some accuracy to gain fairness – i.e., set the $\lambda$ weight such that a small accuracy drop is worth a large fairness gain. The result might be a model that is, say, 2% less accurate overall than the unconstrained model, but has no disparate impact. In a Rawlsian calculus, that’s a better model because those 2% likely correspond to not “exploiting” a bias that harmed some group. Many companies and regulators already accept a bit of accuracy loss as the cost of fairness (as long as performance remains within acceptable range for use).
Interestingly, sometimes improving fairness can even improve overall performance – especially if biases were causing the model to under-utilize some predictive signals. For example, a model might lazily rely on a correlating feature and ignore more relevant complex features, leading to suboptimal predictions for everyone. By forcing it not to use that lazy route (like a bias trigger), it may learn a deeper, more general pattern that increases accuracy across the board. Researchers have observed cases where bias mitigation improved accuracy on minority groups and didn’t hurt (or even helped) majority groups – a win-win[60]. That’s an ideal scenario that NAS could discover (like a better architecture that actually generalizes better once not overfitting bias). The quick review notes found that applying bias mitigation to a NAS-found architecture improved fairness and presumably kept good accuracy[53] – meaning there’s synergy.
Verification and Testing of Fairness: After NAS produces candidate models (often it will produce one or a few top models), we must rigorously test them on a separate test set or through simulation to ensure they indeed meet fairness requirements not just on training/validation but truly in practice. One concern in machine learning is overfitting – might a model overfit to fairness metrics on validation? Possibly, if it specifically optimized to those groups and the distribution shifts, some fairness could degrade. We should test the selected model on fresh data or via cross-validation for stability. Additionally, we could perform stress tests: e.g., the counterfactual check – flip a sensitive attribute on a test input and see if the decision changes. For high-stakes uses, regulators might demand such evidence. A Rawlsian model ideally passes these tests because it was designed to not depend on those attributes. (The Ethics Unwrapped note effectively says veil of ignorance helps us consider rules more objectively[7]; we can simulate a veil by randomly shuffling group labels in test data – a fair model should perform similarly under that shuffle, since if labels are random, a fair model shouldn’t care; an unfair model’s performance would drop if group labels are essential to its decision logic.)
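For tabular data, the attribute-flip stress test can be sketched as follows, with `predict` and the sensitive-attribute column index as assumed placeholders:

```python
import numpy as np

def counterfactual_flip_rate(predict, X_test, sensitive_col, values=(0, 1)):
    """Fraction of test rows whose predicted label changes when the sensitive
    attribute is swapped while all other features are held fixed."""
    X_flipped = X_test.copy()
    col = X_flipped[:, sensitive_col]
    X_flipped[:, sensitive_col] = np.where(col == values[0], values[1], values[0])
    original = predict(X_test)
    flipped = predict(X_flipped)
    return float(np.mean(original != flipped))

# A near-zero flip rate is evidence (not proof) that decisions do not hinge on the
# protected attribute itself; proxy effects still require the group-level metrics.
```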
Continuous Improvement: Even after deploying a Rawlsian NAS-designed model, monitoring is key. If downstream outcomes reveal new biases (maybe a particular subgroup not initially considered is being treated unfairly), one should adapt. The Rawlsian approach is dynamic: if some group is suffering unexpectedly, they effectively become “least advantaged” in this context, and we should consider adjusting. This suggests a feedback process where model and objectives are updated as we learn more about impacts. In policy terms, it’s about iterative refinement – which we will propose later (e.g., periodic fairness audits and model updates using NAS with updated objectives if needed).
Comparison to other methods: To highlight the novelty, consider some alternatives:
- Manual model design plus post-processing: one might train a model and then adjust its decision threshold per group to equalize outcomes (sometimes called group calibration or balance adjustment). This can satisfy fairness metrics but is somewhat ad hoc and can cause efficiency losses (for instance, rejecting some well-qualified majority applicants just to equalize rates). Rawls would arguably prefer optimizing the system as a whole rather than applying such blunt adjustments – NAS tries to build a model that inherently treats groups fairly, so heavy post-hoc tweaks are not needed, which likely yields better individual fairness and consistency.
- Pre-processing fixes: one might try to “repair” the dataset to remove biases (re-weighting, or generating synthetic data for minority groups). This is useful and can be combined with NAS, but pre-processing cannot easily fix every issue and may distort real patterns. NAS working directly on the data with fairness objectives may find a more nuanced solution (such as focusing on the relevant features for each group) than simple oversampling or re-weighting, though those techniques can also be incorporated into training.
- Other ethical frameworks: a utilitarian-trained AI might simply maximize overall accuracy or profit, perhaps subject to a mild constraint not to break the law; that can lead to biased outcomes when bias improves accuracy (e.g., targeting service only to lucrative demographics). A strictly egalitarian approach might enforce parity even when it harms everyone (the leveling-down scenario) – for example, an AI that selects partly at random to ensure equal success rates, largely ignoring merit. Rawls offers a balanced approach: not ignoring utility (we still want effective AI) and not blindly equalizing at all costs, but focusing on lifting the underprivileged in a rational way. We believe this yields decisions that are more robustly justifiable – which in turn can foster public trust in AI. A community is more likely to accept AI decisions if it sees that the system is designed to be fair and particularly attentive to not harming the vulnerable.
To crystallize the argument: NAS empowered by Rawls’s principles can find AI models that are objectively fair (in the sense any stakeholder would accept), by exploring vast design possibilities and optimizing for both performance and fairness. This method reduces the need for trial-and-error by human designers who might unconsciously embed their own biases or miss creative solutions. It also provides a quantitative way to implement ethical goals that otherwise remain abstract. The synergy is such that Rawls provides the destination (ethical justice), and NAS provides the vehicle to get there by traversing the solution space of model designs.
Case Study Examples: Rawlsian NAS in Action
It may help to illustrate the preceding concepts with one or two hypothetical (or based on real research) case studies, showing how a Rawlsian NAS approach would solve a particular ethical AI problem. We draw on known examples to make this concrete:
Case Study 1: Fair Facial Recognition (Reducing Demographic Bias).
Problem: A facial recognition system has high accuracy on lighter-skinned male faces and poor accuracy on darker-skinned female faces (a documented real issue[32]). This bias is dangerous if used in law enforcement (leading to false accusations against minority women) and unjust in any use.
Conventional Attempt: Collect more diverse data and fine-tune the model; add a bias mitigation layer or adjust the threshold for the poorer-performing group. This might help somewhat, but the core architecture remains optimized for the majority.
Rawlsian NAS Solution: Define two groups: (A) lighter-skinned men (historically advantaged in this context), and (B) darker-skinned women (the least advantaged in performance outcomes). The Rawlsian objective: maintain high overall identification accuracy while maximizing the true match rate for group B, subject to not unduly increasing false matches for B. In practice, this likely means enforcing equalized false negative rates between A and B (so B’s recognition recall goes up). The NAS search space could include various CNN backbones, with options such as additional convolution filters tuned to a wider range of complexions (to the extent such a thing can be learned), or an adversarial head that tries to remove gender and race information from the embeddings. After searching, suppose NAS finds an architecture that includes a specialized feature extractor focused on skin-tone-invariant features (shapes of facial features rather than color contrasts) plus an adversarial module ensuring the face embedding cannot easily reveal race or gender. This architecture might not have been obvious – perhaps it uses a layer that normalizes lighting differences (benefiting detection on darker skin) that previous models lacked. NAS selects it because it achieves, say, 95% accuracy for both A and B on validation, whereas the baseline had 98% for A but only 80% for B. In deployment, this model significantly reduces the disparity: any person, regardless of skin tone or gender, has an equally low chance of false non-recognition or misidentification. Objectively, those being surveilled or recognized can feel more confident the system is not stacked against them. This addresses Rawls’s fairness: the least advantaged group saw a large benefit, with minimal cost to the rest. Indeed, the quick review of a similar study reported that the discovered architectures showed “designing inherently fair architectures is more effective than post-hoc mitigation”[14] – our hypothetical case study’s result is in line with that finding.
Reflection: Behind a veil of ignorance, one would prefer this Rawlsian fair face-recognition model over the biased one: you would not risk being in the disadvantaged group with the biased model, so you’d choose the model that treats all groups equally. NAS helped find that model by exploring architectures humans hadn’t tried (like non-traditional layers focusing on invariances).
Case Study 2: Fair Credit Lending (Maximizing Benefit to the Least Advantaged).
Problem: An AI credit scoring system is used to approve personal loans. Historically, it is harder for people in low-income neighborhoods (often minorities) to get approved, partly due to lower credit scores – which in turn partly reflect historical lack of access to credit (a vicious cycle). A purely utilitarian AI might mostly approve those with high credit scores (who tend to be better off) and deny those with low scores, maximizing bank profit but perpetuating inequality.
Rawlsian Approach: Define the “least advantaged” here as applicants from historically redlined neighborhoods with decent but not stellar financials. A Rawlsian lens might say: if we can extend credit to some of these individuals responsibly, we should, even if their scores are a bit lower – provided this does not threaten the bank’s stability. How to formalize this? Perhaps we set a constraint: the approval rate for the lowest-income quartile should be maximized so long as default risk remains within acceptable bounds. Or we incorporate a welfare notion: extending a loan to a creditworthy poor applicant has more social value than extending one to an equally creditworthy rich applicant, since the former benefits more from access (this lies outside typical ML objectives, but Rawls’s difference principle arguably supports it). We could encode that by weighting errors: a false negative (denying a deserving applicant) in the poor group is costlier than a false negative in the rich group. The objective then becomes: minimize a weighted error, with higher weight on errors affecting the disadvantaged. NAS then searches for an architecture that, for example, incorporates not just the traditional credit score but also more nuanced indicators of reliability found in alternative data (such as rent payment history) that correlate with good repayment among poorer applicants. Perhaps it finds an architecture that merges credit bureau data with other public records effectively. The resulting model might approve, say, 20% more loans in poor areas while keeping overall default rates about the same (by evaluating risk more intelligently). The pie of credit is then more fairly distributed, improving the situation of the previously least-served population. According to Rawls, this is a more just outcome because those who were least advantaged (credit-starved communities) are better off and no one is worse off in absolute terms (the bank may even discover new customers).
Check: Would anyone object behind the veil? Likely not: everyone would accept criteria that do not unduly punish someone for living in a poor area if they show other signs of creditworthiness. The improved model might have a structure – say, an ensemble architecture discovered via NAS – that human loan officers would not normally conceive, blending traditional and non-traditional features.
This second case highlights the difference principle: proactively improving outcomes for a disadvantaged group. It’s more controversial perhaps – some might say “isn’t that affirmative action for credit?” In a Rawlsian justification, if those loans are sustainable, it’s simply correcting for an unfair disadvantage (lack of collateral or co-signer that others have due to historical wealth gaps). Law might permit some of this under special community lending programs. A fully automated credit AI likely wouldn’t do this unless told to – it would just follow data correlations. Rawlsian NAS instructs it to consider fairness as well.
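The error-weighting idea in Case Study 2 can be sketched directly. The group labels and weights below are illustrative assumptions; the point is only that denying a creditworthy applicant counts more heavily when that applicant belongs to the disadvantaged group:

```python
def weighted_lending_cost(y_true, y_pred, group, fn_weight_by_group=None, fp_weight=1.0):
    """Average decision cost where a false negative (denying a deserving applicant)
    is weighted more heavily for the disadvantaged group."""
    if fn_weight_by_group is None:
        fn_weight_by_group = {"advantaged": 1.0, "disadvantaged": 3.0}
    cost = 0.0
    for yt, yp, g in zip(y_true, y_pred, group):
        if yt == 1 and yp == 0:      # false negative: creditworthy applicant denied
            cost += fn_weight_by_group[g]
        elif yt == 0 and yp == 1:    # false positive: likely-default loan approved
            cost += fp_weight
    return cost / len(y_true)

# NAS (or threshold tuning) would prefer architectures that lower this weighted cost,
# which in practice pushes approval errors away from the least-advantaged group.
```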
Technical Depth – Mathematical Example: To showcase the depth, we can make the earlier formulation explicit. Suppose we have groups $g \in \{1,\dots,k\}$ (e.g., $k$ demographic groups). Let $f_\theta(x)$ be the model (with parameters $\theta$ and architecture defined by NAS choices), and let the task be classification with true label $y$. We can define a loss function:
$$ \mathcal{L}(\theta) = \mathbb{E}_{(x,y,g)}\!\left[\ell\big(f_\theta(x),y\big)\right] + \sum_{g=1}^{k} \alpha_g \, \Delta_g(\theta). $$
Here, $\ell$ is a standard loss (say, cross-entropy), and $\Delta_g(\theta)$ is a penalty for unfairness with respect to group $g$. For example, $\Delta_g$ could be the squared difference between $f$’s false positive rate on group $g$ and the overall false positive rate, or between group $g$ and a target value. The $\alpha_g$ are weights, with higher weight for groups that need more attention – the difference principle might set $\alpha_g$ highest for the worst-off group. If we specifically want to maximize the minimum true positive rate, we could encode that via a differentiable approximation or apply a min-operator outside the gradient (NAS with evolutionary strategies can handle non-differentiable objectives by treating the quantity as a reward). NAS then tries to minimize $\mathcal{L}$ by varying the architecture $\mathcal{A}$ and the weights $\theta$. In practice, NAS has an outer loop for architecture and an inner loop for training $\theta$, and we can incorporate fairness into the outer-loop score: after training each candidate architecture to minimize cross-entropy, we evaluate fairness metrics on a validation set and compute a score like
$$ \mathrm{Score}(\mathcal{A}) = \mathrm{Acc}_{\text{val}}(\mathcal{A}) - \lambda_1 \max_{g,g'}\left|\text{FPR}_g - \text{FPR}_{g'}\right| - \lambda_2 \max_{g,g'}\left|\text{TPR}_g - \text{TPR}_{g'}\right|. $$
NAS aims to maximize this score by choosing $\mathcal{A}$. This captures equalized odds (both FPR and TPR differences are penalized). If the $\lambda$ values are very large, only architectures with nearly equal rates survive; if they are too small, we mostly pick by accuracy. The right balance ensures we search in the “fair models” region of the space.
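The outer-loop score above translates directly into code, given per-group validation rates; the rate values below are illustrative placeholders:

```python
from itertools import combinations

def score_architecture(acc_val, group_rates, lam1=1.0, lam2=1.0):
    """Implements Score(A): accuracy minus the worst pairwise FPR and TPR gaps.
    group_rates: dict group -> {"FPR": float, "TPR": float}."""
    fpr_gap = max(abs(group_rates[a]["FPR"] - group_rates[b]["FPR"])
                  for a, b in combinations(group_rates, 2))
    tpr_gap = max(abs(group_rates[a]["TPR"] - group_rates[b]["TPR"])
                  for a, b in combinations(group_rates, 2))
    return acc_val - lam1 * fpr_gap - lam2 * tpr_gap

rates = {"A": {"FPR": 0.08, "TPR": 0.90}, "B": {"FPR": 0.12, "TPR": 0.81}}
print(score_architecture(0.87, rates))  # 0.87 - 0.04 - 0.09 ≈ 0.74
```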
Interestingly, one might integrate fairness into training itself via gradient-based NAS (some recent differentiable NAS allow architecture parameters to be continuous and learned). In that case, fairness constraints could even be applied during training so that architecture gradients encourage fairness. That’s advanced but on the horizon.
Robustness of Solutions: One important argument is that a model intentionally optimized for fairness is likely more stable legally and socially. If an AI is challenged in court for discrimination, having a demonstrably fairness-optimized design is a strong defense (provided it’s consistent with human rights laws, which often allow proactive fairness measures in algorithmic design, especially if race is not explicitly used or if used in a justified way such as to eliminate bias). Also, from a business perspective, fair AI avoids reputational damage and regulatory fines.
Addressing Potential Critiques: We should anticipate some counterarguments even within the argument section:
- “Isn’t this just algorithmic affirmative action that could backfire?” The Rawlsian response: it is not about giving unqualified individuals a free pass; it is about ensuring that the truly qualified are not overlooked due to irrelevant biases and that the benefits of the system reach those who need them most. We are optimizing for fairness consistent with maintaining effectiveness. Affirmative action, in Rawls’s view, can be seen as a temporary measure to correct inequality of opportunity – similarly, our model corrects for historical bias in the data to give everyone a fair shot.
- “What if the data doesn’t allow fairness – the trade-off is too steep?” Then perhaps the task itself is structured unjustly. Technologically, if fairness constraints make accuracy unacceptably low, one might conclude that AI is not ready to be used in that domain without improvements. That is a policy decision: is it better to have no AI than a high-accuracy but biased AI in high-stakes settings, or perhaps partial use with human oversight? Rawlsian design helps reveal these conflicts clearly so decision-makers can act. For instance, if NAS cannot find any architecture that is both fair and sufficiently accurate, that is a sign the problem may lie in the data or the context – perhaps the input features themselves carry societal bias that cannot be circumvented. The solution might then be to gather better data or redesign the application.
- “What about computational cost and complexity?” Indeed, NAS is computationally heavy. However, the cost is incurred mostly during development; once a fair architecture is found, deploying it costs the same as deploying any model. Given the importance of fairness, investing computation in the search is warranted (and computing cost is cheaper than the social cost of unfair AI incidents). As noted earlier, newer NAS methods are also far more efficient; one NAS fairness study ran at large scale but within academic cluster means[12].
With the argument articulated, we next address counterarguments more systematically (critiques and limitations) and then turn to policy proposals.
[1] AI Fairness: Designing Equal Opportunity Algorithms - Free Computer, Programming, Mathematics, Technical Books, Lecture Notes and Tutorials
[4] Golden Rule - Wikipedia
[5] [24] [25] Kant's Categorical Imperative and the Ethics of AI: A Transcendental Approach - CyberNative.AI: Social Network & Community
[6] (PDF) Volume 7 (2024) Artificial Intelligence and Responsibility
[8] [17] [26] [27] [28] [57] Reconstructing AI Ethics Principles: Rawlsian Ethics of Artificial Intelligence - PMC
[10] [11] [12] [13] [14] [32] [33] [38] [39] [40] [41] [50] [53] [55] [58] [59] [60] [Quick Review] Rethinking Bias Mitigation: Fairer Architectures Make for Fairer Face Recognition
[15] [42] [43] US Expands Artificial Intelligence Guidance with NIST AI Risk Management Framework - cyber/data/privacy insights
[16] Recital 27 | EU Artificial Intelligence Act
[18] [PDF] Pioneering new approaches to verifying the fairness of AI models
[21] Aristotle's quote reflects his belief that justice does not mean treating ...
[23] Benevolence Beyond Code: Rethinking AI through Confucian Ethics
[29] [30] [31] [34] [35] [36] [37] “AI Fairness: Designing Equal Opportunity Algorithms” with Professor Derek Leben - Bridging the Gaps
[51] [PDF] 1 Ethical AI Is All You Need Don Hilborn dhilborn@stcl.edu Abstract ...
[52] [PDF] On the importance of architectures and hyperparameters for fairness ...
[54] Ethical AI Is All You Need Don Hilborn by Donald Hilborn :: SSRN