Towards Just AI Systems: Rethinking Rawlsian Algorithmic Fairness

In this post I advance three ideas. First, I disentangle the two Rawlsian principles that currently animate “fair-ML” work, signposting an emerging issue with how the difference principle is applied. Second, I critique the aggregation premise: the belief that if each model maximises the minimum, society will too. Both a logical counter-example and a probabilistic argument grounded in risk theory reveal that local maximin can lower the global floor. Finally, I consider some counter-responses to further elucidate the issues with using Rawls in this domain, and I sketch how situational, institutional, and systemic interventions (often outside the optimiser) are needed to secure genuine algorithmic justice. This awareness is necessary to move AI governance forward with confidence.

As we all know, the pace at which human decisions are being replaced by automated systems has accelerated rapidly in recent years. As Artificial Intelligence is said to become more agentic in 2025, perhaps we can expect this rate of change to keep increasing. Among the many risks of AI decision-making is unfairness; there are countless examples of both unsophisticated Machine Learning (ML) models from the 2010s exhibiting gender bias[1] and highly sophisticated Large Language Models developed in the last two years reinforcing harmful stereotypes[2]. However, our attention to these concerns has wavered as more ominous threats animate the horizon for AI Safety. Perhaps they are worth revisiting.

2. Rawlsian Algorithmic Fairness

2.1 Theory

The application of Rawls’s A Theory of Justice (1971) to the analysis of algorithmic fairness relies on two principles:

  1. (Equal Basic Liberties): Each person is to have an equal right to the most extensive total system of equal basic liberties compatible with a similar system of liberty for all.

  2. Social and economic inequalities are to be arranged so that they are both:

    1. to the greatest benefit of the least advantaged, consistent with the just savings principle (also known as the difference principle or ‘maximin’, short for maximising the minimum), and,

    2. attached to offices and positions open to all under conditions of fair equality of opportunity.

These two principles are lexically ordered, meaning the first principle has priority. If a proposed distribution of goods or opportunities would, for instance, diminish people’s sense of freedom and political equality, then it cannot be justified, even if it would benefit the least advantaged economically. To justify these principles, Rawls invokes the seminal thought experiment known as the original position, in which rational agents are asked to govern a society from behind a veil of ignorance, motivating each party to select rules that maximise the primary goods available to the least advantaged groups in society. Rawls argues that such reasoning would produce his two principles.

2.2 Landscape of Rawlsian Algorithmic Fairness

Many proposals in algorithmic fairness gesture toward Rawlsian principles. However, it is crucial to distinguish which of Rawls’s two principles they are implicitly invoking. Unsurprisingly, much of the literature focuses on group parity metrics: equalising error rates, acceptance rates, or outcomes across demographic lines. In Rawlsian terms, these efforts are best understood as instantiations of the first principle, the guarantee of equal basic liberties and opportunities. This principle has the key formal advantage of being distributively stable (as the discussion of aggregation in section 3 makes clear): if it holds in each case, it holds in aggregate. In this respect, the first principle can reasonably be applied across many algorithms attempting to be fair.

The real philosophical difficulty lies not with Rawls’s first principle but with the second, specifically clause 2(a), the difference principle, which requires that any inequality improves the situation of the least advantaged. Recent work offers engineers a technical “maximin” toolkit[3], allowing them to bake this rule into almost any machine‐learning optimiser (see Heidari et al. 2019[4]). Yet, unlike the liberty principle, the difference principle lacks an aggregation property; satisfying it in each isolated model does not guarantee it is satisfied overall. The next section criticises these ambitious efforts. They are still valuable (for they advance the first principle) but they overclaim when they present local maximin constraints as a full realisation of Rawlsian justice in algorithmic fairness.
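
To see what this engineering move looks like in practice, here is a minimal sketch of an in-processing maximin constraint, assuming a toy logistic model and invented synthetic data; it is illustrative only, not the construction of Heidari et al. (2019) or any particular fair-ML library. Each gradient step targets whichever group currently has the worst loss, so training minimises the maximum group loss (equivalently, maximises the minimum group welfare).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two groups with different feature/label relationships,
# so that no single linear model serves both equally well.
n = 500
X_a = rng.normal(0.0, 1.0, size=(n, 2))
X_b = rng.normal(1.0, 1.5, size=(n, 2))
y_a = (X_a[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(float)
y_b = (X_b[:, 0] - X_b[:, 1] + 0.5 * rng.normal(size=n) > 0).astype(float)
groups = [(X_a, y_a), (X_b, y_b)]

def group_loss(w, X, y):
    """Mean logistic loss of weights w on one group (labels in {0, 1})."""
    s = 2 * y - 1
    return np.mean(np.log1p(np.exp(-s * (X @ w))))

def group_grad(w, X, y):
    """Gradient of the mean logistic loss for one group."""
    s = 2 * y - 1
    coeff = -s / (1 + np.exp(s * (X @ w)))
    return (X * coeff[:, None]).mean(axis=0)

# Maximin training: every step descends the loss of whichever group is
# currently worst off (a subgradient step on the maximum of the group losses).
w = np.zeros(2)
for step in range(2000):
    losses = [group_loss(w, X, y) for X, y in groups]
    worst = int(np.argmax(losses))
    X, y = groups[worst]
    w -= 0.1 * group_grad(w, X, y)

print("per-group losses:", [round(group_loss(w, X, y), 3) for X, y in groups])
```

The point of the sketch is only that the difference principle can be dropped into a single optimiser’s objective with very little effort; whether doing so delivers what Rawls asks for is the question the next section takes up.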

3. Fallacy of Composition

The hope of the ‘Rawlsian Camp’ (Heidari et al. 2018, Leben 2017, Peng 2020, Shah et al. 2021)[5] is straightforward: if every separate algorithmic decision we design already satisfies Rawls’s difference principle, then the whole society, taken as the sum of those decisions, will satisfy the difference principle too. I call this the aggregation premise. It feels attractive because it follows familiar precedents. Equality plainly aggregates: when every transaction is equal, the sum remains equal. Property-rights theories show the same logic: if every transfer is legitimate, the resulting holdings are legitimate. And there is a clear engineering intuition that a system is safe so long as each component is safe. The question here is whether fairness in each individual case (each constituent situation) is sufficient to ensure overall fairness.

I therefore argue, first, that a strong version of this premise is demonstrably false and, second, that a weaker, probabilistic version is implausible once we think clearly about risk, covariance, and the shape of disadvantage. Notably, both arguments are motivated by the fact that Rawls himself never intended the principle to be applied piecemeal in the first place.

3.1 Against Aggregation

Strong aggregation can be stated precisely: if the difference principle is satisfied in every constituent situation, then it is satisfied in the aggregate. Consider two students, Dana and Eli, who will receive stipends from two independent scholarship rounds held a month apart. In each round the committee must pick between an unequal and an equal award schedule.

Round 1:

  • Unequal option X1: Dana 2, Eli 5

  • Equal option Y1: Dana 2, Eli 2

Dana is the worst‐off in X1 (she gets 2 while Eli gets 5). Because the extra three units to Eli do nothing for Dana, the difference principle prefers the equal schedule Y1.

Round 2:

  • Unequal option X2: Dana 6, Eli 2

  • Equal option Y2: Dana 2, Eli 2

Here Eli is the worst‐off in X2. The four extra units awarded to Dana do not help him, so the principle again favours the equal schedule Y2. Thus, taken one round at a time, maximin twice instructs the committee to choose the equal distributions Y1 and Y2. Each student would then finish the year with an aggregate of

Y1 + Y2 = (2+2, 2+2) = (4, 4).

Now imagine the committee had instead selected the locally rejected unequal options X1 and X2. The combined outcome would be

X1 + X2 = (2+6, 5+2) = (8, 7).

In this aggregate, the least advantaged student is Eli with 7 units, better than the worst-off student under the pair of “fair” decisions, who has only 4. Locally applied maximin therefore blocked inequalities that, in combination, would have raised the floor of advantage. By counterexample, then, satisfying the difference principle in every constituent situation does not guarantee it is satisfied (or maximised) in the aggregate, and the “Strong Aggregation” assumption is false.
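
The arithmetic is small enough to check mechanically. Below is an illustrative script (the difference principle is implemented here, by assumption, as “raise the floor, and prefer the equal schedule when the floor ties”) that reproduces both local verdicts and the two aggregates:

```python
# Payoffs as (Dana, Eli) in each round, exactly as in the text.
round_1 = {"X1": (2, 5), "Y1": (2, 2)}
round_2 = {"X2": (6, 2), "Y2": (2, 2)}

def difference_principle(options):
    """Pick the schedule whose worst-off payoff is highest; when floors tie,
    prefer the smaller spread (inequality is justified only if it raises the floor)."""
    return max(options, key=lambda k: (min(options[k]), -(max(options[k]) - min(options[k]))))

print(difference_principle(round_1), difference_principle(round_2))  # Y1 Y2

fair = tuple(a + b for a, b in zip(round_1["Y1"], round_2["Y2"]))
rejected = tuple(a + b for a, b in zip(round_1["X1"], round_2["X2"]))
print("locally 'fair' aggregate:", fair, "floor =", min(fair))            # (4, 4), floor 4
print("locally rejected aggregate:", rejected, "floor =", min(rejected))  # (8, 7), floor 7
```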

3.2 Against Moderate Aggregation

The Rawlsian Camp, who apply these arguments across systems for AI fairness, may respond with a more modest thesis: “in realistic settings, local maximin very likely yields global maximin”. This claim of moderate aggregation holds that if each algorithm follows maximin, the difference principle will probably hold in the aggregate. That is an empirical claim about real-world risk patterns. Imagine two lotteries, L1 and L2, each describing the pay-offs an algorithm may generate.

In L1 an individual has;

  • a 20% chance of 1

  • a 60% chance of 2

  • a 20% chance of 5

(expected value 2.4).

In L2 the same probabilities are attached to pay-offs of 1, 2, and 3 (expected value 2).

Both lotteries share the same worst outcome (1). Under the difference principle, inequality is justified only if it raises that floor. Here it does not, so a single application of maximin prefers the more even L2 (expected value 2). Now let society run ten independent draws from whichever lottery is embedded in its algorithms. Independence means averages converge: ten draws of L1 total roughly 24 units, ten draws of L2 roughly 20. More telling is the worst typical realisation. The mean of the sum grows with n, while the standard deviation grows only with √n. After ten draws, a “bad-luck” case (one standard deviation below the mean) leaves the worst-off with about 24 − 4.3 ≈ 19.7 units under L1, but only 20 − 2 = 18 under L2. Aggregation therefore reverses the local verdict: the long-run prospects of the least advantaged improve when we accept the locally disfavoured inequality.
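
These figures are easy to verify. The short script below recomputes, for the lotteries exactly as specified, the ten-draw means, the standard deviations of the totals, and the “one standard deviation below the mean” outcomes:

```python
import math

# Pay-off distributions for the two lotteries described above.
L1 = {1: 0.2, 2: 0.6, 5: 0.2}
L2 = {1: 0.2, 2: 0.6, 3: 0.2}

def ten_draw_stats(lottery, n=10):
    """Mean and standard deviation of the total of n independent draws."""
    mean = sum(x * p for x, p in lottery.items())
    var = sum(p * (x - mean) ** 2 for x, p in lottery.items())
    return n * mean, math.sqrt(n * var)

for name, lot in (("L1", L1), ("L2", L2)):
    m, sd = ten_draw_stats(lot)
    print(f"{name}: mean of total = {m:.1f}, sd = {sd:.1f}, mean - 1 sd = {m - sd:.1f}")
# L1: mean 24.0, sd 4.3, mean - 1 sd 19.7
# L2: mean 20.0, sd 2.0, mean - 1 sd 18.0
```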

The upshot is clear. Because repeated, independent decisions tighten the distribution around the higher mean, local maximin ignores a statistical fact that ultimately matters to Rawls’s ‘normal‐range’ least advantaged. Even a moderate aggregation claim fails, and with it the hope that sprinkling maximin constraints over individual algorithms will probably secure justice in the whole system.

3.3 Rawls Never Wanted Case‐by‐Case Maximin

This result is not particularly surprising, as Rawls explicitly argues that the difference principle regulates the basic structure, not every allocation. He also excludes pathological tail cases, speaking of the least advantaged “within the normal range” (Rawls 1971); if a disastrous outcome occurs only once in ten million trials, it lies outside that range. Statistical moments, not theoretical minima, therefore matter for Rawlsian evaluation. Rawls warns against “administrative allotment” on a case-by-case basis because such micromanagement would paralyse social cooperation. Nozick, though a critic, makes the same point: patterned principles require constant redistributions to preserve their pattern after voluntary exchanges. Thus Rawls confines maximin to constitutional design, progressive taxation, and the broad distributive schemes of a welfare state, not to each micro-transaction or algorithmic score. When AI researchers relocate the principle into the bowels of isolated ML models, they stretch Rawls beyond recognition.

4. Responses and Moving Forward with Algorithmic Fairness

4.1 Defence of Rawls?

A Rawlsian might concede that the aggregation critique works only when multiple, loosely coupled systems divide society into countless local contexts. The difference principle is still useful, the defence goes, because the broader digital landscape is already birthing single algorithms whose jurisdiction is so comprehensive that they function as institutions, not components. Imagine a national “Allocation AI” deployed by a future welfare state to (i) set every citizen’s income-tax rate, (ii) disburse means-tested transfers in real time, and (iii) automatically adjust health-insurance premiums against those transfers. The engine ingests live payroll, spending, and health data, then produces a complete tax-and-benefit vector for each household every month. Citizens cannot opt out; the legislature sets the code’s objective function much as it now enacts a tax code. Distributive impact is immediate and society-wide; there is no longer a gap between local and global contexts, because all relevant transfers run through the same optimisation loop.

In such a regime the algorithm is part of what Rawls calls the basic structure: “the major social institutions that distribute fundamental rights and duties and shape the division of advantages” (Rawls 1971). Rawls insists that the difference principle applies precisely at this institutional level and not at the level of isolated transactions. Therefore, says the defender, embedding maximin directly into the Allocation AI’s loss function is a textbook application of Rawlsian justice: once the legislature certifies that the optimiser lifts the lowest life-prospects subject to liberty constraints, there is nothing further to aggregate. The putative composition fallacy disappears because the digital Leviathan is itself the encompassing structure Rawls had in mind.

4.2 Counter-Response

This manoeuvre fails on conceptual and empirical grounds. Conceptually, Rawls’s basic structure is not a single monolith but a set of institutions whose joint effect on citizens’ life-prospects must be assessed “as one scheme” (§3). Designing any single component (digital or otherwise) to satisfy maximin therefore leaves open how its outputs interact with taxation, housing, labour markets, and other AI systems. Empirically, large-scale algorithms exhibit positive covariance: shocks in credit scoring propagate to hiring platforms, which in turn condition insurance premiums[6] (Dokko, Li & Hayes, 2015). Because the variance of correlated mechanisms adds, a maximin-tuned tax model can be neutralised (or inverted) by a correlated recruiter. Thus the non-aggregation problem simply re-appears one level up, between basic-structure components rather than between micro-models. To claim otherwise is to ignore the networked character of modern socio-technical institutions. And even if this empirical claim were false, the defence still concedes that a significant portion of algorithmic fairness strategies implemented today (such as Heidari et al. 2019) do not genuinely instantiate Rawlsian theory.
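
The covariance point can be illustrated numerically. In the toy simulation below (invented magnitudes, not estimates of any real system), each person receives a shock from two mechanisms; when those shocks are positively correlated, the variance of the combined shock picks up the covariance term, Var(X+Y) = Var(X) + Var(Y) + 2 Cov(X, Y), and the lower tail, a crude proxy for the worst-off, deteriorates relative to the independent case:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
sigma = 1.0
rho = 0.8  # assumed positive correlation between the two mechanisms' shocks

# Combined shock when the two mechanisms are independent.
independent = rng.normal(0.0, sigma, size=(n, 2)).sum(axis=1)

# Combined shock when the two mechanisms co-move (correlation rho).
cov = [[sigma ** 2, rho * sigma ** 2], [rho * sigma ** 2, sigma ** 2]]
correlated = rng.multivariate_normal([0.0, 0.0], cov, size=n).sum(axis=1)

for label, total in (("independent", independent), ("correlated", correlated)):
    print(f"{label}: variance ≈ {total.var():.2f}, 5th percentile ≈ {np.percentile(total, 5):.2f}")
# Independence gives variance ≈ 2.0; correlation 0.8 gives ≈ 3.6, with a
# correspondingly worse 5th-percentile outcome for the same mean.
```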

4.3 Implications for the Rawlsian Method

The impasse exposes a structural limit in Rawls’s apparatus. His principles assume a well‐bounded basic structure whose elements can be treated as a single deliberative unit. Contemporary AI instead yields modular, rapidly re‐composable infrastructures whose distributive effects emerge only at the level of their covariance patterns. Rawls’s theory therefore under‐specifies guidance for contexts where institutional boundaries blur and systemic risk is endogenous. Justice now depends less on the ethics of single algorithms than on how those algorithms co‐move.

What, then, should guide post-Rawlsian fairness work? Consider three commitments that are likely relevant. First, we must treat socio-technical harms the way epidemiologists treat contagion: map transmission paths, measure covariance, and model tipping points. Only with that picture can we identify which inequalities are harmless noise and which trigger spirals of cumulative disadvantage. Second, because no single optimiser governs the whole field, remedies will be heterogeneous: labour law to curb exploitative scheduling algorithms; public data trusts to counteract surveillance rents; procurement rules that forbid opacity in welfare adjudication. The criterion is not theoretical purity but the concrete leverage each measure offers against systemic risk. Third, if interactions are fluid, oversight must be continuous and participatory. Affected communities need standing channels to surface new failure modes and to demand course-corrections as models drift. Fairness becomes an ongoing practice of contestation rather than a one-off mathematical guarantee.

Importantly, what these kinds of commitments show is that we do not abandon Rawls’s concern for the least advantaged; we relocate it from the interior of an optimiser to the public architecture that shapes how optimisers are built, combined, and restrained. In short, justice in the age of agentic AI is less a matter of perfecting local rules and more a matter of steering a dynamic, interconnected system toward humane ends.

5. Conclusion

Rawls gave political philosophy a luminous target: organise society so that the worst-off have most reason to hope. But when fairness is delegated to scattered learning systems, his difference principle loses its grip; maximin logic that is sound for a unitary “basic structure” misfires in a lattice of mutually-conditioning algorithms. Our counter-examples show that local compliance can depress the global floor, and probabilistic analysis reveals that risk covariance, not isolated minima, determines who is actually worst-off. Rawls himself warned against case-by-case maximin; the digital age vindicates that caution. Local maximin constraints therefore offer, at best, a moral mirage. Future work in AI ethics must shift from optimiser-level tweaks to architecture-level governance: mapping dependency chains, legislating data institutions, and ensuring democratic recourse. These tasks return us to Rawls’s animating concern, protecting the least advantaged, but relocate the struggle to the socio-technical terrain where disadvantage now arises. Justice demands nothing less.

Bibliography

Abid, A., Farooqi, M., & Zou, J. (2021). Persistent anti-Muslim bias in large language models. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 298–306). https://doi.org/10.1145/3461702.3462624

Bolukbasi, T., Chang, K.-W., Zou, J., Saligrama, V., & Kalai, A. (2016). Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. ArXiv:1607.06520 [Cs, Stat]. https://arxiv.org/abs/1607.06520

Dokko, J., Li, G., & Hayes, J. (2015). Credit scores and committed relationships (Finance and Economics Discussion Series 2015-081). Board of Governors of the Federal Reserve System. https://doi.org/10.17016/FEDS.2015.081

Franke, U. (2024). Rawlsian Algorithmic Fairness and a Missing Aggregation Property of the Difference Principle. Philosophy & Technology, 37(3). https://doi.org/10.1007/s13347-024-00779-z

Heidari, H., Ferrari, C., Gummadi, K. P., & Krause, A. (2018). Fairness behind a veil of ignorance: A welfare analysis for automated decision making. In Proceedings of the 32nd International Conference on Neural Information Processing Systems. https://dl.acm.org/doi/abs/10.5555/3326943.3327060

Heidari, H., Loi, M., Gummadi, K. P., & Krause, A. (2019). A moral framework for understanding fair ML through economic models of equality of opportunity. In Proceedings of the 2019 Conference on Fairness, Accountability, and Transparency (pp. 181–190). ACM. https://doi.org/10.1145/3287560.3287584

Hurley, M., & Adebayo, J. (2023). The Interdependence of Algorithmic Systems and the Compounding of Harm: A Study in Risk Covariance. Journal of AI Ethics.

Leben, D. (2017). A Rawlsian algorithm for autonomous vehicles. Ethics and Information Technology, 19(2), 107–115.

Nozick, R. (1974). Anarchy, State, and Utopia. New York: Basic Books.

Peng, K. (2020). Affirmative equality: A revised goal of de-bias for artificial intelligence based on the difference principle. In Proceedings of the 2020 International Conference on Artificial Intelligence and Computer Engineering (pp. 15–19). IEEE.

Procaccia, A. (2019). AI researchers are pushing bias out of algorithms. Bloomberg Opinion. https://www.bloomberg.com/opinion/articles/2019-03-07/ai-researchers-are-pushing-bias-out-of-algorithms

Rawls, J. (1971). A theory of justice. Cambridge, MA: Harvard University Press.

Schmidtz, D. (2006). Elements of justice. Cambridge: Cambridge University Press.

Shah, K., Gupta, P., Deshpande, A., & Bhattacharyya, C. (2021). Rawlsian Fair Adaptation of Deep Learning Classifiers. ArXiv (Cornell University), 936–945. https://doi.org/10.1145/3461702.3462592

  1. ^

    Bolukbasi, T., Chang, K.-W., Zou, J., Saligrama, V., & Kalai, A. (2016). Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. ArXiv:1607.06520 [Cs, Stat]. https://arxiv.org/abs/1607.06520

  2. ^

    Abid, A., Farooqi, M., & Zou, J. (2021). Persistent anti-Muslim bias in large language models. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 298–306). https://doi.org/10.1145/3461702.3462624

  3. ^

    Practical fair-ML libraries usually bundle three classes of methods: (i) pre-processing routines that reweight or transform the data so that statistical constraints hold before learning (Feldman et al. 2015; Calmon et al. 2017); (ii) in-processing wrappers that add convex/Lagrangian constraints or adversarial regularisers to the loss, thereby enforcing parity during training (Zafar et al. 2017; Donini et al. 2018; Madras et al. 2018; Heidari et al. 2018); and (iii) post-processing layers that shift scores or thresholds to equalise error or acceptance rates after training (Hardt et al. 2016; Agarwal et al. 2018). All three styles export Rawls’s society-wide test into component-level code and thereby rely (often implicitly) on the aggregation premise I will dispute.

  4. ^

    Heidari, H., Loi, M., Gummadi, K. P., & Krause, A. (2019). A moral framework for understanding fair ML through economic models of equality of opportunity. In Proceedings of the 2019 Conference on Fairness, Accountability, and Transparency (pp. 181–190). ACM. https://doi.org/10.1145/3287560.3287584

  5. ^


    Heidari, H., Ferrari, C., Gummadi, K. P., & Krause, A. (2018). Fairness behind a veil of ignorance: A welfare analysis for automated decision making. In Proceedings of the 32nd International Conference on Neural Information Processing Systems. https://dl.acm.org/doi/abs/10.5555/3326943.3327060

    Leben, D. (2017). A Rawlsian algorithm for autonomous vehicles. Ethics and Information Technology, 19(2), 107–115.

    Peng, K. (2020). Affirmative equality: A revised goal of de-bias for artificial intelligence based on the difference principle. In Proceedings of the 2020 International Conference on Artificial Intelligence and Computer Engineering (pp. 15–19). IEEE.

    Shah, K., Gupta, P., Deshpande, A., & Bhattacharyya, C. (2021). Rawlsian Fair Adaptation of Deep Learning Classifiers. ArXiv (Cornell University), 936–945. https://doi.org/10.1145/3461702.3462592

  6. ^

    Dokko, J., Li, G., & Hayes, J. (2015). Credit scores and committed relationships (Finance and Economics Discussion Series 2015-081). Board of Governors of the Federal Reserve System. https://doi.org/10.17016/FEDS.2015.081
