Accountable Automated Decision-making: Some Challenges

In a new paper to be published in the Australian Journal of Administrative Law & Practice, entitled “Artificial Administration: Administrative Law, Administrative Justice and Accountability in the Age of Machines”, I bring together much of my previous scholarship on the topic of automation in public administration. In Parts II and III of the paper, I describe the requirements of lawfulness (administrative law) and morality (administrative justice), and in Part IV, I describe the different ways in which an account might be given – through legislative requirements, soft law or policy obligations, or judicial review – and set out how these mechanisms measure and sanction failures to meet the lawfulness and morality yardsticks. I conclude that each of the existing mechanisms has important limitations.

Hard Law

Statutory Bases

There are very few legislative requirements relating to the use of technology in public administration. Over the years, legislatures in Canada[1] and the UK[2] have provided for a statutory basis for automation in certain cases. The concern here seems to be that the use of a computer to make a decision would be a breach of the principle that a statutory power cannot be sub-delegated without statutory authority to that effect. The utility of such provisions is somewhat questionable: even if automation is provided for in statute, a decision-maker must comply with the other principles of administrative law. One intriguing possibility, as in Pintarich v Deputy Commissioner of Taxation,[3] is that where a statute provides for a “decision” to be made, it might be argued that a decision taken by a machine is no decision at all. The majority of the Full Federal Court found that there had been no “decision,” as the production of a computer-generated letter did not involve “a mental process of reaching a conclusion.”[4] Statutory authorisation for the use of automated technology would undercut this sort of analysis but merely putting it on a statutory footing does not address the requirements of administrative lawfulness and does not speak to administrative justice at all.

The General Data Protection Regulation

A more far-reaching legislative intervention is the development of the GDPR in the European Union. As a European Union regulation, the GDPR is directly effective in European Union member states. The GDPR is general in nature: it applies to private enterprise and the public sector alike, imposing restrictions on the use of personal data and the deployment of automated decision-making. Some of the restrictions are procedural, others more substantive.

Procedurally speaking, the GDPR imposes a series of restrictions on the use of personal data with corresponding individual rights to ask for the rectification of errors and the portability of data. As such, for the use of technology to be lawful in the European Union, decision-makers must comply with the procedural requirements of the GDPR.

Substantively speaking, article 22(1) creates a right not to be subjected to an automated decision: “The data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her”. This requirement imposes substantive limitations on the use of technology. But the requirement is limited in scope: only decisions based “solely” on automated processing are prohibited and then only when they produce “legal effects”.[5] Moreover, article 22(2) creates exceptions, one for situations where the consent of the subject has been received and another where an automated decision “is authorised by Union or Member State law to which the controller is subject and which also lays down suitable measures to safeguard the data subject’s rights and freedoms and legitimate interests”. Evidently, a wide variety of safeguards could qualify as suitable measures. There is no free-standing right to an explanation of an automated decision, short of high-level information about the operation of the system that produced the decision, still less to a decision taken by a human.[6] There are debates about how best to resolve the ambiguities in these provisions of the GDPR,[7] with EU-level working groups[8] and member states[9] hard at work in providing further specification to article 22. For the moment, however, although the GDPR is hard law, it does not impose particularly hard constraints on the use of technology in public administration.
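The structure of article 22, as described above, can be sketched as a simple predicate. This is a minimal illustration only, not a statement of the law: the class and field names are hypothetical, the scope test follows the quoted text of article 22(1) (legal effects or similarly significant effects), and only the two exceptions mentioned in this post are modelled. Whether a given safeguard counts as "suitable" is a legal judgment that the sketch simply takes as an input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """Hypothetical description of a decision, for illustration only."""
    solely_automated: bool
    legal_effects: bool
    similarly_significant_effects: bool
    # The two exceptions discussed in the post (assumed to be given as facts).
    subject_consented: bool = False
    authorised_by_law_with_safeguards: bool = False


def article_22_restriction_applies(d: Decision) -> bool:
    """Return True if the article 22(1) right would bite on this sketch's terms."""
    # Scope: only decisions based solely on automated processing that
    # produce legal effects or similarly significant effects.
    in_scope = d.solely_automated and (
        d.legal_effects or d.similarly_significant_effects
    )
    if not in_scope:
        return False
    # Exceptions: consent, or authorisation by law with suitable safeguards.
    excepted = d.subject_consented or d.authorised_by_law_with_safeguards
    return not excepted
```

The sketch makes the narrowness of the provision visible: a single human touchpoint (so that the decision is not "solely" automated) or a legislative authorisation with safeguards is enough to take a decision outside the restriction.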

So far at least, as a means of imposing meaningful accountability on the use of technology in public administration, hard law has come up short. Of course, existing hard law could be modified and a general code adopted for the regulation of use of technology in public administration. But even at their best, general administrative law codes need to be supplemented by additional requirements not foreseen by their drafters.[10] At their worst such codes can contribute to the ossification of decision-making processes by imposing requirements rendered obsolete by technological and other advances.[11] Hard law is static law and, as such, a poor fit for an inherently dynamic phenomenon.

Soft Law

Soft law is another possible means of ensuring accountability. Soft law owes its name to its ambiguous (and asymmetrical) legal status, ‘soft’ in that it is non-binding but ‘law’ in that it is nonetheless intended to guide conduct. Typical examples of soft law are policies, operational manuals, training guides and other means of institutionalising decision-making processes. These can be directed at uses of technology in public administration.

The Directive on Automated Decision-making

The most high-profile example of soft law being used to regulate the use of technology in public administration comes from the Canadian federal government, where the Treasury Board has developed a non-binding Directive on Automated Decision-making.[12] The operation of the DADM relies heavily on the Algorithmic Impact Assessment. This tool is a questionnaire that seeks to determine an automated decision-system’s impact in order to calibrate the operation of the DADM. The questionnaire includes 48 risk-based questions and 33 mitigation-based questions. The AIA comprises questions to assess areas of risk related to the project, system, algorithm, decision, impact, and data used. The AIA also assesses any mitigation measures in place that manage the risks, such as internal and external consultation, as well as de-risking measures related to data quality, procedural fairness, and privacy rights. Assessment scores are based on factors such as the complexity of the system’s design, algorithm, decision type, impact, and data.

The AIA determines the impact level of an automated decision-system, specifically by measuring how the system affects four areas: the rights of individuals or communities; the health or well-being of individuals or communities; the economic interests of individuals, entities, or communities; and the ongoing sustainability of an ecosystem. There are four impact levels that identify the significance of decisions made by the algorithm. Level I decisions will likely have little to no impact on the aforementioned four areas, and such decisions will often lead to impacts that are reversible and brief. Level II decisions will likely have moderate impacts, and the decisions will often lead to impacts that are likely reversible and short term. Level III decisions will likely have high impacts, and the decisions will often lead to impacts that can be difficult to reverse and are ongoing. And lastly, level IV decisions will have very high impacts, and such decisions will often lead to impacts that are irreversible and perpetual.

After the AIA tool determines the impact assessment level, the DADM then imposes specific requirements that reflect the significance of the decision. More significant systems will have a correspondingly greater risk than a simple, minor system. Consequently, the DADM imposes more stringent requirements for more significant algorithms. There are seven impact level requirements: peer review; notice; human-in-the-loop for decisions; explanation; training; contingency planning; and approval for the system to operate.

Each requirement imposes different obligations depending on the level of impact. For example, where more impactful decisions are to be taken, notice to this effect must be provided on the program or service website. To take another instance, whereas low-impact decisions can be taken without human intervention, high-impact decisions “cannot be made without having specific human intervention points during the decision-making process; and the final decision must be made by a human”. One last example is that individuals must be provided with a “meaningful explanation” of the decision: for low-impact decisions, an explanation in a general FAQ document for “common decisions results” will suffice but at the higher end of the impact scale the meaningful explanation is to be “provided with any decision that resulted in the denial of a benefit, a service, or other regulatory action”.
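The way the DADM scales obligations to impact can be sketched as a lookup from impact level to requirements. This is a rough illustration using only the three examples given above (notice, human intervention and explanation); the names are hypothetical, and the cut-off between "low-impact" and "high-impact" at Level III is an assumption made for the sketch, since the passage above does not say where the line falls. The other four requirements (peer review, training, contingency planning and approval) are not modelled.

```python
from enum import IntEnum
from typing import Dict


class ImpactLevel(IntEnum):
    """The four AIA impact levels described in the post."""
    LEVEL_I = 1
    LEVEL_II = 2
    LEVEL_III = 3
    LEVEL_IV = 4


# Assumption for this sketch: Levels III and IV are treated as "high-impact".
HIGH_IMPACT_THRESHOLD = ImpactLevel.LEVEL_III


def requirements_for(level: ImpactLevel) -> Dict[str, str]:
    """Return the illustrative obligations attached to an impact level."""
    if level >= HIGH_IMPACT_THRESHOLD:
        return {
            "notice": "notice on the program or service website",
            "human_in_the_loop": (
                "specific human intervention points; "
                "final decision made by a human"
            ),
            "explanation": (
                "provided with any decision that resulted in the denial of "
                "a benefit, a service, or other regulatory action"
            ),
        }
    return {
        "notice": "not described in the text above",
        "human_in_the_loop": "decision may be taken without human intervention",
        "explanation": "general FAQ document covering common decision results",
    }
```

For example, `requirements_for(ImpactLevel.LEVEL_IV)["human_in_the_loop"]` returns the requirement that a human make the final decision, whereas the same lookup for `ImpactLevel.LEVEL_I` permits a fully automated decision.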

Plainly, the DADM provides a helpful framework. However, the framework is limited in its ability to enforce the principles of administrative law and administrative justice.[13]

First, the DADM does not ensure lawfulness. For one thing, the “meaningful explanation” requirement applies only when a negative decision is being rendered, not to positive decisions. Inasmuch as positive decisions are tainted by bias or a breach of the principles of administrative law, the DADM does not offer an accountability mechanism. For another thing, the DADM does not “anticipate” sub-delegation issues or other issues relating to the requirement to have a human decision-maker in or on the loop.[14] In addition, the focus on notice as a procedural requirement understates the obligation to provide an individual with a meaningful opportunity to respond.

Second, the DADM does not necessarily comply with the principles of administrative justice. There is no general requirement for human decision-making, save in cases of higher impact.[15] Yet there could well be instances where administrative justice requires the exercise of professional treatment or moral judgment from a discretionary decision but the DADM treats the decision as having low impact: the Robo-debt scandal in Australia is an example – as the consequences of the decision were reversible, the use of the algorithm might not have qualified as high-impact for DADM purposes. Furthermore, even allowing for the DADM to be tweaked to extend the requirement for human decision-making, as it currently stands the DADM does not address the risk of automation bias[16] (although it does seek to weed out system-level bias through its peer review requirements[17]).

Third, there are issues of scope. The DADM only applies to new processes introduced by the federal government: it does not apply retrospectively or retroactively and does not apply provincially. Moreover, the DADM only applies to “external services”, not to internal uses. Perhaps most importantly, the DADM is not binding and no sanctions are provided for non-compliance with its terms. This is a feature of all soft law.

Some of these shortcomings could be remedied by amendments to the DADM. But problems would remain. For one thing, although the DADM is inherently flexible and dynamic, these gains are offset by the absence of a binding mechanism for enforcement. For another, the DADM and any similar instrument will inevitably be general in its scope, applying in an across-the-board fashion to a wide range of technologies over a wide range of government departments: the advantage of its general applicability is offset by a corresponding lack of specificity.

Judicial Review

Let us begin with a positive story about judicial review as an accountability mechanism. In Barre v Canada (Citizenship and Immigration),[18] the Federal Court of Canada considered an application for judicial review of a decision vacating the refugee status of two individuals. The decision-maker had found that the applicants were not Somalis, as they had claimed, but rather Kenyans who had initially entered Canada on study permits under different names. The process of vacating their refugee status was initiated by the Minister of Public Safety and Emergency Preparedness, who introduced evidence of photo comparisons between the applicants and the Kenyan students. The applicants objected, on the basis that the comparisons were probably based on the use of facial recognition technology that is notoriously unreliable when used on Black women and other women of colour. The decision-maker was not swayed by the objections and vacated their refugee status. Go J quashed the decision, mostly because the decision-maker had offered inadequate reasons for refusing to inquire into the Minister’s alleged use of facial recognition software.[19] This was particularly problematic because of the reliance on software liable to misidentify women of colour.[20] Accordingly, admitting the photo comparisons into evidence without qualification was a fatal error.

Here, judicial review worked effectively in ensuring the procedural fairness and substantive reasonableness of a decision-making process in which technology was a component. This is judicial review operating as it should, to enforce minimum standards of lawfulness in public administration.

Scope of Judicial Review

However, the range of cases in which a reviewing judge will ride to the rescue is far from boundless. One immediate difficulty with judicial review as an accountability mechanism for the use of technology is that the scope of judicial review is limited. Evidently, judicial review is about administrative law, not administrative justice, and is thereby concerned with a relatively narrow range of issues. In addition, where technology is producing positive decisions only (as in the Canadian immigration system), there will be nothing to judicially review, as the successful applicant has already received status.

Moreover, decisions which do not affect an individual’s rights, interests or privileges escape judicial review entirely. For example, the use of technology to better direct scarce enforcement resources would be unreviewable save in egregious circumstances: when a regulator ‘opens a file’ on a person suspected of breaching their regulatory obligations no enforceable administrative law obligation arises; fairness only kicks in when concrete steps towards a sanction are taken. Judicial intervention is only plausible in situations where there is a pattern of evidence supporting the proposition that enforcement is being conducted in a way that breaches rights, perhaps because there is targeting of a particular group. Such cases are rare, if only because the means of proof are typically inaccessible.

Judicial Review is Based on the Record

This leads to the second major limitation of judicial review: the content of the record. Judicial review is conducted on the basis of the information before the decision-maker at the time of decision. Yet as far as the use of technology is concerned, a great deal of relevant information will not be before the decision-maker and, therefore, not before the court.

In order to ensure lawfulness, the administrative record must give sufficient details of the relationship between the human decision-maker and the machine in question (out of, on, or in the decision-making loop). It is useful to consider, again, the use of artificial administration to automate approvals for temporary residence visas and sponsorship in Canada. Recall that applications are triaged into different groups, with low-complexity files being approved for eligibility automatically but higher-complexity files requiring human review. It is very difficult to know whether officers conducting human reviews are likely to be influenced in their review by the presence of upstream automation. There is no obvious trigger for confirmation bias but officers may, over time, come to recognise the features of low-complexity files and conceivably focus greater attention on medium- or high-complexity files.[21] If officers are now receiving higher-complexity files for decision, they are apt to apply a higher level of scrutiny to those files than they did before, with their priors favouring close analysis at the least and potentially even rejection. But none of this information would be put before a reviewing court. The judges would simply not be in a position to enforce the principles of administrative law. Moreover, they may not have the necessary expertise to do so.[22]

Judicial Review Remedies

A third shortcoming of judicial review is that remedies are limited. Consider again the story of examination grading in the United Kingdom. Judicial review was unlikely to provide a meaningful remedy for the shortcomings of Ofqual’s algorithm. Individual claims for judicial review would, at best, have resulted in individualised remedies which could only be declaratory, as quashing the results given would have been of no practical benefit to the student concerned. Similarly, any relief relating to broad issues of systemic unfairness could only have been declaratory. An erstwhile claimant could also have joined a university to the claim, but a reviewing court would be singularly unlikely even in these unusual circumstances to issue a mandatory order to a university to admit a student, still less a class of students.

In the final section of the paper, I discuss how a specialized watchdog could respond to the challenges of ensuring lawful and moral use of technology in public administration.

[1] Immigration and Refugee Protection Act, SC 2001, c 27, s. 186.1.

[2] Social Security Act 1998, s. 2(1).

[3] (2018) 262 FCR 41; [2018] FCAFC 79.

[4] Ibid., at para. 140.

[5] Michael Veale and Reuben Binns, “Is that your final decision? Multi-stage profiling, selective effects, and Article 22 of the GDPR” (2021) 11 International Data Privacy Law 319.

[6] Sandra Wachter, Brent Mittelstadt and Luciano Floridi, “Why a Right to Explanation of Automated Decision-Making Does Not Exist in the General Data Protection Regulation” (2017) 7 International Data Privacy Law 76.

[7] See e.g. Gianclaudio Malgieri and Giovanni Comandé, “Why a Right to Legibility of Automated Decision-Making Exists in the General Data Protection Regulation” (2017) 7 International Data Privacy Law 243.

[8] Michael Veale and Lilian Edwards, “Clarity, Surprises, and Further Questions in the Article 29 Working Party Draft Guidance on Automated Decision-Making and Profiling” (2018) 34 Computer Law & Security Review 398.

[9] Gianclaudio Malgieri, “Automated Decision-Making in the EU Member States: The Right to Explanation and Other “Suitable Safeguards” in the National Legislations” (2019) 35 Computer Law & Security Review 13.

[10] Harlow and Rawlings, above n.2.

[11] Thomas McGarity, “Some Thoughts on ‘Deossifying’ the Rule-making Process” (1992) 41 Duke Law Journal 1385.

[12] See also, from an earlier era, the Australian government’s Automated Assistance in Administrative Decision-Making: Better Practice Guide (Commonwealth of Australia 2007). See generally Janina Boughey and Katie Miller, above n.1.

[13] Teresa Scassa, “Administrative Law and the Governance of Automated Decision-Making: A Critical Look at Canada’s Directive on Automated Decision-Making” (2021) 54 University of British Columbia Law Review 251.

[14] Ibid., at p. 278.

[15] Ibid., at p. 288.

[16] Ibid., at pp. 279-281.

[17] Ibid., at pp. 281-283.

[18] 2022 FC 1078.

[19] Ibid., at paras. 43-55.

[20] Ibid., at para. 56.

[21] Algorithmic Impact Assessment – Advanced Analytics Triage of Visitor Record Applications, at pp. 6-7.

[22] See also Rebecca Williams, “Rethinking Administrative Law for Algorithmic Decision Making” (2022) 42 Oxford Journal of Legal Studies 468.

This content has been updated on May 15, 2023 at 21:42.