Algorithms, Administrative Law and Administrative Justice

On three occasions this summer, governmental bodies in the United Kingdom have withdrawn or substantially modified decision-making processes which relied on algorithms to produce automated decisions. In the areas of immigration, welfare and benefit administration, and examination grading, the bodies involved have performed U-turns. Taken together, the three episodes demonstrate the limitations imposed on ‘artificial administration’ by administrative law and administrative justice. From an administrative law standpoint, the constraints of rationality/reasonableness and fairness restrict the ends to which algorithms can be put. And viewed from an administrative justice perspective, over-reliance on automated decision-making processes removes the individualized exercises of context-sensitive discretion which are vital to the legitimacy of governmental decisions.

The immigration streaming tool’s operation is described as follows by Rafe Jennings:

The “streaming tool” was an algorithmic system designed to categorise visa applications with reference to how much scrutiny each application needed. It would assign an application a red, amber, or green rating: red indicated that the application’s case worker ought to spend more time applying scrutiny, and would have to justify approving the application to a more senior officer. Applications with a red rating were much less likely to be successful than those rated green, with around 99.5% of green being successful but only 48.59% of red. 

Use of the streaming tool was attacked for breaching the Equality Act 2010 (as it discriminated on the basis of nationality by assigning some nationalities to the red rating) and for common law irrationality. Indeed, these grounds were mutually reinforcing: the more applications were assigned to the red rating on the basis of nationality, the more the tool became hard-wired to assign certain nationalities to the red rating. And once there, confirmation bias kicked in, with officers subjecting red-rated applications to greater scrutiny. As Jennings explains:

The vicious circle present in the streaming tool “produce[d] substantive results which [were] irrational”. Because visa application refusals were considered to be adverse events, and those same adverse events fed into the algorithm’s decision making, certain nationalities were locked onto the EANRA “suspect” list. This further increased the number of adverse events associated with that nationality, in turn contributing to its position on the EANRA list. As such, the algorithm would class applications as high risk merely because it had done so in the past. Foxglove argued that this constituted irrationality. 
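To make the mechanism concrete, here is a minimal, hypothetical simulation of the feedback loop described above. The thresholds, probabilities and function names are my own illustrative assumptions, not the Home Office's actual streaming tool; the point is only to show how past refusals and extra scrutiny can reinforce one another.

```python
# A hypothetical sketch of the vicious circle: historical refusals drive the
# rating, and the rating drives further refusals. Not the actual tool.
import random
from collections import defaultdict

refusal_counts = defaultdict(int)      # past "adverse events" per nationality
application_counts = defaultdict(int)  # total applications seen per nationality

def risk_rating(nationality: str) -> str:
    """Rate an application red/amber/green from historical refusal rates."""
    total = application_counts[nationality]
    if total == 0:
        return "green"
    refusal_rate = refusal_counts[nationality] / total
    if refusal_rate > 0.5:
        return "red"
    if refusal_rate > 0.2:
        return "amber"
    return "green"

def process_application(nationality: str) -> bool:
    """Simulate an officer's decision on one application."""
    refusal_probability = 0.1
    if risk_rating(nationality) == "red":
        # Confirmation bias: red-rated files attract extra scrutiny,
        # so refusal becomes more likely ...
        refusal_probability += 0.4
    refused = random.random() < refusal_probability
    application_counts[nationality] += 1
    if refused:
        # ... and every refusal feeds back into the historical data,
        # entrenching the nationality's position on the "suspect" list.
        refusal_counts[nationality] += 1
    return not refused
```

In this toy model, a nationality that happens to accumulate a few early refusals drifts towards a permanent red rating, which is precisely the self-justifying classification Foxglove attacked as irrational.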

Interestingly, the Canadian government has been using an algorithmic system for some types of temporary resident visa. An initial distinction is made between complex and non-complex cases, with complex cases assigned for ‘traditional’ decision-making. The non-complex cases are triaged by an algorithm. First, applications identified as straightforward (low complexity) are approved automatically (as to eligibility; there is a separate process for admissibility). Second, medium-complexity and high-complexity files are sent to officers for traditional processing. An important distinction between the UK and Canadian processes is that in Canada the labels applied are based on complexity, not a probabilistic judgement about the likelihood that the application will be successful. As such, in the Canadian process there is no obvious trigger for confirmation bias. Of course, if the facts were to reveal that the complexity determination had an impact on the level of scrutiny exercised by officers, this would be potentially problematic when measured against the demands of justification and responsiveness imposed by the Supreme Court of Canada’s conception of reasonableness review.
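By way of contrast, here is a hypothetical sketch of complexity-based triage along the lines of the Canadian model described above. The fields and rules are assumptions of mine, not IRCC's actual criteria; they simply illustrate that the label reflects how hard a file is to assess rather than a prediction of success, so a given rating does not prime an officer to refuse.

```python
# A hypothetical sketch of complexity-based triage; fields and rules are
# assumptions, loosely modelled on the Canadian approach described above.
from dataclasses import dataclass

@dataclass
class Application:
    documents_complete: bool   # all supporting documents provided
    prior_refusal: bool        # applicant has a previous refusal on file
    purpose: str               # e.g. "tourism", "study", "work"

def complexity(app: Application) -> str:
    """Label a file low/medium/high complexity; no prediction of success is
    made and no protected characteristic (such as nationality) is used."""
    if app.prior_refusal:
        return "high"
    if app.documents_complete:
        return "low"
    return "medium"

def triage(app: Application) -> str:
    """Low-complexity files are approved automatically on eligibility only;
    everything else goes to an officer for traditional processing."""
    if complexity(app) == "low":
        return "auto-approve eligibility (admissibility assessed separately)"
    return "route to officer"
```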

The welfare and benefit controversy is explained in a story in the Guardian entitled “Councils scrapping use of algorithms in benefit and welfare decisions”:

The Guardian has found that about 20 councils have stopped using an algorithm to flag claims as “high risk” for potential welfare fraud. The ones they flagged were pulled out by staff to double-check, potentially slowing down people’s claims without them being aware.

Previous research by the Guardian found that one in three councils were using algorithms to help make decisions about benefit claims and other welfare issues.

Research from Cardiff Data Justice Lab (CDJL), working with the Carnegie UK Trust, has been looking at cancelled algorithm programmes.

According to them, Sunderland council has stopped using one which was designed to help it make efficiency savings of £100m.

Their research also found that Hackney council in east London had abandoned using data analytics to help predict which children were at risk of neglect and abuse.

The Data Justice Lab found at least two other councils had stopped using a risk-based verification system – which identifies benefit claims that are more likely to be fraudulent and may need to be checked.

One council found it often wrongly identified low-risk claims as high-risk, while another found the system did not make a difference to its work.

These are, to put it mildly, not very encouraging episodes for proponents of leveraging technology to improve administrative decision-making. Note that the problems identified here relate principally to accuracy and efficiency, with algorithms being withdrawn because they made erroneous determinations or made no difference to outcomes. But consider also the irony of putting humans in the loop — so that outcomes are not entirely determined by a machine — only for this to slow down the decision-making process and cause hardship to individual claimants. Of course, an alternative interpretation of the report is that many claimants who otherwise might have been investigated by an officer and thus subject to payment delays were cleared by the algorithm. Nonetheless, these episodes confirm that any integration of algorithms into existing decision-making structures must be done with great care, due attention being paid to the challenge of striking a balance between (machine) automation and (human) discretion.

Lastly, the examination grading episode illuminates the constraints of administrative law but also its limitations in terms of providing effective remedies and thus highlights the utility of thinking about automation and algorithms in administrative justice terms. Briefly, with in-person examinations rendered impossible by the COVID-19 pandemic, a means had to be found of grading students at GCSE and A-Level. The stakes were high, as these results have a lasting influence on employability and students’ self-esteem. University entrance is also largely determined by A-Level results. Accordingly, the Secretary of State for Education directed Ofqual, the exams watchdog, to develop a grading algorithm. As early as late July, Joe Tomlinson and Jack Maxwell identified problems with Ofqual’s approach:

[T]here are credible concerns that the Ofqual model might systematically disadvantage particular types of students. High achievers in historically low-performing schools, or members of groups that systematically outperform their teachers’ expectations in exams (e.g. students from low-income families or ethnic minority students), might be given lower grades than they deserve. For schools with typically small cohorts and fluctuating grades from year to year, such as small special educational needs providers, historical data might not be a reliable predictor of future performance. This kind of disparate impact could ultimately call the legality of the model into question.
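To see how anchoring this year's grades to a school's historical results can produce that kind of disparate impact, consider a deliberately simplified sketch. It is not Ofqual's actual model; the grade shares, names and function are assumptions, but it captures the basic mechanism of fitting a ranked cohort to a past distribution.

```python
# A simplified, hypothetical sketch of moderating grades against a school's
# historical distribution; not Ofqual's actual model.
def moderated_grades(ranked_students: list[str],
                     historical_shares: dict[str, float]) -> dict[str, str]:
    """Assign grades to a cohort ranked best-to-worst so that the proportion
    of each grade matches the school's historical distribution."""
    grades: dict[str, str] = {}
    cohort_size = len(ranked_students)
    index = 0
    for grade, share in historical_shares.items():   # ordered best to worst
        quota = round(share * cohort_size)
        for student in ranked_students[index:index + quota]:
            grades[student] = grade
        index += quota
    lowest_grade = list(historical_shares)[-1]
    for student in ranked_students[index:]:          # rounding remainder
        grades[student] = lowest_grade
    return grades

# If this school has never awarded the top grade, even an exceptional
# candidate this year cannot receive one under this kind of moderation.
print(moderated_grades(["Asha", "Ben", "Cleo", "Dev"],
                       {"A": 0.25, "B": 0.50, "C": 0.25}))
# {'Asha': 'A', 'Ben': 'B', 'Cleo': 'B', 'Dev': 'C'}
```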

As Tomlinson and Maxwell noted, Ofqual would have to disclose further details of its algorithm. So it did, but this only fuelled further criticism. When the algorithmic grades were produced, criticism turned to consternation (see here and here). Claims for judicial review were threatened on various grounds; and one could certainly see how the algorithm could lead to systemic unfairness in the sense that it “creates a real risk of a more than minimal number” of irrational decisions (R (BF (Eritrea)) v Secretary of State for the Home Department [2019] EWCA Civ 872, [2020] 4 WLR 38, [63] (Underhill LJ)). Eventually, the government relented and modified the system for calculating grades; universities, too, in conjunction with the government, made efforts to make more places available. The appeals system was, for many observers, the key, as it would allow corrections to be made at the margins where the algorithm had produced inaccurate grades.

Two observations are appropriate. First, judicial review was unlikely to provide a meaningful remedy for the shortcomings of Ofqual’s algorithm. Individual claims for judicial review would, at best, have resulted in individualized remedies which could only be declaratory (as quashing the grades awarded would have been of no practical benefit to the student concerned). Similarly, any relief relating to broad issues of systemic unfairness could only have been declaratory. A would-be claimant could also have joined a university to the claim, but a reviewing court would be singularly unlikely, even in these unusual circumstances, to issue a mandatory order requiring a university to admit a student, still less a class of students. (I leave aside the remedies available under data protection law, with which I am not very familiar).

Second, what was really required was an element of individualized discretion, in the form of an appeals system in which a human (or panel of humans) would look at the circumstances of individual cases where aberrant grades were produced and, if appropriate, substitute its judgement for that of the algorithm. This, I would say, is a matter of administrative justice, going to the legitimacy of the decision-making process. Even if Ofqual could ‘get away with it’ in administrative law terms, the system it created was deficient in administrative justice terms. Failing to provide an appeal against life-changing algorithmic determinations, in which humans could exercise individualized discretion, proved fatal to the legitimacy of the process.

Indeed, administrative justice concerns could also be raised about the immigration streaming tool and the welfare and benefit decisions. In both of these instances, important decisions were to be taken and one of the problems with the automation of the processes was that humans — and individualized discretion — were either taken out of the loop entirely or their involvement tainted by confirmation bias.

Accordingly, the lesson of these episodes is that decision-makers attracted by the lure of quicker, more efficient algorithm-powered processes should be attentive both to the constraints of administrative law and to the demands of administrative justice. Exercises of governmental power must be acceptable in administrative law terms and should be legitimate in administrative justice terms.