3/22/2020 Prediction, persuasion, and the jurisprudence of behaviourism: UC MegaSearch

eds.a.ebscohost.com/eds/detail/detail?vid=1&sid=80a53932-b932-4bf6-926e-093727bceef6%40sessionmgr4007&bdata=JkF1dGhUeXBlPXNoaWIm… 1/14

Title: Prediction, persuasion, and the jurisprudence of behaviourism By: Frank Pasquale, Glyn Cashwell, 17101174, Vol. 68, Issue 1

Database: ProjectMUSE

Prediction, persuasion, and the jurisprudence of behaviourism

There is a growing literature critiquing the unreflective application of big data, predictive analytics, artificial intelligence, and machine-learning techniques to social problems. Such methods may reflect biases rather than reasoned decision making. They also may leave those affected by automated sorting and categorizing unable to understand the basis of the decisions affecting them. Despite these problems, machine-learning experts are feeding judicial opinions to algorithms to predict how future cases will be decided. We call the use of such predictive analytics in judicial contexts a jurisprudence of behaviourism as it rests on a fundamentally Skinnerian model of cognition as a black-boxed transformation of inputs into outputs. In this model, persuasion is passé; what matters is prediction. After describing and critiquing a recent study that has advanced this jurisprudence of behaviourism, we question the value of such research. Widespread deployment of prediction models not based on the meaning of important precedents and facts may endanger the core rule-of-law values.

artificial intelligence; cyber law; machine learning; jurisprudence; predictive analysis

I Introduction

A growing chorus of critics are challenging the use of opaque (or merely complex) predictive analytics programs to monitor, influence, and assess individuals’ behaviour. The rise of a ‘black box society’ portends profound threats to individual autonomy; when critical data and algorithms cannot be a matter of public understanding or debate, both consumers and citizens are unable to comprehend how they are being sorted, categorized, and influenced.[ 2]

A predictable counter-argument has arisen, discounting the comparative competence of human decision makers. Defending opaque sentencing algorithms, for instance, Christine Remington (a Wisconsin assistant attorney general) has stated: ‘We don’t know what’s going on in a judge’s head; it’s a black box, too.’[ 3] Of course, a judge must (upon issuing an important decision) explain why the decision was made; so too are agencies covered by the Administrative Procedure Act obliged to offer a ‘concise statement of basis and purpose’ for rule making.[ 4] But there is a long tradition of realist commentators dismissing the legal justifications adopted by judges as unconvincing fig leaves for the ‘real’ (non-legal) bases of their decisions.

In the first half of the twentieth century, the realist disdain for stated rationales for decisions led in at least two directions: toward more rigorous and open discussions of policy considerations motivating judgments and toward frank recognition of judges as political actors, reflecting certain ideologies, values, and interests. In the twenty-first century, a new response is beginning to emerge: a deployment of natural language processing and machine-learning (ML) techniques to predict whether judges will hear a case and, if so, how they will decide it. ML experts are busily feeding algorithms with the opinions of the Supreme Court of the United States, the European Court of Human Rights, and other judicial bodies as well as with metadata on justices’ ideological commitments, past voting record, and myriad other variables. By processing data related to cases, and the text of opinions, these systems purport to predict how judges will decide cases, how individual judges will vote, and how to optimize submissions and arguments before them.

This form of prediction is analogous to forecasters using big data (rather than understanding underlying atmospheric dynamics) to predict the movement of storms. An algorithmic analysis of a database of, say, 10,000 past cumulonimbi sweeping over Lake Ontario may prove to be a better predictor of the next cumulonimbus’s track than a trained meteorologist without access to such a data trove. From the perspective of many predictive analytics approaches, judges are just like any other feature of the natural world – an entity that transforms certain inputs (such as briefs and advocacy documents) into outputs (decisions for or against a litigant). Just as forecasters predict whether a cloud will veer southwest or southeast, the user of an ML system might use machine-readable case characteristics to predict whether a rainmaker will prevail in the courtroom.

We call the use of algorithmic predictive analytics in judicial contexts an emerging jurisprudence of behaviourism, since it rests on a fundamentally Skinnerian model of mental processes as a black-boxed transformation of inputs into outputs.[ 5] In this model, persuasion is passé; what matters is prediction.[ 6] After describing and critiquing a recent study typical of this jurisprudence of behaviourism, we question the value of the research program it is advancing. Billed as a method of enhancing the legitimacy and efficiency of the legal system, such modelling is all too likely to become one more tool deployed by richer litigants to gain advantages over poorer ones.[ 7] Moreover, it should raise suspicions if it is used as a triage tool to determine the priority of cases. Such predictive analytics are only as good as the training data on which they depend, and there is good reason to doubt such data could ever generate in social analysis the types of ground truths characteristic of scientific methods applied to the natural world. While fundamental physical laws rarely if ever change, human behaviour can change dramatically in a short period of time. Therefore, one should always be cautious when applying automated methods in the human context, where factors as basic as free will and political change make the behaviour of both decision makers, and those they impact, impossible to predict with certainty.[ 8]

Nor are predictive analytics immune from bias. Just as judges bring biases into the courtroom, algorithm developers are prone to incorporate their own prejudices and priors into their machinery.[ 9] In addition, biases are no easier to address in software than in decisions justified by natural language. Such judicial opinions (or even oral statements) are generally much less opaque than ML algorithms. Unlike many proprietary or hopelessly opaque computational processes proposed to replace them, judges and clerks can be questioned and rebuked for discriminatory behaviour.[ 10] There is a growing literature critiquing the unreflective application of ML techniques to social problems.[ 11] Predictive analytics may reflect biases rather than reasoned decision making.[ 12] They may also leave those affected by automated sorting and categorizing unable to understand the basis of the decisions affecting them, especially when the output from the models in any way affects one’s life, liberty, or property rights and when litigants are not given the basis of the model’s predictions.[ 13]

This article questions the social utility of prediction models as applied to the judicial system, arguing that their deployment may endanger core rule-of-law values. In full bloom, predictive analytics would not simply be a camera trained on the judicial system, reporting on it, but it would also be an engine of influence, shaping it. Attorneys may decide whether to pursue cases based on such systems; courts swamped by appeals or applications may be tempted to use ML models to triage or prioritize cases. In work published to widespread acclaim in 2016, Nikolaos Aletras, Dimitrios Tsarapatsanis, Daniel Preoţiuc-Pietro, and Vasileios Lampos made bold claims about the place of natural language processing (NLP) in the legal system in their article Predicting Judicial Decisions of the European Court of Human Rights: A Natural Language Processing Perspective.[ 14] They claim that ‘advances in Natural Language Processing (NLP) and Machine Learning (ML) provide us with the tools to automatically analyse legal materials, so as to build successful predictive models of judicial outcomes.’[ 15] Presumably, they are referring to their own work as part of these advances. However, close analysis of their ‘systematic study on predicting the outcome of cases tried by the European Court of Human Rights based solely on textual content’ reveals that their soi-disant ‘success’ merits closer scrutiny on both positive and normative grounds.

The first question to be asked about a study like Predicting Judicial Decisions is: what are its uses and purposes? Aletras and colleagues suggest at least three uses. First, they present their work as a first step toward the development of ML and NLP software that can predict how judges and other authorities will decide legal disputes. Second, Aletras has clearly stated to media that artificial intelligence ‘could also be a valuable tool for highlighting which cases are most likely to be violations of the European Convention of Human Rights’ – in other words, that it could help courts triage which cases they should hear.[ 16] Third, they purport to intervene in a classic jurisprudential debate – whether facts or law matter more in judicial determinations.[ 17] Each of these aims and claims should be rigorously interrogated, given shortcomings of the study that the authors acknowledge. Beyond these acknowledged problems, there are even more faults in their approach which cast doubt on whether the research program of NLP-based prediction of judicial outcomes, even if pursued in a more realistic manner, has anything significant to contribute to our understanding of the legal system.

Although Aletras and colleagues have used cutting-edge ML and NLP methods in their study, their approach metaphorically stacks the deck in favour of their software and algorithms in so many ways that it is hard to see its relevance to either practising lawyers or scholars. Nor is it plausible to state that a method this crude, and disconnected from actual legal meaning and reasoning, provides empirical data relevant to jurisprudential debates over legal formalism and realism. As more advanced thinking on artificial intelligence and intelligence augmentation has already demonstrated, there is an inevitable interface of human meaning that is necessary to make sense of social institutions like law.

II Stacking the deck: ‘predicting’ the contemporaneous

The European Court of Human Rights (ECtHR) hears cases in which parties allege that their rights under the articles of the European Convention on Human Rights were violated and not remedied by their country’s courts.[ 18] The researchers claim that the textual model has an accuracy of ‘79% on average.’[ 19] Given sweepingly futuristic headlines generated by the study (including ‘Could AI [Artificial Intelligence] Replace Judges and Lawyers?’), a casual reader of reports on the study might assume that this finding means that, using the method of the researchers, those who have some aggregation of data and text about case filings can use that data to predict how the ECtHR will decide a case, with 79 per cent accuracy.[ 20] However, that would not be accurate. Instead, the researchers used the ‘circumstances’ subsection in the cases they claimed to ‘predict,’ which had ‘been formulated by the Court itself.’[ 21] In other words, they claimed to be ‘predicting’ an event (a decision) based on materials released simultaneously with the decision. This is a bit like claiming to ‘predict’ whether a judge had cereal for breakfast yesterday based on a report of the nutritional composition of the materials on the judge’s plate at the exact time she or he consumed the breakfast.[ 22] Readers can (and should) balk at using the term ‘prediction’ to describe correlations between past events (like decisions of a court) and contemporaneously generated, past data (like the circumstances subsection of a case). Sadly, though, few journalists breathlessly reporting the study by Aletras and colleagues did so.

To their credit, though, Aletras and colleagues repeatedly emphasize how much they have effectively stacked the deck by using ECtHR-generated documents themselves to help the ML/NLP software they are using in the study ‘predict’ the outcomes of the cases associated with those documents. A truly predictive system would use the filings of the parties, or data outside the filings, that was in existence before the judgment itself. Aletras and colleagues grudgingly acknowledge that the circumstances subsection ‘should not always be understood as a neutral mirroring of the factual background of the case,’ but they defend their method by stating that the ‘summaries of facts found in the “Circumstances” section have to be at least framed in as neutral and impartial a way as possible.’[ 23] However, they give readers no clear guide as to when the circumstances subsection is actually a neutral mirroring of factual background or how closely it relates to records in existence before a judgment that would actually be useful to those aspiring to develop a predictive system.

Instead, their ‘premise is that published judgments can be used to test the possibility of a text-based analysis for ex ante predictions of outcomes on the assumption that there is enough similarity between (at least) certain chunks of the text of published judgments and applications lodged with the Court and/or briefs submitted by parties with respect to pending cases.’[ 24] But they give us few compelling reasons to accept this assumption since almost any court writing an opinion to justify a judgment is going to develop a facts section in ways that reflect its outcome. The authors state that the ECtHR has ‘limited fact finding powers,’ but they give no sense of how much that mitigates the problem of cherry-picking facts or statements about the facts. Nor should we be comforted by the fact that ‘the Court cannot openly acknowledge any kind of bias on its part.’ Indeed, this suggests a need for the Court to avoid the types of transparency in published justification that could help researchers artificially limited to NLP better understand it.[ 25] The authors also state that in the ‘vast majority of cases,’ the ‘parties do not seem to dispute the facts themselves, as contained in the “Circumstances” subsection, but only their legal significance.’ However, the critical issues here are, first, the facts themselves and, second, how the parties characterized the facts before the circumstances section was written. Again, the fundamental problem of mischaracterization – of ‘prediction’ instead of mere correlation or relationship – crops up to undermine the value of the study.

Even in its most academic mode – as an ostensibly empirical analysis of the prevalence of legal realism – the study by Aletras and colleagues stacks the deck in its favour in important ways. Indeed, it might be seen as assuming at the outset a version of the very hypothesis it ostensibly supports. This hypothesis is that something other than legal reasoning itself drives judicial decisions. Of course, that is true in a trivial sense – there is no case if there are no facts – and perhaps the authors intend to make that trivial point.[ 26] However, their language suggests a larger aim, designed to meld NLP and jurisprudence. Given the critical role of meaning in the latter discipline, and their NLP methods’ indifference to it, one might expect an unhappy coupling here. And that is indeed what we find.

In the study by Aletras and colleagues, the corpus used for the predictive algorithm was a body of ECtHR’s ‘published judgments.’ Within these judgments, the factual background of the case was summarized (by the Court) in the circumstances section, but the pleadings themselves were not included as inputs.[ 27] The law section, which ‘considers the merits of the case, through the use of legal argument,’ was also input into the model to determine how well that section alone could ‘predict’ the case outcome.[ 28]

Aletras and colleagues were selective in the corpus they fed to their algorithms. The only judgments that were included in the corpus were those that passed both a ‘prejudicial stage’ and a second review.[ 29] In both stages, applications were denied if they did not meet ‘admissibility criteria,’ which were largely procedural in nature.[ 30] To the extent that such procedural barriers were deemed ‘legal,’ we might immediately have identified a bias problem in the corpus – that is, the types of cases where the law entirely determined the outcome (no matter how compelling the facts may have been) were removed from a data set that was ostensibly fairly representative of the universe of cases generally. This is not a small problem either; the overwhelming majority of applications were deemed inadmissible or struck out and were not reportable.[ 31]

But let us assume, for now, that the model only aspired to offer data about the realist/formalist divide in those cases that did meet the admissibility criteria. There were other biases in the data set. Only cases that were in English, approximately 33 per cent of the total ECtHR decisions, were included.[ 32] This is a strange omission since the NLP approach employed here had no semantic content – that is, the meaning of the words did not matter to it. Presumably, this omission arose out of concerns for making data coding and processing easier. There was also a subject matter restriction that further limited the scope of the sample. Only cases addressing issues in Articles 3, 6, and 8 of the ECHR were included in training and in verifying the model. And there is yet another limitation: the researchers then threw cases out randomly (so that the data set contained an equal number of violation/no violation cases) before using them as training data.[ 33]
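
The class-balancing step described above, and the 50 per cent floor it creates for any later accuracy figure, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy corpus of 80 violation and 20 no-violation cases is invented, and the function names are hypothetical rather than taken from the study.

```python
import random
from collections import Counter


def balance_by_undersampling(cases, seed=0):
    """Randomly drop cases from the larger class until both classes are equal in size."""
    rng = random.Random(seed)
    violations = [c for c in cases if c["label"] == "violation"]
    non_violations = [c for c in cases if c["label"] == "no violation"]
    n = min(len(violations), len(non_violations))
    balanced = rng.sample(violations, n) + rng.sample(non_violations, n)
    rng.shuffle(balanced)
    return balanced


def majority_baseline_accuracy(cases):
    """Accuracy achieved by always guessing the most common label."""
    counts = Counter(c["label"] for c in cases)
    return max(counts.values()) / len(cases)


# Hypothetical toy corpus (invented numbers, not the study's data).
corpus = [{"id": i, "label": "violation"} for i in range(80)]
corpus += [{"id": 80 + i, "label": "no violation"} for i in range(20)]

balanced = balance_by_undersampling(corpus)
print(len(corpus), majority_baseline_accuracy(corpus))      # 100 0.8
print(len(balanced), majority_baseline_accuracy(balanced))  # 40 0.5
```

In this toy example, balancing discards 60 of the 100 cases and lowers the accuracy of blind guessing from 80 per cent to 50 per cent, which is why an accuracy figure reported on a balanced set should be read against a 50 per cent baseline.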

III Problematic characteristics of the ECtHR textual ‘predictive’ model

The algorithm used in the study depended on an atomization of case language into words grouped together in sets of one-, two-, three-, and four-word groupings, called n-grams.[ 34] Then, 2,000 of the most frequent n-grams, not taking into consideration ‘grammar, syntax and word order,’ were placed in feature matrices for each section of decisions and for the entire case by using the vectors from each decision.[ 35] Topics, which are created by ‘clustering together n-grams,’ were also created.[ 36] Both topics and n-grams were used ‘to train Support Vector Machine (SVM) classifiers.’ As the authors explain, an ‘SVM is a machine learning algorithm that has shown particularly good results in text classification, especially using small data sets.’[ 37] Model training data from these opinions were ‘n-gram features,’ which consist of groups of words that ‘appear in similar contexts.’[ 38] Matrix mathematics, which are manipulations on two-dimensional tables, and vector space models, which are based on a single column within a table, were programmed to determine clusters of words that should be similar to one another based on textual context.[ 39] These clusters of words are called topics. The model prevented a word group from showing up in more than one topic. Thirty topics, or sets of similar word groupings, were also created for entire court opinions. Topics were similarly created for entire opinions for each article.[ 40] Since the court opinions all follow a standard format, the opinions could be easily dissected into different identifiable sections.[ 41] Note that these sorting methods are legally meaningless. N-grams and topics are not sorted the way a treatise writer might try to organize cases or a judge might try to parse divergent lines of precedent. Rather, they simply serve as potential independent variables to predict a dependent variable (was there a violation, or was there not a violation, of the Convention).
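
To make the feature-extraction step concrete, the following Python sketch builds one- to four-word n-grams, keeps the most frequent ones as a vocabulary, and turns each document into a row of counts. It is a simplified illustration and not the authors' pipeline: the two sample sentences are invented, the function names are hypothetical, whitespace tokenization is an assumption, and the topic clustering and SVM training are omitted.

```python
from collections import Counter


def ngrams(tokens, max_n=4):
    """Yield every contiguous group of 1 to max_n tokens as a tuple."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def build_vocabulary(documents, top_k=2000):
    """Keep the top_k most frequent n-grams across all documents."""
    totals = Counter()
    for text in documents:
        totals.update(ngrams(text.lower().split()))
    return [gram for gram, _ in totals.most_common(top_k)]


def feature_vector(text, vocabulary):
    """Count how often each vocabulary n-gram occurs in one document."""
    counts = Counter(ngrams(text.lower().split()))
    return [counts[gram] for gram in vocabulary]


# Invented sample texts standing in for sections of two judgments.
docs = [
    "the applicant was detained without review",
    "the applicant was released after review",
]
vocab = build_vocabulary(docs, top_k=10)
# One row per document, one column per retained n-gram.
matrix = [feature_vector(d, vocab) for d in docs]
for row in matrix:
    print(row)
```

Each row records only how often a word group occurs. Nothing in the matrix encodes what the words mean, who did what to whom, or how a passage relates to precedent, which is the sense in which such features are legally meaningless.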

Before going further into the technical details of the study, it is useful to compare it to prior successes of ML in facial or number recognition. When a facial recognition program successfully identifies a given picture as an image of a given person, it does not achieve that machine vision in the way a human being’s eye and brain would do so. Rather, an initial training set of images (or perhaps even a single image) of the person is processed, perhaps on a 1,000-by-1,000-pixel grid. Each box in the grid can be identified as either skin or not skin, smooth or not smooth, along hundreds or even thousands of binaries, many of which would never be noticed by a human being. Moreover, such parameters can be related to one another; so, for example, regions hued as ‘lips’ or ‘eyes’ might have a certain maximum length, width, or ratio to one another (such that a person’s facial ‘signature’ reliably has eyes that are 1.35 times as long as they are wide). Add up enough of these ratios for easily recognized features (ears, eyebrows, foreheads, and so on), and software can quickly find a set of mathematical parameters unique to a given person – or at least unique enough that an algorithm can predict that a given picture is, or is not, a picture of a given person, with a high degree of accuracy. The technology found early commercial success with banks, which needed a way to recognize numbers on cheques (given the wide variety of human handwriting). With enough examples of written numbers (properly reduced to data via dark or filled spaces on a grid), and computational power, this recognition can become nearly perfect.

Before assenting too quickly to the application of such methods to words in cases (as we see them applied to features of faces), we should note that there are not professions of ‘face recognizers’ or ‘number recognizers’ among human beings. So while Facebook’s face recognition algorithm, or TD Bank’s cheque sorter, do not obviously challenge our intuitions about how we recognize faces or numbers, applying ML to legal cases should be marked as a jarring imperialism of ML methods into domains associated with a rich history of meaning (and, to use a classic term from the philosophy of social sciences, Verstehen). In the realm of face recognizing, ‘whatever works’ as a pragmatic ethic of effectiveness underwrites some societies’ acceptance of width/length ratios and other methods to assure algorithmic recognition and classification of individuals.[ 42] The application of ML approaches devoid of apprehension of meaning in the legal context is more troubling. For example, Aletras and colleagues acknowledge that there are cases where the model predicts the incorrect outcome because of the similarity in words in cases that have opposite results. In this case, even if information regarding specific words that triggered the SVM classifier were output, users might not be able to easily determine that the case was likely misclassified.[ 43] Even with confidence interval outputs, this type of problem does not appear to have an easy solution. 
This is particularly troubling for due process if such an algorithm, in error, incorrectly classified someone’s case because it contained language similarities to another very different case.[ 44] When the cases are obviously misclassified in this way, models like this would likely ‘surreptitiously embed biases, mistakes and discrimination, and worse yet, even reiterate and reinforce them on the new cases processed.’[ 45] So, too, might a batch of training data representing a certain time period when a certain class of cases were dominant help ensure the dominance of such cases in the future. For example, the ‘most predictive topic’ for Article 8 decisions included prominently the words ‘son, body, result, Russian.’ If the system were used in the future to triage cases, ceteris paribus, it might prioritize cases involving sons over daughters or Russians over Poles.[ 46] But if those future cases do not share the characteristics of the cases in the training set that led to the ‘predictiveness’ of ‘son’ status or ‘Russian’ status, their prioritization would be a clear legal mistake.
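
The misclassification risk described above can be illustrated with a deliberately simplified Python sketch. It assumes a single-word (unigram) bag of words, which is cruder than the one- to four-word groupings used in the study, since longer n-grams retain some local word order. The two sentences are invented for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count word occurrences, discarding all word order."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


# Invented sentences with opposite legal significance.
case_a = "the police attacked the applicant"
case_b = "the applicant attacked the police"

# The two word-count vectors are indistinguishable.
print(bag_of_words(case_a) == bag_of_words(case_b))  # True
print(cosine_similarity(bag_of_words(case_a), bag_of_words(case_b)))
```

Under this representation the two sentences are the same input, so any classifier built on it must give them the same output, however different the legal consequences of the facts they describe.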

Troublingly, the entire ‘predictive’ project here may be riddled with spurious correlations. As any student of statistics knows, if one tests enough data sets against one another, spurious correlations will emerge. For example, Tyler Vigen has shown a very tight correlation between the divorce rate in Maine and the per capita consumption of margarine between 2000 and 2009.[ 47] It is unlikely that one variable there is driving the other. Nor is it likely that some intervening variable is affecting both margarine consumption and divorce rates in a similar way, to ensure a similar correlation in the future. Rather, this is just the type of random association one might expect to emerge once one has thrown enough computing power at enough data sets.

It is hard not to draw similar conclusions with respect to Aletras and colleagues’ ‘predictive’ project. Draw enough variations from the ‘bag of words,’ and some relationships will emerge. Given that the algorithm only had to predict ‘violation’ or ‘no violation,’ even a random guessing program would be expected to have a 50 per cent accuracy rate. A thought experiment easily deflates the meaning of their trumpeted 79 per cent ‘accuracy.’ Imagine that the authors had continual real-time surveillance of every aspect of the judges’ lives before they wrote their opinions: the size of the buttons on their shirts and blouses, calories consumed at breakfast, average speed of commute, height and weight, and so forth. Given a near infinite number of parameters of evaluation, it is altogether possible that they could find that a cluster of data around breakfast type, or button size, or some similarly irrelevant characteristics, also added an increment of roughly 29 per cent accuracy to the baseline 50 per cent accuracy achieved via randomness (or always guessing violation). Should scholars celebrate the ‘artificial intelligence’ behind such a finding? No. Ideally, they would chuckle at it, as readers of Vigen’s website find amusement at random relationships between, say, number of letters in winning words at the National Spelling Bee and number of people killed by venomous spiders (which enjoys an 80.57 per cent correlation).
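
The thought experiment can be run as a small simulation. The Python sketch below is an assumption-laden illustration rather than a reconstruction of the study: it generates coin-flip outcomes for 100 hypothetical cases and 2,000 equally random yes/no ‘features’ (button size, breakfast type, and so on), then reports how well the single best feature appears to ‘predict’ the outcomes. The function name and parameter values are invented. The score is measured on the same cases used to pick the feature, and the sketch does not model evaluation on held-out cases, which is the usual guard against this effect.

```python
import random


def best_spurious_accuracy(n_cases=100, n_features=2000, seed=0):
    """Return the best in-sample accuracy achieved by any purely random feature."""
    rng = random.Random(seed)
    # Coin-flip outcomes: 1 = violation, 0 = no violation.
    outcomes = [rng.randint(0, 1) for _ in range(n_cases)]
    best = 0.0
    for _ in range(n_features):
        # A feature with no connection to the outcome at all.
        feature = [rng.randint(0, 1) for _ in range(n_cases)]
        agree = sum(f == y for f, y in zip(feature, outcomes))
        # A feature that mostly disagrees is just as "useful" once inverted.
        accuracy = max(agree, n_cases - agree) / n_cases
        best = max(best, accuracy)
    return best


print(best_spurious_accuracy())
```

With this many candidate features and this few cases, the best noise feature will almost always score well above 50 per cent on the cases it was selected on, even though it carries no information about any future case.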

This may seem unfair to Aletras and colleagues since they are using so much more advanced math than Vigen is. However, their models do not factor in meaning, which is of paramount importance in rights determinations. To be sure, words like ‘burial,’ ‘attack,’ and ‘died’ do appear properly predictive, to some extent, in Article 8 decisions and cause no surprise when they are predictive of violations.[ 48] But what are we to make of inclusion of words like ‘result’ in the same list? There is little to no reasoned explanation in their work as to why such words should be predictive with respect to the corpus, let alone future case law.

This is deeply troubling, because it is a foundational principle of both administrative and evidence law that irrelevant factors should not factor into a decision. To be sure, there is little reason the ECtHR would use such a crude model to determine the outcome of cases before it or even to use it as a decision aid. However, software applications often are used in ways for which they were not intended. When they are billed as predictive models, attorneys and others could likely use the models for their own triage purposes. This is especially dangerous when attorneys are generally not very familiar with statistical analysis and ML. The legal community’s ability to scrutinize such models, and correctly interpret their results, is questionable.[ 49] Journalistic hype around studies like this one shows that public understanding is likely even more impaired.[ 50]

Aletras and colleagues are aware of many problems with their approach, and, in the paper, they continually hedge about its utility. But they still assert:

Overall, we believe that building a text-based predictive system of judicial decisions can offer lawyers and judges a useful assisting tool. The system may be used to rapidly identify cases and extract patterns that correlate with certain outcomes. It can also be used to develop prior indicators for diagnosing potential violations of specific Articles in lodged applications and eventually prioritise the decision process on cases where violation seems very likely. This may improve the significant delay imposed by the Court and encourage more applications by individuals who may have been discouraged by the expected time delays.

The paper’s abstract claims the model ‘can be useful, for both lawyers and judges, as an assisting tool to rapidly identify cases and extract patterns which lead to certain decisions.’ Aletras, in a podcast interview, also stated that the model could be used for case triage.[ 52] However, a judicial system that did so, without attending to all of the critiques we have developed above (and perhaps many more), would seriously jeopardize its legitimacy. For example, consider how non-representative the training data here is. Aletras and colleagues openly acknowledge a potential issue with ‘selection effect,’ or the ability of the model to be useful to the multitude of cases that were dismissed before being heard by the Grand Chamber.[ 53] Petitions that were determined to be inadmissible before trial were not included in this study, as they were ‘not reported.’ Therefore, the model’s output is narrowed significantly. Despite these problems, there is a danger that the model could be deployed by bureaucrats at the ECtHR to prioritize certain petitions, given that the Court is deluged with thousands of petitions each year and can only decide a fraction of those cases.

Without a clear understanding of how the model is predicting the success of a claim, it would be irresponsible for judges or their clerks or subordinates to use it in this way.[ 54]

IV Conclusion

This article has explored flaws in Aletras and colleagues’ Predicting Judicial Decisions to flag potential flaws in many ML-driven research programs using NLP to predict outcomes in legal systems. When such research programs ignore meaning – the foundation of legal reasoning – their utility and social value are greatly diminished. We also believe that such predictive tools are, at present, largely irrelevant to debates in jurisprudence. If they continue to gloss over the question of social and human meaning in legal systems, NLP researchers should expect justified neglect of their work by governments, law firms, businesses, and the legal academy.[ 55]

Of course, the critiques above should not be construed as a rejection of all statistical analysis of patterns in judicial decision making. Such analyses can shed light on troubling patterns of rulings. They can also alert decision makers when biases begin to emerge.[ 56] For example, a notable study in behavioural economics recently exposed judges imposing shorter sentences after lunch than before it. [ 57] Ideally, such a study does not inspire predictive analytics firms to find other extraneous influences on decision making and to advise clients on how to take advantage of them (by, for example, sending tall attorneys to advocate before judges revealed to be partial to tall advocates). Rather, such disturbing findings are better framed as a prompt to judges to start developing ways of guarding against this hunger bias once they are alerted to it (or, failing that, to snack regularly).[ 58]

As clients, bar associations, and legislators debate how far to permit software to substitute for legal counsel and human consideration of cases, they should keep the limits of predictive analytics in mind. Access to legal information does not constitute access to justice. That depends on well-staffed courts, qualified advocates, and an executive willing and able to enforce the law. Software can generate useful lists of relevant facts and cases, templatized forms, and analytics. However, futurists are too quick to downplay the complexity of legal processes and documents in order to portray them as readily computerizable. It takes time and effort to understand the values internal to substantive law and legal processes. Those outside the profession have little sense of what they are missing when politicians or business leaders opt for computer code to displace legal code in the resolution of disputes – or even to prioritize which disputes are to be heard.[ 59]

Efficiency simpliciter is not an adequate rationale for modelling predictions of judicial behaviour. Nor is such approaches’ potential to generate more complaints and litigation (where success is predicted) or to discourage such interventions (where success is not predicted) necessarily a positive affordance for society as a whole. Critics of ML have long complained that the bias in corpora of past data may simply be recycled into bias in future predictions. Heretofore, authors who have attempted to model the future behaviour of courts from their past behaviour have given us little sense of how such biases may be counteracted, or even detected, once their approaches are more widely disseminated. Modelling of judicial behaviour is often heralded as an advance in access to justice; the better we can predict judgments, so the thinking goes, the better we can know what penalties law breaking will result in. But its partisans underestimate the degree to which inequality in access to human attorneys is exacerbated by current inequality in access to the best software, automated forms, legal analytics, and other technology and by the many ways in which past inequality has deeply shaped training data.[ 60]

To be sure, predictive analytics in law may improve over time. But that possibility does not undermine our position. The most important contribution of our critique is not to cast doubt on the likelihood of further advances in algorithms’ predictive power; rather, we question whether the results of such projects are useful to the legal system and demonstrate how they threaten to undermine its legitimacy. For example, the pragmatic and the critical uses of predictive algorithms are in tension. An analyst may reveal biases in judgments, such as legally irrelevant details that somehow seem to be correlated with, and perhaps even driving, decisions. The same analyst may sell the predictive tool to attorneys or courts as a case selection or triage tool. But precisely to the extent that past training data reflect bias, they are likely to reinforce and spread the influence of that bias when they are used by actors outside the judicial system (who may, for example, not even try to advocate for a particular class of meritorious cases since decision makers are systematically biased against them). Academics should never assume that merely increasing the ability to predict the future (or analyze what was most important in decisions of the past) is an unalloyed good. Rather, a long history of social scientific research on reflexivity reveals how easily such analysis exacerbates, rather than resolves, the problems it reveals.

To the extent that such reflexivity develops, better that the Pandora’s Box of legal predictive analytics had never been opened. ML may simply replay regrettable aspects of the past into the future. On the other hand, once robust predictive models are available, jurisdictions should carefully consider rules to level the playing field and to ensure that all parties to a dispute have access to critical technology. The law itself is free to consult and copy. To the extent that legal technology determines or heavily influences advocacy, it, too, should be open on equal terms to all parties to a dispute. And, at the very least, any deployment of such approaches during litigation should be revealed to the judge presiding over it, and to opposing parties, when it is deployed. Such a general rule of disclosure is vital to future efforts to understand the influence of ML, artificial intelligence, and predictive analytics on the legal system as a whole.

Footnotes 1 We wish to thank Julia Powles, Andrew D Selbst, Gretchen Greene, and Will Bateman for expert commentary on earlier drafts.

2 Frank Pasquale, The Black Box Society (Cambridge, MA: Harvard University Press, 2015); Ariel Ezrachi & Maurice Stucke, Virtual Competition (Cambridge, MA: Harvard University Press, 2016); Mireille Hildebrandt, ‘Law as Computation in the Era of Artificial Legal Intelligence: Speaking Law to the Power of Statistics’ (2018) 68:Suppl UTLJ 12 [Hildebrandt, ‘Law as Computation’]: ‘We are now living with creatures of our own making that can anticipate our behaviours and pre-empt our intent. They inform our actions even if we don’t know it and their inner workings seem as opaque as our own unconscious.’

3 Quoted in Jason Tashea, ‘Risk-Assessment Algorithms Challenged in Bail, Sentencing and Parole Decisions,’ American Bar Association Journal (1 March 2017), online: American Bar Association <http://www.abajournal.com/magazine/article/algorithm%5fbail%5fsentencing%5fparole>.

4 Chad Oldfather, ‘Writing, Cognition, and the Nature of the Judicial Function’ (2007) 96 Geo LJ 1283 (discussing the types of decisions that must be justified in writing); see also Simon Stern, ‘Copyright Originality and Judicial Originality’ (2013) 63 UTLJ 385 (discussing the ways in which judicial writing can achieve its justificatory function). Administrative Procedure Act, 60 Stat 237 (1946).

5 BF Skinner, Beyond Freedom and Dignity (Harmondsworth, UK: Penguin Books, 1971).

6 For a powerful defence of persuasion and other forms of rhetoric common in legal and political contexts, see Bryan Garsten, Saving Persuasion (Cambridge, MA: Harvard University Press, 2006).

7 Brian Sheppard, ‘Why Digitizing Harvard’s Law Library May Not Improve Access to Justice,’ Bloomberg BigLaw Business (12 November 2015), online: Bureau of National Affairs <https://bol.bna.com/why-digitizing-harvards-law-library-may-not-improve-access-to-justice/>: ‘Ravel has already said that the users will not have free access to its most powerful analytic and research tools. Those will exist behind a paywall.’ Ravel Law has been described as an ‘artificial intelligence company’ that ‘provides software which can predict the arguments which would win over a judge.’ ‘RavelLaw Acquired by LexisNexis,’ Global Legal Post (9 June 2017), online: Global City Media <http://www.globallegalpost.com/big-stories/ravel-law-acquired-by-lexisnexis-32302305/>; J Dixon, ‘Review of Legal Analytics Platform,’ Litigation World (23 September 2016), online: Lex Machina <https://lexmachina.com/wp-content/uploads/2016/10/LitigationWorld-Review-2016.pdf> (describing cost of predictive analytics platform for patent cases); Frank Pasquale, ‘Technology, Competition, and Values’ (2007) 8 Minn JL Sci & Tech 607 at 608 (describing inequality-enhancing effects of technological arms races).

8 Oliver Wendell Holmes, ‘The Path of the Law’ (1897) 110 Harv L Rev 991; Ian Kerr, ‘Chapter 4: Prediction, Preemption, Presumption: The Path of Law after the Computational Turn’ in Privacy, Due Process and the Computational Turn: The Philosophy of Law Meets the Philosophy of Technology (Abingdon, UK: Routledge, 2013) 91.

9 Hildebrandt, ‘Law as Computation,’ supra note 1 at 33.

10 Ibid. Ironically, just as purveyors of opaque algorithmic systems are influencing vulnerable legal systems, jurisdictions with more robust protections for human dignity are demanding explanations for algorithmic decision making and accountability for those who use such algorithms. See e.g. Andrew D Selbst & Julia Powles, ‘Meaningful Information and the Right to Explanation’ (2017) 7:4 International Data Privacy Law 233; Gianclaudio Malgieri & Giovanni Comandé, ‘Why a Right to Legibility of Automated Decision-Making Exists in the General Data Protection Regulation’ (2017) 7:4 International Data Privacy Law 243. For background on the development of ‘explainable artificial intelligence,’ see Cliff Kuang, ‘Can an AI Be Taught to Explain Itself?,’ New York Times (21 November 2017), online: The New York Times Company <https://www.nytimes.com/2017/11/21/magazine/can-ai-be-taught-to-explain-itself.html>. Policy makers should try to channel the development of artificial intelligence that ranks, rates, or sorts humans toward explainable (rather than black-box) models.

11 Blaise Agüera y Arcas, Margaret Mitchell, & Alexander Todorov, ‘Physiognomy’s New Clothes,’ Medium (6 May 2017), online: Medium <https://medium.com/@blaisea/physiognomys-new-clothes-f2d4b59fdd6a>; Danah Boyd & Kate Crawford, ‘Critical Questions for Big Data’ (2012) 15:5 Information, Communication and Society 662.

12 Federico Cabitza, ‘The Unintended Consequences of Chasing Electric Zebras’ (Institute of Electrical and Electronics Engineers (IEEE), SMC Interdisciplinary Workshop on the Human Use of Machine Learning, Venice, Italy, 16 December 2016), online: Research Gate <https://www.researchgate.net/publication/311702431%5fThe%5fUnintended%5fConsequences%5fof%5fChasing%5fElectric%5fZebras> [Cabitza, ‘Unintended Consequences’]: ‘ML approach risk[s] to freeze into the decision model two serious and often neglected biases: selection bias, occurring when training data (the above experience E) are not fully representative of the natural case variety due to sampling and sample size; and classification bias, occurring when the single categories associated by the raters to the training data oversimplify borderline cases (i.e., cases for which the observers do not agree, or could not reach an agreement), or when the raters misclassify the cases (for any reason). In both cases, the decision model would surreptitiously embed biases, mistakes and discriminations and, worse yet, even reiterate and reinforce them on the new cases processed.’

13 Jathan Sadowski & Frank Pasquale, ‘The Spectrum of Control: A Social Theory of the Smart City,’ First Monday (31 August 2015), online: First Monday <http://firstmonday.org/article/view/5903/4660>.

14 Nikolaos Aletras et al, ‘Predicting Judicial Decisions of the European Court of Human Rights: A Natural Language Processing Perspective’ (2016) 2 PeerJ Computer Science 92 [Aletras et al, ‘Predicting Judicial Decisions’].

15 Ibid. The peer reviewer Gabriela Ferraro also stated that their core thesis is that ‘it is possible to use text processing and machine learning to predict whether, given a case, there has been a violation of an article in the convention of human rights.’ For peer review documents of Aletras et al, see PeerJ Journal, online: Peer J <https://peerj.com/articles/cs-93/reviews/>.

16 Aletras is quoted in Anthony Cuthbertson, ‘Ethical Artificial Intelligence “Judge” Predicts Human Rights Trials,’ Newsweek (24 October 2016), online: Newsweek <http://www.newsweek.com/ethical-artificial-intelligence-judge-predicts-human-rights-trials-513012>. It seems clear from context that Aletras includes this study in his hopes for artificial intelligence, as he does in his interview for the Australian podcast Future Tense.

17 Aletras et al, ‘Predicting Judicial Decisions,’ supra note 13: ‘[W]e highlight ways in which automatically predicting the outcomes of ECtHR cases could potentially provide insights on whether judges follow a so-called legal model of decision making or their behavior conforms to the legal realists’ theorization, according to which judges primarily decide cases by responding to the stimulus of the facts of the case’ [emphasis in original].

18 See European Court of Human Rights, Questions and Answers (Council of Europe) at 1, 4, online: Council of Europe <http://www.echr.coe.int/Documents/Questions%5fAnswers%5fENG.pdf>; Aletras et al, ‘Predicting Judicial Decisions,’ supra note 13. Convention for the Protection of Human Rights and Fundamental Freedoms, 4 November 1950, 213 UNTS 221 (entered into force 3 September 1953) [ECHR].

19 Aletras et al, ‘Predicting Judicial Decisions,’ supra note 13: ‘Our models can reliably predict ECtHR decisions with high accuracy, i.e., 79% on average.’

20 ‘Could AI Replace Judges and Lawyers?’ BBC News (24 October 2016), online: BBC <http://www.bbc.com/news/av/technology-37749697/could-ai-replace-judges-and-lawyers>; Ziyaad Bhorat, ‘Do We Still Need Human Judges in the Age of Artificial Intelligence,’ Open Democracy (8 August 2017), online: openDemocracy <https://www.opendemocracy.net/transformation/ziyaad-bhorat/do-we-still-need-human-judges-in-age-of-artificial-intelligence>.

21 Aletras et al, ‘Predicting Judicial Decisions,’ supra note 13 at 4. Data from this section, or a combination of it with ‘Topics,’ generated the predictive accuracy that was trumpeted in the paper itself, and in the numerous media accounts of it, as a stepping-stone to substitutive automation. We focus in this Part only on the best-performing aspects of the model; our critiques apply a fortiori to worse-performing aspects.

22 To expand the analogy, the analysis of food on a molecular level, disconnected from taste, appearance, or other secondary qualities perceptible by humans, is parallel to natural language processing’s ‘bag-of-words’ approach to processing a text on the level of individual words disconnected from meaning. We use this classic reference to hard-core legal realism’s gustatory approach to irrational indeterminacy in adjudication in honour of Aletras and colleagues’ claims to buttress legal realism. Jerome Frank, Courts on Trial (Princeton: Princeton University Press, 1973) at 162: ‘Out of my own experience as a trial lawyer, I can testify that a trial judge, because of overeating at lunch, may be so somnolent in the afternoon court-session that he fails to hear an important item of testimony and so disregards it when deciding the case.’

23 Ibid at 3.

24 Ibid at 4 [emphasis added]. The authors themselves acknowledge that ‘[t]he choices made by the Court when it comes to formulations of the facts incorporate implicit or explicit judgments to the effect that some facts are more relevant than others. This leaves open the possibility that the formulations used by the Court may be tailor-made to fit a specific preferred outcome.’ Ibid. They then give some reasons to believe that this stacking-the-deck effect is ‘mitigated’ but give no clear sense of how to determine the degree to which it is mitigated.

25 Indeed, some legal realists may assume that it is very difficult to identify the facts driving judges’ decisions in their written opinions since they believe judges are skilled at obscuring both their ideology and fact-driven concerns with complex legal doctrine. One need not adopt a Straussianly esoteric hermeneutics to understand the challenges such a view poses to the very text-based analytics deemed supportive of it by Aletras and colleagues.

26 Aletras and colleagues conclude that their study supports the proposition that the Court’s decisions are ‘significantly affected by the stimulus of the facts.’ Ibid. This ‘finding,’ such as it is, is a rather trivial one without further explanation. Judicial proceedings not ‘significantly affected by’ facts would be arbitrary and lawless. The term ‘stimulus’ evokes behaviourist logic of mind as machine. What matters is not so much the reasoning in the cases, but the ‘facts’ considered as bare stimulus, an input processed by the judicial system into an output of decision.

27 See Aletras et al, ‘Predicting Judicial Decisions,’ supra note 13.

28 See ibid.

29 See ibid.

30 See ibid.

31 See ibid. Moreover, focus on an appellate court like the European Court of Human Rights (ECtHR) also biases the outcome in favour of realism since clear opportunities for the application of law are almost certainly resolved at lower levels of the judicial system.

32 ECtHR, ‘HUDOC’ (10 September 2017), online: Council of Europe <https://hudoc.echr.coe.int/eng#‘documentcollectionid2’: [‘GRANDCHAMBER’,’CHAMBER’]> (18,438 of 55,571 case decisions as of 10 September 2017 were in English).

33 See Aletras et al, ‘Predicting Judicial Decisions,’ supra note 13; Nikolaos Aletras et al, ‘ECHR Dataset,’ Figshare, online: Figshare <https://figshare.com/s/6f7d9e7c375ff0822564>; Christopher D Manning, Prabhakar Raghavan, & Hinrich Schütze, Introduction to Information Retrieval (Cambridge, MA: Cambridge University Press, 2008), online: Stanford University <https://nlp.stanford.edu/IR-book/html/htmledition/large-and-difficult-category-taxonomies-1.html>; Andrea Dal Pozzolo et al, ‘Calibrating Probability with Undersampling for Unbalanced Classification’ (Paper presented at the 2015 IEEE Symposium Series on Computational Intelligence, Cape Town, South Africa, 7–10 December 2015), online: University of Notre Dame <https://www3.nd.edu/rjohns15/content/papers/ssci2015_calibrating.pdf> (under-sampling to achieve an equal number of violation/no violation like what is done here is typical with ‘unbalanced datasets’ in a support vector machine, which is the algorithm used in Aletras and colleagues).

34 See Aletras et al, ‘Predicting Judicial Decisions,’ supra note 13; Kavita Ganesan, ‘What Are N-Grams?’ Text Mining, Analytics, and More (23 November 2014), online: Text Mining, Analytics & More Blog <http://text-analytics101.rxnlp.com/2014/11/what-are-n-grams.html> [Ganesan, ‘What Are N-Grams?’].

35 See Aletras et al, ‘Predicting Judicial Decisions,’ supra note 13.

36 See ibid.

37 See ibid.

38 See ibid; Ganesan, ‘What Are N-Grams?’ supra note 33: ‘N-grams of texts are extensively used in text mining and natural language processing tasks. They are basically a set of co-occurring words within a given window and when computing the n-grams you typically move one word forward [although you can move X words forward in more advanced scenarios]. For example, for the sentence “The cow jumps over the moon”. If N = 2 (known as bigrams), then the n-grams would be: the cow; cow jumps; jumps over; over the; the moon. So you have 5 n-grams in this case. Notice that we moved from the cow, to cow jumps, to jumps over, etc, essentially moving one word forward to generate the next bigram. If N = 3, the n-grams would be: the cow jumps; cow jumps over; jumps over the; over the moon. So you have 4 n-grams in this case.’

39 See Aletras et al, ‘Predicting Judicial Decisions,’ supra note 13.

40 See ibid.

41 See ibid.

42 We should note that this acceptance is not universal – wary of violating some jurisdictions’ laws, Facebook does not operate the face recognition algorithm automatically in them.

43 See Aletras et al, ‘Predicting Judicial Decisions,’ supra note 13: ‘On the other hand, cases have been misclassified mainly because their textual information is similar to cases in the opposite class.’

44 Cabitza, ‘Unintended Consequences,’ supra note 11; Hildebrandt, ‘Law as Computation,’ supra note 1 at 28: ‘I will discuss four implications that may disrupt the concept and the Rule of Law: (1) the opacity of ML software may render decisions based on its output inscrutable and thereby incontestable; (2) the shift from meaningful information to computation entails a shift from reason to statistics and from argumentation to simulation; (3) in the process of developing and testing data-driven legal intelligence, a set of fundamental rights may be infringed, compromised, or even violated, notably the right to privacy, to non-discrimination, to the presumption of innocence and due process, while also impacting consumer and employee protection and competition law. Finally, I argue that (4), to the extent that the algorithms become highly proficient, due to being trained by excellent domain experts in law, lawyers may outsource part of their work, as a result of which they may deskill as the software achieves high levels of accuracy.’

45 Cabitza, ‘Unintended Consequences,’ supra note 11.

46 Aletras et al, ‘Predicting Judicial Decisions,’ supra note 13 at 16.

47 Tyler Vigen, Spurious Correlations, online: Spurious Media <http://www.tylervigen.com/spurious-correlations>. Vigen, ‘a criminology student at Harvard Law School, wrote a computer programme to mine datasets for statistical correlations. He posts the funniest ones to Spurious Correlations,’ which is a website. James Fletcher, ‘Spurious Correlations: Margarine Linked to Divorce?’ BBC News (26 May 2014), online: BBC <http://www.bbc.com/news/magazine-27537142>.

48 Aletras et al, ‘Predicting Judicial Decisions,’ supra note 13 at 15.

49 Hildebrandt, ‘Law as Computation,’ supra note 1 at 29: ‘Whereas most of us have learnt to read and write, we were not trained to “read” and “write” statistics, we cannot argue against the assumptions that inform ML applications, and we miss the vocabulary that frames “training sets,” “hypotheses space,” “target functions,” “optimization,” “overfitting” and the more. So, even if experts manage to verify the software, most of us lack the skills to make sense of it; we may in fact be forced to depend on claims made by those who stand to gain from its adoption and on the reputation of those offering this type of data driven legal services. This is also the case when we buy and drive a car, but the reliability here is easier to test. We recognize a car crash and would not appreciate a car that drives us from A to C if we want to get to B; with legal intelligence we may simply not detect incorrect interpretations.’

50 Michael Byrne, ‘How to Navigate the AI Hypestorm,’ The Outline (15 September 2017), online: Independent Media <https://theoutline.com/post/2248/how-to-navigate-the-coming-a-i-hypestorm>: ‘[T]he basic playbook is to take a bunch of examples of some phenomenon to be detected, reduce them to data, and then train a statistical model. Faces [and cases] reduce to data just like any other sort of image [or text] reduces to data. … [If the] machine learning model [is] making better predictions than humans, that’s … completely meaningless. Like, obviously the computer is going to do a better job because it’s solving a math problem and people are solving a people problem.’

51 Aletras et al, ‘Predicting Judicial Decisions,’ supra note 13 at 3.

52 This use was suggested in a podcast interview. ‘Augmented Eternity and the Potential of Prediction,’ Future Tense (12 March 2017), online: Australian Broadcasting Corporation <http://www.abc.net.au/radionational/programs/futuretense/augmented-eternity-and-the-potential-ofprediction/8319648>. While correctly insisting that the model could not be used to replace judges, Aletras discussed its potential to prioritize cases for consideration.

53 See Aletras et al, ‘Predicting Judicial Decisions,’ supra note 13: ‘[The] selection effect pertains to cases judged by the ECtHR as an international court. Given that the largest percentage of applications never reaches the Chamber or, still less, the Grand Chamber, and that cases have already been tried at the national level, it could very well be the case that the set of ECtHR decisions on the merits primarily refers to cases in which the class of legal reasons, defined in a formal sense, is already considered as indeterminate by competent interpreters. This could help explain why judges primarily react to the facts of the case, rather than to legal arguments. Thus, further text-based analysis is needed in order to determine whether the results could generalise to other courts, especially to domestic courts deciding ECHR claims that are placed lower within the domestic judicial hierarchy.’ ECHR, supra note 17.

54 Moreover, difference in legal knowledge and competence between attorneys and pro se litigants is often an important factor in case outcomes. What if differences in resources drove some of the differences in outcomes in the training data here? The types of claims prioritized by expensive attorneys (or those able to afford them) could end up as part of a template for potential future winning (or expedited) claims, further stratifying litigants. For more on the importance of resources, see Judicial Council of California, Handling Cases Involving Self-Represented Litigants (San Francisco: Judicial Council of California, 2017), online: Judicial Council of California <http://www.courts.ca.gov/documents/benchguide%5fself%5frep%5flitigants.pdf>: ‘[S]elf-represented litigants often have difficulty preparing complete pleadings, meeting procedural requirements, and articulating their cases clearly to the judicial officer. These difficulties produce obvious challenges.’

55 Premature quantification, metricization, and algorithmatization all share this allergy to interpretation and meaning. See e.g. Christopher Newfield & Heather Steffen, ‘Remaking the University: Metrics Noir,’ Los Angeles Review of Books (11 October 2017) (commenting on ‘a particularly subtle and difficult limit of the numerical: its aversion to the interpretive processes through which the complexities of everyday experiences are assessed. Physical and mental states, injuries and attitudes toward them, people in variable social positions always appear together, and their qualities need to be sorted out’); Frank Pasquale, ‘Professional Judgment in an Era of Artificial Intelligence and Machine Learning,’ Boundary2 (forthcoming).

56 Julia Angwin et al, ‘Machine Bias,’ ProPublica (23 May 2016), online: ProPublica <https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing> (discussing Northpointe recidivism scoring and the biases of judges making decisions without such a system); see also McCleskey v Kemp, 481 US 279 (1987) (using empirical data on bias in peremptory sentencing); Donald G Gifford & Brian Jones, ‘Keeping Cases from Black Juries: An Empirical Analysis of How Race, Income Inequality, and Regional History Affect Tort Law’ (2016) 73 Wash & Lee L Rev 557.

57 Shai Danziger, Jonathan Levav, & Liora Avnaim-Pesso, ‘Extraneous Factors in Judicial Decisions’ (2010) 108:17 Proceedings of the National Academy of Sciences 6889 (finding that the ‘likelihood of a favorable ruling is greater at the very beginning of the work day or after a food break than later in the sequence of cases’). But do note that it is always wise to be cautious about the extrapolability of any given ‘judicial bias’ study. See e.g. Holger Spamann, ‘Are Sleepy Punishers Really Harsh Punishers? Comment’ Harvard Public Law Working Paper no 17–15 (2017), online: SSRN <https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2916375>.

58 David Golumbia, ‘Judging Like a Machine’ in DM Berry & M Dieter, eds, Postdigital Aesthetics (London: Palgrave Macmillan, 2015) 123 at 135: ‘As attractive as it may be to allow more and more of our world to be judged by machines, we must take very seriously the idea that human judgement, though it be systematically flawed, is nevertheless the only responsible form for human power to take.’

59 Danielle Keats Citron, ‘Technological Due Process’ (2008) 85 Wash L Rev 1249 at 1251 (documenting problematic implementations of automated benefits determinations); David A Super, ‘An Error Message for the Poor,’ New York Times (3 January 2014) A25 (critiquing substitution of technology for informed human judgment).

60 Frank Pasquale, ‘Is Eviction-as-a-Service the Hottest New #LegalTech Trend?,’ Concurring Opinions, online: Concurring Opinions <https://concurringopinions.com/archives/2016/02/is-eviction-as-a-service-the-hottest-new-legaltech-startup.html>.

61 Professor of Law, University of Maryland, United States. http://orcid.org/0000-0001-6104-0944

62 Senior Systems Engineer, United States
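
As an illustration of the n-gram construction quoted from Ganesan in the notes above (the ‘cow jumps over the moon’ example), the following minimal Python sketch slides a window one word at a time over that sentence. The function name ngrams, and the choices to lowercase the text and split it on whitespace, are assumptions made for this illustration only; this is not the feature-extraction code used by Aletras and colleagues.

```python
def ngrams(text, n):
    """Return the word n-grams of `text`, moving one word forward each step.

    Assumption for this sketch: words are lowercased and split on whitespace.
    """
    words = text.lower().split()
    # A window of n words starting at each position that still has n words left.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "The cow jumps over the moon"

bigrams = ngrams(sentence, 2)
trigrams = ngrams(sentence, 3)

print(bigrams)   # ['the cow', 'cow jumps', 'jumps over', 'over the', 'the moon']
print(trigrams)  # ['the cow jumps', 'cow jumps over', 'jumps over the', 'over the moon']
```

The sketch yields five bigrams and four trigrams, matching the counts in the quoted passage. Each fragment is simply a string of adjacent words; nothing in the procedure records what the words mean.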

By Frank Pasquale and Glyn Cashwell

Copyright of University of Toronto Law Journal is the property of University of Toronto Press and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.