Prediction, persuasion, and the jurisprudence of behaviourism
There is a growing literature critiquing the unreflective application of big data, predictive analytics, artificial intelligence, and machine-learning techniques to social problems. Such methods may reflect biases rather than reasoned decision making. They also may leave those affected by automated sorting and categorizing unable to understand the basis of the decisions affecting them. Despite these problems, machine-learning experts are feeding judicial opinions to algorithms to predict how future cases will be decided. We call the use of such predictive analytics in judicial contexts a jurisprudence of behaviourism as it rests on a fundamentally Skinnerian model of cognition as a black-boxed transformation of inputs into outputs. In this model, persuasion is passé; what matters is prediction. After describing and critiquing a recent study that has advanced this jurisprudence of behaviourism, we question the value of such research. Widespread deployment of prediction models not based on the meaning of important precedents and facts may endanger core rule-of-law values.
artificial intelligence; cyber law; machine learning; jurisprudence; predictive analysis
A growing chorus of critics is challenging the use of opaque (or merely complex) predictive analytics programs to monitor, influence, and assess individuals’ behaviour. The rise of a ‘black box society’ portends profound threats to individual autonomy; when critical data and algorithms cannot be a matter of public understanding or debate, both consumers and citizens are unable to comprehend how they are being sorted, categorized, and influenced.[ 2]
A predictable counter-argument has arisen, discounting the comparative competence of human decision makers. Defending opaque sentencing algorithms, for instance, Christine Remington (a Wisconsin assistant attorney general) has stated: ‘We don’t know what’s going on in a judge’s head; it’s a black box, too.’[ 3] Of course, a judge must (upon issuing an important decision) explain why the decision was made; so too are agencies covered by the Administrative Procedure Act obliged to offer a ‘concise statement of basis and purpose’ for rule making.[ 4] But there is a long tradition of realist commentators dismissing the legal justifications adopted by judges as unconvincing fig leaves for the ‘real’ (non-legal) bases of their decisions.
In the first half of the twentieth century, the realist disdain for stated rationales for decisions led in at least two directions: toward more rigorous and open discussions of policy considerations motivating judgments and toward frank recognition of judges as political actors, reflecting certain ideologies, values, and interests. In the twenty-first century, a new response is beginning to emerge: a deployment of natural language processing and machine-learning (ML) techniques to predict whether judges will hear a case and, if so, how they will decide it. ML experts are busily feeding algorithms with the opinions of the Supreme Court of the United States, the European Court of Human Rights, and other judicial bodies as well as with metadata on justices’ ideological commitments, past voting record, and myriad other variables. By processing data related to cases, and the text of opinions, these systems purport to predict how judges will decide cases, how individual judges will vote, and how to optimize submissions and arguments before them.
This form of prediction is analogous to forecasters using big data (rather than understanding underlying atmospheric dynamics) to predict the movement of storms. An algorithmic analysis of a database of, say, 10,000 past cumulonimbi sweeping over Lake Ontario may prove to be a better predictor of the next cumulonimbus’s track than a trained meteorologist without access to such a data trove. From the perspective of many predictive analytics approaches, judges are just like any other feature of the natural world – an entity that transforms certain inputs (such as briefs and advocacy documents) into outputs (decisions for or against a litigant). Just as forecasters predict whether a cloud will veer southwest or southeast, the user of an ML system might use machine-readable case characteristics to predict whether a rainmaker will prevail in the courtroom.
We call the use of algorithmic predictive analytics in judicial contexts an emerging jurisprudence of behaviourism, since it rests on a fundamentally Skinnerian model of mental processes as a black-boxed transformation of inputs into outputs.[ 5] In this model, persuasion is passé; what matters is prediction.[ 6] After describing and critiquing a recent study typical of this jurisprudence of behaviourism, we question the value of the research program it is advancing. Billed as a method of enhancing the legitimacy and efficiency of the legal system, such modelling is all too likely to become one more tool deployed by richer litigants to gain advantages over poorer ones.[ 7] Moreover, it should raise suspicions if it is used as a triage tool to determine the priority of cases. Such predictive analytics are only as good as the training data on which they depend, and there is good reason to doubt such data could ever generate in social analysis the types of ground truths characteristic of scientific methods applied to the natural world. While fundamental physical laws rarely if ever change, human behaviour can change dramatically in a short period of time. Therefore, one should always be cautious when applying automated methods in the human context, where factors as basic as free will and political change make the behaviour of both decision makers, and those they impact, impossible to predict with certainty.[ 8]
Nor are predictive analytics immune from bias. Just as judges bring biases into the courtroom, algorithm developers are prone to incorporate their own prejudices and priors into their machinery.[ 9] In addition, biases are no easier to address in software than in decisions justified by natural language. Such judicial opinions (or even oral statements) are generally much less opaque than ML algorithms. Unlike many proprietary or hopelessly opaque computational processes proposed to replace them, judges and clerks can be questioned and rebuked for discriminatory behaviour.[ 10] There is a growing literature critiquing the unreflective application of ML techniques to social problems.[ 11] Predictive analytics may reflect biases rather than reasoned decision making.[ 12] They may also leave those affected by automated sorting and categorizing unable to understand the basis of the decisions affecting them, especially when the output from the models in any way affects one’s life, liberty, or property rights and when litigants are not given the basis of the model’s predictions.[ 13]
This article questions the social utility of prediction models as applied to the judicial system, arguing that their deployment may endanger core rule-of-law values. In full bloom, predictive analytics would not simply be a camera trained on the judicial system, reporting on it, but it would also be an engine of influence, shaping it. Attorneys may decide whether to pursue cases based on such systems; courts swamped by appeals or applications may be tempted to use ML models to triage or prioritize cases. In work published to widespread acclaim in 2016, Nikolaos Aletras, Dimitrios Tsarapatsanis, Daniel Preoţiuc-Pietro, and Vasileios Lampos made bold claims about the place of natural language processing (NLP) in the legal system in their article Predicting Judicial Decisions of the European Court of Human Rights: A Natural Language Processing Perspective.[ 14] They claim that ‘advances in Natural Language Processing (NLP) and Machine Learning (ML) provide us with the tools to automatically analyse legal materials, so as to build successful predictive models of judicial outcomes.’[ 15] Presumably, they are referring to their own work as part of these advances. However, close analysis of their ‘systematic study on predicting the outcome of cases tried by the European Court of Human Rights based solely on textual content’ reveals that their soi-disant ‘success’ merits closer scrutiny on both positive and normative grounds.
The first question to be asked about a study like Predicting Judicial Decisions is: what are its uses and purposes? Aletras and colleagues suggest at least three uses. First, they present their work as a first step toward the development of ML and NLP software that can predict how judges and other authorities will decide legal disputes. Second, Aletras has clearly stated to media that artificial intelligence ‘could also be a valuable tool for highlighting which cases are most likely to be violations of the European Convention of Human Rights’ – in other words, that it could help courts triage which cases they should hear.[ 16] Third, they purport to intervene in a classic jurisprudential debate – whether facts or law matter more in judicial determinations.[ 17] Each of these aims and claims should be rigorously interrogated, given shortcomings of the study that the authors acknowledge. Beyond these acknowledged problems, there are even more faults in their approach which cast doubt on whether the research program of NLP-based prediction of judicial outcomes, even if pursued in a more realistic manner, has anything significant to contribute to our understanding of the legal system.
Although Aletras and colleagues have used cutting edge ML and NLP methods in their study, their approach metaphorically stacks the deck in favour of their software and algorithms in so many ways that it is hard to see its relevance to either practising lawyers or scholars. Nor is it plausible to state that a method this crude, and disconnected from actual legal meaning and reasoning, provides empirical data relevant to jurisprudential debates over legal formalism and realism. As more advanced thinking on artificial intelligence and intelligence augmentation has already demonstrated, there is an inevitable interface of human meaning that is necessary to make sense of social institutions like law.
II Stacking the deck: ‘predicting’ the contemporaneous
The European Court of Human Rights (ECtHR) hears cases in which parties allege that their rights under the articles of the European Convention on Human Rights were violated and not remedied by their country’s courts.[ 18] The researchers claim that the textual model has an accuracy of ‘79% on average.’[ 19] Given sweepingly futuristic headlines generated by the study (including ‘Could AI [Artificial Intelligence] Replace Judges and Lawyers?’), a casual reader of reports on the study might assume that this finding means that, using the method of the researchers, those who have some aggregation of data and text about case filings can use that data to predict how the ECtHR will decide a case, with 79 per cent accuracy.[ 20] However, that would not be accurate. Instead, the researchers used the ‘circumstances’ subsection in the cases they claimed to ‘predict,’ which had ‘been formulated by the Court itself.’[ 21] In other words, they claimed to be ‘predicting’ an event (a decision) based on materials released simultaneously with the decision. This is a bit like claiming to ‘predict’ whether a judge had cereal for breakfast yesterday based on a report of the nutritional composition of the materials on the judge’s plate at the exact time she or he consumed the breakfast.[ 22] Readers can (and should) balk at using the term ‘prediction’ to describe correlations between past events (like decisions of a court) and contemporaneously generated, past data (like the circumstances subsection of a case). Sadly, though, few journalists breathlessly reporting the study by Aletras and colleagues did so.
To their credit, though, Aletras and colleagues repeatedly emphasize how much they have effectively stacked the deck by using ECtHR-generated documents themselves to help the ML/NLP software they are using in the study ‘predict’ the outcomes of the cases associated with those documents. A truly predictive system would use the filings of the parties, or data outside the filings, that was in existence before the judgment itself. Aletras and colleagues grudgingly acknowledge that the circumstances subsection ‘should not always be understood as a neutral mirroring of the factual background of the case,’ but they defend their method by stating that the ‘summaries of facts found in the “Circumstances” section have to be at least framed in as neutral and impartial a way as possible.’[ 23] However, they give readers no clear guide as to when the circumstances subsection is actually a neutral mirroring of factual background or how closely it relates to records in existence before a judgment that would actually be useful to those aspiring to develop a predictive system.
Instead, their ‘premise is that published judgments can be used to test the possibility of a text-based analysis for ex ante predictions of outcomes on the assumption that there is enough similarity between (at least) certain chunks of the text of published judgments and applications lodged with the Court and/or briefs submitted by parties with respect to pending cases.’[ 24] But they give us few compelling reasons to accept this assumption since almost any court writing an opinion to justify a judgment is going to develop a facts section in ways that reflect its outcome. The authors state that the ECtHR has ‘limited fact finding powers,’ but they give no sense of how much that mitigates the problem of cherry-picking facts or statements about the facts. Nor should we be comforted by the fact that ‘the Court cannot openly acknowledge any kind of bias on its part.’ Indeed, this suggests a need for the Court to avoid the types of transparency in published justification that could help researchers artificially limited to NLP better understand it.[ 25] The authors also state that in the ‘vast majority of cases,’ the ‘parties do not seem to dispute the facts themselves, as contained in the “Circumstances” subsection, but only their legal significance.’ However, the critical issues here are, first, the facts themselves and, second, how the parties characterized the facts before the circumstances section was written. Again, the fundamental problem of mischaracterization – of ‘prediction’ instead of mere correlation or relationship – crops up to undermine the value of the study.
Even in its most academic mode – as an ostensibly empirical analysis of the prevalence of legal realism – the study by Aletras and colleagues stacks the deck in its favour in important ways. Indeed, it might be seen as assuming at the outset a version of the very hypothesis it ostensibly supports. This hypothesis is that something other than legal reasoning itself drives judicial decisions. Of course, that is true in a trivial sense – there is no case if there are no facts – and perhaps the authors intend to make that trivial point.[ 26] However, their language suggests a larger aim, designed to meld NLP and jurisprudence. Given the critical role of meaning in the latter discipline, and their NLP methods’ indifference to it, one might expect an unhappy coupling here. And that is indeed what we find.
In the study by Aletras and colleagues, the corpus used for the predictive algorithm was a body of ECtHR’s ‘published judgments.’ Within these judgments, the factual background of the case was summarized (by the Court) in the circumstances section of the judgments, but the pleadings themselves were not included as inputs.[ 27] The law section, which ‘considers the merits of the case, through the use of legal argument,’ was also input into the model to determine how well that section alone could ‘predict’ the case outcome.[ 28]
Aletras and colleagues were selective in the corpus they fed to their algorithms. The only judgments that were included in the corpus were those that passed both a ‘prejudicial stage’ and a second review.[ 29] In both stages, applications were denied if they did not meet ‘admissibility criteria,’ which were largely procedural in nature.[ 30] To the extent that such procedural barriers were deemed ‘legal,’ we might immediately have identified a bias problem in the corpus – that is, the types of cases where the law entirely determined the outcome (no matter how compelling the facts may have been) were removed from a data set that was ostensibly fairly representative of the universe of cases generally. This is not a small problem either; the overwhelming majority of applications were deemed inadmissible or struck out and were not reportable.[ 31]
But let us assume, for now, that the model only aspired to offer data about the realist/formalist divide in those cases that did meet the admissibility criteria. There were other biases in the data set. Only cases that were in English, approximately 33 per cent of the total ECtHR decisions, were included.[ 32] This is a strange omission since the NLP approach employed here had no semantic content – that is, the meaning of the words did not matter to it. Presumably, this omission arose out of concerns for making data coding and processing easier. There was also a subject matter restriction that further limited the scope of the sample. Only cases addressing issues in Articles 3, 6, and 8 of the ECHR were included in training and in verifying the model. And there is yet another limitation: the researchers then threw cases out randomly (so that the data set contained an equal number of violation/no violation cases) before using them as training data.[ 33]
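The final filtering step described above, discarding cases at random until the two outcomes are equally represented, can be made concrete with a short sketch. It is illustrative only: the function name, the label strings, and the toy corpus are assumptions introduced here and are not taken from the study.

```python
import random

def balance_by_undersampling(cases, seed=0):
    """Randomly discard majority-class cases until both labels are equally represented.

    `cases` is a list of (text, label) pairs; the label strings are hypothetical.
    """
    rng = random.Random(seed)
    violation = [c for c in cases if c[1] == "violation"]
    no_violation = [c for c in cases if c[1] == "no_violation"]
    n = min(len(violation), len(no_violation))
    # Keep n randomly chosen cases from each class; the rest are thrown away.
    balanced = rng.sample(violation, n) + rng.sample(no_violation, n)
    rng.shuffle(balanced)
    return balanced

# Hypothetical toy corpus: 8 violation cases and 3 no-violation cases.
toy_cases = [(f"case {i}", "violation") for i in range(8)]
toy_cases += [(f"case {i}", "no_violation") for i in range(8, 11)]
balanced = balance_by_undersampling(toy_cases)
```

In this toy example five of the eight violation cases are dropped. Whatever distinguished the discarded cases is invisible to any model trained on the remainder, and a balanced set by construction sets the always-guess-one-outcome baseline at 50 per cent.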
III Problematic characteristics of the ECtHR textual ‘predictive’ model
The algorithm used in the study depended on an atomization of case language into words grouped together in sets of one-, two-, three-, and four-word groupings, called n-grams.[ 34] Then, 2,000 of the most frequent n-grams, not taking into consideration ‘grammar, syntax and word order,’ were placed in feature matrices for each section of decisions and for the entire case by using the vectors from each decision.[ 35] Topics, which are created by ‘clustering together n-grams,’ were also created.[ 36] Both topics and n-grams were used ‘to train Support Vector Machine (SVM) classifiers.’ As the authors explain, an ‘SVM is a machine learning algorithm that has shown particularly good results in text classification, especially using small data sets.’[ 37] Model training data from these opinions were ‘n-gram features,’ which consist of groups of words that ‘appear in similar contexts.’[ 38] Matrix mathematics, which are manipulations on two-dimensional tables, and vector space models, which are based on a single column within a table, were programmed to determine clusters of words that should be similar to one another based on textual context.[ 39] These clusters of words are called topics. The model prevented a word group from showing up in more than one topic. Thirty topics, or sets of similar word groupings, were also created for entire court opinions. Topics were similarly created for entire opinions for each article.[ 40] Since the court opinions all follow a standard format, the opinions could be easily dissected into different identifiable sections.[ 41] Note that these sorting methods are legally meaningless. N-grams and topics are not sorted the way a treatise writer might try to organize cases or a judge might try to parse divergent lines of precedent. Rather, they simply serve as potential independent variables to predict a dependent variable (was there a violation, or was there not a violation, of the Convention).
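To make the n-gram step less abstract, the following is a minimal sketch of how text might be cut into one- to four-word groupings and counted into a feature matrix. It is not the authors' code: the function names, the whitespace tokenization, and the two sample sentences are assumptions made for illustration, and the topic-clustering and SVM steps are omitted.

```python
from collections import Counter

def ngrams(tokens, max_n=4):
    """Return all contiguous word groups of length 1..max_n as space-joined strings."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

def build_feature_matrix(documents, vocab_size=2000, max_n=4):
    """Count, per document, the vocab_size most frequent n-grams in the corpus."""
    # Naive tokenization (lowercase, split on whitespace) is an assumption.
    per_doc = [Counter(ngrams(doc.lower().split(), max_n)) for doc in documents]
    totals = Counter()
    for counts in per_doc:
        totals.update(counts)
    vocabulary = [gram for gram, _ in totals.most_common(vocab_size)]
    # One row per document, one column per retained n-gram.
    matrix = [[counts[gram] for gram in vocabulary] for counts in per_doc]
    return vocabulary, matrix

# Hypothetical sentences standing in for a 'circumstances' subsection.
docs = ["the applicant was detained", "the applicant complained"]
vocabulary, matrix = build_feature_matrix(docs, vocab_size=3)
```

Each row of `matrix` is all that a downstream classifier would see of a case: counts of word groups, with nothing recording what the words mean or how one sentence bears on another.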
Before going further into the technical details of the study, it is useful to compare it to prior successes of ML in facial or number recognition. When a facial recognition program successfully identifies a given picture as an image of a given person, it does not achieve that machine vision in the way a human being’s eye and brain would do so. Rather, an initial training set of images (or perhaps even a single image) of the person are processed, perhaps on a 1,000-by-1,000-pixel grid. Each box in the grid can be identified as either skin or not skin, smooth or not smooth, along hundreds or even thousands of binaries, many of which would never be noticed by a human being. Moreover, such parameters can be related to one another; so, for example, regions hued as ‘lips’ or ‘eyes’ might have a certain maximum length, width, or ratio to one another (such that a person’s facial ‘signature’ reliably has eyes that are 1.35 times as long as they are wide). Add up enough of these ratios for easily recognized features (ears, eyebrows, foreheads, and so on), and software can quickly find a set of mathematical parameters unique to a given person – or at least unique enough that an algorithm can predict that a given picture is, or is not, a picture of a given person, with a high degree of accuracy. The technology found early commercial success with banks, which needed a way to recognize numbers on cheques (given the wide variety of human handwriting). With enough examples of written numbers (properly reduced to data via dark or filled spaces on a grid), and computational power, this recognition can become nearly perfect.
Before assenting too quickly to the application of such methods to words in cases (as we see them applied to features of faces), we should note that there are not professions of ‘face recognizers’ or ‘number recognizers’ among human beings. So while Facebook’s face recognition algorithm, or TD Bank’s cheque sorter, do not obviously challenge our intuitions about how we recognize faces or numbers, applying ML to legal cases should be marked as a jarring imperialism of ML methods into domains associated with a rich history of meaning (and, to use a classic term from the philosophy of social sciences, Verstehen). In the realm of face recognizing, ‘whatever works’ as a pragmatic ethic of effectiveness underwrites some societies’ acceptance of width/length ratios and other methods to assure algorithmic recognition and classification of individuals.[ 42] The application of ML approaches devoid of apprehension of meaning in the legal context is more troubling. For example, Aletras and colleagues acknowledge that there are cases where the model predicts the incorrect outcome because of the similarity in words in cases that have opposite results. In this case, even if information regarding specific words that triggered the SVM classifier were output, users might not be able to easily determine that the case was likely misclassified.[ 43] Even with confidence interval outputs, this type of problem does not appear to have an easy solution. 
This is particularly troubling for due process if such an algorithm, in error, incorrectly classified someone’s case because it contained language similarities to another very different case.[ 44] When the cases are obviously misclassified in this way, models like this would likely ‘surreptitiously embed biases, mistakes and discrimination, and worse yet, even reiterate and reinforce them on the new cases processed.’[ 45] So, too, might a batch of training data representing a certain time period when a certain class of cases were dominant help ensure the dominance of such cases in the future. For example, the ‘most predictive topic’ for Article 8 decisions included prominently the words ‘son, body, result, Russian.’ If the system were used in the future to triage cases, ceteris paribus, it might prioritize cases involving sons over daughters or Russians over Poles.[ 46] But if those future cases do not share the characteristics of the cases in the training set that led to the ‘predictiveness’ of ‘son’ status or ‘Russian’ status, their prioritization would be a clear legal mistake.
Troublingly, the entire ‘predictive’ project here may be riddled with spurious correlations. As any student of statistics knows, if one tests enough data sets against one another, spurious correlations will emerge. For example, Tyler Vigen has shown a very tight correlation between the divorce rate in Maine and the per capita consumption of margarine between 2000 and 2009.[ 47] It is unlikely that one variable there is driving the other. Nor is it likely that some intervening variable is affecting both margarine consumption and divorce rates in a similar way, to ensure a similar correlation in the future. Rather, this is just the type of random association one might expect to emerge once one has thrown enough computing power at enough data sets.
It is hard not to draw similar conclusions with respect to Aletras and colleagues’ ‘predictive’ project. Draw enough variations from the ‘bag of words,’ and some relationships will emerge. Given that the algorithm only had to predict ‘violation’ or ‘no violation,’ even a random guessing program would be expected to have a 50 per cent accuracy rate. A thought experiment easily deflates the meaning of their trumpeted 79 per cent ‘accuracy.’ Imagine that the authors had continual real-time surveillance of every aspect of the judges’ lives before they wrote their opinions: the size of the buttons on their shirts and blouses, calories consumed at breakfast, average speed of commute, height and weight, and so forth. Given a near infinite number of parameters of evaluation, it is altogether possible that they could find that a cluster of data around breakfast type, or button size, or some similarly irrelevant characteristics, also added an increment of roughly 29 per cent accuracy to the baseline 50 per cent accuracy achieved via randomness (or always guessing violation). Should scholars celebrate the ‘artificial intelligence’ behind such a finding? No. Ideally, they would chuckle at it, as readers of Vigen’s website find amusement at random relationships between, say, number of letters in winning words at the National Spelling Bee and number of people killed by venomous spiders (which enjoys an 80.57 per cent correlation).
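The thought experiment can be simulated in a few lines. The sketch below is a generic statistical illustration, not a re-analysis of the study: the function name and every number in it (40 cases, 2,000 coin-flip ‘features’, 1,000 fresh cases) are assumptions chosen for the example. It searches pure noise for the single feature that best matches a balanced set of outcomes and then checks that feature against new cases.

```python
import random

def best_random_feature(n_cases=40, n_features=2000, n_fresh=1000, seed=0):
    """Search pure-noise binary features for the best in-sample 'predictor'."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_cases)]  # balanced outcomes: half 0, half 1
    best_acc, best_flip = 0.0, False
    for _ in range(n_features):
        feature = [rng.randint(0, 1) for _ in range(n_cases)]
        agree = sum(f == y for f, y in zip(feature, labels)) / n_cases
        # A feature that is mostly wrong is just as 'useful' once inverted.
        acc = max(agree, 1 - agree)
        if acc > best_acc:
            best_acc, best_flip = acc, agree < 0.5
    # On fresh cases the chosen feature is still a coin flip.
    fresh_labels = [rng.randint(0, 1) for _ in range(n_fresh)]
    fresh_feature = [rng.randint(0, 1) for _ in range(n_fresh)]
    hits = sum((f != y) if best_flip else (f == y)
               for f, y in zip(fresh_feature, fresh_labels))
    return best_acc, hits / n_fresh

in_sample_accuracy, fresh_accuracy = best_random_feature()
```

With this many candidate features and so few cases, the best in-sample match typically lands well above the 50 per cent baseline, while accuracy on fresh cases stays close to chance. Apparent accuracy of this kind is a product of the search itself, which is why performance on genuinely unseen cases, rather than on the data that was searched, is the figure that matters.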
This may seem unfair to Aletras and colleagues since they are using so much more advanced math than Vigen is. However, their models do not factor in meaning, which is of paramount importance in rights determinations. To be sure, words like ‘burial,’ ‘attack,’ and ‘died’ do appear properly predictive, to some extent, in Article 8 decisions and cause no surprise when they are predictive of violations.[ 48] But what are we to make of inclusion of words like ‘result’ in the same list? There is little to no reasoned explanation in their work as to why such words should be predictive with respect to the corpus, let alone future case law.
This is deeply troubling, because it is a foundational principle of both administrative and evidence law that irrelevant factors should not factor into a decision. To be sure, there is little reason the ECtHR would use such a crude model to determine the outcome of cases before it or even to use it as a decision aid. However, software applications often are used in ways for which they were not intended. When they are billed as predictive models, attorneys and others could likely use the models for their own triage purposes. This is especially dangerous when attorneys are generally not very familiar with statistical analysis and ML. The legal community’s ability to scrutinize such models, and correctly interpret their results, is questionable.[ 49] Journalistic hype around studies like this one shows that public understanding is likely even more impaired.[ 50]
Aletras and colleagues are aware of many problems with their approach, and, in the paper, they continually hedge about its utility. But they still assert:
Overall, we believe that building a text-based predictive system of judicial decisions can offer lawyers and judges a useful assisting tool. The system may be used to rapidly identify cases and extract patterns that correlate with certain outcomes. It can also be used to develop prior indicators for diagnosing potential violations of specific Articles in lodged applications and eventually prioritise the decision process on cases where violation seems very likely. This may improve the significant delay imposed by the Court and encourage more applications by individuals who may have been discouraged by the expected time delays.[ 51]
The paper’s abstract claims the model ‘can be useful, for both lawyers and judges, as an assisting tool to rapidly identify cases and extract patterns which lead to certain decisions.’ Aletras, in a podcast interview, also stated that the model could be used for case triage.[ 52] However, a judicial system that did so, without attending to all of the critiques we have developed above (and perhaps many more), would seriously jeopardize its legitimacy. For example, consider how non-representative the training data here is. Aletras and colleagues openly acknowledge a potential issue with ‘selection effect,’ or the ability of the model to be useful to the multitude of cases that were dismissed before being heard by the Grand Chamber.[ 53] Petitions that were determined to be inadmissible before trial were not included in this study, as they were ‘not reported.’ Therefore, the model’s output is narrowed significantly. Despite these problems, there is a danger that the model could be deployed by bureaucrats at the ECtHR to prioritize certain petitions, given that the Court is deluged with thousands of petitions each year and can only decide a fraction of those cases. Without a clear understanding of how the model is predicting the success of a claim, it would be irresponsible for judges or their clerks or subordinates to use it in this way.[ 54]
This article has explored flaws in Aletras and colleagues’ Predicting Judicial Decisions to flag potential flaws in many ML-driven research programs using NLP to predict outcomes in legal systems. When such research programs ignore meaning – the foundation of legal reasoning – their utility and social value are greatly diminished. We also believe that such predictive tools are, at present, largely irrelevant to debates in jurisprudence. If they continue to gloss over the question of social and human meaning in legal systems, NLP researchers should expect justified neglect of their work by governments, law firms, businesses, and the legal academy.[ 55]
Of course, the critiques above should not be construed as a rejection of all statistical analysis of patterns in judicial decision making. Such analyses can shed light on troubling patterns of rulings. They can also alert decision makers when biases begin to emerge.[ 56] For example, a notable study in behavioural economics recently exposed judges imposing shorter sentences after lunch than before it.[ 57] Ideally, such a study does not inspire predictive analytics firms to find other extraneous influences on decision making and to advise clients on how to take advantage of them (by, for example, sending tall attorneys to advocate before judges revealed to be partial to tall advocates). Rather, such disturbing findings are better framed as a prompt to judges to start developing ways of guarding against this hunger bias once they are alerted to it (or, failing that, to snack regularly).[ 58]
As clients, bar associations, and legislators debate how far to permit software to substitute for legal counsel and human consideration of cases, they should keep the limits of predictive analytics in mind. Access to legal information does not constitute access to justice. That depends on well-staffed courts, qualified advocates, and an executive willing and able to enforce the law. Software can generate useful lists of relevant facts and cases, templatized forms, and analytics. However, futurists are too quick to downplay the complexity of legal processes and documents in order to portray them as readily computerizable. It takes time and effort to understand the values internal to substantive law and legal processes. Those outside the profession have little sense of what they are missing when politicians or business leaders opt for computer code to displace legal code in the resolution of disputes – or even to prioritize which disputes are to be heard.[ 59]
Efficiency simpliciter is not an adequate rationale for modelling predictions of judicial behaviour. Nor are such approaches’ potential to generate more complaints and litigation (where success is predicted) or to discourage such interventions (where success is not predicted) necessarily a positive affordance for society as a whole. Critics of ML have long complained that the bias in corpora of past data may simply be recycled into bias in future predictions. Heretofore, authors who have attempted to model the future behaviour of courts from their past behaviour have given us little sense of how such biases may be counteracted, or even detected, once their approaches are more widely disseminated. Modelling of judicial behaviour is often heralded as an advance in access to justice; the better we can predict judgments, so the thinking goes, the better we can know what penalties law breaking will result in. But its partisans underestimate the degree to which inequality in access to human attorneys is exacerbated by current inequality in access to the best software, automated forms, legal analytics, and other technology and by the many ways in which past inequality has deeply shaped training data.[ 60]
To be sure, predictive analytics in law may improve over time. But that possibility does not undermine our position. The most important contribution of our critique is not to cast doubt on the likelihood of further advances in algorithms’ predictive power; rather, we question whether the results of such projects are useful to the legal system and demonstrate how they threaten to undermine its legitimacy. For example, the pragmatic and the critical uses of predictive algorithms are in tension. An analyst may reveal biases in judgments, such as legally irrelevant details that somehow seem to be correlated with, and perhaps even driving, decisions. The same analyst may sell the predictive tool to attorneys or courts as a case selection or triage tool. But precisely to the extent that past training data reflect bias, they are likely to reinforce and spread the influence of that bias when they are used by actors outside the judicial system (who may, for example, not even try to advocate for a particular class of meritorious cases since decision makers are systematically biased against them). Academics should never assume that merely increasing the ability to predict the future (or analyze what was most important in decisions of the past) is an unalloyed good. Rather, a long history of social scientific research on reflexivity reveals how easily such analysis exacerbates, rather than resolves, the problems it reveals.
To the extent that such reflexivity develops, better that the Pandora’s Box of legal predictive analytics had never been opened. ML may simply replay regrettable aspects of the past into the future. On the other hand, once robust predictive models are available, jurisdictions should carefully consider rules to level the playing field and to ensure that all parties to a dispute have access to critical technology. The law itself is free to consult and copy. To the extent that legal technology determines or heavily influences advocacy, it, too, should be open on equal terms to all parties to a dispute. And, at the very least, any deployment of such approaches during litigation should be revealed to the judge presiding over it, and to opposing parties, when it is deployed. Such a general rule of disclosure is vital to future efforts to understand the influence of ML, artificial intelligence, and predictive analytics on the legal system as a whole.
BIBLIOGRAPHY 2
Emergence of Big Data Research in Operations Management, Information Systems, and Healthcare: Past Contributions and Future Roadmap
Contents
1. Introduction
3. Operations and Supply Chain Management
6. Healthcare Information Exchange
10. Way Ahead: Potential Applications and Challenges
11. Internet of Things (IoT) and Smart City
12. Predictive Manufacturing and 3‐D Printer
13. Smart Healthcare
15. Closing Thoughts
16. References
Full Text
In the age of big data, consumers leave an easily traceable digital footprint whenever they visit a website. Firms are interested in capturing the digital footprints of their consumers to understand and predict consumer behavior. This study deals with how big data analytics has been used in the domains of information systems, operations management, and healthcare. We also discuss the future potential of big data applications in these domains (especially in the areas of cloud computing, Internet of Things and smart city, predictive manufacturing and 3‐D printing, and smart healthcare) and the associated challenges. In this study, we present a framework for applications of big data in these domains with the goal of providing some interesting directions for future research.
big data; information systems; operations management; healthcare
Data is the new science. Big Data holds the answers
—Pat Gelsinger (Gelsinger[ 43] )
This is an era where we are generating data at an exponential rate. Large quantities of data representing our digital footprint are generated whenever we interact over social media and chat applications, use online shopping portals, or even when we use such ubiquitous applications as Google Search or Google Maps (Marr [ 89] ). Aside from data generated by us as users, an enormous amount of data comes from “smart” devices, that is, devices with sensors that collect data from the physical world and convert them into a digital form (Hashem et al. [ 56] , Riggins and Wamba [ 117] ). This ever‐growing stream of data generation is made possible by the advancements in computing and mobile technology and the increasing accessibility of the Internet. For example, according to a report by the United States Census Bureau, in 2015, 78% of U.S. households had a desktop or laptop, 75% had a handheld computer such as a smartphone, and 77% had a broadband Internet connection (Ryan and Lewis [ 124] ). All of these devices, when connected to the Internet, have the ability to generate data in large quantities for those who know how to aggregate it.
It is these data (texts, reviews, ratings, news, images, videos, audio, email, chat communications, search history, etc.) that form the foundation of big data. Big data is characterized by four dimensions: Volume, Velocity, Variety, and Veracity (Dykes [ 38] , McAfee et al. [ 92] , Zikopoulos and Eaton [ 155] ). Because the data are unstructured, a few years ago it was almost impossible to analyze them in this form and obtain meaningful insights. Today, however, with improved analytics tools and technology, we can not only obtain valuable information from the data but also use the insights to predict future trends (Chen et al. [ 17] ). Most of the analytics involve artificial intelligence and machine learning (Marr [ 90] ). Computers are trained to identify patterns in the data, and they can spot patterns much more reliably and efficiently than humans. Advanced analytics tools can produce millions of these results in a very short time. A report by Rubinson Partners, a marketing and research firm, shows that advertisers can boost their Return on Advertisement Spending (ROAS) by up to 16× using aggregated big data, which give them information about the right time to advertise to the consumer (Rubinson [ 122] ).
As a result, there is tremendous curiosity about the application of big data among corporate houses. Anyone who wants to have or maintain leverage over their competitors today is encouraged to gather data and analyze them using big data analytics. However, many companies still lack knowledge about how to implement big data analytics. In this article, we investigate how several disciplines, specifically information systems, operations and supply chain management, and healthcare, have applied big data in their domains. We also explore future research avenues for big data in these areas.
There was a time in academic research when data were collected solely for testing hypotheses to confirm our belief about certain phenomena or behaviors. However, when we use the Internet today, we leave a digital footprint that can be easily traced, collected, and utilized by big data analytics to understand and predict consumer behavior. Today it is even possible to store and analyze such massive data at an inexpensive rate. These analytics technologies can deliver new knowledge on their own without active human intervention (Dhar [ 34] ), and as such can be very valuable.
Information systems (IS) has been an interdisciplinary domain conducting research at the intersection of computer technology and data from the business world (Agarwal and Dhar [ 2] ). A majority of the existing research in the IS domain focuses on understanding and implementing processes that increase the efficiency of business operations. Since IS researchers were accustomed to handling huge volumes of data, they started with an early advantage in big data research compared to other business disciplines (Goes [ 49] ). IS has contributed to the body of work surrounding big data in many ways, including on issues of data integrity, data security and cybersecurity, social media, e‐commerce, and web/mobile advertising. We briefly discuss the recent work in each of these areas.
Data integrity is critical to big data. To semantically integrate heterogeneous databases, it is essential to identify which entities in one data source map to the same entities in other data sources so that data have a uniform and common structure across all heterogeneous databases (Kong et al. [ 69] ). This process is called entity reconciliation (Enríquez et al. [ 39] , Zhao and Ram [ 153] ). Entity reconciliation is of paramount importance to the process of data integration and management in the big data environment. Researchers have studied entity reconciliation from various perspectives. For example, Li et al. ([ 82] ) propose a context‐based entity description (CED) for entity reconciliation where objects can be compared with the CED to ascertain their corresponding entities. Some researchers have also studied rule‐based frameworks for entity reconciliation (Li et al. [ 83] ).
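The matching step at the heart of entity reconciliation can be illustrated with a minimal sketch. It assumes two hypothetical sources keyed by record id and a plain string‐similarity rule; it is not the CED or rule‐based method of the works cited above. The function names, sample records, and the 0.85 threshold are all illustrative assumptions.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def reconcile(source_a: dict, source_b: dict, threshold: float = 0.85) -> dict:
    """Map each id in source_a to its best match in source_b, or None.

    The threshold is an assumed cut-off; real systems tune it on labeled pairs.
    """
    matches = {}
    for id_a, name_a in source_a.items():
        best_id, best_score = None, 0.0
        for id_b, name_b in source_b.items():
            score = similarity(name_a, name_b)
            if score > best_score:
                best_id, best_score = id_b, score
        matches[id_a] = best_id if best_score >= threshold else None
    return matches


# Hypothetical records from two heterogeneous databases.
crm = {"c1": "Acme Corp.", "c2": "Globex Corporation", "c3": "Initech"}
billing = {"b7": "ACME Corp", "b8": "Globex Corporation", "b9": "Umbrella Ltd"}
links = reconcile(crm, billing)
```

Here `links` pairs the two spellings of the same company and leaves the unmatched record unlinked. The pairwise comparison is quadratic in the number of records, so at big data scale it would normally be preceded by a blocking or indexing step.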
Data security is another topic in big data where several research studies have been conducted (e.g., Chen and Zhang [ 16] , Demchenko et al. [ 31] , Katal et al. [ 67] ). Some studies suggest the use of real‐time security analysis as a measure for risk prevention (Lafuente [ 77] ), whereas some others investigate privacy‐preserving data mining (PPDM) operations (Xu et al. [ 148] ). PPDM is a method of preserving data in such a way that applying data mining algorithms to the data does not disclose any sensitive information about the data. Big data analytics and optimization can be used as an answer against advanced cybersecurity threats (Ji et al. [ 64] ). Since big data covers a massive breadth of information sources and an enormous depth of data, specifying and detecting risks becomes very precise (Hurst et al. [ 61] , Sagiroglu and Sinanc [ 125] ).
Some work at the interface of IS‐Marketing research has also touched on the topic of big data. For example, data from social media have been analyzed to comprehend behavior and predict events (Ruths and Pfeffer [ 123] , Xu et al. [ 149] ). In this direction, Qiu and Kumar ([ 115] ) study the performance of prediction markets through a randomized field experiment and find that an increase in audience size and a higher level of online endorsement lead to more precise predictions. Moreover, they suggest integrating social media into prediction markets because social effects and reputational concerns improve the participants’ prediction accuracy. The results from this study indicate that predictions will be more refined when people of intermediate abilities are targeted. Another area of social media research where big data has contributed is text analysis and sentiment mining (Mallipeddi et al. [ 88] , Salehan and Kim [ 126] ). In this area, Kumar et al. ([ 74] ) study the importance of management responses to online consumer reviews. The results show that organizations that chose to respond to consumer comments and reviews experienced a surge in the total number of check‐ins. Findings from this study also confirm that the spillover effect of online management response on neighboring organizations depends on whether the focal organization and the neighboring organizations are direct competitors of each other. Furthermore, Millham and Thakur ([ 96] ) examine the pitfalls of applying big data techniques to social media data. In this direction, Kumar et al. ([ 75] ) propose a novel hierarchical supervised‐learning approach to increase the likelihood of detecting anomalies in online reviews by analyzing several user features and then characterizing their collective behavior in a unified manner. Dishonest online reviews are difficult to detect because of complex interactions between several user characteristics, such as review velocity, volume, and variety. Kumar et al. ([ 75] ) model user characteristics and interactions among them as univariate and multivariate distributions. They then stack these distributions using several supervised‐learning techniques, such as Logistic Regression, Support Vector Machine, and k‐Nearest Neighbors, yielding robust meta‐classifiers.
Big data analytics has also been studied from the point of view of strategic decision‐making in e‐commerce (Akter and Wamba [ 3] ) and digital marketing (Fulgoni [ 41] , Minelli et al. [ 97] ). Some of the growing areas of research in e‐commerce include the advertising strategy of online firms and their use of recommender systems (Ghoshal et al. [ 47] , [ 48] , Liu et al. [ 85] ). For example, Liu et al. ([ 85] ) study the advertising game between two electronic retailers subject to a given level of information technology (IT) capacity. They reach the conclusion that if IT capacity constraints of the firms are not included in advertisement decisions, then it may result in wastage of advertisement expenditure. Based on their results, they present implementable insights for policy makers regarding how to control wasteful advertising. Ghoshal et al. ([ 48] ) find that recommendation systems impact the prices of products in both personalizing and non‐personalizing firms.
Furthermore, web and mobile advertising has been an interesting area of research since the arrival of dot‐com firms (Dawande et al. [ 28] , [ 29] , Fan et al. [ 40] , Kumar and Sethi [ 71] , Kumar et al. [ 72] ). Dutta et al. ([ 37] ) and Kumar ([ 70] ) summarize the use and future trends of data analytics and optimization in web and mobile advertising. Mookerjee et al. ([ 100] ) develop a model predicting visitors’ clicks on web advertisements. They then discuss an approach to manage Internet ads so that both the click rate and the revenue earned from clicks are increased. The same group of scholars has also developed a decision model that maximizes the advertising firm's revenue subject to a click‐through rate constraint (Mookerjee et al. [ 98] , [ 100] ). Another study uses real‐world data to validate new optimization methods for mobile advertising (Mookerjee et al. [ 99] ).
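To make the idea of maximizing revenue subject to a click‐through rate constraint concrete, the following toy sketch enumerates small ad sets by brute force. It is not the decision model of Mookerjee et al.; the ad names, click‐through rates, revenue‐per‐click figures, the number of slots, and the constraint level are all hypothetical.

```python
from itertools import combinations


def best_ad_set(ads: dict, slots: int, min_avg_ctr: float):
    """Pick `slots` ads maximizing expected revenue per impression,
    subject to the average click-through rate being at least `min_avg_ctr`.

    `ads` maps an ad name to (click_through_rate, revenue_per_click).
    Returns (chosen_ads, expected_revenue), or (None, -1.0) if nothing is feasible.
    """
    best, best_revenue = None, -1.0
    for combo in combinations(ads, slots):
        avg_ctr = sum(ads[a][0] for a in combo) / slots
        if avg_ctr < min_avg_ctr:
            continue  # violates the click-through rate constraint
        revenue = sum(ads[a][0] * ads[a][1] for a in combo)
        if revenue > best_revenue:
            best, best_revenue = combo, revenue
    return best, best_revenue


# Hypothetical ads: (click-through rate, revenue per click).
ads = {"A": (0.05, 1.0), "B": (0.01, 8.0), "C": (0.03, 2.0), "D": (0.02, 3.5)}

constrained, constrained_rev = best_ad_set(ads, slots=2, min_avg_ctr=0.028)
unconstrained, unconstrained_rev = best_ad_set(ads, slots=2, min_avg_ctr=0.0)
```

In this example the unconstrained optimum favors the two ads with the highest revenue per impression, while the constraint forces a high‐click ad into the set at some cost in revenue. Exhaustive enumeration is only workable for tiny instances; realistic problems call for a proper optimization formulation.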
IS scholars have also studied big data as a service, for example, a platform combining big data and analytics in cloud computing (Assunção et al. [ 6] , Demirkan and Delen [ 33] , Zheng et al. [ 154] ). For instance, Big‐Data‐as‐a‐Service (BDaaS) has been explored to yield user‐friendly application programming interfaces (APIs) so that users can easily access the service‐generated big data analytic tools and corresponding results (Zheng et al. [ 154] ). Cloud computing plays a vital role in the use and adoption of big data analytics because infrastructure requirements and the cost of resources can be adjusted according to actual demand (Assunção et al. [ 6] ).
Some studies have also been conducted on IT governance from the perspective of big data (Hashem et al. [ 55] , Tallon [ 135] ) and deception detection (Fuller et al. [ 42] , Rubin and Lukoianova [ 121] ). In the IT governance domain, Tallon ([ 135] ) suggests that good data governance practices maintain a balance between value creation and risk exposure. Implementing such practices helps firms earn competitive leverage from their use of big data and application of big data analytics.
Figure summarizes the above discussion. This figure also includes the contributions of big data in Operations and Supply Chain Management, and Healthcare (discussed in the following sections).
Operations and Supply Chain Management
With the improvement of enterprise resource planning (ERP) software, it is easier to capture data at different levels of operations. Firms want to analyze these data to develop more efficient processes. Hence, big data and big data analytics are being used by operations and supply chain academia as well as the industry to get insights from existing data in order to make better and informed decisions (Muhtaroglu et al. [ 103] , Wamba et al. [ 143] ). The key areas in this domain where big data has left an impact are supply chain network design, risk management, inventory management, and retail operations.
Big data analytics has been used to align sourcing strategies with the organizational goals (Romano and Formentini [ 119] ) and to evaluate the performance of suppliers (Chai and Ngai [ 15] , Choi [ 21] ). Supply chain network design can itself account for a massive amount of data and hence is a favorite area for applying big data analytics. Researchers have studied supply chain network design where the demand is uncertain (Benyoucef et al. [ 9] , Bouzembrak et al. [ 11] , Soleimani et al. [ 133] ) as well as where the demand is certain (Jindal and Sangwan [ 66] , Tiwari et al. [ 138] ). Firms can use analytics to ascertain the cost, quality, and time‐to‐market parameters of products to gain leverage over competitors (Bloch [ 10] , Luchs and Swan [ 86] , Srinivasan et al. [ 134] ).
Big data analytics has also been applied to maximize production (Noyes et al. [ 108] ) and minimize material waste (Sharma and Agrawal [ 130] ). Noyes et al. ([ 108] ) recommend that changes in existing manufacturing processes, incorporating automation and simplification of methods and raw materials, will increase the speed and throughput of in‐process analytics during polysaccharide manufacturing processes. Moreover, Sharma and Agrawal ([ 130] ) implemented a fuzzy analytic hierarchy process to solve the production control policy selection problem. Inventory challenges, such as cost, demand, and supply fluctuations, have also been studied using big data analytics (Babai et al. [ 8] , Hayya et al. [ 58] ). In this direction, Babai et al. ([ 8] ) discuss a new dynamic inventory control method where forecasts and the uncertainties related to the forecasts are exogenous and known at each period.
Big data has also been increasingly used in retailing. In the last decade, retailing has been one of the key areas of research for the OM researchers, especially with the growth of multi‐channel retailing (Mehra et al. [ 95] ). Big data analytics has also been applied to retail operations by firms to reduce cost and to market themselves better than the competition (Dutta et al. [ 37] , Janakiraman et al. [ 62] , Kumar et al. [ 73] ). For instance, big data techniques are now being heavily used in recommender systems that reduce consumer search efforts (Dutta et al. [ 37] ). Kumar et al. ([ 73] ) study how the presence of brick‐and‐mortar stores impacts consumers’ online purchase decision. Furthermore, Janakiraman et al. ([ 62] ) study product returns in multi‐channel retailing taking into consideration consumers’ channel preference and choice.
Healthcare systems in the United States have been rapidly adopting electronic health records (EHRs) and Healthcare Information Exchanges (HIEs) that are contributing to the accumulation of massive quantities of heterogeneous medical data from various sections of the healthcare industry—payers, providers, and pharmaceuticals (Demirezen et al. [ 32] , Rajapakshe et al. [ 116] ). These data can be analyzed in order to derive insights that can improve quality of healthcare (Groves et al. [ 50] ). However, the analyses and practical applications of such data become a challenge because of its enormity and complexity. Since big data can deal with massive data volume and variety at high velocity, it has the potential to create significant value in healthcare by improving outcomes while lowering costs (Roski et al. [ 120] ). It has been shown to improve the quality of care, make operational processes more efficient, predict and plan responses to disease epidemics, and optimize healthcare spending at all levels (Nambiar et al. [ 105] ). Here, we explore how big data analytics has revolutionized the healthcare industry.
One of the subsections of the healthcare industry where big data has contributed the most is biomedical research. With the emergence and enhancement of parallel computing and cloud computing, two of the most important infrastructural pillars of big data analytics, and with the extensive use of EHRs and HIEs, the cost and effort of capturing and exploring biomedical data are decreasing.
In bioinformatics, big data contributes infrastructure for computing and data processing, including error detection techniques. Cloud‐based analytics tools, such as Hadoop and MapReduce, are extensively used in the biomedical domain (Taylor [ 136] ). Parallel computing models, such as CloudBurst (Schatz [ 127] ), Contrail (Schatz et al. [ 128] ), and Crossbow (Gurtowski et al. [ 52] ), are making the genome mapping process easier. CloudBurst improves the performance of the genome mapping process and significantly reduces the time required for mapping (Schatz [ 127] ). DistMap, a scalable, integrated workflow on a Hadoop cluster, supports nine different mapping tools (Pandey and Schlötterer [ 112] ). SeqWare (D O'Connor et al. [ 24] ), based on the Apache HBase database (George [ 45] ), is used for accessing large‐scale whole‐genome datasets, whereas Hydra (based on the Hadoop distributed computing framework) is used for processing large peptide and spectra databases (Lewis et al. [ 81] ). Tools such as SAMQA (Robinson et al. [ 118] ), ART (Huang et al. [ 59] ), and CloudRS (Chen et al. [ 18] ) help in identifying errors in sequencing data. Furthermore, Genome Analysis Toolkit (GATK) (McKenna et al. [ 94] , Van der Auwera et al. [ 139] ), BlueSNP (Huang et al. [ 60] ), and Myrna (Langmead et al. [ 78] ) are toolkits and packages that aid researchers in analyzing genomic data.
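The MapReduce pattern underlying several of these tools can be sketched in a few lines. The example below counts k‐mers (length‐k substrings) across sequencing reads in a single process; it only illustrates the map, shuffle, and reduce phases and does not use the Hadoop API or reproduce any of the tools named above. The function names and sample reads are illustrative assumptions.

```python
from collections import defaultdict
from itertools import chain


def map_read(read: str, k: int):
    """Map phase: emit a (k-mer, 1) pair for every length-k window of a read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle phase: group emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce phase: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical reads; on a cluster each read would be mapped on a different node.
reads = ["ACGTAC", "GTACGT"]
mapped = chain.from_iterable(map_read(read, 3) for read in reads)
kmer_counts = reduce_counts(shuffle(mapped))
```

Because each read is mapped independently and each key is reduced independently, both phases parallelize naturally, which is what makes the pattern attractive for large sequencing datasets.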
Healthcare Information Exchange
Clinical informatics focuses on the application of IT in the healthcare domain. It includes activity‐based research, analysis of the relationship between a patient's main diagnosis (MD) and underlying cause of death (UCD), and storage of data from EHRs and HIEs (Luo et al. [ 87] ). Big data's main contributions have been to the manner in which EHR and HIE data are stored. Real‐time clinical stream data are stored using NoSQL databases, Hadoop, and HBase because of their high‐performance characteristics (Dutta et al. [ 36] , Jin et al. [ 65] , Mazurek [ 91] ). Some research work has also studied and proposed several interactive methods of sharing medical data from multiple platforms (Chen et al. [ 19] ).
Healthcare Information Exchanges are used for efficient information sharing among heterogeneous healthcare entities, thus increasing the quality of care provided. Janakiraman et al. ([ 63] ) study the use of HIEs in emergency departments (EDs) and find that the benefits of HIEs increase with more information on patients, doctors, and prior interaction between them. Yaraghi et al. ([ 150] ) model HIE as a multi‐sided platform. Users evaluate the self‐service technologies of the model based on both user‐specific and network‐specific factors. Another body of research studies whether healthcare reform models lead to better patient‐centric outcomes (Youn et al. [ 151] ).
Big data techniques have enabled the availability and analysis of a massive volume of clinical data. Insights derived from this analysis can help medical professionals identify disease symptoms and predict the cause and occurrence of diseases much better, eventually resulting in an overall improved quality of care (Genta and Sonnenberg [ 44] , McGregor [ 93] , Wang and Krishnan [ 144] ). Since the size and complexity of the data are enormous and often involve integrating clinical data from various platforms to understand the bigger picture, data security is often compromised during analysis of clinical data. Big data techniques can address this issue (Schultz [ 129] ). Researchers have proposed several models and frameworks to efficiently protect the privacy of the data as well as effectively deal with concurrent analyses of datasets (Lin et al. [ 84] , Sobhy et al. [ 132] ).
With the dawn of improved imaging technology, EHRs are often accompanied by high‐quality medical images. Studying the clinical data along with the analysis of such images will lead to better diagnoses, as well as more accurate prediction of diseases in the future (Ghani et al. [ 46] ). Medical image informatics focuses on processing images for meaningful insights using big data tools and technologies. Similarly, picture archiving and communication systems (PACS) have been critically advantageous for the medical community, since these medical images can be used for improved decisions regarding the treatment of patients and for predicting re‐admission (Ghani et al. [ 46] ). Silva et al. ([ 131] ) discuss how to integrate data in PACS when the digital imaging and communications in medicine (DICOM) object repository and database system of PACS are transferred to the cloud. Since analyzing large quantities of high‐quality clinical images using big data analytics generates rich, spatially oriented information at the cellular and sub‐cellular levels, systems such as Hadoop‐GIS (Wang et al. [ 145] ), that is, cost‐effective parallel systems, are being developed to aid in managing advanced spatial queries.
Recent studies have also used big data techniques to analyze the contents of social media as a means for contagious disease surveillance, as well as for monitoring the occurrence of diseases throughout the world (Hay et al. [ 57] , Young et al. [ 152] ). Big data analytics tools are used on social media communications to detect depression‐related emotional patterns, and thus identify individuals suffering from depression from among the users (Nambisan et al. [ 106] ). Health IT infrastructures, such as the US Veterans Health Administration's (VHA), have facilitated improved quality of care by providing structured clinical data from EHRs as well as unstructured data such as physician's notes (Kupersmith et al. [ 76] ).
In the coming years, HIEs have massive potential to become public utility infomediaries that many interested markets can access to derive information (De Brantes et al. [ 30] ). However, a major hurdle that the adoption of HIEs faces is privacy concern among consumers. A section of researchers is building HIE frameworks incorporating privacy and security principles. For example, Pickard and Swan ([ 113] ) have created a health information sharing framework, which increases sharing of health information, built on trust, motivation, and informed consent. Trust is necessary for dealing with access control issues, motivation maps the willingness to share, and informed consent enforces the legal requirement to keep the information safe. In another study, Anderson and Agarwal ([ 5] ) find that the type of the requesting stakeholder and how the information will be used are two important factors that affect the privacy concern of an individual while providing access to one's health information. Numerous states in the United States have enacted laws that incentivize HIE efforts and address the concerns of patients regarding sharing of health information. In another study, Adjerid et al. ([ 1] ) observe whether various forms of privacy regulation policies facilitate or decrease HIE efforts. They find that although privacy regulation alone negatively affects HIE efforts, when combined with incentives, privacy regulation with a patient consent requirement positively impacts HIE efforts.
Way Ahead: Potential Applications and Challenges
In this section, we discuss the potential of big data applications in Information Systems, Operations/Supply Chain, and Healthcare domains. Figure summarizes the key areas of future research.
Internet of Things (IoT) and Smart City
The Internet of Things creates a world of interconnected sensory devices containing sensors that can collect and store information from their respective real‐world surroundings (Hashem et al. [ 56] , Riggins and Wamba [ 117] ). According to Business Insider, the number of IoT devices will be 75 billion by the year 2020 (Danova [ 26] ). These devices can be sensors, databases, Bluetooth devices, global positioning system (GPS), and radio‐frequency identification (RFID) tags (O'Leary [ 109] ). These devices collect massive amounts of data, and if we delve deep into this information using big data analytic tools and techniques, we may be able to derive useful insights. The applications of IoT and big data analytics combined have the potential to bring path‐breaking changes to various industries and academic research. However, at the same time, since these subjects are still very new, there are uncertainties among scholars about how to implement them, and how best to extract the business value from these concepts (Riggins and Wamba [ 117] ).
One of the domains where the coupling of big data techniques and IoT has made significant progress is the concept of a smart city, that is, where each component of the urban surroundings consists of devices that are connected to a network (Hashem et al. [ 55] ). These devices can collect data from their surroundings and share them among themselves. These data can be used to monitor and manage the city in a refined dynamic manner, to improve the standard of living, and to also support the sustainability of the smart city (Kitchin [ 68] ). IoT concepts enable information sharing across various devices, thus aiding in the creation of big data caches. Furthermore, big data analytics are used to conduct real‐time analysis of smart city components. Kitchin ([ 68] ) mentions that urban governance decisions and future policies regarding city life are based on these analyses. Some sub‐areas under smart city where the bulk of research is being conducted are energy grids (Chourabi et al. [ 22] ), smart environments (Atzori et al. [ 7] , Nam and Pardo [ 104] , Tiwari et al. [ 137] ), waste management (Neirotti et al. [ 107] , Washburn et al. [ 146] ), smart healthcare (Nam and Pardo [ 104] , Washburn et al. [ 146] ), and public security (Neirotti et al. [ 107] , Washburn et al. [ 146] ). Smart city research is an emerging field in which big data has the potential to contribute substantially in the near future.
Predictive Manufacturing and 3‐D Printer
Predictive manufacturing is based on cyber physical systems (CPS). CPS consists of devices that communicate with each other, as well as with the physical world, with the help of sensors and actuators (Alur [ 4] ). CPS technology is becoming increasingly popular among manufacturers in the United States and Europe as it allows them to gain an edge in international manufacturing dynamics (Wright [ 147] ). CPS technology can also be used to improve the design of products, to track their production and in‐service performance, and to enhance the productivity and efficiency of manufacturers. General Electric (GE) and Rolls Royce have embedded sensors on their jet engines that capture data during flight and post‐flight, and maintenance decisions can then be made based on these logged data (Dai et al. [ 25] ).
Massive amounts of data are being collected from manufacturing plants through RFID and CPS technologies (Lee et al. [ 79] ). As more advancement is made in big data analytics, these data about production equipment and operations can be processed better. Security of CPS and predictive manufacturing is another potential area where big data techniques can be applied for better security outcomes. Furthermore, additive manufacturing processes, also known as 3‐D printing, are used to build three‐dimensional objects by depositing materials layer‐by‐layer (Campbell et al. [ 12] , Conner et al. [ 23] ). 3‐D printing is a path‐breaking technology that, in coming future, will make the existing models of manufacturing for certain products obsolete (Waller and Fawcett [ 142] ). Hence, it is profoundly important that we study the applications of big data analytics to additive manufacturing in order to derive insights.
Smart Healthcare is an extension of IoT ideas in the healthcare industry; that is, IoT devices equipped with RFID, Wireless Sensor Network (WSN), and advanced mobile technologies are being used to monitor patients and biomedical devices (Catarinucci et al. [ 14] ). In the smart healthcare architecture, IoT‐supporting devices are being used for seamless and constant data collection, and big data technology on the cloud is being used for storing, analyzing, and sharing this information (Muhammad et al. [ 102] ). The nexus of IoT and big data analytics hosted on cloud technology will not only help in more accurate detection and treatment of illnesses, but will also provide quality healthcare at a reduced cost (Varshney and Chang [ 140] ). Moreover, smart healthcare makes it possible to bring specialized healthcare to people who have restricted movement, or who are in remote areas where there is a dearth of specialized doctors (Muhammad et al. [ 102] ).
Recently, the use of wearable devices has grown rapidly, and the number of such units shipped annually is expected to reach 148 million by 2019 (Danova [ 27] ). Olshansky et al. ([ 110] ) discuss how data captured by wearable devices can be transmitted to health data aggregation services, such as Human API (humanapi.co) and Welltok (welltok.com), which can transform the data into measures of risk. These measures can be used to observe health trends as well as to detect and prevent diseases. Some promising topics of research in the smart healthcare domain where big data can play an important role are smart and connected health (Carroll [ 13] , Harwood et al. [ 54] , Leroy et al. [ 80] ), and privacy issues in the smart healthcare framework (Ding et al. [ 35] ).
In this article, we explored the application of big data in three different domains: information systems, operations and supply chain, and healthcare. But the lines between these disciplines are blurring with each passing day. Several new avenues of research that are common to at least two of these domains are becoming popular. One such topic, common to all three fields, is the use of ERP platforms in healthcare.
Healthcare organizations accumulate massive amounts of information from various departments, and different entities in healthcare management then rely on this information to carry out their services. An automated integrated system, such as an ERP system to manage the information coming from different services and processes, will enable healthcare organizations to improve efficiency of service and quality of care (Handayani et al. [ 53] ). The motivations underlying the adoption of ERP systems in healthcare management are technological, managerial, clinical, and financial (Poba‐Nzaou et al. [ 114] ). An ERP system integrates the various business units of a healthcare organization, such as finance, operations and supply chain management, and human resources, and provides easy access within each unit. It can also address the disparity in healthcare quality between urban and rural settings. ERP provides connectivity among all healthcare centers, and hence information can also be accessed from rural centers (Padhy et al. [ 111] ). Benefits from implementing ERP can be classified into four categories: patients’ satisfaction, stakeholders’ satisfaction, operations efficiency, and strategic and performance management (Chiarini et al. [ 20] ). However, ERP systems are costly to acquire and involve hidden costs even after successful implementation, such as integration testing and staff training costs (Gupta [ 51] , Wailgum [ 141] ). To date, the majority of research involving ERP in the healthcare domain has revolved around the implementation of ERP systems (Mucheleka and Halonen [ 101] ). One potential research avenue is to conduct empirical studies to quantify the benefits from implementation of such systems.
We generate data whenever we use the Internet. Aside from the data generated by us, several interconnected smart devices collect data, that is, devices with sensors collect data from their surrounding real world. With this tremendous quantity of data generated each day, big data and big data analytics are very much in demand in several industries as well as among scholars. In this study, we discussed the contributions of big data in information systems, operations and supply chain management, and healthcare domains. At the end, we talked about four sub‐areas of these domains—cloud computing, Internet of things (IoT) and smart city, predictive manufacturing and 3‐D printer, and smart healthcare—where big data techniques can lead to significant improvements. We also discussed the corresponding challenges and future research opportunities in the field, noting numerous areas for growth and exploration.
BIBLIOGRAPHY 3
SUNDAY, May 11, 1997, was a day like any other. Everything that was supposed to happen in politics, sports, and entertainment happened on that day, with one notable exception. In a New York City hotel an unexpected history-making event took place. A chess tournament pitting a human against a machine saw Garry Kasparov, the then reigning world chess champion, being defeated by a computer called Deep Blue. A new era had dawned.
In 2011, the prowess of the question-answering computer Watson on the television game show Jeopardy! captured the public's imagination. Watson won a match against two seasoned Jeopardy! players and received the $1-million prize. More recently, in 2016 a Go-playing program by the name of AlphaGo won a tournament against Lee Sedol, the recognized best player on the planet, by a score of 4 to 1. And on June 18, 2018, a program dubbed Project Debater engaged two humans in debates, on the topics of government subsidy of space exploration and increased investment in telemedicine, respectively, and did remarkably well. The world is beginning to pay attention.
These four achievements are harbingers of greater things to come. What is the common thread between Deep Blue, Watson, AlphaGo, Project Debater, and many other successes? Artificial Intelligence, a branch of computer science that aims to create intelligent systems. Over the past two or three years, Artificial Intelligence (AI), a scientific enterprise, has become a social phenomenon, with myriad economic, cultural, and philosophical implications. The advent of self-driving cars, speech-activated automated assistants, and data analytics more generally has transformed every sector of society. AI is reshaping and reinventing such fields as health care, business, transportation, education, and entertainment. The news media are replete with stories on the new cognitive technologies generated by AI and their effect on our daily lives and lifestyles. What is the reason for this explosion of excitement over AI?
As a result of some recent advances in machine learning technologies, the field is about a decade ahead of where we thought it would be at this time, with advances proceeding at an exponential rate. So says Elon Musk. In a BBC interview, famous physicist Stephen Hawking (1942-2018) warned that "the development of full artificial intelligence could spell the end of the human race." And fears that the singularity is nigh have resulted in websites, YouTube videos, and articles describing our impending doom. But is it the takeover of an artificial intelligence we should be worrying about? Or should we be more concerned about giving too much power to unintelligent AI? To make an accurate judgement, we need to understand what all the fuss is about.
MACHINE "learning" refers to a family of computational methods for analyzing data into statistically significant regularities that are useful for some purpose. These regularities are called "features" and the process of uncovering them "feature detection." Humans and other animals detect features whenever they recognize an object in the world: to perceive that a particular bone is the kind of thing that can be chomped on is to recognize a pattern of similarity between the bone being perceived and a host of bone experiences in the past.
Machine learning technologies have become increasingly adept at such classification tasks in well-understood areas. In the context of human faces, for example, systems that are sensitive to features such as noses, lips, eyes, and so on perform as well as humans on face recognition tasks. But some domains are so vast and multivaried that even humans don't have a good handle on what set of features will be useful for a given task. For example, we know that online "clicking" behaviour is a source of potentially useful data, but we aren't sure how to organize it in order to highlight the useful patterns. But if a human programmer doesn't know what features an AI system should detect, how can the system be built?
The AI excitement over the last few years is the result of some very promising advances toward solving this problem. Deep Learning algorithms can "extract" features from a set of data and thereby move beyond what humans know. The techniques have been used successfully on labelled data sets, where humans have already tagged the photographs with captions ("DOG PLAYING BALL") that are used as a way of "supervising" a system's learning by tracking how close or far it is on a given input from the correct answer. Recently there has been success with unlabelled data sets, in what is called "unsupervised" learning. The millennial Pandora's Box has been opened.
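The idea of "supervising" a system by tracking how far its output is from the correct answer can be sketched in a few lines. What follows is a deliberately simple stand-in, a single perceptron with hand-chosen features, and not a Deep Learning system; the toy data set, the feature names, and the function names are all assumptions made for illustration.

```python
def train_perceptron(examples, epochs=20, learning_rate=0.1):
    """Fit a single perceptron to (features, label) pairs with labels 0 or 1."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            activation = bias + sum(w * x for w, x in zip(weights, features))
            prediction = 1 if activation > 0 else 0
            # The supervision signal: how far the guess is from the human label.
            error = label - prediction
            bias += learning_rate * error
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
    return weights, bias


def predict(weights, bias, features):
    """Classify one input with the learned weights."""
    activation = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if activation > 0 else 0


# Hypothetical labelled data: (has_fur, has_wheels) -> 1 for "dog", 0 for "car".
LABELLED = [((1.0, 0.0), 1), ((1.0, 0.0), 1), ((0.0, 1.0), 0), ((0.0, 1.0), 0)]
weights, bias = train_perceptron(LABELLED)
```

Here a human has already decided which features matter. The advance described above is that deeper networks can work out useful features for themselves, which this sketch does not attempt.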
AlphaGo is a result of this new wave of machine learning. Deep Blue played chess by brute force, searching deeply through the hardcoded array of possible outcomes before choosing the optimal move. A human has no chance against this kind of opponent, not because it is so much smarter, but simply because it has a bigger working memory than a human does. With Go this canned approach was not possible: there are far more possible choices for each move, too many to hardcode and then search in real time. But Deep Learning systems such as AlphaGo can "learn" the relevant patterns of game play by extracting move features from millions of example games. The more games it plays, the more subtle its feature set becomes. On March 15, 2016, AlphaGo was awarded the highest Go grandmaster rank by South Korea's Go Association. Even the creators of AlphaGo at Google's DeepMind in London have no idea what move it will play at any given point in a game. Is the singularity at hand?
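To make the contrast concrete, exhaustive search is easy to write down for a game small enough to search completely. The sketch below uses a toy version of Nim (players alternately take one to three stones, and whoever takes the last stone wins) rather than chess, and it is not a description of how Deep Blue was actually engineered; the game and the function names are chosen purely for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def can_win(stones):
    """Return True if the player about to move can force a win."""
    # Try every legal move; a move wins if it leaves the opponent in a
    # position from which the opponent cannot force a win.
    return any(not can_win(stones - take)
               for take in (1, 2, 3) if take <= stones)


def best_move(stones):
    """Pick a winning move by full search (expects stones >= 1)."""
    for take in (1, 2, 3):
        if take <= stones and not can_win(stones - take):
            return take
    return 1  # every move loses against perfect play; take a single stone
```

The search visits every reachable position, which is feasible only because the game is tiny. With the number of choices available at each move in Go, the same approach becomes hopeless, which is the point made above.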
TO answer that question, we need to consider carefully whether such systems are in fact learning and becoming intelligent. These questions take on urgency as increasingly we use them to make important decisions about human lives.
In 2017 the Canadian Institute for Advanced Research (CIFAR) was awarded a $125-million budget for the Pan-Canadian Artificial Intelligence Strategy, an initiative to revamp every facet of our bureaucracy with AI technology. The health care system is one of the first areas targeted for change. And a pilot project for early detection of possible suicides is already underway.
How will such technology be used? Sally might be at risk for suicide, but it doesn't follow from this that she ought to be put under surveillance, institutionalized, or otherwise have her autonomy undermined. More generally, machine learning is an excellent tool for data analysis, but it cannot tell us what to do.
Practical judgement, the ability to bring relevant considerations to bear on a particular situation, guides us in our considered actions. Determining relevance is the critical skill here. How do we do it? This is the million-dollar question, of course, and we won't answer it here. Minimally, however, it requires a capacity to synthesize what is important in a given situation with what is important in human life more generally. In other words, it requires an understanding of what it means to be a laughing, working, eating, resting, playing being.
We still do not understand how meaning works. But we do know that being an expert pattern-detector in some domain or other is not all there is to it. The failures of our new AI heroes tell the story. During the Jeopardy!-IBM Challenge, Watson revealed what was behind its curtain (lots of meaningless data) in its answer to the "Final Jeopardy!" question. The category was "US Cities," and the answer was "Its largest airport is named for a World War II hero; its second largest, for a World War II battle." Watson's response? "What is Toronto?"
No surprise, then, that strategy games have been an AI industry focus: the tight constraints of game situations make the field of relevance narrow and, consequently, the chances of success great. Even so, the story of chess and AI is far from over. The 1,400-year-old game recently received a boost with the invention of Quantum Chess at Queen's University. This variant uses the weird properties of quantum physics in order to introduce an element of uncertainty into the game, thereby giving humans an equal chance when playing against computers. Unlike the chess pieces of the classical game, where a rook is a rook, and a knight is a knight, a Quantum Chess piece is a superposition of states, each representing a different classic chess piece. A player does not know the identity of a piece (that is, whether it is a pawn, a bishop, a queen, and so on) until the piece is selected for a move. Furthermore, thanks to the bizarre property of entanglement, the state of a chess piece is somehow "bound" to the state of another piece, regardless of how far they are separated; touch one, and you affect the other! The unpredictability inherent in Quantum Chess creates a level playing field for humans and computers. Unlike the case in classic chess (where a program can engage in a deterministic and thorough search for a good move), the hidden identities of the pieces and the probabilistic nature of quantum physics greatly diminish the computer's ability to conduct a reliable search. Perhaps judgment will give humans an edge, even in this limited domain. When it comes to Quantum Chess, even a novice chess player may have a chance against a more experienced human, as demonstrated by the following anecdote. On January 26, 2016, a movie was premiered at the California Institute of Technology during an event entitled One Entangled Evening: A Celebration of Richard Feynman's Quantum Legacy. 
The movie captured an exciting, and at times funny, game of Quantum Chess between Hollywood actor Paul Rudd and Stephen Hawking. It is worth noting that in this version of Quantum Chess superposition has a different meaning from being a superposition of states. Rather, superposition is spatial, in the sense that the same chess piece can be, at the same time, in two separate locations on the chess board (one known and one unknown). Touching a piece in order to execute a move determines probabilistically from which of the two locations the piece is to move. It is as though the piece manifests itself suddenly, either choosing to stay in its visible location or possibly disappearing and materializing elsewhere on the board (thereby revealing the unknown location).
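The spatial version of superposition described here can be mimicked with an ordinary random draw. The sketch below is a classical simulation only; the essay does not say what the probabilities are, so the even split, the function name, and the square names are all assumptions.

```python
import random


def touch_piece(known_square, hidden_square, p_known=0.5, rng=None):
    """Resolve a piece that sits on two squares at once.

    Returns the square the piece turns out to occupy when touched.
    p_known is an assumed probability that it is on the visible square.
    """
    rng = rng or random.Random()
    # rng.random() is uniform on [0, 1), so this branch is taken with
    # probability p_known.
    return known_square if rng.random() < p_known else hidden_square


# Example: a knight shown on g1 that might really be on e2.
square = touch_piece("g1", "e2")
```

Because the outcome is settled only when the piece is touched, a program cannot search ahead with certainty, which is what levels the field between human and machine.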
OTHER aspects of AI are increasingly being addressed in popular culture. The dark and suspenseful movie Ex Machina (2014), directed by Alex Garland, offers an interesting treatment of issues surrounding machine intelligence. An experimental female robot is being tested for possessing intelligence. She beguiles the young man testing her and persuades him to take actions leading to her liberation from captivity and simultaneously to his tragic end. The movie adds the following unexpected twist to the standard question of whether a machine can possess intelligence. If a robot displays an emotion toward a human that may be interpreted, for example, as love, is the emotion real, in the sense of being the repetition of a learned vocabulary, or is it purposefully faked?
Is the robot being sincere, or might it be pretending? In other words, has the computer reached a level of intelligence that allows it to be able not only to automatically utter the words that express a human sentiment but in fact to intentionally simulate that feeling for a good or a bad purpose? Is a computer capable, like a human, of experiencing emotions (empathy, jealousy, fear)? Can a computer, through cunning, imitate the expression of such emotions for "personal" gain? Allowing for all this to be possible, it would follow necessarily that the computer must not only be self-conscious but also have awareness and understanding of the human mind, in order to know its interlocutors' expectations and anticipate their response. Perhaps the real question is beyond "Can a computer think?" One may ask: "Can a computer be as manipulative, as deceptive, as duplicitous, or as charming, as honest, and as kind, as a human can be?"
What will an AI system capable of making practical judgements look like? Obviously, this is a foundational question for AI. Whatever the answer is, we know that we don't yet have it. We shouldn't be worried about the singularity; we are a long way off from that. But we should be concerned about the use to which AI technologies are being put. In our over-confidence in these technologies, we are giving them too much power.