Baban Hasnat is a professor of international business and economics in the College at Brockport, State University of

New York. The author expresses his appreciation to Steve Breslawski, Mustafa Canbolat, and Barry Hettler for their

help in improving this article.

©2018, Journal of Economic Issues / Association for Evolutionary Economics

JOURNAL OF ECONOMIC ISSUES

Vol. LII No. 2 June 2018

DOI 10.1080/00213624.2018.1469938

Big Data: An Institutional Perspective on Opportunities and

Challenges

Baban Hasnat

Abstract: The data revolution is already reshaping how knowledge is produced,

business conducted, humanitarian assistance handled, public officials elected, and

governance enacted. Economists rely on data to describe, interpret, and forecast

economic activity. Despite the rich tradition of using large datasets, institutional

economists have shied away from big data. This article describes, reviews, and

reflects on big data, with a particular focus on economic development. It illustrates

the vast opportunities and challenges for big data as an important tool for the

benefit of the public. It suggests that big data and data analytics, if used properly,

can provide real-time actionable information that can be used to identify problems

and needs, offer services, and provide feedback on the effectiveness of policy

action.

Keywords: big data, Google Trends, humanitarian assistance

JEL Classification Codes: O1, O4

The world is undergoing a data revolution. The revolution is already reshaping how

knowledge is produced, business conducted, humanitarian assistance handled, public

officials elected, and governance enacted (Kitchin 2014). Data now pours in from

nearly everywhere at all times and from every device — this is undeniably an era of big

data. Big data is produced as a by-product of other activities (data exhaust), it is often accessible in real time,

and it arises from the merging of different sources. It is an endless source of data for

the economic and social world. For its impact on the economy, it has been called

“the new oil” (Pringle 2017).

Government agencies, international organizations, and private institutions have

been collecting economic and social data for a long time. Economists have relied on

these sources to describe, interpret, and forecast economic activity. Macroeconomists,

in particular, have been at the forefront of exploiting large datasets. For example,

Arthur F. Burns and Wesley C. Mitchell’s (1946) pioneering search for patterns and

regularities in the data led to the identification of the business cycle. Similar work by

Simon Kuznets (1941) led to the creation of the National Income and Product

Accounts. Unfortunately, current institutional economists have shown very little

interest in big data. A review of the tables of contents and abstracts of the Journal of

Economic Issues, the Journal of Institutional and Theoretical Economics, and the Journal of

Institutional Economics found no articles on big data. This is surprising because early

institutionalists displayed a particular penchant for data to understand economic

issues and to make policy recommendations.

My objective in this article is to describe, review, and reflect on big data, with a

particular focus on economic development. I illustrate the vast opportunities and

challenges that big data presents as an important tool for the public good. I also show

that big data and data analytics, if used properly, can provide real-time actionable

information that can be used to identify problems and needs, offer services, and

provide feedback on the effectiveness of policy action. My inspiration for this study

comes from the work of Wesley C. Mitchell, who believed that acquiring the facts and

“detailed sifting of data outside the context of a worked out model” (Hirsch 1976,

206) is the correct approach to understanding economic issues.

What Is Big Data?

The term “big data” emerged in the 1990s and gained momentum in the early 2000s.

Similar to many new concepts, big data has been variously defined and

operationalized. Clearly, size often comes to mind when referring to big data. It is

commonly defined as the astonishing amount of structured and unstructured data

that are being generated, captured, and stored at an amazing speed. An example of big

data would be Walmart’s customer transaction data. Every hour, Walmart handles

over one million transactions, which are captured into its databases that are estimated

to contain over 2,560 terabytes of data (1 terabyte = 1024^4 bytes), equivalent to 167

times the information contained in all the books in the Library of Congress (Economist

2010). In a single day, there are about 5.2 billion Google searches, twenty-two billion

texts sent, and more than four million hours of content uploaded to YouTube, with

users watching 5.97 billion hours of YouTube videos (Schultz 2017). With regard to

hardware and software, big data is often defined as data that is too large and complex

for processing with traditional database management tools. Paradoxically, what is

considered big data today may become small data in five years due to advances in

technologies, platforms, and analytical capabilities. The data science community

concentrates on its characteristics and defines big data in terms of the 3V model:

volume (amount of data), velocity (speed of data flow), and variety (range of data types

and sources). Other dimensions, such as variability (highly inconsistent with periodic

peaks) and veracity (trust and uncertainty), are also added to the 3Vs to characterize

big data (Gandomi and Haider 2015).
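The storage figures quoted above can be checked with simple arithmetic. The following minimal Python sketch converts the 2,560-terabyte estimate into bytes under the binary convention used in the text (1 terabyte = 1024^4 bytes) and backs out the size of the Library of Congress book collection implied by the "167 times" comparison. The function and variable names are illustrative, and the implied Library figure is derived here from the two quoted numbers; it is not a figure reported by the cited source.

```python
# Illustrative arithmetic only; the inputs are the figures quoted in the text.
BYTES_PER_TERABYTE = 1024 ** 4  # binary convention: 1 TB = 1024^4 bytes


def terabytes_to_bytes(terabytes: float) -> int:
    """Convert terabytes to bytes using 1 TB = 1024^4 bytes."""
    return int(terabytes * BYTES_PER_TERABYTE)


walmart_tb = 2560        # estimated database size quoted in the text
library_multiple = 167   # "167 times" the books in the Library of Congress

walmart_bytes = terabytes_to_bytes(walmart_tb)
# Derived quantity (an assumption-free division of the two quoted figures).
implied_library_tb = walmart_tb / library_multiple

print(f"Walmart databases: {walmart_bytes:,} bytes")
print(f"Implied size of the Library's books: {implied_library_tb:.1f} TB")
```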

The United Nations’ (UN) Department of Economic and Social Affairs (2015)

classifies big data into three categories: (i) social networks (human-sourced

information, such as Facebook, Twitter, blogs, Instagram, YouTube, Internet searches,

text messages, etc.), (ii) traditional business systems (process-mediated data, such as

data generated in the context of business transactions, e-commerce, credit cards, and

medical records), and (iii) Internet of Things (machine-generated data, such as data

produced by weather, pollution, and traffic sensors, in addition to mobile phone

tracking, satellite images and logs registered by computer systems). Danah Boyd and

Kate Crawford (2012) describe big data as a cultural, technological, and scholarly

phenomenon that rests on the interplay of technology (tools and algorithms to gather,

store, and otherwise handle data); analysis (identifying patterns to understand economic, social,

political, technical, and legal issues); and mythology (the widespread belief that the

large data sets offer a higher form of intelligence and knowledge).

The Use of Big Data for Development and Humanitarian Assistance

Big data increasingly concerns people’s real behavior, not just the topics on which

people seek information through searching Google or through posting on Facebook.

Posts on social media may or may not represent a person, but how that person spends

time, whom he/she associates with, what he/she buys, where he/she goes, and so on,

can reveal an enormous amount about that person. Data scientists can predict, with

reasonable accuracy, if the person will pay back a loan, develop diabetes, or

buy tickets (Pentland 2018). Thus, the growth of new technologies and new sources of

data, often available in real time, offers a number of important dividends for

development. It can improve the efficiency of low-income people because they can

access a wide range of information on price and cost, thereby allowing them to save

money and time. Development programs can be inclusive as socially and economically

excluded groups increasingly voice their positions in defining development priorities.

This gives people access, empowerment, voice, opportunity, and security — something

that Amartya Sen (1999) has been advocating as the goal of development.

Highlighting the importance of big data, the United Nations declares: “It is time for

the development community and policymakers around the world to recognize and

seize this historical opportunity to address twenty-first century challenges, including

the effects of global volatility, climate change, and demographic shifts, with twenty-

first century tools” (United Nations Global Pulse 2012, 6).

Big data and data analytics have appeared on policymakers’ radars only in the

last few years. They are still in the early years of understanding big data and its

application in international development. Data analytics can be used to predict the

characteristics of sub-groups, for example in relation to school dropout rates and social

welfare programs. An analysis of Twitter, Google Trends, and other social media

can be used to assess the attitude of different groups to social problems and issues or

their response to different prevention strategies. Big data can allow the integration of

multiple sources of data into a data platform (UN Food and Agriculture

Organization’s AQUASTAT n.d.), mapping (Ebola outbreaks, the spread of crop

diseases, the location of victims in an earthquake, etc.), monitoring trends (rural

poverty in China), and real-time early-warning signals (hunger, drought, and ethnic

conflict). These tools are now starting to be used in development programs and

emergency management. Below I highlight some successful cases in the use of big data

in economic development and humanitarian assistance:

• No census has been possible in Afghanistan since 1979 due to security concerns.

By combing through satellite imagery, remote sensing data, geographic information

system modeling, and demographic surveys, the United Nations’ Fund for

Population Activities was able to generate population maps for Afghanistan.

• Combining satellite and other sources of data, the Food and Agriculture

Organization has developed AQUASTAT, which is a global water information

system that collects, analyses, and disseminates data and information on water

resources, water use, agricultural water management and other information

related to water (United Nations Food and Agriculture Organization n.d.).

• As mobile phones are becoming ever-present in the developing world, it is now

possible to turn mobile phone-generated data into an economic development

tool. For example, when mobile operators see airtime top-off amounts

decreasing in a certain region, it is a sign of loss of income in the region.

Policymakers can take action based on such information before the information

appears in official indicators (World Economic Forum 2012). Mobile payments

for agricultural products, input purchases, and subsidies, combined with satellite

images, may improve predictions of food production trends and incentives.

Early detection of production trends can help governments provide targeted

assistance. Proxies for poverty indicators have been developed by mining mobile phone

data, giving policymakers a much more economical and

continuous source of data on poverty trends (United Nations Global Pulse

2016).

• Policymakers are increasingly resorting to big data to manage epidemics and

healthcare. For example, human population movement poses a challenge to

eliminating malaria in developing countries. Amy Wesolowski et al. (2012)

analyzed the travel patterns of fifteen million mobile phone owners in Kenya

over a period of twelve months. Combining travel data with census and survey

data, together with spatially referenced malaria data, geographic information

systems, and network analysis tools, the authors were able to identify, map, and

quantify malaria risk areas. People’s lifestyles can be analyzed from the data

generated by the use of smartphones and apps, which offer opportunities for

primary prevention. In Iceland’s capital, Reykjavik, a combination of behavioral

economics, big data, and mobile technology has helped identify individuals at

increased risk of lifestyle-related diseases (e.g., diabetes) and reverse their

condition (Thorgeirsson 2017). Global Viral, a non-profit organization based in

San Francisco, uses big data to identify the locations, sources, and drivers of

local outbreaks of global epidemics up to a week ahead of global bodies, such as

the World Health Organization, that depend on traditional techniques and

indicators.

• Big data shows particular promise in emergency management. Immediately after

the April 2015 earthquake in Nepal, Flowminder/WorldPop used mobile phone

data to create a report on population displacement, which the UN used to

coordinate humanitarian assistance. When a devastating earthquake struck

Haiti in 2010, a group of volunteers took it upon themselves to analyze

informational content on Facebook, Twitter, and text messages to locate

affected areas and victims of the earthquake. The information was quickly

loaded — with more than 1.4 million edits — on street maps to construct a crisis

street map to assist humanitarian action.

• Big data and data analytics can be used to gain insight into how firms respond

to trade reforms or economic shocks. For example, the US-based company

Panjiva collects customs transaction information (e.g., source, destination, types

of goods) via a machine-learning algorithm that covers data for eight countries,

with 190 partner countries comprising 450 million records. The data can convey

anticipated action from the US, China, and Europe in terms of trade policies in

2017, the prospects for the shipping industry, and the industries that have the

most to win and lose from trade.

• By combining real-time traffic conditions with past traffic patterns and weather

forecasts, urban planners are better able to manage public transportation and the

police and fire departments, and to save time and gasoline for citizens and

businesses.
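The mobile airtime example in the list above lends itself to a simple early-warning rule. The following Python sketch is a minimal illustration under stated assumptions, not a description of any operator's or agency's actual method: it compares each region's recent average top-up amount with a trailing baseline and flags regions where the decline exceeds a chosen threshold. The function name, window lengths, threshold, and sample figures are all hypothetical.

```python
from statistics import mean


def flag_income_stress(topups_by_region, baseline_weeks=8, recent_weeks=2,
                       drop_threshold=0.15):
    """Flag regions whose recent mean top-up fell sharply against a baseline.

    topups_by_region maps a region name to a list of weekly average top-up
    amounts, oldest first. All parameter defaults are illustrative assumptions.
    Returns a dict of region -> proportional decline for flagged regions.
    """
    flagged = {}
    window = baseline_weeks + recent_weeks
    for region, series in topups_by_region.items():
        if len(series) < window:
            continue  # not enough history to form a baseline
        baseline = mean(series[-window:-recent_weeks])
        recent = mean(series[-recent_weeks:])
        if baseline <= 0:
            continue  # avoid division by zero on empty or inactive regions
        decline = (baseline - recent) / baseline
        if decline > drop_threshold:
            flagged[region] = round(decline, 3)
    return flagged


# Hypothetical weekly average top-up amounts (not real operator data).
sample = {
    "region_a": [10.0] * 8 + [9.8, 10.1],  # stable
    "region_b": [10.0] * 8 + [8.0, 7.0],   # sharp recent decline
}
alerts = flag_income_stress(sample)
print(alerts)
```

A rule of this kind only signals that something has changed; as the later discussion of challenges notes, interpreting the signal still requires context.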

Applications of Big Data: Two Case Studies

Several sectors of the economy that are important for development are also quite data-

intensive. I present two case studies to show the use of big data. The first case shows

the tracking of words. Figure 1 combines the actual unemployment data from the

U.S. Bureau of Labor Statistics in October 2017 with simple Google searches for the

word “unemployment” in the fifty U.S. states and Washington, D.C., at the same

time. The figure clearly shows that the Google Trends data correlates very closely with

the actual unemployment statistics. The potential for development is straightforward.

Each month, the Bureau of Labor Statistics’ employees survey 60,000 households

(approximately 110,000 individuals) over the phone or in person and inquire about

labor force activities. The survey results are published with a time lag of one month.

Google search trend data are available for free and can be accessed with a simple

computer in real time.
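The closeness of the relationship shown in Figure 1 could be summarized with a correlation coefficient. The sketch below computes a Pearson correlation between a search-interest index and an unemployment rate; the article does not say which measure underlies the figure, so the choice of Pearson's r is an assumption, and the five pairs of values are placeholders rather than actual Bureau of Labor Statistics or Google Trends data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Placeholder values for five hypothetical states (NOT real data).
search_index = [40, 55, 62, 70, 85]             # relative search interest
unemployment_rate = [3.1, 3.9, 4.2, 4.8, 5.6]   # percent

r = pearson_r(search_index, unemployment_rate)
print(f"Pearson r = {r:.3f}")
```

With real data, each list would hold one value per state for the same month, so that the two series are compared on a like-for-like basis.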

The second case draws on satellite imagery. Figure 2 shows two indexes for China’s manufacturing capacity. The Purchasing Managers’ Index (PMI)

provides an overall view of activity in the manufacturing sector. It is calculated from a

monthly survey of approximately 430 purchasing managers in China. The Satellite Manufacturing Index (SMI)

was created by SpaceKnow, a company that specializes in geospatial analysis.

SpaceKnow has taken over two billion satellite photos in China over the last fifteen

years. By analyzing changes in images across 6,000 industrial sites and incorporating

the number of trucks in industrial parks and the frequency of turnovers, the

company is able to measure the manufacturing sector’s activity and competitive capacity. The PMI

comes with a four-week time lag, while the SpaceKnow index can be received in

real time.

Figure 1. State Unemployment Rate and Google Trend (October 2017)

Figure 2. Index for China’s Manufacturing Sector Activity Based on Actual Survey

and Satellite Image

The Challenge

Despite its availability and advances in technological and analytical capacity, big data

has not been widely adopted as a tool for economic development because of a

number of challenges. One of the most sensitive issues for anyone wishing to explore

the use of big data for economic development and policymaking is privacy. Safety,

diversity, pluralism, and democracy are compromised without privacy. Recent

research has shown that it is possible to “de-anonymize” previously anonymized

datasets. Much of the big data belongs to private companies, and they may not have

any incentive to share proprietary data, given security and privacy concerns. Convincing

private companies to allow economists to access business data is difficult because

there are important privacy and competitive issues that a private company must

consider before it allows a researcher to access company data (Hilbert 2016).
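One way to see why anonymized datasets can be re-identified is to count how many records are unique on a handful of seemingly innocuous attributes (often called quasi-identifiers). The article does not describe the mechanism used in the research it mentions, so the sketch below is an assumption about one common route, and the records and field names are entirely hypothetical. A record that is unique on these attributes could in principle be matched to a named record in another dataset that shares them.

```python
from collections import Counter


def unique_share(records, quasi_identifiers):
    """Share of records whose quasi-identifier combination appears only once."""
    if not records:
        return 0.0
    keys = [tuple(rec[q] for q in quasi_identifiers) for rec in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(records)


# Hypothetical "anonymized" records: names removed, other fields retained.
records = [
    {"zip": "14420", "birth_year": 1980, "sex": "F"},
    {"zip": "14420", "birth_year": 1980, "sex": "F"},
    {"zip": "14420", "birth_year": 1975, "sex": "M"},
    {"zip": "14450", "birth_year": 1990, "sex": "F"},
]

fine = unique_share(records, ["zip", "birth_year", "sex"])
coarse = unique_share(records, ["sex"])
print(f"unique on zip+birth_year+sex: {fine:.2f}")
print(f"unique on sex only: {coarse:.2f}")
```

The more attributes a dataset retains, the larger the share of unique records tends to be, which is one reason data owners hesitate to release detailed records.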

Access to big data is a major challenge. Economists traditionally rely on their

own survey data or government survey data for their research. Just because a

government entity collects data (e.g., the IRS or the Social Security Administration)

does not mean that economists will be able to access it easily. Certain protocols must

be followed, which is generally time-consuming. For example, a Harvard researcher

needed very high-level security clearance, which took months to obtain, and he also

had to submit information on all his places of residence in the last ten years and

could only access the IRS data set in secure data rooms authorized by the central

office (Einav and Levin 2013; Taylor, Schroeder and Meyer 2014). In addition, the

process could favor researchers who have the resources, influence, and network to

gain access to the data, which may lead to “data haves” and “data have-nots” (Boyd

and Crawford 2012).

Big data is worthless unless it is used for improved decision-making. To do this,

organizations must resort to managing data (acquisition and recording; extraction,

cleaning, and annotation; integration, aggregation, and representation) and data

analytics (modeling and analysis and interpretations). Data management for

computation may be a challenge for developing countries and will require major

investments in information and communication technology. Accurate and actionable

data mining and analysis, particularly in real-time, requires extensive technical skills.

Developing countries may not be able to afford the data scientists and infrastructure.

A significant share of big data is generated from people’s perceptions, intentions,

and desires. Policymakers have to be careful before drawing conclusions

about what the data is really conveying because perceptions, intentions, and

desires can change rapidly. Additionally, combining data from multiple sources may

also mean magnifying the data flaws (Bollier 2010). Thus, theory and context matter

even more for extremely large data sets. A case in point is how Google Trend data

failed to predict flu trends. Google Flu Trends (GFT) is a big data tool that claimed to

accurately predict flu epidemics in the US. Because GFT could predict an increase in

cases of flu before the Centers for Disease Control and Prevention, it was trumpeted as the beginning

of the big data era. Unfortunately, the GFT’s prediction did not match reality.

Despite improvements to its model, Google has persistently overestimated flu prevalence

since at least 2011 (Fung 2014).

Economists typically look for a particular dataset to answer an unsettled

question, but data mining leads to searches for the unsettled question. Noting that big

data often involves billions of observations, Hal Varian (2014) argued that the

concept of statistical significance, a mainstay in hypothesis testing, may be useless in

certain situations. Others worry that a substantial project that uses big data is

essentially descriptive because the data will reveal correlations rather than causality.
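The concern about statistical significance can be illustrated with a standard textbook calculation; this is offered as one plausible reading of the point, not as a reconstruction of Varian's own argument. For a sample correlation r computed from n observations, the usual test statistic is t = r * sqrt(n - 2) / sqrt(1 - r^2), which grows with the square root of n. The sketch below holds a practically negligible correlation fixed (r = 0.001, a hypothetical value) and shows how the statistic changes as the sample grows.

```python
from math import sqrt


def correlation_t_stat(r, n):
    """t-statistic for testing that a correlation is zero, given r and n."""
    if n <= 2:
        raise ValueError("need more than two observations")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt(n - 2) / sqrt(1 - r * r)


tiny_r = 0.001  # hypothetical, practically negligible correlation
for n in (1_000, 1_000_000, 1_000_000_000):
    t = correlation_t_stat(tiny_r, n)
    verdict = "significant" if abs(t) > 1.96 else "not significant"
    print(f"n = {n:>13,}: t = {t:8.2f} ({verdict} at the 5% level)")
```

With a thousand observations the correlation is indistinguishable from zero, while with a billion it passes the conventional threshold by a wide margin even though its practical size is unchanged. In that setting, a significance test says little about whether a relationship matters.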

Conclusion

It is clear that the size, speed, and nature of big data are extremely valuable in certain

situations and can be a powerful tool to address various social ills and development

efforts by providing early warnings, real-time awareness, and real-time feedback.

Nevertheless, we cannot ignore the data context and cultural context. We must not

forget that big data has its limitations and biases. We need to consider these and use

caution in interpreting the data. Correlation is not causation, and big data should not replace

or act as a proxy for official statistics. Rather, it should complement the existing

data. At present, some motivated persons and non-profit organizations are

spearheading the use of big data for public benefit. The prerequisites for making big

data effective for development are extensive technological infrastructure, generic

software services, and human capacities and skills. Developing countries have a long

way to go before big data becomes an everyday tool.

References

Bollier, David. The Promise and Peril of Big Data. Communications and Society Program. The Aspen

Institute, 2010.

Boyd, Danah and Kate Crawford. “Critical Questions for Big Data.” Information, Communication & Society

15, 5 (2012): 662-679.

Burns, Arthur F. and Wesley C. Mitchell. Measuring Business Cycles. New York, NY: Columbia University

Press, 1946.

Economist. “Data, Data Everywhere.” Special report. The Economist, February 25, 2010. Available at http://

www.economist.com/node/15557443. Accessed Nov 1, 2017.

Einav, Liran and Jonathan D. Levin. “The Data Revolution and Economic Analysis.” Working Paper No.

19035. NBER, May 2013. Available at http://www.nber.org/papers/w19035.pdf. Accessed August

1, 2017.

Fung, Kaiser. “Google Flu Trends’ Failure Shows Good Data > Big Data.” Harvard Business Review, March

25, 2014.

Gandomi, Amir and Murtaza Haider. “Beyond the Hype: Big Data Concepts, Methods, and Analytics.”

International Journal of Information Management 35, 2 (2015): 137-144.

Hilbert, Martin. “Big Data for Development: A Review of Promises and Challenges.” Development Policy

Review 34, 1 (2016): 135-174.

Hirsch, Abraham. “The A Posteriori Method and the Creation of New Theory: W.C. Mitchell as a Case

Study.” History of Political Economy 8, 2 (1976): 195-206.

Kitchin, Rob. The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences.

Thousand Oaks, CA: Sage Publishing, 2014.

Kuznets, Simon. National Income and Its Composition, 1919–1938. New York, NY: National Bureau of

Economic Research, 1941.

Pentland, Alex Sandy. “Reinventing Society in the Wake of Big Data: A Conversation with Alex ‘Sandy’

Pentland.” Edge, August 30, 2018. Available at https://www.edge.org/conversation/

alex_sandy_pentland-reinventing-society-in-the-wake-of-big-data. Accessed November 19, 2018.

Pringle, Ramona. “Data Is the New Oil.” CBC News, August 25, 2017. Available at http://www.cbc.ca/

news/technology/data-is-the-new-oil-1.4259677. Accessed November 27, 2018.

Schultz, Jeff. “How Much Data Is Created on the Internet Each Day?” Micro Focus Blog, August 10, 2017.

Available at https://blog.microfocus.com/how-much-data-is-created-on-the-internet-each-day.

Accessed December 10, 2018.

Sen, Amartya. Development as Freedom. New York, NY: Oxford University Press, 1999.

Taylor, Linnet, Ralph Schroeder and Eric Meyer. “Emerging Practices and Perspectives on Big Data

Analysis in Economics: Bigger and Better or More of the Same?” Big Data & Society, July-December

2014, pp. 1-10.

Thorgeirsson, Tryggvi. “Hospital Impact — Behavioral Economics and Big Data May Improve Health and

Reduce Healthcare Costs.” FierceHealthcare, September 26, 2017. Available at

https://www.fiercehealthcare.com/hospitals/hospital-impact-behavioral-economics-may-improve-

health-and-reduce-healthcare-costs. Accessed December 8, 2017.

United Nations. Department of Economic and Social Affairs Statistics Division. Classification of Types of Big

Data, ESA/STAT/AC.289/26 11. UNSTAT, May 2015. Available at https://unstats.un.org/unsd/

class/intercop/expertgroup/2015/AC289-26.PDF. Accessed December 1, 2017.

United Nations. Food and Agriculture Organization. AQUASTAT, n.d. Available at http://www.fao.org/

nr/water/aquastat/main/index.stm. Accessed December 5, 2017.

United Nations Global Pulse. Big Data for Development: Challenges and Opportunities. UN Global Pulse, May

2012. Available at http://www.unglobalpulse.org/sites/default/files/BigDataforDevelopment-

UNGlobalPulseMay2012.pdf. Accessed November 15, 2017.

———. Integrating Big Data into the Monitoring and Evaluation of Development Programs. UN Global Pulse, 2016.

Available at http://unglobalpulse.org/sites/default/files/

IntegratingBigData_intoMEDP_web_UNGP.pdf. Accessed November 16, 2017.

Varian, Hal. “Big Data: New Tricks for Econometrics.” Journal of Economic Perspectives 28, 2 (2014): 3-28.

Wesolowski, Amy, Nathan Eagle, Andrew J. Tatem, David L. Smith, Abdisalan M. Noor, Robert W. Snow

and Caroline O. Buckee. “Quantifying the Impact of Human Mobility on Malaria.” Science 338,

6104 (2012): 267-270.

World Economic Forum. Big Data, Big Impact: New Possibilities for International Development. World

Economic Forum, 2012. Available at http://www3.weforum.org/docs/

WEF_TC_MFS_BigDataBigImpact_Briefing_2012.pdf. Accessed December 10, 2017.
