Data and Information

pyramid with 3 layers: bottom (data); next up (information); top (knowledge)

Data Pyramid

The Power of Data

What Is Data?  

Data is a fact or set of facts that have been gathered about an object, idea, place, person, etc.  The facts are stored or represented in the form of numbers, measurements, words, descriptions, or observations. Note that a single fact is actually a datum, and data is the plural form. Conventionally, the plural form (data) is used for singular and plural purposes, and we will do that here.

As noted in the definition above, data can be represented in different forms. Here are several types of data (Pierce, 2017):

· Qualitative data: descriptive data that includes such facts as color, texture, feel, description of an experience, perceptions of strengths or weaknesses, etc. This is data to which numbers are not normally assigned.

· Quantitative data: facts which are presented as numbers such as test scores, number of students in the class, number of words on a page, capacity of a hard drive. Quantitative data can also be subdivided into discrete and continuous data.

· Discrete data can take only certain values, such as whole numbers. For example, there are 32 students in the class, the hard drive can store eight gigabytes, and the test score was 89 percent.

· Continuous data reflects a range into which the values may fall. Optimal tire pressures may fall anywhere between 30 PSI (pounds per square inch) and 33 PSI, including any fractional pressure in between these values.

· Categorical data: groups the facts into a category such as "new" or "used" or "for sale" or "not for sale," etc.

Why Is Data Collected?  

Typically, data is collected to tell a story or solve a problem. Beginning with a question that needs to be addressed focuses both the type of data to be collected and the follow-on review of what is gathered, perhaps providing an answer to the question, or revealing patterns, or uncovering unusual results that were not expected. There may be interesting results hidden in the facts gathered. But the question that drives the collection of data also helps identify the audience who will be the recipients of your findings or the story you want to tell (School of Data, 2013).

How Is Data Collected?  

There are many ways to collect data: direct observation (counting people in the coffee shop), a census (all items or individuals in the group are measured), a sample (selected items or individuals in the group are measured), physical measurements taken by persons or machines, interviews, and so on.

In much broader terms, the basic data sources are:

· collecting data yourself

· finding data that has already been collected and released for others to use

· getting additional data by asking sources for updates, or by getting access to data that is typically hidden from public use

This last category of "hidden" data sources includes the government, organizations, and scientific projects and institutions. Two good places where individuals can find data are the Open Access Directory's data repository list (http://oad.simmons.edu/oadwiki/Data_repositories) and the Open Knowledge Foundation's datahub.io (https://datahub.io/dataset).

In What Format Is Data Collected?  

The purpose of gathering data is to tell a story or answer a question. To do that, the data has to be in a format that allows it to be analyzed. Short of simply eyeballing the data or using paper and pencil, the best approach for computer analysis tools (such as Excel) is to obtain the data in machine-readable form, that is, in a form that can be imported into a computer program. The most common format for exchanging or importing data is comma-separated values (CSV). The data pieces, whether words or numbers, are separated by commas, and the data can be read directly into a spreadsheet program.
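As a minimal sketch of what "machine-readable" means in practice, the following Python snippet reads a small block of CSV text with the standard library's csv module. The column names and rows are hypothetical, invented only for illustration. A spreadsheet program performs the same kind of split on commas when it imports a CSV file.

```python
import csv
import io

# Hypothetical CSV text: the first line names the columns,
# and each following line is one record with comma-separated values.
csv_text = """item,condition,price
bicycle,used,85.00
laptop,new,649.99
bookshelf,used,30.00
"""

# DictReader uses the header line to label each value in a row.
rows = list(csv.DictReader(io.StringIO(csv_text)))

for row in rows:
    print(row["item"], row["condition"], float(row["price"]))

# Once imported, the data can be analyzed, for example by totaling a column.
total_price = sum(float(row["price"]) for row in rows)
print("Total:", round(total_price, 2))
```

Every value arrives as text, which is why the price is converted with float() before any arithmetic is done.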

Big Data  

The term "big data" became mainstream in the early 2000s via the work of industry analyst Doug Laney. His definition of the term incorporates the following (SAS, n.d.):

· Volume: Organizations collect data from a variety of sources, including business transactions, social media, and information from sensor or machine-to-machine data. In the past, storage would have been an issue, but new technologies have helped ease that burden.

· Velocity: Data streams into the data center at an unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors, and smart metering drive the need to deal with torrents of data in near-real time.

· Variety: Data comes in all types of formats—from structured, numeric data in traditional databases to unstructured text documents, email, video, audio, stock ticker data and financial transactions.

The SAS Institute Inc. (SAS) adds two additional dimensions when it comes to big data (SAS, n.d.):

· Variability: In addition to the increasing velocities and varieties of data, data flows can be inconsistent with periodic peaks. Is something trending in social media? Daily, seasonal, and event-triggered peak data loads can be challenging to manage.

· Complexity: Data comes from multiple sources, which makes it difficult to link, match, cleanse, and transform data across systems. Connecting relationships, hierarchies, and multiple data linkages is important.

What Is the Importance of Big Data?  

It is actually not the amount of data that is important but what is done with that data. Particularly in the business environment, analysis of that data can find answers to questions about potential reductions in cost and time, and help in making smart decisions about new product development (SAS, n.d.). Other critical business tasks can be supported by using the gathered data to determine what has caused failures, issues, and/or defects or detecting and mitigating fraudulent behavior before the organization is severely affected (SAS, n.d.). If done properly, data collection (and analysis) will allow a business to focus time, personnel, and resources on the issues that will generate the greatest returns.

What Are the Sources for Big Data?

Sources for big data: Internal data sources: 88% Transactions, 73% Log data, 57% Emails; External data sources: 43% social media, 38% Audio, 34% Photos and video.

Where Does Big Data Come From?

Source: IBM Big Data and Analytics Hub

The three most prevalent sources for large amounts of data are the following:

· Streaming data that comes from the information infrastructure within an organization. For example, all transactions accomplished via the IT systems within the business are captured on a daily or even hourly basis. This includes logs of daily activities and email or other types of messages received from internal sources.

· Social media data, including audio, photo, and video files that are retrieved from watching activity on Facebook, the business's website, or websites of related businesses or competitors. This can aid marketing, sales, and customer support functions. 

· Publicly available sources, such as data.gov, the CIA World Factbook, or the European Union Open Data Portal. A browser search of "sources for data sets" will provide a long list of sources for data that addresses many areas of interest.

Sampling, Surveys, and Polls  

When researchers are looking to collect data about a particular topic that affects a large group or even the entire population, it is not cost-effective or even practical to contact every member of the group or the population for data input. Instead, most such studies are based on gathering responses from a sample, or a subset of the entire population. Although not everyone in the population is individually contacted, the results of sampling are considered to be representative of the population.

For the results to be truly representative, it is critical that the sample subset reflect the makeup of the larger group. Randomly selecting the subset of participants is the primary way of ensuring that anyone could have been selected. "The basic principle: If selected correctly, a randomly selected small sample of a population of people can represent the attitudes, opinions, or projected behavior of all of the people from which the sample is obtained" (Newport, Saad, & Moore, 1997).
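The idea of random selection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 10,000 people, the 40 percent "yes" share, the sample size of 1,000, and the seed value are all hypothetical, and real polling organizations use more elaborate selection and weighting methods.

```python
import random

# Hypothetical population of 10,000 people; exactly 40% would answer "yes".
population = ["yes"] * 4000 + ["no"] * 6000

# A fixed seed makes the illustration repeatable; a real poll would not fix it.
rng = random.Random(2024)

# random.sample draws without replacement, so every person is equally
# likely to be chosen and nobody is chosen twice.
sample = rng.sample(population, 1000)

true_share = population.count("yes") / len(population)
sample_share = sample.count("yes") / len(sample)

print(f"Share answering yes in the population: {true_share:.1%}")
print(f"Share answering yes in the sample:     {sample_share:.1%}")
```

The sample share will usually land within a few percentage points of the population share, even though only one person in ten was asked.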

Whereas sampling is the method for creating the pool of persons to be contacted, surveys and polls are the means by which the data is collected from the sample.

What is the difference between a survey and a poll? Both may use sample sets of participants that represent the group that is being surveyed or polled. A poll typically asks one question while a survey is generally used to ask a range of questions.

Here is an example of a poll question—one multiple-choice question and a list of answers from which the participant selects one or more answers (including "Other," which allows the participant to enter an answer not in the list).

Poll question that asks What is your favorite color? and a list of answers of red, orange, yellow, blue, indigo, violet, and other.

Sample Poll

A survey, on the other hand, allows for asking more than one question that covers a wider area of interest. And it allows for different types of questions and/or responses, including such things as age, address, as well as multiple-choice questions. Customer satisfaction surveys are one common form of a survey. And students are asked to complete a course evaluation survey in the latter weeks of each course at UMUC.

Here is a sample survey based on a Likert scale (ranking the response using values 1-5). There are four questions, making this a survey and not a poll, which typically consists of a single question with a yes/no/uncertain answer.

Mythical Unicorns just sold you a T-shirt. Check the response that best matches your satisfaction level with this product.

                                        Strongly                                Strongly
                                        Disagree                                Agree
                                        5         4         3         2         1
The item was as specified               O         O         O         O         O
The size was as expected                O         O         O         O         O
The material is as expected             O         O         O         O         O
I would recommend this item to others   O         O         O         O         O

Two of the most familiar polling companies are Gallup and Nielsen. Gallup's method of selecting polling participants is to generate a list of all phone numbers (landline and cell phone) in the United States and then use a subset of that list, which covers all geographical areas based on area codes, to call and interview individuals.

Nielsen uses a slightly different approach. Households that participate are selected at random from a predefined sample based on census data. The census data provides critical information on household income, size, age of residents, etc. A certain number of houses from each group is selected.

The sample size in any poll is critical in meeting validity criteria for poll results. However, simply increasing the size of the sample group does not equate to increased validity. Gallup and other major polls use sample sizes of between 1,000 and 1,500 for standard surveys of the US population "because they provide a solid balance of accuracy against the increased economic cost" (Newport, Saad, & Moore, 1997).
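The diminishing return from larger samples can be illustrated with the standard textbook approximation for the 95 percent margin of error of a simple random sample, which is 1.96 times the square root of p(1 - p)/n. The sketch below assumes the worst case p = 0.5. It is not taken from the cited source and is not necessarily how Gallup computes its own figures.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z_score=1.96):
    """Approximate 95% margin of error for a simple random sample."""
    return z_score * math.sqrt(proportion * (1 - proportion) / sample_size)

# Compare a few sample sizes, including the 1,000 to 1,500 range.
for n in (500, 1000, 1500, 4000):
    print(f"n = {n:>5}: about +/- {margin_of_error(n):.1%}")
```

Under these assumptions, a sample of 1,000 gives a margin of roughly 3 percentage points and a sample of 1,500 gives roughly 2.5. Quadrupling the sample from 1,000 to 4,000 only cuts the margin in half, which is why much larger samples are rarely worth the cost.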

Nielsen's TV ratings work on the same principle. Nielsen gets around 5,000 households to agree to be part of the representative sample to find out who is watching TV and what those people are watching. To be accurate, that sample set of 5,000 households needs to be representative of all U.S. households with TVs (How Stuff Works, Entertainment, n.d.).

Meters are installed in the home, and these meters can track when TV sets are on and the channels that are being watched. Data gathered by these boxes is then sent to the company each night. Nielsen then compares the data received with the programs that are on TV at any time, and thus determines how many people watch which program.

This research is worth billions of dollars. Advertisers pay to air their commercials on TV programs using rates that are based on Nielsen's data. Programmers also use Nielsen's data to decide which shows to keep and which to cancel. A show that has several million viewers may seem popular to us, but a network may need millions more watching that program to make it a financial success. That's why some shows with a loyal following still get canceled (How Stuff Works, Entertainment, n.d.).

Who Uses Big Data?  

Big data plays a role in almost every industry. SAS (n.d.) provides a summary of some of the industries affected by big data.

· Banking: understanding customers and customer satisfaction; minimizing risk and fraud while maintaining regulatory compliance

· Education: making an impact on school systems, students, and curriculum by identifying at-risk students, ensuring adequate student progress, and implementing systems to evaluate and support teachers and principals

· Government: managing utilities, running agencies, dealing with traffic congestion, and preventing crime, while also addressing issues of transparency and privacy

· Health care: respecting privacy as it relates to patient records, treatment plans, and prescription information, while at the same time uncovering insights into improving patient care

· Manufacturing: boosting quality and output while minimizing waste; supporting more agile business decisions

· Retail: building customer relationships, marketing, handling transactions, and revitalizing business

The Power of Information

So now you have the data. What do you do with it? If all you have is a set of random numbers, they tell you nothing until you also know the context in which these numbers were gathered. For example, if you were given 32 random numbers, some repeated, between 18 and 95, they would be meaningless until you were also told that these numbers were the ages of the students in your course. Until you know the context, the data by itself only provides you with the foundation for eventually organizing the data in such a way as to provide you with the information needed to find answers to the questions or tell the story. How is that organization done? How is data transformed into information?

Data, as you know, comes in many forms: numbers, words, pictures. In an example, we will use a set of numerical, continuous data.

Here is the data set:

 30.22, 35.5, 45.5, 56.32, 52.62, 49.90, 52.26

This raw data is useless as displayed here. However, once the context is included (the question), the data now can result in useful information. The question was: What was the annual precipitation in Reading, Pennsylvania between 2000 and 2006? Now you have data you can work with. The data can be transformed into useful information by importing it into a spreadsheet and displaying results of minimum and maximum values, or creating a picture or graph of the same results.

Bar graph showing rainfall amounts by year: 2000: 30.22, 2001: 33.4, 2002: 45.5, 2003: 56.32, 2004: 52.62; 2005: 49.9; 2006: 52.26

Rainfall Amounts

In such a small data set, you can easily pick out the minimum and maximum rainfall. But in a data set that covers 1863 to 2006, well over a century of records, it would be more difficult. This is where the analysis functions in spreadsheet programs become helpful in organizing the data and providing you with usable information about the topic or question.
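What a spreadsheet does with its MIN and MAX functions can be sketched in Python. The values below are copied from the data set as listed in the text. The 2001 figure differs between that list and the chart description, but this does not affect the minimum or the maximum. The variable names are illustrative only.

```python
# Annual precipitation for Reading, Pennsylvania, 2000-2006,
# copied from the data set listed in the text.
precipitation_by_year = {
    2000: 30.22,
    2001: 35.5,
    2002: 45.5,
    2003: 56.32,
    2004: 52.62,
    2005: 49.90,
    2006: 52.26,
}

# Find the years with the smallest and largest recorded values.
driest_year = min(precipitation_by_year, key=precipitation_by_year.get)
wettest_year = max(precipitation_by_year, key=precipitation_by_year.get)

print("Minimum:", precipitation_by_year[driest_year], "in", driest_year)
print("Maximum:", precipitation_by_year[wettest_year], "in", wettest_year)
```

The same two lines would work unchanged on a record spanning more than a century, which is the point of letting a program do the searching.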

One caveat: the value or correctness of your information is dependent upon the correctness of the data you use to generate that information. An old adage that applied to computer programs applies here, as well: "Garbage in, garbage out." If your raw data is bad, your answer to the question, or the resulting story you tell, may be flawed as well.

In the tutorials, you will work with some of the basic functions that spreadsheet applications provide. Although we will work exclusively with Microsoft Office's Excel program for tutorials and exercises (projects) in this course, there are other spreadsheet apps available, including some created specifically for work with big data and some that provide more analysis functionality than Excel does. See "List of Spreadsheet Software" on Wikipedia for a list of free and proprietary spreadsheet apps.

How Is Information Used?  

pyramid with 3 layers: bottom (data); next up (information); top (knowledge)

Data Pyramid

Information is the next step up in the information pyramid. It is the foundation for knowledge. But in a more practical way, information provides the basis for making decisions. It is the next step in answering the question or telling the story.

Making decisions is a part of everyday life—from what to prepare for dinner, to more life-changing decisions such as where to live, whom to marry, or what to choose as a course of study in college. To make the best decisions, it is important to gather the relevant information. You can delay making a decision if all you do is endlessly search for information without coming to any conclusion. Or you could take a vote, throw a dart at a list, or toss a coin.

However, an inability to make a reasoned decision may stem from having too little information or too much. Even if your information is on target, if you involve too many others in the decision process, the need to include everyone's views and values may end up being too complicated.

If the decision involves change, that potential movement in the status quo may make the solution too difficult to accept. Finally, if you just don't care about the outcome, one way or the other, it may be hard to invest the effort needed to come to a conclusion. Regardless of the outcome, information is gathered for a reason (Skills You Need, n.d.).

Here is another way to look at the uses of information. These uses are tied very closely to a "need" that has been identified (that question or story) for gathering information. As such, the list does not directly indicate how the information is used, but why it was gathered. The assumption may be made that the information is then used to address the issue (Taylor, 1991):

· Enlightenment: context information

· Problem understanding: better comprehension of a specific problem

· Instrumental: what to do and how to do something

· Factual: precise data

· Confirmational: verify a piece of information

· Projective: future oriented

· Motivational: relates to personal involvement

· Personal or political: relationships, status, reputation, personal fulfillment

Summary  

Data does not depend on information, but information depends on data. Raw data by itself has no meaning. Information results when context or meaning is added to the raw data, resulting in at least the first level of understanding the answer to whatever question prompted the gathering of that data. Here are some properties of data:

· Data can be stored, copied, duplicated, modified, and/or moved.

· Data remains static—it does not necessarily improve over time; rather, data can decay as it becomes outdated or is no longer applicable to the question being asked.

· Data has no value until it is converted into usable information. ("Value" here only refers to the fact that, standing alone, raw data does not tell a story or answer a question. The data itself may have great "value," financial or otherwise, to the person or entity that seeks to use that data.)

· Data that is incorrect or used outside of the context for which it was gathered may result in incorrect information.

Information, on the other hand (Doyle, 2014):

· results when context is added to data—what, when, where, why, how the data was collected

· is data that has been converted into a form that makes understanding of the data useful; it is data with meaning

· becomes the basis for understanding a question, or making inferences, or making decisions; it helps tell the story.

We have begun with an overview of the first two elements in the information pyramid: data and information.  Readings in the following weeks will focus on knowledge, knowledge management, and business intelligence.

References  

Doyle, M. (2014, August 6). What is the difference between data and information? [Blog post]. Retrieved from https://salespop.pipelinersales.com/sales-management/difference-between-data-and-information/

How Stuff Works - Entertainment. (n.d.). How do television ratings work? Retrieved from http://entertainment.howstuffworks.com/question433.htm

IBM Big Data and Analytics Hub. (n.d.). Where does big data come from? Retrieved from http://www.ibmbigdatahub.com/infographic/where-does-big-data-come

Newport, F., Saad, L., & Moore, D. (1997). How are polls conducted? In M. Golay, Where America stands. John Wiley & Sons.

Pierce, R. (2017, February 15). Data, probability, and statistics. Retrieved from https://www.mathsisfun.com/data/index.html

Quantitative Environmental Learning Project. (n.d.). DataSet#049; Reading, PA precipitation. Retrieved from http://seattlecentral.edu/qelp/sets/049/049.html

SAS. (n.d.). Big data: What is it and why it matters. Retrieved from http://www.sas.com/en_th/insights/big-data/what-is-big-data.html

School of Data. (2013). What is data? Retrieved from https://schoolofdata.org/handbook/courses/what-is-data/

Skills You Need. (n.d.). Decision making. Retrieved from https://www.skillsyouneed.com/ips/decision-making.html

Taylor, R. (1991). Information use environments. In B. Dervin & M. J. Voigt (Eds.), Progress in communication sciences. Norwood, NJ: Ablex.

 

Information and Information Sharing (Networks)

How Is Information Shared?

Before we look at the vehicles by which information is shared, we need to consider what information sharing means. Information sharing, according to Techopedia, "describes the exchange of data between various organizations, people, and technologies." There are several types of information sharing ("Information Sharing," n.d.):

· Information shared by individuals (email messages, chat messages, postings on Facebook or videos posted on YouTube, research papers submitted to an online forum)

· Information shared by organizations (such as business and business financial reports, or the RSS feed from the online branch of a cable news station)

· Information shared between firmware/software (such as the IP addresses of available Wi-Fi hotspots or the link between your computer and smart TV).

The sharing of information did not suddenly start with the advent of technology.  Humans have been sharing information ever since there were two who could communicate in any way. But in the not too distant past, it was challenging for information to be shared electronically, either because there was no internet connection available or because hardware and software applications on the two ends of the communication line could not really "talk" to one another.

The introduction of wide- and local-area networks, networks within businesses (intranets), and of standardized protocols and application compatibility among a widely diverse set of computer hardware and software have all facilitated the huge growth in global information sharing. This growth is exponential as more networks and organizations connect and information becomes easier to share across the internet ("Information Sharing," n.d.).

Emerging Technologies That Support Information Sharing

How big is the internet? The internet is a massive collection of networked computing devices. The World Wide Web is one of the means by which these computing devices share data or information. We often use the terms internet and World Wide Web as if they mean the same thing, although they do not.

So the size of the internet is a difficult question to answer. Is the size based on the number of web pages that can be reached? Or is it the number of individual pieces (bytes) of data available and/or shared on a more or less daily basis? Should the dark web (the portion of the internet accessible only with specific browsers) be included? If the size is based on data, then should the data stored in the "cloud" also be included? Or should we simply count the number of computing devices that may be accessing data? One can see the challenge in trying to assign a size to the internet.

· Data: Think of a byte as representing a single letter or character that you could type via your keyboard. Based solely on data, it is estimated that the amount of data on the internet will double every two years and by 2020 will be approximately 40 zettabytes (40,000,000,000,000,000,000,000 bytes) (Live-Counter.com, n.d.).

· Websites: On the other hand, if the size of the internet is calculated from the number of websites that you could reach through a search engine such as Google, Bing, or Yahoo (viewed in a browser such as Edge), then the number of pages on the World Wide Web may be estimated at 2 billion (de Kunder, 2018).

· Computing devices: Finally, what is the estimate of the number of computing devices that have access to the internet? We have to consider all devices that connect to the internet, i.e., computers, smartphones, watches, traffic signals, thermostats in homes (all of this interconnectedness is called the internet of things). The prediction is that by 2020, there will be four such devices connected to the internet for every person on the planet—about 24 billion (Business Insider, 2018).

Whichever means is used to think about the size of the internet or the World Wide Web, the numbers are beyond ones we can even imagine.
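To get a feel for the scale of a zettabyte and for what "doubling every two years" implies, here is a small arithmetic sketch. The 40-zettabyte figure comes from the estimate quoted above. The function names and the four-year projection are hypothetical and used only for illustration.

```python
BYTES_PER_ZETTABYTE = 10**21  # one zettabyte is 10 to the 21st power bytes

def zettabytes_to_bytes(zettabytes):
    """Convert a whole number of zettabytes to bytes."""
    return zettabytes * BYTES_PER_ZETTABYTE

def projected_size(start_size, years, doubling_period_years=2):
    """Size after `years` if the amount doubles every doubling period."""
    return start_size * 2 ** (years / doubling_period_years)

# Write out 40 zettabytes in full, with comma separators.
print(f"{zettabytes_to_bytes(40):,} bytes")

# Two doublings in four years: 40 -> 80 -> 160 zettabytes.
print(projected_size(40, 4), "zettabytes after four years")
```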

But data/information sharing relies on networks. So let's look at some network basics.

Network Basics

How does data get from one computer to another over a network? When you send an email or text message, post a response to a discussion question, or submit a file to the assignment folder in class, how does that data get from your machine to the destination? We will address these questions here in a simple way without delving too deeply into the technical underpinnings or the architecture of a network.

Clients and Servers

First, there are at least three computing systems in play: yours (the origin), the computer at the destination, and, in between, a server. In reality, there are many more computers that are involved in this transmission, but we will focus here primarily on the origin and end point of the data that is being sent. The computers at the origin and the destination are considered clients. A server provides the services that a client uses.

Next, a brief word about the server. There are different types of dedicated servers (servers that are never clients as well), depending upon the services they provide to clients (e.g., email, internet access, storage of files, even access to a printer). Typically, dedicated servers run a specialized type of operating system that enables these computers to handle the functions of a server. But a single server can provide multiple functions, and some can also be clients.

In a home network, at least one computer that is connected to a router acts as a server for the other computing devices in the home. The operating system on that computer makes this computer both a server (for the other devices on the home network) and a client which accesses and uses the services of servers outside of the home network. A dedicated server is not required in this type of network, sometimes called a peer-to-peer network.

When you send data, you are the client and you connect to a server that processes the data according to the type it is. In the case of an email server, it takes your email and sends it forward to the destination computer. In the case of posting to a discussion or submitting an assignment, you are still the client, and the university's dedicated educational servers take your submission and send it to the correct destination in your classroom, which is located on the university's computers.
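The client and server roles can be sketched with Python's standard socket module. This is a deliberately tiny illustration, not a picture of how an email or university server is actually built. Both roles run on the same machine (address 127.0.0.1), the server handles a single message, and the function name and message text are hypothetical.

```python
import socket
import threading

def run_echo_server(server_socket):
    """Accept one client, read its message, and send back an acknowledgement."""
    connection, _address = server_socket.accept()
    with connection:
        request = connection.recv(1024)
        connection.sendall(b"received: " + request)

# Server side: listen on the local machine; port 0 lets the
# operating system pick any free port.
server_socket = socket.create_server(("127.0.0.1", 0))
port = server_socket.getsockname()[1]
server_thread = threading.Thread(target=run_echo_server, args=(server_socket,))
server_thread.start()

# Client side: connect to the server, send data, and read the reply.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"my assignment")
    reply = client.recv(1024)

server_thread.join()
server_socket.close()
print(reply.decode())
```

The client starts the conversation and the server waits for requests and responds to them, which is the division of labor described above.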

Sending and Receiving Data Over the Internet

Let's say you want to send a picture of your cat to your favorite aunt.

Email format with the To: line indicating "favorite aunt" and subject line: "my cat." The message body: "Here is a picture of my favorite cat from your favorite niece." The picture included is that of a sleeping white cat.

My Cat

You attach the picture to the email and send it on. What actually happens to that picture and the email message to which it has been attached?

All data in the computer is actually stored as a series of 1s and 0s (what you see on the screen is a translation of that set of digits). So the email and the picture you are sending are stored in the computer as 1s and 0s, and small sets of these numbers (called packets) that make up the contents of your email are sent out over the network.

Each packet has an origination address (your computer), a destination address (your aunt's computer) and a sequence number. The sequence number allows the packets to be reassembled in the correct order at the destination. The picture will be cut up into packets and the packets put on the network.

Remember that the internet is an interconnection of millions of pathways between connected computers. The email server determines which pathway is best to use for each individual packet it sends forward. We will talk about transmission speed and bandwidth and how this affects the pathways in the next section. The individual packets that make up the cat's picture may not all take the same route to the server and then on to the destination. (Here, each rectangle represents a packet.)

Picture of the white cat cut into 7 rectangular sections.

White Cat in Sections

Because each packet may take a different path to the destination, these packets may not arrive in the same order in which they were sent out.

The seven rectangles that comprise the picture of the cat are jumbled and not in an order that allows the picture to be complete.

White Cat in Jumbled Sections

So the picture needs to be reassembled at the destination in the correct order. That's where that sequence number comes into play. When the individual packets are received from the email server, they are put back together in the correct order. And the cat appears complete.

Leftmost picture again shows the seven rectangles that make up the picture of the cat but arranged in correct order. The rightmost picture shows the complete picture of the sleeping cat.

Cat Arranged in Correct Order
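The cut-up, jumble, and reassemble sequence described above can be sketched in Python. This is a simplified illustration and not a real network protocol. The packet layout (a dictionary with origin, destination, sequence number, and payload), the address names, and the packet size of 7 bytes are all hypothetical.

```python
import random

def split_into_packets(data, packet_size, origin, destination):
    """Cut the data into numbered packets that each carry both addresses."""
    packets = []
    for sequence_number, start in enumerate(range(0, len(data), packet_size)):
        packets.append({
            "origin": origin,
            "destination": destination,
            "seq": sequence_number,
            "payload": data[start:start + packet_size],
        })
    return packets

def reassemble(packets):
    """Put packets back in sequence-number order and join their payloads."""
    ordered = sorted(packets, key=lambda packet: packet["seq"])
    return b"".join(packet["payload"] for packet in ordered)

message = b"Here is a picture of my favorite cat."
packets = split_into_packets(message, 7, "niece-computer", "aunt-computer")

# Simulate packets taking different routes and arriving out of order.
random.Random(1).shuffle(packets)

restored = reassemble(packets)
print(restored == message)
```

Because every packet carries its own sequence number, the order of arrival does not matter. The destination only has to sort before joining.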

Transmission Speeds and Bandwidth

Have you ever noticed a lag time when attempting to download a web page, a file, or a video via the internet? Or heard others complain about how "slow" the internet is? Or have not been able to even load a web page? What are the elements that affect the transmission of that data over the internet (the largest of the networks)?

Transmission speed and bandwidth are separate elements, although both are measured in bits per second, kilobits per second, or megabits per second (a bit is a single 1 or 0, a kilobit is 1,000 bits, and a megabit is 1 million bits). Bandwidth refers to how much data (how many bits) can be pushed during any one second. Transmission speed refers to how fast those bits are sent or received. The two are interconnected and hard to separate because one affects the other.

Transmission speed is measured in bits per second. Anything less than 1 megabit/second (Mbps) is considered slow (turtle speed), while 50 Mbps and up (even 1 gigabit/second) are preferred (cheetah speed). But speed can be affected by the equipment you are using—your computer, your router, or even the cable used to connect your computer or router to the internet access point. Speed can also be impacted by the number of users who are accessing the same site (Opera blogs, 2015).

Your computer may be able to handle (send or accept) large amounts of data in one second, but the speed your internet provider uses to push the data or the capacity of your transmission lines may affect the speed at which that data is sent or received.

On the other hand, bandwidth refers to the number of bits that can be pushed in one second, regardless of the speed at which they are transmitted. Bandwidth might be compared to whether your connection has the capacity to handle a bowling ball or a marble. The larger the capacity, the more data that can be pushed in any one second. But even if your connection can handle a bowling ball, if the transmission speed is that of a turtle, or the connection is clogged by other users, it will appear that the download is slow. The best combination, of course, is a high rate of transmission and a large bandwidth.

In summary, bandwidth relates to the amount of data that can be uploaded or downloaded from your computer (measured in bits/second). Speed (transmission speed) relates to how fast that data can be pushed through the transmission lines to or from your computer (measured also in bits/second) (Tiwari, 2017).
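As a rough worked example of what these rates mean in practice, the sketch below estimates how long a file would take to transfer at the "turtle" and "cheetah" rates mentioned above. The 25-megabyte file size and the function name are assumptions chosen for illustration, and the calculation ignores protocol overhead, congestion, and equipment limits.

```python
def download_seconds(file_size_bytes: int, rate_mbps: float) -> float:
    """Ideal transfer time at a given rate, ignoring overhead and congestion."""
    bits = file_size_bytes * 8                  # 8 bits in every byte
    bits_per_second = rate_mbps * 1_000_000     # 1 megabit = 1,000,000 bits
    return bits / bits_per_second

# Hypothetical 25-megabyte file (25,000,000 bytes).
FILE_SIZE = 25_000_000
slow = download_seconds(FILE_SIZE, 1)    # 1 Mbps, "turtle speed"
fast = download_seconds(FILE_SIZE, 50)   # 50 Mbps, "cheetah speed"
print(f"At 1 Mbps: {slow:.0f} s; at 50 Mbps: {fast:.0f} s")
```

Under these idealized assumptions the same file takes 200 seconds at 1 Mbps and 4 seconds at 50 Mbps.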

Let's put this into perspective by looking at the elements of a network in your home or one in a small business.

Hardware and Media

Think of all the ways in which home computers are now used: schoolwork, shopping, email, calling family and friends, playing games, watching television shows, downloading music and videos. Many households have more than one computer, creating the need for small home networks. A home network allows computers to share access to the internet; use a single printer; send and receive files, pictures, and other documents; and even share access to televisions and game systems.

Different network types may have different hardware, but share the same components (Wilson & Fuller, n.d.):

· more than one computer

· hardware (modem and router) and software

· a path for the information to follow from one computer to another; this path is the medium, and it can take the form of wires or cables or radio waves

· a firewall—a hardware device or software program that protects the network from malicious users or hackers and makes transactions secure

Communication Hardware

You need a way to allow the computers in the house to "talk" to each other (e.g., share files) and to surf the internet for such services as email, social networking, and search engines needed for news, information, or online research. If you are connecting a computer to the internet, you are joining a vast network of computers. If you are linking computers in the household with each other and each with the internet, you are creating a home network that, in turn, links to the vast network of computers that make up the internet. We will look at two of the components needed to establish and maintain that connectivity.

Modems

Signals between computers travel over wires such as phone lines, cables such as those provided by your cable company, or, less often, satellite signals. The data that is entered into your computer for processing and the information that results from processing as output are stored in digital format, commonly referred to as zeros and ones.

But telephone wires were originally designed to carry the human voice, converted into a continuously varying electrical signal. Such signals are called analog: they represent data continuously, like a clock with hands, rather than digitally (with a series of zeros and ones) as computers do. Telephone wires can transfer computer-generated data, too, but only after it has been converted into analog signals.

A device is required to convert the digital output from the computer into an analog format for transfer and then to rebuild the original data into digital format at the receiving computer. That device is called a dial-up modem (modulator-demodulator). A digital subscriber line (DSL), CATV cable (the same type of cable that provides cable television), or fiber optic cable will support transmission of the data in digital format, and your modem will most likely be a digital or broadband modem.

The last type of modem is a wireless modem. This type of modem uses the cell phone network and connects to the internet wirelessly via cell signal providers.
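The digital-to-analog conversion that a dial-up modem performs can be pictured with a toy model in which each bit is sent as one of two audio tones and turned back into a bit on arrival. This is only a sketch: the tone frequencies and function names are assumptions for illustration, and real modems use far more elaborate signaling.

```python
# Illustrative tone frequencies in hertz; the exact values are assumptions.
TONE_FOR_BIT = {"0": 1070, "1": 1270}
BIT_FOR_TONE = {tone: bit for bit, tone in TONE_FOR_BIT.items()}

def modulate(bits: str) -> list[int]:
    """Digital to analog: replace each bit with the tone that represents it."""
    return [TONE_FOR_BIT[b] for b in bits]

def demodulate(tones: list[int]) -> str:
    """Analog to digital: replace each received tone with its bit."""
    return "".join(BIT_FOR_TONE[t] for t in tones)

sent = "01000001"                  # the eight bits for the letter "A"
on_the_wire = modulate(sent)       # what travels over the phone line
received = demodulate(on_the_wire)
print(on_the_wire)
print(received == sent)
```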

The following images are examples of standalone (external) modems. Each would have an RJ11 jack for a DSL connection, or a coaxial jack for cable, and at least one Ethernet jack. This device is what you use to actually receive your internet connection through phone line or cable.

Image of a Cellpipe modem by Bell.

Modem 1

True Tech Talk Time (2011), Wikimedia Commons

Image of an analog modem

Modem 2

Source: Digitalsignal, 2013, Wikimedia Commons

This picture shows the connection options.

· gray, where a telephone line would be connected for a DSL service

· blue, where a USB cable might be connected

· yellow, where an Ethernet cable would be connected:

Modem with ports for ethernet, USB, DSL

Modem 3

Source: Feureau (n.d.), Wikimedia Commons

A connection for a coaxial cable is not included in the above picture.  This is a picture of a typical coaxial cable.

Coaxial Cable

Routers

Image of a router to which each computer would be connected via Ethernet cable.

Wired Router

Source: Asim Saleem, Wikimedia Commons

A router with Wi-Fi connection.

ADSL Router With Wi-Fi

Source: Asim Saleem (2007), Wikimedia Commons

A dual-band Wi-Fi router

TP-Link Router

Source: Firecracker PR (2013), Flickr

The top image is of a wired router—each computer would be connected via Ethernet cable. The other two are wireless routers.

 

If you are connecting more than one computer to the internet, the other component you will need is a router. In general, a router connects separate networks, such as a home network and the internet, and forwards data between them. In a typical home network, a router is used to allow multiple computers to access the internet or other computers via a single modem. The router will receive data from the individual computers and send it to the correct destination. Just as important, the router sends the received data from the internet or other sources to the correct computer in the home.
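One way to picture how a router returns incoming data to the correct computer is a small bookkeeping table: the router notes which computer sent each outgoing request and looks that note up when the reply arrives. The class below is a toy model under that assumption; the names `ToyRouter`, `send_out`, and `deliver_reply` are hypothetical, and real routers work with network addresses and ports rather than simple request numbers.

```python
class ToyRouter:
    """Remembers which home computer made each request so replies find their way back."""

    def __init__(self):
        self.pending = {}    # request number -> name of the computer that asked
        self.next_id = 1

    def send_out(self, computer: str, destination: str) -> int:
        """Record the sender, then pass the request on toward the modem."""
        request_id = self.next_id
        self.next_id += 1
        self.pending[request_id] = computer
        return request_id

    def deliver_reply(self, request_id: int) -> str:
        """Look up which computer is waiting for this reply."""
        return self.pending.pop(request_id)

router = ToyRouter()
first = router.send_out("laptop", "example.com")
second = router.send_out("desktop", "example.org")
print(router.deliver_reply(second))   # the desktop gets its own reply
print(router.deliver_reply(first))    # the laptop gets its own reply
```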

In some cases, the modem and router may be combined into a single device. Although modems and routers are not the same thing, the small box supplied by your internet service provider (ISP) is most likely a combination of both. If the device that is connected via phone line or cable has an antenna, it is likely a combination of modem and wireless router.

Network Configurations

1. One computer in the system, called the server or host, is physically connected to the router via a cable. The other computers in the home are clients that need to be connected to the host or primary computer (the server). The router must be positioned between the primary computer and the modem.

Configuration of a computer network with internet and service provider, modem, router, and three stations connected via cable.

Network Configuration 1

Source: Janet Zimmer, Creative Commons

2. The client computers may all be connected directly to the router via ethernet cables:

Configuration of a computer network with internet and service provider, modem, router, and three stations connected via cable.

Network Configuration 2

Source: Janet Zimmer, Creative Commons

3. The computers may all be connected to a wireless router. Even if a wireless router is used, one computer might be cabled to the router. But none of the other computers needs to be cabled to the server. Each can access the router, and thus the modem, without the need for a server machine. This is definitely the case if you are connecting to the internet via a Wi-Fi hot spot. You are accessing other computers or the internet via a wireless router that is in turn connected behind the scenes to a modem, which connects to a wired network.

Configuration of a computer network with internet and service provider, modem, router, and three stations connected via a wireless router.

Network Configuration 3

Source: Janet Zimmer, Creative Commons

Now let's expand this into a larger environment, such as a business that has multiple employees, possibly even spread over various countries in the world.

LANs, MANs, WANs, and GANs

Instead of having a desktop or laptop acting as the server, data and information is shared with others on the network through connections to other computers and to a larger mainframe computer or network attached storage (NAS) hardware device, which acts as the server. The peripheral computers on the network may be connected via cables or via wireless access points.

The basic components are the same: multiple computers connected so that data can be shared. The difference between a local area network (LAN), metropolitan area network (MAN), wide area network (WAN), and global area network (GAN) is geographical; that is, it lies in the distance between the computers and the larger mainframe or server that controls the network and the data flowing across that network. MANs, WANs, and GANs also typically consist of multiple networks.

· In a LAN, the network is confined to a limited area (think of a college campus).  

· A MAN would typically be found connecting various networks in a city (so that you can access the municipal network, the police, etc.). For example, a MAN connects the computer networks at the UMUC main campus in Adelphi, Maryland, with the campus in Largo, Maryland.

· A WAN covers areas that consist of multiple LANs or MANs, or even networks based in different countries. A WAN may use telephone lines, fiber-optic cable, microwave links, or satellites to provide a bridge between widely dispersed networks.

· A GAN is not limited to any geographical area and, in fact, spans the entire globe. The internet might be considered a GAN.

Image of a local area network (LAN) featuring a central server computer, and spokes coming out of the server leading to a printer, a cloud representing the internet, and a hub of individual stations.

LAN

Source: Microsoft

Image of a metropolitan area network (MAN), showing how computers are connected within a metro city area, with a circular area of machines and then connections extending outward to buildings and offices.

MAN

Source: Microsoft

Image of a wide area network (WAN) showing a partial world globe with microwave antennas, satellite earth stations, host computers, telephone wires, local telephone exchanges, and fiber optic cable connections.

WAN

Source: Microsoft

Image depicting a GAN (global area network) showing planet Earth in space surrounded by communication satellites.

GAN

Source: Microsoft

Cloud Computing

If you or a business employ cloud computing (for example, if you use Office 365 and have not downloaded a copy of Office to your local laptop), that simply means that instead of storing and accessing data and programs via the mainframe server, you store and access them over the internet. There still is a physical server (or servers) where the data and programs are stored, but it is reached solely via the internet.

Image that shows a cloud containing servers, virtual desktop, software platform, applications, and storage data with an arrow pointing to an external router, switch, and units for end users.

Cloud Computing

Source: Microsoft

What Is the Internet of Things?

"The internet of things (IoT) is a computing concept that describes the idea of everyday physical objects being connected to the internet and being able to identify themselves to other devices" ("Internet of things," 2017). Another way to define the IoT is that of a network of internet-connected objects able to collect and exchange data using embedded sensors (Meola, 2016). Think of these "objects" as devices that can recognize and connect to other devices (without human intervention) and can, as a result, share data and even analyze that data. Most of these devices are, by nature, wirelessly connected. Experts predict that more than 24 billion IoT devices will be in place by 2020 —about four devices for every human on the planet (Meola, 2016).

Why Is the Internet of Things Important?

You are probably already aware of some of the things that connect separate devices, such as a home alarm system and a smartphone. There can be economic benefit from analyzing the resulting data streams. Here are several examples from SAS (n.d.):

· Intelligent transport solutions speed up traffic flows, reduce fuel consumption, prioritize vehicle repair schedules, and save lives.

· Smart electric grids more efficiently connect renewable resources, improve system reliability, and charge customers based on smaller usage increments.

· Machine monitoring sensors diagnose—and predict—pending maintenance issues, near-term part stockouts, and even prioritize maintenance crew schedules for repair equipment and regional needs.

· Data-driven systems are being built into the infrastructure of "smart cities," making it easier for municipalities to run waste management, law enforcement and other programs more efficiently.

Here are some of the devices, current and future, that are or will be part of the IoT: smart thermostats, light bulbs, refrigerators, toothbrushes, pet feeders, and coffee makers. IoT devices are also deployed on a large scale in security systems, smart homes, and factories.

Because of this incredible number of connected devices being put in place, there is rising concern about security and privacy. "Hackers could penetrate connected cars, critical infrastructure, and even people's homes" (Meola, 2016), prompting tech companies to assess the cybersecurity concerns.

Summary

Before the advent of the internet and its very global reach, data was often kept in silos, which prevented easy sharing with other entities. This was the result of formats for the data that were proprietary (created by and for a specific business or entity). The data could not be easily ported to other recipients, or simply could not be either exported or imported because of incompatibilities between the hardware and/or software on the two ends of the data pipeline.

For example, dates can be stored in various formats such as 03-04-2017 or March 4th, 2017, or, as in many countries outside the United States, with the day before the month (04/03/2017 refers to March 4th, 2017). These different formats made the sharing of a date field problematic. For the most part, these problems have been taken care of by programs that recognize the various formats for dates, making the sharing of data between computer networks commonplace. This has also been driven, to a large extent, by social networking. Techopedia summarizes this nicely:

Facebook has 750 million accounts, YouTube has over 400 million, and the other social networking sites and applications have established between them a sharing network of over a billion people. In terms of information sharing, this is a global proportion, with almost 10 percent of the world's population sharing information across common networks regularly ("Information sharing," n.d.).
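Returning to the date-format problem described above, the short sketch below shows why a shared date field needs an agreed format: the same characters parse to two different dates depending on whether the month or the day is assumed to come first. It uses Python's standard `datetime.strptime`; the variable names are illustrative.

```python
from datetime import datetime

raw = "03-04-2017"   # the ambiguous date string from the example above

as_month_first = datetime.strptime(raw, "%m-%d-%Y")   # United States convention
as_day_first = datetime.strptime(raw, "%d-%m-%Y")     # day-before-month convention

print(as_month_first.date().isoformat())   # 2017-03-04 (March 4th)
print(as_day_first.date().isoformat())     # 2017-04-03 (April 3rd)
```

Exchanging dates in a single unambiguous form, such as the year-month-day strings printed here, is one common way for programs to avoid this mismatch.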

Information sharing, if used intelligently, can lead to a more effective way to manage any organization, whether it is a government or a business. The sharing of information about products and services can lead to an improvement in customer access to services and customer satisfaction. Local and international access to banking and other financial products, as well as shopping, are already common and go beyond the entertainment value of a social media site such as Facebook.

Intelligent information sharing can result in lower costs, as well as "improving overall accuracy of public data and allowing organizations and individuals alike to have access to information that they might need and entertainment that they want to experience" ("Information sharing," n.d.).

References

Business Insider. (2016). There will be 24 billion IoT devices installed on Earth by 2020. Retrieved from http://www.businessinsider.com/there-will-be-34-billion-iot-devices-installed-on-earth-by-2020-2016-5

de Kunder, M. (2018). The size of the world wide web (the internet). Retrieved from http://www.worldwidewebsize.com/

Futurism.com. (2018). By 2020, there will be 4 devices for every human on Earth. Retrieved from https://futurism.com/by-2020-there-will-be-4-devices-for-every-human-on-earth/

Information sharing. (n.d.). In Techopedia. Retrieved from https://www.techopedia.com/definition/24839/information-sharing

Internet of things (IoT). (2017). In Techopedia. Retrieved from https://www.techopedia.com/definition/28247/internet-of-things-iot

Live-Counter.com. (n.d.). How big is the internet? Retrieved from http://www.live-counter.com/how-big-is-the-internet/

Meola, A. (2016, December 19). What is the internet of things (IoT)? Meaning and definition. Retrieved from http://www.businessinsider.com/what-is-the-internet-of-things-definition-2016-8

Opera blogs. (2015, June 24). What factors affect your internet speed? [Blog post]. Retrieved from https://blogs.opera.com/news/2015/06/what-affect-internet-speed/

SAS. (n.d.). Why is the internet of things important? Retrieved from https://www.sas.com/en_us/insights/big-data/internet-of-things.html

Tiwari, A. (2017, July 14). What's the difference between internet bandwidth and speed? [Blog post]. Retrieved from https://fossbytes.com/whats-the-difference-between-internet-bandwidth-and-speed/

Wilson, T. V., & Fuller, J. (n.d.). How home networking works. Retrieved from https://computer.howstuffworks.com/home-network.htm

Licenses and Attributions

Wired router by Asim Saleem is available under a Creative Commons Attribution-ShareAlike 3.0 Unported license.

ADSL router with Wi-Fi (2007) by Asim Saleem is available under a Creative Commons Attribution-ShareAlike 3.0 Unported license.

TP-Link Router (2013) by Firecracker PR is available under a Creative Commons Attribution 2.0 Generic license.

Network Configuration 3 by Janet Zimmer is available under a Creative Commons Attribution 3.0 United States license.

Cellpipe modem by True Tech Talk Time (2011) on Wikimedia Commons is available under a Creative Commons Attribution-ShareAlike 3.0 Unported license.

Analog modem by Digitalsignal (2013) is available under a Creative Commons Attribution-ShareAlike 3.0 Unported license.

Modem with ports by Feureau is available under a Creative Commons Attribution-ShareAlike 3.0 Unported license.

Network Configuration 2 by Janet Zimmer is available under a Creative Commons Attribution 3.0 United States license.

Knowledge Management, Business Intelligence, and Databases

Business Knowledge

Let's turn our attention to the world of business and how the data and information gathered by and for the business can lead to a sense of business knowledge or business intelligence.  First, we'll define business knowledge, also known as knowledge management (KM) or sometimes business intelligence (BI).

The Power of Knowledge in Business

Knowledge management is the gathering, organizing, sharing, and analyzing of the data and information to which a business has access. The data that is stored in a repository can then be organized, shared, and analyzed.

That data comes from many sources, both from within the business itself (organizational memory including experiences and skills of the workforce, documents regarding customers and suppliers, existing designs and processes, etc.) and from outside sources (market research and market reports, talking to customers and suppliers, professional associations and trade bodies, trade exhibitions and conferences, and collaboration with associated institutions and businesses) (NIBusinessInfo.co.uk, n.d.). However, knowledge management may not mean the same thing to different companies.

Sometimes KM is defined or explained in terms of productivity gain. Another description of KM is couched in terms of sharing data and business intelligence rather than sharing knowledge. Finally, some companies focus on implementation of employee portals (sources where employees will have access to the data needed for their specific jobs). Seiner (2002) offers the following definition and impact of knowledge management:

In every case, the intentions of sharing knowledge are good, even if the definition of knowledge management and the definition of knowledge itself varies from place to place. The definition of knowledge management that I use states that KM involves a discipline of spreading knowledge of individuals and groups across the organization in ways that directly impact performance. This impact on performance can take many shapes and forms. The impact can be related to the promotion of "healthy" or smart business activities, the involvement of knowledge stewards in daily activities, the limiting of the risk associated with people leaving the organization, understanding employee needs for knowledge, and making that knowledge, information, and data available to them.

Why Is Knowledge Management Important for a Business?

Here are three key reasons why it is important for a business to generate and manage KM (Quast, 2012):

· KM facilitates decision-making capability.  Processing an overwhelming amount of information can get in the way of achieving high-quality decisions. David Derbyshire, in an article by Quast (2012), reports that scientists claim that the amount of data sent to a typical person in the course of a year is "the equivalent of every person in the world reading 174 newspapers every single day." A knowledge management system that can make sense of all this data can facilitate better, more informed decisions.

· KM builds learning organizations by making learning routine. Simply put, capturing learning from experience builds knowledge that can then be used to streamline operations and improve processes.

· KM stimulates cultural change and innovation. KM programs can help managers be open to change and foster an environment open to ideas and insight. This type of environment can lead to innovation in businesses of any size.

All of that data needs to be catalogued, sorted, filtered, and linked in order for intelligent data analysis to occur with results that a business can use. This involves data mining in addition to data analysis. Remember that the goal of KM is to provide the business with output that can be used to address specific business tasks and projects (Rouse, n.d.). In order to understand the basis of this type of business knowledge or business intelligence, we need to consider the foundations of data storage—databases.

Databases: Basic Concepts

In December 2013, Target, the department store chain, announced that a security breach had allowed unauthorized individuals to gain access to information about customers who had recently shopped at Target and used a credit card or debit card to make purchases. Then in January 2014, Neiman Marcus reported the same problem: Unauthorized people had accessed its database and taken shoppers' personal information, including names, addresses, phone numbers, and e-mail addresses. Both security breaches had grave implications for the personally identifiable information (PII) of the shoppers, with the potential for these unauthorized persons to complete credit card or loan applications, make purchases, and potentially affect credit ratings for millions of customers.

All of the customer data had been stored in databases. What makes it so convenient to simply swipe your credit card or debit card when making a purchase, and to have the purchase approved almost instantaneously, is also what makes these security breaches so potentially devastating. All of the information the store needs for billing—the card numbers, billing address, phone number, email address, and more—is stored in large databases maintained by the store. This collection of data about you is organized in such a way that retrieval can be done in seconds. That is the strength of databases and database software.

Basic Organization of a Database

So, what is a database? It is a computer-based collection of related pieces of data organized so that the data can readily be accessed, managed, and updated. As you recall, computers work with the binary system where a bit is the smallest entity represented and is either a one (on) or a zero (off). Together, eight bits combine to form a byte, which represents one character. The characters represented by bytes may be letters, numbers, and/or special symbols. By themselves, characters generally are not meaningful, but they combine to form meaningful data, such as your first name. Your first name consists of a set of characters ultimately stored as bits in the computer.
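To make the bit, byte, and character relationship concrete, the sketch below prints the eight bits behind each character of a short name. The name "Ada" is a hypothetical example, and the one-byte-per-character correspondence shown here holds for plain ASCII text; other encodings can use more than one byte for a character.

```python
name = "Ada"   # hypothetical first name

# Show each character, its numeric code, and the eight bits that store it.
for ch in name:
    code = ord(ch)
    print(ch, code, format(code, "08b"))

# The whole name as one string of bits: 3 characters x 8 bits = 24 bits.
bits = "".join(format(byte, "08b") for byte in name.encode("ascii"))
print(bits, len(bits))
```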

Databases operate by organizing data into meaningful groupings called fields, records, tables, and finally databases.

· Fields contain one or more characters or an audio, video, or image file. Fields can be designed to hold only character data or only numeric data, or they can be designed to hold other types of data, such as images or audio or video files, or a hyperlink to a website or other information source. For instance, one field might contain a first name, another field a last name, another the street portion of an address, and so on. Each field is given a specific name. You can visualize fields as being organized in columns of a particular table in the database.

· Records are collections of related fields, usually organized in a row. A record contains the information related to a specific entity—a person, a place, a thing, or an event. For example, a record might contain a first name, a last name, and a social security number. You can visualize the records as being organized into rows in a particular table, where all the components of a single row are made up of fields that represent characteristics of a person, place, thing, or event.

· Tables are collections of related records. A table contains all the records for a particular group or type of thing or event.

· Databases are made up of related tables and other objects such as queries, forms, and reports that help users view data in meaningful ways.

The overall organization of the database, then, moves from the smallest piece of meaningful information (the field) through records and tables to the database itself, which contains tables, queries, forms, and reports.
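The hierarchy described above can be mimicked with ordinary Python structures: a dictionary for a record (field names mapped to values), a list of records for a table, and a dictionary of tables for the database. This is only an analogy with illustrative sample values, not how a database management system actually stores data.

```python
# A record is a set of named fields describing one entity.
record_1 = {"first_name": "A", "last_name": "Smith", "phone": "345-555-4321"}
record_2 = {"first_name": "M", "last_name": "Garner", "phone": "954-555-9876"}

# A table is a collection of related records (rows).
clients_table = [record_1, record_2]

# A database is a collection of related tables.
database = {"clients": clients_table}

# Reading one field (column) from every record in a table.
last_names = [record["last_name"] for record in database["clients"]]
print(last_names)
```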

Interacting With a Database

Database software, also called a database management system (DBMS), allows users to interact with the database at all levels of the hierarchy. After a database structure is designed and the structure is implemented by creating the fields and tables, data is entered typically via a data entry form, a window on the screen that allows the user to enter data into the fields that make up a record. Data can also be modified via the form.

A query is a way of retrieving data from the database via specification of criteria that identify exactly what data is to be retrieved and how it might be sorted and displayed. Data can also be modified via the query.
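As a minimal sketch of a query, the example below uses Python's built-in `sqlite3` module with a throwaway in-memory database. The table name, column names, and sample rows are assumptions for illustration. The query states which fields to retrieve, a criterion the records must meet, and a sort order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # temporary database held in memory
conn.execute("CREATE TABLE clients (client_id TEXT, last_name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO clients VALUES (?, ?, ?)",
    [
        ("AAA", "Smith", "Huntsville"),
        ("BBB", "Garner", "Lauderdale"),
        ("CCC", "Sharpe", "San Francisco"),
    ],
)

# Retrieve two fields, keep only clients outside Huntsville, sort by last name.
rows = conn.execute(
    "SELECT last_name, city FROM clients WHERE city != ? ORDER BY last_name",
    ("Huntsville",),
).fetchall()
print(rows)
conn.close()
```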

Reports are also used to retrieve data from the database, in which the user can specify how the retrieved data are presented. Reports can be displayed, printed, or shared. Reports can also be used for things like mailing labels.

The term file maintenance refers to procedures that keep data current in the database. File maintenance supports adding, modifying, or deleting records, and creating backup copies of the database.

A wizard is a software application that is used to create tables, queries, and reports. The wizard itself is not used to enter or modify data in the fields or records. If you are familiar with Microsoft Access, you may have used wizards to help create elements of a project.

Different Database Models

All databases are composed of the elements identified above. But the organization of the records and tables can be different. What are the models used for structuring a database?

The older, less efficient models are the hierarchical and network models. Both models are restricted in that the organization of data has to be defined up front, making the structure inflexible. It is difficult to add new fields or tables.

Hierarchical Model

In the hierarchical model, the structure is predefined. If new fields are needed, the entire database has to be redefined. This diagram represents an example of the hierarchical structure:

Illustration of the hierarchical model with categories of fund managers, clients, and types of funds and arrows showing the direct downward relationships between the managers, specific clients, and funds. Only one client per manager per category; if there is more than one client, a new row is created.

Hierarchical Model

Source: Janet Zimmer, Creative Commons

The top field—in this example, fund managers—is called the "parent," and the other fields (clients and types of funds) are called the "children" of that parent. There is no way to relate the children of one parent with the children of another—no common key field. For example, it is impossible to retrieve just a list of the client names (a child field) from all the fund manager records without retrieving all the data for all the fund managers.

Think about it as an all-or-nothing situation. To enter the data about an item, person, or place, or even to retrieve even one piece of information about that same item, the entire record for that item has to be retrieved or opened for editing. If you want to know what funds Wolf was managing, you have to retrieve the entire record wherever Wolf's name appears as the fund manager; that is, the fund manager's name, the client name, and which funds this client was invested in. You can see the inefficiency in this structure and why a newer model quickly developed.
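The all-or-nothing behavior can be imitated with nested Python data, where clients and funds exist only inside their fund manager's record. The sample names and the `funds_managed_by` function are illustrative assumptions. Notice that answering even a narrow question means walking through whole parent records.

```python
# Each parent (fund manager) record contains its children (clients and their funds).
fund_managers = [
    {"manager": "Top Dog",
     "clients": [{"name": "Smith", "funds": ["Stocks", "Bonds"]}]},
    {"manager": "Wolf",
     "clients": [{"name": "Garner", "funds": ["Bonds", "Annuities"]}]},
]

def funds_managed_by(manager_name: str) -> list[str]:
    """Collect fund types by opening every matching parent record in full."""
    funds = []
    for record in fund_managers:                 # visit every parent record
        if record["manager"] == manager_name:
            for client in record["clients"]:     # then every child beneath it
                funds.extend(client["funds"])
    return funds

print(funds_managed_by("Wolf"))
```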

Network Model

The network database is an extension of the hierarchical model. It allows a parent record to have more than one child and, unlike the hierarchical model, allows a child record to have more than one parent. It is more flexible than the hierarchical structure because new relationships can be established between data. But because the structure still needs to be defined in advance (Williams & Sawyer, 2013), it is still fairly inflexible.

Illustration showing a network model with categories of fund managers, clients, and types of funds and the data connected by arrows showing the relationships between the managers (top dog, wolf, etc.) and clients (Smith, Jones, etc.), and stocks, bonds and annuities.

Network Model

Source: Janet Zimmer, Creative Commons
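A rough way to picture the network model is as a set of links between records, so that one record can be reached from several others. In the sketch below the link list, the sample names, and the two helper functions are assumptions for illustration; a real network database defines its record types and links in a schema ahead of time.

```python
# Each link joins an owning record to a member record.
# The same member ("Bonds") can be linked to more than one owner.
links = [
    ("Top Dog", "Smith"),
    ("Wolf", "Garner"),
    ("Smith", "Stocks"),
    ("Smith", "Bonds"),
    ("Garner", "Bonds"),
    ("Garner", "Annuities"),
]

def members_of(owner: str) -> list[str]:
    """Follow links downward from an owning record."""
    return [member for o, member in links if o == owner]

def owners_of(member: str) -> list[str]:
    """Follow links upward; more than one owner may be found."""
    return [owner for owner, m in links if m == member]

print(members_of("Smith"))
print(owners_of("Bonds"))
```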

Relational Model

The relational database model grew out of a need for greater flexibility in adding new fields and tables, and to retrieve just the information desired for a particular purpose. Instead of each record containing all the information about a person, place, event, etc., the information is spread across different tables.

But these tables must have a means of connecting the different types of data that are used to describe an individual, place, thing, or event. That is accomplished by a common key field. The field that uniquely identifies each record in a table is called the primary key; the same value is stored in the related records of other tables (where it is commonly called a foreign key) to link the records in separate tables. So if one table contains a fund manager's name, another table client information, another fund types, and another the value of investments, a unique field common to each table links the records from the different tables. If an identification (ID) number is the primary key, each record in the various tables that are associated with an individual, for example, will have the same ID number as one of its fields.

Image showing relational model with key field, manager name, client, type of fund, and value of investment as categories and manager and client ID, first and last names, address and contact info, etc., as the data.

Relational Model

Source: Janet Zimmer, Creative Commons

This is how the data in the tables might appear:

Manager ID Table

Manager ID
1
2
3

Manager Name Table

Manager ID | Manager Name
1 | Top Dog
2 | Wolf
3 | DoItYourself

Client Table

Manager ID | Client ID | First Name | Last Name | Address | Contact Info
1 | AAA | A | Smith | 1234 Hemlock Rd. Huntsville, AL | 345-555-4321
2 | BBB | M | Garner | 3608 Pines Blvd. Lauderdale, FL | 954-555-9876
3 | CCC | K | Sharpe | 34 E. Hilltop Ave. San Francisco, CA | 846-123-5555

Type of Fund Table

Client ID | Fund Type
AAA | Stocks
AAA | Bonds
BBB | Bonds
BBB | Annuities
CCC | Stocks

Value of Investment Table

Manager ID | Client ID | Initial Investment | Current Value
1 | AAA | $500 | $50,000
2 | BBB | $99,000 | $2,465,723
3 | CCC | $4,500 | $495,000

In this case, new fields in the existing tables or even completely new tables can be added—perhaps an email address for the client, or a new fund type. If the Manager ID is maintained as the primary key in each new table and the secondary key for the client in the appropriate tables, then links to the other information in the other tables can be maintained without rebuilding the database.
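The sketch below illustrates adding a new field to an existing table without rebuilding it, using Python's built-in `sqlite3` module and an in-memory database. The table layout, the `email` column, and the sample address are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (manager_id INTEGER, client_id TEXT, last_name TEXT)")
conn.execute("INSERT INTO clients VALUES (1, 'AAA', 'Smith')")

# Add a new field to the existing table; the stored record stays in place.
conn.execute("ALTER TABLE clients ADD COLUMN email TEXT")
conn.execute(
    "UPDATE clients SET email = ? WHERE client_id = ?",
    ("smith@example.com", "AAA"),
)

# List the table's field names, then read the record back with its new field.
columns = [info[1] for info in conn.execute("PRAGMA table_info(clients)")]
row = conn.execute("SELECT * FROM clients").fetchone()
print(columns)
print(row)
conn.close()
```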

It is easy to retrieve specific information from the relational database, as there is no need to retrieve all the information spread across different tables with all fields, as was necessary in a hierarchical database. The fields from specified tables are accessed via a special database language called Structured Query Language (SQL). This same language is used to create, modify, and maintain a relational database. When you use wizards in Microsoft Access to create databases, enter the data, and retrieve data for specified reports, you have used a wizard based on the SQL needed to manipulate the database.
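To make the idea of retrieving only the needed fields concrete, here is a minimal sketch that uses Python's built-in sqlite3 module. The table and column names (manager, client, fund, and so on) are assumptions chosen to mirror the sample tables above; they are not part of any particular product. The query joins three tables through their shared ID fields and returns only two columns.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Illustrative schema that mirrors the sample tables in the text.
cur.execute("CREATE TABLE manager (manager_id INTEGER PRIMARY KEY, manager_name TEXT)")
cur.execute("CREATE TABLE client (client_id TEXT PRIMARY KEY, manager_id INTEGER, last_name TEXT)")
cur.execute("CREATE TABLE fund (client_id TEXT, fund_type TEXT)")

cur.executemany("INSERT INTO manager VALUES (?, ?)",
                [(1, "Top Dog"), (2, "Wolf"), (3, "DoItYourself")])
cur.executemany("INSERT INTO client VALUES (?, ?, ?)",
                [("AAA", 1, "Smith"), ("BBB", 2, "Garner"), ("CCC", 3, "Sharpe")])
cur.executemany("INSERT INTO fund VALUES (?, ?)",
                [("AAA", "Stocks"), ("AAA", "Bonds"), ("BBB", "Bonds"),
                 ("BBB", "Annuities"), ("CCC", "Stocks")])


def clients_holding(fund_type):
    """Return (manager_name, last_name) pairs for clients holding fund_type."""
    return cur.execute(
        """
        SELECT m.manager_name, c.last_name
        FROM fund AS f
        JOIN client AS c ON c.client_id = f.client_id
        JOIN manager AS m ON m.manager_id = c.manager_id
        WHERE f.fund_type = ?
        ORDER BY c.client_id
        """,
        (fund_type,),
    ).fetchall()


bond_holders = clients_holding("Bonds")
print(bond_holders)
```

Only the manager name and client last name are returned; the address, contact information, and investment values are never read, which is the selectivity the paragraph above describes.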

Object-Oriented Model

In an object-oriented database, the data itself is conceived of as objects. The object can consist of data (character, numeric, etc.), or it can be instructions on what to do with that data. For example, in a relational database, all the elements that make up a dog—nose, eyes, mouth, ears, body, legs, tail—would be stored in separate fields. In an object-oriented database, all these components would be stored in one "object," the dog.

Illustration showing a dog and parts of a dog -- tongue, tail, spots, ear, head to show the difference between a relational database, which divides into parts, or an object-oriented database, which would include only the dog and not the parts.

Object-Oriented Database

Source: Janet Zimmer, Creative Commons.

An object-oriented database is especially useful in areas such as design, scientific experiments, telecommunications, geographical information systems (GIS), and multimedia such as photos, sound files, and video (Williams & Sawyer, 2013). Note that all of these applications rely heavily on the use of images or multimedia. Those types of data cannot as easily be stored in the typical relational database, where the fields are typically character, text, or number-based. In object-oriented databases, objects are linked to and accessed through pointers rather than through common fields, as in the relational database.
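As a rough analogy only, the following Python sketch shows what it means for data and the instructions for using that data to live in one object, and for one object to point to another. It uses ordinary in-memory classes, not an actual object-oriented database product, and the class and field names (Dog, Owner, spots) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Dog:
    """One object that holds both the dog's data and its behavior."""
    name: str
    ears: int = 2
    legs: int = 4
    has_tail: bool = True
    spots: list = field(default_factory=list)

    def describe(self):
        # The instructions for using the data are stored with the data.
        tail = "a tail" if self.has_tail else "no tail"
        return f"{self.name}: {self.legs} legs, {self.ears} ears, {tail}, {len(self.spots)} spots"


@dataclass
class Owner:
    name: str
    dog: Dog  # a reference (pointer) to the Dog object, not a shared key field


rex = Dog(name="Rex", spots=["back", "left ear"])
owner = Owner(name="Pat", dog=rex)
summary = owner.dog.describe()
print(summary)
```

Following owner.dog reaches the whole dog in one step, whereas the relational approach would look up matching key values across several tables.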

Multidimensional Model

The final database model is the multidimensional database, built and used optimally for data warehouse and online analytical processing applications. A special category of software, called online analytical processing (OLAP) or multidimensional OLAP (MOLAP), is used to manipulate the data in these types of databases.

Integrity and Validity of the Data

What is common to all database structures, whether hierarchical, relational, object-oriented, or multidimensional, is the support for ease of entry (adding records), retrieving the desired information, updating or correcting records, and ensuring that the data in the database is accurate.

Data integrity should ensure that the data can be verified as correct, is up-to-date or timely, is organized in a way that is useful, is accessible to the user when it is needed, and is cost-effective (that is, its value is greater than the cost to produce the data).

Data validity is accomplished by comparing data that is being entered to a set of rules to ensure that the entry complies with those rules (Shelly & Vermaat, 2013). For example, if text data is entered into a field that has been set up to accept only numbers, the user entering the data should be alerted immediately to the mismatch.
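The rule-checking idea can be sketched in a few lines of Python. The field names and the specific rules below are assumptions made up for illustration, loosely based on the sample Value of Investment table; a real database would normally enforce such rules through its own field types and constraints.

```python
def validate_record(record, rules):
    """Return a list of error messages; an empty list means the record is valid."""
    errors = []
    for field_name, check in rules.items():
        value = record.get(field_name)
        if value is None:
            errors.append(f"{field_name}: missing")
        elif not check(value):
            errors.append(f"{field_name}: invalid value {value!r}")
    return errors


# Hypothetical rules: each maps a field name to a function that returns True when the value is acceptable.
rules = {
    "manager_id": lambda v: isinstance(v, int) and v > 0,
    "client_id": lambda v: isinstance(v, str) and len(v) == 3 and v.isalpha(),
    "current_value": lambda v: isinstance(v, (int, float)) and v >= 0,
}

good = {"manager_id": 1, "client_id": "AAA", "current_value": 50000}
bad = {"manager_id": "one", "client_id": "AAA", "current_value": 50000}

print(validate_record(good, rules))
print(validate_record(bad, rules))
```

The second record is rejected because text was entered into a field that accepts only numbers, which is the mismatch described above.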

All of the database models support the following advantages, to some extent, over a file processing system (one in which data is stored in flat files—that is, with no connections between the data in the files) (Shelly & Vermaat, 2013).

· Reduced data redundancy: Duplicate data is more easily avoided.

· Improved data integrity: Changes are made in one place instead of needing to search through multiple files or spreadsheets to find where changes need to be made.

· Shared data: A single set of data in the database can be shared with multiple users. Security settings define which users can access, add, modify, or delete records.

· Easier access: With appropriate software and access privileges, a nontechnical person can use the database without needing to know the complexity of the underlying structure.

· Reduced development time: The tools available for creating a database can result in an easier and faster development process than would be required for developing and maintaining multiple separate files that have been created and organized for different types of users or departments.

There are some challenges to the creation and use of databases, as well (Shelly & Vermaat, 2013).

· A database system may be more complex than a series of spreadsheets or lists and may require people with special training to design and implement the database.

· A database consumes more memory, storage, and processing power than a file processing system.

· Because a great deal of information is stored in the database, if it is lost or the data becomes corrupt and unusable, it may affect all those who need to access the data.

· Unauthorized access to a database containing personally identifiable information (PII) could result in harm to those individuals whose information is accessed. 

Databases and Security Issues

Would it be possible to store all of the available data (in digital form) in a single database? Most likely not, since the volume of digital data doubles almost every year (Vishen, 2013). One recent estimate by EMC lists the total volume of digital data at 4.4 trillion gigabytes (Dartnell, 2014). That data is currently stored in many different databases, and claims of having the largest database by volume are contested. Recent reports agree that the current holder of that title is the World Data Center for Climate (WDCC) operated by the Max Planck Institute for Meteorology and German Climate Computing Center (Vishen, 2013). Also among the world's largest databases (Vishen, 2013):

· National Energy Research Scientific Computing Center (Lawrence Labs)

· AT&T (calling records)

· Google

· Sprint (calling records)

· LexisNexis (legal research)

· YouTube

· Amazon

· Central Intelligence Agency

· Library of Congress 

NSA's surveillance database and PRISM data mining program could contend for the top spot if the number of data records were revealed.

Why Data Security Is Important

It is the confidentiality, integrity, and availability (CIA) of the data in a database that need to be protected. Confidentiality can be lost if an unauthorized person gains entry or access to a database, or if a person who is authorized to view selected records in a database accesses other records he or she should not be able to view (Nuramn, 2011).

If the data is altered by someone who is unauthorized to do so, the result is a loss of data integrity. And if those who need to have access to the database and its services are blocked from doing so, there is a resulting loss of availability. Security of any database is significantly impacted by any one or more of these basic components of CIA being violated (Nuramn, 2011).
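One common way to notice a loss of integrity is to keep a fingerprint of each record and compare it later. The sketch below uses SHA-256 from Python's standard hashlib module; the record layout and the fingerprint helper are assumptions for illustration, and the approach only helps if the stored fingerprint is kept somewhere the person altering the data cannot also change.

```python
import hashlib


def fingerprint(record):
    """Return a SHA-256 hex digest of a record's fields, taken in a fixed (sorted) order."""
    canonical = "|".join(f"{key}={record[key]}" for key in sorted(record))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


original = {"client_id": "AAA", "current_value": 50000}
stored_digest = fingerprint(original)

# Simulate an unauthorized change to one field.
tampered = dict(original, current_value=5000000)

unchanged_ok = fingerprint(original) == stored_digest
tampered_ok = fingerprint(tampered) == stored_digest
print(unchanged_ok, tampered_ok)
```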

Both businesses and home computer users should be concerned about data security. The information stored in databases—client information, payment information, personal files, bank account details, and more—can be hard to replace. The loss could result from one of the following factors (Spamlaws, 2016):

· physical threats such as a fire or a significant power outage

· human error, such as mistakes in processing information, unintended deletion of data, or erroneous input

· corporate espionage, theft, or malicious activity.

Loss of this data is potentially dangerous if it falls into the wrong hands.

It is in these three areas that a risk assessment of the database's security and protection of the data should focus. Is there a backup procedure that would allow access to the data if the primary database is destroyed by a physical threat? That same backup procedure might be important in case the CIA of the database is inadvertently affected by human error. And what safeguards can/should be put in place to prevent incidents of espionage, theft, or other malicious activity? We will look again at risk assessments later.

How Common Are Database Breaches?

Just how prevalent are the threats against databases? Is it worth the time, money, and personnel effort to ensure that the database is safeguarded? Remember the Target and Neiman Marcus problems that surfaced in late 2013? And the saga of Edward Snowden and the NSA leaks? These were just a few of many such database breaches.

Database breaches are the exposure of database records containing personally identifiable information (PII) or other sensitive information to unauthorized viewers. Risk Based Security (RBS), a group of consultants and founders of the Open Security Foundation (OSF), reports that a record number of data records were exposed via data breaches in 2013. Over 822 million such records were made available to persons who had no authority to view them (RBS, 2014). But remember, the number of reported database breaches does not reflect the total number of breaches that occurred. Some companies do not report breaches in order to protect their reputations or to prevent customers from abandoning the company. The following is a short list of what RBS discovered.

· The business sector accounted for 53.4 percent of reported incidents, followed by government (19.3 percent), medical (11.5 percent), education (8.2 percent), and unknown (7.6 percent).

· Hacking was the cause of 59.8 percent of reported incidents, accounting for 72.0 percent of exposed records.

· Of the reported incidents, 4.8 percent were the result of web-related attacks, which amounted to 16.9 percent of exposed records.

· Four incidents in 2013 alone secured a place on the Top 10 All-Time Breaches list:

· Adobe—152 million records. Customer IDs, encrypted passwords, debit or credit card numbers, and other information relating to customer orders was compromised.

· Unknown organizations—140 million records. North Korean hackers exposed email addresses and identification numbers of South Korean individuals.

· Target—110 million records. Information included customer names, addresses, phone numbers, email addresses, credit/debit card numbers, PINs, and security codes.

· Pinterest—70 million records. A flaw in the site's application programming interface (API) exposed users' email addresses.

Even if you were not affected by any of the above data breaches, if you have used a credit card, made an airline reservation, subscribed to a magazine, been a patient in a hospital, or shopped at a chain store (supermarket or department store), or if you are a member of an online social media site, your information is stored in a database. How vulnerable is your PII?

What Are the Most Common Causes of Database Breaches?

As evidenced by the NSA Snowden leaks and the Target breach, no database and no government agency, company, or business is as secure as the owners of that database think. It is difficult for database administrators and security managers to keep pace with the threats and vulnerabilities that continually emerge. And to compound the issues, every company/business/government has different security issues, making it particularly difficult to standardize any one solution. However, there are some common threats and vulnerabilities that seem to occur repeatedly.

Threats

Unauthorized Access by Insiders

The malicious insider with approved access to the system is one of the greatest threats to database security.

People attack computers because that's where the information is, and in our hyper-competitive, hi-tech business and international environment, information increasingly has great value. Some alienated individuals also gain a sense of power, control, and self-importance through successful penetration of computer systems to steal or destroy information or disrupt an organization's activities (Ashiq, 2015).

Another scenario might involve employees affected by a workforce reduction who take customer account lists, financial data, or strategic plans with them when they leave. Proprietary information could end up with competitors or be widely disseminated online (PricewaterhouseCoopers, 2008).

Insiders may also be a threat to database security if they are granted database access privileges that go beyond what their job requires, abuse legitimate database privileges for unauthorized purposes, or convert access to that of an administrator.
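A simple review for privileges that go beyond what a job requires can be expressed as a set difference. The role names and privilege names below are hypothetical; a real review would read them from the database system's own access control settings.

```python
# Hypothetical mapping of each role to the privileges the job actually requires.
REQUIRED = {
    "clerk": {"read"},
    "analyst": {"read", "add"},
    "dba": {"read", "add", "modify", "delete"},
}


def excess_privileges(role, granted):
    """Return the privileges granted to a user beyond what the role requires."""
    return set(granted) - REQUIRED[role]


extra = excess_privileges("clerk", {"read", "modify", "delete"})
print(sorted(extra))
```

An empty result means the account holds no more than the job needs; anything else is a candidate for removal.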

Accidental Breaches Resulting from Incorrect—but Not Malicious—Use

The data breach is not always the result of a deliberate attempt to subvert data security; sometimes it is an unintended consequence. For example, employees might export data from the parent database system at work and send it, typically unencrypted, to personal email addresses to enable the employee to work from home. The data then might be subsequently compromised from that home computer. Or a data mining application might contain flaws that allow a user without the correct access credentials to stumble upon database records inadvertently. (Note: If the user deliberately continues to access the data without permission, this situation becomes a malicious insider threat.)

Unprotected Personal Hardware Collection

It is becoming increasingly common for data to be transferred to personal mobile devices such as USB flash drives, smartphones, and tablets. Many people use a mobile device, whether personal or company-supplied, for business purposes. However, mobile devices are a significant source of data breaches. The devices are lost or stolen, the owners don't install antimalware protection, or the devices even lack passwords (Bruemmer, 2014). Data is at risk if an employee stores any information on the device or if the device is used to access a company's network and/or database (Bruemmer, 2014).

Stolen Laptops

Forgetful or careless laptop owners whose equipment is taken expose data on that laptop to persons not authorized to have access to the data. This can also happen if a laptop is replaced and the hard drive on the original machine is not properly erased or destroyed.

Weak Authentication

A legitimate database user typically is required to submit an ID and password in order to gain access to a protected database. Authentication is the process (internal to the database program itself) by which the credentials of the user are verified and access may be granted. If the process of authentication is weak, an attacker can assume the identity of a legitimate user by stealing or obtaining log-in credentials. Credentials may be obtained by various means:

· Credential theft. The attacker accesses password files or finds a paper on which the legitimate user has written down the ID and password.

· Social engineering. The attacker deceives someone into providing the log-in ID and password by posing as a supervisor, IT maintenance personnel, or other authority.

· Brute-force attacks. Have you ever been locked out of an account after attempting to log in more than three times with an incorrect password? If so, you have seen the simplest (and perhaps least effective) means of blocking a brute-force attack, whether it is an attempt to access files on your machine or to access a database.

However, not all password-protected systems, databases, or files block access after three attempts. For example, if you have put a lock on a file on your computer, you most likely have not set a limit on the number of attempts on that file. A brute-force attack is a password-guessing approach in which the attacker attempts to discover a password by systematically testing every combination of letters, numbers, and symbols until the correct combination is found. Depending upon the password's length and complexity, this can be difficult to complete. However, there are widely available tools that hackers can use to find the password, and it can be difficult to block all the means by which a hacker will try to find the password (UVA Computer Science System Administration Database, n.d.).
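Two points from the discussion above can be illustrated with a short Python sketch: how quickly the number of combinations grows with the password's length and character set, and how a three-attempt lockout stops systematic guessing. The LoginGuard class is a made-up teaching aid, not a real authentication library, and it compares plain-text passwords only to keep the example short; a real system would store a salted hash instead.

```python
def keyspace(alphabet_size, length):
    """Number of candidate passwords of exactly `length` characters."""
    return alphabet_size ** length


class LoginGuard:
    """Locks an account after too many consecutive failed attempts."""

    def __init__(self, password, max_attempts=3):
        self._password = password  # plain text for illustration only
        self._max_attempts = max_attempts
        self._failures = 0

    @property
    def locked(self):
        return self._failures >= self._max_attempts

    def try_login(self, guess):
        if self.locked:
            return False  # a locked account rejects even the correct password
        if guess == self._password:
            self._failures = 0
            return True
        self._failures += 1
        return False


# 26 lowercase letters versus 94 printable ASCII symbols, both 8 characters long.
print(keyspace(26, 8))
print(keyspace(94, 8))

guard = LoginGuard("s3cret")
results = [guard.try_login(g) for g in ("a", "b", "c", "s3cret")]
print(results, guard.locked)
```

After three wrong guesses the fourth attempt fails even though it is correct, which is why a lockout blunts online guessing. It offers no protection when the attacker can test guesses offline, for example against a stolen password file.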

Exploiting Weaknesses in an Operating System or Network

Worms, viruses, or Trojan horses could be introduced into an unprotected or poorly protected operating system or computer network that supports the database, leading to potential unauthorized database access (loss of confidentiality), data corruption (loss of integrity), or denial of service (DoS), a loss of access for legitimate users. A DoS may be achieved by causing a server to stop functioning, or "crash"; by flooding a network with message traffic; or by overloading resources on the computer, forcing it to stop handling additional tasks or processing.

Theft of Database Backup Tapes or Hard Drives

Database backups typically do not have the same security measures in place that the primary database employs. These backups may not be encrypted, and the media on which backups are stored may also be unprotected. Theft of the backup media may allow the attacker full access to the data stored within the backup (Manes, 2015).

Vulnerabilities

Databases are also exposed to security breaches through vulnerabilities. These are more passive than direct threats, but they can do as much harm:

· Data at rest is unencrypted information that resides passively in storage within the boundaries of company computers, perhaps waiting to be moved to a secure database. Data at rest typically is not as well protected as data that has been entered into the database and is covered by the database's security measures.

· Data in motion is information that is being electronically transmitted outside the company's protected network via email or other communication mediums. For example, the data might be transferred to a backup facility that is not part of the internal storage media used for daily work. Or if the company uses the cloud for data storage backups, the transfer might take place outside of the company's protected network. This can lead to a loss of sensitive data if there is a malicious attack via malware during the transfer process or during execution of a flawed business process that allows unauthorized persons to view or obtain the data. (This is not the same as the accidental breach resulting from incorrect but not malicious use noted above, where the home computer to which the data has been transferred is attacked or breached. That accidental breach occurred without any intention of harm by the employee.)

· Poor architecture, in which security was not adequately factored into the design and development of the database structure. This vulnerability may not be discovered until there is an attempted or successful data breach.

· Vendor bugs, particularly programming flaws that allow actions to take place within the database and with the data that were not intended or planned. Much like poor application architecture, this vulnerability may not be uncovered until there is an attempted or successful data breach.

· An unlocked database is one that has no security measures in place to control access or auditing. This seems counterintuitive, but many home users employing a database for personal needs, or even for working on company data while at home, may be working with an unlocked database (Nichols, 2007; PricewaterhouseCoopers, 2008).

Risk Assessments

In the business environment, it is critical that a thorough risk assessment take place and be periodically reviewed. The assessment should address (Spamlaws, 2016):

· who has access to what data

· the circumstances under which access to the database may need to change

· who maintains the passwords needed to access the database

· who uses the company's computers for access to the internet, email programs, etc., and how employees access those resources

· what type of firewalls and antimalware solutions to put in place

· the training of the staff

· who has responsibility for enforcement procedures related to data security

There are identified solutions for each of the threats and vulnerabilities discussed here, including well-defined and enforced access policies, use of strong data encryption, vulnerability assessments, policies related to strong passwords, and installation of firewalls. There are companies that specialize in designing plans, procedures, and software to prevent data loss or data leakage. With data loss, the data is lost forever, whether by deletion, theft, or data corruption. Data leakage allows unauthorized people to get access to the data, either intentionally or by mistake. Both data loss and data leakage can be intentional or unintentional, and can result from malice or simple error (VJ, in Hoff, 2013).

How Can You Protect Your PII?

Protecting databases and the data contained within can be a costly and all-consuming activity. But what does this mean for you, the individual who uses that credit card, makes airline reservations, files taxes online, subscribes to a magazine, has been a patient in a hospital, shops at a chain store, or is a member of an online social media site?

Keep your passwords to yourself.

Do not leave a slip of paper with a list of passwords under your computer or anywhere else it can be viewed or taken by someone. Giving your password to a friend is not a good idea either.

Use different passwords for different accounts.

Remembering multiple passwords can be a challenge, and it's often convenient to use the same password for multiple accounts, ranging from Facebook and your bank account to your Twitter page. The danger is that a compromise of any one of these accounts could also compromise every other account that shares the password.

Use strong passwords.

Many systems require a strong password before they will grant access to your user ID. Even when you can choose any password configuration, pick a strong password to protect your information.
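If you are comfortable running a short script, the following Python sketch shows one way to produce a strong random password. It relies on the standard secrets module, which is intended for security-sensitive random choices. The 16-character default and the requirement of one lowercase letter, one uppercase letter, one digit, and one symbol are assumptions for illustration; individual sites may set different rules.

```python
import secrets
import string


def generate_password(length=16):
    """Return a random password with at least one lowercase letter, uppercase letter, digit, and symbol."""
    if length < 4:
        raise ValueError("length must be at least 4")
    alphabet = string.ascii_letters + string.digits + string.punctuation
    while True:
        candidate = "".join(secrets.choice(alphabet) for _ in range(length))
        # Retry until every character class is represented.
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)
                and any(c in string.punctuation for c in candidate)):
            return candidate


password = generate_password()
print(len(password))
```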

Check your credit reports annually.

Sometimes people don't learn that they are victims of identity theft until their credit rating and identity are destroyed. Be proactive: get copies of your credit reports from the credit bureaus and review them for errors. Be sure to follow up with the credit bureaus to make corrections if needed. By law, you can get one free credit report from each of the three credit bureaus every year.

Google yourself.

Enter your own name in Google, Yahoo, or another search engine and see what data comes up. Investigate any postings about yourself. Look for any suggestions that your PII may be compromised.

Remember that people can be a weak link in security.

No matter how secure you make your passwords and how careful you are with your technology, there is always a human element to protecting your information.

Control physical access to your devices.

It's important not to leave laptops and other mobile devices unattended in public locations, like a coffee shop or other location with free Wi-Fi. An unattended machine is at risk of both theft and other security threats. If you cannot otherwise control physical access to your machine, do not let it out of your sight.

Remember to log out of a website when you are finished.

Whether it's your email, bank account, retail store shopping account or library account, always remember to log out when you leave the website.

Remember to lock your computer with a password when you are finished.

By requiring a password to access your computer (or other electronic device), you are protecting your information. You are also making your computer useless to a thief who cannot break password locks.

Your PII is out there, stored in multiple databases. Obviously, you cannot implement security measures for the company, business, or government agency that holds your PII. But the rules above are measures you can take to better protect yourself.

Summary

We have been looking at databases—their purpose, structure, uses, and security applications in the business arena. A database is a computer-based collection of related pieces of data organized so that the data can readily be accessed, managed, and updated. A database is composed of tables, queries, reports, and other forms that are generated from data in the tables. The tables are composed of records, and the records contain fields.

Database software, also called a database management system (DBMS), allows users to interact with the database at all levels of the hierarchy. Some of the interactions are characterized as data entry forms, queries, reports, file maintenance, and wizards.

Database Models

All databases are composed of the elements identified above. But the organization of the records and tables can be different.

Hierarchical and network models are less flexible, not allowing for easy changes to the structure of the database. In the relational model, all the information about a person, place, event, etc., is stored in related records, but the information is spread across different tables. The types of data used to describe an individual, place, thing, or event are linked by a common key field, called the primary key. In the object-oriented database model, the data itself is conceived of as objects. An object can consist of data (character, numeric, etc.), or it can be instructions on what to do with that data. The objects are joined by the use of pointers. The final database model is the multidimensional database model, built and used optimally for data warehouse and online analytical processing applications.

Integrity and Validity of Data

Data integrity means that the data is verified as correct, up-to-date or timely, organized in a useful way, and accessible. Data validity is satisfied by comparing data being entered to a set of rules to ensure that the entry complies with those rules.

Databases and Security

The confidentiality, integrity, and availability (CIA) of the data in a database need to be protected. Confidentiality can be lost if an unauthorized person gains entry or access to a database, or if a person who is authorized to view selected records in a database accesses other records he or she should not view. If the data is altered by someone unauthorized, the result is a loss of data integrity. And if those who need to have access to the database and its services are blocked from doing so, there is a resulting loss of availability.

Risk Assessments

In the business environment, a thorough risk assessment should occur and be periodically reviewed. The assessment should address who has access to and control of the database, database security software, and training.

Protecting PII

Protecting databases and the data contained within can be costly and all-consuming. Obviously, you cannot implement security measures for the company, business, or government agency that holds your personally identifiable information (PII). But there are measures you can take to better protect yourself, including controlling physical access to the computer and using strong passwords.

References

Ashik, M. (Ashiq JA). (2015, June 8). Insider vs. outsider threats: Identify and prevent [Blog post]. Retrieved from http://resources.infosecinstitute.com/insider-vs-outsider-threats-identify-and-prevent/

Bruemmer, M. (2014, January 21). How mobile devices can imperil your organization's cyber security [Blog post]. Retrieved from http://www.experian.com/blogs/data-breach/2014/01/21/how-mobile-devices-can-imperil-your-organizations-cyber-security/

Dartnell, J. (2014, April 20). EMC: Digital universe data to grow tenfold by 2020. Retrieved from http://www.cnmeonline.com/news/emc-digital-universe-data-to-grow-tenfold-by-2020/

Hoff, C. (2013, April 2). Is there a difference between data loss and data leakage prevention? [Blog post]. Retrieved from http://www.rationalsurvivability.com/blog/2008/06/is-there-a-difference-between-data-loss-and-data-leakage-prevention/

Manes, C. (2015, September 23). If you keep it, encrypt it [Blog post]. Retrieved from http://www.gfi.com/blog/if-you-keep-it-encrypt-it/

NIBusinessinfo.co.uk. (n.d.). Knowledge management and business growth. Retrieved from https://www.nibusinessinfo.co.uk/content/basic-sources-knowledge

Nichols, R. (2007). Eleven specific solutions to today's most common database security threats and vulnerabilities. Retrieved from http://aim.uoregon.edu/news/ebriefing/eleven_solutions_to_database_security_threats.php

Nuramn, A. (2011). Database security. Retrieved from http://www.brighthub.com/computing/smb-security/articles/61400.aspx

PricewaterhouseCoopers. (2008, July). Data loss prevention: Keeping sensitive data out of the wrong hands. Retrieved from http://www.pwc.com/us/en/increasing-it-effectiveness/assets/data_loss_prevention.pdf

Quast, L. (2012). Why knowledge management is important to the success of your company. Retrieved from https://www.forbes.com/sites/lisaquast/2012/08/20/why-knowledge-management-is-important-to-the-success-of-your-company/#36b146103681

Risk Based Security (RBS). (2014, February 18). Data breach quickview. Retrieved from https://www.riskbasedsecurity.com/reports/2013-DataBreachQuickView.pdf

Rouse, M. (n.d.). Knowledge management (KM). Retrieved from http://searchdomino.techtarget.com/definition/knowledge-management

Seiner, R. S. (2002). Business impact of knowledge management. Retrieved from http://tdan.com/business-impact-of-knowledge-management/4943

Shelly, G. B., & Vermaat, M. E. (2013). Discovering computers 2013. Boston: Course Technology.

Spamlaws. (2016). Why data security is of paramount importance. Retrieved from http://www.spamlaws.com/data-security-importance.html

UVA Computer Science System Administration Database. (n.d.). Blocking brute force attacks. Retrieved from http://www.cs.virginia.edu/~csadmin/gen_support/brute_force.php

Vishen, N. (2013, April 20). Largest databases of the world [Blog post]. Retrieved from http://neeraj-dba.blogspot.com/2013/04/largest-databases-of-world.html

Williams, B. K., & Sawyer, S. C. (2013). Using information technology. New York: McGraw-Hill.

Licenses and Attributions

Hierarchical Model by Janet Zimmer is available under a Creative Commons Attribution 3.0 United States license.

Network Model by Janet Zimmer is available under a Creative Commons Attribution 3.0 United States license.

Relational Model by Janet Zimmer is available under a Creative Commons Attribution 3.0 United States license.

Object-Oriented Database by Janet Zimmer is available under a Creative Commons Attribution 3.0 United States license. Art is used with permission from Microsoft.

CRM, Data Warehouses and Data Mining/Analytics

Introduction to CRM

Customer relationship management (CRM) is a foundation element for business knowledge/intelligence. We will describe how CRM can be used, what makes it work and who is using it, and whether it has been as successful as many had hoped.

Observing Consumer Patterns

A man walks into a convenience store to pick up diapers at his wife's request. While he's there, he happens to pick up a six-pack of soda as well. Meanwhile, back at the convenience store headquarters, a data analyst poring through data in a data warehouse sees this and recognizes that this pairing is emerging as a pattern.

Opportunity? You bet. The data analyst makes two recommendations to the marketing department. First, move the diapers and the soda closer together. Second, place similar items that men in this age group might also be inclined to purchase in between the diapers and the soda.

Welcome to the world of customer relationship management, or CRM.

Traditional CRM

Relationship diagram for CRM. Center circle contains "Customer." Three circles connected to "Customer" contain "Sales," "Marketing," and "Service and Support."

Traditional CRM

Source: Janet Zimmer, Creative Commons

Customer relationship management (CRM), a strategy used by companies, also goes by the name of relationship marketing or customer management. The definition of CRM is broad because it includes many facets of business-to-customer relationships. Robust CRM systems are supported by software suites that help with the management of all the data acquired and used by the system. But the following might be considered the primary focus of any CRM system (Rouse, Ehrens, & Kiwak, 2006):

· providing a company's marketing department with information needed to identify and target the company's best customers, design effective marketing campaigns, and provide the sales team with quality leads

· optimizing the information shared among departments, which results in more sales and new accounts and better management of existing accounts, and supporting the use of the latest communication devices (for example, allowing orders to be made over mobile phones)

· improving customer satisfaction by supporting the development of individualized relationships with customers; might also include providing the highest level of service to the most profitable customers

· obtaining and sharing with employees the information and processes necessary for them to effectively build relationships with their customers through understanding and identifying the customer's needs

In summary, the primary applications that are supported by a CRM are:

· acquisition—obtaining new customers

· retention—retaining current customers

· loyalty—developing customer loyalty to the company/product

· profitability—increasing company profits by serving the customer

· service—addressing customer inquiries and resolving issues

Relationships Among CRM, Data Warehouses, and Data Mining

Relationship diagram in the form of puzzle pieces. The center piece is "Data warehouse." The pieces that each connect separately to the center are "Data analysis tools," "Customer support records," "Purchasing history," "Customer demographic data," and "Data mining tools."

Data Warehouse Relationship Diagram

Source: Janet Zimmer, Creative Commons

The example cited in the introduction is one facet of CRM—data mining customers' purchasing patterns. Data mining is the process of looking at the data stored in a company's database to determine if statistically relevant trends exist. By identifying these trends and patterns, companies can develop strategies to better serve customers and increase sales.

Another example of CRM might be the evaluation of data purchased from a company that specializes in collecting demographic data on purchasers, including location, age, gender, ethnicity, home ownership, employment status, and income level, to determine which individuals might want the company's product or services.

CRM can improve services and products in other ways. For instance, if an organization offers a call center that provides customer support, tracking the kind of support that is provided most frequently might lead to solutions that could prevent the problems from occurring in the first place.
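The call center idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ticket records, the category names, and the function name `most_common_issues` are made up for the example and do not come from any particular CRM product.

```python
from collections import Counter

# Hypothetical support tickets: (ticket id, issue category).
tickets = [
    (1, "password reset"),
    (2, "billing error"),
    (3, "password reset"),
    (4, "late delivery"),
    (5, "password reset"),
    (6, "billing error"),
]

def most_common_issues(records, top_n=2):
    """Count tickets per category and return the top_n as (category, count)."""
    counts = Counter(category for _, category in records)
    return counts.most_common(top_n)

print(most_common_issues(tickets))
```

A category that dominates the count is a candidate for a fix that prevents the problem from occurring in the first place.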

Data Analytics—How It Works

Almost all CRM applications involve using a large relational database, sometimes referred to as a data warehouse. This is where the raw data about customers, products, transactions, demographics, and other information is stored. Typically, the data warehouse gets its information in real time, or nearly so, from systems used to conduct transactions between the company and the customer—point-of-sale (POS) systems, e-commerce web applications, inventory management systems, and others.

Data from the data warehouse is retrieved, organized into categories, and reviewed to support the identification and analysis of data patterns. Data analytics thus refers to the "qualitative and quantitative techniques and processes used to enhance productivity and business gain" ("Data analytics," n.d.).

Data analytics is primarily used in applications that involve the business-to-customer environment and includes information about customers, business processes, market economics, or practical experience.

Using complex statistical analysis software programs known as data mining tools, data analysts are able to query the data warehouse in many ways. For instance, an analyst might ask the data mining tool to retrieve from the database all purchases made during the week of June 15 in which two specific products were purchased together in stores on the East Coast. Once the records are returned, the analyst would ask the tool to show only those purchases in which a statistically relevant correlation between the two items existed.
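The kind of query described above can be sketched in Python. This is a minimal illustration, not how any particular data mining tool works: the baskets, the item names, and the function name `pair_stats` are invented for the example, and lift is used here as one common measure of whether two items are bought together more often than chance alone would suggest.

```python
from itertools import combinations
from collections import Counter

# Hypothetical point-of-sale baskets; each set is one purchase.
transactions = [
    {"diapers", "soda", "chips"},
    {"diapers", "soda"},
    {"milk", "bread"},
    {"diapers", "soda", "milk"},
    {"soda", "chips"},
    {"diapers", "bread"},
]

def pair_stats(baskets):
    """Return (support, confidence, lift) for every item pair seen together."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )
    stats = {}
    for (a, b), both in pair_counts.items():
        support = both / n                    # share of baskets with both items
        confidence = both / item_counts[a]    # share of a-baskets that also hold b
        lift = support / ((item_counts[a] / n) * (item_counts[b] / n))
        stats[(a, b)] = (support, confidence, lift)
    return stats

stats = pair_stats(transactions)
support, confidence, lift = stats[("diapers", "soda")]
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests the two items appear together more often than they would if purchases were independent. With real data, an analyst would need far more transactions and a proper significance test before acting on the pattern.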

Sound fascinating? That is only the beginning. Consider this: Why not design the data mining tool to run specific queries such as this one on all data, once a day, and send an email to the analyst if anything interesting turns up? In other words, why not build "triggers" into the system that alert the analyst to anything that might be considered an anomaly, good or bad? Why not have the data mining tool do all the work?
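A trigger of this kind can be sketched as a scheduled check that compares today's figure with recent history. The sketch below rests on several assumptions: the sales figures, the three-standard-deviation threshold, and the function name `find_anomaly` are invented, and the alert is returned as a string rather than emailed.

```python
from statistics import mean, stdev

def find_anomaly(history, today, threshold=3.0):
    """Return an alert message if today's value is unusually far from history.

    history: past daily values (at least two); today: the newest value.
    threshold: how many standard deviations count as an anomaly (assumed).
    """
    avg = mean(history)
    spread = stdev(history)
    if spread == 0:
        return None if today == avg else f"ALERT: {today} differs from constant {avg}"
    z = (today - avg) / spread
    if abs(z) >= threshold:
        direction = "above" if z > 0 else "below"
        return f"ALERT: {today} is {abs(z):.1f} standard deviations {direction} average"
    return None

# Hypothetical daily unit sales for one product pairing.
past_week = [100, 104, 98, 101, 97, 103, 99]
print(find_anomaly(past_week, 102))   # within the normal range, so no alert
print(find_anomaly(past_week, 150))   # far above the recent average
```

Run once a day against fresh data, a check like this flags both good and bad surprises and leaves the analyst to decide what they mean.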

By now, you have probably determined that sophisticated CRM data mining tools do just that. You may never have heard of these data analysis tools, but here are some of the most commonly used ones (Vohra, 2017):

Open Source Analytics Tools

· R: The most popular big data analytics tool. It integrates well with big data platforms with large data sets. R is known for a steep learning curve.

· Python: Released in the early 1990s, Python covers a host of statistical and mathematical functions. Useful in the analysis phase of analytics, Python can also be used as a data-gathering tool on the internet using a technique known as "web scraping." Data can be extracted or gathered from nearly any website to analyze content, but data-centric websites and social media sites are often the focus of web scraping. The analysis phase of social media data is also known as social media analytics.

· Apache Spark: Its focus is on unstructured data or huge data volumes. It integrates easily with Hadoop, an open-source Java-based framework that supports large data sets (Rouse, Stedman, & Bigelow, n.d.).

· Apache Storm: Used for moving data or when the data is continuous. Works well with real-time analytics or stream processing.

· Pig and Hive: Most companies that work with big data and leverage the Hadoop platform use Pig and/or Hive.

Commercial Analytics Tools

· SAS: For a long time, the leading data analytics tool (but costly). It is versatile and easy to learn and provides specialized modules such as SAS Analytics for IoT (Internet of Things), SAS Anti-Money Laundering, and SAS Analytics Pro for Midsize Business.

· Tableau: Great for creating visualizations and dashboards. More robust in visualizations and can handle much more data than Excel.

· Excel: Most widely used analytics tool. More accessible for nonanalytics professionals, who will usually not have access to tools like SAS or R on their machines.

· QlikView: Another popular visualization tool.

· Splunk: Visualization tool with a web interface that makes it easy to use.
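The web scraping technique mentioned in the Python entry above can be illustrated with Python's standard `html.parser` module. This is a sketch under stated assumptions: the page content is a made-up string rather than a downloaded page, and the class name `HeadlineParser` and the choice of `<h2>` tags are hypothetical. A real scraper would fetch pages over the network and should respect each site's terms of use.

```python
from html.parser import HTMLParser

class HeadlineParser(HTMLParser):
    """Collect the text inside every <h2> tag."""

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
            self.headlines.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        # Only keep text that appears between <h2> and </h2>.
        if self.in_h2:
            self.headlines[-1] += data

# Hypothetical page content standing in for a downloaded review page.
page = """
<html><body>
<h2>Great phone, battery lasts all day</h2>
<p>Posted by user123</p>
<h2>Screen cracked after one week</h2>
</body></html>
"""

parser = HeadlineParser()
parser.feed(page)
print(parser.headlines)
```

The extracted headlines are the raw data; counting, categorizing, or scoring them is the analysis phase.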

CRM Is Big Business

Offering good customer service and cultivating customer relationships makes good business sense. It is more cost-effective to retain current customers than to attract new ones, and companies that make good customer experience a priority have found that the practice leads to higher profits. Establishing and maintaining a good relationship with customers is critical.

What are major benefits for a company that uses CRM to improve relationships with customers?

· CRM software can be used to monitor how long customers have been with a company, as well as their purchases and use of the company's services. Rewarding customer loyalty can improve the company's financial picture (Salesforce.com, n.d.). Think about credit cards, frequent flyer programs, special offers for loyal customers, and other rewards programs.

· Customers assess a company on more than products and services; they also gauge how the company deals with complaints and other issues. CRM systems allow for a more rapid response since customer questions can go quickly to the proper department, where employees can help (Salesforce.com, n.d.).

This ability to resolve complaints gives customers a positive perception of the company. For customers who have had a negative experience with customer service, over 70 percent will decrease business with that company or even switch companies altogether (Barbier, Noronha, & Dixit, 2013). So it is even possible that a company might forgo profits in order to address customer satisfaction first. For example, product recalls and timely fixes or product replacements by a manufacturer might result in a loss of profitability. Beyond just the safety considerations, however, customers who feel they have been served well by the recall/replacement may return to the same manufacturer for their next product.
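The first benefit listed above, monitoring how long customers have been with a company and what they purchase, can be sketched as follows. The customer record, the reward rule (three years and 500 in total spending), and the function names are all hypothetical, chosen only to show the idea.

```python
from datetime import date

def loyalty_summary(customer, as_of):
    """Return tenure in whole years and total spend for one customer."""
    joined = customer["joined"]
    years = as_of.year - joined.year
    # Subtract a year if the anniversary has not yet occurred this year.
    if (as_of.month, as_of.day) < (joined.month, joined.day):
        years -= 1
    return {"years": years, "total_spend": sum(customer["purchases"])}

def qualifies_for_reward(summary, min_years=3, min_spend=500.0):
    """Hypothetical rule: long tenure and enough spending earn a reward."""
    return summary["years"] >= min_years and summary["total_spend"] >= min_spend

customer = {"joined": date(2019, 6, 15), "purchases": [120.0, 250.0, 310.0]}
summary = loyalty_summary(customer, as_of=date(2023, 3, 1))
print(summary, qualifies_for_reward(summary))
```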

Traditional CRM Versus Social CRM (SCRM)

Customers can now communicate with companies through chat on a website and even social media accounts such as Facebook and Twitter. CRM systems that include social media integration are now a must for many companies (Salesforce.com, n.d.). These systems are known as social CRM, or SCRM. An article in Harvard Business Review indicates that 79 percent of businesses already use social media or are planning a presence, although not all of them feel they are using such a vehicle effectively (Geek4Green, n.d.).

Traditional CRM

Relationship diagram for CRM. The center circle contains "Customer." Three circles connected to "Customer" contain "Sales," "Marketing," and "Service and Support."

Traditional CRM

Source: Janet Zimmer, Creative Commons

In traditional CRM, there is little collaboration between the customer and the company. Marketing's focus is to push messages to the customers to generate sales. There is definitely a service and support component, of course, which does involve the customer directly.

Social CRM (SCRM)

Three layered rectangles. The innermost rectangle contains "Customer." The next layer contains "Customer Empowerment" and "Advocacy." The outermost rectangle contains "Sales," "Service," "Support," and "Public Relations & Marketing."

Social CRM

Source: Janet Zimmer, Creative Commons

In contrast, SCRM invites the customer to collaborate with the company in solving business problems, primarily through interaction with online social media sites. This format empowers customers to shape their own experiences and build customer relationships directly with the company. Companies such as Coca-Cola and Dell maintain such sites. Dell reported that customers had posted over 18,000 new product ideas and almost 100,000 comments, and that nearly 500 of the ideas had been implemented (Reynolds, 2012).

SCRM is a vehicle for direct and indirect advertising as well. Some social media sites display banner ads promoting companies or services. These ads can be directed to everyone who visits the site, or just to certain visitors who match particular demographics. Another marketing technique is the use of "fans" of a particular site, product, or company. When you "like" or "friend" a certain page on Facebook, for example, you are added to a fan base which, in turn, promotes awareness about the company or product.

Participation by companies in the social media environment has resulted in a newer branch of CRM called customer experience management (CEM). Online surveys filled out after purchasing products online or using services such as an airline flight feed into the company's management of the customer experience. Loyalty or reward programs are also a means of managing the customer experience. The customer is no longer a passive recipient of the company's services but an integral part of the customer experience and relationship development.

Various social media advertising strategies are available for a company that employs such sites—its own or others—for promoting the company and its products and services. Among these are (Reynolds, 2012):

· direct advertising via banner ads on social media sites

· sending ads to a person's network of friends or other contacts

· increasing brand awareness through groups or fans of a particular site or product

· using the company's own social networking site

· viral marketing, in which individuals pass along embedded marketing ads to others, promoting the tool being used to send the message

Here are some examples:

· Businesses pay Facebook to show ads to people who might be interested in their message. This would be an example of direct advertising.

· Each time a user sends a message using Twitter, a note is attached to the end suggesting that the recipient create a Twitter account. You might consider this viral marketing.

· When you sign onto your favorite social networking site, you see a message from your friend, Mike, who just went to see an Oscar-nominated movie and thought it was a "must see!" This would be considered sending ads to a person's network of friends.

· A company with an easily recognized product brand has a Facebook page where fans of the product can "friend" the page and post comments. Friending or liking a page is an example of increasing brand awareness through groups or fans of a particular site or product.

· An example of using the company's own social networking site might involve asking someone visiting an online ordering site to link to the business's Facebook page where additional ads or links may be found for similar products.

Trends for the Future of CRM

CRM and the software that supports it are not static products. To address changes in customer expectations, companies that use CRM systems must be sensitive to the following issues and tasks to make sure the system is effective and efficient:

· Customers expect more. Members of Generation Y—the children of Baby Boomers, born between 1977 and 1994 and coming of age between 1998 and 2006—often demand ways to contact a company beyond a phone call. Those interaction vehicles include web chat, smartphone applications, and social media, according to a 2013-2014 study from Dimension Data, whose author, Andrew McNair, noted that "Generation Y customers are now reporting that the telephone is their fourth choice" when dealing with customer support (Earls, 2014).

· Keeping valuable staff. Experienced customer service staff are leaving their positions, according to McNair, who noted that training, support, and up-to-date tools are needed to retain CRM staff (Earls, 2014). Because customers increasingly rely on mobile devices and social media, both for communication and for sharing thoughts (including positive or negative reviews of a company, its products, and its service record), call center agents, service representatives, and sales personnel will see a corresponding increase in duties because they are no longer answering only telephone calls. An agent, whether in customer service or sales, needs to know how to handle the various channels through which customers interact with company personnel and how to use the consumer's information to solve problems (Earls, 2014; McKoen, 2012).

· Privacy. Companies are learning more from their customers based on digital interactions, and users may be willing to give up that data if they understand why and how it is being used. But they also expect the data to be protected. Companies need to foster trust with those customers (Earls, 2014).

· Mining the social media inputs. On social media, customers often post honest insights about products and services. Ordinarily, companies pay for surveys to gather this same data, which is available for free if the content can be extracted from these postings. Thus, social media analytics, the gathering of data from blogs and social media websites for analysis in order to make business decisions, is gaining importance. These tools help marketers, sales personnel, and contact center agents observe customers' social output and respond accordingly to any talk about their brand (McKoen, 2012).

· Cleaning up the data. Organizations need to invest in cleaning up CRM data. Data that is inaccurate or duplicative can hamper call center or sales employees (Earls, 2014).
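As a small illustration of the data cleanup task in the last item above, the following sketch removes duplicate customer records. It assumes, for simplicity, that two records are duplicates when their email addresses match after trimming spaces and ignoring case. The records and the function names `normalize` and `deduplicate` are hypothetical, and real CRM cleanup usually needs fuzzier matching on names and addresses as well.

```python
def normalize(record):
    """Build a comparison key from the email, ignoring case and spaces."""
    return record["email"].strip().lower()

def deduplicate(records):
    """Keep the first record seen for each normalized email address."""
    seen = set()
    unique = []
    for record in records:
        key = normalize(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

# Hypothetical customer records with one duplicate entry.
customers = [
    {"name": "Ana Ruiz", "email": "ana@example.com"},
    {"name": "Ana  Ruiz", "email": " ANA@example.com "},
    {"name": "Ben Cho", "email": "ben@example.com"},
]
clean = deduplicate(customers)
print(len(clean))
```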

 

References

Barbier, J., Noronha, A., & Dixit, A. (2013, March). Assessing the economic value of making the right customer satisfaction decisions and the impact of dissatisfaction on churn. Retrieved from http://www.cisco.com/web/about/ac79/docs/re/Value-of-Customer-Satisfaction.pdf

Data analytics. (n.d.). In Techopedia. Retrieved from https://www.techopedia.com/definition/26418/data-analytics

Earls, A. (2014, January). Predicting the future of CRM in 2014 and beyond. Retrieved from http://searchcrm.techtarget.com/feature/Predicting-the-future-of-CRM-in-2014-and-beyond

Geek4Green. (n.d.). Social media: What most companies don't know. Retrieved from http://www.slideshare.net/Geek4Green/social-media-insights-what-most-companies-brands-dont-know

McKoen, A. (2012, December 27). Top five CRM trends you should know about. Retrieved from http://searchcrm.techtarget.com/photostory/2240175337/Top-five-CRM-trends-you-should-know-about/1/CRM-industry-trends#contentCompress

Reynolds, G. (2012). Ethics in information technology. Boston: Course Technology, Cengage Learning.

Rouse, M., Ehrens, T., & Kiwak, K. (2006, November). CRM (customer relationship management). Retrieved from http://searchcrm.techtarget.com/definition/CRM

Rouse, M., Stedman, C., & Bigelow, S. J. (n.d.). Hadoop. Retrieved from http://searchcloudcomputing.techtarget.com/definition/Hadoop

Salesforce.com. (n.d.). What is CRM? Retrieved from http://www.salesforce.com/uk/crm/what-is-crm.jsp

Licenses and Attributions

Traditional CRM by Janet Zimmer is available under a Creative Commons Attribution 3.0 United States license.

Data Warehouse Relationship Diagram by Janet Zimmer is available under a Creative Commons Attribution 3.0 United States license.

 

Social CRM Diagram by Janet Zimmer is available under a Creative Commons Attribution 3.0 United States license.

The Complete Information System

Keeping in mind that our focus is on a computer-based information system, we will look at the information system (IS) from two viewpoints—that of its function and that of its structural components. From the functional perspective, an information system is a medium for recording and storing data, and disseminating information that has been extracted from this data. This perspective focuses on what users do with the information that is accessed via the IS.

From a structural perspective, an information system is "a collection of multiple pieces of equipment involved in the dissemination of information. Hardware, software, computer system connections and information, information system users, and the system’s housing are all part of an IS" ("Information System," n.d.).

Note that there are two distinct components or parts of the IS—the humans that design, develop, and use the IS, and the technical components that comprise the IS structure. From either perspective, the purpose of an information system is to collect, store, retrieve, and distribute information used to support decision making, analyze issues, present complex topics in a visual format, and/or even provide the basis for creating new products or services.

All of this data flows through three processes:

· input (collection of the raw data)

· processing that data into information; that is, converting the data into a form that can be understood

· output (providing the information to those who will use it)

There is also a feedback loop, in which output that has been evaluated is returned to the system so that the input can be added to or corrected.
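The three processes and the feedback loop can be sketched as a tiny Python program. Everything in it is an assumption made for illustration: the tire pressure readings, the function names, and the choice of an average as the "information" produced.

```python
def collect_input(raw_lines):
    """Input: turn raw text lines into numeric readings, noting bad entries."""
    readings, rejected = [], []
    for line in raw_lines:
        try:
            readings.append(float(line))
        except ValueError:
            rejected.append(line)
    return readings, rejected

def process(readings):
    """Processing: convert data into information (here, a simple summary)."""
    return {"count": len(readings), "average": sum(readings) / len(readings)}

def output(info):
    """Output: present the information to the people who will use it."""
    return f"{info['count']} readings, average {info['average']:.1f}"

# Hypothetical raw data; one entry was typed as a word instead of a number.
raw = ["31.5", "32.0", "thirty", "30.5"]
readings, rejected = collect_input(raw)
print(output(process(readings)))

# Feedback: a reviewer corrects the rejected entry and it re-enters as input.
corrections = {"thirty": "30.0"}
corrected = [corrections[r] for r in rejected if r in corrections]
readings2, _ = collect_input(raw[:2] + raw[3:] + corrected)
print(output(process(readings2)))
```

The second pass shows the feedback loop at work: the evaluated output exposed a bad entry, and the corrected value changed the information the system produced.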

Circular flow diagram: Input-data, instructions → Processing- conversion to information → Output – reports, solutions, graphics → Control- decision makers or auto-controls → picture of personnel. The loop continues back to Input. Includes image of woman working at a computer.

Circular Flow Diagram

Diagram adapted from Fuad, 2017. Image credit: valentinrussanov/Signature Collection/iStock

The Components of a Computer-Based Information System

Fuad (2017) lists five primary components of an IS:

1. Hardware resources: These resources are all the physical equipment and associated devices, machines, and media. The list of equipment includes not only the computers themselves and peripherals such as keyboard and mouse for input, monitors and printers for output, but also the data media—that is, all tangible objects on which data is recorded. The data media comes in many forms such as sheets of paper, CDs and DVDs, flash drives, etc. Optical character recognition (also optical character reader, OCR) devices may be used for converting text on forms to a digital format.

2. Software resources: This set of resources includes not only a computer's operating system, and programs or apps, but also the operating instructions that are employed by the end users of an IS. User manuals (how to use an app or how to fill out a data entry form) are examples of a software resource.

3. Data resources: This is the raw material that an information system requires. It can come in many forms:

· Numbers, alphabetical, or other characters that describe a transaction or event

· Text data—sentences, paragraphs used in communication

· Image data—photos, shapes, figures

· Audio data—human speech or other sounds that are recorded for a specific purpose (for example, bird calls or songs, muffler noise abatement levels)

4. Network resources: Remember that the purpose of an IS is to collect, store, retrieve, and distribute information that is used to support decision making, analyze issues, present complex topics in a visual format, and/or even provide the basis for creating new products or services. This often requires telecommunications networks: internet (connection to the World Wide Web), intranet (private communication networks inside a business), or extranets (private connections between businesses). This category includes the physical communication media such as twisted pair wire, coaxial cable, fiber-optic cable, microwave systems, and communication satellite systems. The people, hardware, software, and data resources that support the operation and use of the internal network(s) or access to the internet are also required.

5. People resources: This category includes the end users and IS specialists.

· The end users are those who use the information provided by the IS. Most of us are end users of information systems. But more specifically, end users are those with careers in such fields as accounting, sales, engineering, management, banking, airline reservations, or human resources.

· IS specialists develop, operate, and support an information system. These are the system analysts (designers), programmers, testers, computer operators, data specialists, and data entry personnel.

Structural Differences in Information Systems

There are different types of information systems, depending on whatever activities the IS needs to support. One way to categorize different systems is by the type of decisions that need to be made: operational, tactical, or strategic (Kimble, n.d.).

1. Operational

· The transaction processing system (TPS) is probably the system you interact with most often. This system collects data from user inputs and then generates outputs based on the data collected. The data collection is typically obtained through automated or semiautomated activities and basic transactions. For example, you decide upon a product and place an order for that product with an online seller. All the information related to that order (size, color, shipping preference, cost, banking or payment information, shipping address) is collected and the order processing begins. Behind the scenes, inventory updates are made, and your information may also be shared with an operations support system, resulting in emails from the company or suggestions for other products you may like. Examples of TPS include payroll systems, reservation systems, order processing systems, and personal banking activities. The TPS is used to generate information that is shared with and used for other systems (both within and external to the company). Some of this data is shared or sold to third-party companies for their use.

· An operations support system (OSS) converts business data (such as financial transactions, "hits" or "likes" on a website, orders placed, etc.) into information that can be used via data mining. This information is the foundation for customer relationship management (CRM) or social customer relationship management (SCRM).

2. Tactical

· A management information system (MIS) is used by lower management to solve problems, make decisions, and ensure the smooth running of the organization. Structured procedures and accessible data for making decisions are in place. This type of system also allows managers to see trends and overall performance by comparing current outputs with previous ones. This type of IS deals with the past and present rather than the future. Examples of an MIS include personnel (human resource management, or HRM) systems, inventory control systems, and sales systems. A management reporting system is typically included.

3. Strategic

· A decision support system (DSS) pulls data from multiple sources. This data is then reviewed by higher management, which makes long-term determinations based on the compiled data. A DSS must be flexible enough to handle situations in which there are no clear procedures for making the decision and the factors to be considered may not be readily identifiable in advance. Typically, customized reports are generated based upon a particular set of data and a particular output format.

· A knowledge management system (KMS) is used to disseminate or share the knowledge generated by all other systems. A KMS serves as a central repository for the retention of a business's knowledge and is used to improve performance and consistency and enable a speedy response to inquiries from clients and partners.

Apps That Support Information Systems

This set of software resources starts with:

· a computer's operating system

· the programs or apps that allow both the development and maintenance of an IS

· the interface that allows the end user to use the IS

Every general-purpose computer in whatever form (mainframe, workstation, desktop, laptop, tablet, smartphone) requires an operating system (OS). The OS is the heart of the computer: it is the software component that manages the hardware and all of the other software, enabling both to perform the functions for which they are designed. The OS also controls some aspects of security; specifically, it allows only authorized users, identified by a user ID and password, to access the system.

Regardless of the platform (device), the OS coordinates the use of the system hardware with the application programs that enable the user to perform tasks. In the same way an OS is loaded on hardware that is compatible with that particular OS, application programs can be loaded only on top of an OS with which the application program is compatible.

Computers that support an IS would also use a network operating system, which supports a number of computers linked via a network. Most often, one machine, a server, is the computer that controls access to the resources (hardware and software) used by the other computers on the network. This server may also provide a centralized storage area for data. The other computers on the network are called clients. A network OS (e.g., Microsoft Windows Server, Apple OS X Server, Linux Server) is a different software package from the OS used on a standalone computer.

Operational Software Applications

Your only experience with a TPS is probably through the interface you use. Behind the scenes, TPS software can be grouped into three categories (Bernstein, n.d.):

· A front-end program is an app that sends and receives menus and forms. It is the interface that the user sees and offers the user options to choose. It collects the user's input.

· A request controller receives messages from front-end programs and then, in turn, initiates the proper transaction programs.

· A transaction server performs the work the user requested. It is typically connected to a database that may collect the input data and initiate other programs. It may also return a reply that is sent back to the device on which the front-end program is loaded (Bernstein, n.d.).
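The three categories above can be sketched as three cooperating Python functions. This is a toy model under stated assumptions, not a description of any real TPS product: the in-memory `inventory` dictionary stands in for a database, and all names and messages are hypothetical.

```python
# Hypothetical in-memory "database" of stock levels.
inventory = {"blue shirt": 3, "red hat": 0}

def transaction_server_place_order(item, quantity):
    """Transaction server: do the requested work and update the database."""
    in_stock = inventory.get(item, 0)
    if quantity <= 0 or quantity > in_stock:
        return {"ok": False, "message": f"Cannot order {quantity} x {item}"}
    inventory[item] = in_stock - quantity
    return {"ok": True, "message": f"Ordered {quantity} x {item}"}

# Request controller: map each request type to its transaction program.
TRANSACTION_PROGRAMS = {"place_order": transaction_server_place_order}

def request_controller(request):
    """Route a front-end message to the matching transaction program."""
    program = TRANSACTION_PROGRAMS.get(request["type"])
    if program is None:
        return {"ok": False, "message": "Unknown request"}
    return program(request["item"], request["quantity"])

def front_end(item, quantity):
    """Front end: package the user's input and return the reply to display."""
    reply = request_controller(
        {"type": "place_order", "item": item, "quantity": quantity}
    )
    return reply["message"]

print(front_end("blue shirt", 2))
print(front_end("red hat", 1))
```

Keeping the three roles separate means a new kind of request only needs a new entry in the controller's table and a new transaction program; the front end does not change.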

Tactical Software Applications

An MIS uses several different types of software apps, depending on the focus of the business. Function-specific apps include:

· customer relationship management (CRM)

· enterprise resource planning (ERP)

· supply chain management (SCM)

· human resource management (HRM) 

· database management systems (DBMS)

The best software for any business is software that helps the business increase or measure productivity in order to run its operations better, cut costs, and replace paper processes (Mohamed, n.d.).

Data gathered and processed by the TPS is sent to the MIS, which in turn uses its software to produce routine reports for management. These reports include:

· summary reports that show totals and trends

· exception reports that display out-of-the-ordinary data

· periodic reports, which are generated on a schedule (daily, weekly, monthly, quarterly, etc.) and are typically printed

· demand reports, which are generated outside the normal schedule in response to a specific request for certain information
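Two of the report types above, summary and exception reports, can be sketched in Python. The sales records, the expected range of 100 to 300, and the function names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical transactions passed from the TPS: (region, sale amount).
sales = [
    ("East", 120.0), ("East", 80.0), ("West", 40.0),
    ("West", 35.0), ("North", 500.0),
]

def summary_report(records):
    """Summary report: total sales per region."""
    totals = defaultdict(float)
    for region, amount in records:
        totals[region] += amount
    return dict(totals)

def exception_report(totals, low, high):
    """Exception report: regions whose totals fall outside the expected range."""
    return {r: t for r, t in totals.items() if t < low or t > high}

totals = summary_report(sales)
print(totals)
print(exception_report(totals, low=100.0, high=300.0))
```

Running the same functions on a schedule would yield a periodic report, and running them on request would yield a demand report; the difference lies in when the report is produced, not in how.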

Strategic Software Applications: DSS and KMS

With the introduction of integrated computerized decision support systems to support day-to-day operating activities, managers can "download and analyze sales data, create reports, and analyze and evaluate forecasting results. DSS can help managers perform tasks, such as allocating resources, comparing budget to actual results, drilling down to analyze results, projecting revenues, and evaluating scenarios" (Power, Examples of DSS, 2002).  Executive dashboards and scorecards can be used to track operations and support strategic decision making based on facts and data and not on hunches and gut instincts.
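One of the tasks named in the quotation, comparing budget to actual results, can be sketched as follows. The figures and the function name `budget_variance` are hypothetical; a real DSS would pull these numbers from the organization's own systems.

```python
def budget_variance(budget, actual):
    """Return each category's variance (actual minus budget) and percent."""
    report = {}
    for category, planned in budget.items():
        spent = actual.get(category, 0.0)
        variance = spent - planned
        # Percent is undefined when nothing was budgeted for the category.
        percent = (variance / planned * 100) if planned else None
        report[category] = (variance, percent)
    return report

# Hypothetical quarterly figures.
budget = {"marketing": 10000.0, "payroll": 50000.0, "travel": 4000.0}
actual = {"marketing": 12500.0, "payroll": 49000.0, "travel": 4000.0}

for category, (variance, percent) in budget_variance(budget, actual).items():
    print(f"{category}: {variance:+.0f} ({percent:+.1f}%)")
```

A manager could then drill down into the category with the largest variance, which is the kind of follow-up analysis the quotation describes.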

Most of the newer and updated DSS have the following attributes in common (Power, Using Computerized DSS, 2002):

1. Real-time access to rich media/data enables many remote users to collaborate.

2. DSS applications can be accessed anywhere and anytime.

3. Large data sets that include historical data are easily accessed.

4. Excellent graphs and charts are available for viewing the extracted data.

5. Real-time updates to the data are available when needed.

The biggest players (and perhaps the most expensive to implement) in the DSS field are SAP and Oracle. SAP is the largest in the market and features accounting and distribution software suites as well as software systems for manufacturing, human resources, payroll, and customer relationship management (ERPsoftware360, n.d.). Oracle, the second largest, develops marketing and enterprise resource planning software, as well as CRM and SCM software ("Oracle Corporation," n.d.).

You can review many other options at http://www.capterra.com/knowledge-management-software/.

As with DSS, a KMS tool should be chosen to match the needs of the organization. Several of the most highly recommended knowledge management software suites include:

· Zendesk—a cloud-based application that can be implemented for small or large numbers of customers

· eXo Platform—used by large enterprises, midsize businesses, and public administrations

· Confluence—organizes a repository of information, opinions, and knowledge that helps in answering questions and creating how-to documents.

Remember, however:

KM is about managing people, culture, and organizational practices & structures. Effective KM initiatives are therefore never exclusively technology driven. However, in conjunction with sound practice, KM tools are invaluable at providing support to KM initiatives and at facilitating interaction, exchange of ideas, locating experts, and storing knowledge in both structured and unstructured forms (Frost, 2017).

Today, these tools serve as a competitive advantage within the knowledge sharing field.

Summary

The heart of almost every information system is a database. You may be familiar with a simple database management system, Microsoft Office’s Access. Excel could also be used as a database in an information system since it is a repository of information that can support information and knowledge discovery, decision making, and visual analytics.

Information systems is in the center oval with People, Networks, Software, Hardware, and Data in ovals around the center one, all with arrows pointing to the Information systems oval.

Information Systems

Source: Mbaknowl.com. (n.d.). Used with permission.

The purpose for gathering data is to provide information, and that information, in turn, can be used for decision making. Regardless of the type of information system, all information systems involve hardware and software that support the sharing or processing of data into information. And every IS also must include the people who support the system, as well as those who benefit from the information that is generated and shared.

In your next assigned reading, you will be introduced to the system development life cycle. This process outlines ways in which an information system may and should be developed.

References

Bernstein, P. (n.d.). Transaction processing system examples and SOA approaches. Retrieved from http://searchdatamanagement.techtarget.com/feature/Transaction-processing-system-examples-and-SOA-approaches

ERPsoftware360. (n.d.). Top 5 client-server ERP software applications. Retrieved from http://erpsoftware360.com/erp-software.htm

Frost, A. (2017). KM tools. Retrieved from http://www.knowledge-management-tools.net/knowledge-management-tools.html

Fuad, S. (2017, January 22). MIS Lecture 3. Retrieved from http://uotechnology.edu.iq/ce/Lectures/SarmadFuad-MIS/MIS_Lecture_3.pdf

Information system (IS). (n.d.). In Techopedia. Retrieved from https://www.techopedia.com/definition/24142/information-system-is

Kimble, C. (n.d.). Information systems and strategy. Retrieved from http://www.chris-kimble.com/Courses/World_Med_MBA/Types-of-Information-System.html

Kline, J. (2015, May 7). What are the best knowledge management tools out there? Retrieved from https://www.quora.com/What-are-the-best-knowledge-management-tools-out-there

Mohamed, A. (n.d.). The best software for small businesses (SMEs) - essential guide. Retrieved from http://www.computerweekly.com/feature/The-best-software-for-small-businesses-SMEs-Essential-Guide

Oracle Corporation. (n.d.). In Wikipedia. Retrieved from https://en.wikipedia.org/wiki/Oracle_Corporation

Power, D. (2002). Examples of decision support systems (DSS) aiding business decision-making. In Decision support basics. Santa Barbara: Greenwood. Retrieved from http://searchbusinessanalytics.techtarget.com/tutorial/How-decision-support-systems-DSS-can-help-business-decision-making

Power, D. (2002). Using computerized decision support systems, and the history of DSS. In Decision support systems: Concepts and resources for managers. Santa Barbara: Greenwood. Retrieved from http://searchbusinessanalytics.techtarget.com/tutorial/Using-computerized-decision-support-systems-and-the-history-of-DSS

 

Licenses and Attributions

Fuad, S. (2017, January 22). MIS Lecture 3. Retrieved from http://uotechnology.edu.iq/ce/Lectures/SarmadFuad-MIS/MIS_Lecture_3.pdf

The System Development Life Cycle (SDLC)

What Is the SDLC?

The system development life cycle (SDLC) is a structured methodology and process that guides the development of an information system. SDLC is based on a series of related activities that are combined into phases, sometimes called life-cycle phases. The phases represent a state or stage in the life of an information system. Generally speaking, an information system life cycle proceeds from requirements gathering to design and development to operations and maintenance to decommissioning. Each successive phase leverages the documentation and knowledge gained from the previous phases. The figure below shows the general flow of a basic SDLC.

The basic system development life cycle illustrated from top to bottom with arrows connecting components: Requirements Analysis to Design, Development, Operation, Maintenance, Decommissioning, and Feedback.

Basic System Development Life Cycle

The main purpose of using SDLC is to promote quality during the design, development, and implementation effort. When SDLC is used properly, an information system is more reliable and cost-effective because project activities are planned, documented, tracked, and controlled. To ensure that the information system will meet the stated requirements, SDLC also includes predefined reviews, inspections, and audits for the life-cycle processes and deliverables to identify variances and recommend changes.

Using the SDLC Acronym

As with most acronyms, there can be some confusion associated with using SDLC. Within the information technology industry, SDLC may also be used for:

· Synchronous Data Link Control—A communications protocol that divides network functions into clearly defined layers.

· Software development life cycle—Also known as software development process (SDP), this is the set of life-cycle phases associated with software programs.

For the purposes of this module, SDLC will be used as defined in the first section of this module.

Why Is SDLC Important in the Development of an Information System?

An information system does not consist solely of the software and hardware an organization uses. Effective use of technology is also dependent on having a solid set of processes and procedures for meeting business objectives, delivering products and services, and enabling continuous process improvement. Another important component of an information system is the trained, skilled people who use the technology, processes, and procedures to operate in and manage the organization.

The relationship between the technology, processes, procedures, and people is symbiotic: any change to one component will have some effect on the others. For example, introducing a new human resources information system into an organization without considering how it might affect the organization's processes and procedures could doom the system to failure before it is fully deployed. A key aspect of using SDLC is considering all components of an information system throughout the entire project. This holistic approach is one of the main reasons why using SDLC is increasingly becoming a critical success factor for implementing today's complex, high-stakes information systems.

Because implementing these systems is an expensive, multiyear effort, SDLC is also an important organizational tool to ensure that information system resources are implemented in a fiscally responsible and efficient manner. A life-cycle approach ensures that there is a clear plan and process for:

· identifying and validating organizational requirements early in the project

· designing and developing the system based on the approved requirements

· deploying and transitioning the completed system to the user community

· operating, maintaining, and updating the system once it is deployed

· decommissioning the system when it is no longer required or when it is replaced

The SDLC Phases

The SDLC phases are the sequence of activities associated with the life cycle of an information system. Although the number of SDLC activities can vary depending on the type and complexity of the information-system project or the SDLC model used, there are some common guidelines that allow the activities to be grouped into clearly defined phases. These recommended guidelines are outlined below.

· Complete a preliminary investigation, requirements analysis, and system recommendation.

· Specify a detailed design based on an approved set of requirements.

· Develop the system according to the approved design specification.

· Test the system and gain user acceptance.

· Install, operate, and maintain the accepted system.

· Update or replace the system as organizational goals and requirements change.

· Decommission the system when it is no longer needed.

· Document, report on, and approve each phase of the SDLC before beginning the next phase.

Following these common guidelines helps mitigate the risk that the design and development effort will get out of control either through missed requirements, schedule delays, or cost overruns. Because the guidelines require interaction with stakeholders throughout the project, they also prevent surprises when the system is rolled out to the user community. As you read through the approaches in the following sections, see if you can identify these common steps.
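The last guideline, approving each phase before the next one begins, can be pictured as a simple gate. The Python sketch below is illustrative only: the phase names are paraphrased from the guidelines above, and the `PhaseGate` class and its methods are hypothetical, not part of any SDLC standard.

```python
from dataclasses import dataclass, field

# Phase names paraphrased from the guidelines above (illustrative only).
PHASES = [
    "requirements analysis",
    "design",
    "development",
    "testing",
    "operations and maintenance",
    "decommissioning",
]


@dataclass
class PhaseGate:
    """Hypothetical tracker: a phase can be approved only in order and only if documented."""
    approved: list = field(default_factory=list)

    def current_phase(self):
        # The current phase is the first one that has not been approved yet.
        if len(self.approved) == len(PHASES):
            return None  # every phase has been approved
        return PHASES[len(self.approved)]

    def approve(self, phase, documented):
        # Refuse approval when the phase is out of order or undocumented.
        if phase != self.current_phase():
            raise ValueError(
                f"cannot approve {phase!r}; current phase is {self.current_phase()!r}"
            )
        if not documented:
            raise ValueError(f"{phase!r} must be documented before approval")
        self.approved.append(phase)


gate = PhaseGate()
gate.approve("requirements analysis", documented=True)
print(gate.current_phase())  # design
```

Trying to approve "development" at this point raises an error, because "design" has not yet been documented and approved.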

Four-Phase Approach

This approach divides the life cycle into four major phases. It may be used when an organization has a good understanding of its requirements or the type of information system being implemented.

The figure below shows the four phases and some of the key activities associated with each phase.

Four-Phase Approach

Planning: Those responsible for creating the system must first determine whether the system is actually needed and what it will do for the organization if it is created. A basic idea of how to build the system is explored. A project plan is created and a project manager is assigned.

Analysis: The responsible group must decide whether to go ahead with the project and whether the resources needed to complete the work are available. A feasibility study is completed that justifies the need to replace an existing system, identifies the improvements that will be made if the new system is created, and reviews whether the budget and resources are sufficient to create the system.

Design: In the design phase, a detailed plan for creating the system is developed. This plan is then implemented and, ideally, results in all the components needed to complete the system.

Implementation: Implementation consists of both the actual product development and the installation of the product for the users. This phase should also include documentation, user manuals, training, maintenance of the system, and any future updates or expansion of the system.

Source: Four Phases of SDLC? (n.d.).

Nine-Phase Approach

This approach divides the life cycle into nine phases. The following table shows the phases and some of the key activities associated with each phase. As you read through the table, compare it with the four-phase approach. Note that a more granular approach is taken for the preliminary investigation, requirements analysis, and system recommendation portions of the project.

Organizations may use this approach when implementing an unfamiliar type of information system. The nine-phase approach is also more appropriate for implementing an information system that will be used across all business units within an organization.

Nine-Phase Approach

SDLC Phases

Key Activities

Initiation phase

· Develop business case

· Identify project sponsor

· Appoint project manager

· Develop concept proposal

· Review and approve concept proposal

System concept development phase

· Analyze business need

· Form project team

· Plan project

· Develop project-acquisition strategy

· Identify and analyze risks

· Obtain funding and resources

· Document phase efforts

· Review and approve phase documents

Planning phase

· Refine acquisition strategy

· Analyze project schedule

· Document internal processes

· Establish agreements with stakeholders

· Develop project-management plan

· Review and approve project-management plan

Requirements analysis phase

· Define functional requirements

· Define technical requirements

· Conduct reviews and approve requirements

Design phase

· Design system

· Design business processes

· Outline operations and maintenance manuals

· Outline deployment plan

· Conduct design reviews

· Approve system design

Development phase

· Refine and complete software requirements

· Refine and complete software design

· Acquire and install hardware

· Code and test software

· Conduct hardware- and software-qualification testing

· Install software

· Test system qualification

· Complete plans and support documentation

· Test and review documentation

· Develop deployment plan

· Obtain approval and acceptance of all development documentation

Integration and test phase

· Conduct subsystem/system testing

· Conduct security testing

· Conduct user-acceptance testing

· Review and finalize development-phase documentation

· Obtain user acceptance

Implementation phase

· Communicate deployment plan

· Execute training plan

· Perform data entry, migration, and conversion

· Install new system

· Perform post-implementation evaluation

· Obtain approval to operate the system

Operations and maintenance phase

· Transition project to operations

· Operate system

· Perform data and software administration

· Perform system and software maintenance

· Identify problems, recommend modifications, and update the system

· Monitor organizational changes, recommend modifications, and update the system

Ten-Phase Approach

The U.S. Department of Justice (DOJ) uses a 10-phase SDLC approach on its information system implementation projects. Like the nine-phase approach, this approach emphasizes the preliminary investigation, requirements analysis, and system recommendation project activities. The main difference between the DOJ approach and the nine-phase approach is that the DOJ approach also includes a phase to dispose of the information system when it is no longer needed.

SDLC Models

Waterfall Model

The waterfall model is often used to represent the SDLC process. This linear, sequential model is often considered to be the foundation and origin of today's SDLC methodology. Although there is disagreement as to when the model was first introduced, the general consensus is that it has been in existence in one form or another since the 1960s.

Waterfall development is still widely used for software engineering projects because it has distinct goals for each phase of the development and requires each phase to be fully completed before the next phase can begin. Once the decision is made to go to the next phase, there is no turning back. Like a waterfall, once the water goes over the cliff (phase), it cannot flow back. The figure below graphically shows how the waterfall model works using the DOJ's 10-phase approach.

The waterfall model for an SDLC system, showing in descending order from left to right: initiation, system concept development, planning, requirements, design, development, integration and test, implementation, operations and maintenance, and disposition.

Waterfall SDLC Model

The advantage of waterfall development is that it allows for direct project manager and management control. A timeline can be established with specific deadlines for each phase, and a software solution can proceed through the development process like a product through an assembly line, and if properly managed, be delivered on time. Each phase of development proceeds in a predefined order, without any overlapping steps or turning back.

The disadvantage of waterfall development is that there is no returning to a previous phase. Once the software solution is in the design phase, it is difficult to go back and modify a feature or function that was not well thought out in the requirements phase. Today's complex, cross-functional information systems require a more iterative approach and development effort.

Fountain Model

The fountain model recognizes that overlap may be needed between some development phases, and previous phases may have to be revisited throughout the development cycle. For example, planning may need to be fully completed prior to beginning requirements analysis. Once planning is completed, the requirements analysis, design, and development phases may have activities that must overlap to ensure the system is properly built. Like water in a fountain, details about the information system are pushed up through the phases, but at any time the details may flow back through the previous phases to be refreshed and refined as more is learned about the system. The figure below graphically shows the way the fountain model works, using the DOJ's 10-phase approach.

The Fountain Model, showing how details can be pushed up like water in a fountain to the various phases in the DOJ 10-phase approach: initiation, planning, design, integration and test, operations and maintenance, implementation, disposition of old system, development, requirements, and concept development.

Fountain SDLC Model Based on DOJ 10-Phase Approach

Source: Janet Zimmer

At first, the fountain model can be confusing. If you have never used the model on a project, you may not understand that the overlapping phases and curved arrows demonstrate the highly iterative nature of this life-cycle approach.

The advantage of fountain development is that changes can be made to the components of the information system as the project team learns more about what is actually needed or uncovers gaps in the concept, requirements, or design.

The disadvantage of this model is that it may take more time and cost more to complete the information system. Without strong project management, the information system theoretically may never be completed if the project team gets caught in a loop of ever-increasing scope and continuously changing requirements.
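The difference between the waterfall and fountain models comes down to which phase transitions are allowed. The Python sketch below uses the DOJ phase names from the waterfall figure description. The two functions are hypothetical, and the fountain rule (step forward one phase, or return to any earlier phase) is a simplifying assumption rather than a formal definition of the model.

```python
# DOJ phase names as listed in the waterfall figure description.
DOJ_PHASES = [
    "initiation",
    "system concept development",
    "planning",
    "requirements",
    "design",
    "development",
    "integration and test",
    "implementation",
    "operations and maintenance",
    "disposition",
]


def waterfall_allows(current, target):
    """Waterfall: only the next phase in sequence may follow the current one."""
    return DOJ_PHASES.index(target) == DOJ_PHASES.index(current) + 1


def fountain_allows(current, target):
    """Fountain (simplified assumption): move forward one phase or revisit any earlier one."""
    if target == current:
        return False
    return DOJ_PHASES.index(target) <= DOJ_PHASES.index(current) + 1


print(waterfall_allows("design", "requirements"))  # False
print(fountain_allows("design", "requirements"))   # True
```

Both rules allow moving from design to development. Only the fountain rule allows the team to return from design to requirements when a gap is uncovered.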

Build-and-Fix Model

Build and fix is recognized as the crudest, least structured model in the SDLC family. In this model, the solution is developed without any proper preliminary investigation, requirements analysis, or design. In essence, the solution is built (think of this as a working model or prototype) and modified as often as necessary until it satisfies the customer's needs. The figure below graphically shows the way this model, which uses only the development and operational phases of the four-phase approach, works. Some of the study and design phase activities may be completed during a highly iterative modify phase.

Build and Fix Model for the SDLC, with two circles "build a solution" and "modify to satisfy customer's needs" connected with circular arrows. A third circle, "maintenance," is below the two circles and is connected with an arrow from the "modify" circle.

Build-and-Fix SDLC Model

Source: Janet Zimmer

The advantage of the build-and-fix model is that it provides an efficient framework for extremely small, low-priority development efforts that involve a single customer. In some cases, it may be necessary to use the build-and-fix model when there is not enough time for a more rigorous approach. The highly iterative nature of this model ensures intense and frequent customer involvement in the development of the information system.

The disadvantage of this approach is that the cost is usually greater than if a preliminary investigation, requirements analysis, and detailed design had been completed. This is an extremely open-ended, risky approach that requires careful management and control. Organizations are strongly discouraged from using this SDLC model except for small, low-priority projects.

Rapid Application Development (RAD) or Rapid Prototyping Model

In most cases, rapid application development (RAD) is used when developing a software solution that is heavily dependent on the organization's business processes and the end users' knowledge and understanding of those processes. In essence, the end users can provide better feedback about the system requirements by examining a live system rather than commenting on the associated documentation.

In a sense, the RAD is also a type of working model or prototype of the solution that allows the end user to see how various requirements have been implemented as the product is developed. It is analogous to working with a tailor who is making a custom suit for you. At various stages of the suit's construction, you may have to return to the tailor's shop, try on the unfinished pieces, and provide feedback that is used to update the measurements and complete the suit. The type of suit and the fit you expect may dictate how many iterations are needed before the tailor's work is done.

RAD is made possible by the significant advances in the software development environment that allow for more rapid code generation and faster modifications to application screens and other user interfaces. The figure below graphically shows how the RAD model works using a modified version of the DOJ's 10-phase approach.

RAD/Rapid Prototyping Model, shown with two circles. At upper right, an "initiation and planning" circle has an arrow pointing to five circles in a pentagon shape, titled "requirements," "design," "development," "integration and test," and "implementation." An arrow is pointing to the circle "operations and maintenance," which also has an arrow pointing at it.

RAD/Rapid Prototyping Model

Source: Janet Zimmer

The advantage of the RAD model is that it can result in a lower level of rejection when the information system is placed into production. End users are given the opportunity to work with the screens online in a production-like environment, which means a significant number of design and development errors can be caught earlier in the process. The model also allows end users to be heavily involved in the software development effort and take ownership of the finished product.

The disadvantage of this model is that RAD could lead to cost and schedule overruns. Another downside is the propensity of the end user to increase the scope and add new requirements during the development effort. Some end users may think that because it is easy for the developer to produce the basic screens, it is just as easy to add extra enhancements. Without strong project leadership, participants can lose sight of the goal of producing an optimal, useful system and instead attempt to develop a gold-plated application that goes beyond the organization's requirements. For this reason, the project team may use a blend of RAD prototyping and the traditional waterfall approach.

Agile Model

The agile model is, in some ways, similar to the build-and-fix and RAD models in that multiple releases of the product are made, each with small incremental additions that lead up to a final product. Each release is tested by the customers and requires a close working relationship between customers, developers, and testers. That interaction with the customer can also be its downfall if the customer is not sure of or clear about the direction in which the project is heading, potentially resulting in false starts or dramatic changes in requirements as the project progresses.

The figure below graphically shows how the agile model might work.

The Agile Model, depicted with a section called “initiation” on the left pointing below to three circles labeled 1st version, 2nd version, and “nth” version. Those circles each contain components labeled “planning,” “demo,” “development,” and “test.”

Agile Model

Source: Janet Zimmer
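The idea of multiple releases, each adding a small increment to the product, can be sketched as follows. The feature names are invented, and `incremental_releases` is a hypothetical helper. Real agile planning also reprioritizes the remaining work after each release based on customer feedback, which this sketch leaves out.

```python
def incremental_releases(backlog, features_per_release):
    """Yield (version_number, cumulative_product) for each release."""
    product = []
    starts = range(0, len(backlog), features_per_release)
    for version, start in enumerate(starts, start=1):
        # Each release adds a small batch of features to what already exists.
        product = product + backlog[start:start + features_per_release]
        yield version, product


backlog = ["login", "search", "cart", "checkout", "reports"]  # hypothetical features
releases = list(incremental_releases(backlog, 2))
for version, product in releases:
    print(f"version {version}: {product}")
```

Each version printed here would be planned, developed, tested, and demonstrated to the customer before work on the next version begins.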

The SDLC and Testing

Regardless of the model used in the system development, testing should be incorporated into every phase of the life cycle. These are some of the types of testing that should be incorporated into the development cycles:

Unit test: This test focuses on a single unit or subsystem to ensure that it operates correctly and produces results according to the specifications for the system.

Integration test: This test uses real data and tests whether each of the units continues to work properly with the "live" data used by the various subsystems. This type of test works with multiple units to see that output from one unit is properly applied to another unit and produces expected results.

System test: This test determines whether all of the components of the system work together. This is especially important if different work units are creating pieces or subsystems of the project. These subsystems must be able to work seamlessly with all other subsystems to which they are connected. Where the integration test might test closely related units, the system test involves using data across all units to ensure expected results.

Acceptance test: Although this might be used only after the system test is verified, the end users may get involved in smaller versions of unit or integration tests to ensure that the system is working according to the users' specifications. The user may not accept the product if it does not perform according to those specifications.
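The difference between a unit test and an integration test can be shown with a small Python sketch. The two "units" here, `parse_order` and `order_total`, are hypothetical functions invented for this example, and the order data is made up.

```python
def parse_order(line):
    """Unit 1 (hypothetical): turn 'item, quantity, unit_price' text into a record."""
    item, quantity, unit_price = line.split(",")
    return {
        "item": item.strip(),
        "quantity": int(quantity),
        "unit_price": float(unit_price),
    }


def order_total(record):
    """Unit 2 (hypothetical): compute the total cost for one parsed record."""
    return record["quantity"] * record["unit_price"]


def test_unit_parse_order():
    # Unit test: exercises one unit in isolation.
    expected = {"item": "pen", "quantity": 3, "unit_price": 1.5}
    assert parse_order("pen, 3, 1.50") == expected


def test_unit_order_total():
    # Unit test: the second unit is checked with a hand-built record.
    assert order_total({"item": "pen", "quantity": 3, "unit_price": 1.5}) == 4.5


def test_integration_parse_then_total():
    # Integration test: output from one unit is fed into the next.
    assert order_total(parse_order("pen, 3, 1.50")) == 4.5


for test in (test_unit_parse_order, test_unit_order_total, test_integration_parse_then_total):
    test()
    print(f"{test.__name__}: passed")
```

A system test would go further and run data through every subsystem together, and an acceptance test would ask the end users whether the results meet their specifications.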

Relationship Between SDLC and Project Management

Project management is a profession and discipline that uses a systematic process to plan, manage, execute, and control projects. Project managers are found in just about every commercial and noncommercial environment, including construction, education, financial services, government, medicine, manufacturing, nonprofit, technology, and utilities environments.

The project-management process uses the same structure and rigor found in the SDLC phases and models. A typical project-management process may include the steps shown in the figure below.

The Project Management Process, shown in a table graphic. Under the heading: The Client or Stakeholders, there are five columns. Under "Initiation" there are "Clients' Concepts and Ideas" and "Develop a Business Case for the Project." Under "Initial Planning," there are "Suggested Approaches," "Identification of Outcomes" and "Prioritizing the Desired Outcomes." Under "Final Planning," there are "Schedule Approval," "Client Approval," "Budget Development and Approval," and "Coordination with Other Departments for Products and Services." Under "Project Development" there are "Procurement of Required Products & Services," "Progress Reports," "Quality Assurance," "Risk Assessment," "Change Requests," and "List of Final Deliverables." Under "Project Completion" there are "Customer Acceptance and Satisfaction," "Lessons Learned" and "Benefits Realized."

Project Management Process

Source: Janet Zimmer

Although many projects do not require an SDLC approach, most information technology projects do. Specifically, when some form of information system is needed, SDLC is required and project management is needed to plan, schedule, and control the associated activities. As an information system moves through its life-cycle phases, it may spawn several projects. For example, there may be separate projects to:

· determine the business need

· find, analyze, evaluate, and select a vendor

· define what needs to be done to update an aging system during the operations and maintenance phase

· dispose of an information system that is no longer required

In most cases, the project ends when the information system moves into the operations and maintenance phase. In all cases, it is project management that brings order and organization to information-system-development efforts.

Summary

SDLC is the progression through a series of stages or states of an information system. It lasts from the conception of the system to its disposition. The number of life-cycle phases can vary from system to system and according to the needs of the organization. SDLC models are tools that allow project and development teams to correctly follow the SDLC stages required to develop the various types of information systems. Project management is used to plan, schedule, and control the SDLC phases associated with a selected model.

Sources 

Elliott, R. K. (2006). Sorting out SDLC terminology. Retrieved from http://www.managingsoftwaredevelopment.com/Ed001article2.htm

Four phases of SDLC? (n.d.). Retrieved from http://www.answers.com/Q/Four_phases_of_SDLC

Justice Management Division. (2003, January). Systems development life cycle guidance document. Washington, DC: Department of Justice.

Mulcahy, R. (2005). PMP exam prep (5th ed.). Lakewood, CO: RMC Publications, Inc.

Office of the Chief Information Officer. (2006, February). Smithsonian information technology plan FY 2006–FY 2011. Washington, DC: Smithsonian Institution.

Office of Information Technology. (2006, August). Systems development life cycle (SDLC). Volume 2: SDLC Phases. Annapolis, MD: Maryland Department of Budget & Management.

Project Management Institute. (2004). A guide to the project management body of knowledge (PMBOK Guide) (3rd ed.). Newtown Square, PA: Project Management Institute, Inc.

Schwalbe, K. (2005). Information technology project management (4th ed.). Boston: Thomson Course Technology.

Shelly, G. B., Cashman, T. J., & Vermaat, M. E. (2007). Discovering computers 2007: A gateway to information. Boston: Thomson Course Technology.