Week 6


Section 6.1

Data, Information, and Databases

LEARNING OUTCOMES

6.1 Explain the four primary traits that determine the value of information.

6.2 Describe a database, a database management system, and the relational database model.

6.3 Identify the business advantages of a relational database.

6.4 Explain the business benefits of a data-driven website.

THE BUSINESS BENEFITS OF HIGH-QUALITY INFORMATION

LO 6.1: Explain the four primary traits that determine the value of information.

Information is powerful. Information can tell an organization how its current operations are performing and help it estimate and strategize about how future operations might perform. The ability to understand, digest, analyze, and filter information is key to growth and success for any professional in any industry. Remember that new perspectives and opportunities can open up when you have the right data that you can turn into information and ultimately business intelligence.

Information is everywhere in an organization. Managers in sales, marketing, human resources, and management need information to run their departments and make daily decisions. When addressing a significant business issue, employees must be able to obtain and analyze all the relevant information so they can make the best decision possible. Information comes at different levels, formats, and granularities.  Information granularity  refers to the extent of detail within the information (fine and detailed or coarse and abstract). Employees must be able to correlate the different levels, formats, and granularities of information when making decisions. For example, a company might be collecting information from various suppliers to make needed decisions, only to find that the information is in different levels, formats, and granularities. One supplier might send detailed information in a spreadsheet, whereas another supplier might send summary information in a Word document, and still another might send a collection of information from emails. Employees will need to compare these differing types of information for what they commonly reveal to make strategic decisions.  Figure 6.4  displays the various levels, formats, and granularities of organizational information.

Successfully collecting, compiling, sorting, and finally analyzing information from multiple levels, in varied formats, and exhibiting different granularities can provide tremendous insight into how an organization is performing. Exciting and unexpected results can include potential new markets, new ways of reaching customers, and even new methods of doing business. After understanding the different levels, formats, and granularities of information, managers next want to look at the four primary traits that help determine the value of information (see  Figure 6.5 ).

Information Type: Transactional and Analytical

As discussed previously in the text, the two primary types of information are transactional and analytical. Transactional information encompasses all of the information contained within a single business process or unit of work, and its primary purpose is to support daily operational tasks. Organizations need to capture and store transactional information to perform operational tasks and repetitive decisions such as analyzing daily sales reports and production schedules to determine how much inventory to carry. Consider Walmart, which handles more than 1 million customer transactions every hour, and Facebook, which keeps track of 400 million active users (along with their photos, friends, and web links). In addition, every time a cash register rings up a sale, a deposit or withdrawal is made from an ATM, or a receipt is given at the gas pump, the transactional information must be captured and stored.


FIGURE 6.4

Levels, Formats, and Granularities of Organizational Information

Analytical information encompasses all organizational information, and its primary purpose is to support the performance of managerial analysis tasks. Analytical information is useful when making important decisions such as whether the organization should build a new manufacturing plant or hire additional sales personnel. Analytical information makes it possible to do many things that previously were difficult to accomplish, such as spot business trends, prevent diseases, and fight crime. For example, credit card companies crunch through billions of transactional purchase records to identify fraudulent activity. Indicators such as charges in a foreign country or consecutive purchases of gasoline send a red flag highlighting potential fraudulent activity.
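The fraud indicators described above can be sketched as simple rules applied to transactional records. The example below is a minimal illustration, not how any credit card company actually works: the Purchase fields, the HOME_COUNTRY value, and the two rules are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Purchase:
    card_id: str
    merchant_type: str  # e.g. "gasoline", "grocery" (hypothetical categories)
    country: str


HOME_COUNTRY = "US"  # assumed home country for every card in this sketch


def flag_suspicious(purchases):
    """Return the positions of purchases that match either red-flag rule."""
    flagged = []
    previous_by_card = {}
    for i, p in enumerate(purchases):
        prev = previous_by_card.get(p.card_id)
        foreign = p.country != HOME_COUNTRY
        repeat_gas = (
            prev is not None
            and prev.merchant_type == "gasoline"
            and p.merchant_type == "gasoline"
        )
        if foreign or repeat_gas:
            flagged.append(i)
        previous_by_card[p.card_id] = p
    return flagged


purchases = [
    Purchase("A", "grocery", "US"),
    Purchase("A", "gasoline", "US"),
    Purchase("A", "gasoline", "US"),    # consecutive gasoline purchase
    Purchase("B", "restaurant", "FR"),  # charge in a foreign country
]
flagged = flag_suspicious(purchases)
print(flagged)
```

Each individual purchase is transactional information; the flags only emerge when the records are analyzed together, which is the role analytical information plays.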

Walmart was able to use its massive amount of analytical information to identify many unusual trends, such as a correlation between storms and Pop-Tarts. Yes, Walmart discovered an increase in the demand for Pop-Tarts during the storm season. Armed with that valuable information, the retail chain was able to stock up on Pop-Tarts that were ready for purchase when customers arrived.  Figure 6.6  displays different types of transactional and analytical information.

FIGURE 6.5

The Four Primary Traits of the Value of Information


Information Timeliness

Timeliness is an aspect of information that depends on the situation. In some firms or industries, information that is a few days or weeks old can be relevant, whereas in others information that is a few minutes old can be almost worthless. Some organizations, such as 911 response centers, stock traders, and banks, require up-to-the-second information. Other organizations, such as insurance and construction companies, require only daily or even weekly information.

Real-time information  means immediate, up-to-date information.  Real-time systems  provide real-time information in response to requests. Many organizations use real-time systems to uncover key corporate transactional information. The growing demand for real-time information stems from organizations’ need to make faster and more effective decisions, keep smaller inventories, operate more efficiently, and track performance more carefully. Information also needs to be timely in the sense that it meets employees’ needs, but no more. If employees can absorb information only on an hourly or daily basis, there is no need to gather real-time information in smaller increments.

Most people request real-time information without understanding one of the biggest pitfalls associated with real-time information—continual change. Imagine the following scenario: Three managers meet at the end of the day to discuss a business problem. Each manager has gathered information at different times during the day to create a picture of the situation. Each manager’s picture may be different because of the time differences. Their views on the business problem may not match because the information they are basing their analysis on is continually changing. This approach may not speed up decision making, and it may actually slow it down. Business decision makers must evaluate the timeliness of the information for every decision. Organizations do not want to find themselves using real-time information to make a bad decision faster.

Information Quality

Business decisions are only as good as the quality of the information used to make them. Information inconsistency occurs when the same data element has different values. Consider, for example, the work required to update the record of a customer who has changed her last name after marriage. Changing this information in only some of the organization’s systems leads to data inconsistencies, with customer 123456 associated with two different last names. Information integrity issues occur when a system produces incorrect, inconsistent, or duplicate data. Data integrity issues can cause managers to consider the system’s reports invalid and to make decisions based on other sources instead.
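The last-name scenario can be made concrete with a short sketch that compares the same customer ID across several systems. The system names and the sample values are hypothetical; only customer 123456 comes from the text.

```python
from collections import defaultdict

# Hypothetical extracts of customer last names from three systems.
systems = {
    "billing": {123456: "Smith"},
    "shipping": {123456: "Jones"},
    "marketing": {123456: "Jones", 222222: "Lee"},
}


def find_inconsistencies(systems):
    """Map each customer ID to its last names when the systems disagree."""
    names_by_id = defaultdict(set)
    for records in systems.values():
        for customer_id, last_name in records.items():
            names_by_id[customer_id].add(last_name)
    return {
        cid: sorted(names)
        for cid, names in names_by_id.items()
        if len(names) > 1
    }


inconsistent = find_inconsistencies(systems)
print(inconsistent)
```

A check like this only detects the inconsistency; deciding which value is correct still requires a business rule or a human decision.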

FIGURE 6.6

Transactional versus Analytical Information


FIGURE 6.7

Five Common Characteristics of High-Quality Information

To ensure that your systems do not suffer from data integrity issues, review  Figure 6.7  for the five characteristics common to high-quality information: accuracy, completeness, consistency, timeliness, and uniqueness.  Figure 6.8  provides an example of several problems associated with using low-quality information, including:

1. Completeness. The customer’s first name is missing.

2. Another issue with completeness. The street address contains only a number and not a street name.

3. Consistency. There may be a duplication of information since there is a slight difference between the two customers in the spelling of the last name. Similar street addresses and phone numbers make this likely.

FIGURE 6.8

Example of Low-Quality Information


APPLY YOUR KNOWLEDGE

BUSINESS DRIVEN MIS

Determining Information Quality Issues

Real People magazine is geared toward working individuals and provides articles and advice on everything from car maintenance to family planning. The magazine is currently experiencing problems with its distribution list. More than 30 percent of the magazines mailed are returned because of incorrect address information, and each month it receives numerous calls from angry customers complaining that they have not yet received their magazines. Below is a sample of Real People’s customer information. Create a report detailing all the issues with the information, potential causes of the information issues, and solutions the company can follow to correct the situation.

4. Accuracy. This may be inaccurate information because the customer’s phone and fax numbers are the same. Some customers might have the same number for phone and fax, but the fact that the customer also has this number in the email address field is suspicious.

5. Another issue with accuracy. There is inaccurate information because a phone number is located in the email address field.

6. Another issue with completeness. The information is incomplete because there is not a valid area code for the phone and fax numbers.
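Several of the problems listed for Figure 6.8 can be caught by simple automated checks. The sketch below is one possible approach under stated assumptions: the field names are hypothetical, phone numbers are assumed to be 10-digit North American numbers, and an email address is assumed valid if it contains "@".

```python
import re


def quality_issues(record):
    """Return a list of issue labels for one customer record (a dict)."""
    issues = []
    if not record.get("first_name", "").strip():
        issues.append("completeness: first name missing")
    street = record.get("street", "").strip()
    if street.isdigit():
        issues.append("completeness: street has a number but no name")
    if "@" not in record.get("email", ""):
        issues.append("accuracy: email field does not hold an email address")
    digits = re.sub(r"\D", "", record.get("phone", ""))
    if len(digits) < 10:  # assumption: 10 digits including the area code
        issues.append("completeness: phone lacks an area code")
    return issues


record = {
    "first_name": "",
    "last_name": "Smith",
    "street": "123",
    "phone": "555-1234",
    "email": "555-1234",
}
issues = quality_issues(record)
for issue in issues:
    print(issue)
```

Likely duplicates, such as two customers whose last names differ by one letter, need fuzzier matching than these rules provide and are deliberately left out of the sketch.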

Nestlé uses 550,000 suppliers to sell more than 100,000 products in 200 countries. However, due to poor information, the company was unable to evaluate its business effectively. After some analysis, it found that it had 9 million records of vendors, customers, and materials, half of which were duplicated, obsolete, inaccurate, or incomplete. The analysis discovered that some records abbreviated vendor names, and other records spelled out the vendor names. This created multiple accounts for the same customer, making it impossible to determine the true value of Nestlé’s customers. Without being able to identify customer profitability, a company runs the risk of alienating its best customers. 2

Knowing how low-quality information issues typically occur can help a company correct them. Addressing these errors will significantly improve the quality of company information and the value to be extracted from it. The four primary reasons for low-quality information are:

1. Online customers intentionally enter inaccurate information to protect their privacy.

2. Different systems have different information entry standards and formats.

3. Data-entry personnel enter abbreviated information to save time or erroneous information by accident.

4. Third-party and external information contains inconsistencies, inaccuracies, and errors.


Understanding the Costs of Using Low-Quality Information

Using the wrong information can lead managers to make erroneous decisions. Erroneous decisions in turn can cost time, money, reputations, and even jobs. Some of the serious business consequences that occur due to using low-quality information to make decisions are:

Inability to track customers accurately.

Difficulty identifying the organization’s most valuable customers.

Inability to identify selling opportunities.

Lost revenue opportunities from marketing to nonexistent customers.

The cost of sending undeliverable mail.

Difficulty tracking revenue because of inaccurate invoices.

Inability to build strong relationships with customers.

Understanding the Benefits of Using High-Quality Information

High-quality information can significantly improve the chances of making a good decision and directly increase an organization’s bottom line. One company discovered that even with its large number of golf courses, Phoenix, Arizona, is not a good place to sell golf clubs. An analysis revealed that typical golfers in Phoenix are tourists and conventioneers who usually bring their clubs with them. The analysis further revealed that two of the best places to sell golf clubs in the United States are Rochester, New York, and Detroit, Michigan. Equipped with this valuable information, the company was able to place its stores strategically and launch its marketing campaigns.

High-quality information does not automatically guarantee that every decision made is going to be a good one, because people ultimately make decisions and no one is perfect. However, such information ensures that the basis of the decisions is accurate. The success of the organization depends on appreciating and leveraging the true value of timely and high-quality information.

Information Governance

Information is a vital resource, and users need to be educated on what they can and cannot do with it. To ensure that a firm manages its information correctly, it will need special policies and procedures establishing rules on how the information is organized, updated, maintained, and accessed. Every firm, large and small, should create an information policy concerning data governance. Data governance refers to the overall management of the availability, usability, integrity, and security of company data. Master data management (MDM) is the practice of gathering data and ensuring that it is uniform, accurate, consistent, and complete, including such entities as customers, suppliers, products, sales, employees, and other critical entities that are commonly integrated across organizational systems. MDM is commonly included in data governance. A company that supports a data governance program has defined a policy that specifies who is accountable for various portions or aspects of the data, including its accuracy, accessibility, consistency, timeliness, and completeness. The policy should clearly define the processes concerning how to store, archive, back up, and secure the data. In addition, the company should create a set of procedures identifying accessibility levels for employees. Then, the firm should deploy controls and procedures that enforce government regulations and compliance with mandates such as Sarbanes-Oxley.

STORING INFORMATION USING A RELATIONAL DATABASE MANAGEMENT SYSTEM

LO 6.2: Describe a database, a database management system, and the relational database model.

The core component of any system, regardless of size, is a database and a database management system. Broadly defined, a  database  maintains information about various types of objects (inventory), events (transactions), people (employees), and places (warehouses). A  database management system (DBMS)  creates, reads, updates, and deletes data in a database while controlling access and security. Managers send requests to the DBMS, and the DBMS performs the actual manipulation of the data in the database. Companies store their information in databases, and managers access these systems to answer operational questions such as how many customers purchased Product A in December or what the average sales were by region. Two primary tools are available for retrieving information from a DBMS. First is a  query-by-example (QBE) tool  that helps users graphically design the answer to a question against a database. Second is a  structured query language (SQL)  that asks users to write lines of code to answer questions against a database. Managers typically interact with QBE tools, and MIS professionals have the skills required to code SQL.  Figure 6.9  displays the relationship between a database, a DBMS, and a user. Some of the more popular examples of DBMS include MySQL, Microsoft Access, SQL Server, FileMaker, Oracle, and FoxPro.
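To show what coding SQL looks like, the sketch below answers the two operational questions mentioned above using SQLite through Python’s built-in sqlite3 module. The sales table, its columns, and the sample rows are assumptions invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE sales ("
    "customer_id INTEGER, product TEXT, region TEXT, "
    "sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Product A", "East", "2023-12-03", 100.0),
        (2, "Product A", "West", "2023-12-15", 200.0),
        (1, "Product A", "East", "2023-12-20", 50.0),
        (3, "Product B", "West", "2023-12-21", 400.0),
        (4, "Product A", "East", "2023-11-30", 75.0),
    ],
)

# How many customers purchased Product A in December?
december_buyers = conn.execute(
    "SELECT COUNT(DISTINCT customer_id) FROM sales "
    "WHERE product = 'Product A' AND strftime('%m', sale_date) = '12'"
).fetchone()[0]

# What were the average sales by region?
avg_by_region = dict(
    conn.execute(
        "SELECT region, AVG(amount) FROM sales GROUP BY region"
    ).fetchall()
)

print(december_buyers, avg_by_region)
```

A QBE tool would let a manager build the same two questions by pointing and clicking; the DBMS still executes a query much like these behind the scenes.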


APPLY YOUR KNOWLEDGE

BUSINESS DRIVEN DEBATE

Excel or Access?

Excel is a great tool with which to perform business analytics. Your friend, John Cross, owns a successful publishing company specializing in Do It Yourself books. John started the business 10 years ago and has slowly grown to 50 employees and $1 million in sales. John has been using Excel to run the majority of his business, tracking book orders, production orders, shipping orders, and billing. John even uses Excel to track employee payroll and vacation dates. To date, Excel has done the job, but as the company continues to grow, the tool is becoming inadequate.

You believe John could benefit from moving from Excel to Access. John is skeptical of the change because Excel has done the job up to now, and his employees are comfortable with the current processes and technology. John has asked you to prepare a presentation explaining the limitations of Excel and the benefits of Access. In a group, prepare the presentation that will help convince John to make the switch.

A data element (or data field) is the smallest or basic unit of information. Data elements can include a customer’s name, address, email, discount rate, preferred shipping method, product name, quantity ordered, and so on. Data models are logical data structures that detail the relationships among data elements by using graphics or pictures.

Metadata provides details about data. For example, metadata for an image could include its size, resolution, and date created. Metadata about a text document could contain document length, date created, author’s name, and summary. Each data element is given a description, such as Customer Name; metadata is provided for the type of data (text, numeric, alphanumeric, date, image, binary value) and descriptions of potential predefined values such as a certain area code; and finally the relationship is defined. A data dictionary compiles all of the metadata about the data elements in the data model. Looking at a data model along with reviewing the data dictionary provides tremendous insight into the database’s functions, purpose, and business rules.
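As a rough illustration of a data dictionary, the sketch below asks SQLite to report the metadata it keeps for each column of a table. The customer table and its columns are hypothetical, and a real data dictionary would usually also hold descriptions and allowed values that the DBMS does not store on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL,
        discount_rate REAL DEFAULT 0.0
    )
    """
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
data_dictionary = [
    {
        "element": name,
        "type": col_type,
        "required": bool(notnull),
        "primary_key": bool(pk),
    }
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
        "PRAGMA table_info(customer)"
    )
]

for entry in data_dictionary:
    print(entry)
```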

A DBMS uses one of three primary data models for organizing information: hierarchical, network, and relational, with the relational model being the most prevalent. A relational database model stores information in the form of logically related two-dimensional tables. A relational database management system allows users to create, read, update, and delete data in a relational database. Although the hierarchical and network models are important, this text focuses only on the relational database model.

FIGURE 6.9

Relationship of Database, DBMS, and User


Storing Data Elements in Entities and Attributes

For flexibility in supporting business operations, managers need to query or search for the answers to business questions such as which artist sold the most albums during a certain month. The relationships in the relational database model help managers extract this information.  Figure 6.10  illustrates the primary concepts of the relational database model—entities, attributes, keys, and relationships. An  entity  (also referred to as a table) stores information about a person, place, thing, transaction, or event. The entities, or tables, of interest in  Figure 6.10  are TRACKS, RECORDINGS, MUSICIANS, and CATEGORIES. Notice that each entity is stored in a different two-dimensional table (with rows and columns).

Attributes (also called columns or fields) are the data elements associated with an entity. In Figure 6.10, the attributes for the entity TRACKS are TrackNumber, TrackTitle, TrackLength, and RecordingID. Attributes for the entity MUSICIANS are MusicianID, MusicianName, MusicianPhoto, and MusicianNotes. A record is a collection of related data elements (in the MUSICIANS table, these include “3, Lady Gaga, gag.tiff, Do not bring young kids to live shows”). Each record in an entity occupies one row in its respective table.

Creating Relationships Through Keys

To manage and organize various entities within the relational database model, you use primary keys and foreign keys to create logical relationships. A  primary key  is a field (or group of fields) that uniquely identifies a given record in a table. In the table RECORDINGS, the primary key is the field RecordingID that uniquely identifies each record in the table. Primary keys are a critical piece of a relational database because they provide a way of distinguishing each record in a table; for instance, imagine you need to find information on a customer named Steve Smith. Simply searching the customer name would not be an ideal way to find the information because there might be 20 customers with the name Steve Smith. This is the reason the relational database model uses primary keys to identify each record uniquely. Using Steve Smith’s unique ID allows a manager to search the database to identify all information associated with this customer.
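The Steve Smith problem can be demonstrated with a small sketch. The customer table, the ID values, and the cities are assumptions made for the example; the point is only that a name search is ambiguous while a primary key search is not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        (101, "Steve Smith", "Denver"),
        (102, "Steve Smith", "Boston"),
        (103, "Ann Lee", "Denver"),
    ],
)

# Searching by name returns more than one record.
by_name = conn.execute(
    "SELECT * FROM customer WHERE name = 'Steve Smith'"
).fetchall()

# Searching by primary key returns exactly one record.
by_key = conn.execute(
    "SELECT * FROM customer WHERE customer_id = 102"
).fetchall()

# The DBMS rejects a second record that reuses an existing primary key.
try:
    conn.execute("INSERT INTO customer VALUES (101, 'Sam Jones', 'Austin')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(len(by_name), by_key, duplicate_rejected)
```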

FIGURE 6.10

Primary Concepts of the Relational Database Model


APPLY YOUR KNOWLEDGE

BUSINESS DRIVEN START-UP

2 Trillion Rows of Data Analyzed Daily—No Problem

eBay is the world’s largest online marketplace, with 97 million global users selling anything to anyone at a yearly total of $62 billion—more than $2,000 every second. Of course with this many sales, eBay is collecting the equivalent of the Library of Congress worth of data every three days that must be analyzed to run the business successfully. Luckily, eBay discovered Tableau!

Tableau started at Stanford when Chris Stolte, a computer scientist; Pat Hanrahan, an Academy Award–winning professor; and Christian Chabot, a savvy business leader, decided to solve the problem of helping ordinary people understand big data. The three created Tableau, which bridged two computer science disciplines: computer graphics and databases. No more need to write code or understand the relational database keys and categories; users simply drag and drop pictures of what they want to analyze. Tableau has become one of the most successful data visualization tools on the market, winning multiple awards, expanding internationally, earning millions in revenue, and spawning multiple new inventions. 3

Tableau is revolutionizing business analytics, and this is only the beginning. Visit the Tableau website and become familiar with the tool by watching a few of the demos. Once you have a good understanding of the tool, create three questions eBay might be using Tableau to answer, including the analysis of its sales data to find patterns, business insights, and trends.

A foreign key is a primary key of one table that appears as an attribute in another table and acts to provide a logical relationship between the two tables. For instance, Black Eyed Peas in Figure 6.10 is one of the musicians appearing in the MUSICIANS table. Its primary key, MusicianID, is “2.” Notice that MusicianID also appears as an attribute in the RECORDINGS table. By matching these attributes, you create a relationship between the MUSICIANS and RECORDINGS tables that states the Black Eyed Peas (MusicianID 2) have several recordings, including The E.N.D., Monkey Business, and Elephunk. In essence, MusicianID in the RECORDINGS table creates a logical relationship (who was the musician that made the recording) to the MUSICIANS table. Creating the logical relationship between the tables allows managers to search the data and turn it into useful information.
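The relationship between MUSICIANS and RECORDINGS can be sketched in SQL. The MusicianID values 2 and 3 and the three Black Eyed Peas titles come from the text; the RecordingID values, the RecordingTitle column name, and the Lady Gaga recording are assumptions added so the example has something to filter out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute(
    "CREATE TABLE musicians ("
    "MusicianID INTEGER PRIMARY KEY, MusicianName TEXT)"
)
conn.execute(
    """
    CREATE TABLE recordings (
        RecordingID    INTEGER PRIMARY KEY,
        RecordingTitle TEXT,
        MusicianID     INTEGER REFERENCES musicians(MusicianID)
    )
    """
)
conn.executemany(
    "INSERT INTO musicians VALUES (?, ?)",
    [(2, "Black Eyed Peas"), (3, "Lady Gaga")],
)
conn.executemany(
    "INSERT INTO recordings VALUES (?, ?, ?)",
    [
        (1, "The E.N.D.", 2),
        (2, "Monkey Business", 2),
        (3, "Elephunk", 2),
        (4, "The Fame", 3),  # illustrative row, not taken from the figure
    ],
)

# Match the foreign key in recordings to the primary key in musicians.
titles = [
    row[0]
    for row in conn.execute(
        "SELECT r.RecordingTitle FROM recordings r "
        "JOIN musicians m ON m.MusicianID = r.MusicianID "
        "WHERE m.MusicianName = 'Black Eyed Peas' "
        "ORDER BY r.RecordingID"
    )
]
print(titles)
```

The musician’s name is stored once, in the musicians table; each recording carries only the MusicianID needed to look it up.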

Coca-Cola Relational Database Example

Figure 6.11 illustrates the primary concepts of the relational database model for a sample order of soda from Coca-Cola. Figure 6.11 offers an excellent example of how data is stored in a database. For example, the order number is stored in the ORDER table, and each line item is stored in the ORDER LINE table. Entities include CUSTOMER, ORDER, ORDER LINE, PRODUCT, and DISTRIBUTOR. Attributes for CUSTOMER include Customer ID, Customer Name, Contact Name, and Phone. Attributes for PRODUCT include Product ID, Description, and Price. The columns in the table contain the attributes.

Consider Hawkins Shipping, one of the distributors appearing in the DISTRIBUTOR table. Its primary key, Distributor ID, is DEN8001. Distributor ID also appears as an attribute in the ORDER table. This establishes that Hawkins Shipping (Distributor ID DEN8001) was responsible for delivering orders 34561 and 34562 to the appropriate customer(s). Therefore, Distributor ID in the ORDER table creates a logical relationship (who shipped what order) between ORDER and DISTRIBUTOR.


FIGURE 6.11

Potential Relational Database for Coca-Cola Bottling Company of Egypt (TCCBCE)


USING A RELATIONAL DATABASE FOR BUSINESS ADVANTAGES

LO 6.3: Identify the business advantages of a relational database.

Many business managers are familiar with Excel and other spreadsheet programs they can use to store business data. Although spreadsheets are excellent for supporting some data analysis, they offer limited functionality in terms of security, accessibility, and flexibility and can rarely scale to support business growth. From a business perspective, relational databases offer many advantages over using a text document or a spreadsheet, as displayed in  Figure 6.12 .

Increased Flexibility

Databases tend to mirror business structures, and a database needs to handle changes quickly and easily, just as any business needs to be able to do. Equally important, databases need to provide flexibility in allowing each user to access the information in whatever way best suits his or her needs. The distinction between logical and physical views is important in understanding flexible database user views. The  physical view of information  deals with the physical storage of information on a storage device. The  logical view of information  focuses on how individual users logically access information to meet their own particular business needs.

In the database illustration from  Figure 6.10 , for example, one user could perform a query to determine which recordings had a track length of four minutes or more. At the same time, another user could perform an analysis to determine the distribution of recordings as they relate to the different categories. For example, are there more R&B recordings than rock, or are they evenly distributed? This example demonstrates that although a database has only one physical view, it can easily support multiple logical views that provide for flexibility.

Consider another example—a mail-order business. One user might want a report presented in alphabetical format, in which case, the last name should appear before first name. Another user, working with a catalog mailing system, would want customer names appearing as first name and then last name. Both are easily achievable but different logical views of the same physical information.
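The mail-order example maps naturally onto database views. In the sketch below, one stored table supports two logical views; the table, the view names, and the sample customers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One physical table holds the customer names.
conn.execute("CREATE TABLE customer (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Maria", "Lopez"), ("Steve", "Smith"), ("Ann", "Adams")],
)

# Logical view 1: last name first, for alphabetical reports.
conn.execute(
    "CREATE VIEW report_names AS "
    "SELECT last_name || ', ' || first_name AS name FROM customer"
)

# Logical view 2: first name then last name, for catalog mailings.
conn.execute(
    "CREATE VIEW mailing_names AS "
    "SELECT first_name || ' ' || last_name AS name FROM customer"
)

report = [r[0] for r in conn.execute("SELECT name FROM report_names ORDER BY name")]
mailing = [r[0] for r in conn.execute("SELECT name FROM mailing_names")]

print(report)
print(mailing)
```

Neither view copies the data, so a change to a customer’s name in the underlying table shows up in both.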

Increased Scalability and Performance

In its first year of operation, the official website of the American Family Immigration History Center,  www.ellisisland.org , generated more than 2.5 billion hits. The site offers immigration information about people who entered America through the Port of New York and Ellis Island between 1892 and 1924. The database contains more than 25 million passenger names that are correlated to 3.5 million images of ships’ manifests. 4

The database had to be scalable to handle the massive volumes of information and the large numbers of users expected for the launch of the website. In addition, the database needed to perform quickly under heavy use. Some organizations must be able to support hundreds or thousands of users, including employees, partners, customers, and suppliers, who all want to access and share the same information. Databases today scale to exceptional levels, allowing all types of users and programs to perform information-processing and information-searching tasks.

FIGURE 6.12

Business Advantages of a Relational Database


Reduced Information Redundancy

Information redundancy  is the duplication of data, or the storage of the same data in multiple places. Redundant data can cause storage issues along with data integrity issues, making it difficult to determine which values are the most current or most accurate. Employees become confused and frustrated when faced with incorrect information causing disruptions to business processes and procedures. One primary goal of a database is to eliminate information redundancy by recording each piece of information in only one place in the database. This saves disk space, makes performing information updates easier, and improves information quality.

Increased Information Integrity (Quality)

Information integrity is a measure of the quality of information. Integrity constraints are rules that help ensure the quality of information. The database design needs to consider integrity constraints. The database and the DBMS ensure that users can never violate these constraints. There are two types of integrity constraints: (1) relational and (2) business critical.

Relational integrity constraints  are rules that enforce basic and fundamental information-based constraints. For example, a relational integrity constraint would not allow someone to create an order for a nonexistent customer, provide a markup percentage that was negative, or order zero pounds of raw materials from a supplier. A  business rule  defines how a company performs certain aspects of its business and typically results in either a yes/no or true/false answer. Stating that merchandise returns are allowed within 10 days of purchase is an example of a business rule.  Business-critical integrity constraints  enforce business rules vital to an organization’s success and often require more insight and knowledge than relational integrity constraints. Consider a supplier of fresh produce to large grocery chains such as Kroger. The supplier might implement a business-critical integrity constraint stating that no product returns are accepted after 15 days past delivery. That would make sense because of the chance of spoilage of the produce. Business-critical integrity constraints tend to mirror the very rules by which an organization achieves success.
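Both kinds of constraint can be sketched in a few lines. The relational constraints below are declared in the table definition so that SQLite enforces them; the 15-day return rule is written as a small function. The table names, column names, and the customer "Acme Grocers" are assumptions for illustration only.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE orders (
        order_id       INTEGER PRIMARY KEY,
        customer_id    INTEGER NOT NULL REFERENCES customer(customer_id),
        pounds_ordered REAL CHECK (pounds_ordered > 0),
        markup_percent REAL CHECK (markup_percent >= 0)
    )
    """
)
conn.execute("INSERT INTO customer VALUES (1, 'Acme Grocers')")


def try_insert(row):
    """Return True if the DBMS accepts the order, False if a constraint blocks it."""
    try:
        conn.execute("INSERT INTO orders VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "valid order": try_insert((1, 1, 50.0, 10.0)),
    "nonexistent customer": try_insert((2, 99, 50.0, 10.0)),
    "zero pounds": try_insert((3, 1, 0.0, 10.0)),
    "negative markup": try_insert((4, 1, 50.0, -5.0)),
}


def return_allowed(delivered, requested, limit_days=15):
    """Business-critical rule: no returns more than limit_days after delivery."""
    return (requested - delivered).days <= limit_days


print(results)
print(return_allowed(date(2024, 1, 1), date(2024, 1, 17)))
```

Declaring the rules in the database means every application that writes orders is held to them, rather than relying on each program to check.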

The specification and enforcement of integrity constraints produce higher-quality information that will provide better support for business decisions. Organizations that establish specific procedures for developing integrity constraints typically see an increase in accuracy that then increases the use of organizational information by business professionals.

Increased Information Security

Managers must protect information, like any asset, from unauthorized users or misuse. As systems become increasingly complex and highly available over the Internet on many devices, security becomes an even bigger issue. Databases offer many security features, including passwords to provide authentication, access levels to determine who can access the data, and access controls to determine what type of access they have to the information.

For example, customer service representatives might need read-only access to customer order information so they can answer customer order inquiries; they might not have or need the authority to change or delete order information. Managers might require access to employee files, but they should have access only to their own employees’ files, not the employee files for the entire company. Various security features of databases can ensure that individuals have only certain types of access to certain types of information.
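The two access scenarios above can be expressed as a small permission check. This is only a conceptual sketch in application code: the role names, the table name, and the employee data are hypothetical, and a production DBMS would enforce such rules through its own accounts and privileges.

```python
# Access levels: which actions each role may perform on each table.
PERMISSIONS = {
    "customer_service": {"orders": {"read"}},
    "order_manager": {"orders": {"read", "update", "delete"}},
}


def is_allowed(role, table, action):
    """Return True if the role may perform the action on the table."""
    return action in PERMISSIONS.get(role, {}).get(table, set())


# Access control by row: managers see only their own employees' files.
EMPLOYEES = [
    {"name": "Ann", "manager": "mgr_east"},
    {"name": "Bob", "manager": "mgr_west"},
]


def visible_employee_files(manager_id):
    """Return the names of the employees whose files this manager may open."""
    return [e["name"] for e in EMPLOYEES if e["manager"] == manager_id]


print(is_allowed("customer_service", "orders", "read"))
print(is_allowed("customer_service", "orders", "delete"))
print(visible_employee_files("mgr_east"))
```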

Security risks are increasing as more and more databases and DBMSs move to data centers run in the cloud. The biggest risks when using cloud computing are threats to the security and privacy of the information in the database. Implementing data governance policies and procedures that outline the data management requirements can help ensure safe and secure cloud computing.


APPLY YOUR KNOWLEDGE

BUSINESS DRIVEN ETHICS AND SECURITY

Unethical Data Mining

Mining large amounts of data can create a number of benefits for business, society, and governments, but it can also create a number of ethical questions surrounding an invasion of privacy or misuse of information. Facebook recently came under fire for its data mining practices as it followed 700,000 accounts to determine whether posts with highly emotional content are more contagious. The study concluded that highly emotional texts are contagious, just as with real people. Highly emotional positive posts received multiple positive replies whereas highly emotional negative posts received multiple negative replies. Although the study seems rather innocent, many Facebook users were outraged; they felt the study was an invasion of privacy because the 700,000 accounts had no idea Facebook was mining their posts. As a Facebook user, you willingly consent that Facebook owns every bit and byte of data you post and, once you press submit, Facebook can do whatever it wants with your data. Do you agree or disagree that Facebook has the right to do whatever it wants with the data its 1.5 billion users post on its site? 5

DRIVING WEBSITES WITH DATA

LO 6.4: Explain the business benefits of a data-driven website.

A content creator is the person responsible for creating the original website content. A content editor is the person responsible for updating and maintaining website content. Static information includes fixed data incapable of change in the event of a user action. Dynamic information includes data that change based on user actions. For example, static websites supply only information that will not change until the content editor changes the information. Dynamic information changes when a user requests information. A dynamic website changes information based on user requests such as movie ticket availability, airline prices, or restaurant reservations. Dynamic website information is stored in a dynamic catalog, or an area of a website that stores information about products in a database.

Websites change for site visitors depending on the type of information they request. Consider, for example, an automobile dealer. The dealer would create a database containing data elements for each car it has available for sale, including make, model, color, year, miles per gallon, a photograph, and so on. Website visitors might click Porsche and then enter their specific requests such as price range or year made. Once the user hits Go, the website automatically provides a custom view of the requested information. The dealer must create, update, and delete automobile information as the inventory changes.
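The automobile dealer example can be sketched with Python's built-in sqlite3 module. Everything in the sketch is made up for illustration: the car table, its sample rows, and the search_cars function are assumptions, and a real site would sit behind a web framework. The point is only that the visitor's choices become parameters of a database query, and the query result becomes the custom view.

```python
import sqlite3

# Hypothetical inventory; makes, models, and prices are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE car (make TEXT, model TEXT, color TEXT, year INTEGER, mpg INTEGER, price INTEGER)"
)
conn.executemany(
    "INSERT INTO car VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Porsche", "911", "red", 2015, 22, 84000),
        ("Porsche", "Cayenne", "black", 2012, 19, 41000),
        ("Porsche", "Boxster", "silver", 2009, 23, 27000),
        ("Toyota", "Camry", "white", 2016, 30, 18000),
    ],
)
conn.commit()


def search_cars(make, max_price, min_year):
    """Return (model, year, price) rows matching the visitor's request, cheapest first."""
    rows = conn.execute(
        "SELECT model, year, price FROM car "
        "WHERE make = ? AND price <= ? AND year >= ? ORDER BY price",
        (make, max_price, min_year),  # parameters keep user input out of the SQL text
    )
    return rows.fetchall()
```

A visitor who clicks Porsche and asks for cars from 2010 or later priced at $50,000 or less would trigger search_cars("Porsche", 50000, 2010). As the dealer inserts, updates, or deletes rows, the same page shows different results without any change to the page itself.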

A data-driven website is an interactive website kept constantly updated and relevant to the needs of its customers using a database. Data-driven capabilities are especially useful when a firm needs to offer large amounts of information, products, or services. Visitors can become quickly annoyed if they find themselves buried under an avalanche of information when searching a website. A data-driven website can help limit the amount of information displayed to customers based on unique search requirements. Companies even use data-driven websites to make information in their internal databases available to customers and business partners.

There are a number of advantages to using the web to access company databases. First, web browsers are much easier to use than directly accessing the database by using a custom-query tool. Second, the web interface requires few or no changes to the database model. Finally, it costs less to add a web interface in front of a DBMS than to redesign and rebuild the system to support changes. Additional data-driven website advantages include:

Easy to manage content: Website owners can make changes without relying on MIS professionals; users can update a data-driven website with little or no training.


FIGURE 6.13

Zappos.com—A Data-Driven Website

FIGURE 6.14

BI in a Data-Driven Website


Easy to store large amounts of data: Data-driven websites can keep large volumes of information organized. Website owners can use templates to implement changes for layouts, navigation, or website structure. This improves website reliability, scalability, and performance.

Easy to eliminate human errors: Data-driven websites trap data-entry errors, eliminating inconsistencies while ensuring that all information is entered correctly.

Zappos credits its success as an online shoe retailer to its vast inventory of nearly 3 million products available through its dynamic data-driven website. The company built its data-driven website catering to a specific niche market: consumers who were tired of finding that their most-desired items were always out of stock at traditional retailers. Zappos’ highly flexible, scalable, and secure database helped it rank as the most available Internet retailer.  Figure 6.13  displays the Zappos data-driven website illustrating a user querying the database and receiving information that satisfies the user’s request. 6

Companies can gain valuable business knowledge by viewing the data accessed and analyzed from their websites.  Figure 6.14  displays how running queries or using analytical tools, such as a PivotTable, on the database that is attached to the website can offer insight into the business, such as items browsed, frequent requests, items bought together, and so on.
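One of the insights mentioned above, items bought together, can be computed with a short query over order data. The Python sketch below uses invented orders and a hypothetical function name, pairs_bought_together. It counts how often each pair of items appears in the same order, which is roughly what a PivotTable or query tool would summarize.

```python
from collections import Counter
from itertools import combinations

# Invented sample orders; each set holds the items in one order.
orders = [
    {"boots", "socks"},
    {"boots", "socks", "laces"},
    {"sandals"},
    {"boots", "laces"},
]


def pairs_bought_together(orders):
    """Count how many orders contain each pair of items."""
    counts = Counter()
    for items in orders:
        # sorted() gives each pair a consistent order, e.g. ("boots", "socks")
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts
```

With this sample data, boots and socks appear together in two orders, so a site might suggest socks to a customer who is viewing boots.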

section 6.2

Business Intelligence

LEARNING OUTCOMES

6.5Identify the advantages of using business intelligence to support managerial decision making.

6.6Define data warehousing and data marts and explain how they support business decisions.

6.7Describe the three organizational methods for analyzing big data.

SUPPORTING DECISIONS WITH BUSINESS INTELLIGENCE

LO 6.5: Identify the advantages of using business intelligence to support managerial decision making.

Many organizations today find it next to impossible to understand their own strengths and weaknesses, let alone those of their biggest competitors, because the enormous volume of organizational data is inaccessible to all but the MIS department. Organizational data include far more than simple structured data elements in a database; they also include unstructured data such as voice mail, customer phone calls, text messages, video clips, and numerous new forms of data such as tweets from Twitter.

The Problem: Data Rich, Information Poor

An ideal business scenario would be as follows. As a business manager on his way to meet with a client reviews historical customer data, he realizes that the client’s ordering volume has substantially decreased. As he drills down into the data, he notices the client had a support issue with a particular product. He quickly calls the support team to find out all of the information and learns that a replacement for the defective part can be shipped in 24 hours. In addition, he learns that the client has visited the website and requested information on a new product line. Armed with all this information, the business manager is prepared for a productive meeting with his client. He now understands the client’s needs and issues, and he can address new sales opportunities with confidence.

For many companies, the preceding example is simply a pipe dream. Attempting to gather all of the client information would actually take hours or even days to compile. With so much data available, it is surprisingly hard for managers to get information, such as inventory levels, past order history, or shipping details. Managers send their information requests to the MIS department where a dedicated person compiles the various reports. In some situations, responses can take days, by which time the information may be outdated and opportunities lost. Many organizations find themselves in the position of being data rich and information poor. Even in today’s electronic world, managers struggle with the challenge of turning their business data into business intelligence.


APPLY YOUR KNOWLEDGE

BUSINESS DRIVEN INNOVATION

News Dots

Gone are the days of staring at boring spreadsheets and trying to understand how the data correlate. With innovative data visualization tools, managers can arrange different ways to view the data, providing new forms of pattern recognition not offered by simply looking at numbers. Slate, a news publication, developed a data visualization tool called News Dots that offers readers a different way of viewing the daily news through trends and patterns. The News Dots tool scans about 500 stories a day from major publications and then tags the content with important keywords such as people, places, companies, and topics. Surprisingly, the majority of daily news overlaps because the people, places, and stories are frequently connected. Using News Dots, you can visualize how the news fits together, much like a giant social network. News Dots uses circles (or dots) to represent the tagged content and arranges them according to size. The more frequently a certain topic is tagged, the larger the dot and its relationship to other dots. The tool is interactive, and users simply click a dot to view which stories mention that topic and which other topics it connects to in the network, such as a correlation among the U.S. government, Federal Reserve, Senate, bank, and Barack Obama. 7

How can data visualization help identify trends? What types of business intelligence could you identify if your college used a data visualization tool to analyze student information? What types of business intelligence could you identify if you used a data visualization tool to analyze the industry in which you plan to compete?

The Solution: Business Intelligence

Employee decisions are numerous, and they include providing service information, offering new products, and supporting frustrated customers. Employees can base their decisions on data, experience, or knowledge and preferably a combination of all three. Business intelligence can provide managers with the ability to make better decisions. A few examples of how different industries use business intelligence include:

Airlines: Analyze popular vacation locations with current flight listings.

Banking: Understand customer credit card usage and nonpayment rates.

Health care: Compare the demographics of patients with critical illnesses.

Insurance: Predict claim amounts and medical coverage costs.

Law enforcement: Track crime patterns, locations, and criminal behavior.

Marketing: Analyze customer demographics.

Retail: Predict sales, inventory levels, and distribution.

Technology: Predict hardware failures.

Figure 6.15 displays how organizations using BI can find the cause of many issues and problems simply by asking “Why?” The process starts by analyzing a report such as sales amounts by quarter. Managers will drill down into the report looking for why sales are up or why sales are down. Once they understand why a certain location or product is experiencing an increase in sales, they can share the information in an effort to raise enterprisewide sales. Once they understand the cause for a decrease in sales, they can take effective action to resolve the issue. Here are a few examples of how managers can use BI to answer tough business questions:


FIGURE 6.15

How BI Can Answer Tough Customer Questions

Where has the business been? Historical perspective offers important variables for determining trends and patterns.

Where is the business now? Looking at the current business situation allows managers to take effective action to solve issues before they grow out of control.

Where is the business going? Setting strategic direction is critical for planning and creating solid business strategies.

Ask a simple question, such as who is my best customer or what is my worst-selling product, and you might get as many answers as you have employees. Databases, data warehouses, and data marts can provide a single source of “trusted” data that can answer questions about customers, products, suppliers, production, finances, fraud, and even employees. They can also alert managers to inconsistencies or help determine the causes and effects of enterprisewide business decisions. All business aspects can benefit from the added insights provided by business intelligence, and you, as a business student, will benefit from understanding how MIS can help you make intelligent decisions.

THE BUSINESS BENEFITS OF DATA WAREHOUSING

LO 6.6: Define data warehousing and data marts and explain how they support business decisions.

In the 1990s as organizations began to need more timely information about their business, they found that traditional management information systems were too cumbersome to provide relevant information efficiently and effectively. Most of the systems were in the form of operational databases that were designed for specific business functions, such as accounting, order entry, customer service, and sales, and were not appropriate for business analysis for the reasons shown in  Figure 6.16 .

During the latter half of the 20th century, the numbers and types of operational databases increased. Many large businesses found themselves with information scattered across multiple systems with different file types (such as spreadsheets, databases, and even word processing files), making it almost impossible for anyone to use the information from multiple sources. Completing reporting requests across operational systems could take days or weeks using antiquated reporting tools that were ineffective for running a business. From this need, the data warehouse was born as a place where relevant information could be stored and accessed for making strategic queries and reports.

A data warehouse is a logical collection of information, gathered from many operational databases, that supports business analysis activities and decision-making tasks. The primary purpose of a data warehouse is to combine information, more specifically, strategic information, throughout an organization into a single repository in such a way that the people who need that information can make decisions and undertake business analysis. A key idea within data warehousing is to collect information from multiple systems in a common location that uses a universal querying tool. This allows operational databases to run where they are most efficient for the business, while providing a common location using a familiar format for the strategic or enterprisewide reporting information.


FIGURE 6.16

Reasons Business Analysis Is Difficult from Operational Databases

Data warehouses go even a step further by standardizing information. Gender, for instance, can be referred to in many ways (Male, Female, M/F, 1/0), but it should be standardized in a data warehouse with one common way of referring to each data element that stores gender (M/F). Standardization of data elements allows for greater accuracy, completeness, and consistency and increases the quality of the information in making strategic business decisions. The data warehouse then is simply a tool that enables business users, typically managers, to be more effective in many ways, including:

Developing customer profiles.

Identifying new-product opportunities.

Improving business operations.

Identifying financial issues.

Analyzing trends.

Understanding competitors.

Understanding product performance. (See  Figure 6.17  for the three core concepts of data warehousing.)
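The gender example used earlier to explain standardization can be sketched as a lookup table in Python. The GENDER_MAP table and the standardize_gender function are hypothetical, and the assumption that a source system uses 1 for male and 0 for female is made only for illustration; a real project would confirm each source system's coding before mapping it.

```python
# Hypothetical mapping from source-system values to the warehouse standard (M/F).
# Assumption for illustration only: the numeric system codes 1 = male, 0 = female.
GENDER_MAP = {
    "male": "M", "m": "M", "1": "M",
    "female": "F", "f": "F", "0": "F",
}


def standardize_gender(value):
    """Return 'M' or 'F', or None when the value is unrecognized and needs review."""
    key = str(value).strip().lower()
    return GENDER_MAP.get(key)
```

Returning None for an unrecognized value, rather than guessing, leaves the record flagged for a person to review.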

DATA MARTS

Businesses collect a tremendous amount of transactional information as part of their routine operations. Marketing, sales, and other departments would like to analyze these data to understand their operations better. Although databases store the details of all transactions (for instance, the sale of a product) and events (hiring a new employee), data warehouses store that same information but in an aggregated form more suited to supporting decision-making tasks. Aggregation, in this instance, can include totals, counts, averages, and the like.
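The difference between transaction detail and aggregated information can be shown with a small Python sketch. The transactions and the summarize function are invented for illustration. The input is one row per sale, as an operational database would store it; the output is one row per product with a total, a count, and an average, which is closer to what a data warehouse would hold.

```python
from collections import defaultdict

# Invented transaction detail: (product, sale amount), one entry per sale.
transactions = [
    ("shoes", 60.0),
    ("shoes", 80.0),
    ("hats", 20.0),
    ("shoes", 100.0),
    ("hats", 30.0),
]


def summarize(transactions):
    """Aggregate individual sales into a total, count, and average per product."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for product, amount in transactions:
        totals[product] += amount
        counts[product] += 1
    return {
        product: {
            "total": totals[product],
            "count": counts[product],
            "average": totals[product] / counts[product],
        }
        for product in totals
    }
```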


FIGURE 6.17

Three Core Concepts of Data Warehousing

The data warehouse modeled in  Figure 6.18  compiles information from internal databases (or transactional and operational databases) and external databases through extraction, transformation, and loading.  Extraction, transformation, and loading (ETL)  is a process that extracts information from internal and external databases, transforms it using a common set of enterprise definitions, and loads it into a data warehouse. The data warehouse then sends portions (or subsets) of the information to data marts. A  data mart  contains a subset of data warehouse information. To distinguish between data warehouses and data marts, think of data warehouses as having a more organizational focus and data marts as having a functional focus.  Figure 6.18  provides an illustration of a data warehouse and its relationship to internal and external databases, ETL, and data marts.
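The flow from source databases through ETL to a data mart can be sketched in Python. This is a simplified illustration under stated assumptions: the two source lists, their differing field names, the REGION_CODES table, and the functions transform, run_etl, and build_data_mart are all hypothetical. The sketch extracts rows from an internal and an external source, transforms them to one common set of definitions, loads them into a single list standing in for the warehouse, and then sends a subset to a data mart.

```python
# Hypothetical source extracts; the field names differ by system on purpose.
INTERNAL_ORDERS = [
    {"cust": "Acme Corp", "amt_usd": "1200.50", "region": "west"},
    {"cust": "Bolt Ltd", "amt_usd": "310.00", "region": "east"},
]
EXTERNAL_ORDERS = [
    {"customer_name": " acme corp ", "amount": 300, "region": "W"},
]

# Common enterprise definition for region values (an assumption for this sketch).
REGION_CODES = {"w": "WEST", "west": "WEST", "e": "EAST", "east": "EAST"}


def transform(customer, amount, region):
    """Convert one source row to the warehouse's common definitions."""
    return {
        "customer": customer.strip().upper(),
        "amount": float(amount),
        "region": REGION_CODES[region.strip().lower()],
    }


def run_etl():
    """Extract from both sources, transform each row, and load into the warehouse."""
    warehouse = []
    for row in INTERNAL_ORDERS:
        warehouse.append(transform(row["cust"], row["amt_usd"], row["region"]))
    for row in EXTERNAL_ORDERS:
        warehouse.append(transform(row["customer_name"], row["amount"], row["region"]))
    return warehouse


def build_data_mart(warehouse, region):
    """A data mart holds a subset of the warehouse, here one region's rows."""
    return [row for row in warehouse if row["region"] == region]
```

After the transformation step, "Acme Corp" from the internal system and " acme corp " from the external system carry the same customer value, so their orders can be analyzed together.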

FIGURE 6.18

Data Warehouse Model


Multidimensional Analysis

A relational database contains information in a series of two-dimensional tables. In a data warehouse and data mart, information contains layers of columns and rows. For this reason, most data warehouses and data marts are multidimensional databases. A dimension is a particular attribute of information. Each layer in a data warehouse or data mart represents information according to an additional dimension. An  information cube  is the common term for the representation of multidimensional information.  Figure 6.19  displays a cube (cube a) that represents store information (the layers), product information (the rows), and promotion information (the columns).

After creating a cube of information, users can begin to slice and dice the cube to drill down into the information. The second cube (cube b) in  Figure 6.19  displays a slice representing promotion II information for all products at all stores. The third cube (cube c) in  Figure 6.19 displays only information for promotion III, product B, at store 2. By using multidimensional analysis, users can analyze information in a number of ways and with any number of dimensions. Users might want to add dimensions of information to a current analysis, including product category, region, and even forecasted versus actual weather. The true value of a data warehouse is its ability to provide multidimensional analysis that allows users to gain insights into their information.

Data warehouses and data marts are ideal for off-loading some of the querying against a database. For example, querying a database to obtain an average of sales for Product B at Store 2 while Promotion III is under way might create a considerable processing burden for a database, increasing the time it takes another person to enter a new sale into the same database. If an organization performs numerous queries against a database (or multiple databases), aggregating that information into a data warehouse will be beneficial.
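Slicing and dicing a cube can be sketched in Python with a dictionary keyed by store, product, and promotion. The sales figures below are generated by an arbitrary formula so that the example is repeatable; they are not real data, and the slice_cube function is a hypothetical helper, not a feature of any particular product. The dimensions match the three stores, five products, and four promotions of the cube described above.

```python
from itertools import product as cartesian

STORES = [1, 2, 3]
PRODUCTS = ["A", "B", "C", "D", "E"]
PROMOTIONS = ["I", "II", "III", "IV"]

# One cell per (store, product, promotion); values are made up by formula.
cube = {
    (store, item, promo): 100 * store + 10 * PRODUCTS.index(item) + PROMOTIONS.index(promo)
    for store, item, promo in cartesian(STORES, PRODUCTS, PROMOTIONS)
}


def slice_cube(cube, store=None, product=None, promotion=None):
    """Keep only the cells matching each dimension value given (None means all)."""
    return {
        key: value
        for key, value in cube.items()
        if (store is None or key[0] == store)
        and (product is None or key[1] == product)
        and (promotion is None or key[2] == promotion)
    }
```

Calling slice_cube(cube, promotion="II") returns promotion II information for all products at all stores, and slice_cube(cube, store=2, product="B", promotion="III") narrows the cube to a single cell.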

Information Cleansing or Scrubbing

Dirty data  is erroneous or flawed data (see  Figure 6.20 ). The complete removal of dirty data from a source is impractical or virtually impossible. According to Gartner Inc., dirty data is a business problem, not an MIS problem. Over the next two years, more than 25 percent of critical data in Fortune 1000 companies will continue to be flawed; that is, the information will be inaccurate, incomplete, or duplicated.

Obviously, maintaining quality information in a data warehouse or data mart is extremely important. To increase the quality of organizational information and thus the effectiveness of decision making, businesses must formulate a strategy to keep information clean.  Information cleansing or scrubbing  is a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information.

FIGURE 6.19

A Cube of Information for Performing a Multidimensional Analysis on Three Stores for Five Products and Four Promotions


APPLY YOUR KNOWLEDGE

BUSINESS DRIVEN DISCUSSION

Butterfly Effects

The butterfly effect, an idea from chaos theory in mathematics, refers to the way a minor event, like the movement of a butterfly’s wing, can have a major impact on a complex system like the weather. Dirty data can have the same impact on a business as the butterfly effect. Organizations depend on the movement and sharing of data throughout the organization, so the impact of data quality errors is costly and far-reaching. Such data issues often begin with a tiny mistake in one part of the organization, but the butterfly effect can produce disastrous results, making its way through MIS systems to the data warehouse and other enterprise systems. When dirty data or low-quality data enters organizational systems, a tiny error such as a spelling mistake can lead to revenue loss, process inefficiency, and failure to comply with industry and government regulations. Explain how the following errors can affect an organization:

A cascading spelling mistake

Inaccurate customer records

Incomplete purchasing history

Inaccurate mailing address

Duplicate customer numbers for different customers

 

Specialized software tools exist that use sophisticated procedures to analyze, standardize, correct, match, and consolidate data warehouse information. This step is vitally important because data warehouses often contain information from several databases, some of which can be external to the organization. In a data warehouse, information cleansing occurs first during the ETL process and again once the information is in the data warehouse. Companies can choose information cleansing software from several vendors, including Oracle, SAS, Ascential Software, and Group 1 Software. Ideally, scrubbed information is accurate and consistent.

FIGURE 6.20

Dirty Data Problems


Looking at customer information highlights why information cleansing is necessary. Customer information exists in several operational systems. In each system, all the details could change—from the customer ID to contact information—depending on the business process the user is performing (see  Figure 6.21 ).

Figure 6.22  displays a customer name entered differently in multiple operational systems. Information cleansing allows an organization to fix these types of inconsistencies in the data warehouse.  Figure 6.23  displays the typical events that occur during information cleansing.
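A simple form of name standardization can be sketched in Python. The company names, the SUFFIXES list, and the functions normalize_name and group_variants are invented for illustration. Commercial cleansing tools use far more sophisticated matching; this sketch only lowercases the name, removes punctuation, and drops common company suffixes so that different spellings of the same customer fall into one group.

```python
import re
from collections import defaultdict

# Common company suffixes to ignore when comparing names (an illustrative list).
SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "company", "ltd"}


def normalize_name(name):
    """Lowercase, replace punctuation with spaces, and drop company suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(word for word in words if word not in SUFFIXES)


def group_variants(names):
    """Group the original spellings under their normalized form."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_name(name)].append(name)
    return dict(groups)
```

Given "Acme Foods, Inc.", "ACME FOODS INC", and "Acme Foods Incorporated", all three spellings land in the same group, which signals that they probably refer to one customer.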

FIGURE 6.21

Contact Information in Operational Systems

FIGURE 6.22

Standardizing a Customer Name in Operational Systems


FIGURE 6.23

Information Cleansing Activities

FIGURE 6.24

The Cost of Accurate and Complete Information

Achieving perfect information is almost impossible. The more complete and accurate a company wants its information to be, the more it costs (see Figure 6.24). Companies may also trade accuracy for completeness. Accurate information is correct, whereas complete information has no blanks. A birth date of 2/31/10 is an example of complete but inaccurate information (February 31 does not exist). An address containing Denver, Colorado, without a zip code is an example of accurate information that is incomplete. Many firms complete data quality audits to determine the accuracy and completeness of their data. Most organizations determine a percentage of accuracy and completeness high enough to make good decisions at a reasonable cost, such as 85 percent accurate and 65 percent complete.
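A very small data quality audit can be sketched in Python. The records, field names, and functions below are invented for illustration, and the definitions are simplified assumptions: a record counts as complete when no field is blank, and as accurate when its birth date, if present, is a real calendar date. A real audit would check many more fields against trusted sources.

```python
from datetime import datetime

# Invented sample records for illustration.
records = [
    {"birth_date": "2/31/10", "city": "Denver", "state": "CO", "zip": "80202"},   # complete, inaccurate
    {"birth_date": "7/04/85", "city": "Denver", "state": "CO", "zip": ""},        # accurate, incomplete
    {"birth_date": "11/15/90", "city": "Aurora", "state": "CO", "zip": "80014"},  # accurate and complete
    {"birth_date": "", "city": "Boulder", "state": "CO", "zip": "80301"},         # incomplete
]


def is_valid_date(text):
    """True if text is a real calendar date in month/day/two-digit-year form."""
    try:
        datetime.strptime(text, "%m/%d/%y")
        return True
    except ValueError:
        return False


def is_complete(record):
    """Complete means no blank fields."""
    return all(value.strip() for value in record.values())


def is_accurate(record):
    """Simplified check: a birth date, when present, must be a real date."""
    birth_date = record["birth_date"].strip()
    return birth_date == "" or is_valid_date(birth_date)


def audit(records):
    """Return (percent accurate, percent complete) for a list of records."""
    total = len(records)
    accurate = sum(1 for record in records if is_accurate(record))
    complete = sum(1 for record in records if is_complete(record))
    return 100 * accurate / total, 100 * complete / total
```

For the four sample records, the audit reports 75 percent accurate and 50 percent complete, figures a firm could compare against its own targets.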

THE POWER OF BIG DATA ANALYTICS

LO 6.7: Describe the three organizational methods for analyzing big data.

Companies are collecting more data than ever. Historically, data were housed in functional systems that were not integrated, such as customer service, finance, and human resources. Today companies can gather all of the functional data together by the zettabyte, but finding a way to analyze the data is incredibly challenging. Figure 6.25 displays the three methods organizations are using to dissect, analyze, and understand organizational data.


FIGURE 6.25

Three Organizational Methods for Analyzing Big Data

Data Mining

Data mining  is the process of analyzing data to extract information not offered by the raw data alone. Data mining can also begin at a summary information level (coarse granularity) and progress through increasing levels of detail (drilling down) or the reverse (drilling up). Companies use data-mining techniques to compile a complete picture of their operations, all within a single view, allowing them to identify trends and improve forecasts. Consider Best Buy, which used data-mining tools to identify that 7 percent of its customers accounted for 43 percent of its sales, so the company reorganized its stores to accommodate those customers.
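The kind of finding described in the Best Buy example, a small share of customers producing a large share of sales, can be computed with a few lines of Python. The sales figures and the top_customer_share function are invented for illustration and do not reproduce Best Buy's data or method.

```python
# Invented sales totals per customer for illustration.
sales_by_customer = {
    "C01": 900, "C02": 100, "C03": 100, "C04": 100, "C05": 100,
    "C06": 100, "C07": 100, "C08": 100, "C09": 100, "C10": 100,
}


def top_customer_share(sales_by_customer, top_fraction):
    """Return the fraction of total sales that comes from the top customers."""
    ranked = sorted(sales_by_customer.values(), reverse=True)
    top_n = max(1, round(len(ranked) * top_fraction))  # always include at least one customer
    return sum(ranked[:top_n]) / sum(ranked)
```

With this sample data, the top 10 percent of customers (one customer out of ten) accounts for half of all sales, the kind of pattern that might lead a company to reorganize around those customers.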

To perform data mining, users need data-mining tools.  Data-mining tools  use a variety of techniques to find patterns and relationships in large volumes of information that predict future behavior and guide decision making. Data mining uncovers trends and patterns, which analysts use to build models that, when exposed to new information sets, perform a variety of information analysis functions. Data-mining tools for data warehouses help users uncover business intelligence in their data.  Figure 6.26  displays the data-mining analysis methods used to uncover patterns and trends for business analysis such as:

Analyzing customer buying patterns to predict future marketing and promotion campaigns.

Building budgets and other financial information.

Detecting fraud by identifying deceptive spending patterns.

Finding the best customers who spend the most money.

Keeping customers from leaving or migrating to competitors.

Promoting and hiring employees to ensure success for both the company and the individual.


FIGURE 6.26

Data Mining Analysis Methods

Data mining enables these companies to determine relationships among such internal factors as price, product positioning, or staff skills, and external factors such as economic indicators, competition, and customer demographics. In addition, it enables companies to determine the impact on sales, customer satisfaction, and corporate profits and to drill down into summary information to view detailed transactional data. With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual’s purchase history. By mining demographic data from comment or warranty cards, the retailer could develop products and promotions to appeal to specific customer segments.

Netflix uses data mining to analyze each customer’s film-viewing habits to provide recommendations for other customers with Cinematch, its movie recommendation system. Using Cinematch, Netflix can present customers with a number of additional movies they might want to watch based on the customer’s current preferences. Netflix’s innovative use of data mining provides its competitive advantage in the movie rental industry. Data mining uses specialized technologies and functionalities such as query tools, reporting tools, multidimensional analysis tools, statistical tools, and intelligent agents to uncover patterns displayed in  Figure 6.27 .

FIGURE 6.27

Data-Mining Techniques


Big Data Analytics

Structured data  has a defined length, type, and format and includes numbers, dates, or strings such as Customer Address. Structured data is typically stored in a traditional system such as a relational database or spreadsheet and accounts for about 20 percent of the data that surrounds us. The sources of structured data include:

Machine-generated data is created by a machine without human intervention. Machine-generated structured data includes sensor data, point-of-sale data, and web log data.

Human-generated data  is data that humans, in interaction with computers, generate. Human-generated structured data includes input data, click-stream data, or gaming data.

Unstructured data  is not defined, does not follow a specified format, and is typically free-form text such as emails, Twitter tweets, and text messages. Unstructured data accounts for about 80 percent of the data that surrounds us. The sources of unstructured data include:

Machine-generated unstructured data: satellite images, scientific atmosphere data, and radar data.

Human-generated unstructured data: text messages, social media data, and emails.

Big data  is a collection of large, complex data sets, including structured and unstructured data, which cannot be analyzed using traditional database methods and tools. The four common characteristics of big data are detailed in  Figure 6.28 . Big data requires sophisticated tools to analyze all the unstructured information from millions of customers, devices, and machine interactions. Big data are analyzed for marketing trends in business as well as in the fields of manufacturing, medicine, and science.

FIGURE 6.28

Four Common Characteristics of Big Data


Distributed computing  processes and manages algorithms across many machines in a computing environment. Big data tools use distributed computing to store and analyze data across databases stored around the globe. Traditional analytical tools focus on basic business intelligence, including querying and reporting of historical data against a relational database. Traditional data-mining tools focus on history and explain where the organization has been.  Advanced analytics  focuses on forecasting future trends and producing insights using sophisticated quantitative methods, including statistics, descriptive and predictive data mining, simulation, and optimization. Advanced analytics uses data patterns to make forward-looking predictions to explain to the organization where it is headed. A  data scientist  extracts knowledge from data by performing statistical analysis, data mining, and advanced analytics on big data to identify trends, market changes, and other relevant information.  Figure 6.29  displays the techniques a data scientist will use to perform big data advanced analytics.
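The contrast between reporting on history and making a forward-looking prediction can be illustrated with a simple trend line in Python. This is only a sketch of one basic statistical technique, an ordinary least-squares line fitted to past values; the quarterly sales figures and the function names fit_trend and forecast are invented, and real advanced analytics would use far richer models.

```python
# Invented quarterly sales history for illustration.
quarterly_sales = [100, 110, 120, 130]


def fit_trend(values):
    """Fit a least-squares line to values observed at periods 0, 1, 2, ..."""
    n = len(values)
    periods = range(n)
    mean_x = sum(periods) / n
    mean_y = sum(values) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(periods, values)) / sum(
        (x - mean_x) ** 2 for x in periods
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(values, periods_ahead):
    """Extend the fitted line periods_ahead past the last observed period."""
    slope, intercept = fit_trend(values)
    return intercept + slope * (len(values) - 1 + periods_ahead)
```

A traditional report would stop at the four observed quarters. Extending the fitted line one period ahead gives an estimate for the next quarter, which is the forward-looking step that distinguishes advanced analytics from historical reporting.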

Data Visualization

Traditional bar graphs and pie charts are boring and at best confusing and at worst misleading. As databases and graphics collide more and more, people are creating infographics (information graphics), which display information graphically so it can be easily understood. Infographics present the results of data analysis, displaying the patterns, relationships, and trends in a graphical format. Infographics are exciting and quickly convey a story users can understand without having to analyze numbers, tables, and boring charts. Great data visualizations provide insights into something new about the underlying patterns and relationships. Just think of the periodic table of elements and imagine if you had to look at an Excel spreadsheet showing each element and the associated attributes in a table format. This would be not only difficult to understand but easy to misinterpret. By placing the elements in the visual periodic table, you quickly grasp how the elements relate and the associated hierarchy. Infographics perform the same function for business data as the periodic table does for chemical elements.

FIGURE 6.29

Big Data Advanced Analytical Techniques


APPLY YOUR KNOWLEDGE

BUSINESS DRIVEN GLOBALIZATION

Integrity Information Inc.

Congratulations! You have just been hired as a consultant for Integrity Information Inc., a start-up business intelligence consulting company. Your first job is to help work with the sales department in securing a new client, The Warehouse. The Warehouse has been operating in the United States for more than a decade, and its primary business is to sell wholesale low-cost products. The Warehouse is interested in hiring Integrity Information Inc. to clean up the data that are stored in its U.S. database. To determine how good your work is, the client would like your analysis of the following spreadsheet. The Warehouse is also interested in expanding globally and wants to purchase several independent wholesale stores located in Australia, Thailand, China, Japan, and the United Kingdom. Before the company moves forward with the venture, it wants to understand what types of data issues it might encounter as it begins to transfer data from each global entity to the data warehouse. Please create a list detailing the potential issues The Warehouse can anticipate encountering as it consolidates the global databases into a single data warehouse. 8

Page 243

Analysis paralysis occurs when the user goes into an emotional state of over-analyzing (or over-thinking) a situation so that a decision or action is never taken, in effect paralyzing the outcome. In the time of big data, analysis paralysis is a growing problem. One solution is to use data visualizations to help people make decisions faster. Data visualization describes technologies that allow users to see or visualize data to transform information into a business perspective. Data visualization is a powerful way to simplify complex data sets by placing data in a format that is grasped and understood far more quickly than the raw data alone. Data visualization tools move beyond Excel graphs and charts into sophisticated analysis techniques such as controls, instruments, maps, time-series graphs, and more. Data visualization tools can help uncover correlations and trends in data that would otherwise go unrecognized. Business intelligence dashboards track corporate metrics such as critical success factors and key performance indicators and include advanced capabilities such as interactive controls, allowing users to manipulate data for analysis. The majority of business intelligence software vendors offer a number of data visualization tools and business intelligence dashboards. A data artist is a business analytics specialist who uses visual tools to help people understand complex data.

Big data is one of the most promising technology trends occurring today. Of course, notable companies such as Facebook, Google, and Netflix are gaining the most business insights from big data currently, but many smaller markets are entering the scene, including retail, insurance, and health care. Over the next decade, as big data starts to improve your everyday life by providing insights into your social relationships, habits, and careers, you can expect to see the need for data scientists and data artists dramatically increase.

LEARNING OUTCOME REVIEW

Learning Outcome 6.1: Explain the four primary traits that determine the value of information.

Information is data converted into a meaningful and useful context. Information can tell an organization how its current operations are performing and help it estimate and strategize about how future operations might perform. It is important to understand the different levels, formats, and granularities of information along with the four primary traits that help determine the value of information, which include (1) information type: transactional and analytical; (2) information timeliness; (3) information quality; and (4) information governance.

Learning Outcome 6.2: Describe a database, a database management system, and the relational database model.

A database maintains information about various types of objects (inventory), events (transactions), people (employees), and places (warehouses). A database management system (DBMS) creates, reads, updates, and deletes data in a database while controlling access and security. A DBMS provides methodologies for creating, updating, storing, and retrieving data in a database. In addition, a DBMS provides facilities for controlling data access and security, allowing data sharing and enforcing data integrity. The relational database model allows users to create, read, update, and delete data in a relational database.
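The four operations named above (create, read, update, and delete) can be sketched in a few lines. The example below is only an illustration using Python's built-in sqlite3 module; the employee table, its columns, and the sample rows are assumptions invented for the sketch and do not come from the chapter.

```python
import sqlite3

# In-memory database; the table, columns, and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL, dept TEXT)"
)

# Create: add two records.
conn.execute("INSERT INTO employee (id, name, dept) VALUES (?, ?, ?)", (1, "Ana Ruiz", "Sales"))
conn.execute("INSERT INTO employee (id, name, dept) VALUES (?, ?, ?)", (2, "Ben Cho", "Marketing"))

# Read: retrieve every record.
rows_after_create = conn.execute(
    "SELECT id, name, dept FROM employee ORDER BY id"
).fetchall()

# Update: move one employee to a different department.
conn.execute("UPDATE employee SET dept = ? WHERE id = ?", ("Human Resources", 2))

# Delete: remove one record.
conn.execute("DELETE FROM employee WHERE id = ?", (1,))

rows_final = conn.execute(
    "SELECT id, name, dept FROM employee ORDER BY id"
).fetchall()

print(rows_after_create)
print(rows_final)
```

Each statement is written in SQL and handed to the DBMS, which decides how the data are physically stored and retrieved.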

Learning Outcome 6.3: Identify the business advantages of a relational database.

Many business managers are familiar with Excel and other spreadsheet programs they can use to store business data. Although spreadsheets are excellent for supporting some data analysis, they offer limited functionality in terms of security, accessibility, and flexibility and can rarely scale to support business growth. From a business perspective, relational databases offer many advantages over using a text document or a spreadsheet, including increased flexibility, increased scalability and performance, reduced information redundancy, increased information integrity (quality), and increased information security.
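Two of these advantages, reduced information redundancy and increased information integrity, can be illustrated with a small sketch. It again uses Python's built-in sqlite3 module, and the customer and orders tables are hypothetical. The customer's name is stored once and referenced by key, a foreign key acts as a relational integrity constraint, and a CHECK rule stands in for a simple business rule that order totals cannot be negative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical tables: the customer name lives in one place only.
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        total       REAL NOT NULL CHECK (total >= 0)
    )
""")
conn.execute("INSERT INTO customer VALUES (1, 'Sample Customer')")
conn.execute("INSERT INTO orders VALUES (100, 1, 25.50)")
conn.execute("INSERT INTO orders VALUES (101, 1, 40.00)")


def try_insert(sql):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# An order for a customer that does not exist violates the foreign key.
orphan_ok = try_insert("INSERT INTO orders VALUES (102, 999, 10.00)")
# A negative total violates the CHECK rule.
negative_ok = try_insert("INSERT INTO orders VALUES (103, 1, -5.00)")

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(orphan_ok, negative_ok, order_count)
```

A spreadsheet would accept both bad rows without complaint; here the database rejects them before they can become low-quality information.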

Page 244

Learning Outcome 6.4: Explain the business benefits of a data-driven website.

A data-driven website is an interactive website kept constantly updated and relevant to the needs of its customers using a database. Data-driven capabilities are especially useful when the website offers a great deal of information, products, or services because visitors are frequently annoyed if they are buried under an avalanche of information when searching a website. Many companies use the web to make some of the information in their internal databases available to customers and business partners.

Learning Outcome 6.5: Identify the advantages of using business intelligence to support managerial decision making.

Many organizations today find it next to impossible to understand their own strengths and weaknesses, let alone those of their biggest competitors, because enormous volumes of organizational data are inaccessible to all but the MIS department. Organizational data include far more than simple structured data elements in a database; they also include unstructured data such as voice mail, customer phone calls, text messages, and video clips, along with numerous new forms of data, such as tweets from Twitter. Managers today find themselves in the position of being data rich and information poor, and they need to implement business intelligence systems to solve this challenge.

Learning Outcome 6.6: Define data warehousing and data marts and explain how they support business decisions.

A data warehouse is a logical collection of information, gathered from many different operational databases, that supports business analysis and decision making. The primary value of a data warehouse is to combine information, more specifically, strategic information, throughout an organization into a single repository in such a way that the people who need that information can make decisions and undertake business analysis.
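Combining information from different operational databases usually means standardizing it first, which is the job of extraction, transformation, and loading (ETL). The sketch below is a simplified illustration only: the two source systems, their field names, and their formats are invented for the example, and a plain Python list stands in for the warehouse.

```python
from datetime import datetime

# Extract: hypothetical rows from two operational systems with different formats.
sales_system = [{"cust": "Acme", "amount": "1,200.50", "date": "03/15/2024"}]
billing_system = [{"customer_name": "acme ", "total_cents": 45000, "billed_on": "2024-03-20"}]


def transform_sales(row):
    """Standardize a sales-system row: uppercase name, numeric amount, ISO date."""
    return {
        "customer": row["cust"].strip().upper(),
        "amount": float(row["amount"].replace(",", "")),
        "date": datetime.strptime(row["date"], "%m/%d/%Y").date().isoformat(),
        "source": "sales",
    }


def transform_billing(row):
    """Standardize a billing-system row: uppercase name, cents converted to dollars."""
    return {
        "customer": row["customer_name"].strip().upper(),
        "amount": row["total_cents"] / 100,
        "date": row["billed_on"],
        "source": "billing",
    }


# Load: one repository holding both sources in a single, consistent format.
warehouse = [transform_sales(r) for r in sales_system]
warehouse += [transform_billing(r) for r in billing_system]

for record in warehouse:
    print(record)
```

Once both systems describe the customer, the amount, and the date in the same way, the records can be analyzed together.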

Learning Outcome 6.7: Describe the three organizational methods for analyzing big data.

Data mining, big data analytics, and data visualization are the three methods organizations are using to dissect, analyze, and understand organizational data. Data mining is the process of analyzing data to extract information not offered by the raw data alone. Data mining can also begin at a summary information level (coarse granularity) and progress through increasing levels of detail (drilling down), or the reverse (drilling up). Big data is a collection of large, complex data sets, including structured and unstructured data, which cannot be analyzed using traditional database methods and tools. Data visualization describes technologies that allow users to see or visualize data to transform information into a business perspective.
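Drilling down and drilling up amount to summarizing the same data at different granularities. The following sketch assumes a small, made-up list of sales records (region, store, amount) and totals it first by region (coarse) and then by region and store (fine).

```python
from collections import defaultdict

# Made-up sales records: (region, store, amount).
sales = [
    ("East", "Store 1", 120),
    ("East", "Store 2", 80),
    ("West", "Store 3", 200),
    ("West", "Store 3", 50),
]


def summarize(rows, key_len):
    """Total the amount column, grouping on the first key_len fields of each row."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[:key_len]] += row[2]
    return dict(totals)


by_region = summarize(sales, 1)  # coarse granularity (drilled up)
by_store = summarize(sales, 2)   # finer granularity (drilled down)

print(by_region)
print(by_store)
```

Moving from by_region to by_store is a drill-down; moving back is a drill-up. Data-mining tools perform the same regrouping across far larger data sets and many more dimensions.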

OPENING CASE QUESTIONS

1. Knowledge: List the reasons a business would want to display information in a graphic or visual format.

2. Comprehension: Describe how a business could use a business intelligence digital dashboard to gain an understanding of how the business is operating.

3. Application: Explain how a marketing department could use data visualization tools to help with the release of a new product.

4. Analysis: Categorize the five common characteristics of high-quality information and rank them in order of importance for Hotels.com.

5. Synthesis: Develop a list of some possible entities and attributes located in the Hotels.com database.

6. Evaluate: Assess how Hotels.com is using BI to identify trends and change associated business processes.

Page 245

KEY TERMS

Advanced analytics

Analysis paralysis

Attribute

Big data

Business-critical integrity constraint

Business rule

Business intelligence dashboard

Content creator

Content editor

Data dictionary

Data element (or data field)

Data governance

Data mart

Data mining

Data model

Data quality audit

Data visualization

Data visualization tools

Data warehouse

Database

Database management system (DBMS)

Data-driven website

Data-mining tool

Data artist

Data scientist

Dirty data

Distributed computing

Dynamic catalog

Dynamic information

Entity

Extraction, transformation, and loading (ETL)

Foreign key

Human-generated data

Infographic (or information graphic)

Information cleansing or scrubbing

Information cube

Information granularity

Information inconsistency

Information integrity

Information integrity issues

Information redundancy

Integrity constraint

Logical view of information

Machine-generated data

Master data management (MDM)

Metadata

Physical view of information

Primary key

Query-by-example (QBE) tool

Real-time information

Real-time system

Record

Relational database management system

Relational database model

Relational integrity constraint

Static information

Structured data

Structured query language (SQL)

Time-series information

Unstructured data

REVIEW QUESTIONS

1. How does a database turn data elements into information?

2. Why does a business need to be concerned with the quality of its data?

3. How can data governance help protect a business from hackers?

4. Why would a company care about the timeliness of its data?

5. What are the five characteristics common to high-quality information?

6. What is data governance and its importance to a company?

7. What are the four primary traits that help determine the value of information?

8. What is the difference between an entity and an attribute?

9. What are the advantages of a relational database?

10. What are the advantages of a data-driven website?

11. What is a data warehouse and why would a business want to implement one?

12. Why would you need to use multidimensional analysis?

13. What is the purpose of information cleansing (or scrubbing)?

14. Why would a department want a data mart instead of just accessing the entire data warehouse?

15. Why would a business be data rich but information poor?

Page 246

CLOSING CASE ONE

Data Visualization: Stories for the Information Age

At the intersection of art and algorithm, data visualization schematically abstracts information to bring about a deeper understanding of the data, wrapping it in an element of awe. Although the practice of visually representing information is arguably the foundation of all design, a newfound fascination with data visualization has been emerging. After The New York Times and The Guardian recently opened their online archives to the public, artists rushed to dissect nearly two centuries’ worth of information, elevating this art form to new prominence.

For artists and designers, data visualization is a new frontier of self-expression, powered by the proliferation of information and the evolution of available tools. For enterprise, it is a platform for displaying products and services in the context of the cultural interaction that surrounds them, reflecting consumers’ increasing demand for corporate transparency.

“Looking at something ordinary in a new way makes it extraordinary,” says Aaron Koblin, one of the more recent pioneers of the discipline. As technology lead of Google’s Creative Labs in San Francisco, he spearheaded the search giant’s Chrome Experiments series designed to show off the speed and reliability of the Chrome browser.

Forget Pie Charts and Bar Graphs

Data visualization has nothing to do with pie charts and bar graphs. And it’s only marginally related to infographics, information design that tends to be about objectivity and clarification. Such representations simply offer another iteration of the data—restating it visually and making it easier to digest. Data visualization, on the other hand, is an interpretation, a different way to look at and think about data that often exposes complex patterns or correlations.

Data visualization is a way to make sense of the ever-increasing stream of information with which we’re bombarded and provides a creative antidote to the analysis paralysis that can result from the burden of processing such a large volume of information. “It’s not about clarifying data,” says Koblin. “It’s about contextualizing it.”

Today algorithmically inspired artists are reimagining the art-science continuum through work that frames the left-brain analysis of data in a right-brain creative story. Some use data visualization as a bridge between alienating information and its emotional impact—see Chris Jordan’s portraits of global mass culture. Others take a more technological angle and focus on cultural utility—the Zoetrope project offers a temporal and historical visualization of the ephemeral web. Still others are pure artistic indulgence—like Koblin’s own Flight Patterns project, a visualization of air traffic over North America.

How Business Can Benefit

There are real implications for business here. Most cell phone providers, for instance, offer a statement of a user’s monthly activity. Most often it’s an overwhelming table of various numerical measures of how much you talked, when, with whom, and how much it cost. A visual representation of this data might help certain patterns emerge, revealing calling habits and perhaps helping users save money.

Companies can also use data visualization to gain new insight into consumer behavior. By observing and understanding what people do with the data—what they find useful and what they dismiss as worthless—executives can make the valuable distinction between what consumers say versus what they do. Even now, this can be a tricky call to make from behind the two-way mirror of a traditional qualitative research setting.

It’s essential to understand the importance of creative vision along with the technical mastery of software. Data visualization isn’t about using all the data available, but about deciding which patterns and elements to focus on, building a narrative, and telling the story of the raw data in a different, compelling way.

Page 247

Ultimately, data visualization is more than complex software or the prettying up of spreadsheets. It’s not innovation for the sake of innovation. It’s about the most ancient of social rituals: storytelling. It’s about telling the story locked in the data differently, more engagingly, in a way that draws us in, makes our eyes open a little wider and our jaw drop ever so slightly. And as we process it, it can sometimes change our perspective altogether.9

Questions

1. Identify the effects poor information might have on a data visualization project.

2. How does data visualization use database technologies?

3. How could a business use data visualization to identify new trends?

4. What is the correlation between data mining and data visualization?

5. Is data visualization a form of business intelligence? Why or why not?

6. What security issues are associated with data visualization?

7. What might happen to a data visualization project if it failed to cleanse or scrub its data?

CLOSING CASE TWO

Zillow

Zillow.com is an online, web-based real estate site helping homeowners, buyers, sellers, renters, real estate agents, mortgage professionals, property owners, and property managers find and share information about real estate and mortgages. Zillow allows users to access, anonymously and free of charge, the kinds of tools and information previously reserved for real estate professionals. Zillow’s databases cover more than 90 million homes, which represents 95 percent of the homes in the United States. Adding to the sheer size of its databases, Zillow recalculates home valuations for each property every day, so it can provide historical graphs on home valuations over time. In some areas, Zillow is able to display 10 years of valuation history, a value-added benefit for many of its customers. This collection of data represents an operational data warehouse for anyone visiting the website.

As soon as Zillow launched its website, it immediately generated a massive amount of traffic. As the company expanded its services, the founders knew the key to its success would be the site’s ability to process and manage massive amounts of data quickly, in real time. The company identified a need for accessible, scalable, reliable, secure databases that would enable it to continue to increase the capacity of its infrastructure indefinitely without sacrificing performance. Zillow’s traffic continues to grow despite the weakened real estate market; the company is experiencing annual traffic growth of 30 percent, and about a third of all U.S. mortgage professionals visit the site in a given month.

Data Mining and Business Intelligence

Zestimate values on Zillow use data-mining features for spotting trends across property valuations. Data mining also allows the company to see how accurate Zestimate values are over time. Zillow has also built the industry’s first search by monthly payment, allowing users to find homes that are for sale and rent based on a monthly payment they can afford. Along with the monthly payment search, users can also enter search criteria such as the number of bedrooms or bathrooms.

Zillow also launched a new service aimed at changing the way Americans shop for mortgages. Borrowers can use Zillow’s new Mortgage Marketplace to get custom loan quotes from lenders without having to give their names, addresses, phone numbers, or Social Security numbers, or field unwanted telephone calls from brokers competing for their business. Borrowers reveal their identities only after contacting the lender of their choice. The company is entering a field of established mortgage sites such as LendingTree.com and Experian Group’s Lowermybills.com, which charge mortgage companies for borrower information. Zillow, which has an advertising model, says it does not plan to charge for leads.

Page 248

For mortgage companies, the anonymous leads come free; they can make a bid based on information provided by the borrower, such as salary, assets, credit score, and the type of loan. Lenders can browse borrower requests and see competing quotes from other brokers before making a bid.10

Questions

1. List the reasons Zillow would need to use a database to run its business.

2. Describe how Zillow uses business intelligence to create a unique product for its customers.

3. How could the marketing department at Zillow use a data mart to help with the release of a new product launch?

4. Categorize the five common characteristics of high-quality information and rank them in order of importance to Zillow.

5. Develop a list of some possible entities and attributes of Zillow’s mortgage database.

6. Assess how Zillow uses a data-driven website to run its business.

CRITICAL BUSINESS THINKING

1. Information–Business Intelligence or a Diversion from the Truth? President Obama used part of his commencement address at Virginia’s Hampton University to criticize the flood of incomplete information or downright incorrect information that flows in the 24-hour news cycle. The president said, “You’re coming of age in a 24/7 media environment that bombards us with all kinds of content and exposes us to all kinds of arguments, some of which don’t always rank all that high on the truth meter. With iPods and iPads and Xboxes and PlayStations—none of which I know how to work—information becomes a distraction, a diversion, a form of entertainment, rather than a tool of empowerment, rather than the means of emancipation.”11

Do you agree or disagree with President Obama’s statement? Who is responsible for verifying the accuracy of online information? What should happen to companies that post inaccurate information? What should happen to individuals who post inaccurate information? What should you remember when reading or citing sources for online information?

2. Illegal Database Access: Goldman Sachs has been hit with a $3 million lawsuit by a company that alleges the brokerage firm stole intellectual property from its database of market intelligence facts. The lawsuit, filed in 2010 in the U.S. District Court for the Southern District of New York, claims Goldman Sachs employees used other people’s access credentials to log on to Ipreo’s proprietary database, dubbed Bigdough. Offered on a subscription basis, Bigdough provides detailed information on more than 80,000 contacts within the financial industry. Ipreo complained to the court that Goldman Sachs employees illegally accessed Bigdough at least 264 times in 2008 and 2009.12

Do you agree or disagree with the lawsuit? Should Goldman Sachs be held responsible for rogue employees’ behavior? What types of policies should Goldman Sachs implement to ensure that this does not occur again?

3. Data Storage: Information is one of the most important assets of any business. Businesses must ensure information accuracy, completeness, consistency, timeliness, and uniqueness. In addition, businesses must have a reliable backup service. In part thanks to cloud computing, there are many data hosting services on the Internet. These sites offer storage of information that can be accessed from anywhere in the world.

These data hosting services include Hosting (www.hosting.com), Mozy (www.mozy.com), My Docs Online (www.mydocsonline.com), and Box (www.box.net). Visit a few of these sites along with several others you find through research. Which sites are free? Are there limits to how much you can store? If so, what is the limit? What type of information can you store (video, text, photos, etc.)? Can you allow multiple users with different passwords to access your storage area? Are you contractually bound for a certain duration (annual, etc.)? Are different levels of services provided such as personal, enterprise, and work group? Does it make good business sense to store business data on the Internet? What about personal data?

Page 249

4. Gathering Business Intelligence: When considering new business opportunities, you need knowledge about the competition. One of the things many new business owners fail to do is gather business intelligence on their competitors, such as how many there are and what differentiates each of them. You may find there are too many and that they would be tough competition for you. Or, you may find that there are few competitors and the ones who are out there offer very little value.

Generate a new business idea you could launch on the Internet. Research the Internet to find similar businesses in the area you have chosen. How many sites did you find that are offering the same products or services you are planning to offer? Did you come across any sites from another country that have a unique approach that you did not see on any of the sites in your own country? How would you use this information in pursuing your business idea?

5. Free Data! The U.S. Bureau of Labor Statistics states that its role is as the “principal fact-finding agency for the federal government in the broad field of labor economics and statistics.” And the data that the bureau provides via its website are available to anyone, free. This can represent a treasure trove of business intelligence and data mining for those who take advantage of this resource. Visit the website www.bls.gov. What type of information does the site provide? What information do you find most useful? What sort of information concerning employment and wages is available? How is this information categorized? How would this type of information be helpful to a business manager? What type of demographic information is available? How could this benefit a new start-up business? 13

6. Explaining Relational Databases: You have been hired by Vision, a start-up clothing company. Your manager, Holly Henningson, is unfamiliar with databases and their associated business value. Henningson has asked you to create a report detailing the basics of databases. She would also like you to provide a detailed explanation of relational databases along with their associated business advantages.

7. Entities and Attributes: Martex Inc. is a manufacturer of athletic equipment, and its primary lines of business include running, tennis, golf, swimming, basketball, and aerobics equipment. Martex currently supplies four primary vendors, including Sam’s Sports, Total Effort, The Underline, and Maximum Workout. Martex wants to build a database to help it organize its products. In a group, identify the different types of entities, attributes, keys, and relationships Martex will want to consider when designing its relational database.

8. Compiling Information: You are currently working for the Public Transportation Department of Chatfield. The department controls all forms of public transportation, including buses, subways, and trains. Each department has about 300 employees and maintains its own accounting, inventory, purchasing, and human resource systems. Generating reports across departments is a difficult task and usually involves gathering and correlating the information from the many databases. It typically takes about two weeks to generate the quarterly balance sheets and profit and loss statements. Your team has been asked to compile a report recommending what the Public Transportation Department of Chatfield can do to alleviate its information and system issues. Be sure that your report addresses the various reasons departmental reports are presently difficult to obtain as well as how you plan to solve this problem. 14

Page 250

9. Information Timeliness: Information timeliness is a major consideration for all organizations. Organizations need to decide the frequency of backups and the frequency of updates to a data warehouse. In a team, describe the timeliness requirements for backups and updates to a data warehouse for each of the following:

Weather tracking systems

Car dealership inventories

Vehicle tire sales forecasts

Interest rates

Restaurant inventories

Grocery store inventories

10. Improving Information Quality: HangUps Corporation designs and distributes closet organization structures. The company operates five systems—order entry, sales, inventory management, shipping, and billing. The company has severe information quality issues, including missing, inaccurate, redundant, and incomplete information. The company wants to implement a data warehouse containing information from the five systems to help maintain a single customer view, drive business decisions, and perform multidimensional analysis. Identify how the organization can improve its information quality when it begins designing and building its data warehouse.

ENTREPRENEURIAL CHALLENGE

BUILD YOUR OWN BUSINESS

1. Provide an example of your business data that fits each of the five common characteristics of high-quality information. Explain why each characteristic is important to your business data and what might happen if your business data were of low quality. (Be sure to identify your business and the name of your company.)

2. Identify the different entities and their associated attributes that would be found in your potential relational database model for your sales database.

3. Identify the benefits of having a data warehouse for your business. What types of data marts would you want to extract from your data warehouse to help you run your business and make strategic decisions?

APPLY YOUR KNOWLEDGE BUSINESS PROJECTS

PROJECT I Mining the Data Warehouse

Alana Smith is a senior buyer for a large wholesaler that sells different types of arts and crafts to greeting card stores such as Hallmark. Smith’s latest marketing strategy is to send all of her customers a new line of handmade picture frames from Russia. All of her information supports her decision for the new line. Her analysis predicts that the frames should sell an average of 10 to 15 per store, per day. Smith is excited about the new line and is positive it will be a success.

Page 251

One month later, Smith learns the frames are selling 50 percent below expectations and averaging between five and eight frames sold daily in each store. She decides to access the company’s data warehouse information to determine why sales are below expectations. Identify several dimensions of information that Smith will want to analyze to help her decide what is causing the problems with the picture frame sales.

PROJECT II Different Dimensions

The focus of data warehousing is to extend the transformation of data into information. Data warehouses offer strategic level, external, integrated, and historical information so businesses can make projections, identify trends, and make key business decisions. The data warehouse collects and stores integrated sets of historical information from multiple operational systems and feeds them to one or more data marts. It may also provide end user access to support enterprisewide views of information.

You are currently working on a marketing team for a large corporation that sells jewelry around the world. Your boss has asked you to look at the following dimensions of data to determine which ones you want in your data mart for performing sales and market analysis (see Figure AYK.1). As a team, categorize the different dimensions, ranking them from 1 to 5, with 1 indicating that the dimension offers the highest value and must be in your data mart and 5 indicating that the dimension offers the lowest value and does not need to be in your data mart.

PROJECT III Understanding Search

Pretend that you are a search engine. Choose a topic to query. It can be anything such as your favorite book, movie, band, or sports team. Search your topic on Google, pick three or four pages from the results, and print them out. On each printout, find the individual words from your query (such as “Boston Red Sox” or “The Godfather”) and use a highlighter to mark each word with color. Do that for each of the documents that you print out. Now tape those documents on a wall, step back a few feet, and review your documents.

FIGURE AYK.1

Data Warehouse Data

Page 252

If you did not know what the rest of a page said and could judge only by the colored words, which document do you think would be most relevant? Is there anything that would make a document look more relevant? Is it better for the words to be in a large heading or to occur several times in a smaller font? Do you prefer the words to be at the top or the bottom of the page? How often do the words need to appear? Come up with two or three things you would look for to see whether a document matched a query well. This exercise mimics search engine processes and should help you understand why a search engine returns certain results over others.

PROJECT IV Predicting Netflix

Netflix Inc., the largest online movie rental service, provides more than 12 million subscribers access to more than 100,000 unique DVD titles along with a growing on-demand library in excess of 10,000 choices. Data and information are so important to Netflix that it created The Netflix Prize, an open competition for anyone who could improve the data used in prediction ratings for films (an increase of 10 percent), based on previous ratings. The winner would receive a $1 million prize.

The ability to search, analyze, and comprehend information is vital for any organization’s success. It certainly was for Netflix—it was happy to pay anyone $1 million to improve the quality of its information. In a group, explain how Netflix might use databases, data warehouses, and data marts to predict customer movie recommendations. Here are a few characteristics you might want to analyze to get you started:

Customer demographics

Movie genre, rating, year, producer, and type

Actor information

Internet access

Location for mail pickup

PROJECT V The Crunch Factory

The Crunch Factory is the fourth-largest gym chain operating in Australia, and each gym operates its own system with its own database. Unfortunately, the company failed to develop any data-capturing standards and now faces the challenges associated with low-quality enterprisewide information. For example, one system has a field to capture email addresses, but another system does not. Duplicate customer information among the different systems is another major issue, and the company continually finds itself sending conflicting or competing messages to customers from different gyms. A customer could also have multiple accounts within the company, one representing a membership, another representing additional classes, and yet another for a personal trainer. The Crunch Factory has no way to identify that the different customer accounts are actually for the same customer.

To remain competitive and be able to generate business intelligence, The Crunch Factory has to resolve these challenges. The Crunch Factory has just hired you as its data quality expert. Your first task is to determine how the company can turn its low-quality information into high-quality business intelligence. Create a plan that The Crunch Factory can implement that details the following:

Challenges associated with low-quality information

Benefits associated with high-quality information

Recommendations on how the company can clean up its data
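To ground the cleanup recommendations, the sketch below shows one possible way to flag accounts that likely belong to the same customer. It assumes each system can export a name and a birth date; those field names, the sample records, and the matching rule are hypothetical, and a real plan would need matching rules agreed on by the business.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivially different entries compare equal."""
    return " ".join(text.lower().split())


def group_accounts(accounts):
    """Group account IDs that appear to belong to the same customer.

    Matching on normalized name plus birth date is an assumption for
    illustration only; email is avoided because not every system captures it.
    """
    groups = defaultdict(list)
    for acct in accounts:
        key = (normalize(acct["name"]), acct["birth_date"])
        groups[key].append(acct["account_id"])
    return dict(groups)


# Hypothetical records exported from three separate gym systems.
accounts = [
    {"account_id": "M-100", "name": "Pat Lee", "birth_date": "1990-04-02", "type": "membership"},
    {"account_id": "C-220", "name": "  pat  lee ", "birth_date": "1990-04-02", "type": "classes"},
    {"account_id": "T-310", "name": "Sam Roy", "birth_date": "1985-11-30", "type": "trainer"},
]
groups = group_accounts(accounts)
# Keep only the customers who hold more than one account.
duplicates = {key: ids for key, ids in groups.items() if len(ids) > 1}
```

Here the membership and class accounts for the same person are grouped together even though the name was typed differently, which is the kind of inconsistency that shared data-capturing standards would prevent in the first place.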

PROJECT VI Too Much of a Good Thing

The Castle, a premium retailer of clothes and accessories, created an enterprisewide data warehouse so all its employees could access information for decision making. The Castle soon discovered that it is possible to have too much of a good thing. The Castle employees found themselves inundated with data and unable to make any decisions, a common occurrence called analysis paralysis. When sales representatives queried the data warehouse to determine whether a certain product in a particular size, color, and category was available, they would get hundreds of results showing everything from production orders to supplier contracts. It became easier for the sales representatives to look in the physical warehouse themselves than to check the system. Employees found the data warehouse was simply too big and too complicated, and it contained too much irrelevant information.

The Castle is committed to making its data warehouse system a success and has come to you for help. Create a plan that details the value of the data warehouse to the business, how it can be easier for all employees to use, and the potential business benefits the company can derive from its data warehouse.

PROJECT VII Twitter Buzz

Technology tools that can predict sales for the coming week, decide when to increase inventory, and determine when additional staff is required are extremely valuable. Twitter is not just for tweeting your whereabouts anymore. Twitter and other social-media sites have become great tools for gathering business intelligence on customers, including what they like, dislike, need, and want. Twitter is easy to use, and businesses can track every single time a customer makes a statement about a particular product or service. Good businesses turn this valuable information into intelligence by spotting trends and patterns in customer opinion.

Do you agree that a business can use Twitter to gain business intelligence? How many companies do you think are aware of Twitter and exactly how they can use it to gain BI? How do you think Twitter uses a data warehouse? How do you think companies store Twitter information? How would a company use Twitter in a data mart? How would a company use cubes to analyze Twitter data?

AYK APPLICATION PROJECTS

If you are looking for Access projects to incorporate into your class, try any of the following after reading this chapter.