Services to Support Knowledge Sharing in Complex Business Networks, Big Data as the Source

Abdussalam Ali and Igor Hawryszkiewycz
University of Technology Sydney, Sydney, Australia
[email protected] [email protected]

Abstract: Big Data has become a buzzword that refers to complex and massive data, either structured or unstructured, that is not easy to capture and process with traditional tools and software applications. The term refers to the large volume of data created through the use of information and communication technology (ICT). Big Data in our research is the big container to be explored to discover knowledge and information based on the searching context. Big Data may include both explicit and tacit sources of information and knowledge. Businesses should consider all sources in the business environment when discovering and capturing knowledge by accessing this data. Business networks, as a type of social network, are the mechanism for performing knowledge sharing and transfer for innovation and for gaining competitive advantage. The goal of our research is to design and implement a generic model of services that support and coordinate business networks to discover, capture, create and share knowledge. Implementing these services involves considering Big Data as the source of information and knowledge. This model is to be implemented as a platform of services in the cloud. Although the model and services are generic, the platform is to be customisable for a business’s special needs. These services will support businesses in creating collaborative environments and in adapting to changes that happen in the business environment while collaboration is in operation. A prototype of the services is to be implemented in the cloud environment. This prototype is to be tested by experts through a case study to measure the success and performance of the model.

Keywords: big data, knowledge sharing, business networks, cloud computing

1. Introduction

Big Data has emerged through the development of information and communication technology (ICT) and people's use of it. The term refers to the large volume of data and information around the world, which is growing fast through the use of advanced technology and huge storage systems. Nearly 20 quintillion (20 × 10^18) bytes of data are produced every day, containing unstructured, semi‐structured and structured content. This content includes text and multimedia such as voice, audio and images (Barnaghi, Sheth & Henson 2013). Organisations have started to explore this huge volume of data, which is not organised adequately in a database manner (Davenport, Barth & Bean 2012).

Knowledge Management (KM) is one of the disciplines affected by this emergence. Businesses explore Big Data to capture information and create knowledge for innovation and competitive advantage. In the last few decades businesses have started to introduce technology to support their knowledge management strategies, and KM systems have been developed accordingly. Technology is considered a success factor for KM by many studies, such as Wong (2005), Moffett et al. (2003) and Davenport et al. (1998). The study of Moffett et al. (2003) is based on a survey of 1000 British companies. They found that most of these companies introduced technology as a main component to support KM. Wong's (2005) study, based on small and medium enterprises (SMEs), states that properly implemented technology is one of the main factors that support the success of KM. Alazmi and Zairi (2003) review 15 studies to identify the critical factors affecting KM. Their findings show that components related to the technology factor represent the highest percentage (17%) compared to the other components mentioned in those studies.

Although these studies state that technology is a critical factor in making KM successful in the firm, many also present issues with the technology behind KM systems. These issues can be summarised as follows:

Technology serves as little more than a store and repository of information. That is because KM systems are designed in the same way, and in the same tradition, as information systems (Currie & Maire 2004; Nunes et al. 2006; Birkinshaw 2001). In addition, IT systems operate as stores for explicit knowledge more than for tacit knowledge, as the tacit type is difficult to gather (Nunes et al. 2006; Birkinshaw 2001).

This leads to the argument that the “social interaction” phenomenon is overlooked when introducing IT to support KM. Although social interaction is important for people to exchange information and knowledge, IT is considered to be a replacement for “social interaction” (Birkinshaw 2001). McDermott (1999) reports that knowledge is different from information in many aspects. As a result, KM systems cannot be designed and implemented based on information systems concepts.

The other limitation is that KM systems are not up to date and do not cater for emerging needs (Fischer & Ostwald 2001; Van Zolingen, Streumer & Stooker 2001). One of these emerging needs is Big Data, which is the main source of knowledge and knowledge creation and consists of both tacit and explicit sources of knowledge.

The aim of our research is to implement a generic model of services that support social interaction in knowledge sharing and transfer. From a Big Data perspective, these services will support knowledge discovery and capture. In our model we consider Big Data as the big container that holds all information and knowledge sources. This information and knowledge may be in soft or hard form, online or offline, and explicit or tacit. The services to be implemented will support businesses in exploring and discovering the knowledge that resides within Big Data. Section 2, “Big Data as the Knowledge Source”, presents more information about this aspect.

The other component to be mentioned here is business networks. Business networks, as a type of social network, are the mechanism for performing knowledge sharing and transfer for innovation and for gaining competitive advantage. The model is meant to coordinate and manage these networks for creating and sharing knowledge. In addition, the model (Ali, Hawryszkiewycz & Chen 2014) flexibly gives businesses and business units the ability to quickly share and analyse knowledge to address emerging business needs in their environment. This is based on the fact that businesses today operate in complex environments. The services need to be generic and reconfigurable, as knowledge needs cannot be anticipated in today’s dynamic environment.

Hasgall (2012) performed a questionnaire-based study to understand how effective social networks are in supporting organisations to adapt and respond to changes in their complex environment. The findings lead to the conclusion that social networks support employees by providing them with knowledge. This knowledge can be integrated into the firm and can increase the sensitivity of workers to environmental changes. Consequently, a flexible approach is needed in which knowledge flows and responsibilities can be easily changed without the need to reprogram systems.

A typical scenario may be a new partner entering a network, a decision to develop new products and services that require new expertise, or simply an improvement to workflows. Each of these not only brings in new knowledge but often also requires the rearrangement of responsibility for processing the knowledge. Networks also exist within businesses, where different business units network to create new products and services for business clients.

2. Big data as the knowledge source

Kabir and Carayannis (2013) characterise Big Data as data that is too large to capture and analyse easily with traditional technology and tools. The authors consider Big Data a main and continuously growing resource for creating knowledge. This growth has many causes, including continuous innovation in IT hardware and software. Big Data is a massive lake that can be used as a resource to create knowledge (Kabir & Carayannis 2013). Agrawal et al. (2011) present the challenges of Big Data as a source of knowledge to support decision making and innovation. These challenges are:

Heterogeneity, which refers to differences in data format. Even if the data captured is in the same format, differences will exist in terms of data structure and organisation.

Scale, as the main characteristic of Big Data is its size and volume. Managing these large volumes and their continuous growth is one of the challenges of Big Data.

Timeliness, which means that the speed and growth rate of data increase as new technology is introduced and continuously developed. The challenge here is how to cope with these increasing rates when discovering and capturing the data.

Privacy, which is one of the big concerns in the context of Big Data. Dealing with privacy is both a technical and a social issue. Acquiring personal data, for example, raises many questions regarding privacy and the level at which this data can be used and published.

Solving these issues is not in our research scope. However, it may be included in our framework in future development. Big Data in our model is the big container to be explored to discover knowledge and information based on the searching context. Big Data may include both explicit and tacit sources of information and knowledge. The Internet, online databases, electronic sheets, electronic documents, hard disk drives and offline printed documents and files all make up the explicit part of the Big Data container. On the other hand, experts, skilled people, consultants, managers, workers and members of communities of practice are all examples of tacit sources in the Big Data container.

Information in the Big Data container is presented in different formats, including database records, Word and PDF documents, video, audio and images. Capturing and analysing knowledge from these formats is done by specialised applications. In our research we may support these formats in terms of discovering them, indexing them and knowing how to link these sources with the knowledge created from them. A recommender service may be implemented in the future to support knowledge discovery for other explorers and users.

Kabir and Carayannis (2013) present their “Big Data Strategy Framework” as in Figure 1.

Figure 1: Big data strategy framework (Kabir & Carayannis 2013)

The authors present infrastructure, team building and the knowledge base as the main components of their framework. Infrastructure includes technology as one of its subcomponents. Teams should be created based on the business’s objectives. The knowledge that is created by businesses should support innovation and competitive advantage and be considered as new knowledge for future use and sharing (Kabir & Carayannis 2013). Our model is not based on this framework, but the framework supports our previous argument that other factors, such as social factors and the environment, should be considered as well.

In conclusion, our aim is to design and implement a generic model of services that support and coordinate business networks to discover, capture, create and share knowledge. Implementing these services involves considering Big Data as the source of information and knowledge.

3. The proposed model

This paper proposes a model to manage discovering, capturing, organising and sharing knowledge between business networks within a complex environment. The paper sees knowledge sharing as predominantly a socio‐technical issue and Big Data as the source of knowledge. The model provides the flexibility needed in today’s environment. In our model, any business creates its own groups and organisations to gather knowledge and information. The organisation in our model is defined according to Living Systems Theory (LST) as a group of groups that deal with one or more gathering projects (Ali & Hawryszkiewycz 2012; Miller 1965). Each group within the organisation processes the knowledge and information. Thus, if new groups are created to respond to some event, knowledge must quickly flow to these groups.

The strategies toward the model implementation can be described as follows:

3.1 Generic knowledge management functions, elements and activities

Boundary roles must often define the knowledge elements to be managed and the assignment of these responsibilities to roles in their business unit. As noted in Ali et al. (2014), knowledge management functions (KMF) have been defined in many studies, including Fernandez and Sabherwal (2010), Awad and Ghaziri (2004) and Dakilir (2011). The functions can be described as follows:

Discovering: The process of finding where the knowledge resides.

Gathering: Fernandez and Sabherwal (2010) define gathering as the process of obtaining knowledge from the tacit (individuals) and explicit (such as manuals) sources.

Filtering: The process of reducing the knowledge and/or information gathered by rejecting redundant content (Dakilir 2011).

Organising: The process of composing the knowledge so that it can be easily retrieved and used to make decisions (Awad & Ghaziri 2004).

Sharing: A way of transferring knowledge between individuals and groups (Awad & Ghaziri 2004; Fernandez & Sabherwal 2010).

In this paper we illustrate these functions by linking them to Big Data, as shown in Figure 2. Formally, there may be any number of knowledge elements, such as sales, purchases, proposals and so on. So we might define a knowledge element K(sale) or K(purchase). It may be the latest sale or some new idea. Each of these knowledge elements goes through the functions in Figure 2. We use the notation Discover(K(sale)), Gather(K(sale)) and Discover(K(purchase)), and we call these knowledge processing activities. A knowledge processing activity is thus a knowledge management function applied to a knowledge element, and any knowledge element goes through all the KMFs. From Figure 2 the following points can be highlighted:

These functions are not sequential. For example, while organising, the user may ask for more captured or discovered knowledge.

While a specific function is running, the user can return to Big Data for more knowledge.

Knowledge creation can happen at any stage.

Figure 2: Knowledge management functions
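The notation in Section 3.1 can be made concrete with a small sketch. The following Python fragment is only an illustration under our own assumptions: the class and function names (KMFunction, KnowledgeElement, KnowledgeActivity, activities_for) are hypothetical and do not come from the paper. It shows a knowledge processing activity as a function applied to an element, with no ordering implied between functions.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class KMFunction(Enum):
    """The five knowledge management functions listed in Section 3.1."""
    DISCOVER = "Discover"
    GATHER = "Gather"
    FILTER = "Filter"
    ORGANISE = "Organise"
    SHARE = "Share"


@dataclass(frozen=True)
class KnowledgeElement:
    """A knowledge element such as K(sale) or K(purchase)."""
    name: str

    def __str__(self) -> str:
        return f"K({self.name})"


@dataclass(frozen=True)
class KnowledgeActivity:
    """A knowledge management function applied to a knowledge element."""
    function: KMFunction
    element: KnowledgeElement

    def __str__(self) -> str:
        return f"{self.function.value}({self.element})"


def activities_for(element: KnowledgeElement) -> list[KnowledgeActivity]:
    """Return one activity per function; the list order implies no sequence."""
    return [KnowledgeActivity(function, element) for function in KMFunction]


sale = KnowledgeElement("sale")
print(KnowledgeActivity(KMFunction.DISCOVER, sale))  # Discover(K(sale))
print([str(activity) for activity in activities_for(sale)])
```

Keeping the function and the element as separate values is one possible design; it makes it straightforward to group activities by either dimension later.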

3.2 Allocations

Our goal is to develop a framework that provides choices for changing allocations as systems evolve, giving the flexibility to reconfigure the requirements as needs change. This lets networks share knowledge by assigning responsibilities. The following choices are possible:

Type 1 Allocation (knowledge management function specialists) ‐ Allocate all activities of the same knowledge management function to one group.

Type 2 Allocation (knowledge element specialists) ‐ Allocate all knowledge processing activities on the same knowledge element to one organisation. The organisation then distributes the different knowledge processing activities to different groups.

Type 3 Allocation ‐ Each functional unit has its own knowledge processing organisation or group.

Type 4 Allocation ‐ Totally open (hybrid).
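As a rough illustration of the first two allocation types, the sketch below groups the same set of knowledge processing activities in two ways. It is a simplified assumption, not the authors' implementation: the function names allocate_type1 and allocate_type2, the group and organisation labels, and the choice of one group per function inside each Type 2 organisation are all hypothetical.

```python
from collections import defaultdict

FUNCTIONS = ["Discover", "Gather", "Filter", "Organise", "Share"]
ELEMENTS = ["sale", "purchase"]

# Every knowledge processing activity is a (function, element) pair.
ACTIVITIES = [(function, element) for function in FUNCTIONS for element in ELEMENTS]


def allocate_type1(activities):
    """Type 1: all activities of the same function go to one group."""
    groups = defaultdict(list)
    for function, element in activities:
        groups[f"{function} group"].append((function, element))
    return dict(groups)


def allocate_type2(activities):
    """Type 2: all activities on the same element go to one organisation.

    The organisation then distributes its activities to its own groups;
    here, as an assumption, one group per function.
    """
    organisations = defaultdict(dict)
    for function, element in activities:
        organisation = organisations[f"{element} organisation"]
        organisation.setdefault(f"{function} group", []).append((function, element))
    return dict(organisations)


type1 = allocate_type1(ACTIVITIES)
type2 = allocate_type2(ACTIVITIES)
print(type1["Gather group"])  # [('Gather', 'sale'), ('Gather', 'purchase')]
print(sorted(type2))          # ['purchase organisation', 'sale organisation']
```

Because both functions take the same activity list, switching from one allocation type to the other is a regrouping step rather than a change to the activities themselves, which matches the reconfigurability the section aims for.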

Figure 3: Type 1 allocation

Allocations are made at two levels: allocation of the knowledge activity to the group, followed by the allocation of action tasks to roles in the group. As an example, the model for Type 1 allocation is shown in Figure 3. Figure 3 illustrates an organisation of three groups that gathers knowledge by assigning roles to them. Some actors participate in more than one network. For example, the coordinators participate in the organisational network and also in the group network. The model does not at this stage include the agencies used in the exchange of information. The goal is to create these agencies through a cloud platform.

The output of the research is a platform based on cloud technology to support these roles within the groups. The major functions of this platform are as follows:

Creating and dissolving groups.

Supporting group members to access the platform’s services to finish the role’s tasks.

Enabling knowledge sharing between the groups and organisations.

Supporting collaboration between businesses and enterprises.

4. Model implementation

Our goal is to implement the model in the cloud environment to support knowledge sharing across a broad community. The implementation will create the services that implement the relationships between roles within business networks for effective knowledge discovery, gathering and sharing. These services can be categorised as administrative services and processing services. Administrative services are those modules that support the administrative activities. Examples are creating groups and organisations, creating roles, assigning roles to (and resigning them from) organisations and groups, and creating users and assigning them to roles.

Figure 4: High level illustration of the model

The processing services are the modules accessed by the users within the groups and organisations. These services support the knowledge management functions shown in Figure 2.

Our model should provide services for businesses to create their collaborative environments, either within the business itself or in collaboration with other businesses. The services are meant to give the business the capability to adapt to changes that occur in the business itself or in its environment. This adaptation includes managing groups, organisations, roles and users, as well as managing the relationships between these sets. Knowledge management functions and activities should be considered in this operation. Our goal is to make these services configurable, so that newly required services can be implemented upon business request and added to the system at any time. In addition, roles and their responsibilities can be modified.

Figure 4 illustrates a scenario of gathering knowledge for a “new course project” in a faculty. Big Data is the container of sources, as defined previously, that users can explore for knowledge based on the defined knowledge elements. The cloud environment will contain the services that support the knowledge management functions. The collaborative organisation “new course” discovers, captures and organises the knowledge. This knowledge is based on the knowledge elements defined as “subject material” and “market and prices”. Services in the cloud will support businesses in:

Creating organisations and groups (e.g. discovering, capturing and organising).

Creating roles (e.g. coordinator, material‐discoverer and organiser).

Assigning these roles to the groups/organisations.

Assigning users (e.g. John, David, Mac) to these roles.

Supporting the knowledge management functions and processes.

The services to be implemented are not considered as interfaces between the users and Big Data, as shown in Figure 4. Rather, they support users to discover through Big Data, and create and share knowledge among themselves.

4.1 Technology

Cloud technology is the infrastructure to be used for implementing our model. Its advantages can be gained by implementing the model as a platform of services delivered as Software as a Service (SaaS) to the beneficiaries. The following are some advantages that are relevant to our research (Marston et al. 2011):

1. Low cost of using cloud services. This allows small and medium businesses to benefit from these services. Knowledge management receives little focus in small and medium businesses (Pillania 2008), and one of the reasons is the high cost of dedicated knowledge management applications and systems (Nunes et al. 2006).

2. Large capacity. One of the features of the cloud is that it provides large amounts of data storage. In our model, businesses need not be concerned about continuous scalability, whether in terms of the number of organisations and groups created within the system or the amount of knowledge produced.

3. Mobility. Cloud computing is an online technology. This allows people to share knowledge and participate in organisations and groups even if they are in different and distant geographic areas.

Services to be implemented can be categorised as follows:

Management services: These services support the creation of the objects and components of the model and maintain the relationships between them. This includes managing and maintaining the groups and organisations, roles, users and knowledge elements.

Processing services: These services support the knowledge management functions performed. These include knowledge discovery, capturing, filtering and organising.

Sharing services: The knowledge created through the knowledge functions is to be shared among the users. These services support this process. They will include services that manage requests, responses and broadcasts.

Notification service(s): These services support the communications between the users.

The platform is to be a browser‐based application. This gives users access to the services they need to create their organisations and groups and to create and share knowledge. These services allow the users to access and manage SQL tables at the back end. The model will be tested by creating different scenarios that are then evaluated by experts. Our test should show that our generic model caters for the different collaborative scenarios in terms of knowledge creation and sharing.
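To give a feel for how the management services might sit on top of SQL tables, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The paper does not specify a schema or a database engine, so the table names (organisation, grp, role, user_role), the helper functions and the use of SQLite are all assumptions made for illustration. The sample data follows the “new course” scenario described for Figure 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE organisation (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE grp (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    organisation_id INTEGER NOT NULL REFERENCES organisation(id)
);
CREATE TABLE role (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    grp_id INTEGER NOT NULL REFERENCES grp(id)
);
CREATE TABLE user_role (
    user_name TEXT NOT NULL,
    role_id INTEGER NOT NULL REFERENCES role(id),
    PRIMARY KEY (user_name, role_id)
);
""")


def create_organisation(name):
    """Create an organisation and return its row id."""
    return conn.execute(
        "INSERT INTO organisation (name) VALUES (?)", (name,)).lastrowid


def create_group(name, organisation_id):
    """Create a group inside an organisation and return its row id."""
    return conn.execute(
        "INSERT INTO grp (name, organisation_id) VALUES (?, ?)",
        (name, organisation_id)).lastrowid


def create_role(name, grp_id):
    """Create a role inside a group and return its row id."""
    return conn.execute(
        "INSERT INTO role (name, grp_id) VALUES (?, ?)", (name, grp_id)).lastrowid


def assign_user(user_name, role_id):
    """Assign a user to a role."""
    conn.execute("INSERT INTO user_role VALUES (?, ?)", (user_name, role_id))


def resign_user(user_name, role_id):
    """Remove a user from a role."""
    conn.execute(
        "DELETE FROM user_role WHERE user_name = ? AND role_id = ?",
        (user_name, role_id))


org_id = create_organisation("new course")
group_id = create_group("discovering", org_id)
role_id = create_role("material-discoverer", group_id)
assign_user("John", role_id)
assign_user("David", role_id)
resign_user("David", role_id)

rows = conn.execute("""
    SELECT u.user_name, r.name, g.name, o.name
    FROM user_role u
    JOIN role r ON r.id = u.role_id
    JOIN grp g ON g.id = r.grp_id
    JOIN organisation o ON o.id = g.organisation_id
""").fetchall()
print(rows)  # [('John', 'material-discoverer', 'discovering', 'new course')]
```

Holding assignments in their own table means a role can be reassigned or resigned without touching the group or organisation rows, which is one way to support changing responsibilities without reprogramming.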

5. Summary and future research

The paper presents a model for facilitating knowledge management in complex business systems and illustrates how the model operates at a high level. It also describes the choice of technology to be used for implementation and the reason behind this choice. Semantics of all activities are to be defined. Accordingly, the services are to be defined and designed based on those semantics. Semantics are high‐level descriptions of how the model operates and how the relations between the different component sets are maintained. These semantics will define the operations that take place in the collaborative environment, including the semantics of coordination, management and KM activities.

Figure 5 illustrates how our model is to be developed over time. The figure shows that we start by working through business scenarios. We then define the business model and semantics accordingly. These semantics are applied again to the scenarios, so that the semantics can be evaluated and any changes needed can be made to the business model. The implemented services are to be tested through different scenarios and evaluated as well. Changes and modifications are made to the business model, semantics and services until the model satisfies the testing criteria.

There will be continuous evaluation across these three levels until the system reaches stability and a satisfactory state. In other words, the business model, semantics and technical model are subject to change until the system reaches stability. These services are to be implemented in the cloud to take advantage of the cloud computing environment. They will be implemented as a prototype for testing and evaluation.

Figure 5: Model development

References

Agrawal, D., Bernstein, P., Bertino, E., Davidson, S. & Dayal, U. 2011, Challenges and Opportunities with Big Data, Purdue University.

Alazmi, M. & Zairi, M. 2003, 'Knowledge management critical success factors', Total Quality Management & Business Excellence, vol. 14, no. 2, pp. 199‐204.

Ali, A. & Hawryszkiewycz, I. 2012, 'A Modelling Approach for Knowledge Management in Complex Business Systems', IADIS International Conference WWW/Internet, Madrid.

Ali, A., Hawryszkiewycz, I. & Chen, J. 2014, 'Services for Knowledge Sharing in Dynamic Business Networks', paper presented to the Australasian Software Engineering Conference, Sydney.

Awad, E.M. & Ghaziri, H.M. 2004, 'Working Smarter, Not Harder', in Knowledge Management, Pearson Education, Inc, New Jersey, pp. 24‐5.

Barnaghi, P., Sheth, A. & Henson, C. 2013, 'From Data to Actionable Knowledge: Big Data Challenges in the Web of Things', IEEE Intelligent Systems, vol. 28, no. 6, pp. 6‐11.

Birkinshaw, J. 2001, 'Why is Knowledge Management So Difficult?', Business Strategy Review, vol. 12, no. 1, pp. 11‐8.

Currie, G. & Maire, K. 2004, 'The Limits of a Technological Fix to Knowledge Management: Epistemological, Political and Cultural Issues in the Case of Intranet Implementation', Management Learning, vol. 35, no. 1, pp. 9‐29.

Dakilir, K. 2011, 'The Knowledge Management Cycle', in Knowledge Management in Theory and Practice, 2nd edn, Massachusetts Institute of Technology, London, pp. 31‐58.

Davenport, T.H., Barth, P. & Bean, R. 2012, 'How 'Big Data' Is Different', MIT Sloan Management Review, vol. 54, no. 1, pp. 42‐47.

Davenport, T.H., De Long, D.W. & Beers, M.C. 1998, 'Successful knowledge management projects', Sloan Management Review, vol. 39, no. 2, pp. 43‐57.

Fernandez, I. & Sabherwal, R. 2010, 'Knowledge Management Solutions: Processes and Systems', in Knowledge Management, Systems and Processes, M.E. Sharpe, Inc, New York, pp. 56‐70.

Fischer, G. & Ostwald, J. 2001, 'Knowledge Management: Problems, Promises, Realities, and Challenges', IEEE Intelligent Systems, vol. 16, no. 1, pp. 60‐72.

Hasgall, A.E. 2012, 'The effectiveness of social networks in complex adaptive working environments', Journal of Systems and Information Technology, vol. 14, no. 3, pp. 220‐35.

Kabir, N. & Carayannis, E. 2013, 'Big Data, Tacit Knowledge and Organizational', Journal of Intelligence Studies in Business, vol. 3, no. 3, pp. 54‐62.

Marston, S., Li, Z., Bandyopadhyay, S., Zhang, J. & Ghalsasi, A. 2011, 'Cloud computing ‐‐ The business perspective', Decision Support Systems, vol. 51, no. 1, pp. 176‐89.

McDermott, R. 1999, 'Why information technology inspired but cannot deliver knowledge management', California Management Review, vol. 41, no. 4, pp. 103‐17.

Miller, J.G. 1965, 'Living systems: Structure and process', Behavioral Science, vol. 10, no. 4, pp. 337‐79.

Moffett, S., McAdam, R. & Parkinson, S. 2003, 'Technology and people factors in knowledge management: an empirical analysis', Total Quality Management & Business Excellence, vol. 14, no. 2, pp. 215‐24.

Nunes, M.B., Annansingh, F., Eaglestone, B. & Wakefield, R. 2006, 'Knowledge management issues in knowledge‐intensive SMEs', Journal of Documentation, vol. 62, no. 1, pp. 101‐19.

Pillania, R.K. 2008, 'Strategic issues in knowledge management in small and medium enterprises', Knowledge Management Research & Practice, vol. 6, no. 4, pp. 334‐8.

Van Zolingen, S.J., Streumer, J.N. & Stooker, M. 2001, 'Problems in Knowledge Management: A Case Study of a Knowledge‐Intensive Company', International Journal of Training and Development, vol. 5, no. 3, pp. 168‐84.

Wong, K.Y. 2005, 'Critical success factors for implementing knowledge management in small and medium enterprises', Industrial Management & Data Systems, vol. 105, no. 3‐4, pp. 261‐79.

Copyright of Proceedings of the International Conference on Intellectual Capital, Knowledge Management & Organizational Learning is the property of Academic Conferences & Publishing International Ltd. and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.

BUSINESS INTELLIGENCE JOURNAL • VOL. 15, NO. 2

PREDICTIVE MODELING

Advances in Predictive Modeling: How In-Database Analytics Will Evolve to Change the Game

Sule Balkan and Michael Goul

Sule Balkan is clinical assistant professor at Arizona State University, department of information systems. [email protected]

Michael Goul is professor and chair at Arizona State University, department of information systems. [email protected]

Abstract

Organizations using predictive modeling will benefit from recent efforts in in-database analytics, especially when they become mainstream, and after the advantages evolve over time as adoption of these analytics grows. This article posits that most benefits will remain under-realized until campaigns apply and adapt these enhancements for improved productivity. Campaign managers and analysts will fashion in-database analytics (in conjunction with their database experts) to support their most important and arduous day-to-day activities. In this article, we review issues related to building and deploying analytics with an eye toward how in-database solutions advance the technology. We conclude with a discussion of how analysts will benefit when they take advantage of the tighter coupling of databases and predictive analytics tool suites, particularly in end-to-end campaign management.

Introduction

Decoupling data management from applications has provided significant advantages, mostly related to data independence. It is therefore surprising that many vendors are more tightly coupling databases and data warehouses with tool suites that support business intelligence (BI) analysts who construct and manage predictive models. These analysts and their teams construct and deploy models for guiding campaigns in areas such as marketing, fraud detection, and credit scoring, where unknown business patterns and/or inefficiencies can be discovered.

“In-database analytics” includes the embedding of predictive modeling functionalities into databases or data warehouses. It differs from “in-memory analytics,” which is designed to minimize disk access. In-database analytics focuses on the movement of data between the database or data warehouse and analysts’ workbenches. In the simplest form of in-database analytics, the computation of aggregates such as average, variance, and other statistical summaries can be performed by parallel database engines quickly and efficiently, especially in contrast to performing computations inside an analytics tool suite with comparatively slow file management systems. In tightly coupled environments, those aggregates can be passed from the data engine to the predictive modeling tool suite when building analytical models such as statistical regression models, decision trees, and even neural networks. In-database analytics also enable streamlining of modeling processes.
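The contrast between computing aggregates inside the database and computing them in a tool suite can be sketched with a toy example. The fragment below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for a parallel database engine, the purchases table and its values are invented, and population variance is derived as AVG(x*x) - AVG(x)^2 because SQLite has no built-in variance function.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, 10.0), (1, 20.0), (2, 5.0), (2, 15.0), (2, 25.0)],
)

# In-database: the engine computes the summaries, so only one row per
# customer has to move to the analyst's workbench.
in_db = conn.execute("""
    SELECT customer_id,
           COUNT(*) AS n,
           AVG(amount) AS mean,
           AVG(amount * amount) - AVG(amount) * AVG(amount) AS variance
    FROM purchases
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()

# Tool-side: every raw row is moved out of the database first, and the
# same summaries are then computed in the analytics environment.
raw_rows = conn.execute("SELECT customer_id, amount FROM purchases").fetchall()
by_customer = {}
for customer_id, amount in raw_rows:
    by_customer.setdefault(customer_id, []).append(amount)
tool_side = [
    (customer_id, len(values), statistics.mean(values), statistics.pvariance(values))
    for customer_id, values in sorted(by_customer.items())
]

print(len(raw_rows), "rows moved for the tool-side computation")
print(len(in_db), "rows moved for the in-database computation")
print(in_db[0])  # (1, 2, 15.0, 25.0)
```

Both paths produce the same summaries here; the difference the article points to is how much data crosses the boundary between the database and the workbench, which grows with the table size on the tool side but only with the number of groups on the in-database side.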

The typical modeling processes referred to as CRISP-DM, SEMMA, and KDD contain common BI steps or phases. Knowledge Discovery in Databases (KDD) refers to the broad process of finding knowledge using data mining (DM) methods (Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy, 1996). KDD relies on using a database along with any required preprocessing, sub-sampling, and transformation of values in that database. Another version of a DM process approach was developed by SAS Institute: Sample, Explore, Modify, Model, Assess (SEMMA) refers to the lifecycle of conducting a DM project.

Another approach, CRISP-DM, was developed by a consortium of Daimler Chrysler, SPSS, and NCR. It stands for CRoss-Industry Standard Process for Data Mining, and its cycle has six stages: business understanding, data understanding, data preparation, modeling, evaluation, and deployment (Azevedo and Santos, 2008). All three methodologies address data mining processes. Even though the three methodologies are different, their common objective is to produce BI by guiding the construction of predictive models based on historical data.

A traditional way of discussing methodologies for predictive analytics involves a “sense, assess, and respond” cycle that organizations and managers should apply in making effective decisions (Houghton, El Sawy, Gray, Donegan, and Joshi, 2004). Using historical data to enable managers to sense what is happening in the environment has been the foundation of the recent thrust to vitalize evidence-based management (Pfeffer and Sutton, 2006). Predictive models help managers assess and respond to the environment in ways that are informed by historical data and the patterns within that data. Predictive models help to scale responses because, for example, scoring models can be constructed to enable the embedding of decision rules into business processes. In-database analytics can streamline elements of the “sense, assess, and respond” cycle beyond those steps or phases in KDD, SEMMA, and CRISP-DM.

This article explains how basic in-database analytics will advance predictive modeling processes. However, we argue that the most important advancements will be discovered when actual campaigns are orchestrated and campaign managers access the new, more tightly coupled predictive modeling tool suites and database/data warehouse engines. We assert that the most important practical contribution of in-database analytics will occur when analysts are under pressure to produce models within time-constrained campaigns, and performances from earlier campaign steps need to be incorporated to inform follow-up campaign steps.

The next section discusses current impediments to predictive analytics and how in-database analytics will attempt to address them. We also discuss the benefits to be realized after more tightly coupled predictive analytics tool suites and databases/data warehouses become widely available. These benefits will be game-changers and will occur in such areas as end-to-end campaign management.

What is Wrong with Current Predictive Analytics Tool Suites?

Current analytics solutions require many steps and take a great deal of time. For analysts who build, maintain, deploy, and track predictive models, the work is spread across many processes (distributed among analysts, tool suites, and so on). This section discusses challenges that analysts face when building and deploying predictive models.

Time-Consuming Processes

To build a predictive model, an analyst may have to tap into many different data sources. Data sources must contain known values for target variables in order to be used when constructing a predictive model. All the attributes that might be independent variables in a model may reside in different tables or even different databases. It takes time and effort to collect and synthesize this data.

Once all of the needed data is merged, each of the independent variables is evaluated to ascertain the relations, correlations, patterns, and transformations that will be required. However, most of the data is not ready to be analyzed unless it has been appropriately customized. For example, character variables such as gender need to be manipulated, as do numeric variables such as ZIP code. Some continuous variables may need to be converted into scales. After all of this preparation, the modeling process continues through one of the many methodologies such as KDD, CRISP-DM, or SEMMA. For our purposes in this article, we will use SEMMA (see Figure 1).
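The kinds of customization mentioned above can be sketched as follows. This is only an illustration under assumed conventions: the field names, the 0/1 gender flag, the three-digit ZIP prefix, and the age bands are all hypothetical choices, not ones prescribed by the article.

```python
def prepare(record):
    """Turn one raw customer record (hypothetical fields) into model-ready features."""
    # Character variable: map gender codes to a numeric flag; unknown codes stay missing.
    gender_flags = {"F": 1, "M": 0}
    is_female = gender_flags.get(record["gender"].strip().upper())

    # Numeric variable that is really a label: treat the ZIP code as text,
    # restore leading zeros, and keep the three-digit regional prefix.
    zip3 = str(record["zip"]).zfill(5)[:3]

    # Continuous variable converted into a scale (assumed band boundaries).
    age = record["age"]
    if age < 30:
        age_band = "under_30"
    elif age < 50:
        age_band = "30_to_49"
    else:
        age_band = "50_plus"

    return {"is_female": is_female, "zip3": zip3, "age_band": age_band}


features = prepare({"gender": " f", "zip": 2139, "age": 42})
print(features)
```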

The first step of SEMMA is data sampling and data partitioning. A random sample is drawn from a population to prevent bias in the model that will be developed. Then, a modeling data set is partitioned into training and validation data sets. Next is the Explore phase, where each explanatory variable is evaluated and its associations with other variables are analyzed. This is a time-consuming step, especially if the problem at hand requires evaluating many independent variables.
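The Sample step can be illustrated with a short sketch. The 70/30 split, the fixed seed, and the function name are assumptions made for the example; SEMMA itself does not fix these values.

```python
import random


def sample_and_partition(population, sample_size, train_fraction=0.7, seed=42):
    """Draw a simple random sample, then split it into training and validation sets."""
    rng = random.Random(seed)                      # seeded so the draw is repeatable
    sample = rng.sample(population, sample_size)   # sampling without replacement
    cut = round(len(sample) * train_fraction)
    return sample[:cut], sample[cut:]


# Hypothetical population of 1,000 customer ids.
train, valid = sample_and_partition(list(range(1000)), 100)
print(len(train), len(valid))
```

Because `rng.sample` already returns the items in random order, slicing the sample is enough to produce a random partition.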

In the Modify phase, variables are transformed; outliers are identified and filtered; and for those variables that are not fully populated, missing value imputation strategies are determined. Rectifying and consolidating different analysts’ perspectives with respect to the Modify phase can be arduous and confusing. In addition, when applying transformations and inserting missing values in large data sets, a tool suite must apply operations to all observations and then store the resulting transformations within the tool suite’s file management system.

Many techniques can be used in the Model phase of SEMMA, such as regression analysis, decision trees, and neural networks. In constructing models, many tool suites suffer from slow file management systems, which can constrain the number and quality of models that an analyst can realistically construct.

The last phase of SEMMA is the Assess phase, where all models built in the modeling phase are assessed based on validation results. This process is handled within tool suites, and it takes considerable time and many steps to complete.

Multiple Versions and Sources of the Truth

Another difficulty in building and maintaining predictive models, especially in terms of campaign management, is the risk that modelers may be basing their analysis on multiple versions and sources of data. That base data is often referred to as the “truth,” and the problem is often referred to as having “multiple versions of the truth.”

To complete the time-consuming tasks of building predictive models as just described, each modeler extracts data from a data warehouse into an analytics workstation. This may create a situation where different modelers are working from different sources of truth, as modelers might extract data snapshots at different times (Gray and Watson, 2007). Also, having multiple modelers working on different predictive models can mean that each modeler is analyzing the data and creating different transformations from the same raw data without adopting a standardized method or a naming convention. This makes deploying multiple models very difficult, as the same raw data may be transformed in different ways using different naming conventions. It also makes transferring or sharing models across different business areas challenging.

Figure 1. SEMMA methodology supported by SAS Enterprise Mining environment. [Figure: five phases in sequence. SAMPLE: input data, sampling, data partition. EXPLORE: ranks-plots variable selection. MODIFY: transform variable, filter outliers, missing imputation. MODEL: regression, tree, neural network. ASSESS: assessment, score, report.]

Another difficulty relates to the computing resources on each modeler’s workbench when multiple modelers are going through similar, redundant steps of data preparation, transformation, segmentation, scoring, and all the other functions that can take a great deal of disk space and CPU time.

The Challenges of Leveraging Unstructured Data and Web Data Mining in Modeling Environments

Modelers often tap into readily available raw data in the database or data warehouse. However, unstructured data is rarely used during these phases because handling data in the form of text, e-mail documents, and images is computationally difficult and time consuming. Converting unstructured data into information is costly in a campaign management environment, so it isn’t often done. The challenges of creating reusable and repeatable variables for deployment make using unstructured data even more difficult.

Web data mining spiders and crawlers are often used to gather unstructured data. Current analyst tool suite processes for unstructured data require that modelers understand archaic processing commands expressed in specialized, non-standard syntax. There are impediments to both gathering and manipulating unstructured data, and there are difficulties in capturing and applying predictive models that deal with unstructured data. For example, clustering models may facilitate identifying rules for detecting what cluster a new document is most closely aligned with. However, exporting that clustering rule from the predictive modeling workbench into a production environment is very difficult.

Managing BI Knowledge Worker Training and Standardization of Processes

In most organizations, there is a centralized BI group that builds, maintains, and deploys multiple predictive models for different business units. This creates economies of scale, because having a centralized BI group is definitely more cost effective than the alternative. However, the economies of scale do not cascade into standardization of processes among analyst teams. Each individual contributor usually ends up with customized versions of code. Analysts may not be aware of the latest constructs others have advanced.

What Basic Changes Will In-Database Analytics Foster?

The major advantage of in-database analytics is the efficiency it brings to predictive model construction, made possible by the processing speed of parallel database/warehouse engines. Time savings are generated in the completion of computationally intensive modeling tasks. Faster transformations, missing-value imputations, model building, and assessment operations create opportunities by leaving more time available for fine-tuning model portfolios. Thanks to increasing cooperation between database/warehouse experts and predictive modeling practitioners, issues associated with non-standardized metadata may also be addressed. In addition, there is enhanced support for analyses of very large data sets. This couldn’t come at a better time, because data volumes are always growing.
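To make the idea of in-database transformations and missing-value imputation concrete, the sketch below applies both operations with SQL statements, so the cleaned data stays in the database rather than in a workbench file. SQLite is a stand-in for the warehouse engine; the `customers` table, the income cap of 1000, and mean imputation are assumptions chosen for illustration.

```python
import sqlite3

# Hypothetical customer table with one missing and one implausible income value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, income REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, 40.0), (2, None), (3, 60.0), (4, 5000.0), (5, 50.0)],
)

# Filter outliers: an assumed business rule treats income above 1000 as implausible.
conn.execute("DELETE FROM customers WHERE income > 1000")

# Impute missing values with the mean of the remaining populated rows
# (AVG ignores NULLs, so the missing row does not distort the mean).
conn.execute(
    "UPDATE customers SET income = (SELECT AVG(income) FROM customers) "
    "WHERE income IS NULL"
)

rows = conn.execute("SELECT id, income FROM customers ORDER BY id").fetchall()
print(rows)
```

Every modeler who later queries `customers` sees the same transformed values, which is one way the single-source-of-truth benefit could arise.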

In-database analytics make it easier to process and use unstructured data by converting complicated statistical processes into manageable queries. Tapping into unstructured data, creating repeatable and reusable information, and combining this into the model-building process may aid in constructing much better predictive models. For example, moving clustering rules into the database eliminates the difficulty of exporting these rules to and from tool suites. It also eliminates most temporary data storage difficulties for analyst workbenches.

Shared environments created by in-database analytics may bring business units together under common goals. As different business units tap into the same raw data, including all possible versions of transformations and metadata, productivity can be enhanced. When new ways of building models are available, updates can be made in-database. All individual contributors have access to the latest developments, and no single business unit or individual is left behind. Saving time in the labor-intensive steps of model building, working from a single source of truth, having access to repeatable and reusable structured and unstructured data, and making sure all the business units are working with the same standards and updates: together, these make it easier to transfer knowledge as new analysts join or move across business units. Table 1 summarizes the preliminary benefits of in-database analytics for modelers.

Context for In-Database Analytics Innovation

To drive measurable business results from predictive models, SEMMA (or a similar methodology) is followed by a deployment cycle. That cycle may involve the continued application of models in a (recurring) campaign, refinement when model performance results are used to revise other models, making decisions on whether completely new models are required given model performance, and so on. We distinguish deployment from the SEMMA-supported phase (intelligence) because deployment often engages the broader organization and requires a predictive model (or models) to be put into actual business use. This section introduces a new methodology we created to describe deployment: “DEEPER” (Design, Embed, Empower, Performance-measurement, Evaluate, and Re-target). Figure 2 depicts the iterative relationship between SEMMA and DEEPER.

The DEEPER phases delineate, in sequential fashion, the types of activities involved in model deployment with a special emphasis on campaign management. The design phase involves making plans for how to transition a scoring model (or models) from the tool suite (where it was developed) to actual application in a business context. It also involves thinking about how to capture the results of applying the model and storing those results for subsequent analysis. There may also be other data that a campaign manager wishes to capture, such as the time taken before seeing a response from a target. A proper design can eliminate missteps in a campaign. For example, if a targeted catalog mailing is enabled by a scoring model developed using SEMMA, then users must choose which deciles to target first, how to capture the results of the campaign (e.g., actual purchases or requests for new service), and what new data might be appropriate to capture during the campaign.

Figure 2. DEEPER phases guide the deployment, adoption, evaluation, and recalibration of predictive models. [Figure: two linked cycles. SEMMA (intelligence): Sample, Explore, Modify, Model, Assess. DEEPER (deployment): Design, Embed, Empower, Performance measurement, Evaluate, Re-target.]

Table 1. Preliminary benefits of in-database analytics

Process | Benefits
Data set creation and preparation | Reduce cycle time by parallel-processing multiple functions; accurate and timely completion of tasks by functional embedding
Data processing and model building by multiple analysts | Eliminate multiple versions of truth and large data set movements to and from analytical tool suites
Unstructured data management | Broaden analytics capability by streamlining repeatability and reusability
Training and standardization | Create operational and analytical efficiencies; access to latest developments; automatically update metadata


Once designed, the model must be accurately embedded into business processes. Model score views must be secured; developers must ensure scores appear in user interfaces at the right time; and process managers must be able to insert scores into automated business process logic. Embedding a predictive model may also require safeguards for exceptions: where a model should not be applied to a particular case, the business process needs a way to detect and handle that case.

Making the results of a predictive model (e.g., a score) available to people and systems is just the first step in ensuring it is used. In the empower phase, employees may need to be trained to interpret model results; they may have to learn to look at data in a certain way using new interfaces; or they may need to learn the benefits of evidence-based management approaches as supported by predictive modeling. Similarly, if people are involved, testing may be required to ensure that training approaches are working as intended. The empower step ensures appropriate behaviors by both systems and people as they pertain to the embedding of the predictive model into business processes.

A campaign begins in earnest after the empower phase. Targets receive their model-prescribed treatments, and reactions are collected as planned for in the design phase of DEEPER. This reactions-directed phase, performance measurement, involves ensuring the reactions and events subsequent to a predictive model’s application are captured and stored for later analysis. The results may also be captured and made available in real-time support for campaign managers. Dashboards may be appropriate for monitoring campaign progress, and alerts may support managers in making corrections should a campaign stray from an intended path. If there is an anomaly, or when a campaign has reached a checkpoint, campaign managers take time to evaluate the effectiveness or current progress of the campaign. The objective is to address questions such as:

■ Are error levels acceptable?

■ Were campaign results worth the investment in the predictive analytics solution?

■ How is actual behavior different from predicted behavior for a model or a model decile?

This is the phase when the campaign’s effectiveness and current progress are assessed.

The results of the evaluate phase of DEEPER may lead to a completely new modeling effort. This is depicted in Figure 2 by the gray background arrow leading from evaluate to the sample phase of SEMMA. This implies a transition from deployment back to what we have referred to as intelligence. However, there is not always time to return to the intelligence cycle, and minor alterations to a model might be deemed more appropriate than starting over. The latter decision is most prevalent in time-pressured, recurring campaigns. We refer to this phase as re-target, which requires analysts to take into account new information gathered as part of the performance measurement deployment phase. It also takes advantage of the plans for how this response information was encoded per the design phase of deployment.

The most important consideration involves interpreting results from the campaign and managing non-performing targets. A non-performing target is one that scored high in a predictive model, for example, but that did not respond as predicted. In a recurring campaign, there may be an effort to re-target that subset. There could also be an effort to re-target the campaign to another set of targets, e.g., those initially scored into other deciles. Re-targeting can be a time-consuming process; new data sets with response results need to be made available to predictive modeling tool suites, and findings from tracking need to be incorporated into decisions.

DEEPER provides the context for considering how improvements to in-database analytics can be game-changers. In-database analytics can make significant inroads to DEEPER processes that take time and are under-supported by predictive modeling tool suites. However, these improvements will be driven by analysts who work closely with their organizations’ database experts. This combination of analyst and data management skills, experience, and knowledge will spur innovation significantly beyond current expectations.

PREDICTIVE MODELING


How Might In-Database Analytics for DEEPER Evolve?

Extending in-database analytics to DEEPER processes requires considering how each DEEPER phase might be streamlined given tighter coupling between predictive modeling tool suites and databases/data warehouses. Although many of the advantages of this tighter coupling may be realized differently by different organizations, there are generic value streams to guide efforts. Here the phrase “value stream” refers to process flows central to DEEPER. This section discusses these generic value streams: (1) intelligence-to-plan, (2) plan-to-implementation, (3) implementation-to-use, (4) use-to-results, (5) results-to-evaluation, and (6) evaluation-to-decision.

In the design phase of DEEPER, planning can be facilitated by examining possible end-user database views that could be augmented with predictive intelligence. Instead of creating new interfaces, it is possible that Web pages equipped with embedded database queries can quickly retrieve and display predictive model scores to decision makers or front-line employees. Many of these displays are already incorporated into business processes, so opportunities to use the tables and queries to supply model results can streamline implementation. When additional data items need to be captured, that data may be captured at the point of sale or other customer touch points. A review of current metadata may speed up the design of a suitable deployment strategy. In addition to “pushing” model intelligence to interfaces, there may also be ways of “pulling” data from the database/warehouse to facilitate re-targeting or for initiating new SEMMA cycles.

For example, it may be possible to design queries to automate the retrieval of data items such as target response times from operational data stores. Similarly, it may be possible to use SQL to aggregate the information needed for this type of next-step analysis. For example, total sales to a customer within a specified time period can be aggregated using a query and then used in the re-targeting phase to reflect whether a target performed as predicted. In-database analytics can support the design phase because it eliminates many of the traditional bottlenecks such as complex requirements gathering and the creation of formal specification documents (including use cases). Instead, existing use cases can be reviewed and augmented, and database/warehouse-supported metadata facilities can support the design of schema for capturing new target response data. We refer to this as an intelligence-to-plan value stream for the in-database analytics supported design deployment phase.
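The sales aggregation described above might look like the following sketch. SQLite stands in for the warehouse; the `targets` and `sales` tables, the campaign window, and the rule that "performed as predicted" means "bought something if and only if the model predicted a purchase" are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE targets (customer_id INTEGER PRIMARY KEY, predicted_buyer INTEGER);
    CREATE TABLE sales (customer_id INTEGER, sale_date TEXT, amount REAL);
    INSERT INTO targets VALUES (1, 1), (2, 1), (3, 0);
    INSERT INTO sales VALUES
        (1, '2010-03-05', 80.0),
        (1, '2010-03-20', 20.0),
        (2, '2010-05-01', 50.0),   -- falls outside the campaign window
        (3, '2010-03-10', 15.0);
""")

# Total sales per target inside the window, plus a flag that compares the
# observed outcome (any sales at all) with the model's prediction.
query = """
    SELECT t.customer_id,
           COALESCE(SUM(s.amount), 0) AS total_sales,
           CASE WHEN (COALESCE(SUM(s.amount), 0) > 0) = (t.predicted_buyer = 1)
                THEN 1 ELSE 0 END AS performed_as_predicted
    FROM targets AS t
    LEFT JOIN sales AS s
           ON s.customer_id = t.customer_id
          AND s.sale_date BETWEEN ? AND ?
    GROUP BY t.customer_id, t.predicted_buyer
    ORDER BY t.customer_id
"""
results = conn.execute(query, ("2010-03-01", "2010-03-31")).fetchall()
print(results)
```

The `LEFT JOIN` keeps targets with no sales in the window, which is what allows non-performing targets to be identified for re-targeting.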

In the embed phase, transferring scored model results to tables is a first step in considering ways to make use of database/warehouse capabilities to support DEEPER. Once the scores are appropriately stored in tables, there are many opportunities to use queries to embed the scores into people-supported and automated business processes. For example, coding to retrieve scores for inclusion in front-line employee interfaces can be done in a manner consistent with other embedded SQL applications. This saves time in training interface developers because it implies that the same personnel who implemented the interfaces can effectively alter them to include new intelligence.
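A minimal sketch of this embedding step follows, assuming scores have already been written to a table. The `customers` and `model_scores` tables, the model name, and the `customer_panel` helper are hypothetical; the point is only that the interface query is ordinary SQL with one extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE model_scores (customer_id INTEGER, model_name TEXT, score REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Ben');
    INSERT INTO model_scores VALUES (1, 'catalog_response', 0.82);
""")


def customer_panel(conn, customer_id):
    """Return the row a front-line screen would show, with the score if one exists."""
    return conn.execute(
        """
        SELECT c.name, m.score
        FROM customers AS c
        LEFT JOIN model_scores AS m
               ON m.customer_id = c.customer_id
              AND m.model_name = 'catalog_response'
        WHERE c.customer_id = ?
        """,
        (customer_id,),
    ).fetchone()


print(customer_panel(conn, 1))
print(customer_panel(conn, 2))
```

A customer without a score still appears, with a missing score, so the existing screen keeps working for records the model has not yet covered.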

There is also no need for additional project governance functions or specialized software. In fact, database/warehouse triggers and alerts can be used to ensure that predictive analytics are used only when model deployment assumptions are relevant. As the database/warehouse is the same place where analytic model results reside, there are numerous implementation advantages. We refer to this as a plan-to-implementation value stream for the in-database analytics supported embed deployment phase.
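One way a trigger could guard a deployment assumption is sketched below. The assumption chosen here (scores must be probabilities between 0 and 1) and the table and trigger names are hypothetical; SQLite's trigger syntax is used as a stand-in for whatever the organization's database/warehouse provides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE model_scores (customer_id INTEGER, score REAL);

    -- Hypothetical guard: reject any score that violates the assumed 0..1 range.
    CREATE TRIGGER reject_invalid_score
    BEFORE INSERT ON model_scores
    WHEN NEW.score IS NULL OR NEW.score < 0 OR NEW.score > 1
    BEGIN
        SELECT RAISE(ABORT, 'score outside model assumptions');
    END;
""")

conn.execute("INSERT INTO model_scores VALUES (1, 0.4)")   # accepted

try:
    conn.execute("INSERT INTO model_scores VALUES (2, 1.7)")  # violates the guard
    rejected = False
except sqlite3.DatabaseError as exc:
    rejected = True
    print("rejected:", exc)

stored = conn.execute("SELECT COUNT(*) FROM model_scores").fetchone()[0]
print(stored)
```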

After implementation, testing will ensure that model results/scores are understandable to decision makers (the empower phase) and that their performance can scale when production systems are at high capacity. Such stress tests can be conducted in a manner similar to database view tests. Because of the inherent speed of database/warehouse systems, their performance will likely exceed separate, isolated workbench performance. Global roll-out can be eased by tried-and-true database/warehouse roll-out processes. We refer to this as an implementation-to-use value stream for the in-database analytics supported empower deployment phase.

Similarly, the use-to-results value stream is that part of a campaign when actions are taken and targets respond. In this performance measurement phase of deployment, dashboards can be used to track performance, database tables can automatically collect and store ongoing campaign results, queries can aggregate responses over time as part of automating responses, and many other in-database solutions can help to streamline related processes. This information is central to the evaluate phase, where the results-to-evaluation value stream can enable careful scrutiny of the predictive analytics model portfolio. Queries can be written to compare actual results to those predicted during SEMMA phases. When more than one model has been constructed in the SEMMA processes, all can be re-examined in light of the new information about responses. If-then statements can be embedded in queries to identify target segments that have responded according to business goals, and remaining non-responders can be quickly identified.
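The if-then logic mentioned above can be expressed with an SQL `CASE` expression, as in this sketch. The `campaign_results` table, the 0.5 score cutoff, and the segment labels are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE campaign_results (customer_id INTEGER, score REAL, responded INTEGER);
    INSERT INTO campaign_results VALUES
        (1, 0.9, 1), (2, 0.8, 0), (3, 0.2, 1), (4, 0.1, 0);
""")

# CASE is SQL's if-then construct; the 0.5 cutoff is a hypothetical business goal.
segments = conn.execute("""
    SELECT customer_id,
           CASE
               WHEN score >= 0.5 AND responded = 1 THEN 'responded as predicted'
               WHEN score >= 0.5 AND responded = 0 THEN 'non-responder'
               WHEN score <  0.5 AND responded = 1 THEN 'unexpected responder'
               ELSE 'no response, as predicted'
           END AS segment
    FROM campaign_results
    ORDER BY customer_id
""").fetchall()

for row in segments:
    print(row)
```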

Such analysis can be done for each analytical model in the portfolio and for each decile of predicted respondents associated with those models. This has been an enormously time-consuming process in the past, but the database/warehouse query engine can conduct this type of post-analysis efficiently. Queries can also identify subsets of respondents that outperformed the predicted model performance, as well as those that significantly under-performed. This type of analysis can be quickly supported through queries, and it can provide significant insight for the re-target phase.
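A decile-level comparison of predicted and actual response could be sketched as a single grouped query. The `scored_targets` table, its columns, and the sample values are hypothetical, and deciles are assumed to have been assigned when the targets were scored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scored_targets (
        customer_id INTEGER, decile INTEGER, predicted_prob REAL, responded INTEGER);
    INSERT INTO scored_targets VALUES
        (1, 1, 0.75, 1), (2, 1, 0.75, 1), (3, 1, 0.75, 1), (4, 1, 0.75, 0),
        (5, 2, 0.5, 0),  (6, 2, 0.5, 0),  (7, 2, 0.5, 1),  (8, 2, 0.5, 0);
""")

# For each decile: mean predicted response probability, observed response rate,
# and the gap between them (negative means the decile under-performed).
by_decile = conn.execute("""
    SELECT decile,
           AVG(predicted_prob)                  AS predicted_rate,
           AVG(responded)                       AS actual_rate,
           AVG(responded) - AVG(predicted_prob) AS gap
    FROM scored_targets
    GROUP BY decile
    ORDER BY decile
""").fetchall()

for row in by_decile:
    print(row)
```

Adding a model identifier to the table and to the `GROUP BY` clause would extend the same query across a portfolio of models.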

Following the results-to-evaluation value stream of the deployment cycle, the evaluation-to-decision value stream focuses on whether a new intelligence cycle (a repeat of SEMMA processes) is required. If performance results indicate major model failures, then a repeat is likely necessary to resurrect and continue a campaign. Even if there weren’t major failures, environmental changes such as economic conditions may have rendered models outdated. Data collected in the performance evaluation phase may help to streamline the decision process. If costs aren’t being recovered, then it is likely that either the campaign will cease or a new intelligence cycle is necessary.

Often a portfolio of models is created in the initial intelligence cycle. It may be possible to use queries to automate the process of recalculating the prior and anticipated performance of the models in the portfolio. If models exist that were not used but appear to perform better, those models may be used in the next DEEPER cycle. Alternatively, a combination or pooling of models might be most appropriate. Again, automated queries might be able to provide decision support for such pooling options, and they can aid in scheduling the appropriate model for the data sets as the DEEPER cycle progresses. In addition, it may be possible to use queries to apply business rules to manage data sets, and prior results could inform the scheduling of resting periods for targets such that each target isn’t inundated with catalog mailings, for example.
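A resting-period rule of this kind could be sketched as a query over mailing history. The `mailings` table, the 30-day rest period, and the `eligible_targets` helper are assumptions for the example, and SQLite's `julianday` function is used for the date arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mailings (customer_id INTEGER, mailed_on TEXT);
    INSERT INTO mailings VALUES
        (1, '2010-01-10'), (1, '2010-03-01'),
        (2, '2010-03-25'),
        (3, '2010-02-20');
""")


def eligible_targets(conn, as_of, rest_days):
    """Customers whose most recent mailing is at least `rest_days` before `as_of`."""
    rows = conn.execute(
        """
        SELECT customer_id
        FROM mailings
        GROUP BY customer_id
        HAVING julianday(?) - julianday(MAX(mailed_on)) >= ?
        ORDER BY customer_id
        """,
        (as_of, rest_days),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical rule: a target rests for 30 days after each catalog mailing.
eligible = eligible_targets(conn, "2010-04-01", 30)
print(eligible)
```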

Table 2 summarizes key generic value streams that can be supported by in-database analytics and briefly describes the possibilities discussed in this section. Opportunities to evolve in-database analytics are likely to be numerous.

Conclusion

In-database analytics creates an environment where functions are embedded and processed in parallel, thereby streamlining the steps of both intelligence (e.g., SEMMA) and deployment (e.g., DEEPER) cycles. As data sources are updated, attribute names and formats may change, yet they are sharable. In-database analytics can support quality checks and create warning messages if the range, format, and/or type of data differ from a previous version or model assumptions. If external data has attributes that were not in the data dictionary, metadata can be updated automatically. Data conversions can be handled in-database and only once instead of being repeated by multiple modelers. In-database analytics fosters stability, enhances efficiency, and improves productivity across business units.
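As an illustration of such a quality check, the sketch below flags rows in an incoming feed whose type or range differs from what a model assumes. The `feed` table, the expected integer type, and the 18 to 100 age range are hypothetical assumptions; SQLite's `typeof` function is used to inspect the stored type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The age column is declared without a type so SQLite stores values as given.
conn.execute("CREATE TABLE feed (customer_id INTEGER, age)")
conn.executemany(
    "INSERT INTO feed VALUES (?, ?)",
    [(1, 35), (2, "35 yrs"), (3, 140)],
)

# Assumed model expectations: age is an integer between 18 and 100.
warnings = conn.execute("""
    SELECT customer_id, warning
    FROM (
        SELECT customer_id,
               CASE
                   WHEN typeof(age) <> 'integer' THEN 'unexpected type: ' || typeof(age)
                   WHEN age NOT BETWEEN 18 AND 100 THEN 'value out of range'
               END AS warning
        FROM feed
    )
    WHERE warning IS NOT NULL
    ORDER BY customer_id
""").fetchall()

for row in warnings:
    print(row)
```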

In-database analytics will be critical to a company’s bottom line when models are deployed and there is time pressure for multiple, successive campaigns where ongoing results can be used to build updated, improved predictive models. Enhancements can be realized in a host of value streams. For example, in-database analytics can significantly reduce cycle times for rebuilding and redeploying updated models to meet campaign deadlines. As multiple models are constructed, in-database analytics will enable managing them as a portfolio. Timely responses, tracking, and fast interpretation of early responders to campaigns will enable companies to fine-tune business rules and react in record time.

As the fine line between intelligence and deployment cycles fades because of the fast-paced environment supported by in-database analytics, businesses may move away from the concept of campaign management into trigger-based, “lights-out” processing, where all data feeds are automatically updated and processed, and there is no need to compile data into periodic campaigns. There will be real-time decision making with instant scoring each time there is an update in one of the important independent variables. Analysts will spend their time fine-tuning model performance, building business rules, analyzing early results, monitoring data movements, and optimizing the use of multiple models, instead of dealing with the manual tasks of data preparation, data cleansing, and managing file movements and basic statistical processes that have been moved into the database/warehouse.

Although lights-out processing is not on the near-term horizon, the evolution of in-database analytics promises to move organizations in that direction. Once in the hands of analysts and their database/warehouse teams, in-database analytics will be a game-changer.

References

Azevedo, Ana, and Manuel Felipe Santos [2008]. “KDD, SEMMA and CRISP-DM: A Parallel Overview.” IADIS European Conference Data Mining, pp. 182–185.

Fayyad, U. M., Gregory Piatetsky-Shapiro, Padhraic Smyth, and Ramasamy Uthurusamy [1996]. Advances in Knowledge Discovery and Data Mining, AAAI Press/The MIT Press.

Gray, Paul, and Hugh J. Watson [2007]. “What Is New in BI,” Business Intelligence Journal, Vol. 12, No. 1.

Houghton, Bob, Omar A. El Sawy, Paul Gray, Craig Donegan, and Ashish Joshi [2004]. “Vigilant Information Systems for Managing Enterprises in Dynamic Supply Chains: Real-Time Dashboards at Western Digital,” MIS Quarterly Executive, Vol. 3, No. 1.

Pfeffer, Jeffrey, and Robert I. Sutton [2006]. “Evidence Based Management,” Harvard Business Review, January.

Table 2. Generic value streams and areas for innovation with in-database analytics

Value stream | Areas for innovation
Intelligence-to-plan | Planning is streamlined; push and pull strategies are feasible; schema design can support planning
Plan-to-implementation | Scores maintained in-database; embedded SQL in HTML can facilitate view deployment; triggers and alerts can be used to guard for exceptions
Implementation-to-use | Stress testing and global rollout follow database/warehouse methodologies and rely on common human and physical resources
Use-to-results | Dashboards can be readily adapted; database/warehouse tables can be used as response aggregators
Results-to-evaluation | Re-examine all created models efficiently in light of response information; embed if-then logic to re-target non-responders
Evaluation-to-decision | Consider applying different models; allow targeted respondents to “rest”; use database to provide decision support for deciding to re-target or re-enter the intelligence cycle
