
Services to Support Knowledge Sharing in Complex Business Networks, Big Data as the Source

Abdussalam Ali and Igor Hawryszkiewycz
University of Technology Sydney, Sydney, Australia
[email protected]
[email protected]

Abstract: Big Data has become a buzzword that refers to complex and massive data, either structured or unstructured, that is not easily captured and processed by traditional tools and software applications. The term refers to the large volume of data created through activities that use information and communication technology (ICT). In our research, Big Data is the big container to be explored to discover knowledge and information based on the searching context. Big Data may include both explicit and tacit sources of information and knowledge. Businesses should consider all sources in the business environment when discovering and capturing knowledge by accessing this data. Business networks, as a type of social network, are the mechanism for performing knowledge sharing and transfer for innovation and competitive advantage. The goal of our research is to design and implement a generic model of services that support and coordinate business networks to discover, capture, create and share knowledge. Implementing these services involves considering Big Data as the source of information and knowledge. This model is to be implemented as a platform of services in the cloud. Although the model and services are generic, the platform is to be customisable for a business’s special needs. These services will support businesses in creating collaborative environments and adapting to changes in the business environment while collaboration is in operation. A prototype of the services is to be implemented in the cloud environment. This prototype is to be tested by experts through a case study to measure the success and performance of the model.
Keywords: big data, knowledge sharing, business networks, cloud computing 

1. Introduction

Big Data has emerged through the development of information and communication technology (ICT) and the people using it. The term refers to the large volume of data and information around the world, which grows quickly through the use of advanced technology and huge storage systems.

Nearly 20 quintillion (20 × 10^18) bytes of data are produced every day, containing unstructured, semi-structured and structured content. This content includes text and multimedia such as voice, audio and images (Barnaghi, Sheth & Henson 2013). Organisations have started to explore this huge volume of data, which is not organised as it would be in a conventional database (Davenport, Barth & Bean 2012). Knowledge Management (KM) is one of the disciplines affected by this emergence. Businesses explore Big Data to capture information and create knowledge for innovation and competitive advantage.

In the last few decades businesses have started to introduce technology to support their knowledge management strategies, and KM systems have been developed accordingly. Many studies consider technology a success factor for KM, including Wong (2005), Moffett et al. (2003) and Davenport et al. (1998). The study of Moffett et al. (2003) is based on a survey of 1000 British companies. They found that most of these companies introduced technology as a main component to support KM. Wong's (2005) study, based on small and medium enterprises (SMEs), states that properly implemented technology is one of the main factors that support the success of KM. Alazmi and Zairi (2003) reviewed 15 studies to identify the critical factors affecting KM. Their findings show that components related to the technology factor represent the highest percentage (17%) compared with the other components mentioned in the literature.

Although these studies identify technology as critical to successful KM in the firm, many also report problems with the technology behind KM systems. These problems can be summarised as follows:

Technology serves as little more than a storage and repository of information. That is because KM systems are designed and implemented in the same way as traditional information systems (Currie & Maire 2004; Nunes et al. 2006; Birkinshaw 2001). In addition, IT systems store explicit knowledge more readily than tacit knowledge, as the tacit type is difficult to gather (Nunes et al. 2006; Birkinshaw 2001).


This leads to the argument that “social interaction” is overlooked when IT is introduced to support KM. Although social interaction is important for people to exchange information and knowledge, IT is treated as a replacement for it (Birkinshaw 2001). McDermott (1999) reports that knowledge differs from information in many respects. As a result, KM systems cannot be designed and implemented based on information systems concepts.

The other limitation is that KM systems are not kept up to date and do not cater for emerging needs (Fischer & Ostwald 2001; Van Zolingen, Streumer & Stooker 2001). One such emerging need is Big Data, which is the main source of knowledge and knowledge creation and contains both tacit and explicit sources of knowledge.

The aim of our research is to implement a generic model of services that support social interaction in knowledge sharing and transfer. From a Big Data perspective, these services will support knowledge discovery and capture. In our model we consider Big Data as the big container that holds all information and knowledge sources. This information and knowledge may be in soft or hard form, online or offline, and explicit or tacit. Services are to be implemented that help businesses explore and discover the knowledge that resides within Big Data. Section 2 “Big Data as the Knowledge Source” presents more information about this aspect.

The other component to be mentioned here is business networks. Business networks, as a type of social network, are the mechanism for performing knowledge sharing and transfer for innovation and competitive advantage. The model is intended to coordinate and manage these networks for creating and sharing knowledge. In addition, the model (Ali, Hawryszkiewycz & Chen 2014) gives businesses and business units the flexibility to quickly share and analyse knowledge to address emerging business needs in their environment. This reflects the fact that businesses today operate in complex environments. The services need to be generic and reconfigurable, as knowledge needs cannot be anticipated in today’s dynamic environment. Hasgall (2012) performed a questionnaire-based study to understand how effective social networks are in supporting organisations to adapt and respond to changes in their complex environment. The findings lead to the conclusion that social networks support employees by providing them with knowledge. This knowledge can be integrated into the firm and can increase the sensitivity of the workers to environmental changes.

Consequently, a flexible approach is needed where knowledge flows and responsibilities can be easily changed without the need to reprogram systems. A typical scenario may be a new partner entering a network, decisions to develop new products and services that require new expertise, or simply improving workflows. Each of these not only brings in new knowledge but often also requires the rearrangement of responsibility for processing the knowledge. Networks also exist within businesses where different business units network to create new products and services for business clients.

2. Big data as the knowledge source

Kabir and Carayannis (2013) characterise Big Data as data that is too large to be easily captured and analysed by traditional technology and tools. The authors consider Big Data a main and continuously growing resource for creating knowledge. This growth has many causes, including continuous innovation in IT hardware and software. It is a massive lake that can be used as a resource to create knowledge (Kabir & Carayannis 2013).

Agrawal et al. (2011) present the challenges of Big Data as a source of knowledge to support decision making and innovation. These challenges are:

Heterogeneity, which refers to differences in data format. Even if the captured data is in the same format, differences will exist in data structure and organisation.

Scale, as the main characteristic of Big Data is its size and volume. Managing these large volumes and their continuous growth is one of the challenges of Big Data.

Timeliness, which means that the speed and growth rate of data increase as new technology is introduced and developed. The challenge here is how to cope with this increasing growth rate when discovering and capturing the data.


Privacy, which is one of the big concerns in the context of Big Data. Dealing with privacy is both a technical and a social issue. Acquiring personal data, for example, raises many questions about privacy and the level at which this data can be used and published.

Sorting out how to solve these issues is not in our research scope. However, it may be included in our framework for future development. Big Data in our model is the big container to be explored to discover knowledge and information based on the searching context. Big Data may include both explicit and tacit sources of information and knowledge. The Internet, online databases, electronic spreadsheets, electronic documents, hard disk drives and offline printed documents and files all form part of the Big Data container. On the other hand, experts, skilled people, consultants, managers, workers and members of communities of practice are all examples of tacit sources in the Big Data container. Information in the Big Data container is presented in different formats, including database records, Word and PDF documents, video, audio and images. Capturing and analysing knowledge from these formats is done by specialised applications. In our research we may support these formats in terms of discovering them, indexing them and linking these sources with the knowledge created from them. A recommender service may be implemented in the future to support knowledge discovery by other explorers and users.

Kabir and Carayannis (2013) present their “Big Data Strategy Framework” as in Figure 1. 

 

Figure 1: Big data strategy framework (Kabir & Carayannis 2013) 

The authors present infrastructure, team building and the knowledge base as the main components of their framework. Infrastructure includes technology as one of its subcomponents. Teams should be created based on the business’s objectives. The knowledge that is created by businesses should support innovation and competitive advantage and be treated as new knowledge for future use and sharing (Kabir & Carayannis 2013).

Our model is not based on this framework, but the framework supports our previous argument that other factors, such as social factors and the environment, should be considered as well.

In conclusion, our aim is to design and implement a generic model of services that support and coordinate business networks to discover, capture, create and share knowledge. Implementing these services involves considering Big Data as the source of information and knowledge.

3. The proposed model

This paper proposes a model to manage discovering, capturing, organising and sharing knowledge between business networks within a complex environment. The paper sees knowledge sharing as predominantly a socio-technical issue and Big Data as the source of knowledge.

The model provides the flexibility needed in today’s environment. In our model, any business creates its own groups and organisations to gather knowledge and information. The organisation in our model is defined according to Living Systems Theory (LST) as a group of groups that deal with one or more gathering projects (Ali & Hawryszkiewycz 2012; Miller 1965). Each group within the organisation processes the knowledge and information. Thus if new groups are created to respond to some event, knowledge must quickly flow to these groups.


The strategies toward the model implementation can be described as follows: 

3.1 Generic knowledge management functions, elements and activities

Boundary roles must often define the knowledge elements to be managed and the assignment of the related responsibilities to roles in their business unit. As discussed in Ali et al. (2014), knowledge management functions (KMF) have been defined in many studies.

These include Fernandez and Sabherwal (2010), Awad and Ghaziri (2004) and Dalkir (2011). The functions can be described as follows:

Discovering: The process of finding where the knowledge resides.  

Gathering: Fernandez and Sabherwal (2010) define gathering as the process of obtaining knowledge from  the tacit (individuals) and explicit (such as manuals) sources.  

Filtering: The process of reducing the knowledge and/or information gathered by rejecting redundancy (Dalkir 2011).

Organising: The process of composing the knowledge so that it can be easily retrieved and used to make decisions (Awad & Ghaziri 2004).

Sharing: A way of transferring knowledge between individuals and groups (Awad & Ghaziri 2004; Fernandez & Sabherwal 2010).

In this paper we illustrate these functions by connecting them to Big Data, as shown in Figure 2.

Formally, there may be any number of knowledge elements, such as sales, purchases, proposals and so on. So we might define a knowledge element K(sale) or K(purchase). It may be the latest sale or some new idea. Each of these knowledge elements will go through the functions in Figure 2. We use the notation Discover(K(sale)), Gather(K(sale)) and Discover(K(purchase)). We call these knowledge processing activities: a knowledge processing activity is a knowledge processing function applied to a knowledge element. Thus any knowledge element goes through all the KMFs.

From Figure 2 the following points can be highlighted:

These functions are not sequential. For example, while organising, the user may ask for more captured or discovered knowledge.

While a specific function is running, the user can access Big Data for more knowledge.

Knowledge creation can happen at any stage. 
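The notation above can be made concrete with a short sketch. The following Python fragment is only an illustration under our own assumptions: the class names `KMFunction`, `KnowledgeElement` and `Activity` and the helper `all_activities` are hypothetical and do not come from the paper. It models a knowledge processing activity as one function applied to one element, and enumerates every activity that arises when each element goes through all the KMFs.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class KMFunction(Enum):
    """The five knowledge management functions listed in Section 3.1."""
    DISCOVER = "Discover"
    GATHER = "Gather"
    FILTER = "Filter"
    ORGANISE = "Organise"
    SHARE = "Share"


@dataclass(frozen=True)
class KnowledgeElement:
    """A named knowledge element, written K(name) in the text."""
    name: str

    def __str__(self) -> str:
        return f"K({self.name})"


@dataclass(frozen=True)
class Activity:
    """A knowledge processing activity: one function applied to one element."""
    function: KMFunction
    element: KnowledgeElement

    def __str__(self) -> str:
        return f"{self.function.value}({self.element})"


def all_activities(elements):
    """Pair every element with every function (full cross product)."""
    return [Activity(f, e) for e, f in product(elements, KMFunction)]


elements = [KnowledgeElement("sale"), KnowledgeElement("purchase")]
activities = all_activities(elements)
print(activities[0])    # Discover(K(sale))
print(len(activities))  # 10
```

The list carries no ordering constraint between activities, in line with the point that the functions are not sequential.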

 

Figure 2: Knowledge management functions 


3.2 Allocations

Our goal is to develop a framework that provides choices for changing allocations as systems evolve. The aim is to provide the flexibility to reconfigure the requirements as needs change, which lets networks share knowledge by assigning responsibilities. The following choices are possible:

Type  1  Allocation  (knowledge  management  function  specialists)  ‐  Allocate  all  activities  of  the  same  knowledge management function to one group. 

Type 2  Allocation  (knowledge  element  specialists)  ‐  Allocate  all  knowledge processing  activities  on  the  same knowledge element to one organisation. The organisation then distributes the different knowledge  processing activities to different groups. 

Type 3 Allocation ‐ Each functional unit has its own knowledge processing organisation or group.

Type 4 Allocation ‐ Totally open (hybrid).
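The difference between Type 1 and Type 2 allocation can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the platform's implementation: the function names `activities_for`, `allocate_type1` and `allocate_type2` and the group and organisation labels are hypothetical. Activities are represented as (function, element) pairs.

```python
from collections import defaultdict

FUNCTIONS = ["Discover", "Gather", "Filter", "Organise", "Share"]


def activities_for(elements):
    """All (function, element) pairs: each element passes through each function."""
    return [(f, e) for e in elements for f in FUNCTIONS]


def allocate_type1(activities):
    """Type 1: all activities of the same function go to one group."""
    groups = defaultdict(list)
    for function, element in activities:
        groups[f"{function} group"].append((function, element))
    return dict(groups)


def allocate_type2(activities):
    """Type 2: all activities on the same element go to one organisation,
    which then distributes them to a separate group per function."""
    organisations = defaultdict(dict)
    for function, element in activities:
        organisations[f"{element} organisation"][f"{function} group"] = (function, element)
    return dict(organisations)


acts = activities_for(["sale", "purchase"])
type1 = allocate_type1(acts)
type2 = allocate_type2(acts)
print(type1["Gather group"])  # [('Gather', 'sale'), ('Gather', 'purchase')]
print(sorted(type2))          # ['purchase organisation', 'sale organisation']
```

Under Type 1 the grouping key is the function, so one group sees every element; under Type 2 the grouping key is the element, so one organisation sees every function for that element.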

 

Figure 3: Type 1 allocation 

Allocations are made at two levels: the knowledge activity is allocated to a group, and action tasks are then allocated to roles in the group.

As an example, the model for Type 1 allocation is shown in Figure 3. Figure 3 illustrates an organisation of three groups that gathers knowledge by assigning roles to them.

Some actors participate in more than one network. For example, the coordinators participate in the organisational network and also in the group network.

The model does not at this stage include the agencies used in the exchange of information. The goal is to create these agencies through a cloud platform.

The output of the research is a platform based on cloud technology that supports these roles within the groups. The major functions of this platform are as follows:

Creating and dissolving groups.

Supporting group members to access the platform’s services to finish the role’s tasks.  

Enabling knowledge sharing between the groups and organisations. 

Supporting collaboration between businesses and enterprises. 

4. Model implementation

Our goal is to implement the model in the cloud environment to support knowledge sharing across a broad community.

The implementation will create the services that implement the relationships between roles within business networks for effective knowledge discovery, gathering and sharing. These services can be categorised as administrative services and processing services. Administrative services are the modules that support administrative activities. Examples are creating groups and organisations, creating roles, assigning roles to and resigning them from organisations and groups, and creating users and assigning them to roles.

 

Figure 4: High level illustration of the model 

The processing services are the modules that are accessed by the users within the groups and organisations. These services support the knowledge management functions shown in Figure 2. Our model should provide services for businesses to create their collaborative environments, either within the business itself or in collaboration with other businesses. The services are intended to give the business the capability to adapt to changes that occur in the business itself or in its environment. This adaptation includes managing groups, organisations, roles and users as well as managing the relationships between these sets. Knowledge management functions and activities should be considered in this operation.

Our goal is to make these services configurable, so that newly required services can be implemented upon business request and added to the system at any time. In addition, roles and their responsibilities can be modified as well.

Figure 4 illustrates a scenario of gathering knowledge for a “new course project” in a faculty.

Big Data is the container of sources, as defined previously, that users can explore for knowledge based on the defined knowledge elements. The cloud environment will contain the services that support the knowledge management functions. The collaborative organisation “new course” discovers, captures and organises the knowledge. This knowledge is based on the knowledge elements defined as “subject material” and “market and prices”. Services in the cloud will help businesses to:

Create organisations and groups (e.g. discovering, capturing and organising).

Create roles (e.g. coordinator, material-discoverer and organiser).

Assign these roles to the groups/organisations.

Assign users (e.g. John, David, Mac) to these roles.

Support the knowledge management functions and processes.

The services to be implemented are not considered as interfaces between the users and Big Data, as shown in Figure 4. Rather, they support users in exploring Big Data and in creating and sharing knowledge among themselves.
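The administrative operations in the “new course” scenario can be sketched as a small in-memory model. The Python below is a hedged illustration only: the classes `Organisation` and `Group` and their method names are hypothetical, and the pairing of particular users with particular roles is our assumption, since the scenario lists the names without stating who holds which role.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    name: str
    # role name -> set of user names holding that role in this group
    roles: dict = field(default_factory=dict)


@dataclass
class Organisation:
    name: str
    knowledge_elements: list
    groups: dict = field(default_factory=dict)

    def create_group(self, name):
        self.groups[name] = Group(name)
        return self.groups[name]

    def assign_role(self, group_name, role):
        """Attach a role to a group; no users hold it yet."""
        self.groups[group_name].roles.setdefault(role, set())

    def resign_role(self, group_name, role):
        """Detach a role (and its users) from a group."""
        self.groups[group_name].roles.pop(role, None)

    def assign_user(self, group_name, role, user):
        group = self.groups[group_name]
        if role not in group.roles:
            raise KeyError(f"role {role!r} is not assigned to group {group_name!r}")
        group.roles[role].add(user)


org = Organisation("new course", ["subject material", "market and prices"])
for group_name in ("discovering", "capturing", "organising"):
    org.create_group(group_name)

# Hypothetical user-to-role pairing, for illustration only.
org.assign_role("discovering", "material-discoverer")
org.assign_user("discovering", "material-discoverer", "John")
org.assign_role("organising", "organiser")
org.assign_user("organising", "organiser", "David")
org.resign_role("organising", "organiser")  # reconfigure as needs change
```

Keeping roles as data attached to groups, rather than as code, is one way to let responsibilities change without reprogramming, which is the flexibility the model asks for.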


4.1 Technology

Cloud technology is the infrastructure to be used for implementing our model, in order to gain its advantages. This can be achieved by implementing the model as a platform of services delivered to the beneficiaries as Software as a Service (SaaS). The following are some advantages that are relevant to our research (Marston et al. 2011):

1. Low cost of using cloud services. This allows small and medium businesses to benefit from these services. Knowledge management receives little attention in small and medium businesses (Pillania 2008), and one of the reasons is the high cost of dedicated knowledge management applications and systems (Nunes et al. 2006).

2. Large capacity. One of the features of the cloud is that it provides large amounts of data storage. In our model, businesses will not need to be concerned about continuous scalability, whether in the number of organisations and groups created within the system or in the amount of knowledge produced.

3. Mobility. Cloud computing is an online technology. This allows people to share knowledge and participate in organisations and groups even if they are in different and distant geographic areas.

Services to be implemented can be categorised as follows:

Management services: These services support the creation of the objects and components of the model and the maintenance of the relationships between them. That includes managing and maintaining the groups and organisations, roles, users and knowledge elements.

Processing services: These services support the knowledge management functions performed. These include knowledge discovery, capturing, filtering and organising.

Sharing services: The knowledge created through the knowledge functions is to be shared among the users. These services support this process and will include services that manage requests, responses and broadcasts.

Notification service(s): These services support the communications between the users. 

The platform is to be a browser-based application. This gives users access to the services they use to create their organisations and groups and to create and share knowledge. These services allow the users to access and manage SQL tables at the back end.

The model will be tested by creating different scenarios that are evaluated by experts. Our test should confirm that our generic model caters for the different collaborative scenarios in terms of knowledge creation and sharing.
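As an illustration of the SQL back end mentioned for the browser-based platform, the sketch below lays out one possible set of tables using Python's standard `sqlite3` module. The table and column names (`organisation`, `work_group`, `role_assignment`) are assumptions made for this example; the paper does not specify a schema, and an in-memory SQLite database merely stands in for whatever cloud database the platform would use.

```python
import sqlite3

# Hypothetical schema: organisations contain groups, and users hold roles in groups.
SCHEMA = """
CREATE TABLE organisation (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE work_group (
    id              INTEGER PRIMARY KEY,
    organisation_id INTEGER NOT NULL REFERENCES organisation(id),
    name            TEXT NOT NULL
);
CREATE TABLE role_assignment (
    group_id  INTEGER NOT NULL REFERENCES work_group(id),
    role      TEXT NOT NULL,
    user_name TEXT NOT NULL,
    PRIMARY KEY (group_id, role, user_name)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

org_id = conn.execute(
    "INSERT INTO organisation (name) VALUES (?)", ("new course",)
).lastrowid
group_id = conn.execute(
    "INSERT INTO work_group (organisation_id, name) VALUES (?, ?)",
    (org_id, "discovering"),
).lastrowid
conn.execute(
    "INSERT INTO role_assignment VALUES (?, ?, ?)",
    (group_id, "material-discoverer", "John"),
)

# Who holds which role, in which group and organisation?
rows = conn.execute(
    """
    SELECT o.name, g.name, a.role, a.user_name
    FROM role_assignment a
    JOIN work_group g ON g.id = a.group_id
    JOIN organisation o ON o.id = g.organisation_id
    """
).fetchall()
print(rows)  # [('new course', 'discovering', 'material-discoverer', 'John')]
```

Because assignments are rows rather than code, resigning a role or moving a user is a single `DELETE` or `UPDATE`, which fits the goal of reconfiguring the system as needs change.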

5. Summary and future research

The paper presents a model for facilitating knowledge management in complex business systems. It illustrates how the model operates at a high level. It also outlines the choice of technology to be used for implementation and the reason behind this choice.

The semantics of all activities are to be defined, and the services are then to be defined and designed based on those semantics. Semantics are high-level descriptions of how the model operates and how the relations between the different component sets are maintained. These semantics will define the operations that take place in the collaborative environment, including the semantics of coordination, management and KM activities.

Figure 5 illustrates how our model is to be developed over time. The figure shows that we start by working through business scenarios. We then define the business model and semantics accordingly. These semantics are applied to the scenarios again, so that the semantics can be evaluated and any necessary changes made to the business model. The implemented services are to be tested through different scenarios and evaluated as well. Changes are made to the business model, semantics and services until the model satisfies the testing criteria.


There will be continuous evaluation across these three levels until the system is stable and satisfactory. In other words, the business model, semantics and technical model are subject to change until the system reaches stability.

These services are to be implemented in the cloud to take advantage of the cloud computing environment. They will be implemented as a prototype for testing and evaluation.

 

Figure 5: Model development 

References 

Agrawal, D., Bernstein, P., Bertino, E., Davidson, S. & Dayal, U. 2011, Challenges and Opportunities with Big Data, Purdue  University. 

Alazmi, M. & Zairi, M. 2003, 'Knowledge management critical success factors', Total Quality Management & Business  Excellence, vol. 14, no. 2, pp. 199‐204. 

Ali, A. & Hawryszkiewycz, I. 2012, 'A Modelling Approach for Knowledge Management in Complex Business Systems', IADIS International Conference WWW/Internet, Madrid.

Ali, A., Hawryszkiewycz, I. & Chen, J. 2014, 'Services for Knowledge Sharing in Dynamic Business Networks', paper  presented to the Australasian Software Engineering Conference, Sydney. 

Awad, E.M. & Ghaziri, H.M. 2004, 'Working Smarter, Not Harder', in, Knowledge Management, Pearson Education, Inc,  New Jersey, pp. 24‐5. 

Barnaghi, P., Sheth, A. & Henson, C. 2013, 'From Data to Actionable Knowledge: Big Data Challenges in the Web of Things',  IEEE Intelligent Systems, vol. 28, no. 6, pp. 6‐11. 

Birkinshaw, J. 2001, 'Why is Knowledge Management So Difficult?', Business Strategy Review, vol. 12, no. 1, pp. 11‐8.

Currie, G. & Maire, K. 2004, 'The Limits of a Technological Fix to Knowledge Management: Epistemological, Political and Cultural Issues in the Case of Intranet Implementation', Management Learning, vol. 35, no. 1, pp. 9‐29.

Dalkir, K. 2011, 'The Knowledge Management Cycle', in, Knowledge Management in Theory and Practice, 2nd edn, Massachusetts Institute of Technology, London, pp. 31‐58.

Davenport, T.H., Barth, P. & Bean, R. 2012, 'How 'Big Data' Is Different', MIT Sloan Management Review, vol. 54, no. 1, pp. 42‐47.

Davenport, T.H., De Long, D.W. & Beers, M.C. 1998, 'Successful knowledge management projects', Sloan Management Review, vol. 39, no. 2, pp. 43‐57.

Fernandez, I. & Sabherwal, R. 2010, 'Knowledge Management Solutions: Processes and Systems', in, Knowledge Management, Systems and Processes, M.E. Sharpe, Inc, New York, pp. 56‐70.

Fischer, G. & Ostwald, J. 2001, 'Knowledge Management: Problems, Promises, Realities, and Challenges', IEEE Intelligent Systems, vol. 16, no. 1, pp. 60‐72.

Hasgall, A.E. 2012, 'The effectiveness of social networks in complex adaptive working environments', Journal of Systems and Information Technology, vol. 14, no. 3, pp. 220‐35.

Kabir, N. & Carayannis, E. 2013, 'Big Data, Tacit Knowledge and Organizational', Journal of Intelligence Studies in Business, vol. 3, no. 3, pp. 54‐62.

Marston, S., Li, Z., Bandyopadhyay, S., Zhang, J. & Ghalsasi, A. 2011, 'Cloud computing ‐‐ The business perspective', Decision Support Systems, vol. 51, no. 1, pp. 176‐89.


McDermott, R. 1999, 'Why information technology inspired but cannot deliver knowledge management', California Management Review, vol. 41, no. 4, pp. 103‐17.

Miller, J.G. 1965, 'Living systems: Structure and process', Behavioral Science, vol. 10, no. 4, pp. 337‐79.

Moffett, S., McAdam, R. & Parkinson, S. 2003, 'Technology and people factors in knowledge management: an empirical analysis', Total Quality Management & Business Excellence, vol. 14, no. 2, pp. 215‐24.

Nunes, M.B., Annansingh, F., Eaglestone, B. & Wakefield, R. 2006, 'Knowledge management issues in knowledge‐intensive SMEs', Journal of Documentation, vol. 62, no. 1, pp. 101‐19.

Pillania, R.K. 2008, 'Strategic issues in knowledge management in small and medium enterprises', Knowledge Management Research & Practice, vol. 6, no. 4, pp. 334‐8.

Van Zolingen, S.J., Streumer, J.N. & Stooker, M. 2001, 'Problems in Knowledge Management: A Case Study of a Knowledge‐Intensive Company', International Journal of Training and Development, vol. 5, no. 3, pp. 168‐84.

Wong, K.Y. 2005, 'Critical success factors for implementing knowledge management in small and medium enterprises', Industrial Management & Data Systems, vol. 105, no. 3‐4, pp. 261‐79.


Copyright of Proceedings of the International Conference on Intellectual Capital, Knowledge Management & Organizational Learning is the property of Academic Conferences & Publishing International Ltd. and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.


BUSINESS INTELLIGENCE JOURNAL • VOL. 15, NO. 2

PREDICTIVE MODELING

Advances in Predictive Modeling: How In-Database Analytics Will Evolve to Change the Game

Sule Balkan and Michael Goul

Abstract

Organizations using predictive modeling will benefit from recent efforts in in-database analytics, especially when they become mainstream, and after the advantages evolve over time as adoption of these analytics grows. This article posits that most benefits will remain under-realized until campaigns apply and adapt these enhancements for improved productivity. Campaign managers and analysts will fashion in-database analytics (in conjunction with their database experts) to support their most important and arduous day-to-day activities. In this article, we review issues related to building and deploying analytics with an eye toward how in-database solutions advance the technology. We conclude with a discussion of how analysts will benefit when they take advantage of the tighter coupling of databases and predictive analytics tool suites, particularly in end-to-end campaign management.

Introduction

Decoupling data management from applications has provided significant advantages, mostly related to data independence. It is therefore surprising that many vendors are more tightly coupling databases and data warehouses with tool suites that support business intelligence (BI) analysts who construct and manage predictive models. These analysts and their teams construct and deploy models for guiding campaigns in areas such as marketing, fraud detection, and credit scoring, where unknown business patterns and/or inefficiencies can be discovered.

Sule Balkan is clinical assistant professor at Arizona State University, department of information systems.
[email protected]

Michael Goul is professor and chair at Arizona State University, department of information systems.
[email protected]

“In-database analytics” includes the embedding of predictive modeling functionalities into databases or data warehouses. It differs from “in-memory analytics,” which is designed to minimize disk access. In-database analytics focuses on the movement of data between the database or data warehouse and analysts’ workbenches. In the simplest form of in-database analytics, the computation of aggregates such as average, variance, and other statistical summaries can be performed by parallel database engines quickly and efficiently, especially in contrast to performing computations inside an analytics tool suite with comparatively slow file management systems. In tightly coupled environments, those aggregates can be passed from the data engine to the predictive modeling tool suite when building analytical models such as statistical regression models, decision trees, and even neural networks. In-database analytics also enable streamlining of modeling processes.
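The contrast between computing aggregates inside the database and computing them on the analyst's workbench can be illustrated with a small sketch. The Python below uses the standard `sqlite3` module purely as a stand-in: SQLite is not a parallel database engine, and the `purchases` table and its values are invented for the example. The point is only the data movement: the first query returns a single summary row, while the second path pulls every row out before computing the same statistics.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, 10.0), (1, 20.0), (2, 30.0), (2, 40.0)],
)

# In-database: the engine scans the rows and returns one summary row.
n, mean, mean_sq = conn.execute(
    "SELECT COUNT(*), AVG(amount), AVG(amount * amount) FROM purchases"
).fetchone()
variance_in_db = mean_sq - mean * mean  # population variance, E[x^2] - E[x]^2

# Workbench-side: every row is moved out of the database first.
amounts = [row[0] for row in conn.execute("SELECT amount FROM purchases")]
mean_local = statistics.mean(amounts)
variance_local = statistics.pvariance(amounts)

print(n, mean, variance_in_db)      # 4 25.0 125.0
print(mean_local, variance_local)   # 25.0 125.0
```

Both paths give the same numbers; with four rows the difference is invisible, but with a large table the first path transfers one row and the second transfers all of them. Note that the `E[x^2] - E[x]^2` form can lose precision on large values, so a production system would likely rely on the engine's own variance function where one exists.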

The typical modeling processes referred to as CRISP-DM, SEMMA, and KDD contain common BI steps or phases. Knowledge Discovery in Databases (KDD) refers to the broad process of finding knowledge using data mining (DM) methods (Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy, 1996). KDD relies on using a database along with any required preprocessing, sub-sampling, and transformation of values in that database. Another version of a DM process approach was developed by SAS Institute: Sample, Explore, Modify, Model, Assess (SEMMA) refers to the lifecycle of conducting a DM project.

Another approach, CRISP-DM, was developed by a consortium of Daimler Chrysler, SPSS, and NCR. It stands for CRoss-Industry Standard Process for Data Mining, and its cycle has six stages: business understanding, data understanding, data preparation, modeling, evaluation, and deployment (Azevedo and Santos, 2008). All three methodologies address data mining processes. Even though the three methodologies are different, their common objective is to produce BI by guiding the construction of predictive models based on historical data.

A traditional way of discussing methodologies for predictive analytics involves a “sense, assess, and respond” cycle that organizations and managers should apply in making effective decisions (Houghton, El Sawy, Gray, Donegan, and Joshi, 2004). Using historical data to enable managers to sense what is happening in the environment has been the foundation of the recent thrust to vitalize evidence-based management (Pfeffer and Sutton, 2006). Predictive models help managers assess and respond to the environment in ways that are informed by historical data and the patterns within that data. Predictive models help to scale responses because, for example, scoring models can be constructed to enable the embedding of decision rules into business processes. In-database analytics can streamline elements of the “sense, assess, and respond” cycle beyond those steps or phases in KDD, SEMMA, and CRISP-DM.

This article explains how basic in-database analytics will advance predictive modeling processes. However, we argue that the most important advancements will be discovered when actual campaigns are orchestrated and campaign managers access the new, more tightly coupled predictive modeling tool suites and database/data warehouse engines. We assert that the most important practical contribution of in-database analytics will occur when analysts are under pressure to produce models within time-constrained campaigns, and performances from earlier campaign steps need to be incorporated to inform follow-up campaign steps.

The next section discusses current impediments to predictive analytics and how in-database analytics will attempt to address them. We also discuss the benefits to be realized after more tightly coupled predictive analytics tool suites and databases/data warehouses become widely available. These benefits will be game-changers and will occur in such areas as end-to-end campaign management.

What is Wrong with Current Predictive Analytics Tool Suites? Current analytics solutions require many steps and take a great deal of time. For analysts who build, maintain, deploy, and track predictive models, the process consists of many distributed processes (distributed among analysts, tool suites, and so on). This section discusses challenges that analysts face when building and deploying predictive models.

PREDICTIVE MODELING

Time-Consuming Processes To build a predictive model, an analyst may have to tap into many different data sources. Data sources must contain known values for target variables in order to be used when constructing a predictive model. All the attributes that might be independent variables in a model may reside in different tables or even different databases. It takes time and effort to collect and synthesize this data.

Once all of the needed data is merged, each of the independent variables is evaluated to ascertain the relations, correlations, patterns, and transformations that will be required. However, most of the data is not ready to be analyzed unless it has been appropriately customized. For example, character variables such as gender need to be manipulated, as do numeric variables such as ZIP code. Some continuous variables may need to be converted into scales. After all of this preparation, the modeling process continues through one of the many methodologies such as KDD, CRISP-DM, or SEMMA. For our purposes in this article, we will use SEMMA (see Figure 1).

The first step of SEMMA is data sampling and data partitioning. A random sample is drawn from a population to prevent bias in the model that will be developed. Then, a modeling data set is partitioned into training and validation data sets. Next is the Explore phase, where each explanatory variable is evaluated and its associations with other variables are analyzed. This is a time-consuming step, especially if the problem at hand requires evaluating many independent variables.
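The Sample step described above can be sketched in a few lines of Python. This is an illustrative sketch, not the procedure of any particular tool suite: the function name, the 70/30 split, and the fixed seed are all assumptions made for the example.

```python
import random

def sample_and_partition(population_ids, sample_size, train_fraction=0.7, seed=42):
    """Draw a simple random sample, then split it into training and validation sets."""
    rng = random.Random(seed)  # fixed seed so the partition is reproducible
    # Sampling without replacement; the result is already in random order,
    # so slicing it yields a random partition.
    sample = rng.sample(population_ids, sample_size)
    cut = round(len(sample) * train_fraction)
    return sample[:cut], sample[cut:]

# Hypothetical population of 1,000 customer identifiers.
train_ids, validation_ids = sample_and_partition(list(range(1000)), sample_size=200)
```

Recording the seed alongside the model is one simple way to let several analysts reproduce the same partition.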

In the Modify phase, variables are transformed; outliers are identified and filtered; and for those variables that are not fully populated, missing value imputation strategies are determined. Rectifying and consolidating different analysts’ perspectives with respect to the Modify phase can be arduous and confusing. In addition, when applying transformations and inserting missing values in large data sets, a tool suite must apply operations to all observations and then store the resulting transformations within the tool suite’s file management system.
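For contrast with the tool-suite approach just described, the following sketch shows one imputation strategy (replacing missing values with the column mean) expressed as a single set-based statement that runs inside the database. SQLite stands in for a data warehouse, and the table `modeling_set` and column `income` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE modeling_set (id INTEGER PRIMARY KEY, income REAL)")
conn.executemany(
    "INSERT INTO modeling_set (income) VALUES (?)",
    [(40000.0,), (None,), (60000.0,), (None,)],
)

# Mean imputation in one statement; AVG ignores NULLs, so the mean is taken
# over the populated rows only. No rows leave the database.
conn.execute(
    "UPDATE modeling_set "
    "SET income = (SELECT AVG(income) FROM modeling_set) "
    "WHERE income IS NULL"
)
imputed = [row[0] for row in conn.execute("SELECT income FROM modeling_set ORDER BY id")]
```

Because the statement is stored SQL rather than per-analyst workbench code, every modeler who runs it applies the same imputation rule to the same data.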

Many techniques can be used in the Model phase of SEMMA, such as regression analysis, decision trees, and neural networks. In constructing models, many tool suites suffer from slow file management systems, which can constrain the number and quality of models that an analyst can realistically construct.

The last phase of SEMMA is the Assess phase, where all models built in the modeling phase are assessed based on validation results. This process is handled within tool suites, and it takes considerable time and many steps to complete.

Multiple Versions and Sources of the Truth Another difficulty in building and maintaining predictive models, especially in terms of campaign management, is the risk that modelers may be basing their analysis on multiple versions and sources of data. That base data is often referred to as the “truth,” and the problem is often referred to as having “multiple versions of the truth.”

To complete the time-consuming tasks of building predictive models as just described, each modeler extracts data from a data warehouse into an analytics workstation. This may create a situation where different modelers are working from different sources of truth, as modelers might extract data snapshots at different times (Gray and Watson, 2007). Also, having multiple modelers working on different predictive models can mean that each modeler is analyzing the data and creating different transformations from the same raw data without adopting a standardized method or a naming convention. This makes deploying multiple models very difficult, as the same raw data may be transformed in different ways using different naming conventions. It also makes transferring or sharing models across different business areas challenging.

Figure 1. SEMMA methodology supported by SAS Enterprise Mining environment. Phases shown in the figure: SAMPLE (input data, sampling, data partition); EXPLORE (ranks-plots, variable selection); MODIFY (transform variable, filter outliers, missing imputation); MODEL (regression, tree, neural network); ASSESS (assessment, score, report).

Another difficulty relates to the computing resources on each modeler’s workbench when multiple modelers are going through similar, redundant steps of data preparation, transformation, segmentation, scoring, and all the other functions that can take a great deal of disk space and CPU time.

The Challenges of Leveraging Unstructured Data and Web Data Mining in Modeling Environments Modelers often tap into readily available raw data in the database or data warehouse. However, unstructured data is rarely used during these phases because handling data in the form of text, e-mail documents, and images is computationally difficult and time consuming. Converting unstructured data into information is costly in a campaign management environment, so it isn’t often done. The challenges of creating reusable and repeatable variables for deployment make using unstructured data even more difficult.

Web data mining spiders and crawlers are often used to gather unstructured data. Current analyst tool suite processes for unstructured data require that modelers understand archaic processing commands expressed in specialized, non-standard syntax. There are impediments to both gathering and manipulating unstructured data, and there are difficulties in capturing and applying predictive models that deal with unstructured data. For example, clustering models may facilitate identifying rules for detecting what cluster a new document is most closely aligned with. However, exporting that clustering rule from the predictive modeling workbench into a production environment is very difficult.

Managing BI Knowledge Worker Training and Standardization of Processes In most organizations, there is a centralized BI group that builds, maintains, and deploys multiple predictive models for different business units. This creates economies of scale, because having a centralized BI group is definitely more cost effective than the alternative. However, the economies of scale do not cascade into standardization of processes among analyst teams. Each individual contributor usually ends up with customized versions of code. Analysts may not be aware of the latest constructs others have advanced.

What Basic Changes Will In-Database Analytics Foster? In-database analytics’ major advantage is the efficiencies it brings to predictive model construction processes due to processing speeds made possible by harnessing parallel database/warehouse engine capabilities. Time savings are generated in the completion of computationally intensive modeling tasks. Faster transformations, missing-value imputations, model building, and assessment operations create opportunities by leaving more time available for fine-tuning model portfolios. Thanks to increasing cooperation between database/warehouse experts and predictive modeling practitioners, issues associated with non-standardized metadata may also be addressed. In addition, there is enhanced support for analyses of very large data sets. This couldn’t come at a better time, because data volumes are always growing.

In-database analytics make it easier to process and use unstructured data by converting complicated statistical processes into manageable queries. Tapping into unstructured data and creating repeatable and reusable information, and combining this into the model-building process, may aid in constructing much better predictive models. For example, moving clustering rules into the database eliminates the difficulty of exporting these rules to and from tool suites. It also eliminates most temporary data storage difficulties for analyst workbenches.

Shared environments created by in-database analytics may bring business units together under common goals. As different business units tap into the same raw data, including all possible versions of transformations and metadata, productivity can be enhanced. When new ways of building models are available, updates can be made in-database. All individual contributors have access to the latest developments, and no single business unit or individual is left behind. Saving time in the labor-intensive steps of model building, working from a single source of truth, having access to repeatable and reusable structured and unstructured data, and making sure all the business units are working with the same standards and updates: all this makes it easier to transfer knowledge as new analysts join or move across business units. Table 1 summarizes the preliminary benefits of in-database analytics for modelers.

Context for In-Database Analytics Innovation To drive measurable business results from predictive models, SEMMA (or a similar methodology) is followed by a deployment cycle. That cycle may involve the continued application of models in a (recurring) campaign, refinement when model performance results are used to revise other models, making decisions on whether completely new models are required given model performance, and so on. We distinguish deployment from the SEMMA-supported phase (intelligence) because deployment often engages the broader organization and requires a predictive model (or models) to be put into actual business use. This section introduces a new methodology we created to describe deployment: “DEEPER” (Design, Embed, Empower, Performance-measurement, Evaluate, and Re-target). Figure 2 depicts the iterative relationship between SEMMA and DEEPER.

The DEEPER phases delineate, in sequential fashion, the types of activities involved in model deployment with a special emphasis on campaign management. The design phase involves making plans for how to transition a scoring model (or models) from the tool suite (where it was developed) to actual application in a business context. It also involves thinking about how to capture the results of applying the model and storing those results for subsequent analysis. There may also be other data that a campaign manager wishes to capture, such as the time taken before seeing a response from a target. A proper design can eliminate missteps in a campaign. For example, if a targeted catalog mailing is enabled by a scoring model developed using SEMMA, then users must choose which deciles to target first, how to capture the results of the campaign (e.g., actual purchases or requests for new service), and what new data might be appropriate to capture during the campaign.

Figure 2. DEEPER phases guide the deployment, adoption, evaluation, and recalibration of predictive models. The figure shows two linked cycles: INTELLIGENCE (SEMMA: Sample, Explore, Modify, Model, Assess) and DEPLOYMENT (DEEPER: Design, Embed, Empower, Performance measure, Evaluate, Re-target).

Table 1. Preliminary benefits of in-database analytics

Process | Benefits
Data set creation and preparation | Reduce cycle time by parallel-processing multiple functions; accurate and timely completion of tasks by functional embedding
Data processing and model building by multiple analysts | Eliminate multiple versions of truth and large data set movements to and from analytical tool suites
Unstructured data management | Broaden analytics capability by streamlining repeatability and reusability
Training and standardization | Create operational and analytical efficiencies; access to latest developments; automatically update metadata


Once designed, the model must be accurately embedded into business processes. Model score views must be secured; developers must ensure scores appear in user interfaces at the right time; and process managers must be able to insert scores into automated business process logic. Embedding a predictive model may also require safeguards: if there are cases where the model should not be applied, those exceptions need to be detected and handled.

Making the results of a predictive model (e.g., a score) available to people and systems is just the first step in ensuring it is used. In the empower phase, employees may need to be trained to interpret model results; they may have to learn to look at data in a certain way using new interfaces; or they may need to learn the benefits of evidence-based management approaches as supported by predictive modeling. Similarly, if people are involved, testing may be required to ensure that training approaches are working as intended. The empower step ensures appropriate behaviors by both systems and people as they pertain to the embedding of the predictive model into business processes.

A campaign begins in earnest after the empower phase. Targets receive their model-prescribed treatments, and reactions are collected as planned for in the design phase of DEEPER. This reactions-directed phase, performance measurement, involves ensuring the reactions and events subsequent to a predictive model’s application are captured and stored for later analysis. The results may also be captured and made available in real-time support for campaign managers. Dashboards may be appropriate for monitoring campaign progress, and alerts may support managers in making corrections should a campaign stray from an intended path. If there is an anomaly, or when a campaign has reached a checkpoint, campaign managers take time to evaluate the effectiveness or current progress of the campaign. The objective is to address questions such as:

■ Are error levels acceptable?

■ Were campaign results worth the investment in the predictive analytics solution?

■ How is actual behavior different from predicted behavior for a model or a model decile?

This is the phase when the campaign’s effectiveness and current progress are assessed.

The results of the evaluate phase of DEEPER may lead to a completely new modeling effort. This is depicted in Figure 2 by the gray background arrow leading from evaluate to the sample phase of SEMMA. This implies a transition from deployment back to what we have referred to as intelligence. However, there is not always time to return to the intelligence cycle, and minor alterations to a model might be deemed more appropriate than starting over. The latter decision is most prevalent in time-pressured, recurring campaigns. We refer to this phase as re-target, which requires analysts to take into account new information gathered as part of the performance measurement deployment phase. It also takes advantage of the plans for how this response information was encoded per the design phase of deployment.

The most important consideration involves interpreting results from the campaign and managing non-performing targets. A non-performing target is one that scored high in a predictive model, for example, but that did not respond as predicted. In a recurring campaign, there may be an effort to re-target that subset. There could also be an effort to re-target the campaign to another set of targets, e.g., those initially scored into other deciles. Re-targeting can be a time-consuming process; new data sets with response results need to be made available to predictive modeling tool suites, and findings from tracking need to be incorporated into decisions.

DEEPER provides the context for considering how improvements to in-database analytics can be game-changers. In-database analytics can make significant inroads to DEEPER processes that take time and are under-supported by predictive modeling tool suites. However, these improvements will be driven by analysts who work closely with their organizations’ database experts. This combination of analyst and data management skills, experience, and knowledge will spur innovation significantly beyond current expectations.


How Might In-Database Analytics for DEEPER Evolve? Extending in-database analytics to DEEPER processes requires considering how each DEEPER phase might be streamlined given tighter coupling between predictive modeling tool suites and databases/data warehouses. Although many of the advantages of this tighter coupling may be realized differently by different organizations, there are generic value streams to guide efforts. Here the phrase “value stream” refers to process flows central to DEEPER. This section discusses these generic value streams: (1) intelligence-to-plan, (2) plan-to-implementation, (3) implementation-to-use, (4) use-to-results, (5) results-to-evaluation, and (6) evaluation-to-decision.

In the design phase of DEEPER, planning can be facilitated by examining possible end-user database views that could be augmented with predictive intelligence. Instead of creating new interfaces, it is possible that Web pages equipped with embedded database queries can quickly retrieve and display predictive model scores to decision makers or front-line employees. Many of these displays are already incorporated into business processes, so opportunities to use the tables and queries to supply model results can streamline implementation. When additional data items need to be captured, that data may be captured at the point of sale or other customer touch points. A review of current metadata may speed up the design of a suitable deployment strategy. In addition to “pushing” model intelligence to interfaces, there may also be ways of “pulling” data from the database/warehouse to facilitate re-targeting or for initiating new SEMMA cycles.

For example, it may be possible to design queries to automate the retrieval of data items such as target response times from operational data stores. Similarly, it may be possible to use SQL to aggregate the information needed for this type of next-step analysis. For example, total sales to a customer within a specified time period can be aggregated using a query and then used in the re-targeting phase to reflect whether a target performed as predicted. In-database analytics can support the design phase because it eliminates many of the traditional bottlenecks such as complex requirements gathering and the creation of formal specification documents (including use cases). Instead, existing use cases can be reviewed and augmented, and database/warehouse-supported metadata facilities can support the design of schema for capturing new target response data. We refer to this as an intelligence-to-plan value stream for the in-database analytics supported design deployment phase.
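The sales-aggregation query mentioned above might look like the following sketch. SQLite (via Python's sqlite3 module) stands in for the database/warehouse; the `targets` and `sales` tables, their columns, and the sample dates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE targets (customer_id INTEGER PRIMARY KEY, predicted_responder INTEGER);
CREATE TABLE sales (customer_id INTEGER, sale_date TEXT, amount REAL);
INSERT INTO targets VALUES (1, 1), (2, 1), (3, 0);
INSERT INTO sales VALUES
  (1, '2010-03-05', 120.0),
  (1, '2010-03-20', 30.0),
  (2, '2010-05-01', 75.0),
  (3, '2010-03-11', 10.0);
""")

def sales_in_window(conn, start, end):
    """Total sales per target within [start, end]; targets with no sales get 0."""
    sql = """
    SELECT t.customer_id,
           t.predicted_responder,
           COALESCE(SUM(s.amount), 0.0) AS total_sales
    FROM targets AS t
    LEFT JOIN sales AS s
      ON s.customer_id = t.customer_id
     AND s.sale_date BETWEEN ? AND ?
    GROUP BY t.customer_id, t.predicted_responder
    ORDER BY t.customer_id
    """
    return conn.execute(sql, (start, end)).fetchall()

rows = sales_in_window(conn, "2010-03-01", "2010-03-31")
```

Here customer 2 was predicted to respond but shows no sales in the window, which is exactly the kind of signal the re-target phase needs.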

In the embed phase, transferring scored model results to tables is a first step in considering ways to make use of database/warehouse capabilities to support DEEPER. Once the scores are appropriately stored in tables, there are many opportunities to use queries to embed the scores into people-supported and automated business processes. For example, coding to retrieve scores for inclusion in front-line employee interfaces can be done in a manner consistent with other embedded SQL applications. This saves time in training interface developers because it implies that the same personnel who implemented the interfaces can effectively alter them to include new intelligence.

There is also no need for additional project governance functions or specialized software. In fact, database/warehouse triggers and alerts can be used to ensure that predictive analytics are used only when model deployment assumptions are relevant. As the database/warehouse is the same place where analytic model results reside, there are numerous implementation advantages. We refer to this as a plan-to-implementation value stream for the in-database analytics supported embed deployment phase.
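One way to picture the trigger-based safeguard described above is the sketch below. It assumes, purely for illustration, that a model was trained on customers aged 18 to 80 and should not be applied outside that range; the table, trigger name, and threshold values are hypothetical, and SQLite stands in for the database/warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE model_scores (
    customer_id INTEGER PRIMARY KEY,
    age INTEGER NOT NULL,
    score REAL NOT NULL
);
-- Hypothetical deployment assumption: the model is valid only for ages 18 to 80.
CREATE TRIGGER guard_age_range
BEFORE INSERT ON model_scores
WHEN NEW.age < 18 OR NEW.age > 80
BEGIN
    SELECT RAISE(ABORT, 'score outside model training range');
END;
""")

def store_score(conn, customer_id, age, score):
    """Try to store a score; return False if the trigger rejects it."""
    try:
        conn.execute(
            "INSERT INTO model_scores VALUES (?, ?, ?)",
            (customer_id, age, score),
        )
        return True
    except sqlite3.DatabaseError:
        # Raised when the trigger aborts the insert.
        return False

accepted = store_score(conn, 1, 35, 0.82)
rejected = store_score(conn, 2, 95, 0.40)
```

Because the guard lives in the database, every application that writes scores is subject to it without any change to interface code.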

After implementation, testing will ensure that model results/scores are understandable to decision makers (the empower phase) and that their performance can scale when production systems are at high capacity. Such stress tests can be conducted in a manner similar to database view tests. Because of the inherent speed of database/warehouse systems, their performance will likely exceed separate, isolated workbench performance. Global roll-out can be eased by tried-and-true database/warehouse roll-out processes. We refer to this as an implementation-to-use value stream for the in-database analytics supported empower deployment phase.

Similarly, the use-to-results value stream is that part of a campaign when actions are taken and targets respond. In this performance measurement phase of deployment, dashboards can be used to track performance, database tables can automatically collect and store ongoing campaign results, queries can aggregate responses over time as part of automating responses, and many other in-database solutions can help to streamline related processes. This information is central to the evaluate phase, where the results-to-evaluation value stream can enable careful scrutiny of the predictive analytics model portfolio. Queries can be written to compare actual results to those predicted during SEMMA phases. When more than one model has been constructed in the SEMMA processes, all can be re-examined in light of the new information about responses. If-then statements can be embedded in queries to identify target segments that have responded according to business goals, and remaining non-responders can be quickly identified.
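The if-then logic mentioned above maps naturally onto SQL `CASE` expressions. The sketch below is one possible shape for such a query; the `campaign_results` table, the 0.5 score cutoff, and the segment labels are hypothetical choices for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE campaign_results (
    customer_id INTEGER PRIMARY KEY,
    predicted_score REAL,   -- model output stored during the embed phase
    responded INTEGER       -- 1 if the target responded, else 0
);
INSERT INTO campaign_results VALUES
  (1, 0.91, 1), (2, 0.85, 0), (3, 0.30, 1), (4, 0.12, 0);
""")

# If-then rules embedded in the query: compare prediction with actual response.
SEGMENT_SQL = """
SELECT customer_id,
       CASE
           WHEN predicted_score >= :cutoff AND responded = 1 THEN 'performed as predicted'
           WHEN predicted_score >= :cutoff AND responded = 0 THEN 'non-performing target'
           WHEN predicted_score <  :cutoff AND responded = 1 THEN 'unexpected responder'
           ELSE 'low score, no response'
       END AS segment
FROM campaign_results
ORDER BY customer_id
"""
segments = dict(conn.execute(SEGMENT_SQL, {"cutoff": 0.5}).fetchall())
```

Filtering on the `'non-performing target'` label would yield the subset that a recurring campaign might choose to re-target.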

Such analysis can be done for each analytical model in the portfolio and for each decile of predicted respondents associated with those models. This has been an enormously time-consuming process in the past, but the database/warehouse query engine can conduct this type of post-analysis efficiently. Queries can also identify subsets of respondents that outperformed the predicted model performance and those that significantly under-performed. This type of analysis can be quickly supported through queries, and it can provide significant insight for the re-target phase.
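A per-decile comparison of predicted and actual response, as described above, can be expressed as a single grouped query. The following is a minimal sketch with a hypothetical `scored_targets` table and made-up response data; only two deciles are populated to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scored_targets ("
    "customer_id INTEGER PRIMARY KEY, decile INTEGER, "
    "predicted_rate REAL, responded INTEGER)"
)
rows = [
    (1, 1, 0.60, 1), (2, 1, 0.60, 1), (3, 1, 0.60, 0), (4, 1, 0.60, 1),
    (5, 2, 0.40, 0), (6, 2, 0.40, 0), (7, 2, 0.40, 1), (8, 2, 0.40, 0),
]
conn.executemany("INSERT INTO scored_targets VALUES (?, ?, ?, ?)", rows)

# A positive gap means the decile outperformed the model's prediction;
# a negative gap means it under-performed.
DECILE_SQL = """
SELECT decile,
       AVG(predicted_rate) AS predicted,
       AVG(responded) AS actual,
       AVG(responded) - AVG(predicted_rate) AS gap
FROM scored_targets
GROUP BY decile
ORDER BY decile
"""
report = conn.execute(DECILE_SQL).fetchall()
```

Adding a model identifier column and grouping by it as well would extend the same query to a whole portfolio of models.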

Following the results-to-evaluation value stream of the deployment cycle, the evaluation-to-decision value stream focuses on whether a new intelligence cycle (a repeat of SEMMA processes) is required. If performance results indicate major model failures, then a repeat is likely necessary to resurrect and continue a campaign. Even if there weren’t major failures, environmental changes such as economic conditions may have rendered models outdated. Data collected in the performance evaluation phase may help to streamline the decision process. If costs aren’t being recovered, then it is likely that either the campaign will cease or a new intelligence cycle is necessary.

Often a portfolio of models is created in the initial intelligence cycle. It may be possible to use queries to automate the process of recalculating the prior and anticipated performance of the models in the portfolio. If models exist that were not used but appear to perform better, those models may be used in the next DEEPER cycle. Alternatively, a combination or pooling of models might be most appropriate. Again, automated queries might be able to provide decision support for such pooling options, and they can aid in scheduling the appropriate model for the data sets as the DEEPER cycle progresses. In addition, it may be possible to use queries to apply business rules to manage data sets, and prior results could inform the scheduling of resting periods for targets such that each target isn’t inundated with catalog mailings, for example.
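The resting-period rule described above can be illustrated with a query that returns only targets whose most recent mailing is old enough. The `mailings` table, the dates, and the 30-day rest period are hypothetical; SQLite's `julianday` function is used for date arithmetic, and other engines would use their own date functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mailings (customer_id INTEGER, mailed_on TEXT);
INSERT INTO mailings VALUES
  (1, '2010-01-10'), (1, '2010-03-25'),
  (2, '2010-01-15'),
  (3, '2010-03-30');
""")

def eligible_targets(conn, as_of, rest_days):
    """Return customers whose latest mailing is at least rest_days before as_of."""
    sql = """
    SELECT customer_id
    FROM mailings
    GROUP BY customer_id
    HAVING julianday(?) - julianday(MAX(mailed_on)) >= ?
    ORDER BY customer_id
    """
    return [row[0] for row in conn.execute(sql, (as_of, rest_days))]

eligible = eligible_targets(conn, "2010-04-15", 30)
```

In this sample data only customer 2 has rested long enough; customers 1 and 3 were mailed within the previous 30 days and are held back.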

Table 2 summarizes key generic value streams that can be supported by in-database analytics and briefly describes the possibilities discussed in this section. Opportunities to evolve in-database analytics are likely to be numerous.

Conclusion In-database analytics create an environment where functions are embedded and processed in parallel, thereby streamlining the steps of both intelligence (e.g., SEMMA) and deployment (e.g., DEEPER) cycles. As data sources are updated, attribute names and formats may change, yet they are sharable. In-database analytics can support quality checks and create warning messages if the range, format, and/or type of data differ from a previous version or model assumptions. If external data has attributes that were not in the data dictionary, metadata can be updated automatically. Data conversions can be handled in-database and only once instead of being repeated by multiple modelers. In-database analytics fosters stability, enhances efficiency, and improves productivity across business units.
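The quality checks described above (warnings when range, format, or type differ from a previous version, and detection of attributes missing from the data dictionary) can be sketched as follows. The sketch is written in plain Python for brevity; in an in-database setting the same rules would live in constraints, triggers, or stored procedures. The `profile` structure and all attribute names are hypothetical.

```python
def check_against_profile(rows, profile):
    """Return warning strings for values that break the recorded type or range.

    profile maps attribute name -> (expected_type, low, high), standing in for
    the metadata recorded from a previous data version.
    """
    warnings = []
    for i, row in enumerate(rows):
        for name, (expected_type, low, high) in profile.items():
            if name not in row:
                warnings.append(f"row {i}: missing attribute {name!r}")
                continue
            value = row[name]
            if not isinstance(value, expected_type):
                warnings.append(
                    f"row {i}: {name!r} has type {type(value).__name__}, "
                    f"expected {expected_type.__name__}"
                )
            elif not (low <= value <= high):
                warnings.append(f"row {i}: {name!r}={value} outside [{low}, {high}]")
        # Attributes absent from the data dictionary are flagged so that
        # metadata can be updated.
        for name in sorted(row.keys() - profile.keys()):
            warnings.append(f"row {i}: new attribute {name!r} not in data dictionary")
    return warnings

profile = {"age": (int, 18, 80), "income": (float, 0.0, 1e7)}
incoming = [
    {"age": 35, "income": 52000.0},
    {"age": 17, "income": "n/a", "segment": "A"},
]
warnings = check_against_profile(incoming, profile)
```

The second row triggers three warnings: an out-of-range age, a wrongly typed income, and an attribute that is not yet in the data dictionary.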

In-database analytics will be critical to a company’s bottom line when models are deployed and there is time pressure for multiple, successive campaigns where ongoing results can be used to build updated, improved predictive models. Enhancements can be realized in a host of value streams. For example, in-database analytics can significantly reduce cycle times for rebuilding and redeploying updated models to meet campaign deadlines. As multiple models are constructed, in-database analytics will enable managing them as a portfolio. Timely responses, tracking, and fast interpretation of early responders to campaigns will enable companies to fine-tune business rules and react in record time.

As the fine line between intelligence and deployment cycles fades because of the fast-paced environment supported by in-database analytics, businesses may move away from the concept of campaign management into trigger-based, “lights-out” processing, where all data feeds are automatically updated and processed, and there is no need to compile data into periodic campaigns. There will be real-time decision making with instant scoring each time there is an update in one of the important independent variables. Analysts will spend their time fine-tuning model performance, building business rules, analyzing early results, monitoring data movements, and optimizing the use of multiple models, instead of dealing with the manual tasks of data preparation, data cleansing, and managing file movements and basic statistical processes that have been moved into the database/warehouse.

Although lights-out processing is not on the near-term horizon, the evolution of in-database analytics promises to move organizations in that direction. Once in the hands of analysts and their database/warehouse teams, in-database analytics will be a game-changer. ■

References Azevedo, Ana, and Manuel Felipe Santos [2008]. “KDD, SEMMA and CRISP-DM: A Parallel Overview,” IADIS European Conference Data Mining, pp. 182–185.

Fayyad, U. M., Gregory Piatetsky-Shapiro, Padhraic Smyth, and Ramasamy Uthurusamy [1996]. Advances in Knowledge Discovery and Data Mining, AAAI Press/The MIT Press.

Gray, Paul, and Hugh J. Watson [2007]. “What Is New in BI,” Business Intelligence Journal, Vol. 12, No. 1.

Houghton, Bob, Omar A. El Sawy, Paul Gray, Craig Donegan, and Ashish Joshi [2004]. “Vigilant Information Systems for Managing Enterprises in Dynamic Supply Chains: Real-Time Dashboards at Western Digital,” MIS Quarterly Executive, Vol. 3, No. 1.

Pfeffer, Jeffrey, and Robert I. Sutton [2006]. “Evidence Based Management,” Harvard Business Review, January.

Table 2. Generic value streams and areas for innovation with in-database analytics

Intelligence-to-plan | Planning is streamlined; push and pull strategies are feasible; schema design can support planning
Plan-to-implementation | Scores maintained in-database; embedded SQL in HTML can facilitate view deployment; triggers and alerts can be used to guard for exceptions
Implementation-to-use | Stress testing and global rollout follow database/warehouse methodologies and rely on common human and physical resources
Use-to-results | Dashboards can be readily adapted; database/warehouse tables can be used as response aggregators
Results-to-evaluation | Re-examine all created models efficiently in light of response information; embed if-then logic to re-target non-responders
Evaluation-to-decision | Consider applying different models; allow targeted respondents to “rest”; use database to provide decision support for deciding to re-target or re-enter the intelligence cycle

Copyright of Business Intelligence Journal is the property of Data Warehousing Institute and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.

20180610221658module_6_services_to_support_knowledge_sharing_in_complex.pdf

Services to Support Knowledge Sharing in Complex Business  Networks, Big Data as the Source 

Abdussalam Ali and Igor Hawryszkiewycz
University of Technology Sydney, Sydney, Australia

Abstract: Big Data has become a buzzword that refers to complex and massive data, either structured or unstructured, that is not easily captured and processed by traditional tools and software applications. The term refers to the large volume of data created through activities that use information and communication technology (ICT). Big Data in our research is the big container to be explored to discover knowledge and information based on the searching context. Big Data may include both explicit and tacit sources of information and knowledge. Businesses should consider all sources in the business environment when discovering and capturing knowledge by accessing this data. Business networks, as a type of social network, are the mechanism for performing knowledge sharing and transfer for innovation and competitive advantage. The goal of our research is to design and implement a generic model of services that support and coordinate business networks to discover, capture, create and share knowledge. Implementing these services involves considering Big Data as the source of information and knowledge. This model is to be implemented as a platform of services in the cloud. Although the model and services are generic, the platform is to be customisable for a business’s special needs. These services will support businesses in creating collaborative environments and adapting to any changes that happen in the business environment while collaboration is in operation. A prototype of the services is to be implemented in the cloud environment. This prototype is to be tested by experts through a case study to measure the success and performance of the model.
Keywords: big data, knowledge sharing, business networks, cloud computing 

1. Introduction

Big Data has emerged through the development of information and communication technology (ICT) and its widespread use. The term refers to the large volume of data and information around the world, which is growing fast through the use of advanced technology and huge storage systems.

Nearly 20 quintillion bytes of data are produced every day, containing unstructured, semi-structured and structured content. This content includes text and multimedia such as voice, audio and images (Barnaghi, Sheth & Henson 2013). Organisations have started to explore this huge volume of data, which is not organised adequately in a database manner (Davenport, Barth & Bean 2012). Knowledge Management (KM) is one of the disciplines affected by this emergence. Businesses explore Big Data to capture information and create knowledge for innovation and competitive advantage.

In the last few decades businesses have started to introduce technology to support their knowledge management strategies, and KM systems have been developed accordingly. Technology is considered a success factor for KM by many studies, such as Wong (2005), Moffett et al. (2003) and Davenport et al. (1998). The study of Moffett et al. (2003) is based on a survey of 1000 British companies. They found that most of these companies introduced technology as a main component to support KM. The study by Wong (2005), based on small and medium enterprises (SMEs), states that properly implemented technology is one of the main factors that support the success of KM. Alazmi and Zairi (2003) review 15 published studies to identify the critical factors affecting KM. Their findings show that components related to the technology factor represent the highest percentage (17%) compared to the other components mentioned in those studies.
Although these studies state that technology is a critical factor in making KM successful in the firm, many also raise issues with the technology used in KM systems. These issues can be summarised as follows:

Technology serves as little more than a storage and repository of information. That is because KM systems are designed and implemented in the same way as traditional information systems (Currie & Maire 2004; Nunes et al. 2006; Birkinshaw 2001). In addition, IT systems operate as stores for explicit knowledge more than for the tacit type, as tacit knowledge is difficult to gather (Nunes et al. 2006; Birkinshaw 2001).


This leads to the argument that the “social interaction” phenomenon is overlooked when IT is introduced to support KM. Although social interaction is important for people to exchange information and knowledge, IT is considered to be a replacement for “social interaction” (Birkinshaw 2001). McDermott (1999) reports that knowledge is different from information in many aspects. As a result, KM systems cannot be designed and implemented based on information systems concepts.

The other limitation is that KM systems are not kept up to date and do not cater for emerging needs (Fischer & Ostwald 2001; Van Zolingen, Streumer & Stooker 2001). One of these emerging needs is Big Data, as it is the main source of knowledge and knowledge creation and contains both tacit and explicit sources of knowledge.

The aim of our research is to implement a generic model of services that support social interaction in knowledge sharing and transfer. From a Big Data perspective, these services will support knowledge discovery and capture. In our model we consider Big Data as the big container that holds all information and knowledge sources. This information and knowledge may be in soft or hard form, online or offline, and explicit or tacit. Services are to be implemented that help businesses explore and discover the knowledge that resides within Big Data. Section 2, “Big data as the knowledge source”, presents more information about this aspect.

The other component to be mentioned here is business networks. Business networks, as a type of social network, are the mechanism for performing knowledge sharing and transfer for innovation and competitive advantage. The model is intended to coordinate and manage these networks for creating and sharing knowledge. In addition, the model (Ali, Hawryszkiewycz & Chen 2014) flexibly provides businesses and business units with the ability to quickly share and analyse knowledge to address emerging business needs in their environment. This reflects the fact that businesses today operate in complex environments. The services need to be generic and reconfigurable because knowledge needs cannot be anticipated in today’s dynamic environment. Hasgall (2012) performed a questionnaire-based study to understand how effective social networks are in supporting organisations to adapt and respond to changes in their complex environment. The findings lead to the conclusion that social networks support employees by providing them with knowledge. This knowledge can be integrated into the firm and can increase the sensitivity of workers to environmental changes.

Consequently, a flexible approach is needed in which knowledge flows and responsibilities can be easily changed without the need to reprogram systems. A typical scenario may be a new partner entering a network, a decision to develop new products and services that require new expertise, or simply an improvement to workflows. Each of these not only brings in new knowledge but often also requires the rearrangement of responsibility for processing the knowledge. Networks also exist within businesses, where different business units network to create new products and services for business clients.

2. Big data as the knowledge source

Kabir and Carayannis (2013) characterise Big Data as data that is too large to be easily captured and analysed by traditional technology and tools. The authors consider Big Data a main, continuously growing resource for creating knowledge. This growth is due to many factors, including continuous innovations in IT hardware and software. It is a massive lake that can be used as a resource to create knowledge (Kabir & Carayannis 2013).

Agrawal et al. (2011) present the challenges of Big Data as a source of knowledge to support decision making and innovation. These challenges are:

Heterogeneity, which refers to differences in data format. Even when the captured data is in the same format, differences will exist in data structure and organisation.

Scale, as the main characteristic of Big Data is its size and volume. Managing these large volumes and their continuous growth is one of the challenges of Big Data.

Timeliness, which means that the speed and growth rate of data increase as new technology is introduced and continuously developed. The challenge here is how to cope with this increase when discovering and capturing the data.


Privacy is one of the big concerns in the context of Big Data. Dealing with privacy is both a technical and a social issue. Acquiring personal data, for example, will raise many questions regarding privacy and the level at which this data can be used and published.

Solving these issues is not in our research scope. However, it may be included in our framework in future development. Big Data in our model is the big container to be explored to discover knowledge and information based on the searching context. Big Data may include both explicit and tacit sources of information and knowledge. The Internet, online databases, electronic sheets, electronic documents, hard disk drives and offline printed documents and files all make up the explicit part of the Big Data container. On the other hand, experts, skilled people, consultants, managers, workers and members of communities of practice are all examples of tacit sources in the Big Data container. Information in the Big Data container is presented in different formats, including database records, Word and PDF documents, video, audio and images. Capturing and analysing knowledge from these formats is done by specialised applications. In our research we may support these formats in terms of discovering them, indexing them and knowing how to link these sources with the knowledge created from them. A recommender service may be implemented in the future to support knowledge discovery for other explorers and users.

Kabir and Carayannis (2013) present their “Big Data Strategy Framework” as in Figure 1. 

 

Figure 1: Big data strategy framework (Kabir & Carayannis 2013) 

The authors present infrastructure, team building and the knowledge base as the main components and aspects of their framework. Infrastructure includes technology as one of its subcomponents. Teams should be created based on the business’s objectives. The knowledge that is created by businesses should support innovation and competitive advantage and be considered as new knowledge for future use and sharing (Kabir & Carayannis 2013).

Our model is not based on this framework, but the framework supports our previous arguments that other factors, such as social factors and the environment, should be considered as well.

In conclusion, our aim is to design and implement a generic model of services that support and coordinate business networks to discover, capture, create and share knowledge. Implementing these services involves considering Big Data as the source of information and knowledge.

3. The proposed model

This paper proposes a model to manage discovering, capturing, organising and sharing knowledge between business networks within a complex environment. The paper sees knowledge sharing as predominantly a socio-technical issue and Big Data as the source of knowledge.

The model provides the flexibility needed in today’s environment. In our model, any business creates its own groups and organisations to gather knowledge and information. The organisation in our model is defined according to Living Systems Theory (LST) as a group of groups that deal with one or more gathering projects (Ali & Hawryszkiewycz 2012; Miller 1965). Each group within the organisation processes the knowledge and information. Thus if new groups are created to respond to some event, knowledge must quickly flow to these groups.


The strategies toward the model implementation can be described as follows: 

3.1 Generic knowledge management functions, elements and activities

Boundary roles must often define the knowledge elements to be managed and the assignment of these responsibilities to roles in their business unit. As noted in Ali et al. (2014), knowledge management functions (KMF) have been defined in many works in the literature.

These works include Fernandez and Sabherwal (2010), Awad and Ghaziri (2004) and Dalkir (2011), and the functions can be described as follows:

Discovering: The process of finding where the knowledge resides.  

Gathering: Fernandez and Sabherwal (2010) define gathering as the process of obtaining knowledge from  the tacit (individuals) and explicit (such as manuals) sources.  

Filtering: The process of reducing the knowledge and/or information gathered by rejecting redundancy (Dalkir 2011).

Organising: The process of composing the knowledge so that it can be easily retrieved and used to make  decisions (Awad & Ghaziri 2004).  

Sharing: A way of transferring knowledge between individuals and groups (Awad & Ghaziri 2004; Fernandez & Sabherwal 2010).

In this paper we illustrate these functions by linking them to Big Data, as shown in Figure 2.

Formally, there may be any number of knowledge elements, such as sales, purchases, proposals and so on. So we might define a knowledge element K(sale) or K(purchase). It may be the latest sale or some new idea. Each of these knowledge elements will go through the functions in Figure 2. We use the notation Discover(K(sale)), Gather(K(sale)) and Discover(K(purchase)). We call these knowledge processing activities. Thus any knowledge element goes through all the KMFs.

A knowledge processing activity is a knowledge processing function applied to a knowledge element.

From Figure 2 the following points can be highlighted:

These functions are not sequential. For example, while organising, the user may ask for more captured or discovered knowledge.

While a specific function is running, the user can query Big Data for more knowledge.

Knowledge creation can happen at any stage. 

 

Figure 2: Knowledge management functions 
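The notation Discover(K(sale)) can be made concrete with a small sketch. This is only an illustration under our own assumptions: the class names `KMFunction`, `KnowledgeElement` and `KnowledgeActivity` and the helper `activities_for` are hypothetical and are not part of the platform described in this paper.

```python
from dataclasses import dataclass
from enum import Enum


class KMFunction(Enum):
    """The five knowledge management functions listed in Section 3.1."""
    DISCOVER = "Discover"
    GATHER = "Gather"
    FILTER = "Filter"
    ORGANISE = "Organise"
    SHARE = "Share"


@dataclass(frozen=True)
class KnowledgeElement:
    """A knowledge element such as K(sale) or K(purchase)."""
    name: str

    def __str__(self) -> str:
        return f"K({self.name})"


@dataclass(frozen=True)
class KnowledgeActivity:
    """A knowledge management function applied to a knowledge element."""
    function: KMFunction
    element: KnowledgeElement

    def __str__(self) -> str:
        return f"{self.function.value}({self.element})"


def activities_for(element: KnowledgeElement) -> list[KnowledgeActivity]:
    """Hypothetical helper: one activity per function, with no implied order."""
    return [KnowledgeActivity(function, element) for function in KMFunction]


sale = KnowledgeElement("sale")
print(KnowledgeActivity(KMFunction.DISCOVER, sale))  # Discover(K(sale))
print(len(activities_for(sale)))  # 5
```

The list returned by `activities_for` carries no ordering constraint, which matches the point above that the functions are not sequential.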

479

  Abdussalam Ali and Igor Hawryszkiewycz 

3.2 Allocations

Our goal is to develop a framework that provides choices for changing allocations as systems evolve. The goal is to provide the flexibility to reconfigure the requirements as needs change. That allows networks to share knowledge by assigning responsibilities. The following choices are possible:

Type  1  Allocation  (knowledge  management  function  specialists)  ‐  Allocate  all  activities  of  the  same  knowledge management function to one group. 

Type 2  Allocation  (knowledge  element  specialists)  ‐  Allocate  all  knowledge processing  activities  on  the  same knowledge element to one organisation. The organisation then distributes the different knowledge  processing activities to different groups. 

Type 3 Allocation - Each functional unit has its own knowledge processing organisation or group.

Type 4 Allocation - Totally open (hybrid).
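The difference between Type 1 and Type 2 allocation can be sketched as two ways of partitioning the same set of knowledge processing activities. The function names, the group and organisation labels, and the choice of one group per function inside each Type 2 organisation are our own assumptions for illustration; the model itself leaves that distribution to the organisation.

```python
from collections import defaultdict

FUNCTIONS = ["Discover", "Gather", "Filter", "Organise", "Share"]
ELEMENTS = ["sale", "purchase"]

# Every knowledge processing activity is a (function, element) pair.
ACTIVITIES = [(function, element) for element in ELEMENTS for function in FUNCTIONS]


def allocate_type1(activities):
    """Type 1: all activities of the same function go to one specialist group."""
    groups = defaultdict(list)
    for function, element in activities:
        groups[f"{function} group"].append((function, element))
    return dict(groups)


def allocate_type2(activities):
    """Type 2: all activities on the same element go to one organisation.

    Assumption: the organisation then gives each function its own group.
    """
    organisations = defaultdict(dict)
    for function, element in activities:
        organisations[f"{element} organisation"][f"{function} group"] = (function, element)
    return dict(organisations)


type1 = allocate_type1(ACTIVITIES)
type2 = allocate_type2(ACTIVITIES)
print(type1["Discover group"])  # [('Discover', 'sale'), ('Discover', 'purchase')]
print(sorted(type2))            # ['purchase organisation', 'sale organisation']
```

Because both functions take the same activity list, moving from one allocation type to the other is a regrouping of data rather than a change to the activities themselves.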

 

Figure 3: Type 1 allocation 

Allocations are at two levels: allocation of the knowledge activity to the group, followed by the allocation of action tasks to roles in the group.

As an example, the model for Type 1 allocation is shown in Figure 3. Figure 3 illustrates an organisation of three groups, which gathers knowledge by assigning roles to them.

Some actors participate in more than one network. For example, the coordinators participate in the organisational network and also in the group network.

The model does not at this stage include the agencies used in the exchange of information. The goal is to create these agencies through a cloud platform.

The output of the research is a platform based on cloud technology that supports these roles within the groups. The major functions of this platform are as follows:

Creating and resigning the groups. 

Supporting group members to access the platform’s services to finish the role’s tasks.  

Enabling knowledge sharing between the groups and organisations. 

Supporting collaboration between businesses and enterprises. 

4. Model implementation

Our goal is to implement the model in the cloud environment to support knowledge sharing across a broad community.

The implementation will create the services for implementing the relationships between roles within business networks for effective knowledge discovery, gathering and sharing. These services can be categorised as administrative services and processing services. Administrative services are those modules that support the administrative activities. Examples are creating groups and organisations, creating roles, assigning roles to and resigning them from the organisations and groups, and creating users and assigning them to roles.

 

Figure 4: High level illustration of the model 

The processing services are the modules accessed by the users within the groups and organisations. These services support the knowledge management functions shown in Figure 2. Our model should provide services for businesses to create their collaborative environments, either within the business itself or in collaboration with other businesses. The services are intended to give the business the capability to adapt to changes that occur in the business itself or in its environment. This adaptation includes managing groups, organisations, roles and users as well as managing the relationships between these sets. Knowledge management functions and activities should be considered in this operation.

Our goal is to make these services configurable, so that newly required services can be implemented upon business request and added to the system at any time. In addition, roles and their responsibilities can be modified as well.

Figure 4 illustrates a scenario of gathering knowledge for a “new course project” in a faculty.

Big Data is the container of sources, as defined previously, that users can explore for knowledge based on the defined knowledge elements. The cloud environment will contain the services that support the knowledge management functions. The collaborative organisation “new course” discovers, captures and organises the knowledge. This knowledge is based on the knowledge elements defined as “subject material” and “market and prices”. Services in the cloud will support businesses in:

Creating organisations and groups (e.g. discovering, capturing and organising).

Creating roles (e.g. coordinator, material-discoverer and organiser).

Assigning these roles to the groups/organisations.

Assigning users (e.g. John, David, Mac) to these roles.

Supporting the knowledge management functions and processes. 
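The administrative operations listed above can be sketched with a small in-memory object model. This is a minimal sketch under our own assumptions: the classes `Organisation` and `Group` and their method names are hypothetical, and the pairing of particular users with particular roles is invented for the example. The organisation, group, role and user names come from the “new course” scenario.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    name: str
    # Role name -> names of the users currently holding that role.
    roles: dict[str, set[str]] = field(default_factory=dict)

    def create_role(self, role: str) -> None:
        self.roles.setdefault(role, set())

    def assign_user(self, user: str, role: str) -> None:
        if role not in self.roles:
            raise KeyError(f"role {role!r} does not exist in group {self.name!r}")
        self.roles[role].add(user)

    def resign_user(self, user: str, role: str) -> None:
        self.roles[role].discard(user)


@dataclass
class Organisation:
    name: str
    groups: dict[str, Group] = field(default_factory=dict)

    def create_group(self, name: str) -> Group:
        return self.groups.setdefault(name, Group(name))

    def remove_group(self, name: str) -> None:
        self.groups.pop(name, None)


org = Organisation("new course")
for group_name in ("discovering", "capturing", "organising"):
    org.create_group(group_name)

discovering = org.groups["discovering"]
discovering.create_role("coordinator")
discovering.create_role("material-discoverer")
discovering.assign_user("John", "coordinator")           # assumed pairing
discovering.assign_user("David", "material-discoverer")  # assumed pairing

print(sorted(org.groups))  # ['capturing', 'discovering', 'organising']
print(discovering.roles["coordinator"])  # {'John'}
```

Changing responsibilities then amounts to calls such as `resign_user` and `assign_user` on existing objects, which is the kind of reconfiguration without reprogramming that the model aims for.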

The services to be implemented are not considered as interfaces between the users and Big Data, as shown in Figure 4. Rather, they help users explore Big Data, and create and share knowledge among themselves.


4.1 Technology

Cloud technology is the infrastructure to be used for implementing our model, in order to gain its advantages. That can be achieved by implementing the model as a platform of services delivered as Software as a Service (SaaS) to the beneficiaries. The following are some advantages that are relevant to our research (Marston et al. 2011):

1. Low cost. The low cost of using cloud services allows small and medium businesses to benefit from these services. Knowledge management receives little attention in small and medium businesses (Pillania 2008), and one of the reasons is the high cost of dedicated knowledge management applications and systems (Nunes et al. 2006).

2. Large capacity. One of the cloud's features is that it provides large amounts of data storage. In our model, businesses need not be concerned about continuous scalability, whether in terms of the number of organisations and groups created within the system or the amount of knowledge produced.

3. Mobility. Cloud computing is an online technology. That allows people to share knowledge and participate in organisations and groups even if they are in different and distant geographic areas.

The services to be implemented can be categorised as follows:

Management services: These services support the creation of the objects and components of the model and maintain the relationships between them. That includes managing and maintaining the groups and organisations, roles, users and knowledge elements.

Processing services: These services support the knowledge management functions performed. These include knowledge discovery, capturing, filtering and organising.

Sharing services: The knowledge created through the knowledge functions is to be shared among the users. These services support this process. They will include services that manage requests, responses and broadcasts.

Notification service(s): These services support communication between the users.

The platform is to be a browser-based application. That gives users access to the services they use to create their organisations and groups, and to create and share knowledge. These services allow the users to access and manage SQL tables at the back end.

The model will be tested by creating different scenarios, which will be evaluated by experts. Our test should demonstrate that our generic model caters for the different collaborative scenarios in terms of knowledge creation and sharing.
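As an illustration of the SQL tables at the back end, the following sketch stores organisations, groups and role assignments in an in-memory SQLite database. The table and column names are hypothetical, SQLite merely stands in for whatever database the cloud platform would use, and the single sample row is taken from the “new course” scenario with an assumed user-to-role pairing.

```python
import sqlite3

# Hypothetical schema: one row per organisation, group and role assignment.
SCHEMA = """
CREATE TABLE organisation (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE work_group (
    id              INTEGER PRIMARY KEY,
    organisation_id INTEGER NOT NULL REFERENCES organisation(id),
    name            TEXT NOT NULL
);
CREATE TABLE role_assignment (
    work_group_id INTEGER NOT NULL REFERENCES work_group(id),
    role_name     TEXT NOT NULL,
    user_name     TEXT NOT NULL,
    PRIMARY KEY (work_group_id, role_name, user_name)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

org_id = conn.execute(
    "INSERT INTO organisation (name) VALUES (?)", ("new course",)
).lastrowid
group_id = conn.execute(
    "INSERT INTO work_group (organisation_id, name) VALUES (?, ?)",
    (org_id, "discovering"),
).lastrowid
conn.execute(
    "INSERT INTO role_assignment VALUES (?, ?, ?)",
    (group_id, "coordinator", "John"),
)

# Who holds which role, in which group of which organisation?
rows = conn.execute(
    """SELECT o.name, g.name, r.role_name, r.user_name
       FROM role_assignment r
       JOIN work_group g ON g.id = r.work_group_id
       JOIN organisation o ON o.id = g.organisation_id"""
).fetchall()
print(rows)  # [('new course', 'discovering', 'coordinator', 'John')]
```

Keeping assignments in their own table means that a change of responsibility is a row insert or delete rather than a schema or code change.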

5. Summary and future research

The paper presents a model for facilitating knowledge management in complex business systems. It illustrates how the model operates at a high level. It also outlines the choice of technology to be used for implementation and the reason behind this choice.

The semantics of all activities are to be defined. Accordingly, the services are to be defined and designed based on those semantics. Semantics are high-level descriptions of how the model operates and how the relations between the different component sets are maintained. These semantics will define the operations that take place in the collaborative environment, including the semantics of coordination, management and KM activities.

Figure 5 illustrates how our model is to be developed over time. The figure shows that we start by working through business scenarios. We then define the business model and semantics accordingly. These semantics are then applied to the scenarios, evaluated, and used to make any changes needed to the business model. The implemented services are to be tested through different scenarios and evaluated as well. Changes and modifications are made to the business model, semantics and services until the testing criteria are satisfied.


There will be continuous evaluation across these three levels until the system reaches stability and is satisfactory. In other words, the business model, semantics and technical model are subject to change until the system reaches stability.

These services are to be implemented in the cloud to take advantage of the cloud computing environment. They will be implemented as a prototype for testing and evaluation.

 

Figure 5: Model development 

References 

Agrawal, D., Bernstein, P., Bertino, E., Davidson, S. & Dayal, U. 2011, Challenges and Opportunities with Big Data, Purdue University.

Alazmi, M. & Zairi, M. 2003, 'Knowledge management critical success factors', Total Quality Management & Business Excellence, vol. 14, no. 2, pp. 199-204.

Ali, A. & Hawryszkiewycz, I. 2012, 'A Modelling Approach for Knowledge Management in Complex Business Systems', IADIS International Conference WWW/Internet, Madrid.

Ali, A., Hawryszkiewycz, I. & Chen, J. 2014, 'Services for Knowledge Sharing in Dynamic Business Networks', paper presented to the Australasian Software Engineering Conference, Sydney.

Awad, E.M. & Ghaziri, H.M. 2004, 'Working Smarter, Not Harder', in Knowledge Management, Pearson Education, Inc, New Jersey, pp. 24-5.

Barnaghi, P., Sheth, A. & Henson, C. 2013, 'From Data to Actionable Knowledge: Big Data Challenges in the Web of Things', IEEE Intelligent Systems, vol. 28, no. 6, pp. 6-11.

Birkinshaw, J. 2001, 'Why is Knowledge Management So Difficult?', Business Strategy Review, vol. 12, no. 1, pp. 11-8.

Currie, G. & Maire, K. 2004, 'The Limits of a Technological Fix to Knowledge Management: Epistemological, Political and Cultural Issues in the Case of Intranet Implementation', Management Learning, vol. 35, no. 1, pp. 9-29.

Dalkir, K. 2011, 'The Knowledge Management Cycle', in Knowledge Management in Theory and Practice, 2nd edn, Massachusetts Institute of Technology, London, pp. 31-58.

Davenport, T.H., Barth, P. & Bean, R. 2012, 'How 'Big Data' Is Different', MIT Sloan Management Review, vol. 54, no. 1, pp. 42-47.

Davenport, T.H., De Long, D.W. & Beers, M.C. 1998, 'Successful knowledge management projects', Sloan Management Review, vol. 39, no. 2, pp. 43-57.

Fernandez, I. & Sabherwal, R. 2010, 'Knowledge Management Solutions: Processes and Systems', in Knowledge Management, Systems and Processes, M.E. Sharpe, Inc, New York, pp. 56-70.

Fischer, G. & Ostwald, J. 2001, 'Knowledge Management: Problems, Promises, Realities, and Challenges', IEEE Intelligent Systems, vol. 16, no. 1, pp. 60-72.

Hasgall, A.E. 2012, 'The effectiveness of social networks in complex adaptive working environments', Journal of Systems and Information Technology, vol. 14, no. 3, pp. 220-35.

Kabir, N. & Carayannis, E. 2013, 'Big Data, Tacit Knowledge and Organizational', Journal of Intelligence Studies in Business, vol. 3, no. 3, pp. 54-62.

Marston, S., Li, Z., Bandyopadhyay, S., Zhang, J. & Ghalsasi, A. 2011, 'Cloud computing -- The business perspective', Decision Support Systems, vol. 51, no. 1, pp. 176-89.

McDermott, R. 1999, 'Why information technology inspired but cannot deliver knowledge management', California Management Review, vol. 41, no. 4, pp. 103-17.

Miller, J.G. 1965, 'Living systems: Structure and process', Behavioral Science, vol. 10, no. 4, pp. 337-79.

Moffett, S., McAdam, R. & Parkinson, S. 2003, 'Technology and people factors in knowledge management: an empirical analysis', Total Quality Management & Business Excellence, vol. 14, no. 2, pp. 215-24.

Nunes, M.B., Annansingh, F., Eaglestone, B. & Wakefield, R. 2006, 'Knowledge management issues in knowledge-intensive SMEs', Journal of Documentation, vol. 62, no. 1, pp. 101-19.

Pillania, R.K. 2008, 'Strategic issues in knowledge management in small and medium enterprises', Knowledge Management Research & Practice, vol. 6, no. 4, pp. 334-8.

Van Zolingen, S.J., Streumer, J.N. & Stooker, M. 2001, 'Problems in Knowledge Management: A Case Study of a Knowledge-Intensive Company', International Journal of Training and Development, vol. 5, no. 3, pp. 168-84.

Wong, K.Y. 2005, 'Critical success factors for implementing knowledge management in small and medium enterprises', Industrial Management & Data Systems, vol. 105, no. 3-4, pp. 261-79.


Copyright of Proceedings of the International Conference on Intellectual Capital, Knowledge Management & Organizational Learning is the property of Academic Conferences & Publishing International Ltd. and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.

20180610221657module_6_in_database_analytics.pdf

BUSINESS INTELLIGENCE JOURNAL • VOL. 15, NO. 2

PREDICTIVE MODELING

Advances in Predictive Modeling: How In-Database Analytics Will Evolve to Change the Game

Sule Balkan and Michael Goul

Abstract

Organizations using predictive modeling will benefit from recent efforts in in-database analytics, especially when they become mainstream, and after the advantages evolve over time as adoption of these analytics grows. This article posits that most benefits will remain under-realized until campaigns apply and adapt these enhancements for improved productivity. Campaign managers and analysts will fashion in-database analytics (in conjunction with their database experts) to support their most important and arduous day-to-day activities. In this article, we review issues related to building and deploying analytics with an eye toward how in-database solutions advance the technology. We conclude with a discussion of how analysts will benefit when they take advantage of the tighter coupling of databases and predictive analytics tool suites, particularly in end-to-end campaign management.

Introduction

Decoupling data management from applications has provided significant advantages, mostly related to data independence. It is therefore surprising that many vendors are more tightly coupling databases and data warehouses with tool suites that support business intelligence (BI) analysts who construct and manage predictive models. These analysts and their teams construct and deploy models for guiding campaigns in areas such as marketing, fraud detection, and credit scoring, where unknown business patterns and/or inefficiencies can be discovered.

Sule Balkan is clinical assistant professor at Arizona State University, department of information systems. [email protected]

Michael Goul is professor and chair at Arizona State University, department of information systems. [email protected]

18 BUSINESS INTELLIGENCE JOURNAL • VOL. 15, NO. 2

“In-database analytics” includes the embedding of predictive modeling functionalities into databases or data warehouses. It differs from “in-memory analytics,” which is designed to minimize disk access. In-database analytics focuses on the movement of data between the database or data warehouse and analysts’ workbenches. In the simplest form of in-database analytics, the computation of aggregates such as average, variance, and other statistical summaries can be performed by parallel database engines quickly and efficiently, especially in contrast to performing computations inside an analytics tool suite with comparatively slow file management systems. In tightly coupled environments, those aggregates can be passed from the data engine to the predictive modeling tool suite when building analytical models such as statistical regression models, decision trees, and even neural networks. In-database analytics also enable streamlining of modeling processes.
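The simplest form of in-database analytics described above can be sketched as follows. This is a minimal illustration, assuming Python's built-in sqlite3 module stands in for a parallel database engine; the purchases table and the summarize_in_database function are hypothetical names, not part of any product discussed here. The point is only that the engine returns the summary statistics, so the raw rows never travel to the analyst's workbench.

```python
import sqlite3


def summarize_in_database(conn):
    """Return (count, mean, population variance) computed by the SQL engine.

    Only three numbers leave the database; the raw rows stay where they are.
    """
    row = conn.execute(
        """
        SELECT COUNT(amount),
               AVG(amount),
               AVG(amount * amount) - AVG(amount) * AVG(amount)
        FROM purchases
        """
    ).fetchone()
    return row[0], row[1], row[2]


# Hypothetical data: a tiny in-memory table standing in for a warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)],
)

count, mean, variance = summarize_in_database(conn)
print(count, mean, variance)
```

The variance here uses the one-pass identity E[x²] − (E[x])², which is convenient in plain SQL but can lose precision on very large values; an engine with a native variance aggregate would be preferable where one is available.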

The typical modeling processes referred to as CRISP-DM, SEMMA, and KDD contain common BI steps or phases. Knowledge Discovery in Databases (KDD) refers to the broad process of finding knowledge using data mining (DM) methods (Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy, 1996). KDD relies on using a database along with any required preprocessing, sub-sampling, and transformation of values in that database. Another version of a DM process approach was developed by SAS Institute: Sample, Explore, Modify, Model, Assess (SEMMA) refers to the lifecycle of conducting a DM project.

Another approach, CRISP-DM, was developed by a consortium of DaimlerChrysler, SPSS, and NCR. It stands for CRoss-Industry Standard Process for Data Mining, and its cycle has six stages: business understanding, data understanding, data preparation, modeling, evaluation, and deployment (Azevedo and Santos, 2008). All three methodologies address data mining processes. Even though the three methodologies are different, their common objective is to produce BI by guiding the construction of predictive models based on historical data.

A traditional way of discussing methodologies for predictive analytics involves a “sense, assess, and respond” cycle that organizations and managers should apply in making effective decisions (Houghton, El Sawy, Gray, Donegan, and Joshi, 2004). Using historical data to enable managers to sense what is happening in the environment has been the foundation of the recent thrust to vitalize evidence-based management (Pfeffer and Sutton, 2006). Predictive models help managers assess and respond to the environment in ways that are informed by historical data and the patterns within that data. Predictive models help to scale responses because, for example, scoring models can be constructed to enable the embedding of decision rules into business processes. In-database analytics can streamline elements of the “sense, assess, and respond” cycle beyond those steps or phases in KDD, SEMMA, and CRISP-DM.

This article explains how basic in-database analytics will advance predictive modeling processes. However, we argue that the most important advancements will be discovered when actual campaigns are orchestrated and campaign managers access the new, more tightly coupled predictive modeling tool suites and database/data warehouse engines. We assert that the most important practical contribution of in-database analytics will occur when analysts are under pressure to produce models within time-constrained campaigns, and performances from earlier campaign steps need to be incorporated to inform follow-up campaign steps.

The next section discusses current impediments to predictive analytics and how in-database analytics will attempt to address them. We also discuss the benefits to be realized after more tightly coupled predictive analytics tool suites and databases/data warehouses become widely available. These benefits will be game-changers and will occur in such areas as end-to-end campaign management.

What is Wrong with Current Predictive Analytics Tool Suites?

Current analytics solutions require many steps and take a great deal of time. For analysts who build, maintain, deploy, and track predictive models, the process consists of many distributed processes (distributed among analysts, tool suites, and so on). This section discusses challenges that analysts face when building and deploying predictive models.

Time-Consuming Processes

To build a predictive model, an analyst may have to tap into many different data sources. Data sources must contain known values for target variables in order to be used when constructing a predictive model. All the attributes that might be independent variables in a model may reside in different tables or even different databases. It takes time and effort to collect and synthesize this data.

Once all of the needed data is merged, each of the independent variables is evaluated to ascertain the relations, correlations, patterns, and transformations that will be required. However, most of the data is not ready to be analyzed unless it has been appropriately customized. For example, character variables such as gender need to be manipulated, as do numeric variables such as ZIP code. Some continuous variables may need to be converted into scales. After all of this preparation, the modeling process continues through one of the many methodologies such as KDD, CRISP-DM, or SEMMA. For our purposes in this article, we will use SEMMA (see Figure 1).

The first step of SEMMA is data sampling and data partitioning. A random sample is drawn from a population to prevent bias in the model that will be developed. Then, a modeling data set is partitioned into training and validation data sets. Next is the Explore phase, where each explanatory variable is evaluated and its associations with other variables are analyzed. This is a time-consuming step, especially if the problem at hand requires evaluating many independent variables.
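The Sample step can be illustrated with a short sketch. It assumes a simple random sample without replacement and a 70/30 training/validation split; both the split ratio and the function name sample_and_partition are illustrative choices, not prescribed by SEMMA.

```python
import random


def sample_and_partition(population, sample_size, train_fraction=0.7, seed=42):
    """Draw a simple random sample, then split it into training and validation sets."""
    rng = random.Random(seed)  # fixed seed so the partition is reproducible
    sample = rng.sample(population, sample_size)  # sampling without replacement
    cut = round(len(sample) * train_fraction)
    return sample[:cut], sample[cut:]


# Hypothetical population: integers standing in for customer record IDs.
population = list(range(1000))
training, validation = sample_and_partition(population, sample_size=200)
print(len(training), len(validation))
```

Because the sample is already in random order, slicing it at the cut point yields two disjoint random subsets without a second shuffle.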

In the Modify phase, variables are transformed; outliers are identified and filtered; and for those variables that are not fully populated, missing value imputation strategies are determined. Rectifying and consolidating different analysts’ perspectives with respect to the Modify phase can be arduous and confusing. In addition, when applying transformations and inserting missing values in large data sets, a tool suite must apply operations to all observations and then store the resulting transformations within the tool suite’s file management system.
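As a rough sketch of two Modify operations, the following filters outliers with fixed bounds and then imputes missing values with the median of the remaining observations. The bounds, the median strategy, and the function name modify are assumptions made for illustration; an analyst might equally choose mean imputation or a model-based approach.

```python
from statistics import median


def modify(values, lower, upper):
    """Filter out-of-range outliers, then impute missing values (None) with the median."""
    in_range = [v for v in values if v is not None and lower <= v <= upper]
    fill = median(in_range)  # computed after filtering, so outliers do not skew it
    return [
        fill if v is None else v
        for v in values
        if v is None or lower <= v <= upper
    ]


# Hypothetical income variable with two missing values and one outlier.
incomes = [52_000, None, 48_000, 1_000_000, 61_000, None]
cleaned = modify(incomes, lower=0, upper=250_000)
print(cleaned)
```

Every observation is touched once, which is the cost the paragraph above describes: on large data sets, the same pass performed inside a tool suite also requires writing the transformed copy back to the tool suite's own file system.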

Many techniques can be used in the Model phase of SEMMA, such as regression analysis, decision trees, and neural networks. In constructing models, many tool suites suffer from slow file management systems, which can constrain the number and quality of models that an analyst can realistically construct.

The last phase of SEMMA is the Assess phase, where all models built in the modeling phase are assessed based on validation results. This process is handled within tool suites, and it takes considerable time and many steps to complete.

Multiple Versions and Sources of the Truth

Another difficulty in building and maintaining predictive models, especially in terms of campaign management, is the risk that modelers may be basing their analysis on multiple versions and sources of data. That base data is often referred to as the “truth,” and the problem is often referred to as having “multiple versions of the truth.”

To complete the time-consuming tasks of building predictive models as just described, each modeler extracts data from a data warehouse into an analytics workstation. This may create a situation where different modelers are working from different sources of truth, as modelers might extract data snapshots at different times (Gray and Watson, 2007). Also, having multiple modelers working on different predictive models can mean that each modeler is analyzing the data and creating different transformations from the same raw data without adopting a standardized method or a naming convention. This makes deploying multiple models very difficult, as the same raw data may be transformed in different ways using different naming conventions. It also makes transferring or sharing models across different business areas challenging.

Figure 1. SEMMA methodology supported by SAS Enterprise Mining environment. (Diagram: SAMPLE: input data, sampling, data partition; EXPLORE: ranks-plots, variable selection; MODIFY: transform variable, filter outliers, missing imputation; MODEL: regression, tree, neural network; ASSESS: assessment, score, report.)

Another difficulty relates to the computing resources on each modeler’s workbench when multiple modelers are going through similar, redundant steps of data preparation, transformation, segmentation, scoring, and all the other functions that can take a great deal of disk space and CPU time.

The Challenges of Leveraging Unstructured Data and Web Data Mining in Modeling Environments

Modelers often tap into readily available raw data in the database or data warehouse. However, unstructured data is rarely used during these phases because handling data in the form of text, e-mail documents, and images is computationally difficult and time consuming. Converting unstructured data into information is costly in a campaign management environment, so it isn’t often done. The challenges of creating reusable and repeatable variables for deployment make using unstructured data even more difficult.

Web data mining spiders and crawlers are often used to gather unstructured data. Current analyst tool suite processes for unstructured data require that modelers understand archaic processing commands expressed in specialized, non-standard syntax. There are impediments to both gathering and manipulating unstructured data, and there are difficulties in capturing and applying predictive models that deal with unstructured data. For example, clustering models may facilitate identifying rules for detecting what cluster a new document is most closely aligned with. However, exporting that clustering rule from the predictive modeling workbench into a production environment is very difficult.

Managing BI Knowledge Worker Training and Standardization of Processes

In most organizations, there is a centralized BI group that builds, maintains, and deploys multiple predictive models for different business units. This creates economies of scale, because having a centralized BI group is definitely more cost effective than the alternative. However, the economies of scale do not cascade into standardization of processes among analyst teams. Each individual contributor usually ends up with customized versions of code. Analysts may not be aware of the latest constructs others have advanced.

What Basic Changes Will In-Database Analytics Foster?

The major advantage of in-database analytics is the efficiency it brings to predictive model construction processes, due to processing speeds made possible by harnessing parallel database/warehouse engine capabilities. Time savings are generated in the completion of computationally intensive modeling tasks. Faster transformations, missing-value imputations, model building, and assessment operations create opportunities by leaving more time available for fine-tuning model portfolios. Thanks to increasing cooperation between database/warehouse experts and predictive modeling practitioners, issues associated with non-standardized metadata may also be addressed. In addition, there is enhanced support for analyses of very large data sets. This couldn’t come at a better time, because data volumes are always growing.

In-database analytics make it easier to process and use unstructured data by converting complicated statistical processes into manageable queries. Tapping into unstructured data, creating repeatable and reusable information, and combining this into the model-building process may aid in constructing much better predictive models. For example, moving clustering rules into the database eliminates the difficulty of exporting these rules to and from tool suites. It also eliminates most temporary data storage difficulties for analyst workbenches.

Shared environments created by in-database analytics may bring business units together under common goals. As different business units tap into the same raw data, including all possible versions of transformations and metadata, productivity can be enhanced. When new ways of building models are available, updates can be made in-database. All individual contributors have access to the latest developments, and no single business unit or individual is left behind. Saving time in the labor-intensive steps of model building, working from a single source of truth, having access to repeatable and reusable structured and unstructured data, and making sure all the business units are working with the same standards and updates: all this makes it easier to transfer knowledge as new analysts join or move across business units. Table 1 summarizes the preliminary benefits of in-database analytics for modelers.

Context for In-Database Analytics Innovation

To drive measurable business results from predictive models, SEMMA (or a similar methodology) is followed by a deployment cycle. That cycle may involve the continued application of models in a (recurring) campaign, refinement when model performance results are used to revise other models, making decisions on whether completely new models are required given model performance, and so on. We distinguish deployment from the SEMMA-supported phase (intelligence) because deployment often engages the broader organization and requires a predictive model (or models) to be put into actual business use. This section introduces a new methodology we created to describe deployment: “DEEPER” (Design, Embed, Empower, Performance-measurement, Evaluate, and Re-target). Figure 2 depicts the iterative relationship between SEMMA and DEEPER.

The DEEPER phases delineate, in sequential fashion, the types of activities involved in model deployment with a special emphasis on campaign management. The design phase involves making plans for how to transition a scoring model (or models) from the tool suite (where it was developed) to actual application in a business context. It also involves thinking about how to capture the results of applying the model and storing those results for subsequent analysis. There may also be other data that a campaign manager wishes to capture, such as the time taken before seeing a response from a target. A proper design can eliminate missteps in a campaign. For example, if a targeted catalog mailing is enabled by a scoring model developed using SEMMA, then users must choose which deciles to target first, how to capture the results of the campaign (e.g., actual purchases or requests for new service), and what new data might be appropriate to capture during the campaign.

Figure 2. DEEPER phases guide the deployment, adoption, evaluation, and recalibration of predictive models. (Diagram: the SEMMA intelligence cycle of sample, explore, modify, model, and assess, linked to the DEEPER deployment cycle of design, embed, empower, performance measurement, evaluate, and re-target.)

Table 1. Preliminary benefits of in-database analytics

Process | Benefits
Data set creation and preparation | Reduce cycle time by parallel-processing multiple functions; accurate and timely completion of tasks by functional embedding
Data processing and model building by multiple analysts | Eliminate multiple versions of truth and large data set movements to and from analytical tool suites
Unstructured data management | Broaden analytics capability by streamlining repeatability and reusability
Training and standardization | Create operational and analytical efficiencies; access to latest developments; automatically update metadata


Once designed, the model must be accurately embedded into business processes. Model score views must be secured; developers must ensure scores appear in user interfaces at the right time; and process managers must be able to insert scores into automated business process logic. Embedding a predictive model may require safeguards against exceptions. If there are cases in which a model should not be applied, additional safeguards need to be considered.

Making the results of a predictive model (e.g., a score) available to people and systems is just the first step in ensuring it is used. In the empower phase, employees may need to be trained to interpret model results; they may have to learn to look at data in a certain way using new interfaces; or they may need to learn the benefits of evidence-based management approaches as supported by predictive modeling. Similarly, if people are involved, testing may be required to ensure that training approaches are working as intended. The empower step ensures appropriate behaviors by both systems and people as they pertain to the embedding of the predictive model into business processes.

A campaign begins in earnest after the empower phase. Targets receive their model-prescribed treatments, and reactions are collected as planned for in the design phase of DEEPER. This reactions-directed phase, performance measurement, involves ensuring the reactions and events subsequent to a predictive model’s application are captured and stored for later analysis. The results may also be captured and made available in real-time support for campaign managers. Dashboards may be appropriate for monitoring campaign progress, and alerts may support managers in making corrections should a campaign stray from an intended path. If there is an anomaly, or when a campaign has reached a checkpoint, campaign managers take time to evaluate the effectiveness or current progress of the campaign. The objective is to address questions such as:

■ Are error levels acceptable?

■ Were campaign results worth the investment in the predictive analytics solution?

■ How is actual behavior different from predicted behavior for a model or a model decile?

This is the phase when the campaign’s effectiveness and current progress are assessed.

The results of the evaluate phase of DEEPER may lead to a completely new modeling effort. This is depicted in Figure 2 by the gray background arrow leading from evaluate to the sample phase of SEMMA. This implies a transition from deployment back to what we have referred to as intelligence. However, there is not always time to return to the intelligence cycle, and minor alterations to a model might be deemed more appropriate than starting over. The latter decision is most prevalent in time-pressured, recurring campaigns. We refer to this phase as re-target, which requires analysts to take into account new information gathered as part of the performance measurement deployment phase. It also takes advantage of the plans for how this response information was encoded per the design phase of deployment.

The most important consideration involves interpreting results from the campaign and managing non-performing targets. A non-performing target is one that scored high in a predictive model, for example, but that did not respond as predicted. In a recurring campaign, there may be an effort to re-target that subset. There could also be an effort to re-target the campaign to another set of targets, e.g., those initially scored into other deciles. Re-targeting can be a time-consuming process; new data sets with response results need to be made available to predictive modeling tool suites, and findings from tracking need to be incorporated into decisions.

DEEPER provides the context for considering how improvements to in-database analytics can be game-changers. In-database analytics can make significant inroads to DEEPER processes that take time and are under-supported by predictive modeling tool suites. However, these improvements will be driven by analysts who work closely with their organizations’ database experts. This combination of analyst and data management skills, experience, and knowledge will spur innovation significantly beyond current expectations.


How Might In-Database Analytics for DEEPER Evolve?

Extending in-database analytics to DEEPER processes requires considering how each DEEPER phase might be streamlined given tighter coupling between predictive modeling tool suites and databases/data warehouses. Although many of the advantages of this tighter coupling may be realized differently by different organizations, there are generic value streams to guide efforts. Here the phrase “value stream” refers to process flows central to DEEPER. This section discusses these generic value streams: (1) intelligence-to-plan, (2) plan-to-implementation, (3) implementation-to-use, (4) use-to-results, (5) results-to-evaluation, and (6) evaluation-to-decision.

In the design phase of DEEPER, planning can be facilitated by examining possible end-user database views that could be augmented with predictive intelligence. Instead of creating new interfaces, it is possible that Web pages equipped with embedded database queries can quickly retrieve and display predictive model scores to decision makers or front-line employees. Many of these displays are already incorporated into business processes, so opportunities to use the tables and queries to supply model results can streamline implementation. When additional data items need to be captured, that data may be captured at the point of sale or other customer touch points. A review of current metadata may speed up the design of a suitable deployment strategy. In addition to “pushing” model intelligence to interfaces, there may also be ways of “pulling” data from the database/warehouse to facilitate re-targeting or for initiating new SEMMA cycles.

For example, it may be possible to design queries to automate the retrieval of data items such as target response times from operational data stores. Similarly, it may be possible to use SQL to aggregate the information needed for this type of next-step analysis. For example, total sales to a customer within a specified time period can be aggregated using a query and then used in the re-targeting phase to reflect whether a target performed as predicted. In-database analytics can support the design phase because it eliminates many of the traditional bottlenecks such as complex requirements gathering and the creation of formal specification documents (including use cases). Instead, existing use cases can be reviewed and augmented, and database/warehouse–supported metadata facilities can support the design of schema for capturing new target response data. We refer to this as an intelligence-to-plan value stream for the in-database analytics supported design deployment phase.
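The sales aggregation mentioned above might look like the following sketch. It assumes a hypothetical sales table with ISO-formatted date strings and uses Python's built-in sqlite3 module in place of a production warehouse; the table, column, and function names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (customer_id INTEGER, sale_date TEXT, amount REAL);
    INSERT INTO sales VALUES
        (1, '2010-03-05', 40.0),
        (1, '2010-03-20', 35.0),
        (1, '2010-05-01', 99.0),
        (2, '2010-03-11', 15.0);
    """
)


def total_sales(conn, start_date, end_date):
    """Aggregate sales per customer inside the campaign window, in the database."""
    rows = conn.execute(
        """
        SELECT customer_id, SUM(amount)
        FROM sales
        WHERE sale_date BETWEEN ? AND ?   -- ISO dates compare correctly as text
        GROUP BY customer_id
        ORDER BY customer_id
        """,
        (start_date, end_date),
    ).fetchall()
    return dict(rows)


campaign_totals = total_sales(conn, "2010-03-01", "2010-03-31")
print(campaign_totals)
```

The per-customer totals can then be compared with each target's predicted behavior during re-targeting, without exporting the underlying transactions.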

In the embed phase, transferring scored model results to tables is a first step in considering ways to make use of database/warehouse capabilities to support DEEPER. Once the scores are appropriately stored in tables, there are many opportunities to use queries to embed the scores into people-supported and automated business processes. For example, coding to retrieve scores for inclusion in front-line employee interfaces can be done in a manner consistent with other embedded SQL applications. This saves time in training interface developers because it implies that the same personnel who implemented the interfaces can effectively alter them to include new intelligence.

There is also no need for additional project governance functions or specialized software. In fact, database/warehouse triggers and alerts can be used to ensure that predictive analytics are used only when model deployment assumptions are relevant. As the database/warehouse is the same place where analytic model results reside, there are numerous implementation advantages. We refer to this as a plan-to-implementation value stream for the in-database analytics supported embed deployment phase.
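A minimal sketch of the embed phase follows, again assuming sqlite3 as a stand-in engine. The schema, the trigger name flag_out_of_scope, and the deployment assumption itself (that the model was built only for customers aged 18 to 80) are all hypothetical; they illustrate how a trigger could record an alert whenever a score is stored for a case outside the model's assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, age INTEGER);
    CREATE TABLE model_scores (customer_id INTEGER, score REAL);
    CREATE TABLE score_alerts (customer_id INTEGER, reason TEXT);

    -- Assumed deployment rule: the model was built for ages 18 to 80 only.
    CREATE TRIGGER flag_out_of_scope AFTER INSERT ON model_scores
    BEGIN
        INSERT INTO score_alerts (customer_id, reason)
        SELECT c.customer_id, 'age outside training range'
        FROM customers AS c
        WHERE c.customer_id = NEW.customer_id
          AND c.age NOT BETWEEN 18 AND 80;
    END;

    INSERT INTO customers VALUES (1, 34), (2, 17);
    INSERT INTO model_scores VALUES (1, 0.82), (2, 0.40);
    """
)


def score_for(conn, customer_id):
    """Retrieve a stored score the way a front-line interface query might."""
    row = conn.execute(
        "SELECT score FROM model_scores WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    return None if row is None else row[0]


alerts = conn.execute("SELECT customer_id, reason FROM score_alerts").fetchall()
print(score_for(conn, 1), alerts)
```

Because the alert logic lives beside the scores, interface developers only need an ordinary query to display a score, and the exception handling does not have to be re-implemented in each application.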

After implementation, testing will ensure that model results/scores are understandable to decision makers (the empower phase) and that their performance can scale when production systems are at high capacity. Such stress tests can be conducted in a manner similar to database view tests. Because of the inherent speed of database/warehouse systems, their performance will likely exceed separate, isolated workbench performance. Global roll-out can be eased by tried-and-true database/warehouse roll-out processes. We refer to this as an implementation-to-use value stream for the in-database analytics supported empower deployment phase.

Similarly, the use-to-results value stream is that part of a campaign when actions are taken and targets respond. In this performance measurement phase of deployment, dashboards can be used to track performance, database tables can automatically collect and store ongoing campaign results, queries can aggregate responses over time as part of automating responses, and many other in-database solutions can help to streamline related processes. This information is central to the evaluate phase, where the results-to-evaluation value stream can enable careful scrutiny of the predictive analytics model portfolio. Queries can be written to compare actual results to those predicted during SEMMA phases. When more than one model has been constructed in the SEMMA processes, all can be re-examined in light of the new information about responses. If-then statements can be embedded in queries to identify target segments that have responded according to business goals, and remaining non-responders can be quickly identified.

Such analysis can be done for each analytical model in the portfolio and for each decile of predicted respondents associated with those models. This has been an enormously time-consuming process in the past, but the database/warehouse query engine can conduct this type of post-analysis efficiently. Queries can also identify subsets of respondents that outperformed the predicted model performance, as well as those that significantly under-performed. This type of analysis can be quickly supported through queries, and it can provide significant insight for the re-target phase.
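The decile-level comparison and the if-then logic described above can be sketched with a single grouped query. The campaign_results table, its columns, and the verdict labels are hypothetical, and sqlite3 again stands in for the warehouse engine; SQL's CASE expression plays the role of the embedded if-then statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE campaign_results (
        customer_id    INTEGER,
        decile         INTEGER,
        predicted_rate REAL,    -- response probability from the scoring model
        responded      INTEGER  -- 1 if the target responded, else 0
    );
    INSERT INTO campaign_results VALUES
        (1, 1, 0.8, 1), (2, 1, 0.8, 1), (3, 1, 0.8, 0), (4, 1, 0.8, 0),
        (5, 2, 0.5, 1), (6, 2, 0.5, 1), (7, 2, 0.5, 1), (8, 2, 0.5, 0);
    """
)

# Compare actual with predicted response rates for each decile.
decile_report = conn.execute(
    """
    SELECT decile,
           AVG(predicted_rate) AS predicted,
           AVG(responded)      AS actual,
           CASE WHEN AVG(responded) >= AVG(predicted_rate)
                THEN 'met or beat prediction'
                ELSE 'under-performed'
           END AS verdict
    FROM campaign_results
    GROUP BY decile
    ORDER BY decile
    """
).fetchall()

# Identify the remaining non-responders as candidates for re-targeting.
non_responders = [
    row[0]
    for row in conn.execute(
        "SELECT customer_id FROM campaign_results "
        "WHERE responded = 0 ORDER BY customer_id"
    )
]
print(decile_report, non_responders)
```

Running the same query once per model in a portfolio would give the model-by-model, decile-by-decile view that the re-target phase relies on.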

Following the results-to-evaluation value stream of the deployment cycle, the evaluation-to-decision value stream focuses on whether a new intelligence cycle (a repeat of SEMMA processes) is required. If performance results indicate major model failures, then a repeat is likely necessary to resurrect and continue a campaign. Even if there weren’t major failures, environmental changes such as economic conditions may have rendered models outdated. Data collected in the performance evaluation phase may help to streamline the decision process. If costs aren’t being recovered, then it is likely that either the campaign will cease or a new intelligence cycle is necessary.

Often a portfolio of models is created in the initial intelligence cycle. It may be possible to use queries to automate the process of recalculating the prior and anticipated performance of the models in the portfolio. If models exist that were not used but appear to perform better, those models may be used in the next DEEPER cycle. Alternatively, a combination or pooling of models might be most appropriate. Again, automated queries might be able to provide decision support for such pooling options, and they can aid in scheduling the appropriate model for the data sets as the DEEPER cycle progresses. In addition, it may be possible to use queries to apply business rules to manage data sets, and prior results could inform the scheduling of resting periods for targets such that each target isn’t inundated with catalog mailings, for example.

Table 2 summarizes key generic value streams that can be supported by in-database analytics and briefly describes the possibilities discussed in this section. Opportunities to evolve in-database analytics are likely to be numerous.

Conclusion

In-database analytics create an environment where functions are embedded and processed in parallel, thereby streamlining the steps of both intelligence (e.g., SEMMA) and deployment (e.g., DEEPER) cycles. As data sources are updated, attribute names and formats may change, yet they are sharable. In-database analytics can support quality checks and create warning messages if the range, format, and/or type of data differ from a previous version or model assumptions. If external data has attributes that were not in the data dictionary, metadata can be updated automatically. Data conversions can be handled in-database and only once instead of being repeated by multiple modelers. In-database analytics fosters stability, enhances efficiency, and improves productivity across business units.
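The quality checks mentioned above can be illustrated with a small sketch of the checking logic. It is written in plain Python rather than inside a database, and the profile structure, column names, and the function check_against_profile are assumptions made for illustration; in practice such rules might be stored as metadata and enforced by the database/warehouse itself.

```python
def check_against_profile(rows, profile):
    """Return warnings for values whose type or range departs from the stored profile."""
    warnings = []
    for i, row in enumerate(rows):
        for column, (expected_type, low, high) in profile.items():
            value = row.get(column)
            if not isinstance(value, expected_type):
                warnings.append(f"row {i}: {column} has type {type(value).__name__}")
            elif not low <= value <= high:
                warnings.append(f"row {i}: {column}={value} outside [{low}, {high}]")
    return warnings


# Hypothetical profile recorded from the data version the model was built on.
profile = {"age": (int, 18, 100), "balance": (float, 0.0, 50_000.0)}

# Hypothetical new feed: one row with a changed type, one with an out-of-range value.
new_feed = [
    {"age": 42, "balance": 1200.5},
    {"age": "42", "balance": 900.0},
    {"age": 130, "balance": 75.0},
]
feed_warnings = check_against_profile(new_feed, profile)
print(feed_warnings)
```

Keeping one shared profile per attribute means the check runs once at load time rather than being repeated, possibly inconsistently, by each modeler.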

In-database analytics will be critical to a company’s bottom line when models are deployed and there is time pressure for multiple, successive campaigns where ongoing results can be used to build updated, improved predictive models. Enhancements can be realized in a host of value streams. For example, in-database analytics can significantly reduce cycle times for rebuilding and redeploying updated models to meet campaign deadlines. As multiple models are constructed, in-database analytics will enable managing them as a portfolio. Timely responses, tracking, and fast interpretation of early responders to campaigns will enable companies to fine-tune business rules and react in record time.

As the fine line between intelligence and deployment cycles fades because of the fast-paced environment supported by in-database analytics, businesses may move away from the concept of campaign management into trigger-based, “lights-out” processing, where all data feeds are automatically updated and processed, and there is no need to compile data into periodic campaigns. There will be real-time decision making with instant scoring each time there is an update in one of the important independent variables. Analysts will spend their time fine-tuning model performance, building business rules, analyzing early results, monitoring data movements, and optimizing the use of multiple models, instead of dealing with the manual tasks of data preparation, data cleansing, and managing file movements and basic statistical processes that have been moved into the database/warehouse.

Although lights-out processing is not on the near-term horizon, the evolution of in-database analytics promises to move organizations in that direction. Once in the hands of analysts and their database/warehouse teams, in-database analytics will be a game-changer. ■

References

Azevedo, Ana, and Manuel Filipe Santos [2008]. “KDD, SEMMA and CRISP-DM: A Parallel Overview,” IADIS European Conference Data Mining, pp. 182–185.

Fayyad, U. M., Gregory Piatetsky-Shapiro, Padhraic Smyth, and Ramasamy Uthurusamy [1996]. Advances in Knowledge Discovery and Data Mining, AAAI Press/The MIT Press.

Gray, Paul, and Hugh J. Watson [2007]. “What Is New in BI,” Business Intelligence Journal, Vol. 12, No. 1.

Houghton, Bob, Omar A. El Sawy, Paul Gray, Craig Donegan, and Ashish Joshi [2004]. “Vigilant Information Systems for Managing Enterprises in Dynamic Supply Chains: Real-Time Dashboards at Western Digital,” MIS Quarterly Executive, Vol. 3, No. 1.

Pfeffer, Jeffrey, and Robert I. Sutton [2006]. “Evidence Based Management,” Harvard Business Review, January.

Table 2. Generic value streams and areas for innovation with in-database analytics

Value stream | Areas for innovation
Intelligence-to-plan | Planning is streamlined; push and pull strategies are feasible; schema design can support planning
Plan-to-implementation | Scores maintained in-database; embedded SQL in HTML can facilitate view deployment; triggers and alerts can be used to guard for exceptions
Implementation-to-use | Stress testing and global rollout follow database/warehouse methodologies and rely on common human and physical resources
Use-to-results | Dashboards can be readily adapted; database/warehouse tables can be used as response aggregators
Results-to-evaluation | Re-examine all created models efficiently in light of response information; embed if-then logic to re-target non-responders
Evaluation-to-decision | Consider applying different models; allow targeted respondents to “rest”; use database to provide decision support for deciding to re-target or re-enter the intelligence cycle

Copyright of Business Intelligence Journal is the property of Data Warehousing Institute and its content may

not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written

permission. However, users may print, download, or email articles for individual use.