DM.zip

Assignment12_5.docx


Data Security

Student’s Name

Professor’s Name

Course Name

Institution Name

Date

Data Security

Today, data security is among the biggest issues facing companies, as the number of data breaches continues to increase. Organizational data breaches can have lasting consequences for security and public confidence. According to Gwebu et al. (2018), data breaches are both expensive and damaging because organizations must slow their operations to address them. The lost time is an economic loss, and breaches also reduce public confidence, because private consumer data is exposed to the public and consumers face unwarranted security risks (Gwebu et al., 2018). However, educational institutions can employ numerous strategies to prevent data breaches.

According to a study by Talesh (2018), most data breaches are caused by employee negligence or by a lack of awareness of the importance of data security. Training current employees, and hiring computer-literate staff after rigorous screening, can therefore offer protection. This strategy is also supported by Cheng et al. (2017), who state that investing in employee training is the simplest approach to protecting client data against breaches. Because the integration of IT into education is still under way, institutional employees need appropriate training to operate securely. In addition, multi-layered security systems offer strong protection against external attacks. Although internal threats are the top concern in data breaches, external protection is also important (Cheng et al., 2017), because it helps prevent brute-force attacks, distributed denial-of-service (DDoS) attacks, and backdoor threats. Finally, regular risk assessments offer a proactive approach to preventing data breaches: regularly assessing data security measures allows organizations to identify threats before they become security incidents (Gwebu et al., 2018).
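The regular risk assessment described above is often organized as a risk register scored with a likelihood-times-impact matrix. The following Python sketch assumes a 1 to 5 scale for both factors and a review threshold of 12; the `Risk` class, the example risks, and the threshold are illustrative assumptions, not values taken from the cited studies.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)

    @property
    def score(self) -> int:
        # Common risk-matrix convention: likelihood multiplied by impact
        return self.likelihood * self.impact


def prioritize(risks, threshold=12):
    """Return the risks scoring at or above threshold, highest score first."""
    flagged = [r for r in risks if r.score >= threshold]
    return sorted(flagged, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    # Hypothetical entries in an institution's risk register
    register = [
        Risk("Phishing of untrained staff", likelihood=4, impact=4),
        Risk("Brute-force login attempts", likelihood=3, impact=3),
        Risk("Backdoor in an unpatched legacy system", likelihood=2, impact=5),
    ]
    for risk in prioritize(register):
        print(f"{risk.name}: {risk.score}")
```

Lowering `threshold` escalates more risks for immediate action; an institution would calibrate it to its own risk tolerance and repeat the scoring at each assessment cycle.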

Overall, technology is developing rapidly, and hacking tools are freely available on the internet. Educational institutions should therefore invest in data protection to safeguard client data.

References

Cheng, L., Liu, F., & Yao, D. (2017). Enterprise data breach: Causes, challenges, prevention, and future directions. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 7(5), e1211. https://doi.org/10.1002/widm.1211

Gwebu, K., Wang, J., & Wang, L. (2018). The role of corporate reputation and crisis response strategies in data breach management. Journal of Management Information Systems, 35(2), 683-714. https://doi.org/10.1080/07421222.2018.1451962

Talesh, S. (2018). Data breach, privacy, and cyber insurance: How insurance companies act as “compliance managers” for businesses. Law & Social Inquiry, 43(2), 417-440. https://doi.org/10.1111/lsi.12303

Assignment12_5brevision.docx


Data Mining

Student’s Name

Professor’s Name

Course Name

Institution Name

Date

Data Mining

As in many other industries, technology has brought about numerous developments that have made practices such as data mining more predictive. According to Dutta et al. (2017), the current configuration of the financial system relies heavily on standard mass marketing strategies and customer service techniques. The researchers add that the financial sector is destined to fail if it does not collect data on its various financial activities (Dutta et al., 2017). Traditional banks that ignore the importance of technology focus on financial resources without considering the possibilities the future holds. Moreover, according to Hassani et al. (2018), before technological advancements, the value of information in the financial sector centered on economic activities and the number of consumers. The quality and quantity of information were limited, which restricted the scope of the industry's development (Hassani et al., 2018). Nonetheless, because of its constant interaction with clients, the banking sector is viewed as one of the greatest sources of consumer data, and this data is gathered without intrusive or expensive research programs.

The role of data mining in the financial sector is supported by Bartoletti et al. (2018), who state that data mining has long been one of the industry's core practices. The data serves numerous functions related to consumer satisfaction and growth in market share. However, the advent of technology tools such as the internet and computers has revolutionized the role of data (Bartoletti et al., 2018). It is important to recognize that the modern financial sector handles enormous amounts of data and information. This has allowed the sector to develop significantly, because predictive actions and attention to consumers improve its operations. Zhong and Enke (2017) argue that easy access to consumer data allows the sector to develop predictive patterns, which has made it a key determinant of economic performance.

Beyond industrial development and consumer satisfaction, big data plays several other roles in the financial sector. The first is fraud prevention and detection: based on their assessment of the industry, Dutta et al. (2017) identify fraud and other financial crimes as the main areas of concern for financial institutions. By processing the available data and generating predictions, big data can help prevent fraudulent activity (Dutta et al., 2017). Modern technology has also allowed financial institutions to implement regulatory compliance more effectively, using technologies that evaluate large consumer datasets to determine compliance with regulations. Moreover, data mining is used to understand consumer expectations (Dutta et al., 2017). By exploiting the non-intrusive and inexpensive data collection strategies mentioned earlier, the industry can predict consumer expectations on factors such as tailored banking rates.
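As a simple illustration of using past data to flag possible fraud, the sketch below marks new transactions whose amounts lie far from a customer's historical average, measured in standard deviations (a z-score rule). This is an assumed, deliberately basic technique chosen for illustration; it is not the method used by Dutta et al. (2017), and the threshold of 3 is a hypothetical default.

```python
from statistics import mean, stdev


def flag_unusual(history, new_amounts, z_threshold=3.0):
    """Return the new amounts lying more than z_threshold standard deviations
    from the mean of the customer's past transaction amounts."""
    if len(history) < 2:
        raise ValueError("need at least two past transactions")
    mu, sigma = mean(history), stdev(history)
    flagged = []
    for amount in new_amounts:
        if sigma == 0:
            unusual = amount != mu  # no past variation: any change stands out
        else:
            unusual = abs(amount - mu) / sigma > z_threshold
        if unusual:
            flagged.append(amount)
    return flagged
```

A production system would combine many such signals (merchant, location, time of day) rather than rely on amount alone.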

Though not limited to these roles, data mining in the financial sector has considerable potential. Currently, its role covers consumer experience and improved performance, which increases its overall importance in the economy. In the future, the practice is expected to streamline and diversify the financial sector. This view is supported by Hassani et al. (2018), who identify machine learning and artificial intelligence as technologies being adopted to make better use of big data. These technologies reduce a common cause of poor performance: human error. The growth of data mining in the financial sector also shows how much potential lies in the data generated by various financial activities.

References

Bartoletti, M., Pes, B., & Serusi, S. (2018). Data mining for detecting Bitcoin Ponzi schemes. In 2018 Crypto Valley Conference on Blockchain Technology (CVCBT). https://doi.org/10.1109/cvcbt.2018.00014

Dutta, I., Dutta, S., & Raahemi, B. (2017). Detecting financial restatements using data mining techniques. Expert Systems with Applications, 90, 374-393. https://doi.org/10.1016/j.eswa.2017.08.030

Hassani, H., Huang, X., & Silva, E. (2018). Digitalisation and big data mining in banking. Big Data and Cognitive Computing, 2(3), 18. https://doi.org/10.3390/bdcc2030018

Zhong, X., & Enke, D. (2017). A comprehensive cluster and classification mining procedure for daily stock market return forecasting. Neurocomputing, 267, 152-168. https://doi.org/10.1016/j.neucom.2017.06.010

DataAnalyticsTools.docx

Running Head: TOOL

Data Analytics Tools

Name

Institution

Professor

Course

Date

Power BI is a cloud-based business intelligence service suite from Microsoft. Its main use is converting raw data into meaningful information through intuitive visualizations and tables. A key advantage for data analysis and visualization is affordability: it is relatively inexpensive. Power BI Desktop is free of charge, while the cloud service costs $9.99 per user per month, which is a fair price compared with other BI tools. The Premium plan starts at $4,995 per month per dedicated cloud compute and storage resource. In addition, Power BI offers a wide range of custom visualizations, which are visuals built by developers for specific uses. Power BI also lets users upload and view their own data in Excel: users can select, filter, and slice data in a Power BI report or dashboard and then bring it into Excel.

Power BI also has disadvantages. It is well suited to handling simple relationships between tables in a data model. However, when the relationships between tables are complex, for example when two tables are connected by more than one link, Power BI does not handle them well. The data model must therefore be designed carefully, with more unique fields, so that Power BI does not confuse complex relationships. Another disadvantage is that clients find its user interface crowded and bulky (Gandhi et al., 2017).

Vendor Name: Microsoft Power BI

URL: https://powerbi.microsoft.com/en-us/

References

Arfaoui, M. (2020). Pilot project to transform a BI solution from MicroStrategy to Power BI (Doctoral dissertation). Retrieved from https://run.unl.pt/handle/10362/94475

Gandhi, V. I., Subramaniyaswamy, V., & Logesh, R. (2017). Topological review and analysis of DC-DC boost converters. Journal of Engineering Science and Technology, 12(6), 1541-1567. Retrieved from http://jestec.taylors.edu.my/Vol%2012%20issue%206%20June%202017/12_6_9.pdf

DATAMINING.edited.docx

Running Head: DATA MINING

DATA MINING

Name

Institution

Date

Introduction

Data mining has been defined as a process that companies or businesses use to turn raw data into useful information (Alasadi & Bhaya, 2017). Companies use software to look for patterns in large amounts of data and learn more about their consumers, so that they can develop more effective marketing strategies, increase sales, and decrease costs. Data mining is also a process used to find anomalies, patterns, and correlations in large data sets to predict outcomes. Many companies apply a range of data mining techniques to improve revenues and customer relationships and to reduce costs and risks. Data mining is an essential business process, but it has both advantages and disadvantages, and companies can choose among different types of data mining for their business activities. Data mining is applied in business once a problem has been identified. For instance, one business problem that calls for data mining is customer profiling. Customer profiling is a critical activity applied primarily to marketing: when a business builds its customer profiles, it strengthens its customer relationships and improves its customer service. Recognizing that a business has different kinds of customers helps owners set up customer profiles, which in turn lets them use different strategies for different customers. Data mining is essential for building these profiles because it provides information about the customers.

Background information

Building a customer profile is an interesting business problem because the profile is an important marketing tool. Customer profiles can be classified into different types: the behaviour profile, the demographic profile, and possibly a hobby profile. A business should understand its customers starting with their behaviour, which in a business setting refers to how customers purchase or consume the company's products and services and what they say about them. The problem is also interesting because the company should know the hobbies of its target consumers in order to find suitable ways to provide the products and services they need. To understand their customers' demographics, companies look at information such as the age and income of the target population to decide on product prices and on which products and services to offer.

Several techniques can be used in data mining, but for our identified problem of customer profiling, we can use the classification technique. Classification suits customer profiling because it is mainly used to retrieve relevant information about customers. As its name states, classification sorts information into distinct classes that are easy to analyze. The business analyst will use classification because, unlike other data mining methods, it tells the analyst which class of customer profile each customer belongs to. In classification, analysts apply algorithms that decide how newly acquired data should be classified. Classification also has models that explain how the data mining is done, and one primary model is the decision tree model.
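A minimal sketch of the classification idea is a one-level decision tree (a "decision stump"). It assumes a hypothetical `annual_spend` attribute and two hypothetical profile labels, chooses the spending threshold that minimizes weighted Gini impurity (the splitting criterion CART uses), and then assigns newly acquired customers to a class. A full decision tree would repeat this split recursively over several attributes.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, feature):
    """Find the threshold on one numeric feature that minimizes the weighted
    Gini impurity of the two groups it creates. Returns (threshold, impurity)."""
    best = (None, float("inf"))
    values = sorted({row[feature] for row in rows})
    for threshold in values[:-1]:  # splitting at the maximum leaves one side empty
        left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
        right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


def fit_stump(rows, labels, feature):
    """Train a one-level decision tree and return a function that classifies new rows."""
    threshold, _ = best_split(rows, labels, feature)
    if threshold is None:
        raise ValueError("feature has a single value; cannot split")
    left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
    right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
    left_class = Counter(left).most_common(1)[0][0]   # majority class on each side
    right_class = Counter(right).most_common(1)[0][0]
    return lambda row: left_class if row[feature] <= threshold else right_class
```

For example, training on six hypothetical customers labelled "occasional" or "loyal" by spend produces a classifier that places any new customer on one side of the learned threshold.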

Discussion

Data mining involves analyzing vast amounts of data to identify patterns (Sharma, 2020). To obtain information for customer profiling, the analyst should consider which sources hold the appropriate data. Since the problem involves customers, the analyst can examine transactional data, which is recorded whenever customers make purchases. This is an essential source for building a customer profile because it shows what each customer purchases most frequently, how much they spend, and how often they buy goods or services from the company. Transactional data is easy to mine because each transaction usually has a unique identifier and lists the items purchased.
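The transactional summary described above can be sketched in a few lines of Python. The record layout (`tx_id`, `customer_id`, `amount`, `items`) is a hypothetical schema assumed for illustration; the function counts how often each customer buys, totals their spending, and finds their most frequently purchased item.

```python
from collections import Counter, defaultdict


def summarize_transactions(transactions):
    """Summarize purchases per customer: how often they buy, how much they
    spend in total, and which item they buy most frequently."""
    totals = defaultdict(lambda: {"purchases": 0, "spend": 0.0, "items": Counter()})
    for tx in transactions:
        entry = totals[tx["customer_id"]]
        entry["purchases"] += 1              # each record is one transaction
        entry["spend"] += tx["amount"]
        entry["items"].update(tx["items"])   # count each purchased item
    return {
        customer: {
            "purchases": e["purchases"],
            "total_spend": round(e["spend"], 2),
            "top_item": e["items"].most_common(1)[0][0] if e["items"] else None,
        }
        for customer, e in totals.items()
    }
```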

Apart from transactional data, the analyst can mine a data warehouse. A data warehouse is a single storage location that collects and stores large volumes of data from different sources. The data is stored in a unified schema, which makes data mining easier. Since our problem concerns customer profiling, mining the data warehouse should focus on the business's customers. The warehouse makes such data easy to mine because the analyst can find historical data, even from months earlier, and this data is usually summarized. Information collected in the data warehouse is cleaned and integrated, and in some cases refreshed. The data is also organized into several subject areas, which further eases data mining.

Finally, the company's databases are another source of information for customer profiling. A company's database management system usually stores related data together, so it is easy to find information related to customers and proceed with customer profiling. Because the database management system uses software to manage the data, it also makes the data easier for the analyst to access. Data in the database management system is easy to mine because it is arranged in named tables whose columns hold different attributes; since we are looking for data for customer profiling, it is easy to identify customers' names and their attributes.
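Because relational tables are linked by keys, a single query can combine customer attributes with related records. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `customers` and `orders` schema created in memory; a company's actual database, table names, and columns would differ.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical customers/orders schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
        CREATE TABLE orders (id INTEGER PRIMARY KEY,
                             customer_id INTEGER REFERENCES customers(id),
                             amount REAL);
    """)
    conn.executemany("INSERT INTO customers (id, name, age) VALUES (?, ?, ?)",
                     [(1, "Ada", 34), (2, "Ben", 22)])
    conn.executemany("INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                     [(1, 20.0), (1, 15.5)])
    conn.commit()
    return conn


def customer_profiles(conn):
    """Join each customer's attributes with an order count and total spend.
    LEFT JOIN keeps customers who have not ordered yet."""
    query = """
        SELECT c.name, c.age,
               COUNT(o.id) AS order_count,
               COALESCE(SUM(o.amount), 0) AS total_spend
        FROM customers AS c
        LEFT JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()
```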

Having identified the sources of customer data, we next need steps to obtain the data from them. The sources are easy to mine, but the data mining process itself must also be efficient. The first step is identifying the data sources, which we have already done. Next, we identify the specific key data points we need; because we are dealing with customer profiling, we select data related to the customers. The data sets will be complex, so identifying exactly what we need keeps the data mining process fast and efficient. Once we know which data to pick, we extract it from the sources, taking only the information related to our problem to avoid mixing up unrelated data sets.
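The extraction step, keeping only the key data points identified beforehand, might look like the following sketch. The field names in `KEY_FIELDS` and the source names are hypothetical; records without a customer identifier are skipped here on the assumption that they cannot be linked to a profile.

```python
# Hypothetical key data points chosen for customer profiling
KEY_FIELDS = ("customer_id", "age", "city", "total_spend")


def extract_key_fields(sources, key_fields=KEY_FIELDS):
    """Keep only the key data points from each source record and tag each
    row with the source it came from."""
    extracted = []
    for source_name, records in sources.items():
        for record in records:
            if "customer_id" not in record:
                continue  # cannot be linked to any customer profile
            row = {field: record.get(field) for field in key_fields}  # missing -> None
            row["source"] = source_name
            extracted.append(row)
    return extracted
```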

After extracting the data from these sources, we interpret it (Upadhyay et al., 2016), identifying the valid information that can build our customer profiles. This information should help the business improve customer relationships and customer service and understand what customers prefer. After interpreting the data, we arrange the information so that results can be drawn from it easily, and those results should help the company solve the problem of customer profiling. The whole point of data mining here is to help the company build customer profiles, which will in turn help it satisfy its customers. The company needs to find the patterns in the data to identify the best ways to satisfy its customers. In business today, understanding your customers well is an essential tool for competing in the market.

When doing data mining for customer profiling, the business mainly needs to consider customers' essential, factual data, such as names and ages. Specific algorithms are also used during data mining. These include “Apriori for association rules mining and CART algorithm (Classification and Regression Trees) for classification rules.” The Apriori algorithm is mainly used to find items that are frequently bought together in databases of customer transactions. Classification and regression trees, in turn, are used to build predictive models that classify items according to their attributes. The CART model is mainly used with data obtained from data warehouses.
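The Apriori algorithm mentioned above can be sketched as follows. It finds itemsets that appear in at least a minimum share of transactions (the support threshold), growing candidates one item at a time and pruning any candidate with an infrequent subset, based on the Apriori property that every subset of a frequent itemset must itself be frequent. The basket data and the 0.5 threshold are illustrative assumptions; the `confidence` helper shows how an association rule such as "bread implies milk" is scored from the frequent itemsets.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    baskets = [frozenset(t) for t in transactions]
    if not baskets:
        return {}
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item in the itemset
        return sum(1 for b in baskets if itemset <= b) / n

    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): all (k-1)-subsets must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent; both the antecedent and
    the combined itemset must be frequent (present in the dict)."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]
```

With the four hypothetical baskets used in testing, {bread, milk} and {bread, butter} each reach 0.5 support, while {milk, butter} does not, so the three-item set is pruned without being counted.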

Many companies have gained a competitive lead because of how well they understand their customers. Customer profiling is mainly used to understand customers' behaviour, which guides the company toward a clear picture of its ideal customer. In customer profiling, the business's primary focus is its customers, because a business usually develops around them. Customer profiling focuses mainly on customers' transactional and personal details. When the company understands the patterns in this transactional and personal information, it can more easily retain existing customers and attract new ones.

Businesses use data mining mostly to predict customer behaviour and market trends. Business analysts rely on the available data because it provides insight into their customers. The data mining process requires the company to research the resources that will be used and the current trends to help it reach its goal. The mined data should be checked for quality and reliability so that the company can benefit from the information and does not go wrong in solving its problem. Data used to generate customer profiles, as in this case, should be cleaned and anonymized so that the analyst receives correct information while customers' identities are protected. The data should also be modelled and transformed so that the analyst can rely on it.
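Cleaning and anonymizing can be sketched as below: incomplete records are dropped, text fields are normalized, and each name is replaced by a salted SHA-256 token so that records for the same customer can still be linked. Strictly speaking this is pseudonymization rather than full anonymization, since anyone holding the salt could re-identify names by hashing candidates; the field names and the salt are hypothetical.

```python
import hashlib


def clean_and_pseudonymize(records, salt):
    """Drop incomplete records, normalize fields, and replace each name with
    a salted SHA-256 token (pseudonymization, not full anonymization)."""
    cleaned = []
    for record in records:
        name = (record.get("name") or "").strip()
        if not name or record.get("age") is None:
            continue  # skip records missing a name or an age
        digest = hashlib.sha256((salt + name.lower()).encode("utf-8")).hexdigest()
        cleaned.append({
            "customer_token": digest[:16],  # same name and salt give the same token
            "age": int(record["age"]),      # ages may arrive as text, e.g. "36"
            "city": (record.get("city") or "").strip().title(),
        })
    return cleaned
```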

Conclusion

In conclusion, businesses should prioritize their customers and ensure they are satisfied in order to succeed. Satisfied customers are very likely to stay with a company's products; therefore, businesses should work hard to understand their customers by building customer profiles through data mining. It is also essential to choose data mining techniques that suit the problem in order to find a proper solution.

References

Alasadi, S. A., & Bhaya, W. S. (2017). Review of data preprocessing techniques in data mining. Journal of Engineering and Applied Sciences, 12(16), 4102-4107.

Sharma, R. (2020, April 30). Data mining techniques: Types of data, methods, applications. Retrieved from https://www.upgrad.com/blog/data-mining-techniques/

Upadhyay, T., Vidhani, A., & Dadhich, V. (2016, March). Customer profiling and segmentation using data mining techniques (pp. 65-67). Retrieved from http://csjournals.com/IJCSC/PDF7-2/10.%20Tejpal.pdf

DataMiningSignificance.docx

Running Head: DATA

Data Mining Significance

Name

Institution

Professor

Course

Date

Data mining is a process that organizations use to convert raw data into useful information. By using software to find patterns in large batches of data, enterprises can learn much more about their clients, create more effective marketing strategies, increase sales, and reduce costs. Data mining is significant because marketing firms use it to create data models and predictions based on historical data. They can then run campaigns and marketing strategies and pivot when needed, which leads to rapid growth and success. The retail industry, on the other hand, relies on prediction-based models for its services and commodities. These models give retail stores advanced insights into their products and consumers, and stores base their redemption codes and discounts on historical data (Luo & Xiang, 2020).

In addition, banks use data mining to assist with financial updates and benefits. They build models on consumer data to check loan applications, a process that depends heavily on data mining. Manufacturers use data mining to analyze engineering data and to detect faulty devices and products, which helps them remove defective items from the product list and replace them with suitable goods and services. Government bodies use data mining to analyze financial transactions and other data so that they can be modeled into useful information. Data mining also helps organizations improve their planning and decision-making (Ren et al., 2020).

References

Luo, Y., & Xiang, Y. (2020). Application of data mining methods in Internet of Things technology for the translation systems in traditional ethnic books. IEEE Access, 8, 93398-93407. Retrieved from https://ieeexplore.ieee.org/abstract/document/9093869/

Ren, J., Zhang, Q., & Liu, F. (2020). Analysis of factors affecting traction energy consumption of electric multiple unit trains based on data mining. Journal of Cleaner Production, 262, 121374. Retrieved from https://www.sciencedirect.com/science/article/pii/S0959652620314219

RandomSampling.docx

Running Head: DATA

Random Sampling

Name

Institution

Professor

Course

Date

A simple random sample is a subset of a population selected at random. With this sampling method, every member of the population has exactly the same chance of being selected. In statistics, a simple random sample can also be described as a subset of individuals chosen from a larger set. Each individual is chosen randomly and entirely by chance, so that every individual has the same probability of selection. Because selection relies entirely on chance, the technique is also referred to as the method of chance (Albahri et al., 2020).
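Python's standard library draws a simple random sample with `random.sample`, which selects without replacement so that every member of a population of size N has the same chance, n/N, of being included. The sketch below wraps it and adds a hypothetical helper, `inclusion_frequencies`, that estimates each member's inclusion rate over many repeated samples; those rates should come out close to n/N.

```python
import random
from collections import Counter


def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement from a sequence (list, range, ...)."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def inclusion_frequencies(population, n, trials, seed=None):
    """Estimate how often each member is selected across repeated samples."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        counts.update(rng.sample(population, n))
    return {member: counts[member] / trials for member in population}
```

For a population of five and samples of two, each member should be selected in roughly 40% of the trials.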

A uniform distribution is a probability distribution in which every possible outcome has an equal likelihood of occurring. The probability is constant because every value has the same chance of being the outcome. Can sample data be drawn from a distribution other than the uniform distribution? Yes, it is entirely possible to sample data from a distribution that differs from the uniform distribution (Gupta & Chandra, 2020).

For example, in Excel, an individual can generate a set of sample data as follows.

To create a sample of 10 elements from the standard normal distribution, enter the formula =NORM.S.INV(RAND()) in cell A1, highlight the range A1:A10, and press Ctrl+D.
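The Excel formula works by inverse transform sampling: `RAND()` produces a uniform number between 0 and 1, and the standard normal inverse function maps it through the inverse cumulative distribution function of the standard normal, turning uniform draws into normal ones. A Python sketch of the same idea uses `statistics.NormalDist.inv_cdf`; the function name and seed parameter are illustrative.

```python
import random
from statistics import NormalDist


def standard_normal_sample(n, seed=None):
    """Return n draws from the standard normal distribution, mirroring
    Excel's =NORM.S.INV(RAND()) filled down n cells."""
    rng = random.Random(seed)
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    values = []
    while len(values) < n:
        u = rng.random()  # uniform on [0.0, 1.0), like RAND()
        if u == 0.0:
            continue      # inv_cdf is undefined at 0, so redraw
        values.append(std_normal.inv_cdf(u))  # like NORM.S.INV(u)
    return values
```

In practice, `random.gauss(0, 1)` or `NormalDist().samples(n)` produce normal samples directly; the explicit version is shown because it mirrors the spreadsheet formula step by step.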

References

Albahri, A. S., Hamid, R. A., Alwan, J. K., Al-Qays, Z. T., Zaidan, A. A., Zaidan, B. B., … & Madhloom, H. T. (2020). Role of biological data mining and machine learning techniques in detecting and diagnosing the novel coronavirus (COVID-19): A systematic review. Journal of Medical Systems, 44, 1-11. Retrieved from https://link.springer.com/content/pdf/10.1007/s10916-020-01582-x.pdf

Gupta, M. K., & Chandra, P. (2020). A comprehensive survey of data mining. International Journal of Information Technology, 1-15. Retrieved from https://link.springer.com/content/pdf/10.1007/s41870-020-00427-7.pdf

Week7Assignment3.docx

Running Head: WEEK 7 ASSIGNMENT THREE

Week 7 Assignment 3

Name

Institution

Professor

Course

Date

Week 7 Assignment 3

Cluster analysis is the task of grouping a set of objects so that objects in the same group, called a cluster, are more similar to one another than to objects in other groups. One of the most common uses of clustering is segmenting a customer portfolio based on demographics, transaction behavior, and other behavioral attributes (Smith, 2014).
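Customer segmentation of the kind described above is commonly done with k-means. The sketch below is a plain implementation of Lloyd's algorithm, assuming each customer is represented as a tuple of numeric attributes (for example, hypothetical annual spend in thousands and visits per month) on comparable scales; in practice attributes are usually standardized first. It is one clustering algorithm among many, not a method prescribed by the cited sources.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=None):
    """Lloyd's k-means on a list of equal-length numeric tuples.
    Returns (centroids, labels), where labels[i] is the cluster of points[i]."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen customers

    def nearest(p):
        return min(range(k), key=lambda j: math.dist(p, centroids[j]))

    for _ in range(iterations):
        labels = [nearest(p) for p in points]  # assignment step
        new_centroids = []
        for j in range(k):                     # update step: mean of each cluster
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster in place
        if new_centroids == centroids:
            break                                   # converged
        centroids = new_centroids
    labels = [nearest(p) for p in points]  # final labels match final centroids
    return centroids, labels
```

Because k-means depends on its starting centroids, it is common to run it with several seeds and keep the result with the smallest within-cluster distances.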

The analytics industry is dominated by objective-driven modeling techniques such as decision trees and regression. If a decision tree can perform segmentation, one may ask whether an open-ended technique is needed at all. The answer lies in the advantages of clustering: clustering generates natural clusters and does not depend on any driving objective function (Smith, 2014).

Because they are not tied to a single objective, such clusters can be used to analyze the portfolio against different target attributes. For example, consider a decision tree built on customer profitability over the next three months. That segmentation cannot be used to build a retention strategy for each segment, because its segments are defined only by the profitability target. Clustering is therefore commonly used for the initial profiling of a portfolio. Statistics offers various algorithms for generating clusters (Hu, 2018).

References

Hu, V. (2018). Attribute-Based Access Control. Boston, MA: Artech House.

Smith, E. N. (2014). Workplace Security Essentials: A Guide for Helping Organizations Create Safe Work Environments. Place of publication not identified: Butterworth-Heinemann.

ANOMALYDETECTIONTECHNIQUES.edited.docx

Running Head: ANOMALY DETECTION TECHNIQUES

ANOMALY DETECTION TECHNIQUES

Name

Institution

Date

Anomaly detection is also known as outlier analysis. It is a step in data mining that identifies data points, events, or observations that deviate from a dataset's normal behavior (Mehrotra et al., 2017). To improve the identification of anomalous objects, some approaches combine multiple anomaly detection techniques. These include density-based, clustering-based, and support vector machine (SVM)-based anomaly detection. The technique that combines these three is called the machine learning-based technique. When detecting anomalous objects, one should consider the types of objects in the group provided.

The density-based anomaly detection technique relies mainly on k-nearest neighbor (k-NN) algorithms. It assumes that normal data points lie in dense neighborhoods, whereas abnormal data points lie far from them. The k-nearest neighbor technique classifies data points using distance-based similarity metrics. The technique can also classify data objects by the relative density of the data, in which case objects are classified by their reachability distance (Choudhary, 2017).
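A minimal distance-based version of the k-nearest-neighbor idea scores each point by its distance to its k-th nearest neighbor: points in dense neighborhoods score low, and isolated points score high. This sketch uses brute-force distances and hypothetical two-dimensional points; density measures based on reachability distance, such as the local outlier factor, refine the same idea but are not shown here.

```python
import math


def knn_anomaly_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbor.
    Points inside dense neighborhoods get small scores; isolated points get large ones."""
    if not 0 < k < len(points):
        raise ValueError("k must satisfy 0 < k < len(points)")
    scores = []
    for i, p in enumerate(points):
        # Brute force: distances from p to every other point, nearest first
        distances = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(distances[k - 1])
    return scores


def top_anomalies(points, k, n_outliers):
    """Return the indices of the n_outliers highest-scoring points, in index order."""
    scores = knn_anomaly_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_outliers])
```

The brute-force loop compares every pair of points, which is fine for small data; larger datasets typically use spatial indexes to find neighbors faster.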

Clustering-based anomaly detection, on the other hand, assumes that similar data points tend to belong to the same clusters or groups. The similarity between objects and groups is determined by their distance from the local centroids. This technique commonly uses k-means, a widely used clustering algorithm: it creates k clusters of similar data points, and data points that fall outside the clusters are marked as anomalies. Finally, the SVM-based anomaly detection technique learns a soft boundary from a training set of normal data; instances that fall outside the learned region are identified as anomalies.
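For the clustering-based approach, once centroids have been learned (for example with k-means), anomalies can be flagged by their distance to the nearest centroid. The sketch below assumes the centroids are already available and derives a cut-off from training data as the mean distance plus three standard deviations; the helper names and the three-standard-deviation rule are assumptions for illustration. The SVM-based approach is not sketched here; scikit-learn provides it as `sklearn.svm.OneClassSVM`.

```python
import math
from statistics import mean, pstdev


def nearest_centroid_distance(point, centroids):
    """Distance from a point to the closest of the given centroids."""
    return min(math.dist(point, c) for c in centroids)


def threshold_from_training(train_points, centroids, z=3.0):
    """Assumed rule: mean training distance plus z standard deviations."""
    distances = [nearest_centroid_distance(p, centroids) for p in train_points]
    return mean(distances) + z * pstdev(distances)


def centroid_anomalies(points, centroids, threshold):
    """Indices of points farther than threshold from every centroid."""
    return [i for i, p in enumerate(points)
            if nearest_centroid_distance(p, centroids) > threshold]
```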

References

Mehrotra, K. G., Mohan, C. K., & Huang, H. (2017). Anomaly detection principles and algorithms (p. 217). New York, NY: Springer International Publishing.

Choudhary, P. (2017, February 15). Introduction to anomaly detection. Retrieved from https://blogs.oracle.com/ai-and-datascience/post/introduction-to-anomaly-detection