AV_Disk2

profileUser3337
AV_Disk2.docx

Discussion 1

Importance of R language among various statistical programming languages

Organizations in every industry have started realizing the fact that - the secret to success is being able collect, store and analyze data at a faster pace than the competitors. The consequence of this big data revolution is that the hiring demand for data scientists with Hadoop, NoSQL, Python programming, R programming and other big data skills is heating up. With big data analytics and machine learning driving intelligence in almost every Internet connected device, software application and smartphones, R is a powerful statistical tool that data scientists use to find answers from the large treasure troves of data. R programming helps data scientists with statistical analysis of data more quickly and powerfully when compared to any other statistical computing tools (Zhang, Muller, & Wang, 2020).

R language is used by more than 2 million statisticians and  data scientists across the world, and with the wider adoption of R language for business applications, the usage of this statistical software is increasing exponentially. R programming language was developed for statistical analysis at a small-scale in academic settings. R language is a powerful statistical computing tool for visualizing data, exploring large data sets and creating novel statistical models (Zhang, Muller, & Wang, 2020).

R is on the rise as a powerful business analytics tool with contributions from popular statisticians to the open source community over 20 years. R language is among the most powerful and popular data science tools because it presents different faces to different users. R programming language has been kicking around since 1997 as an alternative to expensive statistical programming tools like SAS or Matlab (Zhang, Muller, & Wang, 2020).

Key features of R language:

R language is Open Source

R language is an all-in-one package of a Statistical Analysis toolkit

R has excellent charting benefits

R language has robust and vibrant online community

R language has a powerful package ecosystem

Advantages of R over other languages:

R contains very large package library which helps in data wrangling and ease the process of data analysis.

R, SQL and python are open source tools whereas SAS is licensed and more expensive.

R got strong open source community which makes constant updates and releases to keep its product features latest and up to date compared to other licensed tools.

Comparative to SQL, SAS and python, R has the advanced graphical features for data analysis process.

R is easily understandable and more user friendly in developing packages for performing custom functions compared to other statistical tools.

R and python programming languages are good and suitable for startups, because they don’t need to purchase the tool. SAS is widely used by MNCs (Korkmaz et al., 2020).

Disadvantages of R over other languages:

As R is a Legendry language with lots of old features it is quite challenging to maintain such a huge code which can’t be used huge data sets.

There are very few options to store in Git and processing require very less memory which makes it difficult to process huge amount of data.

As it is an open source product, finding documentation is very difficult (Korkmaz et al., 2020).

Reference:

Korkmaz, G., Kelling, C., Robbins, C., & Keller, S. (2020). Modeling the impact of Python and R packages using dependency and contributor networks. Social Network Analysis and Mining, 10(1), 1-12.

Zhang, A. X., Muller, M., & Wang, D. (2020). How do data science workers collaborate? roles, workflows, and tools. Proceedings of the ACM on Human-Computer Interaction, 4(CSCW1), 1-23.

Discussion 2

Statistical Programming is the process of utilizing computation techniques to assist in data analysis. It supports statistical techniques such as linear and nonlinear modeling, classification, clustering, and time-series analysis.

The statistical programming languages are developed to sort out the business complexities and escort technological innovations. The popular programming languages are as follows: Python, Java, R, Julia, SAS, SQL, MATLAB, and Scala. These languages assist the data scientist in business analysis (nuggets., 2015).

They assist the data scientist in analyzing the structure in data and developing the predictions in making critical business decisions. The data scientists utilize enormous algorithms in predictive analytics and machine learning. The most important algorithms used in the analysis are linear regression, logistic regression, K nearest neighbor, and K means clustering that is encouraged by the statistical programming languages.

It helps the data scientists in developing and building statistical models and the most preferred statistical language for modeling are R and SAS (RA., 2014). As the business organization deals with huge data for making effective decision-making, Python provides a suitable platform for the data scientist for scientific computing, data mining in predictive analytics, and machine learning techniques in the organization.

The benefits of R Programming language (Ihaka R, 1996):

It performs ad hoc analysis better than python and it is suitable for exploring and analyzing the datasets in the organization.

R can apply in all areas of industry with its wide range of array packages. It has more than 10000 packages in the CRAN repository.

It enables quality plotting and graphing with its famous libraries such as ggplot2 and plotly which makes the graphs aesthetic and fascinating.

It provides effective techniques in developing artificial neural network and assists in performing machine learning operations such as classification and regression.

The drawbacks of R Programming language:

It has a major disadvantage of handling memory, it occupies more space for storing data and it requires it at a single place. So it not an ideal choice for organizations dealing with Big Data.

It has security concerns as it has several restrictions with the R and it lacks basic security. So it cannot be embedded into a web-application.

The operational speed of this language is much slower when compared to the other statistical programming languages.

Overall, statistical programming languages are crucial elements for the data scientists in learning algorithms to deal with the enormous data in the organizations using statistical techniques in machine learning, artificial intelligence, and predictive analytics. R language has more advantages over other programming languages.

REFERENCES

Ihaka R, G. R. (1996). “R: A Language for Data Analysis and Graphics.”.

nuggets., K. (2015). “Primary Programming Language for Analytics, Data Mining, DataScience Tasks: R,Python, or Other.”.

RA., M. (2014). "Statistical Programming Languages: The Popularity of Data Analysis Software.” .