Data visualisation

FIT5147 Monash University 30369916

FIT5147 Data Exploration Project

30369916

Jianwei Jing

1. Introduction

After a busy working week, people look for entertainment to fill their weekends. Playing sport has become important for many people because it helps them stay in a good mood and keeps them healthy. For those who do not have enough time to play, watching competitive sport has become an everyday form of entertainment. Competitive sports have existed in our society for many centuries. The Olympic Games, which bring together many kinds of competitive sport, have been held 31 times. Beyond the Olympics, sporting events of many types are held all over the world, motivating more and more people to take part in competitive sport.

However, while we enjoy watching competitive sport, injuries trouble professional athletes. To get better results, professional athletes have to attempt difficult moves during games. These moves attract more viewers, but they also pose a serious threat to the athletes' physical health. The NBA (National Basketball Association) is an American men's professional basketball league with many talented athletes from all over the world, and injuries plague many of them. Some never regain their competitive form after an injury, and some have to end their professional careers because of one.

This project looks at what helps a professional basketball player have a longer career, based on data about the players themselves. We analyse the injury statistics of professional NBA players to discuss which basketball position is most likely to be injured and how much an injury affects a player afterwards. We also analyse the technical statistics of some professional players to discuss at what age a basketball player begins to decline. Answering these questions can help us understand how a professional basketball player can have a good career.

2. Data Wrangling

For the data wrangling, let us first look at the data we have. We use three tables for the data exploration:

Injuries Statistics from 2010 to 2020

Players Technical Statistics from 1950 to 2017

Three Outstanding Players Technical Statistics during their career

We use R for the data wrangling. To discuss which basketball position is most prone to injury, we need to combine the injury statistics table with the players' technical statistics table. Because the injury records only start in 2010, the technical statistics from 1950 to 2010 are not needed for this question. We therefore create a new table named “injurystat” in R. Looking at the injury table, we can see that it has columns called “Acquired” and “Relinquished”. The values under both columns are players' names, so the first step is to combine these two columns into one column called “Player”. We then combine the injury table and the statistics table into one table called “injurystat”. The result is the new table below.

Injurystat table

The injurystat table shows the injured players together with their statistics. This completes the data wrangling.
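
The wrangling steps above can be sketched in base R as follows. This is only a minimal illustration under stated assumptions: the toy rows, the placeholder player names, and every column other than “Acquired”, “Relinquished”, “Notes” and “Pos” are invented, and joining on the player name alone is an assumption rather than the exact procedure used in this project.

```r
# Toy stand-ins for the two source tables (invented rows, assumed layout).
injury <- data.frame(
  Date = c("2012-01-05", "2013-03-10", "2014-11-02", "2015-02-20"),
  Acquired = c("", "Player A", "", ""),
  Relinquished = c("Player A", "", "Player B", "Player C"),
  Notes = c("sprained ankle", "returned to lineup", "sore knee", "torn ACL"),
  stringsAsFactors = FALSE
)
stats <- data.frame(
  Year = c(2009, 2012, 2013, 2014),
  Player = c("Player A", "Player A", "Player A", "Player B"),
  Pos = c("PG", "PG", "SG", "PF"),
  stringsAsFactors = FALSE
)

# Step 1: merge the two name columns into one "Player" column.
# Assumes exactly one of the two columns is filled in on each row.
injury$Player <- ifelse(injury$Relinquished != "",
                        injury$Relinquished, injury$Acquired)
injury <- injury[, c("Date", "Player", "Notes")]

# Step 2: keep only the seasons from 2010 onwards.
recent <- stats[stats$Year >= 2010, ]

# Step 3: attach the season statistics to every injury record.
# all.x = TRUE keeps injured players that have no statistics (Pos becomes NA).
injurystat <- merge(injury, recent, by = "Player", all.x = TRUE)
print(injurystat)
```

In this toy data, “Player C” has an injury record but no statistics, so the merged row has NA in “Year” and “Pos”. Real data may also contain NA rather than empty strings in the name columns, which would need handling before Step 1.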

3. Data Checking

We use R's is.na() function to check whether our data contains empty values.

Data Checking Result1

Here we can see that the table “Players” has no null values, which is what we want. The new table “injurystat” has many empty values. For the question of which basketball position is most prone to injury, we focus on the attributes “Notes” and “Pos”. “Notes” has no empty values because it records the injury description for each player. However, “Pos” has some empty values. We believe this is because some injured players have no technical statistics for certain years, which leaves empty values in the combined table.
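
The check described above can be sketched as follows, on a small invented stand-in for “injurystat” (the rows and player names are placeholders). colSums(is.na(...)) counts the missing values per column. Note that is.na() only detects NA, so empty strings would first have to be converted to NA.

```r
# Invented stand-in for the combined table; only the column names
# "Player", "Notes" and "Pos" come from the report.
injurystat <- data.frame(
  Player = c("Player A", "Player B", "Player C"),
  Notes = c("sprained ankle", "sore knee", "torn ACL"),
  Pos = c("PG", NA, "PF"),
  stringsAsFactors = FALSE
)

# Number of missing values in each column.
na_counts <- colSums(is.na(injurystat))
print(na_counts)

# The rows whose position is missing, for closer inspection.
missing_pos <- injurystat[is.na(injurystat$Pos), ]
print(missing_pos)
```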

To allow further exploration of the data, we also keep the tables “injury” and “Stats”, so we ran the same check on those two tables.

Data Checking Result2

Table “injury” has no empty values, but “Stats” has some. After checking the table “Stats”, we noticed that some attributes were not recorded in the technical statistics before 1978, which explains the many empty values in this table. This does not matter for our purposes, because we only need the players' technical statistics after 2010. At this point, our data wrangling and checking work is done.

4. Data Exploration

Question1: Which basketball position is the most prone to injury?

We use R to answer this question.

Injurystat table

To discuss which position is most prone to injury, we focus on each player's name and position in this table. Notice that a player's position is not fixed: A.J. Price played as PG in some years and as SG in others, because a basketball player's position can change from one season to the next. For this reason, we use group_by() and summarise() to get the result we want.

Group_by() and Summarise() code1

Injurystat2 table

Here, setting the count aside, we get a summary of the players' names and their positions. Next, we calculate the distribution of the positions.

Group_by() and Summarise() code2

Injurystat4 table

We can see that 227 of the injured players are PF, 223 are SG and 200 are SF, while PG and C account for 162 and 163 respectively. It seems that PF and SG are the two most injury-prone positions in professional basketball.
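
The two group_by() and summarise() steps can be sketched with dplyr as below. The table names injurystat2 and injurystat4 follow the captions above, but the rows are invented placeholders and the exact code used in the project is not reproduced here. Dropping rows with a missing “Pos” is an assumption.

```r
library(dplyr)

# Invented stand-in: one row per injury record with the player's position.
injurystat <- data.frame(
  Player = c("Player A", "Player A", "Player A",
             "Player B", "Player B", "Player C", "Player D"),
  Pos = c("PG", "PG", "SG", "PF", "PF", "PF", NA),
  stringsAsFactors = FALSE
)

# Step 1: one row per (player, position) pair, with its record count.
injurystat2 <- injurystat %>%
  filter(!is.na(Pos)) %>%
  group_by(Player, Pos) %>%
  summarise(count = n(), .groups = "drop")

# Step 2: number of distinct injured players at each position.
injurystat4 <- injurystat2 %>%
  group_by(Pos) %>%
  summarise(players = n()) %>%
  arrange(desc(players))

print(injurystat4)
```

A player who changed position, such as “Player A” here, is counted once under each position they played, which is how the position totals in this section can add up to more than the number of distinct players.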

Question2: How much does an injury affect a player's career?

Measuring the effect of injury on all injured players is difficult: it would require generating about 1000 tables or visualisations. Therefore, we study the technical statistics of a few specific players to discuss this question.

In this case, we look at two outstanding NBA players, Paul George and Derrick Rose. Both were very talented, but both suffered serious injuries and had to put their basketball careers on hold for an extended period. Their careers are worth discussing.

We use R to extract the technical statistics for Paul George and Derrick Rose.

Paul George’s technical statistics

Derrick Rose’s technical statistics

Here we focus on one measure in the tables: “TS.” (true shooting percentage, TS%), which measures a player's shooting efficiency. It is also worth noting that Paul George was seriously injured in 2014 (a broken leg) and Derrick Rose in 2012 (a torn ACL). We therefore look at how their TS% changed after 2014 and 2012 respectively.
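
For reference, true shooting percentage is commonly computed as PTS / (2 × (FGA + 0.44 × FTA)), where PTS is points scored, FGA is field goal attempts and FTA is free throw attempts. A minimal R sketch follows. The function name and the input numbers are made up for illustration and are not taken from either player's record.

```r
# True shooting percentage from season totals.
# pts: points scored, fga: field goal attempts, fta: free throw attempts.
true_shooting <- function(pts, fga, fta) {
  pts / (2 * (fga + 0.44 * fta))
}

# Hypothetical season totals, for illustration only.
ts <- true_shooting(pts = 1500, fga = 1100, fta = 400)
print(round(ts, 3))  # 1500 / 2552, about 0.588
```

Because free throws are included in the denominator, this measure rewards players who score efficiently from the line as well as from the field.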

We can see the line chart below:

Paul George’s Technical Statistics

Derrick Rose’s Technical Statistics

After their injuries, both players' true shooting percentages dropped dramatically. Derrick Rose is a good example of the effect of injury: his true shooting percentage recovered a little, but it never returned to its earlier peak. The effects of injuries are very long lasting, not just physically but also mentally. However, Paul George's true shooting percentage shows another side: some players can improve after an injury. He even reached a new peak in true shooting percentage.
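
One simple way to put a number on such a drop is to compare the average true shooting percentage before and after the injury year. The sketch below uses invented values, not the real figures for either player, and assumes that the season labelled with the injury year counts as “after”.

```r
# Invented career data: one row per season (not real player figures).
career <- data.frame(
  Year = 2010:2017,
  TS = c(0.54, 0.55, 0.56, 0.57, 0.50, 0.52, 0.53, 0.55)
)
injury_year <- 2014  # hypothetical injury year

# Label each season relative to the injury.
career$Period <- ifelse(career$Year < injury_year, "before", "after")

# Mean TS% in each period and the difference between them.
period_means <- tapply(career$TS, career$Period, mean)
change <- period_means[["after"]] - period_means[["before"]]

print(period_means)
print(change)  # negative means shooting efficiency fell after the injury
```

A before/after average hides the shape of the recovery, which is why the line charts remain the main evidence in this section.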

Question3: At what age does a basketball player begin to decline?

Drawing an accurate conclusion for this question requires a large amount of data, but here we only look at three outstanding players.

We need more attributes to discuss this question, so we use Tableau to answer it.

We set age as the x axis and 2P% (2-point percentage), 3P% (3-point percentage) and FT% (free throw percentage) as the y axis. Plotting the average of these values across the three players gives the result below.

Shooting Percentage Result(Average)

According to this result, 2P% and 3P% gradually dropped after age 30 and rarely recovered to their earlier levels. FT%, however, stayed steady throughout their careers, because a free throw is taken without a defender interfering. This suggests that 30 is a turning point for an athlete: after 30, they no longer perform in games as well as before.
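
The averaging step was done in Tableau in this project. As a rough R equivalent, the per-age averages could be computed with aggregate(). The column names P2, P3 and FT and all values below are invented placeholders for 2P%, 3P% and FT%, not the real figures for the three players.

```r
# Invented shooting percentages for three placeholder players at three ages.
shooting <- data.frame(
  Player = rep(c("Player A", "Player B", "Player C"), each = 3),
  Age = rep(c(29, 30, 31), times = 3),
  P2 = c(0.50, 0.49, 0.47, 0.52, 0.50, 0.48, 0.48, 0.48, 0.46),
  P3 = c(0.38, 0.37, 0.35, 0.40, 0.38, 0.36, 0.36, 0.36, 0.34),
  FT = c(0.85, 0.85, 0.85, 0.80, 0.80, 0.80, 0.90, 0.90, 0.90)
)

# Average each percentage across players, separately for every age.
by_age <- aggregate(cbind(P2, P3, FT) ~ Age, data = shooting, FUN = mean)
print(by_age)

# A quick line plot of the three averages against age.
matplot(by_age$Age, by_age[, c("P2", "P3", "FT")], type = "l",
        xlab = "Age", ylab = "Average percentage")
```

With only three players, each average is sensitive to a single unusual season, so the curve should be read as indicative only.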

5. Conclusion

After researching the above questions, we can draw our conclusions for this project. First, in basketball, SG (shooting guard) and PF (power forward) are the positions most likely to be injured. We cannot say that the other positions are unlikely to be injured, only that SG and PF appear more prone to injury. Our research offers no further explanation for this. In my personal opinion, SG and PF players jump more often than players in other positions, which may make them more prone to injury.

Second, in most cases, players go through a hard period after an injury before returning to their peak. Some may never return to their peak, while others may reach a new one. Either way, there is no doubt that injuries can seriously affect a professional athlete's career. Third, it seems that 30 is a turning point for a professional player, although this conclusion needs more research to confirm. One thing we can be sure of is that a professional sports career is not as long as careers in other jobs. Players have to think seriously about what to do after they retire.

6. Reflection

From this data exploration project, I have learnt how to process data with R, Tableau and Excel. I have to say that data visualisation and exploration are useful and fun. When I tried to answer the questions in different ways, I could always find some fun facts and something new to me. Data is a good tool for measuring a professional player's form. As far as I know, ESPN does a lot of research on athlete data after a season ends, and sometimes releases fun facts about players based on that analysis. This shows the importance of data for professional players. I wish every athlete a perfect career.

Data exploration and visualisation are very useful for understanding what is happening in the world. When we feel confused about something, we can use these tools to look at it from a better angle.

7. Bibliography

NBA Advanced Stats. [online] Available at: https://stats.nba.com/leaders/?Season=2019-20&SeasonType=Regular%20Season [Accessed 18 Sept 2020]