PSPP Data Analysis Part 1

AndreaL
DataAnalysisProjectInstruction_Part1.pdf

Data Analysis Project Instruction: Part 1

Due in tutorials during the week of Oct 1st to 5th, or on Quercus by Friday, Oct 5th, 11:59 pm

Purpose:

The objective of this project is to give you the opportunity to use some of the statistical techniques that you have learned in this course to explore a real data set.

Submission Format:

You are required to submit a concise, typed report of your data analysis (no more than two pages, with a font size no smaller than 12 points). Your PSPP outputs may be incorporated into the body of your report or placed in an appendix. You may work individually or in groups of no more than three students; group members can be from different tutorial sections. If you are working in a group, please create a team name, which you will use to rename a variable in the data set; if you are working individually, you will rename the variable using your last name. If you are working in a group, please submit your work to only one course TA. You can submit your report either in your tutorial (to one course TA) or on Quercus (refer to the Assignment section: Analysis 1). The assessment criteria are described in the assessment table at the end of this document.

Context of Data:

The Organisation for Economic Co-operation and Development (OECD) gathers information about OECD countries and their partners in order to promote policies that aim to improve the economic and social well-being of people around the world (http://www.oecd.org/about/).

This agency collects quantitative information on many domains and makes the data available for public use (e.g., by researchers) so that interested individuals can further investigate relationships among a set of variables. One such domain, “Social Protection and Well-being”, includes a yearly data collection called the “Better Life Index”. This information can be retrieved from: http://stats.oecd.org

From the “Better Life Index 2017” (BLI, 2017), the most recent data collected in this domain, we will analyze a quantitative variable named “Social Network Support”. Information regarding this variable can be retrieved from: http://www.oecd.org/statistics/OECD-Better-Life-Index-2017-definitions.pdf (note that this document is also posted on our Quercus page, in the “Data Analysis Project” module). This variable is a sub-component of the Social connections/Community component in (BLI, 2017). It reflects the percentage of males and females aged 15 years and over in 35 OECD countries who perceive their social network as including relatives or friends that they can count on to help them in times of need and trouble. The OECD indicates that it obtained and calculated this information from the Gallup World Poll.

• Let us recap the variables of interest in our data analysis:

1. Percentage of people (15 years of age and older) having social network support

2. Sex of the respondents identified as Male or Female

• I recommend that you read about this data here:

http://www.oecdbetterlifeindex.org/#/11111111111

Also, click on “Community” in the right-hand side menu to be directed to another web page:

http://www.oecdbetterlifeindex.org/topics/community/

Scroll down that page and you can click and read about each country’s support network.

PSPP Activity: Task A, and Task B

A. Describing the distribution of percentages of adults who reported having social network support in OECD countries.

B. Understanding and comparing the distributions of percentages of males and females who reported having social network support in OECD countries.

Overview of Steps:

1. Save the following two data files on your computer (e.g., My Document folder)

• BLI_Support_Net_2017.csv

• BLI_Support_Net_Gender.csv

2. Open each of the above files and make changes to the column headings:

• Replace “lastname_teamname” in the variable name “Support_network_lastname_teamname” with your last name or team name. For example: Support_network_Aslemand

3. Save the files after modifying their column headings.

4. Refer to the PSPP tasks (A and B) described on the next pages. For each task, produce the associated PSPP outputs and answer the related questions.

5. Please include your PSPP outputs with your written report upon submission. If you are submitting on

Quercus, you can upload three files: One PDF file for your written report and two PDF files for your PSPP

outputs. Name your files (modify with your lastname) as:

o Report 1_Lastname.pdf (e.g., Report 1_Aslemand.pdf)

o PSPP Overall Outputs_Lastname.pdf (e.g., PSPP Overall Outputs_Aslemand.pdf)

o PSPP Gender Outputs_Lastname.pdf (e.g., PSPP Gender Outputs_Aslemand.pdf)

Task A. Describe the distribution of perceived social network support.

Open PSPP on your computer.

Step 1. Select Files to Import:

▪ In menu bar, go to File > Import Data > (e.g., My Document)

▪ Select the saved data file: “BLI_Support_Net_2017.csv”

▪ Click Next (Bottom of the page)

Step 2. Select the Lines to Import: Click Next

Step 3. Select the First Line:

▪ Select Line “1” (move the blue line from line 0 to line 1)

▪ At the bottom of screen, check off the box: Line Above Selected Line Contains Variable Names

▪ Click Next

Step 4. Choose Separators: Click Next

Step 5. Adjust Variable Formats: Click Apply.

PSPP will open the data in two views, “Data View” and “Variable View”. You can switch between them at the bottom of the screen, but it is not necessary to do so for this task.
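If you prefer typing syntax, Steps 1 to 5 above can also be sketched as a GET DATA command in the Syntax Editor. This is only a rough sketch: it assumes the file sits in PSPP's current working folder and has exactly two columns, COUNTRY (text) followed by the support variable (numeric). The formats A40 and F8.2 are guesses, so check them against what the import wizard shows in Step 5.

```pspp
* Sketch only: the column order and the formats A40 and F8.2 are assumptions.
GET DATA
  /TYPE=TXT
  /FILE="BLI_Support_Net_2017.csv"
  /DELIMITERS=","
  /QUALIFIER='"'
  /FIRSTCASE=2
  /VARIABLES=
    COUNTRY A40
    Support_network_lastname_teamname F8.2.
```

Here /FIRSTCASE=2 plays the role of Step 3: it tells PSPP that the data start on the second line of the file because the first line holds the variable names.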

PSPP Instruction for Task A:

• From top bar menu in PSPP, go to Analyze > Descriptive Statistics > Explore

• Select the variable “Support_network_lastname_teamname” from the list and put it in the “Dependent List”. You should see your lastname or team name.

• Do not close this box yet; click on “Statistics”, select “Descriptive”, “Extreme”, “Percentile”, and click on Continue.

• Do not close the main box yet; click on Paste.

• PSPP will open another window; this is the Syntax Editor window.

• Your code looks like this at the moment:

EXAMINE
  /VARIABLES = Support_network_lastname_teamname
  /STATISTICS = DESCRIPTIVES EXTREME
  /PERCENTILE
  /MISSING=LISTWISE.

• We need to add a line for PLOT = BOXPLOT.

• Add the /PLOT = BOXPLOT line to your code exactly where it appears below, between /PERCENTILE and /MISSING.

EXAMINE
  /VARIABLES = Support_network_lastname_teamname
  /STATISTICS = DESCRIPTIVES EXTREME
  /PERCENTILE
  /PLOT = BOXPLOT
  /MISSING=LISTWISE.

• Highlight the entire modified code in your PSPP Syntax Editor and go to “Run” from the tool bar menu and click on “All”.

• You will get an output (PSPP icon blinks/flashes at the bottom of your computer screen).

• Open your PSPP output. This output displays tables of descriptive statistics and a boxplot.

• Note the case numbers that are displayed individually on the boxplot. What are their country names? Record these case numbers.

• Go back to your PSPP Syntax Editor. Copy the code below and paste it into your syntax editor.

LIST
  /VARIABLES = COUNTRY Support_network_lastname_teamname
  /CASES = FROM 19 TO 19
  /FORMAT = NUMBERED.

• Make sure that you replace “Support_network_lastname_teamname” in the above code with the variable name that contains your last name or team name. It must match the name you gave the variable earlier.

• In your syntax editor, highlight the above code, go to “Run” from the tool bar menu, and click on “Selection” to run only the selected code.

• PSPP icon blinks/flashes again at the bottom of your computer screen to indicate that something new has been added to your output.

• Open your PSPP output window. New information is added to your output. See the bottom of your PSPP output window.

❖ Save/Export your PSPP outputs:

• In the PSPP output window, go to File > Export, give your output a name, and choose the location where you want to save it (e.g., My Document folder).

• At the bottom of the dialog box that appears, use the drop-down menu to select the format in which you want to save your PSPP output (e.g., PDF (*.pdf)), and click “Save”.

• Check your computer folder to make sure that your PSPP output is exported to your desired folder.

❖ Close the PSPP program on your computer.

❖ Refer to your PSPP outputs to answer the following questions.

Note: Unit of measurement is “Percentage of people aged 15 and over”

1. Refer to the descriptive statistics table for the distribution of percentages of adults who reported having social network support. Report the mean and standard deviation for this distribution. Interpret these values within the context of this study.

2. Refer to the boxplot and the tables of descriptive statistics, percentiles, and extreme values for the distribution of percentages of adults who reported having social network support. Describe the shape, centre, and spread of this boxplot within the context of this study. Note whether any points are plotted individually on the plot. Specify the country name(s) for the individually plotted points on the boxplot.

3. Use the 1.5IQR rule to determine whether the individually plotted point(s) is/are suspect outlier(s).

4. Find the z-score for the minimum data value. Give a brief interpretation of this value (z-score) within

the context of this study.
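For question 4, recall that a z-score is (value − mean) / standard deviation. As a rough sketch, the syntax below lets PSPP do this arithmetic on a small set of made-up percentages, so that you can check a hand calculation against it. The variable name pct and the ten values are hypothetical and are not the OECD data. Run it in a separate PSPP session, because DATA LIST replaces the data set that is currently open.

```pspp
* Hypothetical percentages for illustration only; these are not the OECD values.
DATA LIST FREE / pct.
BEGIN DATA
78 85 88 89 90 91 92 93 94 95
END DATA.

* /SAVE adds a variable named Zpct holding (value - mean) / standard deviation.
DESCRIPTIVES
  /VARIABLES = pct
  /STATISTICS = MEAN STDDEV MIN
  /SAVE.

LIST
  /VARIABLES = pct Zpct.
```

In the listing, the smallest value has a negative z-score, which means it lies that many standard deviations below the mean.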

Task B. Compare percentages of perceived social network support between males and females.

Open PSPP on your computer.

Step 1. Select Files to Import:

▪ In menu bar, go to File > Import Data > (e.g., My Document)

• Select the saved data file: “BLI_Support_Net_Gender_2017.csv”

• Click Next (Bottom of the page)

Step 2. Select the Lines to Import: Click Next

Step 3. Select the First Line:

▪ Select Line “1” (move the blue line from line 0 to line 1)

▪ At the bottom of screen, check off the box: Line Above Selected Line Contains Variable Names

▪ Click Next

Step 4. Choose Separators: Click Next

Step 5. Adjust Variable Formats: Click Apply.

PSPP will open “Data View” and “Variable View”.

PSPP Instruction for Task B:

• From top bar menu in PSPP, go to Analyze > Descriptive Statistics > Explore

• Select the variable “Support_network_lastname_teamname” from the list and put it in the “Dependent List”. You should see your lastname or team name.

• Select the variable “Gender” from the list and put it in the “Factor List”.

• Do not close this box yet; click on “Statistics”, select “Descriptive”, “Extreme”, “Percentile”, and click on Continue.

• Do not close the main box yet; click on Paste.

• PSPP will open another window; this is the Syntax Editor window.

• Your code looks like this at the moment:

EXAMINE
  /VARIABLES = Support_network_lastname_teamname
    BY Gender
  /STATISTICS = DESCRIPTIVES EXTREME
  /PERCENTILE
  /MISSING=LISTWISE.

• We need to add a line for PLOT = BOXPLOT (let’s add it in the same place as mine).

EXAMINE
  /VARIABLES = Support_network_lastname_teamname
    BY Gender
  /STATISTICS = DESCRIPTIVES EXTREME
  /PERCENTILE
  /PLOT = BOXPLOT
  /MISSING=LISTWISE.

• Highlight the entire modified code in your PSPP Syntax Editor and go to “Run” from the tool bar menu and click on “All”.

• You will get an output (PSPP icon blinks/flashes at the bottom of your computer screen).

• Open your PSPP output. This PSPP output displays tables of descriptive statistics by gender and side-by-side boxplots.

• Note the case numbers that are displayed individually on the side-by-side boxplots. What are their country names? Record these case numbers (Females: 54, 44, 66; Males: 33, 59, 65, 53, 43)

• Go back to your PSPP Syntax Editor. Copy the code below and paste it into your syntax editor.

• The code below displays information for case #54 only. Change the number to display each of the other cases that are plotted individually on the boxplots. You need to do this seven more times.

LIST
  /VARIABLES = COUNTRY Support_network_lastname_teamname
  /CASES = FROM 54 TO 54
  /FORMAT = NUMBERED.

• Make sure that you replace “Support_network_lastname_teamname” in the above code with the variable name that contains your last name or team name. It must match the name you gave the variable earlier.

• In your syntax editor, highlight the above code, go to “Run” from the tool bar menu, and click on “Selection” to run only the selected code.

• PSPP icon blinks/flashes again at the bottom of your computer screen to indicate that something new has been added to your output.

• Open your PSPP output window. New information is added to the bottom of your PSPP output.
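One way to list all eight individually plotted cases in a single run is to stack one LIST command per case, as sketched below. The case numbers are the ones recorded above (Females: 54, 44, 66; Males: 33, 59, 65, 53, 43). Including Gender in /VARIABLES is an addition of this sketch, so that each listed row also shows the group it belongs to. Replace the variable name with your own before running.

```pspp
* Females: cases 54, 44 and 66.
LIST /VARIABLES = COUNTRY Gender Support_network_lastname_teamname /CASES = FROM 54 TO 54 /FORMAT = NUMBERED.
LIST /VARIABLES = COUNTRY Gender Support_network_lastname_teamname /CASES = FROM 44 TO 44 /FORMAT = NUMBERED.
LIST /VARIABLES = COUNTRY Gender Support_network_lastname_teamname /CASES = FROM 66 TO 66 /FORMAT = NUMBERED.

* Males: cases 33, 59, 65, 53 and 43.
LIST /VARIABLES = COUNTRY Gender Support_network_lastname_teamname /CASES = FROM 33 TO 33 /FORMAT = NUMBERED.
LIST /VARIABLES = COUNTRY Gender Support_network_lastname_teamname /CASES = FROM 59 TO 59 /FORMAT = NUMBERED.
LIST /VARIABLES = COUNTRY Gender Support_network_lastname_teamname /CASES = FROM 65 TO 65 /FORMAT = NUMBERED.
LIST /VARIABLES = COUNTRY Gender Support_network_lastname_teamname /CASES = FROM 53 TO 53 /FORMAT = NUMBERED.
LIST /VARIABLES = COUNTRY Gender Support_network_lastname_teamname /CASES = FROM 43 TO 43 /FORMAT = NUMBERED.
```

Highlight all of these lines and use Run > Selection; each command adds one small table to the bottom of your output.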

❖ Save/Export your PSPP outputs:

• In the PSPP output window, go to File > Export, give your output a name, and choose the location where you want to save it (e.g., My Document folder).

• At the bottom of the dialog box that appears, use the drop-down menu to select the format in which you want to save your PSPP output (e.g., PDF (*.pdf)), and click “Save”.

• Check your computer folder to make sure that your PSPP output is exported to your desired folder.

❖ Close the PSPP program on your computer.

❖ Refer to your PSPP outputs to answer the following questions.

Note: Unit of measurement is “Percentage of people aged 15 and over”

1. Refer to the descriptive statistics table for the distributions of percentages of perceived social network support by gender. Report and compare the means and standard deviations for the distributions of females and males. Interpret these values within the context of this study.

2. Refer to the side-by-side boxplots, and tables of descriptive statistics, percentiles, and extreme values for

the distribution of percentages of perceived social network support by gender. Describe and compare the

shapes, centres, and spreads of these plots within the context of this study. Note whether any points are

plotted individually on each boxplot. Specify the country name(s) for the individually plotted points on

each boxplot.

3. Use the 1.5IQR rule (and 3IQR) to determine whether the individually plotted point(s) is/are suspect

outlier(s) or rare/unlikely cases. Confirm the cases that are displayed with “O” or “*” on the boxplots.

4. Find the z-scores for the individually plotted data value(s) on each boxplot (females, males). Give a

brief interpretation of these values (z-scores) within the context of this study.
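For question 3, the fences are Q1 − 1.5 × IQR and Q3 + 1.5 × IQR, and Q1 − 3 × IQR and Q3 + 3 × IQR for the wider rule. The sketch below works through that arithmetic in PSPP on made-up numbers: it assumes a group with Q1 = 88 and Q3 = 93 (so IQR = 5) and flags five hypothetical test values. None of these numbers come from the OECD data, and the variable names are illustrative only. Run it in a separate PSPP session, because DATA LIST replaces the data set that is currently open.

```pspp
* Hypothetical test values for illustration only; these are not the OECD values.
DATA LIST FREE / pct.
BEGIN DATA
70 80 88 93 102
END DATA.

* Assumed quartiles: Q1 = 88 and Q3 = 93, so IQR = 5.
* The 1.5 x IQR fences are 80.5 and 100.5; the 3 x IQR fences are 73 and 108.
* Each flag is 1 when the value lies outside the fences and 0 otherwise.
COMPUTE beyond_1_5_iqr = (pct < 88 - 1.5 * 5) OR (pct > 93 + 1.5 * 5).
COMPUTE beyond_3_iqr = (pct < 88 - 3 * 5) OR (pct > 93 + 3 * 5).
LIST.
```

Here 70 falls outside both sets of fences, 80 and 102 fall outside only the 1.5 × IQR fences, and 88 and 93 fall inside both. With your own data, substitute the Q1 and Q3 reported for each gender in your Percentiles table.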

Assessment of Data Analysis Project: Part 1

Last Name of Students or Team Name

1.________________________________________

2. ________________________________________

3. ________________________________________

Task A: Describe the distribution of perceived social network support

Item                                                                                  Possible Points  Point(s) Received
1: PSPP Outputs (with modified lastname/team name)                                    10
2: Interpretation of Descriptive Statistics: Mean and Standard Deviation              4
3: Interpretation of Boxplot: Describe shape, centre, spread, outliers                8
4: Investigation of suspect outliers                                                  2
5: Interpretation of Z-score(s)                                                       2
Total                                                                                 26

Task B: Describe the distribution of perceived social network support by gender

Item                                                                                  Possible Points  Point(s) Received
1: PSPP Outputs (with modified lastname/team name)                                    10
2: Interpretation of Descriptive Statistics: Means and Standard Deviations            6
3: Interpretation of Side-by-Side Boxplots: Describe shape, centre, spread, outliers  10
4: Investigation of suspect outliers                                                  6
5: Interpretation of Z-score(s)                                                       2
Total                                                                                 34

Total Points                                                                          60

Marked by TA: ______________________________________

Comments (if any):