LAB3

profileS1monD
LabAssignment3.pdf

ECON 101: Statistics for Economics Lab Assignment 3

Dates: October 25th, October 28th or October 29th Due November 1st, 4th or 5th

The overall aim of our lab section this term is to increase your proficiency in Microsoft Excel and data analysis. By the end of the term, you should feel comfortable taking on any workplace assignments that involves data analysis through Excel. The second lab aims to teach you:

• How to collect, organize and analyze a very large dataset • Show how the Central Limit Theorem works using real-world data • Become comfortable using some more advanced Excel commands

We are going to get a new dataset this week. Similar to what we did in class a few weeks ago, let’s go get data on the stock market. We will analyze the Standard & Poor’s 500 Index (S&P 500). The S&P 500 is a weighted-average of the value of the 500 largest stocks on the New York Stock Exchange. Historical data on the S&P 500 can be gathered at Yahoo!: https://finance.yahoo.com/quote/%5EGSPC. Type in the site into the address bar.

1. When you reach the webpage for the S&P 500, you will see the real-time value of the index. Click on the “Historical Data” tab.

2. We want to get data for the entire time period available. Click on the date under “Time Period” and a range of possible dates can be chosen. Click on “MAX”, then “Done” and finally “Apply”. Once the screen reloads, you can click “Download Data” and an Excel spreadsheet should automatically download. Open up the file and see all the data you have gathered!

3. The dataset shows the Date, Open, High, Low, Close, Volume and Adjusted Closing Price. For this exercise we will be interested in the difference between the opening and closing prices. Because of that, we can delete the “High”, “Low”, “Volume” and “Adj Close” columns by holding down the “Ctrl” button and then clicking on columns C, D, F and G, which highlights the columns. Right click the mouse and click “Delete”. The dataset should now look like this:

4. The data needs to be cleaned a little bit more. If you scroll down to approximately row 14500 (yes, there is this much data!), you will notice that before 1962, only one price was reported for the index. We will not be able to calculate the daily change before 1962, so we need to delete these rows. The easiest way to do this is to click the number associated with the last day of 1961, highlighting the entire row. Then hold down “Ctrl” and “Shift” at the same time and press the down arrow. This highlights all the rows below 1962. Press “Delete” on the keyboard and all the highlighted rows will be deleted.

5. We can figure out the daily change in price by calculating it in Excel. In cell D1, name the variable we will create something like “Daily Change”. In cell D2, we can calculate the percentage change as 100 x (Close Price – Open Price)/Open Price. To do this, type “=100*(C2-B2)/B2”. This can also be done by clicking the cells you are using at the appropriate point in the equation.

6. After pressing enter, the percentage change will be calculated for the first cell (D2). In order to have all the cells calculated, double click on the bottom right corner of cell D2. This will fill in the calculation for all the cells in the row!

7. Calculate the Mean and Standard Deviation in Columns E and F. Assume we have the population and use the commands “=average()” and “=stdev.p()” to calculate the statistics.

8. Now create a histogram of the daily change by highlighting the data in row D. Do this by clicking on cell D1, holding down “Ctrl” and “Shift”, then pressing the down arrow so that all the data is

highlighted. Once it is highlighted, click on the “Insert” tab, then , which is the histogram creator, click the histogram on the left and a histogram will pop up.

9. The histogram will appear in the worksheet. Let’s move this to its own sheet. Right click on the chart, click “Move Chart”, click the circle next to “New sheet” and rename Chart1, “Population Histogram”.

10. You will probably notice that the histogram looks like a normal distribution, but the tails of the distribution are very wide. We can tighten up the histogram by double-clicking the values in the X-axis. When you do this, format options will appear on the right side. Click the boxes next to “Overflow bin” and “Underflow bin”. This will place any values that are below -3.0 or above 3.0 into the same bin. Double click “Chart Title” and rename the chart “S&P 500 Daily Returns, 1962-2018”. Your final histogram should look like this:

Construct Sample Distributions

1. Now that you have a population to compare to, you will construct sample distributions using the random number generator. Get back to your spreadsheet by clicking on the tab “table” at the bottom left of the chart. Now create two new columns by right-clicking on the E column and then clicking insert…do this twice so that your spreadsheet now looks like this:

2. Title the E column “Random”. In cell E2, type “=rand()” and click enter. This will return a random number between 0 and 1. Double-click the bottom right corner to randomly assign a number to each date. You may notice after you double-clicked the corner, the value in E2 changed. In order to keep the random values from changing, highlight the entire row by clicking E, right click the mouse, click copy, right click cell E1 and then under “Paste”, click on the clipboard with 123 underneath it to paste the values of the random numbers. This will keep the numbers fixed.

3. We will randomly choose dates based on the random number assigned to it. Specifically, click cell E1, then click sort and filter in upper right hand corner and sort A to Z. Now your sheet will have a random date in the first cell:

4. Highlight the first 50 values, D2 to D51, and create a new histogram. Move it to a new sheet and rename it to Sample50. Change the chart title to something like “S&P 500 Daily Returns, 1962- 2018, n=50”. Explain the shape of the histogram relative to the population distribution. Does it look normal, is it skewed, where is the average, what is the standard deviation?

5. Create three more sample histograms, one with n=100, n=1000 and n=2000. Calculate the mean, standard deviation and skewness for each sample. Explain the differences between each histogram and the value from increasing the number of observations in the sample. In other words, does increasing from 50 to 100 make the distribution look more like the population? What about 1000 to 2000?

6. Turn in the five histograms you create. You will have two paragraphs to explain the charts and answer the questions in points 4 and 5. The entire assignment should be done on two sheets of paper (one double-sided sheet). Turn in your assignment in lab next week.