Final exam data (Winter 2022).xlsx

DATA

Sales manager ID #	Sales	Wonder	SCIIT	Experience (yrs)
798	96	27	42	5
178	90	35	46	8
264	113	30	55	8
589	98	26	47	2
392	76	28	45	7
476	117	24	56	11
620	118	35	63	4
653	101	33	50	9
237	95	27	54	4
333	94	38	41	8
497	119	31	62	3
257	120	31	79	1
378	115	32	52	9
260	131	31	62	4
514	99	34	45	3
343	102	25	59	0
213	66	26	40	6
754	129	25	64	11
696	100	25	39	6
132	111	33	52	2
820	128	39	74	5
615	104	28	45	9
676	133	33	61	5
905	125	37	66	8
861	99	23	46	8
944	90	31	46	5
890	122	36	63	9
158	62	32	54	11
468	98	37	46	11
421	100	25	49	3
993	123	30	62	0
640	120	36	57	5
298	83	28	41	2
724	71	24	34	9
388	102	34	54	4
212	89	35	48	8
690	75	31	53	1
304	106	30	54	5
559	80	30	36	0
149	99	25	49	8
290	104	38	56	11
220	105	26	55	4
283	87	24	43	13
686	105	26	50	5
535	90	37	41	5

Final exam (Winter 2022).docx

MGMT 2262

Final exam

Winter 2022

Contents

· General Information
· Rules
· Outside sources
· Scenario
· What you need to do
· Part 1 - Exploratory data analysis (Table 1)
· Part 2 - Training and testing set (sample) (Table 2)
· Part 3a - Simple linear regression (Table 3)
· Part 3b - Choosing between models (Table 4)
· Part 4 - Multiple linear regression (Table 5)
· Submission Guidelines
· Breakdown of marks
· Notes on plagiarism and cheating (and how to avoid it)

Two very important notes:

1. This is a statistics course and the goal of this final exam is to demonstrate your understanding of the whole course. When you are reviewing your work, ask yourselves “are we demonstrating our understanding of relevant topics?”

2. Related to 1, though the rubric is in the middle of the document, it is the most important part of the exam as it specifically tells you what you are being graded on. As you complete each step, check your work against the rubric to make sure you are maximizing your grade. The rubric also indicates where to put most of your effort (i.e. the portion of the exam that is worth the most should be where you put most of your work).

General Information

· This exam has four parts, which involve utilizing multiple analysis techniques to explore a human resources problem.

· The overall goal of this exam is to demonstrate your understanding of the key topics in this course and that you can apply them in a real-world situation.

· Worth: 25% of total mark for the course.

· Due: Wednesday, April 20th by 11:59pm

· Late submissions will not be accepted. Extensions will not be granted except in extreme circumstances. If you submit late, expect to receive 0% on the exam.

· Though you do not need to do additional research for this exam, if you directly use any sources outside of the course (i.e. not from course notes or from the textbook), it is expected that you properly cite them. Both in-text citations and the reference list need to be done in APA style: https://library.mtroyal.ca/citations

Rules

1. Most important rule: This is an individual exam. It must be completed by you and only you. The work you submit must be your own work. The normal expectations of students completing an exam apply.

· You are not allowed to ask for help from another human, show your work to another human (other than when you submit to your instructor), or in any way gain any form of assistance from another human.

· You cannot pay someone to do this exam for you. You cannot go to a website and share the document, get possible answers and use them in any way. You cannot go to a website, look for the exam, get possible answers and use them in any way.

· Communication about any portion of the exam in any way with any person other than your instructor is strictly prohibited.

· Asking for any kind of help for this exam is strictly prohibited. Think like you are writing an in-person exam. You couldn’t lean over to your buddy and ask them how to make a histogram. So you can’t do it on this exam.

· Note: If two people submit work that is strangely similar, there is a very good chance that both students will receive 0 and will be sanctioned with academic misconduct. Further to this, I will pay special attention to students who have worked together in the past to see if their answers are similar.

2. It is “open” book. This means you can use any resource (other than a human) you want to complete the submission. This includes the textbook, course notes, course videos, and internet sites (excluding homework help sites like Chegg and CourseHero).

3. You can ask the course coordinator Collette Lemieux ( [email protected] ) for help that involves clarification. For example, if you do not understand what an instruction means, you can ask for clarification.

· Similar to assignments 1 and 3, a FAQ document has been created. Please check there for questions and answers.

4. You cannot ask your instructor for help doing the exam because it is expected that you know how to do it. For example, if you do not know how to make a histogram, you need to figure it out on your own. Or if you are not sure what model to use in Step 3 of a hypothesis test, you need to figure it out on your own.

· This relates to the majority of Excel issues as well. For example, if you don’t have the Data Analysis Toolpak properly installed prior to the final exam, that suggests you aren’t prepared to write the final and need to figure out the problem yourself. As another example, it is expected that you have actually used Excel to do a similar analysis prior to the final exam. Therefore, if you are having problems with doing the analysis, you need to figure it out on your own.

5. You cannot ask your instructor for feedback.

6. For all parts, you can work as much or as little on them as you want, as long as the exam is completed by end of day on April 20th.

· You have been given over ten days to complete this exam. It is expected that you work on the exam throughout this period. If you choose to wait until late on Wednesday to start the exam and run into problems, then you need to accept the consequences.

· If you studied for the final exam prior to writing it, it will take 3 to 4 hours to complete. But most of you will study as you are writing it (because it is an open book exam), so plan to spend at least 12 hours working on the exam. Starting this exam three hours before it is due is like showing up to an exam two hours after it has started.

· This is a final exam. It is worth 25% of your mark. Behave accordingly.

7. This is not a complete list of rules as that is hard to do. Instead, please keep in mind the spirit of the rules, which is an open book, individual exam.

Outside sources

NOTE: If you directly use an outside source (e.g., paraphrase or quote), you still need to do a proper APA citation. What is described below is only for outside sources that you looked at for help and not for directly writing your final work.

As I am absolutely convinced that most of you are using outside sources and failing to cite them, let us make it easy. If you look at a website outside of the class (i.e. not our textbook or from BB), insert the URL in the table below, state which part of the exam you used it for (1, 2, 3, 4), and very briefly describe how you used it. In the first row, I’ve provided an example of what I’m expecting. Please delete it before submitting.

URL	Part	How used
https://www.statisticshowto.com/probability-and-statistics/correlation-coefficient-formula/	2	Read about r and how to interpret it.

If you claim you used no outside sources OR you directly used all outside sources, instead of submitting the above table, include the following sentence in your final work (see submission guidelines for where):

“I, [insert full name], solemnly swear that I did not use any outside source (except those properly cited using APA referencing) to complete this exam. I understand that failure to indicate outside sources is an act of academic misconduct and could result in getting 0 on this exam.”

Scenario

The Craybill Instrumentation Company produces highly technical industrial instrumentation devices. The company has 45 sales regions, each headed by a sales manager.

The human resources (HR) director has the business objective of improving recruiting decisions concerning sales managers. The HR director determined that the primary method of evaluating the effectiveness of recruitment is the hire’s resulting “sales index” score, which is the ratio of the region’s actual sales to its target sales. The target values are constructed each year by upper management, in consultation with the sales managers, and are based on past performance and market potential within each region.

At the time of their application, candidates are asked to take the Strong-Campbell Interest Inventory Test and the Wonderlic Personnel Test. The former test measures the applicant’s perceived interest in sales, while the latter measures their perceived ability to manage. For both tests, the higher the score the better. Due to the time and money involved with the testing, some discussion has taken place about dropping one or both of the tests.

The HR director decided to use regression modelling to predict the sales index (Sales) of the sales managers. To start, the HR director gathered information on each of the 45 current sales managers, including years of selling experience (Experience), and the scores from both the Strong-Campbell Interest Inventory Test (SCIIT) and the Wonderlic Personnel Test (Wonder). The attached Excel file contains information on the 45 current sales managers.

Your goal is to perform analysis to determine: Can the sales index be predicted by the variables chosen by the HR director? If so, which variable or combination of variables is the most effective predictor?

What you need to do

To answer the above question, follow the instructions below. You will submit all of the tables in each of the parts and your Excel file with your completed work. See Submission Guidelines for more details.

Part 1 - Exploratory data analysis

Prior to doing the regression analysis, the HR director wants to get a sense of the quality of sales managers Craybill currently has. To do this, run an exploratory data analysis of the sales index and determine what story you want to tell about the sales managers.

· Goal: Answer the HR director’s question: “What is the quality of the current sales managers at Craybill?”

· How: Perform exploratory data analysis.

· Step 1: Run an exploratory data analysis (i.e., create visualizations and numerical summaries) for the sales index.

· Step 2: Choose one other variable (i.e., Wonder, SCIIT or Experience) to drill down into the sales index data. Run an exploratory data analysis (i.e., create visualizations and numerical summaries) for the sales index and your chosen variable.

· Step 3: Review your work and decide what story you want to tell about the quality of the current sales managers at Craybill.

· Step 4: Choose one visualization or set of numerical summaries (or both) from Step 1 and one visualization or set of numerical summaries (or both) from Step 2. Insert them in Table 1. Then write an answer/explanation to the HR director that explains the story and answers their question.
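To illustrate the kind of numerical summaries Step 1 refers to, here is a minimal Python sketch. It is not part of the exam, which expects the work to be done in Excel. The function name `summarize` is hypothetical, and the sample values are only the first eight Sales entries from the data table, used for illustration rather than as a full analysis.

```python
import statistics

def summarize(values):
    """Return common numerical summaries for one variable."""
    # quantiles(n=4) returns the three quartile cut points
    # (Python's default "exclusive" method; Excel may differ slightly).
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,
    }

# First eight Sales values from the data table (illustration only).
sales_sample = [96, 90, 113, 98, 76, 117, 118, 101]
summary = summarize(sales_sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The same function could be applied to a drill-down variable, or to the sales index within groups of that variable, to support Step 2.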

Table 1

	Evidence: Visualization, numerical summaries or both	Explanation to HR director of current situation
Sales index	[Focus only on the sales index. What is the story of the quality of the current sales managers?]
Drill down	[Choose one other variable (i.e., Wonder, SCIIT or Experience) to drill down. What is the story of the sales manager based on one of the other variables? ]

Part 2 - Training and testing set (sample)

Have you heard of machine learning? It sounds like a super hard area of computer science that is way too hard for a first class. But actually, you’ve already engaged in machine learning! How, you ask? Well, linear regression is a type of supervised machine learning.

The goal of machine learning is to build a model that learns or changes as new information is provided. In regression, the model is built from data and it can be improved upon as new data is provided. For example, if we build a regression model to predict the sales index for sales managers, as we hire new sales managers, we can add their information to the model, re-run the regression analysis, and get an even better prediction model.

Another big part of machine learning is testing the accuracy of our model. We often do this by taking our data set and dividing it into two parts: a training set and a testing set. The training set is used to build the model, which in our case means using the Data Analysis Toolpak to get the regression values. Then we plug the values from the testing set into the model to see how good the model is at making predictions for a different set of data. In short, the training data set is used to build the model (in this case the regression model), while the testing data set is used to test the ability of the model to make predictions. If you are interested in finding out more, check out this article (note: this isn’t needed to do this exam but is provided purely for interest).

The common rule for dividing the data is called the 80/20 split. That is, the training set is made up of 80% of the data while the testing set is made up of 20% of the data.

In this first step, divide the data set to make the training and testing set.

· Goal: Divide the data into two random samples. The first sample is called the training set and will contain 80% of the data values. The second sample is called the testing set and will contain 20% of the data values.

· How: Collect a random sample.

· Step 1: Choose a random sampling technique.

· Step 2: Apply the random sampling technique to the data set to randomly select 20% of the sales managers and their associated data. Copy and paste those into the “Testing set” part of the table below.

· Though this is the “second sample”, we are collecting it first for efficiency - it is faster to collect a 20% sample instead of collecting an 80% sample.

· Step 3: Then take the remaining 80% of the sales managers and their associated data, and copy and paste those into the “Training set” part of the table below.

· Step 4: At the top of the table, briefly explain how you collected your sample in the row provided in the table.
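The steps above describe simple random sampling without replacement. As a hedged illustration only (the exam expects you to choose and carry out your own technique in Excel), a minimal Python sketch of an 80/20 split could look like this. The function name `split_train_test`, the seed value, and the stand-in row numbers are all assumptions made for the example.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=None):
    """Randomly split rows into (training, testing) lists without replacement."""
    rng = random.Random(seed)                 # a seed makes the split repeatable
    n_test = round(len(rows) * test_fraction)
    test_positions = set(rng.sample(range(len(rows)), n_test))
    testing = [row for i, row in enumerate(rows) if i in test_positions]
    training = [row for i, row in enumerate(rows) if i not in test_positions]
    return training, testing

# Stand-in rows numbered 1..45; in practice each row would be one
# sales manager's full record (ID, Sales, Wonder, SCIIT, Experience).
manager_rows = list(range(1, 46))
training_set, testing_set = split_train_test(manager_rows, seed=2022)
print(len(training_set), len(testing_set))  # 36 9
```

With 45 sales managers, 20% works out to 9 rows for the testing set, leaving 36 rows for the training set.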

Table 2

Brief explanation of how the sample was collected.

	Sales manager ID	Sales	Wonder	SCIIT	Experience (yrs)
Testing set









Training set

Part 3a - Simple linear regression

Build and evaluate a simple regression model to predict the sales index using the training set. To start, pick one of the three variables (Experience, SCIIT, or Wonder) that you believe will be useful at predicting the sales index.

· Goal: Build and evaluate a model to predict the sales index.

· How: Perform simple linear regression analysis:

· Step 1: Choose one variable as a predictor for the sales index.

· Step 2: Run the residual analysis to determine if the assumptions of regression are valid for this model.

· Note: Even if they are not all valid, continue with the analysis.

· Assume that ID#s are handed out in numerical order from earliest hire to most recent hire.

· Step 3: Build the regression model.

· Step 4: Evaluate the regression model using the testing data set.

· To evaluate the model, you’ll use something new called the root mean square error or RMSE. It is found by taking the average of the squared residuals (actual value minus predicted value), and then taking the square root of the result.

· In general, the closer RMSE is to 0, the better the model is at making predictions. The larger the RMSE is, the worse the model is at making predictions. The RMSE is an absolute measure which means the size of the data needs to be considered when interpreting it.

· Step 5: Make predictions.
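As a rough illustration of Steps 3 and 4, the sketch below fits a least-squares line on a training set and then computes the RMSE on a separate testing set. It is a minimal Python example under stated assumptions, not the Excel workflow the exam expects: the function names are hypothetical and the numbers are made up so the arithmetic is easy to follow.

```python
import math

def fit_simple_regression(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def rmse(actual, predicted):
    """Square root of the average squared residual (actual - predicted)."""
    squared_residuals = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_residuals) / len(squared_residuals))

# Made-up training data that follows y = 1 + 2x exactly.
train_x = [1, 2, 3, 4, 5]
train_y = [3, 5, 7, 9, 11]
# Made-up testing data: the model predicts 13 and 15, so residuals are 1 and -1.
test_x = [6, 7]
test_y = [14, 14]

b0, b1 = fit_simple_regression(train_x, train_y)
predictions = [b0 + b1 * xi for xi in test_x]
print(rmse(test_y, predictions))  # 1.0
```

An RMSE of 1 here means predictions miss the actual values by about 1 unit on average. Whether that is large or small depends on the scale of the Y-values, which is why the RMSE is described as an absolute measure.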

To do the above, fill in the following table. Leave the titles and labels alone. But anything in [] should be removed or replaced.

Table 3

Question	ANSWER
Step 1: State the variable you chose and explain why you chose it.	[Provide one or two sentences that explain why you think your variable is a good predictor of sales index. Make sure to clearly state which variable you chose.]
Step 2: Run the residual analysis to determine if the assumptions of regression are valid for this model.	Linearity	[Insert the relevant visualization for this portion of the residual analysis]	[Explain whether the visualization indicates whether the assumption of linearity is appropriate.]
	Independence	[Insert the relevant visualization for this portion of the residual analysis]	[Explain whether the visualization indicates whether the assumption of independence is appropriate.]
	Normality	[Insert the relevant visualization for this portion of the residual analysis]	[Explain whether the visualization indicates whether the assumption of normality is appropriate.]
	Equal variance	[Insert the relevant visualization for this portion of the residual analysis]	[Explain whether the visualization indicates whether the assumption of equal variance is appropriate.]
Step 3: Build the regression model. Once this model is built, use the values related to the model (e.g., regression equation, r, standard error, etc.) in the next section to evaluate the model.	Initial analysis	[Insert the scatterplot]	[Find and interpret the r-value.]
	Build the model.	[State the regression equation. Include for what values of X the equation is valid.]
	Hypothesis test for the slope. (Steps 2 and 3 are done for you)	Step 1	[Beta symbol if you need it: β]
		Step 2	Use a level of significance of 5%.
		Step 3	Use the Student-t distribution.
		Step 4
		Step 5
		Step 6
	95% confidence interval for the slope	[Provide a complete and thorough interpretation of the 95% confidence interval for the slope. Make sure to include the confidence interval in your answer. Your answer needs to show understanding of confidence intervals and slope. ]
	Conclusion as presented to the HR director	[By referring to all of the analysis done in this section, indicate a) if linear regression is an appropriate way to model this situation, and b) what the model indicates about the relationship between sales index and your chosen variable.]
Step 4: Evaluate the accuracy of the regression model you built above.	For each of the sales managers in the Testing data set, provide the following information. To find the predicted value, use the model built above.	X-value	Actual y-value	Predicted y-value	Residual	(Residual)^2









	Calculate and interpret the RMSE	[RMSE = Find the average of the values in the (residual)^2 column. Then take the square root of that result].	[Interpret the RMSE. See details about Step 4 above for help]
Step 5: Make predictions	Choose one of the X-values in Step 4.	[For the chosen X, find and interpret the confidence interval estimate for the mean value of Y.]
		[For the chosen X, find and interpret the prediction interval estimate for the individual value of Y.]
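The slope test, the confidence interval for the slope, and the two interval estimates in Step 5 all rest on the same few quantities. The sketch below shows one way to compute them in Python, assuming the standard simple-regression formulas. The function names and data are hypothetical, and the critical value `t_crit` is passed in by hand (here 2.776, the two-tailed 5% value for 4 degrees of freedom from a t-table) because the Python standard library has no Student-t distribution.

```python
import math

def fit_with_inference(x, y, t_crit):
    """Fit y = b0 + b1*x and return the pieces needed for slope inference."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))   # standard error of the estimate
    se_b1 = s_yx / math.sqrt(sxx)     # standard error of the slope
    return {
        "n": n, "x_bar": x_bar, "sxx": sxx, "b0": b0, "b1": b1, "s_yx": s_yx,
        "t_stat": b1 / se_b1,         # test statistic for H0: beta1 = 0
        "slope_ci": (b1 - t_crit * se_b1, b1 + t_crit * se_b1),
    }

def intervals_at(x0, fit, t_crit):
    """Confidence interval for mean Y and prediction interval for one Y at x0."""
    h = 1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"]
    y_hat = fit["b0"] + fit["b1"] * x0
    ci_half = t_crit * fit["s_yx"] * math.sqrt(h)
    pi_half = t_crit * fit["s_yx"] * math.sqrt(1 + h)
    return (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)

# Made-up data with n = 6, so degrees of freedom = n - 2 = 4.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
t_crit = 2.776  # assumed value from a t-table (two-tailed, alpha = 0.05, df = 4)

fit = fit_with_inference(x, y, t_crit)
mean_ci, pred_pi = intervals_at(4, fit, t_crit)
print("slope:", fit["b1"], "t:", fit["t_stat"], "95% CI:", fit["slope_ci"])
print("CI for mean Y at x=4:", mean_ci)
print("PI for individual Y at x=4:", pred_pi)
```

The prediction interval is always wider than the confidence interval at the same X because it accounts for the scatter of an individual value around the mean, not just the uncertainty in the fitted line.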

Part 3b - Choosing between models

In Part 3a, you have done a thorough analysis for one possible predictor of sales index. Here, we want to address the HR director’s question of “Can the sales index be predicted by the variables chosen by the HR director? If so, which variable is the most effective predictor?”

· Goal: Determine if any of the chosen variables can predict the sales index. If so, which variable is the most effective predictor?

· How:

· Step 1: Run the regression analysis for each variable: Experience, SCIIT, or Wonder.

· Step 2: Compare the three models.

· Step 3: Write a paragraph that explains the results of the analysis to the HR director.
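One possible way to organize the comparison in Step 2 is to compute the same measures for every candidate predictor and line them up side by side. The Python sketch below does this with r-squared (from the training set) and RMSE (from the testing set). The function name, the column names `predictor_a` and `predictor_b`, and all of the numbers are made up for illustration, and other comparison measures (such as the p-value for the slope) would be equally reasonable.

```python
import math

def evaluate_predictor(train_x, train_y, test_x, test_y):
    """Fit y = b0 + b1*x on the training data; return (r_squared, test RMSE)."""
    n = len(train_x)
    x_bar = sum(train_x) / n
    y_bar = sum(train_y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in train_x)
    syy = sum((yi - y_bar) ** 2 for yi in train_y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(train_x, train_y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    squared_residuals = [(yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(test_x, test_y)]
    return r_squared, math.sqrt(sum(squared_residuals) / len(squared_residuals))

# Made-up training and testing data with two hypothetical predictors.
train = {
    "y": [10, 12, 15, 19, 22, 24],
    "predictor_a": [1, 2, 3, 4, 5, 6],
    "predictor_b": [3, 1, 4, 1, 5, 9],
}
test = {"y": [13, 21], "predictor_a": [2, 5], "predictor_b": [2, 6]}

results = {
    name: evaluate_predictor(train[name], train["y"], test[name], test["y"])
    for name in ("predictor_a", "predictor_b")
}
# List the models from smallest to largest testing-set RMSE.
for name, (r2, error) in sorted(results.items(), key=lambda item: item[1][1]):
    print(f"{name}: r^2 = {r2:.3f}, RMSE = {error:.3f}")
```

In this made-up example the first predictor has both the higher r-squared and the lower RMSE, so both measures point the same way. With real data they can disagree, which is something the written comparison would need to address.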

Table 4

Use the regression analysis of all three variables to write a paragraph to the HR director that answers: “Can the sales index be predicted by the variables chosen by the HR director? If so, which variable is the most effective predictor?”

Part 4 - Multiple linear regression

Build two multiple regression models using the training data set and determine which one is better. Each model needs to have at least two independent variables. Once you build your models, fill in the following table.

· Goal: Build two multiple linear regression models and compare them to determine which is better.

· How:

· Step 1: Choose the variables for Model 1. Run the regression analysis for this model using Excel. Plug in the necessary values into the table below.

· Note: For Multicollinearity, choose how you want to measure this.

· Step 2: Repeat Step 1 for Model 2 with different variables.

· Step 3: Compare both models using the results in Table 5. Determine which model is better. Then write a paragraph that presents your argument and provides evidence to support your decision.
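To show how the entries in the table below relate to one another, here is a minimal Python sketch of a two-predictor regression. It solves the least-squares equations directly and reports r-squared, adjusted r-squared, and a variance inflation factor (VIF) as one possible multicollinearity measure. Choosing VIF is an assumption made for this example, since the exam leaves the measure up to you, and the function name and data are hypothetical.

```python
def fit_two_predictor_model(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 with summary measures."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Sums of squares and cross-products about the means.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2   # zero if x1 and x2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    sse = sum((c - (b0 + b1 * a + b2 * b)) ** 2 for a, b, c in zip(x1, x2, y))
    sst = sum((c - my) ** 2 for c in y)
    r_squared = 1 - sse / sst
    k = 2                        # number of independent variables
    adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
    # With two predictors, VIF = 1 / (1 - r12^2), where r12 is their correlation.
    vif = 1 / (1 - s12 ** 2 / (s11 * s22))
    return {"b0": b0, "b1": b1, "b2": b2, "r_squared": r_squared,
            "adj_r_squared": adj_r_squared, "vif": vif}

# Made-up data: roughly y = 1 + 2*x1 + 3*x2 plus a little noise.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [9.2, 7.9, 19.1, 17.8, 29.3, 27.9]

model = fit_two_predictor_model(x1, x2, y)
for name, value in model.items():
    print(f"{name}: {value:.4f}")
```

Adjusted r-squared penalizes extra variables, which is why it, rather than r-squared, is the fairer basis for comparing models with different numbers of predictors.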

Table 5

	Model 1	Model 2
Chosen variables
Relevant scatterplots
r
Adjusted r^2
p-value for group of slopes
Individual p-values for slopes
Multi-collinearity
Regression equation
Prediction	[Use the same actual Y-value you highlighted above when doing simple linear regression. Predict the Y-value using this model. Then calculate the residual.]	[Repeat for this model.]

Compare the two models. Which model would you recommend that the HR director use? Justify your answer.	[Your answer needs to compare each model based on a) the appropriateness of using linear regression for the model, b) the accuracy of the model, and c) collinearity. Finally, there needs to be a clear closing statement that ties together the whole argument and clearly states which model is better and why.]

Submission Guidelines

Your submission needs to follow these guidelines.

Submit one Word doc or PDF file and one Excel file.

The Word doc needs to follow this format:

· Section 1: Filled in Outside Sources table OR the statement that you did not use any outside sources. If you have used any sources directly in your work, insert the reference list here (in addition to the table).

· Section 2: Table 1

· Section 3: Table 3

· Section 4: Table 4

· Section 5: Table 5

· Section 6: Table 2

See the Word document “Final exam submission (Word).docx” for the required layout.

The Excel doc should have a minimum of five sheets that demonstrate your computations to complete the exam. Do not provide explanations or any additional information. You are providing the Excel spreadsheet simply so we can see your work. Your spreadsheet needs to include the following sheets:

· “Part 1”: Your exploratory data analysis

· “Part 2”: Your training set and testing set

· “Part 3 – Experience”: Your regression analysis for Experience.

· “Part 3 – SCIIT”: Your regression analysis for SCIIT.

· “Part 3 – Wonder”: Your regression analysis for Wonder.

· Note: One of the tabs for Part 3 will include the in-depth analysis for Part 3a. While the other two will include a shorter analysis for Part 3b.

· “Part 4 - Model 1”: The regression analysis for model 1

· “Part 4 - Model 2”: The regression analysis for model 2

You can have more sheets than this but ensure that you have labelled the sheets appropriately to help your instructor find the information.

See the Excel document “Final exam submission (Excel).xlsx” for the required layout.

Breakdown of marks

Here is how you’ll be marked on the final exam:

· Superior performance– A+: The answer is correct, complete, and demonstrates a very strong understanding of the relevant course content.

· Excellent – A: The answer is correct, complete, and demonstrates a strong understanding of the relevant course content.

· Good - B: The answer is mostly correct and complete, with no errors or only small ones. Understanding of relevant course content is generally demonstrated. Above average performance.

· Satisfactory - C: The answer is mostly correct and complete, with either multiple small errors or a significant error. Basic understanding of course content is demonstrated.

· Marginal Performance - D: There is more than one significant error. The response suggests a lack of understanding of course content.

· Fail – F: There are multiple errors and overall, the answer does not demonstrate understanding of the course content.

· Not done: The component is missing.

	Description	Mark
Part 1 - Exploratory data analysis	The response presented in the table demonstrates that the student correctly knows how to do data analysis, can identify a story within the data, can present evidence to support the story, and can communicate the story in a meaningful way to their employer.	A+	15
		A	13
		B	11
		C	9
		D	7.5
		F	3
		Not done	0
Part 3a – Residual analysis (Step 2)	The response presented in the table demonstrates that the student correctly understands how to perform residual analysis, understands what the results of the analysis indicates, and can effectively communicate the results of the analysis.	A+	5
		A	4
		B	3.5
		C	3
		D	2.5
		F	1
		Not done	0
Part 3a – Build, evaluate and predict using the regression model (Steps 1 and 3, 4 and 5)	The response presented in the table demonstrates that the student can correctly make an argument why two variables are related, explain whether the two variables are related, perform inferential statistics on the slope, calculate the RMSE, and make predictions. Finally, the student can communicate the results of the analysis in a meaningful way.	A+	15
		A	13
		B	11
		C	9
		D	7.5
		F	3
		Not done	0
Part 3b – Choosing between models	The response presented in the table demonstrates that the student can correctly choose between multiple models and can effectively communicate the decision process.	A+	10
		A	8.5
		B	7.5
		C	6
		D	5
		F	2
		Not done	0
Part 4 – Multiple linear regression	The response presented in the table demonstrates that the student can correctly find relevant features of regression models to allow for their comparison. Additionally, the student can appropriately compare the features and make a complete and compelling argument for why one model is better than the other model.	A+	10
		A	8.5
		B	7.5
		C	6
		D	5
		F	2
		Not done	0
Part 2 and Excel	The table includes both the training set and the testing set. The explanation of how the random sample was found was correct and sufficiently explained. The Excel file with all required computations was included.	A+	5
		A	4
		B	3.5
		C	3
		D	2.5
		F	1
		Not done	0
Outside sources	An appropriate list of outside sources is provided. If direct sources are used, correct APA referencing is used.	Complete
		Incomplete	Depends on severity of omission. Anywhere from -3 to -55
Total		60

Notes on plagiarism and cheating (and how to avoid it)

Plagiarism is any act where you present work as your own when it is not. When plagiarism is found, a letter is sent to the Office of Student Conduct.

Cheating is when you do something which gives you an unfair advantage over other students.

When you submit anything at MRU with your name on it, you are stating that you are comfortable with all the work presented and you agree that it is your work. Make sure you review your work to ensure it is your own prior to submitting.

Here are two common scenarios that I have seen in the past.

Scenario 1: Suppose you do not quite understand standard deviation. So you google “standard deviation” and then you click on Standard Deviation Definition on Investopedia ( https://www.investopedia.com/terms/s/standarddeviation.asp ). One sentence makes sense to you “Standard deviation measures the dispersion of a dataset relative to its mean.” What is the right way to deal with it, so you are not engaging in plagiarism?

Options	Result
We found the standard deviation of income to be $4000. Standard deviation measures the dispersion of a dataset relative to its mean.	Plagiarism! This is a direct copy and paste without any indication of the source. This is work presented as your own when it is not.
We found the standard deviation of income to be $4000. Standard deviation measures the scatter of a dataset relative to its mean.	Plagiarism! Though it is not a direct copy, it is still close to the website’s wording and it is still presented as your work when it is not, as there is no citation.
We found the standard deviation of income to be $4000. Standard deviation measures the scatter of a dataset relative to its mean (Hargrave & Westfall, 2020).	Not obviously plagiarism but still borderline. A correct in-text citation was used, but the quote was insufficiently paraphrased. Changing one word is not paraphrasing.
We found the standard deviation of income to be $4000. This measure indicates how much the incomes vary from the mean (Hargrave & Westfall, 2020).	Not plagiarism : ) There is a correct APA in-text citation and the sentence was paraphrased.
We found the standard deviation of income to be $4000. “Standard deviation measures the dispersion of a dataset relative to its mean” (Hargrave & Westfall, 2020, para. 2).	Not plagiarism : ) A direct quote is used (and indicated by quotation marks) and a correct APA in-text citation was used. BUT in this exam, you should avoid using direct quotes and instead focus on what these definitions mean in the context.

Note: A proper APA reference at the end of the document needs to be included if outside sources are used. For this example, the APA reference would look like:

Hargrave, M. & Westfall, P. (2020, July 21). Standard deviation definition. Investopedia. https://www.investopedia.com/terms/s/standarddeviation.asp

Here are some good habits:

· Never copy and paste a sentence straight into your exam document. Instead, immediately paraphrase it and include the reference. A lot of students copy and paste and then forget to change it; it is still plagiarism.

· If you spend any time on a website as you are doing this exam, write down the website’s name and URL in a document (use the Outside Sources table for this exam).

Scenario 2: Your friend asks to see your exam because they just want some ideas on what they could do.

There are no good options on this one. Probably a more accurate way to write the scenario is: Your “friend” asks to see your exam because they just want to copy and paste your work.

Do NOT share your work with anyone. If you do share your work and your “friend” copies and pastes it (even if you don’t know), you have committed academic misconduct. Friends don’t ask to borrow your work because they get how unfair it is to put you in that spot. Also, simply sharing work between friends even when no copying is done is cheating (but not plagiarism), as the people involved are getting an unfair advantage.

In previous assignments, you have been allowed to help each other. This is NOT the case for this exam.