Programming script and presentation for the attached problem statement

ResidencyGroupAssignment1.pdf

10/8/20 Residency Group Assignment Toy Hon Kia P a g e | 1

Residency: Group Assignment

Problem: When shopping for a used vehicle, which does not carry the traditional window sticker, the city fuel economy should always be provided. Unfortunately, it is not always obvious. If it were as simple as knowing the year, make, and model, an internet search would find the information without much difficulty. However, the options available on a vehicle model are typically also needed to identify its city fuel economy. Researching what influences the fuel economy can offer insight and potentially direct a buyer to the best option.

Question: Considering the vehicle manufacturers Toyota, Honda, and Kia between 1991 and 2020, which of the following features have the most influence when predicting the fuel economy in the city, using the data consolidated by the DOE (n.d.a): the fuel economy on the highway, the vehicle make, the year, the primary fuel type, the number of engine cylinders, the engine displacement, the vehicle class, and the tailpipe carbon dioxide emissions for the primary fuel type?
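The scope named in the question (three manufacturers, model years 1991 through 2020) can be expressed as a row filter. The sketch below uses a small made-up data frame. The column names make and year are assumptions about the DOE file and should be checked against the data dictionary, and treating both year bounds as inclusive is also an assumption. Note that the requirements call for recoding the vehicle class before this subset is taken.

```r
# Toy stand-in for the DOE data; `make` and `year` are assumed column names.
vehicles <- data.frame(
  make = c("Toyota", "Honda", "Kia", "Ford", "Toyota", "Kia"),
  year = c(1990, 1995, 2020, 2005, 2021, 1991),
  stringsAsFactors = FALSE
)

makes_of_interest <- c("Toyota", "Honda", "Kia")

# Keep only the three manufacturers and model years 1991-2020
# (inclusive bounds are an assumption about the question's wording).
in_scope <- vehicles$make %in% makes_of_interest &
  vehicles$year >= 1991 & vehicles$year <= 2020
subset_df <- vehicles[in_scope, ]

print(subset_df)
```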

Data:

• The data and data dictionaries are online. Links and the references formatted per APA 7:

o A direct link to the raw data is: https://www.fueleconomy.gov/feg/epadata/vehicles.csv

o U.S. Department of Energy (n.d.a). Download fuel economy data [Data set].

www.fueleconomy.gov: The official U.S. government source for fuel economy information [Data

set]. Office of Energy Efficiency & Renewable Energy.

https://www.fueleconomy.gov/feg/epadata/vehicles.csv

o A direct link to the data dictionary is: https://www.fueleconomy.gov/feg/ws/index.shtml#vehicle

o U.S. Department of Energy (n.d.b). Fueleconomy.gov web services [Data code book].

www.fueleconomy.gov: The official U.S. government source for fuel economy information. Office

of Energy Efficiency & Renewable Energy.

https://www.fueleconomy.gov/feg/ws/index.shtml#vehicle
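As a starting point, the raw file could be read into a data frame as sketched below. The helper name read_vehicles is hypothetical, and the three columns in the demonstration file are made up so that the example runs without a network connection. The commented-out last line shows how the same helper might be pointed at the raw data link given above.

```r
# Hypothetical helper: read the vehicles file from a local path or a URL.
# stringsAsFactors = FALSE keeps text columns as character vectors.
read_vehicles <- function(src) {
  read.csv(src, stringsAsFactors = FALSE)
}

# Demonstration on a tiny temporary file so the sketch runs offline.
tmp <- tempfile(fileext = ".csv")
writeLines(c("make,year,VClass",
             "Toyota,2019,Compact Cars",
             "Kia,2020,Minivan - 2WD"), tmp)
demo <- read_vehicles(tmp)
str(demo)

# For the real data (requires a network connection):
# df <- read_vehicles("https://www.fueleconomy.gov/feg/epadata/vehicles.csv")
```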

Requirements for the analysis:

• BEFORE subsetting, recode the field that represents the different vehicle classes using the following programming code. In this code the object name of the data is df; update that part of the code to match the object name of the data set in your analysis:

    vehC <- as.factor(df$VClass)   # levels() recoding requires a factor
    levels(vehC) <- 1:34           # relabel the 34 vehicle classes by position
    levels(vehC) <- list(car   = c(1:6, 14, 29:30),
                         van   = c(7:8, 31:34),
                         truck = c(9:11, 23:26),
                         SUV   = c(12:13, 21:22, 27:28),
                         other = 15:20)

Replace the data in VClass with the data in vehC.
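The two-step levels() recoding above may be easier to follow on a toy factor. Assigning a named list to levels() merges every old level listed under a name into one new level. The class labels and groupings below are made up for illustration and are not the 34 DOE vehicle classes.

```r
# Made-up labels, not the real DOE vehicle classes.
toy <- factor(c("Compact", "Minivan", "Pickup", "Compact", "Midsize", "Cargo Van"))

# Levels are sorted alphabetically:
# "Cargo Van" "Compact" "Midsize" "Minivan" "Pickup"
print(levels(toy))

# Step 1: relabel the levels with their positions ("1" to "5").
levels(toy) <- seq_along(levels(toy))

# Step 2: merge the numbered levels into named groups.
levels(toy) <- list(car = c(2, 3), van = c(1, 4), truck = 5)

print(table(toy))
```

Because the position numbers depend on the alphabetical order of the original labels, it is worth printing the levels before step 1 to confirm which class each number stands for.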

• Do not delete missing values.

• For observations where the primary fuel type is electricity, modify the fields representing the number of

engine cylinders and the engine displacement by replacing the value with a zero. Ask questions if you

don’t understand.
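A minimal sketch of the electricity rule on a toy data frame follows. The column names fuelType1, cylinders, and displ, and the label "Electricity", are assumptions about how the DOE file encodes these fields; confirm them in the data dictionary before reusing the pattern.

```r
# Toy rows; column names and the "Electricity" label are assumptions.
toy <- data.frame(
  fuelType1 = c("Regular Gasoline", "Electricity", "Premium Gasoline", "Electricity"),
  cylinders = c(4, NA, 6, NA),
  displ     = c(2.0, NA, 3.5, NA),
  stringsAsFactors = FALSE
)

# which() ignores NA comparisons, so rows with a missing fuel type are left alone.
electric <- which(toy$fuelType1 == "Electricity")
toy$cylinders[electric] <- 0
toy$displ[electric] <- 0

print(toy)
```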

• A description similar to the carbon dioxide emitted from the tailpipe is attached to more than one

variable.

o Use the information in the data dictionary to update the values that represent NA to NA.

o To determine the field needed for this analysis, identify the field that uses the first or primary fuel

type and does not have any NA values. There is only one field that matches these criteria.
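One way to carry out these two steps is to recode the "not available" code to NA in each candidate field and then count the missing values per field. The sketch below runs on made-up data. The column names are hypothetical placeholders, and the value -1 is only an assumed stand-in for whatever code the data dictionary actually specifies.

```r
# Toy stand-in; the column names and the -1 code are assumptions for illustration.
toy <- data.frame(
  co2_field_a = c(-1, 350, -1, 410),
  co2_field_b = c(444.4, 350.0, 0.0, 410.0),
  co2_field_c = c(-1, -1, -1, 120)
)

na_code <- -1  # confirm the actual "not available" code in the data dictionary

# Replace the sentinel value with NA in every candidate column.
toy[] <- lapply(toy, function(col) replace(col, col == na_code, NA))

# Count missing values per column; the field to keep has a count of zero.
na_counts <- colSums(is.na(toy))
print(na_counts)

complete_fields <- names(na_counts)[na_counts == 0]
print(complete_fields)
```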

• A random forest model is the required analysis method to address the research question.

• You must ensure the model is valid and reliable before considering the influence of the independent

variables.
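The assignment does not say how validity and reliability are to be checked. One common approach, shown here only as a sketch, is to hold out part of the data, confirm that the forest predicts the held-out rows reasonably well, and only then read the variable importance. The example assumes the randomForest package is installed and uses R's built-in mtcars data as a stand-in for the DOE data; the 70/30 split, the seed, and the choice of predictors are all illustrative assumptions.

```r
library(randomForest)  # assumes the randomForest package is installed

set.seed(42)  # make the split and the forest reproducible

# mtcars is a stand-in here, not the DOE data; mpg plays the role of the outcome.
n <- nrow(mtcars)
train_rows <- sample(n, size = round(0.7 * n))
train <- mtcars[train_rows, ]
test  <- mtcars[-train_rows, ]

fit <- randomForest(mpg ~ cyl + disp + hp + wt, data = train,
                    ntree = 500, importance = TRUE)

# Check predictive performance on held-out rows before reading importance.
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))
print(rmse)

# Permutation importance (%IncMSE) and node-purity importance per predictor.
print(importance(fit))
```

By default randomForest() stops when the predictors contain missing values (na.action = na.fail). Because the requirements say not to delete missing values, how to handle them is a decision for the analysis; the package offers na.roughfix and rfImpute as possible options.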

Reporting your research:

• Do not use variable names, function names, or argument names when presenting. Examples of non-

words from the analysis that are not meaningful to the audience include mtry, importance, or UCity.

Presentations must be made in words that are meaningful to the audience.

• In the presentation, describe the secondary data sample in terms that are meaningful to an audience

that doesn’t have access to the data, data dictionary, or the programming.

• If speaking about fuel economy, be specific. What do you mean?

o What do the values in this field represent?

▪ Is it miles per gallon? Is it kilometers per liter?

▪ Is it city traffic? Highway traffic?

▪ Is it a consolidation, like the average, the median, or some other calculation?

o Ask yourself questions such as these when evaluating the delivery of information in your

presentation or when you document research.

• Your presentation shall be 10 – 15 minutes in length. Practice it!

• This research will be presented. There is no research paper for this assignment.

Additional Assignment Requirements:

• Make sure every member of the group is named in each submitted file.

• Every student in the group must deliver part of the presentation to earn credit for this assignment.

• Every student is responsible for fully comprehending the programming and the interpretations of

all analyses.

• All analyses require meaningful interpretations. Your programming must align with the presented

findings.

• Peer reviews and self-reviews

o When completing the within-group review, provide thorough feedback for every member of the

group, including yourself. You are individually graded on the quality and thoroughness of the

feedback.

o When completing audience assessments of the other groups, you are individually graded on

the quality and thoroughness of the feedback.

▪ Fill in the audience assessment form while the groups are presenting.

▪ You will not complete an audience assessment on your group because you are not in

the audience.

▪ Pay attention when your peers present.

▪ Ask your peers questions about their presentations.

o The quality of your feedback is a large portion of the points in the residency.

• The presentation will not include “using R,” “using RStudio,” or any other reference to programming.

• Make sure to submit all files necessary to make the program fully functioning.

• The R script that contains the analysis, the slides submitted, and the slides presented shall

match exactly. Do not change the slides after submitting them.

Presentation Tips:

• Considerations for presenting:

o The amount of information contained on a single slide

o Interaction of the presenters

▪ How the presenter will convey the “next slide” to the group member who is sharing their

screen with the slides

▪ The transition between presenters

▪ Answering questions after presenting; show me the entire group understands the

information. All group members should answer questions.

o The speaker and slide content should complement each other.

o All findings must be presented with interpretations.

• Organize the information on the slides meaningfully

o Why are you telling me this? (topic)

o Why do I want to know? (problem)

o What is the focus? (research questions)

o How were the answers found; is it sound research? (Analysis method – the high-level plan)

o What are the answers? (results and discussion)

▪ Is the method of analysis valid and reliable? (When testing for one of the weaknesses,

did the model pass?)

▪ What did you learn? (results)

▪ What does it mean in terms of the focus of this research? (discussion)

o Where does this type of research go next? (recommendations for future analysis)

o What was covered in the presentation? (conclusion, a summary of the entire presentation)

Good to know:

• DO NOT forget to cite and reference your sources in APA 7.

• You do not have to annotate figures and tables in APA 7 format in your slides unless the images were

not derived from your group’s analysis in R.

• Figures and tables

o If derived from the analyses your group programmed in R, it does not need to be documented in

APA 7 format. If it is, it will exceed the expectations for the assignment. See the rubric.

o If derived from an external source of any kind, it must be documented per APA 7 with credit to

the external source per APA 7. Examples include images obtained from clip art, the internet, or

a photo.

• If you annotate figures and tables from the analysis in R per APA 7, you will exceed the expectations.

• When submitting in Blackboard, you may receive an error because the R file types are not recognized.

That is okay. It is only indicating that SafeAssign cannot evaluate that part of your submission.

o Ensure that every reference in your reference list is also cited in the text.

o Do not forget to cite and reference the source of the data.

• Complete this assignment independent of the other groups. Do not share work between groups.

• Each group has a different version of this assignment. If you complete a version of this assignment that

is not available to you in Blackboard, you will violate your pledge.

• Any demonstration of academic dishonesty on this assignment will result in all group members earning

zero points for the assignment.