assignment work (AVD)
11/4/20 Assignment 2 Sp Au Br.docx P a g e | 1
Research Assignment 2
Both the Outline for Research Assignment 2 and Research Assignment 2 itself use this document.
Use the Documenting Research Guide to understand how to use the information in this document for
either of these submissions. Ask questions if needed!
Problem:
Employers’ external job postings need to be posted to the one job board that targets their
model candidate and to receive only applicants who are perfect for the role. In reality, jobs are
typically posted in numerous places, and both suitable and unsuitable candidates apply for the
role. Considering specific candidate characteristics, together with what may or may not
influence the use of a specific job board, will lead to better targeting of candidates,
fewer redundant job postings, and fewer unfit candidates.
Question 1:
What are the most influential features when predicting whether a survey respondent has used
the SO job board, or is aware of the board but has never used it, among respondents who
reported residing in Spain, Australia, or Brazil; reported their age as between 18 and 65
years old; indicated that they were not at all, somewhat, or very confident in their manager;
reported an undergraduate major in an engineering field, information systems, web design, or
statistics; and indicated that they have been coding for between one and 49 years, when also
considering the responses these respondents reported regarding employment, how often they
contribute to open source, and whether or not they code as a hobby, using the data from
SO (2019)?
Question 2:
You are responsible for developing a second research question. This question must meet the
criteria from Unit 1 Part 1. Additionally, it must relate to the problem statement. It does not
have to use the same subset of data as the other research question. The analysis method
must be an analysis method demonstrated in one of the lectures. When completing the outline,
make sure to include both the given question and the well-developed, sound research question
you have developed.
Data:
• The data and data dictionaries are online.
o Note: The raw data in your program must be in the original form. Do not modify the data
outside of the programming. Use the data dictionary to understand the data.
o The data and data dictionary are downloaded together. When you visit this site, ensure
you select the 2019 survey, and cite and reference the source in your work.
▪ Stack Overflow. (2019). Stack overflow annual developer survey [Data set and
code book]. https://insights.stackoverflow.com/survey/
• Create a subset of data to represent the sample of secondary data in this analysis, based on
the research questions.
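A subset of this kind could be built in R along the lines of the sketch below. It is only an illustration on a small made-up data frame: the column names `Country` and `Age` are assumptions to be checked against the data dictionary, the values are not survey responses, and only two of the criteria from the research question are shown.

```r
# Toy stand-in for the raw survey data (made-up values, not real responses).
# Column names Country and Age are assumptions; confirm them in the data dictionary.
survey_raw <- data.frame(
  Country = c("Spain", "Brazil", "Canada", "Australia", "Spain"),
  Age     = c(25, 70, 30, 18, 41),
  stringsAsFactors = FALSE
)

# Keep the raw object untouched and build the subset as a new object.
target_countries <- c("Spain", "Australia", "Brazil")
survey_subset <- survey_raw[
  survey_raw$Country %in% target_countries &
    survey_raw$Age >= 18 & survey_raw$Age <= 65,
]

# Check the result against the criteria instead of assuming it worked.
table(survey_subset$Country)
range(survey_subset$Age)
```

If `Age` contains missing values, a logical index like this one returns all-NA rows for respondents in the target countries, so how missing ages are treated should be decided explicitly.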
Data Cleaning:
• Do not remove missing values during cleaning.
• When changing an object or part of an object, validate that the change occurred as expected.
• The steps that are taken in cleaning are not discussed in the research paper.
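Validating a change can be as simple as profiling the object before and after and comparing the two. The sketch below is a minimal base R illustration on made-up values; the vector stands in for a hypothetical yes/no survey column, and the specific checks are one possible choice, not a required procedure.

```r
# Toy column with a missing value (made-up values standing in for a yes/no survey item).
hobby_raw <- c("Yes", "No", NA, "Yes", "No", "Yes")

# Profile before the change; useNA = "ifany" keeps missing values visible.
before <- table(hobby_raw, useNA = "ifany")

# The change: convert the character vector to a factor with an explicit level order.
hobby_clean <- factor(hobby_raw, levels = c("No", "Yes"))

# Validate: same length, same number of missing values, same counts per category.
after <- table(hobby_clean, useNA = "ifany")
stopifnot(
  length(hobby_clean) == length(hobby_raw),
  sum(is.na(hobby_clean)) == sum(is.na(hobby_raw)),
  all(as.vector(before) == as.vector(after))
)
```

Because the missing value is counted before and after, the check also confirms that the change did not remove it.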
Analyze:
• When analyzing the given research question, you must use a random forest model.
o You must attempt to improve the model performance by one of the methods covered in
Unit 5.
o The research question you write must make use of a method of analysis demonstrated
in the lectures from this course.
o Accuracy alone is not sufficient to determine the validity and reliability
of the model.
• The sub-stages of Analyze (profile, prepare, and apply) are necessary at least two times. This
method is for programming, not documenting research.
• Ensure you establish that the model is valid and reliable before discussing the influential
indicators.
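The point about accuracy can be illustrated with a small base R sketch. The labels below are made up, the class names `aware` and `used` are hypothetical stand-ins for the two outcome categories, and the "model" simply predicts the majority class every time; it is not a random forest and not a prescribed evaluation procedure.

```r
# Made-up outcome labels: 9 of 10 respondents are in the "aware" class.
actual    <- factor(c(rep("aware", 9), "used"), levels = c("aware", "used"))
# A model that always predicts the majority class.
predicted <- factor(rep("aware", 10), levels = c("aware", "used"))

# Confusion matrix: rows are predictions, columns are actual classes.
cm <- table(Predicted = predicted, Actual = actual)

accuracy <- sum(diag(cm)) / sum(cm)
# Treat "used" as the positive class.
sensitivity <- cm["used", "used"] / sum(cm[, "used"])
specificity <- cm["aware", "aware"] / sum(cm[, "aware"])
balanced_accuracy <- (sensitivity + specificity) / 2

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, balanced_accuracy = balanced_accuracy), 2)
```

Here accuracy is 0.9 even though the model never identifies a single respondent in the `used` class (sensitivity 0, balanced accuracy 0.5), which is why accuracy needs to be read alongside other measures.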
Results section and discussion section:
• Ensure that assertions and assessments in the results and discussion sections are derived
from the analysis in R.
• Do not speculate. Use evidence. When documenting the results, consider the generalizability.
• Explain in words what was done to improve model performance; do not refer to programming
functions, variable names, or argument names. Assume the reader cannot see the programming code
or raw data, but needs to understand what you did to improve the performance.
Future recommendations:
• Include recommendations for future analysis, based on the research in R.
• Explore the insights you can gain from this model and provide your interpretations when
documenting your research.
Bonus challenge:
Compare the influential indicators in predicting the outcome depending on the country by creating
separate models for each country. Describe whether there were distinct differences in the
contribution of the different predictors. Do not speculate when discussing the findings.
Tip: An additional research question that meets the five criteria from the first lecture will bring
this additional analysis into the focus of the research. The challenge does not replace the original
research requirements for this assignment. If you were to complete the challenge, there would be
three research questions.
Required files to submit:
1) Research paper in APA 7 format; MS Word document file type
2) R Script; final version with file type .r
Important Information:
• You will receive an email confirming the submission. If you receive that email, your
submission has been received.
o Any error shown comes from the use of SafeAssign.
o SafeAssign does not recognize .r file types. The warning does not affect the
submission.
• The research paper will be written in a professional writing style, following APA 7 student
paper format; use the student paper template.
o The document shall be 3-5 pages and at least 1000 words. The page count does not
include the cover page, tables, figures, or the reference page.
o Ensure that every reference in the reference list is also cited in the text.
o Do not forget to cite and reference the source of the data.
• It is ill-advised to modify the problem statement and research question provided.
• If the research problem or research questions are modified, neither the requirements of the
analysis nor the objective outlined in the original research question will change.
• There are several different versions of this assignment. If the submitted work is in line with a
different version than assigned, the submitted work is a demonstration of academic
dishonesty. Do not share the work with peers. Do not accept work that you did not do.
• Take a look at the rubric to get the best grade possible.