Assignment # 1663LA3
Stroop Task Data
Annie Qiu
25 FEB 2021
Part I: raw data
What we have vs #Goals
Original data set
Participant #1 through Participant #36 occupy 2 rows each, creating 72 rows of data
Demographics are recorded twice for each participant:
Gender
Age
Ethnicity
Measures for Hours of Sleep, Caffeine Consumed, Learning Disability, Words Read Correctly, and Total Words also appear twice for each participant.
Unclear which row corresponds to “MUSIC CONDITION” vs “NO MUSIC CONDITION”
Data structuring rules
Each row represents information for one participant.
The first row (aka ‘header’) should contain clear markers for each variable in easy-to-read categories.
We do not want to get confused about which categories to use when we enter the IV vs DV into JASP.
Demographics should have clear coding for nominal data points.
For example: Does WHITE = CAUCASIAN? Do we combine these or keep them separate?
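One way to make nominal coding explicit is a single lookup table that every raw label must pass through. A minimal Python sketch; the labels, the numeric codes, and the decision to merge WHITE with CAUCASIAN are hypothetical choices for illustration, not the coding used in this data set:

```python
# Hypothetical coding scheme for illustration only: the labels, the numeric
# codes, and the choice to merge WHITE and CAUCASIAN are assumptions.
ETHNICITY_CODES = {
    "white": 1,
    "caucasian": 1,  # merged with "white" by an (assumed) coding decision
    "hispanic": 2,
    "black": 3,
    "asian": 4,
}


def code_ethnicity(raw_label: str) -> int:
    """Normalize a raw label (case, surrounding spaces) and return its code.

    Unknown labels raise KeyError, so a new spelling is caught and added to
    the scheme deliberately instead of being silently miscoded.
    """
    return ETHNICITY_CODES[raw_label.strip().lower()]


raw_labels = ["White", " CAUCASIAN", "Hispanic"]
print([code_ethnicity(label) for label in raw_labels])  # [1, 1, 2]
```

Keeping the scheme in one place also documents the merge decision, so everyone who touches the data codes it the same way.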
OG data set in JASP
What JASP sees = “FLUFFY” data
72 separate participants (there are only 36)
Double counts of:
Gender
Age
Ethnicity
Diagnosed learning disability
No easy way to do a side-by-side comparison of NO MUSIC vs MUSIC condition
Which ROW = music vs no music?
Implications of unstructured data
Incorrect input leads to erroneous interpretations of significance.
This can lead to a TYPE I or TYPE II error.
The value of your findings is diminished.
Why should we believe your conclusion next time, if you could not reach a statistically sound conclusion the first time?
A researcher's reputation is important if you are trying to impact the community with your results.
Implications of unstructured data
More drastically, people can be hurt or even die because of what you report.
For example: suppose a COVID vaccine from PHARMACEUTICAL X is reported to have 99.99% efficacy in its Phase III trial when it is *actually* only 50% effective. People who rely on it will still be susceptible to the disease, and there are social implications that come with this.
New data structure in JASP
What changed?
There is *ONE* participant per row of data.
Everything recorded for PARTICIPANT 1 is in ROW 2, everything recorded for PARTICIPANT 2 is in ROW 3, and so on (ROW 1 is the header).
Each variable affected by condition (no music vs music) now has a separate, clearly labeled column for each condition.
For example: WORDS_READ_CORRECTLY_NM (no music) and WORDS_READ_CORRECTLY_M (music)
What changed?
While this adds columns (condition-dependent measures now appear once per condition), it clearly allocates information to each participant without DOUBLE COUNTING the demographic variables.
For example: ROW #17 shows PARTICIPANT 16, a 20-year-old HISPANIC FEMALE with NO diagnosed learning disability who got 8 hours of sleep, consumed 0 ounces of caffeine, and completed the Stroop task WITHOUT MUSIC, reading 104 out of 104 words correctly, etc.
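Conceptually, this restructuring is a long-to-wide pivot. A minimal stdlib sketch, assuming each raw row has already been labeled with its condition (the original file did not make this clear); the column names imitate the slides' naming style, and the rows are made-up values, not the assignment's data:

```python
# Made-up example rows (hypothetical values): two rows per participant,
# one per condition, as in the original layout.
LONG_ROWS = [
    {"PARTICIPANT": 1, "GENDER": "F", "AGE": 20, "CONDITION": "NM",
     "HOURS_OF_SLEEP": 8, "WORDS_READ_CORRECTLY": 100},
    {"PARTICIPANT": 1, "GENDER": "F", "AGE": 20, "CONDITION": "M",
     "HOURS_OF_SLEEP": 8, "WORDS_READ_CORRECTLY": 97},
    {"PARTICIPANT": 2, "GENDER": "M", "AGE": 22, "CONDITION": "NM",
     "HOURS_OF_SLEEP": 6, "WORDS_READ_CORRECTLY": 95},
    {"PARTICIPANT": 2, "GENDER": "M", "AGE": 22, "CONDITION": "M",
     "HOURS_OF_SLEEP": 6, "WORDS_READ_CORRECTLY": 99},
]

DEMOGRAPHICS = ("GENDER", "AGE")
MEASURES = ("HOURS_OF_SLEEP", "WORDS_READ_CORRECTLY")


def long_to_wide(rows):
    """Collapse two rows per participant into one row per participant."""
    wide = {}
    for row in rows:
        pid = row["PARTICIPANT"]
        demo = {k: row[k] for k in DEMOGRAPHICS}
        record = wide.setdefault(pid, {"PARTICIPANT": pid, **demo})
        # Demographics are stored once; if a participant's two rows disagree,
        # the raw data has an entry error that should be fixed first.
        if any(record[k] != v for k, v in demo.items()):
            raise ValueError(f"inconsistent demographics for participant {pid}")
        for measure in MEASURES:
            # Condition-dependent measures become one column per condition,
            # e.g. WORDS_READ_CORRECTLY_NM and WORDS_READ_CORRECTLY_M.
            record[f"{measure}_{row['CONDITION']}"] = row[measure]
    return list(wide.values())


for participant_row in long_to_wide(LONG_ROWS):
    print(participant_row)
```

The demographic consistency check is a side benefit of the pivot: double-entered demographics that disagree are surfaced instead of being counted twice.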
TL;DR: Data Cleaning Saves Lives
A clean data set provides an optimal set-up for easy analysis.
Structured data makes it easy to locate skewed or extreme values.
This, in turn, makes it easy to apply exclusion criteria.
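With one participant per row, screening a column for extreme values takes only a few lines. A minimal sketch using z-scores; the |z| > 3 cutoff is a common convention assumed here, not a criterion from this assignment, and the scores are made up:

```python
from statistics import mean, stdev


def flag_outliers(values, threshold=3.0):
    """Return the indices of values whose |z-score| exceeds `threshold`.

    z-scores use the sample mean and sample standard deviation. The default
    cutoff of 3 is an assumed convention; use whatever exclusion criterion
    was set before analysis.
    """
    m, s = mean(values), stdev(values)
    if s == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - m) / s > threshold]


# Made-up scores: eleven typical values and one suspiciously low one.
scores = [100, 101, 99, 100, 102, 98, 100, 101, 99, 100, 100, 40]
print(flag_outliers(scores))  # [11]
```

Flagged rows are candidates to inspect against the exclusion criteria decided in advance, not automatic deletions.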
TL;DR: Data Cleaning Saves Lives
All statistical packages will THANK YOU for organizing information.
In return, you will spend less time smashing your head into the keyboard, trying to figure out why the variables do not fit into a certain type of analysis.
Data analysis
Hypotheses and processes
What can we hypothesize?
Task: Identify the INDEPENDENT and DEPENDENT variables you want to examine.
This should be done PRIOR to data mining/data analysis.
No HARKing (Hypothesizing After the Results are Known)
This is a big no-no in research land.
Breaking down variables:
Categorical variables:
Gender
Ethnicity
Diagnosed Learning Disability
Scalar variables:
Age
Hours of sleep
Caffeine consumed
Words read correctly
Total words read
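This classification is what drives the choice of test in the hypotheses that follow: two scalar variables lead to Pearson's correlation, and a two-level categorical IV measured on the same participants with a scalar DV leads to a paired samples t-test. A minimal sketch of that rule of thumb; the function name and type labels are hypothetical, and it deliberately covers only those two cases:

```python
def choose_test(iv_type: str, dv_type: str, same_participants: bool = False) -> str:
    """Map variable types to the test used for Hypotheses 1-3.

    iv_type / dv_type are "scalar" or "categorical". For a categorical IV,
    this sketch assumes exactly two conditions (e.g. music vs no music).
    """
    if iv_type == "scalar" and dv_type == "scalar":
        return "Pearson's correlation"
    if iv_type == "categorical" and dv_type == "scalar" and same_participants:
        # Two conditions measured on the same people -> paired samples.
        return "Paired samples t-test"
    raise ValueError("combination not covered by this sketch")


print(choose_test("scalar", "scalar"))                               # Pearson's correlation
print(choose_test("categorical", "scalar", same_participants=True))  # Paired samples t-test
```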
Hypothesis 1
There will be an association between words read correctly while listening to music and hours slept the night prior to completing the Stroop Task.
Nondirectional research question
IV: HOURS_OF_SLEEP_M (hours of sleep; music condition)
DV: WORDS_READ_CORRECTLY_M (words read correctly; music condition)
The type of each variable determines which test you can use.
HOURS_OF_SLEEP_M = scalar; WORDS_READ_CORRECTLY_M = scalar
Type of test = Pearson’s Correlation
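JASP computes this for you, but the statistic itself is short. A minimal stdlib sketch of Pearson's r; the function name and the toy numbers are hypothetical, not the assignment's data:

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson's r: the sum of cross-products of deviations, divided by the
    square root of the product of each variable's sum of squared deviations."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length lists with at least 3 pairs")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up paired values standing in for HOURS_OF_SLEEP_M and
# WORDS_READ_CORRECTLY_M (one pair per participant row).
hours = [1, 2, 3, 4]
words = [2, 1, 4, 3]
print(pearson_r(hours, words))  # 0.6
```

Note that this only works because each participant's two values sit in the same row. On Python 3.10+, `statistics.correlation(hours, words)` should give the same value.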
Pearson’s correlation in JASP
Things to check
Pearson’s correlation in JASP
Results
Pearson Correlations
| Variable | | WORDS_READ_CORRECTLY_M | HOURS_OF_SLEEP_M |
|---|---|---|---|
| WORDS_READ_CORRECTLY_M | Pearson's r | — | |
| | p-value | — | |
| HOURS_OF_SLEEP_M | Pearson's r | 0.072 | — |
| | p-value | 0.676 | — |
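The p-value for r comes from a t statistic with n − 2 degrees of freedom. As a worked check, assuming n = 36 (one row per participant, consistent with df = 35 in the paired test later):

$$
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}} = \frac{0.072\sqrt{34}}{\sqrt{1-0.072^{2}}} \approx 0.42, \qquad df = n - 2 = 34
$$

A t this small is consistent with the reported p = 0.676, and r² ≈ 0.005 means hours of sleep shares only about 0.5% of its variance with words read correctly in the music condition.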
Pearson’s correlation in JASP
Graphics
Hypothesis 2
There will be an association between caffeine consumption and words read correctly on the Stroop Task in the no music condition.
Nondirectional research question
IV: CAFFEINE_CONSUMED_OZ_NM (caffeine consumed, in ounces; no music condition)
DV: WORDS_READ_CORRECTLY_NM (words read correctly; no music condition)
The type of each variable determines which test you can use.
CAFFEINE_CONSUMED_OZ_NM = scalar; WORDS_READ_CORRECTLY_NM = scalar
Type of test = Pearson’s Correlation
Pearson’s correlation in JASP
Things to check
Pearson’s correlation in JASP
Results
Pearson Correlations
| Variable | | CAFFEINE_CONSUMED_OZ_NM | WORDS_READ_CORRECTLY_NM |
|---|---|---|---|
| CAFFEINE_CONSUMED_OZ_NM | Pearson's r | — | |
| | p-value | — | |
| WORDS_READ_CORRECTLY_NM | Pearson's r | 0.205 | — |
| | p-value | 0.229 | — |
Pearson’s correlation in JASP
Graphics
Hypothesis 3
There will be a difference between the number of words read correctly in the music vs no music conditions.
Nondirectional research question
IV: CONDITION (Music vs No Music)
DV: WORDS READ CORRECTLY
The type of each variable determines which test you can use.
CONDITION = categorical; WORDS READ CORRECTLY = scalar
Type of test = Paired Samples t-test
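Because the data are now in wide format, each participant's NM and M scores sit in the same row and can be subtracted directly. A minimal stdlib sketch of Student's paired t statistic; the function name and toy numbers are hypothetical, and it stops at t (a p-value needs the t distribution, which SciPy's `scipy.stats.ttest_rel` provides if that library is available):

```python
import math
from statistics import mean, stdev


def paired_t_test(cond_1, cond_2):
    """Student's paired t statistic and Cohen's d for two related samples.

    Differences are cond_1 - cond_2 (e.g. NM minus M), matching the
    'Measure 1 - Measure 2' layout of the results table.
    """
    if len(cond_1) != len(cond_2):
        raise ValueError("each participant needs a score in both conditions")
    diffs = [a - b for a, b in zip(cond_1, cond_2)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)             # sample SD of the difference scores
    se_diff = sd_diff / math.sqrt(n)   # standard error of the mean difference
    return {
        "t": mean_diff / se_diff,
        "df": n - 1,
        "mean_difference": mean_diff,
        "se_difference": se_diff,
        "cohens_d": mean_diff / sd_diff,  # d based on the SD of the differences
    }


# Made-up scores for four participants (not the assignment's data).
no_music = [10, 12, 9, 11]
music = [9, 11, 9, 10]
print(paired_t_test(no_music, music))
```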
Paired Samples t-Test in JASP
Things to check
Paired Samples t-Test in JASP
Results
Paired Samples T-Test
| Measure 1 | | Measure 2 | t | df | p | Mean Difference | SE Difference | Cohen's d |
|---|---|---|---|---|---|---|---|---|
| WORDS_READ_CORRECTLY_NM | - | WORDS_READ_CORRECTLY_M | 0.244 | 35 | 0.808 | 0.833 | 3.412 | 0.041 |
Note. Student's t-test.
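The columns of this table are tied together by a few identities, which makes the output easy to sanity-check. Here D is each participant's NM − M difference score; reading Cohen's d as the mean difference divided by the SD of the difference scores is an assumption, but it reproduces the reported 0.041:

$$
t = \frac{\bar{D}}{SE_{\bar{D}}} = \frac{0.833}{3.412} \approx 0.244, \qquad df = n - 1 = 35 \;\Rightarrow\; n = 36
$$

$$
SD_{D} = SE_{\bar{D}}\sqrt{n} = 3.412 \times 6 = 20.472, \qquad d = \frac{\bar{D}}{SD_{D}} = \frac{0.833}{20.472} \approx 0.041 = \frac{t}{\sqrt{n}}
$$

In words: participants read, on average, less than one more word correctly without music, a difference far smaller than the spread of the difference scores (SD ≈ 20.5 words).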
Your turn!
What do you hypothesize?