I have already used visualizations in my data prep process, and the more in-depth data analysis will require additional visualizations. The focus will move from the largely univariate visualizations that informed data cleansing and other preparation decisions to visualizations that support multivariate analysis. Scatterplots will be used to look for visual clues about the relationships between the variables of interest (the sets of variables that describe Medicaid expansion status, health insurance coverage, and Medicaid enrollment) and the outcome variables (health outcomes, education outcomes, and employment/income outcomes). R offers a relatively simple function for creating a scatterplot matrix, called pairs, that can operate on the entire dataframe to generate pairwise scatterplots. These visualizations may show which variables can be removed (to reduce dimensionality) or, at least, which relationships to prioritize for testing in the models.
The initial set of predictive models will comprise association analysis (using the Apriori algorithm in R), the conditional inference tree algorithm (also in R), and ensemble models in SAS Enterprise Miner. In all cases, the models will be run on a randomly selected subset of the data (training data) and then validated on a separate subset (test/validation data set).
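The random training/validation split can be sketched in a few lines. This is a plain-Python illustration, not the R or SAS Enterprise Miner procedure the project uses. The helper name `split_train_validation` is hypothetical, and the 70/30 proportion and fixed seed are assumptions, because the text does not specify them.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_validation(
    rows: Sequence[T], validation_fraction: float = 0.3, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Randomly partition rows into a training set and a held-out validation set.

    Hypothetical helper: the 0.3 default and the seed are assumptions,
    not values taken from the project.
    """
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be strictly between 0 and 1")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_validation = round(len(rows) * validation_fraction)
    validation_idx = set(indices[:n_validation])
    # Preserve the original row order inside each subset.
    train = [row for i, row in enumerate(rows) if i not in validation_idx]
    validation = [row for i, row in enumerate(rows) if i in validation_idx]
    return train, validation


if __name__ == "__main__":
    train, validation = split_train_validation(list(range(100)))
    print(len(train), len(validation))  # 70 30
```

The validation rows stay untouched until the final evaluation, which is what keeps the held-out estimate honest.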
The Apriori method will require discretization of nearly all of the variables, which are generally continuous, numeric variables. These variables will be divided into groups using the “discretize” function in R, primarily with equal interval or cluster methods, given the skewness of the data, the size of the dataset, and the difficulty of identifying breakpoints manually. The Apriori method for generating association rules suits this project because it does not exhaustively model every combination of every level of the discretized variables to find strong associations. Instead, it evaluates only the itemsets that meet a user-determined minimum support level. It then outputs a confidence level for each candidate rule, and that confidence must also meet a user-determined threshold. The strength of the association is captured in the lift value. A lift greater than or less than 1 means that there is some dependency (positive or negative, respectively) between the two items, while a lift of 1 means the two events are independent. Using lift helps avoid overstating relationships that are simply frequent but not statistically correlated. Building on the scatterplots, these models should reveal relationships that appear visually, as well as relationships that may be hidden because they involve relatively few cases or overstated by a simple one-to-one mapping.
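The discretize-then-mine workflow above can be sketched in plain Python. This is not the R implementation the project uses. In the sketch:

- `equal_width_bins` stands in for an equal-interval discretization.
- `apriori` performs the level-wise frequent-itemset search, keeping only itemsets that meet a minimum support.
- `association_rules` reports confidence and lift, where lift = support(A ∪ B) / (support(A) · support(B)) = confidence(A → B) / support(B).

All function names, thresholds, item labels, and toy values are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset, float, float, float]  # lhs, rhs, support, confidence, lift


def equal_width_bins(values: Sequence[float], k: int) -> List[int]:
    """Map each value to one of k equal-width intervals, numbered 0..k-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # all values equal: put everything in bin 0
    return [min(int((v - lo) / width), k - 1) for v in values]


def apriori(transactions: List[Itemset], min_support: float) -> Dict[Itemset, float]:
    """Return every itemset whose support is at least min_support."""
    n = len(transactions)

    def support(itemset: Itemset) -> float:
        return sum(1 for t in transactions if itemset <= t) / n

    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent: Dict[Itemset, float] = {}
    k = 1
    while candidates:
        # Count only the candidates that survived pruning at this level.
        level = {c: support(c) for c in candidates}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, keeping a candidate only
        # if all of its (k-1)-subsets are frequent (the Apriori property).
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def association_rules(frequent: Dict[Itemset, float], min_confidence: float) -> List[Rule]:
    """Split each frequent itemset into lhs -> rhs rules and score them."""
    rules: List[Rule] = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs_items in combinations(itemset, r):
                lhs = frozenset(lhs_items)
                rhs = itemset - lhs
                confidence = supp / frequent[lhs]  # P(rhs | lhs)
                if confidence >= min_confidence:
                    lift = confidence / frequent[rhs]  # 1 means independence
                    rules.append((lhs, rhs, supp, confidence, lift))
    return rules


if __name__ == "__main__":
    # Toy values for illustration only (not project data).
    expanded = [1, 1, 1, 0, 0]
    uninsured_rate = [5.0, 6.0, 12.0, 14.0, 15.0]
    bins = equal_width_bins(uninsured_rate, k=2)  # [0, 0, 1, 1, 1]
    labels = ["uninsured=low", "uninsured=high"]
    transactions = [
        frozenset({"expansion=yes" if e else "expansion=no", labels[b]})
        for e, b in zip(expanded, bins)
    ]
    frequent = apriori(transactions, min_support=0.3)
    for lhs, rhs, supp, conf, lift in association_rules(frequent, min_confidence=0.6):
        print(sorted(lhs), "->", sorted(rhs),
              f"support={supp:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

Because subsets of a frequent itemset are always frequent, every lhs and rhs looked up in `association_rules` is guaranteed to be in `frequent`. That same property lets Apriori skip the combinations that fail minimum support.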
Supervised classification models using conditional inference decision trees should refine class memberships that may show up less clearly in Apriori results. The conditional inference approach operates in a strict if/then structure. This resolves a limitation of Apriori models, which allow overlapping class memberships that can muddy insights and create ambiguity in predictions of future case outcomes. This model will not require discretization of the variables, because conditional inference trees can take both discrete and continuous variables. The approach also works regardless of whether a distribution is skewed or normal. This matters given the skewness of several of the posited outcomes of interest and of participation in Medicaid expansion itself. The approach is also valuable because it does not require the explanatory variables to be independent. For example, this project does not evaluate whether the decision to expand Medicaid is predicated on a particular income level or set of health outcomes; in other words, the causality of the relationship may be the reverse of what is being tested. This issue makes conditional inference particularly useful. Conditional inference trees can handle collinear predictors and select the best predictor at each split. As a result, the models can include all or most of the variables identified in the previous section as having some degree of collinearity, rather than requiring the researcher to choose among them or test multiple combinations.
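The variable-selection step that lets a conditional inference tree cope with correlated predictors can be sketched as follows. This is a simplified Python illustration, not the R implementation. It works in four steps:

1. Measure each predictor's association with the response using a Monte Carlo permutation test on the absolute Pearson correlation.
2. Apply a Bonferroni adjustment across predictors.
3. Stop if no adjusted p-value falls at or below alpha.
4. Otherwise, pick the most significant predictor and a cutpoint on it.

The correlation statistic, the mean-difference cutpoint rule, alpha = 0.05, and all function names are assumptions chosen for brevity.

```python
import random
from statistics import mean
from typing import Dict, Optional, Sequence, Tuple


def abs_corr(x: Sequence[float], y: Sequence[float]) -> float:
    """Absolute Pearson correlation; 0.0 when either variable is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_pvalue(x, y, n_perm: int, rng: random.Random) -> float:
    """Monte Carlo p-value for H0: x and y are independent."""
    observed = abs_corr(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs_corr(x, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def select_split_variable(
    predictors: Dict[str, Sequence[float]], y: Sequence[float],
    alpha: float = 0.05, n_perm: int = 999, seed: int = 0,
) -> Tuple[Optional[str], Dict[str, float]]:
    """Return the most significant predictor (or None to stop) and adjusted p-values."""
    rng = random.Random(seed)
    m = len(predictors)
    adjusted = {
        name: min(1.0, m * permutation_pvalue(x, y, n_perm, rng))  # Bonferroni
        for name, x in predictors.items()
    }
    best = min(adjusted, key=adjusted.get)
    return (best if adjusted[best] <= alpha else None), adjusted


def best_cutpoint(x: Sequence[float], y: Sequence[float], min_bucket: int = 2) -> Optional[float]:
    """Binary split 'x <= c' maximizing the gap in mean response (simplified rule)."""
    best_c, best_gap = None, -1.0
    for c in sorted(set(x))[:-1]:
        left = [b for a, b in zip(x, y) if a <= c]
        right = [b for a, b in zip(x, y) if a > c]
        if len(left) < min_bucket or len(right) < min_bucket:
            continue
        gap = abs(mean(left) - mean(right))
        if gap > best_gap:
            best_c, best_gap = c, gap
    return best_c


if __name__ == "__main__":
    # Toy data (not project data): x1 separates the classes, x2 carries no signal.
    x1 = list(range(1, 21))
    y = [1 if v > 10 else 0 for v in x1]
    predictors = {"x1": x1, "x2": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3] * 2}
    chosen, p_values = select_split_variable(predictors, y)
    print("adjusted p-values:", p_values)
    if chosen is None:
        print("no significant predictor: make this node a leaf")
    else:
        print(f"split on {chosen} <= {best_cutpoint(predictors[chosen], y)}")
```

In a full tree, this test-then-split step is repeated recursively inside each child node until no predictor passes the test.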
Finally, ensemble models generated in SAS Enterprise Miner will also be used. The decision trees and boosted trees in the random forest, gradient boosting, and bagging models are, like conditional inference trees, typically robust to asymmetric or unbalanced datasets. SAS Enterprise Miner has an advantage over R: it generates model results relatively quickly, which allows the user to try different mechanisms to tweak (and improve) the results. For example, the models can be tuned with adjusted cutoff thresholds or cost adjustments to improve predictive strength. The main strength of SAS Enterprise Miner is that it helps identify optimal thresholds. These thresholds, in turn, make the model's predictive strength more robust across multiple evaluative dimensions: recall, specificity, and precision (in addition to general accuracy). The application also supports comparisons of more and less complex models, with the aim of arriving at a predictive model that is as simple as possible and as robust as possible. These capabilities are directly relevant to this project for three reasons: the size and complexity of the dataset, as described previously; the apparent collinearity among the variables of interest; and the need to tell as clear an analytic story as possible to inform health care policy.
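Cutoff tuning of the kind described above can be illustrated with a short Python sketch. It scores a set of predicted probabilities at several candidate cutoffs and reports recall, specificity, precision, and accuracy. This is not how SAS Enterprise Miner implements its cutoff or cost adjustments. The selection rule here (minimize total misclassification cost, using hypothetical costs for false negatives and false positives), the toy scores, and all function names are assumptions.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], scores: Sequence[float],
                     cutoff: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) when predicting 1 for scores >= cutoff."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted = s >= cutoff
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics_at(y_true: Sequence[int], scores: Sequence[float], cutoff: float) -> Dict[str, float]:
    """Recall, specificity, precision, and accuracy at one cutoff."""
    tp, fp, tn, fn = confusion_counts(y_true, scores, cutoff)

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0  # report 0.0 when a metric is undefined

    return {
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


def cheapest_cutoff(y_true: Sequence[int], scores: Sequence[float], cutoffs: Sequence[float],
                    fn_cost: float = 1.0, fp_cost: float = 1.0) -> float:
    """Pick the cutoff with the lowest total misclassification cost (first one wins ties)."""
    def cost(c: float) -> float:
        _, fp, _, fn = confusion_counts(y_true, scores, c)
        return fn_cost * fn + fp_cost * fp

    return min(cutoffs, key=cost)


if __name__ == "__main__":
    # Toy labels and model scores (not project results).
    y_true = [0, 0, 0, 0, 1, 1, 1, 1]
    scores = [0.10, 0.20, 0.35, 0.60, 0.40, 0.70, 0.80, 0.90]
    cutoffs = [0.1, 0.3, 0.4, 0.5, 0.7]
    best = cheapest_cutoff(y_true, scores, cutoffs, fn_cost=5.0, fp_cost=1.0)
    print(best, metrics_at(y_true, scores, best))
```

Raising `fn_cost` relative to `fp_cost` pushes the chosen cutoff down, trading specificity and precision for recall. Which trade-off is appropriate depends on the policy question being asked.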
This last item is particularly important, and it is well served by SAS EM's ability to iterate through dozens of versions of a model, if necessary, to arrive at the most understandable, explainable, and actionable implications. The ultimate goal of this project is to determine what relationship, if any, exists between the policy decision and implementation of Medicaid expansion and positive societal results, in order to inform future policy choices on health care.