week3.pdf

Identify the types of graphs and statistics that are appropriate for analysis of variables at each level of measurement.

I was fortunate enough to have an internship this summer at the Attorney General's office and work in their special crimes unit, focusing on gang participation, statistics on crime, and preparing informational booklets for the office, law enforcement, and the community. One of the main reasons I got this internship was because I took a research methods class and knew how to use SPSS to analyze and interpret data.

Ricky E., Student

"Oh no, not data analysis and statistics!" We now hit the chapter that you may have been fearing all along, the chapter on data analysis and the use of statistics. This chapter describes what you need to do after your data have been collected. You now need to analyze what you have found, interpret it, and decide how to present your data so that you can most clearly make the points you wish to make.

What you probably dread about this chapter is something that you either sense or know from a previous course: Studying data analysis and statistics will lead you into that feared world of mathematics. We would like to state at the beginning, however, that you have relatively little to fear. The kind of mathematics required to perform the data analysis tasks in this chapter is minimal. If you can add, subtract, multiply, and divide and are willing to put some effort into carefully reading the chapter, you will do well in the statistical analysis of your data. In fact, it is our position that the analysis of your data will require more in the way of careful and logical thought than in mathematical skill. One helpful way to think of statistics is that it consists of a set of tools that you will use to examine your data to help you answer the questions that motivated your research in the first place. Right now, the toolbox that holds your statistical tools is fairly empty (or completely empty). In the course of this chapter, we will add some fundamental tools to that toolbox. We would also like to note from the start that the kinds of statistics you will use on criminological data are very much the same as those used by economists, psychologists, political scientists, sociologists, and other social scientists. In other words, statistical tools are statistical tools, and all that changes is the nature of the problem to which those tools are applied.

This chapter will introduce several common statistics in social research and highlight the factors that must be considered in using and interpreting statistics. Think of it as a review of fundamental social statistics, if you have already studied them, or as an introductory overview, if you have not.

2. List the guidelines for constructing frequency distributions.

Discuss the advantages and disadvantages of using each of the three measures of central tendency.

Understand the difference

ANALYZING QUANTITATIVE DATA

Two preliminary sections lay the foundation for studying statistics. In the first, we will discuss the role of statistics in the research process, returning to themes and techniques you already know. In the second preliminary section, we will outline the process of acquiring data for statistical analysis. In the rest of the chapter, we will explain how to describe the distribution of single variables and the relationships among variables. Along the way, we will address ethical issues related to data analysis. This chapter will be successful if it encourages you to use statistics responsibly and evaluate them critically and gives you the confidence necessary to seek opportunities for extending your statistical knowledge.

It should be noted that in this chapter, we focus primarily on the use of statistics for descriptive purposes. Those of you looking for a more advanced discussion of statistical methods used in criminal justice and criminology should seek other textbooks (e.g., Bachman and Paternoster 2015). Although many colleges and universities offer social statistics in a separate course, we don't want you to think of this chapter as something that deals with a different topic than the rest of the book. Data analysis is an integral component of research methods, and it's important that any proposal for quantitative research include a plan for the data analysis that will follow data collection.

INTRODUCING STATISTICS

Statistics play a key role in achieving valid research results in terms of measurement, causal validity, and generalizability. Some statistics are useful primarily to describe the results of measuring single variables and to construct and evaluate multi-item scales. These statistics include frequency distributions, graphs, measures of central tendency and variation, and reliability tests. Other statistics are useful primarily in achieving causal validity by helping us describe the association among variables and control for or otherwise take into account other variables. Cross-tabulation is one technique for measuring association and controlling other variables and is introduced in this chapter. All these statistics are called descriptive statistics because they are used to describe the distribution of and relationship among variables.

You learned in Chapter 5 that it is possible to estimate the degree of confidence that can be placed in generalizations from a sample to the population from which the sample was selected. The statistics used in making these estimates are called inferential statistics, and they include confidence intervals, to which you were exposed in Chapter 5. In this chapter, we will refer only briefly to inferential statistics, but we will emphasize later in the chapter their importance for testing hypotheses involving sample data.

Criminological theory and the results of prior research should guide our statistical plan or analytical strategy, as they guide the choice of other research methods. In other words, we want to use the statistical strategy that will best answer our research question. There are so many particular statistics and so many ways for them to be used in data analysis that even the best statistician can become lost in a sea of numbers if she or he is not using prior research and theorizing to develop a coherent analysis plan. It is also important for an analyst to choose statistics that are appropriate to the level of measurement of the variables to be analyzed. As you learned in Chapter 4, numbers used to represent the values of variables may not actually signify different quantities, meaning that many statistical techniques will be inapplicable. Some statistics, for example, will be appropriate only when the variable you are examining is measured at the nominal level. Other kinds of statistics will require interval-level measurement. To use the right statistic, then, you must be very familiar with the measurement properties of your variables (and you thought that stuff would go away!).

Frequency distributions: Numerical display showing the number of cases, and usually the percentage of cases (the relative frequencies), corresponding to each value or group of values of a variable.

Cross-tabulation (cross-tab): A bivariate (two-variable) distribution showing the distribution of one variable for each category of another variable.

Descriptive statistics: Statistics used to describe the distribution of and relationship among variables.

Inferential statistics: Mathematical tools for estimating how likely it is that a statistical result based on data from a random sample is representative of the population from which the sample is assumed to have been selected.

CHAPTER 14 • ANALYZING QUANTITATIVE DATA

CASE STUDY

The Causes of Delinquency

In this chapter, we will use research on the causes of delinquency for our examples. More specifically, our data will be a subset of a much larger study of a sample of approximately 1,200 high school students selected from the metropolitan and suburban high schools of a city in South Carolina. These students, all of whom were in the tenth grade, completed a questionnaire that asked about such things as how they spent their spare time; how they got along with their parents, teachers, and friends; their attitudes about delinquency; whether their friends committed delinquent acts; and their own involvement in delinquency. The original research study was designed to test specific hypotheses about the factors that influence delinquency. It was predicted that delinquent behavior would be affected by such things as the level of supervision provided by parents, the students' own moral beliefs about delinquency, their involvement in conventional activities such as studying and watching TV, their fear of getting caught, their friends' involvement in crime, and whether these friends provided verbal support for delinquent acts. All these hypotheses were derived from extant criminological theory, theories we have referred to throughout this book. One specific hypothesis, derived from deterrence theory, predicts that youths who believe they are likely to get caught by the police for committing delinquent acts are less likely to commit delinquency than others. This hypothesis is shown in Exhibit 14.1. The variables from this study that we will use in our chapter examples are displayed in Exhibit 14.2.

PREPARING DATA FOR ANALYSIS

If you have conducted your own survey or experiment, your quantitative data must be prepared in a format suitable for computer entry. You learned in Chapter 8 that questionnaires and interview schedules can be precoded to facilitate data entry by representing each response with a unique number. This method allows direct entry of the precoded responses into a computer file, after responses are checked to ensure that only one valid answer code has been circled (extra written answers can be assigned their own numerical codes). Most survey research organizations now use a database management program to control data entry. The program prompts the data entry clerk for each response, checks the response to ensure that it is a valid response for that variable, and then saves the response in the data file. Not all studies have used precoded data entry, however, and individual researchers must enter the data themselves. This is an arduous and time-consuming task, but not for us if we use secondary data. After all, with secondary data, we get the data only after they have been coded and computerized.

Of course, numbers stored in a computer file are not yet numbers that can be analyzed with statistics. After the data are entered, they must be checked carefully for errors, a process called data cleaning. If a data entry program has been used and programmed to flag invalid values, the cleaning process is much easier. If data are read in from a text file, a computer program must be written that defines which variables are coded in which columns, attaches meaningful labels to the codes, and distinguishes values representing missing data. The procedures for doing so vary with each specific statistical package. We used the Windows version of the Statistical Package for the Social Sciences (SPSS) for the analysis in this chapter; you will find examples of SPSS commands required to define and analyze data on the Student Study Site for this text (http://edge.sagepub.com/bachmanprccj6e).

Data cleaning: The process of checking data for errors after the data have been entered in a computer file.
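The chapter's own analysis uses SPSS, so the following is only a minimal Python sketch of the kind of check a data entry program performs when it flags invalid values and sets aside missing-data codes. The variable names, the valid code ranges, and the missing-data codes are all assumptions made for illustration; they are not the study's actual coding scheme.

```python
# Hypothetical codebook: the variable names, valid codes, and missing-data
# codes below are assumptions for illustration, not the study's actual coding.
VALID_CODES = {
    "parents_know": {1, 2, 3, 4},          # assumed: 1 = never ... 4 = always
    "hours_studied": set(range(0, 169)),   # a week has 168 hours
}
MISSING_CODES = {"parents_know": 9, "hours_studied": 999}


def clean(records):
    """Return (cleaned_records, errors).

    Missing-data codes and invalid codes both become None in the cleaned
    records; invalid codes are also reported as (row, variable, value).
    """
    cleaned, errors = [], []
    for row_number, record in enumerate(records):
        cleaned_row = {}
        for variable, value in record.items():
            if value == MISSING_CODES[variable]:
                cleaned_row[variable] = None
            elif value in VALID_CODES[variable]:
                cleaned_row[variable] = value
            else:
                cleaned_row[variable] = None
                errors.append((row_number, variable, value))
        cleaned.append(cleaned_row)
    return cleaned, errors


raw = [
    {"parents_know": 3, "hours_studied": 10},
    {"parents_know": 7, "hours_studied": 999},   # 7 is not a valid code
    {"parents_know": 9, "hours_studied": 200},   # 200 exceeds 168 hours
]
cleaned, errors = clean(raw)
print(errors)
```

Flagged values are reported rather than silently corrected, so that the analyst can go back to the original questionnaire before deciding how to treat them.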

SECTION V • AFTER THE DATA ARE COLLECTED

Exhibit 14.1 Hypothesis for Perceived Fear of Being Caught and Delinquency

Exhibit 14.2 List of Variables for Class Examples of Causes of Delinquency

[Table not recoverable from the source. Legible fragments of the variable descriptions follow.]

How much would the respondent's chances of having good ...
Added scale that asks the respondent if his or her best friend thought that committing various delinquent acts ...
A high score means more support by friends for ...
... thinks it is that he or she will be caught by police if he or she ...
... delinquent acts in the past year. The higher the score, the ...

Central tendency: A feature of a variable's distribution; refers to the value or values around which cases tend to center.

Variability: A feature of a variable's distribution; refers to the extent to which cases are spread out through the distribution or clustered in just one location.

Skewness: A feature of a variable's distribution; refers to the extent to which cases are clustered more at one or the other end of the distribution rather than in a symmetric pattern around its center.

DISPLAYING UNIVARIATE DISTRIBUTIONS

The first step in data analysis is usually to display the variation in each variable of interest in what are called univariate frequency distributions. For many descriptive purposes, the analysis may go no further. Frequency distributions and graphs of frequency distributions are the two most popular approaches for displaying variation; both allow the analyst to display the distribution of cases across the value categories of a variable. Graphs have the advantage over numerically displayed frequency distributions because they provide a picture that is easier to comprehend. Frequency distributions are preferable when exact numbers of cases with particular values must be reported and when many distributions must be displayed in a compact form.

No matter which type of display is used, the primary concern of the data analyst is to accurately display the distribution's shape, that is, to show how cases are distributed across the values of the variable. Three features of the shape of a distribution are important: central tendency, variability, and skewness (lack of symmetry). All three of these features can be represented in a graph or in a frequency distribution.

These features of a distribution's shape can be interpreted in several different ways, and they are not all appropriate for describing every variable. In fact, all three features of a distribution can be distorted if graphs, frequency distributions, or summary statistics are used inappropriately.

A variable's level of measurement is the most important determinant of the appropriateness of particular statistics. For example, we cannot talk about the skewness (lack of symmetry) of a qualitative variable (measured at the nominal level). If the values of a variable cannot be ordered from lowest to highest or if the ordering of the values is arbitrary, we cannot say whether the distribution is symmetric, because we could reorder the values to make the distribution more (or less) symmetric. Some measures of central tendency and variability are also inappropriate for qualitative variables.

The distinction between variables measured at the ordinal level and those measured at the interval or ratio level should also be considered when selecting statistics to use, but social researchers differ on how much importance they attach to this distinction. Many social researchers think of ordinal variables as imperfectly measured interval-level variables and believe that in most circumstances, statistics developed for interval-level variables also provide useful summaries for ordinal variables. Other social researchers believe that variation in ordinal variables will often be distorted by statistics that assume an interval level of measurement. We will touch on some of the details of these issues in the following sections on particular statistical techniques.

We will now examine graphs and frequency distributions that illustrate these three features of shape. Summary statistics used to measure specific aspects of central tendency and variability will be presented in a separate section. There is a summary statistic for the measurement of skewness, but it is used only rarely in published research reports and will not be presented here.

Graphs

It is true that a picture often is worth a thousand words. Graphs can be easy to read, and they very nicely highlight a distribution's shape. They are particularly useful for exploring data because they show the full range of variation and identify data anomalies that might be in need of further study. And good, professional-looking graphs can now be produced relatively easily with software available for personal computers. There are many types of graphs, but the most common and most useful are bar charts and histograms. Each has two axes, the vertical axis (y-axis) and the horizontal axis (x-axis), and labels to identify the variables and the values with tick marks showing where each indicated value falls along the axis. The vertical y-axis of a graph is usually in frequency or percentage units, whereas the horizontal x-axis displays the values of the variable being graphed. There are different kinds of graphs you can use to descriptively display your data, depending upon the level of measurement of the variable.

A bar chart contains solid bars separated by spaces. It is a good tool for displaying the distribution of variables measured at the nominal level and other discrete categorical variables because there is, in effect, a gap between each of the categories. In our study of delinquency, one of the questions asked of respondents was whether their parents knew where the respondents were when the respondents were away from home. We graphed the responses to this question in a bar chart, which is shown in Exhibit 14.3. In this bar chart, we report both the frequency count for each value and the percentage of the total that each value represents. The chart indicates that very few of the respondents (only 16 [1.3%]) reported that their parents never knew where the respondents were when the respondents were not at home. Almost one half (562 [44.3%]) of the youths reported that their parents usually knew where the respondents were. What you can also see, by noticing the height of the bars above usually and always, is that most youths report that their parents provide very adequate supervision. You can also see that the most frequent response was usually and the least frequent was never. Because the response usually is the most frequent value, it is called the mode or modal response. With ordinal data such as these, the mode is the most appropriate measure of central tendency (more about this later).

Notice that the cases tend to cluster in the two values of usually and always; in fact, about 80% of all cases are found in those two categories. There is not much variability in this distribution, then.

A histogram is similar to a bar chart, but it has bars that are adjacent, or right next to each other with no gaps. This is done to indicate that data displayed in a histogram, unlike the data in a bar chart, are quantitative variables that vary along a continuum (see the discussion of levels of measurement for variables in Chapter 4). Exhibit 14.4 shows a histogram from the delinquency dataset we are using. The variable being graphed is the number of hours per week the respondent reported to be studying. Notice that the cases cluster at the low end of the values. In other words, there are a lot of youths who spend between 0 and 15 hours per week studying. After that, there are only a few cases at each different value, with spikes occurring at 25, 30, 38, and 40 hours studied. This distribution is clearly not symmetric. In a symmetric distribution, there is a lump of cases or a spike with an equal number of cases to the left and right of that spike. In the distribution shown in Exhibit 14.4, most of the cases are at the left end of the distribution (i.e., at low values), and the distribution trails off on the right side. The ends of a histogram such as this are often called the tail of a distribution. In a symmetric distribution, the left and right tails are approximately the same length. As you can clearly see in Exhibit 14.4, however, the right tail is much longer than the left tail. When the tails of the distribution are uneven, the distribution is said to be asymmetrical or skewed. A skew is either positive or negative. When the cases cluster to the left and the right tail of the distribution is longer than the left, as in Exhibit 14.4, our variable distribution is positively skewed. When the cases cluster to the right side and the left tail of the distribution is long, our variable distribution is negatively skewed.
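The verbal rule for positive and negative skew can be sketched as a comparison of tail lengths around the most frequent value. This is only a rough heuristic written to mirror the description above, not a skewness statistic from the chapter; the function name and the hours-studied values are hypothetical.

```python
from collections import Counter


def skew_direction(values):
    """Compare tail lengths on either side of the most frequent value.

    Rough heuristic only: a longer right tail is labeled "positive",
    a longer left tail "negative", equal tails "symmetric".
    """
    peak = Counter(values).most_common(1)[0][0]   # where the cases cluster
    left_tail = peak - min(values)
    right_tail = max(values) - peak
    if right_tail > left_tail:
        return "positive"
    if left_tail > right_tail:
        return "negative"
    return "symmetric"


# Hypothetical hours studied per week: most cases are low, a few are very high.
hours = [0, 2, 2, 2, 3, 5, 5, 10, 25, 40]
print(skew_direction(hours))                        # cases cluster to the left
print(skew_direction([40 - h for h in hours]))      # mirror image of the same data
```

With real data, a histogram remains the better way to judge shape, because a single most frequent value can be misleading when a distribution has several spikes.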

If graphs are misused, they can distort rather than display the shape of a distribution. Compare, for example, the two graphs in Exhibit 14.5. The first graph shows that high school seniors reported relatively stable rates of lifetime use of cocaine between 1980 and 1985. The second graph, using exactly the same numbers, appeared in a 1986 Newsweek article on the coke plague (Orcutt and Turner 1993). To look at this graph, you would think that the rate of cocaine usage among high school seniors increased dramatically during this period. But, in fact, the difference between the two graphs is due simply to changes in how the graphs are drawn. In the "plague" graph (B), the percentage scale on the vertical axis begins at 15 rather than 0, making what was about a one-percentage-point increase look very big indeed.

Bar chart: A graphic for qualitative variables in which the variable's distribution is displayed with solid bars separated by spaces.

Percentage: Relative frequencies, computed by dividing the frequency of cases in a particular category by the total number of cases and multiplying by 100.

Histogram: A graphic for quantitative variables in which the variable's distribution is displayed with adjacent bars.

Positively skewed: Describes a distribution in which the cases cluster to the left and the right tail of the distribution is longer than the left.

Negatively skewed: A distribution in which cases cluster to the right side and the left tail of the distribution is longer than the right.

In addition, omission from the plague graph of the more rapid increase in reported usage between 1975 and 1980 makes it look as if the tiny increase in 1985 were a new and thus more newsworthy crisis.

Adherence to several guidelines (Tufte 1983) will help you spot these problems and avoid them in your own work:

The difference between bars will be exaggerated if you cut off the bottom of the vertical axis and display less than the full height of the bars. Instead, begin the graph of a quantitative variable at 0 on both axes. It may at times be reasonable to violate this guideline, as when an age distribution is presented for a sample of adults, but in this case, be sure to mark the break clearly on the axis.

Bars of unequal width, including pictures instead of bars, can make particular values look as if they carry more weight than their frequency warrants. Always use bars of equal width.

Either shortening or lengthening the vertical axis will obscure or accentuate the differences in the number of cases between values. The two axes should be of approximately equal length.

Avoid chart junk that can confuse the reader and obscure the distribution's shape (a lot of verbiage, numerous marks, lines, lots of cross-hatching, etc.).

Exhibit 14.3 Bar Chart Showing Youths' Responses on Parents Knowing Where They Are

[Bar chart. Vertical axis: Frequency. Horizontal axis: "Do your parents know where you are when you are away from home?" Legible category labels: Sometimes, Usually, Always.]

Exhibit 14.4 Histogram

[Histogram. Vertical axis: Frequency. Horizontal axis: Number of Hours per Week Spent Studying.]

Exhibit 14.5 Two Graphs of Cocaine Usage

[Two line graphs. Panel A: University of Michigan Institute for Social Research, Time Series for Lifetime Prevalence of Cocaine Use, 1975 to 1985, with the 1980 to 1985 portion marked "Area covered by graph below." Panel B: Final Stages of Construction, 1980 to 1985, with a vertical percentage axis that begins at 15%.]

Source: James D. Orcutt and J. Blake Turner. "Shocking Numbers and Graphic Accounts." Social Problems, 40(2): 190-206. Copyright © 1993, The Society for the Study of Social Problems. Reprinted with permission from Oxford University Press.
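The effect of starting the vertical axis above 0, as in graph B of Exhibit 14.5, can be checked with simple arithmetic. The sketch below compares how tall two values are drawn relative to each other under different axis starting points. The function name is hypothetical, and the values of 16% and 17% are assumed round numbers standing in for "about a one-percentage-point increase"; they are not figures taken from the exhibit.

```python
def drawn_height_ratio(earlier, later, axis_start=0.0):
    """Ratio of the later value's drawn height to the earlier value's.

    A value is drawn only from the start of the axis upward, so its
    drawn height is (value - axis_start).
    """
    return (later - axis_start) / (earlier - axis_start)


# Assumed values: 16% rising to 17%.
print(drawn_height_ratio(16, 17))                 # axis begins at 0
print(drawn_height_ratio(16, 17, axis_start=15))  # axis begins at 15
```

Under these assumed values, the second value is drawn about 6% taller than the first when the axis begins at 0, but twice as tall when the axis begins at 15, even though the underlying numbers are identical.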

Base N: The total number of cases in a distribution.

Frequency Distributions

A frequency distribution displays the number, the percentage (the relative frequencies), or both for cases corresponding to each of a variable's values or a group of values. The components of the frequency distribution should be clearly labeled, with a title, a stub (labels for the values of the variable), a caption (identifying whether the distribution includes frequencies, percentages, or both), and perhaps the number of missing cases. If percentages are presented rather than frequencies (sometimes both are included), the total number of cases in the distribution (the Base N) should be indicated (see Exhibit 14.6). Remember that a percentage is simply a relative frequency. A percentage shows the frequency of a given value relative to the total number of cases times 100.

Ungrouped Data

Constructing and reading frequency distributions for variables with few values is not difficult. In Exhibit 14.6, we created the frequency distribution from the variable Punishment for Drinking found in the delinquency dataset (see Exhibit 14.2). For this variable, the study asked the youths to respond to the following question: "How much of a problem would it be if you went to court for drinking liquor while underage?" The frequency distribution in Exhibit 14.6 shows the frequency for each value and its corresponding percentage.

As another example of calculating the frequencies and percentages, suppose we had a sample of 25 youths and asked them their gender. From this group of 25 youths, 13 were male and 12 were female. The frequency of males (symbolized here by f) would be 13 and the frequency of females would be 12. The percentage of males would be 52%, calculated by f / the total number of cases × 100 (13 / 25 × 100 = 52%). The percentage of females would be 12 / 25 × 100 = 48%.
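The gender example can be reproduced in a few lines of Python as a minimal sketch. The counts (13 males, 12 females) come from the example above; the variable names are only illustrative.

```python
from collections import Counter

# The sample from the example: 13 males and 12 females.
genders = ["male"] * 13 + ["female"] * 12

freq = Counter(genders)                 # frequency (f) of each value
n = sum(freq.values())                  # total number of cases (the Base N)
pct = {value: 100 * f / n for value, f in freq.items()}

for value, f in freq.items():
    print(f"{value:<8}{f:>4}{pct[value]:>8.1f}%")
print(f"{'Total':<8}{n:>4}{sum(pct.values()):>8.1f}%")
```

Reporting the total alongside the percentages follows the guideline that the Base N should be shown whenever percentages are presented.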

In the frequency distribution shown in Exhibit 14.6, you can see that only a very small number (14 out of 1,272) of youths thought that they would experience no problem if they were caught and taken to court for drinking liquor while underage. You can see that most of these youths (1,009, or 79.3% of them) thought that they would have either a big problem or a very big problem with this. If you compare Exhibit 14.6 to Exhibit 14.3, you can see that a frequency distribution (see Exhibit 14.6) can provide much of the same information as a graph about the number and percentage of cases in a variable's categories. Often, however, it is easier to see the shape of a distribution when it is graphed. When the goal of a presentation is to convey a general sense of a variable's distribution, particularly when the presentation is to an audience not trained in statistics, the advantages of a graph outweigh those of a frequency distribution.

Exhibit 14.6 Frequency Distribution

[Table not recoverable from the source. Question: "How much of a problem would it be if you went to court for drinking liquor while underage?" Legible value labels: No problem at all; Hardly any problem; A very big problem; Total.]

Exhibit 14.6 is a frequency distribution of an ordinal-level variable; it has a very small number of discrete categories. In Exhibit 14.7, we provide an illustration of a frequency distribution with a continuous quantitative variable. This variable is one we have already looked at and graphed from the delinquency data: the number of hours per week the respondent spent studying. Notice that this variable, similar to many continuous variables in criminological research, has a large number of values. Although this is a reasonable frequency distribution to construct (you can, for example, still see that the cases tend to cluster in the low end of the distribution and are strung way out at the upper end), it is a little difficult to get a good sense of the distribution of the cases. The problem is that there are too many values to easily comprehend. It would be nice if we could simplify distributions such as these that have a large number of different values. Well, we can. We can construct what is called a grouped frequency distribution.

Grouped Data

Many frequency distributions, such as those in Exhibit 14.7, and many graphs require grouping of some values after the data are collected. There are two reasons for grouping:

1. There are more than 15-20 values to begin with, a number too large to be displayed in an easily readable table.

2. The distribution of the variable will be clearer or more meaningful if some of the values are combined.

Inspection of Exhibit 14.7 should clarify these reasons. In this distribution, it is very difficult to discern any shape, much less the central tendency. What we would like to now do to make the features of the data more visible is change the values into intervals (a range) of values. For example, rather than having five separate values of 0, 1, 2, 3, and 4 hours studied per week, we can have a range of values or an interval for the first value, such as 0-4 hours studied. Then we can get a count or frequency of the number of cases (and percentage of the total) that fall within that interval.
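The idea of replacing single values with intervals such as 0-4 can be sketched in Python. This is a minimal illustration that assumes whole-number values; the function name and the hours-studied values are hypothetical and are not taken from Exhibit 14.7.

```python
from collections import Counter


def grouped_frequencies(values, width, start=0):
    """Count cases in equal-width intervals: start to start+width-1, and so on.

    Assumes whole-number values that are all >= start. Returns a list of
    (interval label, frequency, percentage of the total) tuples.
    """
    counts = Counter((v - start) // width for v in values)
    last = (max(values) - start) // width
    n = len(values)
    table = []
    for k in range(last + 1):            # keep empty intervals so none is skipped
        low = start + k * width
        high = low + width - 1
        f = counts.get(k, 0)
        table.append((f"{low}-{high}", f, 100 * f / n))
    return table


# Hypothetical hours studied per week for 16 youths.
hours = [0, 1, 2, 3, 4, 5, 7, 9, 10, 12, 15, 22, 25, 30, 38, 40]
table = grouped_frequencies(hours, width=5)
for label, f, pct in table:
    print(f"{label:<8}{f:>4}{pct:>8.1f}%")
```

Integer division assigns every value to exactly one interval, so no case is counted twice or left out.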

Grouped frequency distribution: A frequency distribution in which the data are organized into categories, either because there are more values than can be easily displayed or because the distribution of the variable will be clearer or more meaningful.


CHAPTER 14. ANALYZING QUANTITATIVE DATA 413

Exhibit 14.7 Frequency Distribution with Continuous Quantitative Data: Hours Studied per Week

414 SECTION V . AFTER THE DATA ARE COLLECTED


Once we decide to group values, or categories, we have to be sure that in doing so, we do not distort the distribution. Adhering to the following guidelines for combining values in a frequency distribution will prevent many problems:

Categories should be logically defensible and preserve the distribution's shape.

Categories should be mutually exclusive and exhaustive, so every case is classifiable in one and only one category.

The first interval must contain the lowest value, and the last interval must contain the highest value in the distribution.

Each interval width, the number of values that fall within each interval, should be the same size.

There should be between 7 and 13 intervals. This is a tough rule to follow. The key is not to have so few intervals that your data are clumped or clustered into only a few intervals (you will lose too much information about your distribution) and not to have so many intervals that the data are not much clearer than an ungrouped frequency distribution.

Let us use the data in Exhibit 14.7 on the number of hours studied by these youths to create a grouped frequency distribution. We will follow a number of explicit steps:

Step 1. Determine the Number of Intervals You Think You Want. This decision is arbitrary, but try to keep the number of intervals you have in the 7-13 range. For our example, let us say we initially decided we wanted to have 10 intervals. (Note: If you do your frequency distribution and it looks too clustered or there are too many intervals, redo your distribution with a different number of intervals.) Don't worry; there are no hard-and-fast rules for the correct number of intervals, and constructing a grouped frequency distribution is as much art as science. Just remember that the frequency distribution you make is supposed to convey information about the shape and central tendency of your data.

Step 2. Decide on the Width of the Interval (Symbolized by w_i). The interval width is the number of different values that fall into your interval. For example, an interval width of 5 has five different values that fall into it, say, the values 0, 1, 2, 3, and 4 hours studied. There is a simple formula to approximate what your interval width should be, given the number of intervals you decided on in the first step: Determine the range of the data, where the range is simply the highest score in the distribution minus the lowest score. In our data, with the number of hours studied, the range is 80 because the high score is 80 and the low score is 0, so range = 80 - 0 = 80. Then determine the width of the interval by dividing the range by the number of intervals you want from Step 1. We wanted 10 intervals, so our interval width would be w_i = 80 / 10 = 8. We should therefore have an interval width of 8. If you use this simple formula for determining your interval width and you end up with a decimal, say 8.2 or 8.6, then simply round up or down to an integer.

Step 3. Make Your First Interval So That the Lowest Value Falls Into It. Our lowest value is 0 (for studied 0 hours per week), so our first interval begins with the value 0. Now, if the beginning of our first interval is 0 and we want an interval width of 8, is the last value of our interval 7 (with a first interval of 0-7 hours), or is the last value of our interval 8 (with a first interval of 0-8 hours)? One easy way to make a grouped frequency distribution is to do the following: Take the beginning value of your first interval (in our case, it is 0), and add the interval width to that value (8). This new value is the first value of your next interval. What we know, then, is that the first value of our first interval is 0, and the first value of our second interval is 8 (0-?, 8-?). This must mean that the last value to be included in our first interval is one less than 8, or 7. Our first interval, therefore, includes the range of values 0-7. If you count the number of different values in this interval, you will find that it includes eight different values (0, 1, 2, 3, 4, 5, 6, 7). This is our interval width of 8.

Step 4. After Your First Interval Is Determined, the Next Intervals Are Easy. They must be the same width and not overlap (mutually exclusive). You must make enough intervals to include the last value in your variable distribution. The highest value in our data is 80 hours per week, so we construct the grouped frequency distribution as follows:

0-7
8-15
16-23
24-31
32-39
40-47
48-55
56-63
64-71
72-79
80-87

Notice that in order to include the highest value in our data (80 hours) we had to make 11 intervals instead of the 10 we originally decided upon in Step 1. No problem. Remember, the number of intervals is arbitrary and this is as much art as science.

Step 5. Count the Number or Frequency of Cases That Appear in Each Interval and Their Percentage of the Total. The completed grouped frequency distribution is shown in Exhibit 14.8. Notice that this grouped frequency distribution conveys the important features of the distribution of these data. Most of the data cluster at the low end of the number of hours studied. In fact, more than two thirds of these youths studied less than 8 hours per week. Notice also that the frequency of cases thins out at each successive interval. In other words, there is a long right tail to this distribution, indicating a positive skew because fewer youths studied a high number of hours. Notice also that the distribution was created in such a way that the interval widths are all the same, and each case falls into one and only one interval (i.e., the intervals are exhaustive and mutually exclusive). We would have run into trouble if we had intervals such as 0-7 and 7-14, because we would not know where to place those youths who spent 7 hours a week studying. Should we put them in the first or second interval? If the intervals are mutually exclusive, as they are here, you will not run into these problems.
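The five steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the values are assumed to be whole numbers of hours, and the short `hours` list is made up for the demonstration (it is not the delinquency data behind Exhibit 14.7 or 14.8).

```python
from collections import Counter

def interval_width(lowest, highest, n_intervals):
    """Step 2: range divided by the desired number of intervals, rounded to an integer."""
    return round((highest - lowest) / n_intervals)

def build_intervals(lowest, highest, width):
    """Steps 3-4: equal-width, non-overlapping intervals covering lowest..highest."""
    intervals = []
    start = lowest
    while start <= highest:
        # The last value of an interval is one less than the start of the next.
        intervals.append((start, start + width - 1))
        start += width
    return intervals

def grouped_frequencies(values, intervals):
    """Step 5: frequency and percentage of cases falling in each interval."""
    counts = Counter()
    for v in values:
        for low, high in intervals:
            if low <= v <= high:
                counts[(low, high)] += 1
                break
    total = len(values)
    return [(low, high, counts[(low, high)], 100 * counts[(low, high)] / total)
            for low, high in intervals]

width = interval_width(0, 80, 10)          # range 80, 10 intervals wanted
intervals = build_intervals(0, 80, width)  # 0-7, 8-15, ..., 80-87

# Made-up hours studied, for illustration only.
hours = [0, 1, 2, 2, 3, 5, 7, 8, 10, 15, 16, 40, 80]
for low, high, freq, pct in grouped_frequencies(hours, intervals):
    print(f"{low}-{high}: {freq} ({pct:.1f}%)")
```

Because each interval ends one value below the start of the next, every whole-number value lands in exactly one interval, which is the mutually exclusive and exhaustive property the guidelines ask for.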

SUMMARIZING UNIVARIATE DISTRIBUTIONS

Summary statistics, sometimes called descriptive statistics, focus attention on particular aspects of a distribution and facilitate comparison among distributions. For example, suppose you wanted to report the rate of violent crimes for each city in the United States with over 100,000 in population. You could report each city's violent crime rate, but it is unlikely that two cities would have the same rate, and you would have to report approximately 200 rates, one for each


around sample statistics, which you learned about in Chapter 5, relies on an interesting property of normal curves. Areas under the normal curve correspond to particular distances from the mean, expressed in standard deviation units. If a variable is normally distributed, 68% of the cases will lie between plus and minus 1 standard deviation from the distribution's mean, and 95% of the cases will lie between 1.96 standard deviations above and below the mean. Cases that fall beyond plus or minus 1.96 standard deviations from the mean are termed outliers. Because of this property, the standard deviation tells us quite a bit about a distribution, if the distribution is normal. This same property of the standard deviation enables us to infer how confident we can be that the mean (or some other statistic) of a population sampled randomly is within a certain range of the sample mean (see Chapter 5).
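The 68% and 95% figures can be checked numerically. The sketch below uses the error function from Python's standard library; the helper name `area_within` is hypothetical, and the relation between the normal curve and the error function is a standard result rather than something taken from this chapter.

```python
import math

def area_within(z):
    """Proportion of a normal distribution lying within z standard deviations of the mean."""
    return math.erf(z / math.sqrt(2))

print(f"within 1 SD:    {area_within(1):.4f}")
print(f"within 1.96 SD: {area_within(1.96):.4f}")
```

The two printed proportions round to the 68% and 95% quoted above.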

CROSS-TABULATING VARIABLES

Most data analyses focus on relationships among variables to test hypotheses or to describe or explore relationships. For each of these purposes, we must examine the association among two or more variables. Cross-tabulation (cross-tab) is one of the simplest methods for doing so. A cross-tabulation displays the distribution of one variable for each category of another variable; it can also be called a bivariate distribution. Cross-tabs also provide a simple tool for statistically controlling one or more variables while examining the associations among others. In this section, you will learn how cross-tabs used in this way can help test for spurious relationships and evaluate causal models. Cross-tabulations are usually used when both variables are measured at either the nominal or the ordinal level, that is, when the values of both variables are categories.

We are going to provide a series of examples of cross-tabulations from our delinquency data. In our first example, the independent variable we are interested in is the youth's gender (V1, see Exhibit 14.2), and the dependent variable is the youth's self-reported involvement in delinquent behavior (DELINQ1). To use the delinquency variable in a cross-tabulation, however, we first need to recode it into a categorical variable. We will make three approximately equal categories of self-reported delinquency: low, medium, and high. Using the SPSS recode command, we will create another variable called DELINQ2 using the following recode commands:

(0-2 = 1)
(3-13 = 2)
(14-118 = 3)

Anyone who reported from zero to two delinquent acts is now coded as 1 or low delinquency; anyone reporting from three to 13 delinquent acts is now coded as 2 or medium delinquency; and anyone reporting 14 or more delinquent acts is now coded as 3 or high delinquency. If you were to do a frequency distribution of this new variable, DELINQ2, you would see that there are three approximately equal groups.
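For readers working outside SPSS, the same recode can be sketched in Python. The function name `recode_delinquency` and the sample counts are hypothetical; the cut points are the ones given in the recode commands above.

```python
def recode_delinquency(acts):
    """Map a count of self-reported delinquent acts (DELINQ1) to a DELINQ2 category."""
    if 0 <= acts <= 2:
        return 1  # low delinquency
    if 3 <= acts <= 13:
        return 2  # medium delinquency
    if 14 <= acts <= 118:
        return 3  # high delinquency
    raise ValueError(f"value outside the recoded range 0-118: {acts}")

# Made-up counts, for illustration only.
delinq1 = [0, 2, 3, 13, 14, 118]
delinq2 = [recode_delinquency(a) for a in delinq1]
print(delinq2)
```

Raising an error for values outside 0-118 is a design choice made for this sketch, so that an unexpected value is noticed rather than silently dropped.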

We are interested in the relationship between gender and delinquency because a great deal of delinquency theory would predict that males are more likely to be delinquent than females. The gender of the youth is the independent variable, and the level of self-reported delinquency is the dependent variable.

Exhibit 14.13 shows the cross-tabulation of gender with DELINQ2. Some explanation of this table is in order. Notice that there are two values of gender (male and female) that comprise the values in the two rows of the table, and three values of delinquency (low, medium, and high) that comprise the values in the three columns of the table. Cross-tabulations are usually referred to by the number of rows and columns the table has. Our cross-tabulation in Exhibit 14.13 is a 2 x 3 (pronounced "two-by-three") table because there are two rows and three columns. Notice also that there are values at the end of each row and at the end of each column. These totals are referred to as the marginals of the table. These marginal distributions provide the sum of the frequencies for each column and each row of the table. For example, there are 680 females in the data and 592 males. These row marginals should sum to the total number of youths in the dataset: 1,272. There are 450 youths who are low in delinquency, 348 youths who are medium in delinquency, and 474 youths who are high on the delinquency variable. These column marginals should also sum to the total number of youths in the dataset: 1,272.

Now, notice that there are 2 x 3 or 6 data entries in the table (let us ignore the percentages for now). These data entries are called the cells of the cross-tabulation and represent the joint distribution of the two variables: gender and delinquency. The table in Exhibit 14.13 has six cells for the joint distribution of two levels of gender with three levels of delinquency. In other words, notice where the value for female converges with the value of low for delinquency. You see a frequency number of 275 in this cell. This frequency is how many times there is the joint occurrence of a female and low delinquency; it shows that 275 females were also low in delinquency. Moving to the cell to the right of this, we see that there are 182 females who were medium in delinquency, and moving to the right again we see that there are 223 females who were high in delinquency. The sum of these three numbers is equal to the total number of females, 680. The row for the males shows the joint distribution of males with each level of delinquency.
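The relationship between cells and marginals can be checked with a short sketch. The female cell counts and all marginals are the ones quoted in the text; the male cell counts are not stated in this passage, so here they are derived by subtracting the female cells from the column marginals, which is an assumption of this example.

```python
# Rows: gender. Columns: low, medium, high delinquency.
# Male cells are derived (column marginal minus female cell), not quoted from the exhibit.
table = {
    "female": [275, 182, 223],
    "male": [450 - 275, 348 - 182, 474 - 223],
}

row_marginals = {gender: sum(cells) for gender, cells in table.items()}
column_marginals = [sum(column) for column in zip(*table.values())]
total = sum(row_marginals.values())

print(row_marginals)
print(column_marginals)
print(total)
```

Both sets of marginals add up to the same grand total, which is a quick consistency check worth running on any cross-tabulation.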

What we would like to know is whether there is a relationship or an association between gender and delinquency. In other words, are males more likely to be delinquent than females? Because raw frequencies can provide a deceptive picture, we determine whether there is any relationship between our independent and dependent variables by looking at the percentages. Keep in mind that the idea in looking at relationships is that we want to know if variation on the independent variable has any effect on the dependent variable. To determine this, what we always do in cross-tabulation tables is to calculate our percentages on each value of the independent variable. For example, notice that in Exhibit 14.13, gender is our independent variable. We calculated our percentages so that for each value of gender, the percentages sum to 100% at the end of each row. The percentages for both females and males, therefore, sum to 100% at the end of the row. We take a given category of the dependent variable and ask what percentage of each independent variable value falls into that category of the dependent variable. Another way to say this is that we calculate our percentages on the independent variable and compare them to percentages on the dependent variable. We compare the percentages for different levels of the independent variable on the same category or level of the dependent variable.

Marginal distributions: The summary distributions in the margins of a cross-tabulation that correspond to the frequency distribution of the row variable and of the column variable.

Exhibit 14.13 Cross-Tabulation of Respondents' Gender by Delinquency

In Exhibit 14.13, for example, notice that 40.4% of the female youths were low in delinquency, but only 29.6% of the males were low. This tells us that females are more likely to be low in delinquency than males. Now let us look at the high category. We can see that 32.8% of the females were high in delinquency and 42.4% of the males were high. Together, this tells us that females are more likely to be low in delinquency and males are more likely to be high in delinquency. There is, then, a relationship between gender and delinquency. Also notice that the independent variable was the row variable and the dependent variable was the column variable. It does not always have to be this way; the independent variable could just as easily have been the column variable. The important general rule to remember is to always calculate your percentages on the levels of the independent variable (e.g., use marginal totals for the independent variable as denominators), and compare percentages on a level of the dependent variable.
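The general rule (percentage on the independent variable, then compare within a category of the dependent variable) can be sketched as follows. The function name `row_percentages` is hypothetical, and the male cell counts are derived from the marginals reported in the text rather than quoted from the exhibit.

```python
def row_percentages(table):
    """Percentage each cell on its row total (rows = independent variable)."""
    return {
        row: [round(100 * cell / sum(cells), 1) for cell in cells]
        for row, cells in table.items()
    }

# Columns: low, medium, high delinquency.
# Male cells are derived from the reported marginals (an assumption of this sketch).
table = {
    "female": [275, 182, 223],
    "male": [175, 166, 251],
}

pct = row_percentages(table)
for gender, values in pct.items():
    print(gender, values)

# Compare the two genders within one category of the dependent variable.
low_difference = round(pct["female"][0] - pct["male"][0], 1)
print("difference in percent low:", low_difference)
```

Using the row totals as denominators is what makes the female and male percentages directly comparable even though the two groups differ in size.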

In Exhibit 14.14, we report the same data as in Exhibit 14.13, this time switching the rows and the columns. Now, the independent variable (gender) is the column variable, so we calculate our percentage going down each of the two columns. We then compare percentages across rows. For example, we still see that 40.4% of the females were low in delinquency, whereas only 29.6% of the males were. And 42.4% of the males were high in delinquency, but only 32.8% of the females were high in delinquency.

Describing Association

A cross-tabulation table reveals four aspects of the association between two variables:

Existence. Do the percentage distributions vary at all among categories of the independent variable?

Strength. How much do the percentage distributions vary among categories of the independent variable?

Direction. For quantitative variables, do values on the dependent variable tend to increase or decrease with an increase in value of the independent variable?

Pattern. For quantitative variables, are changes in the percentage distribution of the dependent variable fairly regular (simply increasing or decreasing), or do they vary (perhaps increasing, then decreasing, or perhaps gradually increasing, then rapidly increasing)?

Exhibit 14.14 shows that an association exists between delinquency and gender, although we can say only that it is a modest association. The percentage difference at the low and high ends of the delinquency variable is approximately 10 percentage points.

We provide another example of a cross-tabulation in Exhibit 14.15. This is a 3 x 3 table that shows the relationship between how morally wrong a youth thinks delinquency is (the independent variable) and his or her self-reported involvement in delinquency (the dependent variable). This table reveals a very strong relationship between moral beliefs and delinquency. We can see that 5.6% of youths with weak moral beliefs are low on delinquency; this increases to 33.8% for those with medium beliefs and to 62.8% for those with strong moral beliefs. At the high end, over two thirds (72.1%) of those youths with weak moral beliefs are high in delinquency, 29.4% of those with medium moral beliefs are high in delinquency, and only 16.9% of those youths with strong moral beliefs are high in delinquency. Clearly, then, having strong moral beliefs serves to effectively inhibit involvement in delinquent behavior. This is exactly what control theory would have us believe.

Exhibit 14.14 Cross-Tabulation of Respondents' Delinquency by Gender

Exhibit 14.15 Cross-Tabulation of Respondents' Morals by Delinquency

Monotonic relationship: A relationship in which the value of cases consistently increases (or decreases) on one variable as the value of cases increases (or decreases) on the other variable.

Measure of association: A type of descriptive statistic that summarizes the strength of an association.

Gamma: A measure of association sometimes used in cross-tabular analyses.

Exhibit 14.15 shows an example of a negative relationship between an independent and a dependent variable. As the independent variable increases (i.e., as one goes from weak to strong moral beliefs), the likelihood of delinquency decreases (one becomes less likely to commit delinquency). The independent and dependent variables move in opposite directions, so this is a negative relationship. The pattern in this table is close to what is called monotonic. In a monotonic relationship, the value of cases consistently increases (or decreases) on one variable as the value of cases increases (or decreases) on the other variable. Monotonic is often defined a bit less strictly with the idea that as the value of cases on one variable increases (or decreases), the value of cases on the other variable tends to increase (or decrease) and, at least, does not change direction. This describes the relationship between moral beliefs and delinquency. Delinquency is most likely when moral beliefs are low, less likely when moral beliefs are medium, and least likely when moral beliefs are strong.
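The less strict definition of monotonic (the percentages never change direction) is easy to check mechanically. The sketch below applies it to the percentages the text reports from Exhibit 14.15, ordered from weak to strong moral beliefs; the function name `is_monotonic` and the final counterexample are hypothetical.

```python
def is_monotonic(values):
    """True if the sequence never changes direction (all non-decreasing or all non-increasing)."""
    pairs = list(zip(values, values[1:]))
    non_decreasing = all(a <= b for a, b in pairs)
    non_increasing = all(a >= b for a, b in pairs)
    return non_decreasing or non_increasing

# Percentages reported in the text, ordered weak -> medium -> strong moral beliefs.
pct_low_delinquency = [5.6, 33.8, 62.8]
pct_high_delinquency = [72.1, 29.4, 16.9]

print(is_monotonic(pct_low_delinquency))
print(is_monotonic(pct_high_delinquency))
print(is_monotonic([10.0, 30.0, 20.0]))  # made-up sequence that rises and then falls
```

The first two sequences move in one direction only, while the made-up third sequence reverses and so would not count as monotonic.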

We present another cross-tabulation table for you in Exhibit 14.16. This table shows the relationship between the variable "number of hours studied" and the variable "certainty of punishment" (see Exhibit 14.2). Both variables were originally continuous variables that we recoded into three approximately equal groups for this example. We hypothesize that those youths who study more will have a greater perceived risk of punishment than those who study less, so hours studied is our independent variable and certainty is the dependent variable. Comparing levels of hours studied for those with high certainty, we see that there is not much variation. Of those who did not study very much (0-3 hours), 39.2% were high in perceived certainty. Of those who studied from 4 to 6 hours, 35.6% were high in perceived certainty, and 40.39% of those who studied 7+ hours per week were high in perceived certainty. Much the same levels prevail at low levels of perceived certainty. Those who do not study very much are no more or less likely to perceive a low certainty of punishment than those who study a lot. Variation in the independent variable, then, is not related to variation in the dependent variable. It looks like there is no association between the number of hours a youth studies and the extent to which he or she thinks punishment for delinquent acts is certain.

Exhibit 14.16 Cross-Tabulation of Respondents' Hours Studied and Perceived Certainty of Punishment

You will find when you read research reports and journal articles that social scientists usually make decisions about the existence and strength of association on the basis of more statistics than only percentage differences in a cross-tabulation table. A measure of association is a type of descriptive statistic used to summarize the strength of an association. There are many measures of association, some of which are appropriate for variables measured at particular levels. One popular measure of association in cross-tabular analyses with variables measured at the ordinal level is gamma. As with many measures of association, the possible values of gamma vary from -1, meaning the variables are perfectly associated in a negative direction; to 0, meaning there is no association of the type that gamma measures; to +1, meaning there is a perfect positive association of the type that gamma measures.
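The chapter does not give a formula for gamma. A minimal sketch, assuming the standard Goodman and Kruskal definition (concordant pairs minus discordant pairs, divided by their sum), is shown below for the gender-by-delinquency table. The function name is hypothetical, the ordering of gender (female, then male) is arbitrary, and the male cell counts are derived from the marginals reported in the text.

```python
def gamma(table):
    """Goodman and Kruskal's gamma for a table whose rows and columns are both ordered."""
    concordant = 0
    discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # compare with every lower row
                for m in range(n_cols):
                    if m > j:                   # higher on both variables
                        concordant += table[i][j] * table[k][m]
                    elif m < j:                 # higher on one, lower on the other
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)

# Rows: female, male. Columns: low, medium, high delinquency.
# Male cells are derived from the reported marginals (an assumption of this sketch).
gender_by_delinquency = [
    [275, 182, 223],
    [175, 166, 251],
]

g = gamma(gender_by_delinquency)
print(round(g, 2))
```

Pairs tied on either variable are ignored by gamma, which is one reason it tends to look larger than other ordinal measures computed on the same table.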

Inferential statistics are used in deciding whether it is likely that an association exists in the larger population from which the sample was drawn. Even when the association between two variables is consistent with the researcher's hypothesis, it is possible that the association was merely due to chance or to the vagaries of sampling on a random basis. (Of course, the problem is even worse if the sample is not random.) It is conventional in statistics to avoid concluding that an association exists in the population from which the sample was drawn unless the probability that the association was due to chance is less than 5%. In other words, a statistician normally will not conclude that an association exists between two variables unless he or she can be at least 95% confident that the association was not due to chance. This is the same type of logic that you learned about in Chapter 5, which introduced the concept of 95% confidence limits for the mean. Estimation of the probability that an association is not due to chance will be based on one of several inferential statistics, chi-square being the one used in most cross-tabular analyses. The probability is customarily reported in a summary form such as "p < .05," which can be translated as "the probability that the association was due to chance is less than 5 out of 100 (5%)."

When an association passes muster in this way, when the analyst feels reasonably confident (at least 95% confident) that it was not due to chance, it is said that the association is statistically significant. Statistical significance means that an association is not likely to be due to chance, according to some criterion set by the analyst. Convention (and the desire to avoid concluding that an association exists in the population when it does not) dictates that the criterion be a probability less than 5%.

But statistical significance is not everything. You may remember from Chapter 5 that sampling error decreases as sample size increases. For this same reason, an association is less likely to appear on the basis of chance in a larger sample than in a smaller sample. In a table with more than 1,000 cases, such as those involving the delinquency dataset, the odds of a chance association are very low indeed. For example, with our table based on 1,272 cases, the probability that the association between gender and delinquency (see Exhibit 14.14) was due to chance was less than 1 in 1,000 (p < .001)! The association in that table was fairly weak, as indicated by a gamma of .20. Even weak associations can be statistically significant with such a large sample, which means that the analyst must be careful not to assume that merely because a statistically significant association exists, it is therefore important. In a large sample, an association may be statistically significant but still be too weak to be substantively significant or important. All this boils down to another reason for evaluating carefully both the existence and the strength of an association.
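The chi-square test itself is not worked through in the text. As a rough sketch, assuming the standard Pearson formula (the sum over cells of squared observed-minus-expected differences divided by the expected frequency), the statistic for the gender-by-delinquency table can be computed as follows. The function name is hypothetical, the male cell counts are derived from the reported marginals, and the closing p-value shortcut is valid only for tables with 2 degrees of freedom, such as this 2 x 3 table.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Frequency expected if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Rows: female, male. Columns: low, medium, high delinquency.
# Male cells are derived from the reported marginals (an assumption of this sketch).
observed = [
    [275, 182, 223],
    [175, 166, 251],
]

statistic, df = chi_square(observed)
# With exactly 2 degrees of freedom, the upper-tail probability is exp(-x / 2).
p_value = math.exp(-statistic / 2)
print(f"chi-square = {statistic:.2f}, df = {df}, p < .001: {p_value < 0.001}")
```

For tables with other degrees of freedom, a statistical package or a chi-square table would be needed to obtain the probability.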

Controlling for a Third Variable

Cross-tabulation can also be used to study the relationship between two variables while controlling for other variables. We will focus our attention on controlling for a third variable in this section, but we will say a bit about controlling for more variables at the section's end. We will examine three different uses for three-variable cross-tabulation: identifying an intervening variable, testing a relationship for spuriousness, and specifying the conditions for a relationship. Each of these uses for three-variable cross-tabs helps determine the validity of our findings, either by evaluating criteria for causality (nonspuriousness and identification of a causal mechanism) or by increasing our understanding of the conditions required for a relationship to hold, an indication of the cross-population generalizability of the findings. All three uses are aspects of elaboration analysis, the process of introducing control variables into a bivariate relationship in order to better understand (to elaborate on) the relationship (Rosenberg 1968). We will examine the gamma and chi-square statistics for each table in this analysis.

Chi-square: An inferential statistic used to test hypotheses about relationships between two or more variables in a cross-tabulation.

Statistical significance: An association that is not likely to be due to chance, judged by a criterion set by the analyst (often that the probability is less than 5 out of 100, or p < .05).

Elaboration analysis: The process of introducing a third variable into an analysis in order to better understand (to elaborate) the bivariate (two-variable) relationship under consideration; additional control variables also can be introduced.

Intervening Variables

We have already discovered that females are less likely to be delinquent than males (see Exhibit 14.14). Finding this relationship between gender and delinquency is just the beginning of our work, however. What we would now like to know and investigate is why this relationship exists. What is it about females that makes them less likely to commit delinquent acts than males? Let us first rule out strictly biological factors and explore some possible social reasons for this gender difference in delinquency. One possibility is that because they are more closely supervised than males, females have fewer opportunities to be delinquent. In other words, females are under more strict parental supervision, and it is because they are under more strict supervision that they are less likely than males to commit delinquency. This possible relationship is shown in Exhibit 14.17. Notice that in this relationship the variable parental supervision intervenes between gender and delinquency. It explains why females are at lower risk for delinquency compared to males. To determine whether parental supervision intervenes in the relationship between gender and delinquency and whether it explains this relationship, we must examine the relationship between gender and delinquency while controlling for difference in parental supervision. If parental supervision intervenes in the gender-delinquency relationship, the effect of controlling for this third variable would be to eliminate, or at least substantially reduce, the original relationship between gender and delinquency.

Exhibit 14.17 Cross-Tabulation of Respondents' Gender by Delinquency Within Levels of Parental Supervision (panels: Weak Parental Supervision; Strong Parental Supervision)

To examine this possibility, we first recode the parental supervision variable (PARSUPER; see Exhibit 14.2) into two approximately equal levels: weak supervision and strong supervision. We then look at two subtables of the gender-delinquency relationship: once under the condition of weak parental supervision and once under the condition of strong parental supervision (see Exhibit 14.17). For ease of presentation, we will report only the cell percentages and not the frequencies. What we see is that once parental supervision is controlled, there is no real relationship between gender and delinquency. That is, if males and females have the same amount of supervision from their parents, they do not differ that much in their risk of being delinquent. For example, among females with weak parental supervision, 46.0% are high in delinquency, and among males with weak parental supervision, 49.6% are high in delinquency. There is less than four percentage points' difference between males and females in their risk of being high delinquents under these conditions. Among those with strong parental supervision, 19.8% of the females were high in delinquency and 23.6% of the males were high, less than four percentage points' difference.

This percentage analysis is borne out by the chi-square tests and measures of association. Under both the weak and strong levels of parental supervision, the relationship between gender and delinquency is not significant, and gamma is only .067 when supervision is weak and .136 when supervision is strong. In neither case is the obtained gamma very different from zero (indicating no relationship). Collectively, these results would lead us to the conclusion that parental supervision intervenes in the relationship between gender and delinquency. A very important reason females are less delinquent than males, therefore, is that females are under stricter supervision from their parents than are males, and strong parental supervision leads to a reduced risk of delinquency.
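Mechanically, controlling for a third variable means building one cross-tabulation per category of the control variable. The sketch below shows that bookkeeping only; the function name, the field names, and the handful of records are all made up for illustration and do not come from the delinquency dataset.

```python
from collections import defaultdict

def subtables(records, independent, dependent, control):
    """Count cases by (control value, independent value, dependent value)."""
    tables = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for record in records:
        tables[record[control]][record[independent]][record[dependent]] += 1
    return tables

# Made-up records, for illustration only.
records = [
    {"gender": "female", "delinquency": "low", "supervision": "strong"},
    {"gender": "female", "delinquency": "high", "supervision": "weak"},
    {"gender": "male", "delinquency": "high", "supervision": "weak"},
    {"gender": "male", "delinquency": "low", "supervision": "strong"},
    {"gender": "male", "delinquency": "high", "supervision": "weak"},
]

tables = subtables(records, "gender", "delinquency", "supervision")
for level, table in tables.items():
    print(level, {gender: dict(cells) for gender, cells in table.items()})
```

Each subtable can then be percentaged on the independent variable and summarized with gamma and chi-square in the same way as the original two-variable table.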

Extraneous Variables

Another reason for introducing a third variable into a bivariate relationship is to see whether the original relationship is spurious due to the influence of m extraneous oariable, which is a variable that causes both the independent and dependent variables. The only reason the inde- pendent and dependent variables are related, therefore, is that they both are the effects of a common cause (another independent variable).

Exhibit 14.18 shows what a spurious relationship would look like. In this case, the relationship between x and y exists only because both are the effects of the common cause z. Controlling for z, therefore, will eliminate the x1 relationship. Ruling out possible extraneous variables will help considerably strengthen the conclusion that the relationship between the independent and dependent variables is causal, particularly if- all the variables that seem to have the potential for creating a spurious relationship can be controlled.

Notice that if a variable is acting as an extraneous variable, then controlling for it will cause the original relationship between the independent and dependent variables to disappear or substantially diminish. This was also the empirical test for an intervening variable. Therefore, the difference between intervening and extraneous variables is a logical one and not an empirical one. In both instances, controlling for the third variable will cause the original relationship to diminish or disappear. There should, therefore, be sound theoretical grounds for suspecting that a variable is acting as an intervening variable, explaining the relationship between the independent and dependent variables.
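A small simulation can make the idea of a spurious relationship concrete. The sketch below rests on assumptions that go beyond the chapter: it uses simulated interval-level scores rather than survey data, and it controls for z with a first-order partial correlation instead of subtables. The names pearson_r and partial_r are our own. Here z causes both x and y, and x has no effect on y at all.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after statistically controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(14)  # fixed seed so the simulation is repeatable
z = [rng.gauss(0, 1) for _ in range(5000)]   # the common cause
x = [zi + rng.gauss(0, 1) for zi in z]       # x depends only on z
y = [zi + rng.gauss(0, 1) for zi in z]       # y depends only on z

r_xy = pearson_r(x, y)
r_xy_given_z = partial_r(r_xy, pearson_r(x, z), pearson_r(y, z))

print("x-y correlation:", round(r_xy, 2))
print("x-y correlation controlling for z:", round(r_xy_given_z, 2))
```

The uncontrolled correlation should come out near .5, and the controlled one near zero. The same output would appear if z were an intervening variable, which is why the distinction has to be made on theoretical grounds.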

Subtables: Tables describing the relationship between two variables within the discrete categories of one or more other control variables.

Exhibit 14.18 Example of a Spurious Relationship

[Diagram label: Original relationship between x and y disappears]


As an example of a possible extraneous relationship, we will look at the association between a youth's perception of the certainty of punishment and self-reported involvement in delinquency. Deterrence theory should lead us to predict a negative relationship between perceived certainty and delinquency. Indeed, this is exactly what we observe in our delinquency data. We will not show you the cross-tabulation table, but when we looked at the relationship between perceived certainty and delinquency, we found that 53.2% of youth who were low in certainty were high in delinquency; 39.1% of those who perceived medium certainty were high in delinquency; and only 23.6% of those who perceived a high certainty of punishment were high in delinquency. Youth who believed they would get caught if they engaged in delinquency, then, were less likely to be delinquent. The gamma value for this table was -.382, indicating a moderate negative relationship between perceived certainty and delinquency, exactly what deterrence theory would lead us to expect.

Someone may reasonably argue, however, that this discovered negative relationship may not be causal but instead may be spurious. It could be suggested that what is actually behind this relationship is the extraneous variable, moral beliefs. The argument is that those with strong moral inhibitions against committing delinquent acts think that punishment for morally wrongful actions is certain and refrain from delinquent acts. Thus, the observed negative relationship between perceived certainty and delinquency is really due to the positive effect of moral beliefs on perceived certainty and the negative effect of moral beliefs on delinquency (see Exhibit 14.19). If moral beliefs are actually the causal factor at work, then controlling for them will eliminate or substantially reduce the original relationship between perceived certainty and delinquency.

To look at this possibility, we examined the relationship between perceived certainty and delinquency under three levels of moral beliefs (weak, medium, and strong). The cross-tabulations are shown in Exhibit 14.20. What we can see is that in each of the subtables there is a negative and significant association between the perceived certainty of punishment and delinquency. In two of the three subtables, however, the relationship is weaker than it was in the original table (where the gamma was -.382); we obtained gammas of -.271 and -.197. Under the condition of strong moral beliefs, however, the original relationship is unchanged. What we would conclude from this elaboration analysis is that the variable moral beliefs is not acting as a very strong extraneous variable. Although some of the relationship between perceived risk and delinquency is due to their joint relationship with moral beliefs, we cannot dismiss the possibility that the perceived certainty of punishment has a causal influence on delinquent behavior.
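One way to put a number on "weaker than the original" is to ask by what percentage the size of gamma shrinks once moral beliefs are controlled. The short sketch below uses only the gammas reported above; the helper name percent_reduction is our own, and the measure is offered as an informal aid rather than as a standard statistic.

```python
def percent_reduction(original, conditional):
    """Percentage by which the magnitude of an association shrinks."""
    return (abs(original) - abs(conditional)) / abs(original) * 100


original_gamma = -0.382           # perceived certainty by delinquency
weaker_gammas = [-0.271, -0.197]  # the two subtables with weaker gammas

reductions = [
    round(percent_reduction(original_gamma, g), 1) for g in weaker_gammas
]
print(reductions)  # [29.1, 48.4]
```

The relationship shrinks by roughly 29% and 48% in those two subtables, a noticeable reduction, but it does not come close to disappearing.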

Specification

By adding a third variable to an evaluation of a bivariate relationship, the data analyst can also specify the conditions under which the bivariate relationship occurs. A specification occurs when the association between the independent and dependent variables varies across the categories of one or more other control variables, that is, when the original relationship is stronger under some condition or conditions of a third variable and weaker under others.

In criminology, social learning theory would predict that youths who are exposed to peers who provide verbal support for delinquency are at greater risk for their own delinquent conduct. We found support for this hypothesis in our delinquency dataset. We examined this relationship by recoding into two approximately equal groups the variable FROPINON (see Exhibit 14.2). The first group had weak verbal support from peers, whereas the second group had strong verbal support. Among those youths who reported that their peers provided only weak verbal support for delinquency, 15% were highly delinquent. Among those with strong verbal support from peers, nearly 58% were highly delinquent. The gamma value for this relationship was .711, a very strong positive relationship. Clearly, then, having friends give you verbal support for delinquent acts (e.g., "it's okay to steal") puts you at risk for delinquency.

Specification: A type of relationship involving three or more variables in which the association between the independent and dependent variables varies across the categories of one or more other control variables.

Exhibit 14.19 A Spurious Relationship Between x and y

[Diagram labels: Moral beliefs; Perceived certainty; Delinquency]

Exhibit 14.20 Cross-Tabulation of Perceived Risk by Delinquency Within Levels of Moral Beliefs

[Subtables: Weak Moral Beliefs (χ² = 13.646, p < .001, gamma = -.271); Medium Moral Beliefs; Strong Moral Beliefs]


It is entirely possible, however, that this relationship exists only when friends' verbal support is backed up by their own behavior. That is, verbal support from our peers might not affect our delinquency when they do not themselves commit delinquent acts or when they commit only a very few. In this case, their actions (inaction in this case) speak louder than their words, and their verbal support does not influence us. When they also commit delinquent acts, however, the verbal support of peers carries great weight.

We looked at this possibility by examining the relationship between friends' verbal support for delinquency and a youth's own delinquency within two levels of friends' behavior (FRBEHAVE; see Exhibit 14.2). We recoded FRBEHAVE into two approximately equal groups. In the first group, fewer of one's friends are delinquent (few delinquent friends) than in the other (many delinquent friends). This attempt to specify the relationship between friends' opinions and a youth's own delinquency is shown in Exhibit 14.21. What we see is a little complex. When only a few of a youth's friends are committing delinquent acts, their verbal support still has a significant and positive effect on self-reported delinquency. The gamma value in this subtable is .416, which is moderately strong but less than the original gamma of .771. When many of a youth's friends are delinquent, however, the positive relationship between peers' verbal support and self-reported delinquency is much stronger, with a gamma of .608. The behavior of our peers, then, only weakly specifies the relationship between peer opinion and delinquency. Clearly, then, what our peers say about delinquency matters, even if they are not committing delinquent acts all the time themselves.

Exhibit 14.21 Cross-Tabulation of Friends' Verbal Support by Delinquency Within Levels of Friends' Delinquent Behavior

[Subtables: Few Delinquent Friends; Many Delinquent Friends]

REGRESSION AND CORRELATION

Our goal in introducing you to cross-tabulation has been to help you think about the associations among variables and to give you a relatively easy tool for describing association. To read most statistical reports and to conduct more sophisticated analyses of social data, you will have to extend your statistical knowledge. Many statistical reports and articles published in the social sciences use statistical techniques called regression analysis and correlation analysis to describe the associations among two or more quantitative variables. The terms actually refer to different aspects of the same technique. Statistics based on regression and correlation are used frequently in social science and have many advantages over cross-tabulation, as well as some disadvantages.

We provide only a brief overview of this approach here. Take a look at Exhibit 14.22. It's a plot, termed a scatterplot, of the bivariate relationship between two interval/ratio-level variables. The variables were obtained from a U.S. state-level dataset. The dependent variable, presented on the y-axis (vertical), is the murder rate per 100,000 population, and the independent variable, presented on the x-axis (horizontal), is the poverty rate (percentage of each state's population living under the poverty level).

You can see that the data points in the scatterplot tend to run from the lower left to the upper right of the chart, indicating a positive relationship. States with higher levels of poverty also tend to have higher rates of murder. The regression line in the chart is the best-fitting straight line for this relationship; it is the line that lies closest to all the points in the chart, according to certain criteria. But you can easily see that quite a few points are pretty far from the regression line.
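To see what "best-fitting" means in practice, here is a minimal sketch of how a least-squares regression line can be computed by hand. The five pairs of values are hypothetical and stand in for imaginary states (they are not taken from Exhibit 14.22), and the function name least_squares_line is our own.

```python
def least_squares_line(xs, ys):
    """Slope and intercept of the line minimizing squared vertical distances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# HYPOTHETICAL values for five imaginary states.
poverty_rate = [10.0, 12.0, 14.0, 16.0, 18.0]   # percentage below poverty level
murder_rate = [3.0, 4.0, 4.0, 6.0, 8.0]         # murders per 100,000

slope, intercept = least_squares_line(poverty_rate, murder_rate)
predicted_at_15 = intercept + slope * 15.0

print("slope:", round(slope, 2))
print("intercept:", round(intercept, 2))
print("predicted murder rate at 15% poverty:", round(predicted_at_15, 2))
```

In these made-up data the slope is 0.6, meaning that each additional percentage point of poverty goes with 0.6 more murders per 100,000. A positive slope corresponds to points running from the lower left to the upper right of a scatterplot.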

Regression analysis: A statistical technique for characterizing the pattern of a relationship between two quantitative variables in terms of a linear equation and for summarizing the strength of this relationship.

Correlation analysis: A standardized statistical technique that summarizes the strength of a relationship between two quantitative variables in terms of its adherence to a linear pattern.

Exhibit 14.22 Example of a Positive Relationship: Scatterplot of Murder Rate (Dependent Variable) and Poverty Rate (Independent Variable) in U.S. States, 2015

[Scatterplot axes: murder rate (y-axis, .00 to 12.00); Percentage of Population Living Under the Poverty Level (x-axis, 9.00 to 21.00)]

Correlation coefficient (r): A summary statistic that varies from 0 to 1 or -1, with 0 indicating the absence of a linear relationship between two quantitative variables and 1 or -1 indicating that the relationship is completely described by the line representing the regression of the dependent variable on the independent variable.

How well does the regression line fit the points? In other words, how close does the regression line come to the points? (Actually, it's the square of the vertical distance [on the y-axis] between the points and the regression line that is used as the criterion.) The correlation coefficient, also called Pearson's r or simply r, gives one answer to that question. The value of r for this relationship is .60, which indicates a moderately strong positive linear relationship (if it were a negative relationship, r would have a negative sign). The value of r is 0 when there is absolutely no linear relationship between the two variables, and it is 1 when all the points representing all the cases lie exactly on the regression line (which would mean that the regression line describes the relationship perfectly).

So, the correlation coefficient does for two interval/ratio-level variables what gamma does for a cross-tabulation table: It is a summary statistic that tells us about the strength of the association between the two variables. Values of r close to 0 indicate that the relationship is weak; values of r close to ±1 indicate the relationship is strong; in between, there is a lot of room for judgment. You will learn in a statistics course that r² is often used instead of r. Exhibit 14.23 provides an overview of how to interpret the values of r. Although not all possible values of r are displayed in Exhibit 14.23, it highlights how the use of adjectives can describe various values between 0 and 1.
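The calculation behind r can also be sketched briefly. The values below are hypothetical stand-ins for five imaginary states, chosen so that one pair of variables rises together and the other pair moves in opposite directions; the function name pearson_r is our own.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# HYPOTHETICAL values for five imaginary states.
poverty_rate = [10.0, 12.0, 14.0, 16.0, 18.0]
murder_rate = [3.0, 4.0, 4.0, 6.0, 8.0]
percent_rural = [10.0, 20.0, 30.0, 40.0, 50.0]
robbery_rate = [200.0, 150.0, 160.0, 90.0, 60.0]

r_positive = pearson_r(poverty_rate, murder_rate)
r_negative = pearson_r(percent_rural, robbery_rate)

print("poverty and murder:", round(r_positive, 2))
print("rural and robbery:", round(r_negative, 2))
print("r squared for poverty and murder:", round(r_positive ** 2, 2))
```

The sign of r gives the direction of the relationship, and its distance from 0 gives the strength. With only five tidy made-up cases, both values here are far closer to ±1 than anything you would expect in real state-level data.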

An example of a negative relationship is shown in Exhibit 14.24, where we provide a scatterplot of the robbery rate in states (dependent variable) on the y-axis and the percentage of each state's population that resides in rural areas as the independent variable (x-axis). You can see here a clear negative relationship; a state that has a higher percentage of its population residing in rural areas will tend to have lower robbery rates. The correlation coefficient for this relationship is r = -.53, indicating a moderate negative relationship.

You can also use correlation coefficients and regression analysis to study simultaneously the association among three or more variables. Let's use the murder rate as the dependent variable to illustrate. In a multiple regression analysis, you could test to see whether several other variables in addition to poverty are associated simultaneously with the murder rate, that is, whether the variables have independent effects on murder after statistically controlling for each other.

Controlling for the geography in a state is also important for predicting murder rates, so we will be including the percentage of rural residents in our equation. We know that robberies sometimes have lethal outcomes, so controlling for the robbery rate is important. Let's examine what a multiple regression equation would look like predicting the murder rate using the poverty rate, the percentage rural, and the robbery rate as the three independent variables. Interpreting regression output is way beyond the scope of this text; we are simply going to examine the standardized regression coefficients, called betas, and their significance level for this illustration. Results are displayed in Exhibit 14.25.

Exhibit 14.23 A Guide to Interpreting Strong to Weak Relationships

-1.00 -.80 -.60 -.40 -.20 .00 +.20 +.40 +.60 +.80 +1.00

Source: Bachman and Paternoster 2017, 356. Reprinted with permission from SAGE.

Exhibit 14.24 Example of a Negative Relationship: Scatterplot of Robbery Rate (Dependent Variable) and Percentage Rural (Independent Variable) in U.S. States, 2015

[Scatterplot axes: robbery rate (y-axis, .00 to 300.00); Percentage of Population Living in Rural Areas (x-axis, .00 to 60.00)]

Exhibit 14.25 Multiple Regression Predicting the Murder Rate in States Using Poverty, Percentage Rural, and the Robbery Rate as Independent Variables

First, look at the numbers under the "Beta Coefficient" heading. Beta coefficients are standardized statistics that indicate how strong the linear relationship is between the dependent variable (murder rate, in this case) and each independent variable, while the other independent variables are controlled. Like the correlation coefficient (r), values of beta range from 0, when there is no linear association, to ±1.0, when the association falls exactly on a straight line. You can see in the beta column that rural population is not significantly related to the murder rate when the other variables are controlled. Both the percentage poor and the robbery rate, however, are still significant predictors of murder. R² (r-squared) is a model fit statistic and tells us, when multiplied by 100, the percentage of the dependent variable's variation that is explained by all the independent variables in the model. In this model, we learn from R² that the three independent variables together explain or account for 68% of the total variation in murder rates. Our goal is to explain as much variation as possible of the 100%, so explaining over two-thirds of the variation is not bad!
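For readers curious about where betas and R² come from, the following sketch works through a simplified case with two independent variables rather than three. In that case the standardized coefficients can be computed directly from the three correlations among the variables. The poverty-murder correlation of .60 echoes the value reported earlier in this section; the other two correlations are hypothetical, as is the function name standardized_betas. The results are not the values in Exhibit 14.25.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Betas for two predictors, from the three pairwise correlations.

    r_y1 and r_y2 are each predictor's correlation with the dependent
    variable; r_12 is the correlation between the two predictors.
    """
    denominator = 1 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denominator
    beta_2 = (r_y2 - r_y1 * r_12) / denominator
    return beta_1, beta_2


r_murder_poverty = 0.60    # reported earlier in this section
r_murder_robbery = 0.50    # HYPOTHETICAL
r_poverty_robbery = 0.30   # HYPOTHETICAL

beta_poverty, beta_robbery = standardized_betas(
    r_murder_poverty, r_murder_robbery, r_poverty_robbery
)

# With standardized variables, R-squared is the sum of each beta
# multiplied by that predictor's correlation with the dependent variable.
r_squared = beta_poverty * r_murder_poverty + beta_robbery * r_murder_robbery

print("beta for poverty:", round(beta_poverty, 2))
print("beta for robbery:", round(beta_robbery, 2))
print("percentage of variation explained:", round(r_squared * 100))
```

Under these assumptions each beta (about .49 and .35) is smaller than the corresponding bivariate correlation, because the two predictors overlap with each other, and together they account for about 47% of the variation in the dependent variable.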

You will need to learn more about when correlation coefficients and regression analysis are appropriate (e.g., both variables have to be quantitative, and the relationship has to be linear [not curvilinear]), but that's for another time and place. To learn more about correlation coefficients and regression analysis, you should take an entire statistics course. For now, this short introduction will enable you to make sense of more of the statistical analyses you find in research articles. You can also learn more about these techniques with the tutorials on the Student Study Site.


Dana Hunt, PhD, Principal Scientist

In the video for this chapter on the Student Study Site, Dana Hunt discusses two of the many lessons she has learned about measurement in a decades-long career in social research. Hunt received her bachelor's degree in sociology from Hood College in Maryland and then earned her PhD in sociology at the University of Pennsylvania. After teaching at Hood for several years, she took an applied research position at National Development and Research Institutes (NDRI) in New York City. NDRI's description on its website gives you an idea of what drew the attention of a talented young social scientist.

Founded in 1967, NDRI is a nonprofit research and educational organization dedicated to advancing scientific knowledge in the areas of drug and alcohol abuse, treatment, and recovery; HIV, AIDS, and HCV (hepatitis C virus); therapeutic communities; youth at risk; and related areas of public health, mental health, criminal justice, urban problems, prevention, and epidemiology.

Hunt moved from New York to the Boston area in 1990, where she is now a principal scientist at Abt Associates, Inc. in Cambridge. Abt Associates's website description conveys the scope of the research projects the company directs.

Abt Associates applies scientific research, consulting, and technical assistance expertise on a wide range of issues in social, economic, and health policy; international development; clinical trials; and registries. As one of the largest for-profit government and business research and consulting firms in the world, Abt Associates delivers practical, measurable, high-value-added results.

Two of Hunt's major research projects in recent years are the nationwide Arrestee Drug Abuse Monitoring Program for the Office of National Drug Control Policy and a study of prostitution and sex trafficking demand reduction for the National Institute of Justice.

Source: Dana Hunt

ANALYZING DATA ETHICALLY: HOW NOT TO LIE ABOUT RELATIONSHIPS

When the data analyst begins to examine relationships among variables in some real data, social science research becomes most exciting. The moment of truth, it would seem, has arrived. Either the hypotheses are supported or not. But, in fact, this is also a time to proceed with caution and to evaluate the analyses of others with even more caution. Once large datasets are entered into a computer, it becomes very easy to check out a great many relationships; when relationships are examined among three or more variables at a time, the possibilities become almost endless.

This range of possibilities presents a great hazard for data analysis. It becomes very tempting to search around in the data until something interesting emerges. Rejected hypotheses are forgotten in favor of highlighting what's going on in the data. It is not wrong to examine data for unanticipated relationships; the problem is that inevitably some relationships between variables will appear on the basis of chance association alone. If you search hard and long enough, it will be possible to come up with something that really means nothing.

A reasonable balance must be struck between deductive data analysis to test hypotheses and inductive analysis to explore patterns in a dataset. Hypotheses formulated in advance of data collection must be tested as they were originally stated; any further analyses of these hypotheses that involve a more exploratory strategy must be labeled as such in research reports. Serendipitous findings do not need to be ignored, but it must be reported that they were serendipitous. Subsequent researchers can try to deductively test the ideas generated by our explorations.

We also have to be honest about the limitations of using survey data to test causal hypotheses. The usual practice for those who seek to test a causal hypothesis with nonexperimental survey data is to test for the relationship between the independent and dependent variables, controlling for other variables that might possibly create spurious relationships. This is what we did by examining the relationship between the perceived certainty of punishment and delinquency while controlling for moral beliefs. But finding that a hypothesized relationship is not altered by controlling for only one variable does not establish that the relationship is causal, nor does controlling for two, three, or many more variables. There always is a possibility that some other variable that we did not think to control, or that was not even measured in the survey, has produced a spurious relationship between the independent and dependent variables in our hypothesis (Lieberson 1985). We must always think about the possibilities and be cautious in our causal conclusions.
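The earlier warning that some relationships will appear by chance alone can be illustrated with simple arithmetic. The sketch below assumes that every test uses the conventional .05 significance level, that the tests are independent of one another, and that no real relationship exists in any of them. These are simplifying assumptions of ours, and the function name chance_of_false_positive is hypothetical.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent tests is
    'significant' when no real relationship exists in any of them."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 10, 20, 100):
    probability = chance_of_false_positive(n_tests)
    expected = n_tests * 0.05  # average number of chance findings
    print(n_tests, "tests:",
          round(probability, 2), "chance of at least one,",
          round(expected, 1), "expected")
```

Under these assumptions, an analyst who checks 20 unrelated relationships has close to a two-in-three chance of finding at least one that looks significant. This is why exploratory findings must be labeled as exploratory.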

CONCLUSION

This chapter has demonstrated how a researcher can describe phenomena in criminal justice and criminology, identify relationships among them, explore the reasons for these relationships, and test hypotheses about them. Statistics provide a remarkably useful tool for developing our understanding of the social world, a tool that we can use to test our ideas and generate new ones.

Unfortunately, to the uninitiated, the use of statistics can seem to end the debate right there; you cannot argue with the numbers. But you now know better than that. The numbers will be worthless if the methods used to generate the data are not valid, and the numbers will be misleading if they are not used appropriately, taking into account the type of data to which they are applied. And even assuming valid methods and proper use of statistics, there is one more critical step, for the numbers do not speak for themselves. Ultimately, it is how we interpret and report the numbers that determines their usefulness. It is this topic we turn to in the next chapter.
