# Hypothesis Testing

❶ Restate the question as a research hypothesis and a null hypothesis about the populations. There are two populations of interest:

Population 1: People who 6 months ago received $10 million.

Population 2: The general population (consisting of people who 6 months ago did not receive $10 million).

The prediction of the personality psychologists, based on their theory of happiness, is that Population 1 people will on the average be happier than Population 2 people: in symbols, \(\mu_1 > \mu_2\). The null hypothesis is that Population 1 people (those who get $10 million) will not be happier than Population 2 people (people in general who do not get $10 million).

❷ Determine the characteristics of the comparison distribution. The comparison distribution is the distribution that represents the population situation if the null hypothesis is true. If the null hypothesis is true, the distributions of Populations 1 and 2 are the same. We know Population 2’s distribution (it is normally distributed with \(\mu = 70\) and \(\sigma = 10\)), so we can use it as the comparison distribution.

❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. What kind of result would be extreme enough to convince us to reject the null hypothesis? In this example, assume that the researchers decided the following in advance: they will reject the null hypothesis as too unlikely if the results would occur less than 5% of the time if this null hypothesis were true. We know that the comparison distribution is a normal curve. Thus, using the normal curve table, we can figure that the top 5% of scores begin at a Z score of about 1.64. The researchers therefore set as the cutoff point for rejecting the null hypothesis a result in which the sample’s Z score on the comparison distribution is at or above 1.64. (The mean of the comparison distribution is 70 and the standard deviation is 10. Therefore, the null hypothesis will be rejected if the sample result is at or above 86.4.)

Figure 4–4 Distribution of happiness scores (fictional data). [Normal curve with \(\mu = 70\) and \(\sigma = 10\); Z scores of −2, −1, 0, +1, and +2 correspond to happiness scores of 50, 60, 70, 80, and 90.]

Statistics for Psychology, Fifth Edition, by Arthur Aron, Elaine N. Aron, and Elliot J. Coups. Published by Prentice Hall. Copyright © 2009 by Pearson Education, Inc. ISBN 0-558-46761-X.

❹ Determine your sample’s score on the comparison distribution. Now for the results: six months after giving the randomly selected person $10 million, the now very wealthy research participant takes the happiness test. The person’s score is 80. As you can see from Figure 4–4, a score of 80 has a Z score of +1 on the comparison distribution.

❺ Decide whether to reject the null hypothesis. The Z score of the sample individual is +1. The researchers set the minimum Z score to reject the null hypothesis at +1.64. Thus, the sample score is not extreme enough to reject the null hypothesis. The experiment is inconclusive; researchers would say the results are “not statistically significant.” Figure 4–5 shows the comparison distribution with the top 5% shaded and the location of the sample participant who received $10 million.
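The five steps for the happiness example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `one_tailed_z_test` is hypothetical, and the cutoff is taken from the standard library's `statistics.NormalDist` rather than from a printed normal curve table, so it is about 1.645 before rounding to the 1.64 used in the text.

```python
from statistics import NormalDist

def one_tailed_z_test(score, mu, sigma, alpha=0.05):
    """Upper-tail test of one score against a known normal population.

    Hypothetical helper for illustration; mu and sigma describe the
    comparison distribution (Population 2).
    """
    z = (score - mu) / sigma                     # Step 4: the sample's Z score
    cutoff_z = NormalDist().inv_cdf(1 - alpha)   # Step 3: Z where the top alpha begins
    cutoff_score = mu + cutoff_z * sigma         # the same cutoff in raw-score units
    reject = z >= cutoff_z                       # Step 5: reject the null hypothesis?
    return z, cutoff_z, cutoff_score, reject

z, cutoff_z, cutoff_score, reject = one_tailed_z_test(80, mu=70, sigma=10)
print(f"Z = {z:.2f}, cutoff Z = {cutoff_z:.2f}, "
      f"cutoff score = {cutoff_score:.1f}, reject = {reject}")
```

With a score of 80, the sample's Z score of 1 falls short of the cutoff, so the sketch reports that the null hypothesis is not rejected, matching the conclusion in Step 5.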

You may be interested to know that Brickman et al. (1978) carried out a more elaborate study based on the same question. They studied lottery winners as examples of people suddenly having a very positive event happen to them. Their results were similar to those in our fictional example: those who won the lottery were not much happier 6 months later than people who did not win the lottery. Also, another group they studied, people who had become paraplegics through a random accident, were not much less happy than other people 6 months later. These researchers concluded that if a major event does have a lasting effect on happiness, it is probably not a very big one. This conclusion is consistent with the findings of more recent studies (e.g., Suh et al., 1996). Indeed, in recent years, a great deal of research has examined what factors contribute to people’s level of happiness. If you are interested in knowing more about this topic, we highly recommend an article by Diener and colleagues (2006) and social psychologist Daniel Gilbert’s (2006) engaging best seller, Stumbling on Happiness.

Figure 4–5 Distribution of happiness scores with upper 5% shaded and showing the location of the sample participant (fictional data). [The cutoff Z score of 1.64 marks the start of the shaded top 5%; the sample participant is at Z = +1, a happiness score of 80.]


How are you doing?

1. A sample of rats in a laboratory is given an experimental treatment intended to make them learn a maze faster than other rats. State (a) the null hypothesis and (b) the research hypothesis.

2. (a) What is a comparison distribution? (b) What role does it play in hypothesis testing?

3. What is the cutoff sample score?

4. Why do we say that hypothesis testing involves a double negative logic?

5. What can you conclude when (a) a result is so extreme that you reject the null hypothesis and (b) a result is not very extreme so that you cannot reject the null hypothesis?

6. A training program to increase friendliness is tried on one individual randomly selected from the general public. Among the general public (who do not get this training program), the mean on the friendliness measure is 30 with a standard deviation of 4. The researchers want to test their hypothesis at the 5% significance level. After going through the training program, this individual takes the friendliness measure and gets a score of 40. What should the researchers conclude?

Answers

1. (a) The population of rats like those that get the experimental treatment score the same on the time to learn the maze as the population of rats in general that do not get the experimental treatment. (b) The population of rats like those that get the experimental treatment learn the maze faster than the population of rats in general that do not get the experimental treatment.

2. (a) A comparison distribution is a distribution to which you compare the results of your study. (b) In hypothesis testing, the comparison distribution is the distribution for the situation when the null hypothesis is true. To decide whether to reject the null hypothesis, you check how extreme the score of your sample is on this comparison distribution, that is, how likely it would be to get a sample with a score this extreme if your sample came from this comparison distribution.

3. The cutoff sample score is the Z score on the comparison distribution beyond which you reject the null hypothesis: if the sample’s Z score is more extreme than the cutoff, the null hypothesis is rejected.

4. We say that hypothesis testing involves a double negative logic because we are interested in the research hypothesis, but we test whether it is true by seeing if we can reject its opposite, the null hypothesis.

5. (a) The research hypothesis is supported when a result is so extreme that you reject the null hypothesis; the result is statistically significant. (b) The result is not statistically significant when a result is not very extreme; the result is inconclusive.

6. The training program increases friendliness. (The cutoff sample Z score on the comparison distribution is 1.64. The actual sample’s Z score of (40 − 30)/4 = 2.50 is more extreme, that is, farther in the tail, than the cutoff Z score. Therefore, reject the null hypothesis; the research hypothesis is supported; the result is statistically significant.)


directional hypothesis: research hypothesis predicting a particular direction of difference between populations (for example, a prediction that the population like the sample studied has a higher mean than the population in general).

one-tailed test: hypothesis-testing procedure for a directional hypothesis; situation in which the region of the comparison distribution in which the null hypothesis would be rejected is all on one side (tail) of the distribution.

nondirectional hypothesis: research hypothesis that does not predict a particular direction of difference between the population like the sample studied and the population in general.

## One-Tailed and Two-Tailed Hypothesis Tests

In our examples so far, the researchers were interested in only one direction of result. In our first example, researchers tested whether babies given the specially purified vitamin would walk earlier than babies in general. In the happiness example, the personality psychologists predicted the person who received $10 million would be happier than other people. The researchers in these studies were not interested in the possibility that giving the specially purified vitamin would cause babies to start walking later or that people getting $10 million might become less happy.

### Directional Hypotheses and One-Tailed Tests

The purified vitamin and happiness studies are examples of testing a directional hypothesis. Both studies focused on a specific direction of effect. When a researcher makes a directional hypothesis, the null hypothesis is also, in a sense, directional. Suppose the research hypothesis is that getting $10 million will make a person happier than the general population. The null hypothesis, then, is that the money will either have no effect or make the person less happy. [In symbols, if the research hypothesis is \(\mu_1 > \mu_2\), then the null hypothesis is \(\mu_1 \le \mu_2\) (“\(\le\)” is the symbol for less than or equal to).] Thus, in Figure 4–5, to reject the null hypothesis, the sample has to have a score in one tail of the comparison distribution: the upper extreme or tail (in this example, the top 5%) of the comparison distribution. (When it comes to rejecting the null hypothesis with a directional hypothesis, a score at the other tail is the same as a score in the middle; that is, such a score does not allow you to reject the null hypothesis.) For this reason, the test of a directional hypothesis is called a one-tailed test. A one-tailed test can be one-tailed in either direction. In the happiness study example, the tail for the predicted effect was at the high end. In the baby study example, the tail for the predicted effect was at the low end (that is, the prediction tested was that babies given the specially purified vitamin would start walking unusually early).

### Nondirectional Hypotheses and Two-Tailed Tests

Sometimes, a research hypothesis states that an experimental procedure will have an effect, without saying whether it will produce a very high score or a very low score. Suppose an organizational psychologist is interested in how a new social skills program will affect productivity. The program could either improve productivity by making the working environment more pleasant or hurt productivity by encouraging people to socialize instead of work. The research hypothesis is that the social skills program changes the level of productivity; the null hypothesis is that the program does not change productivity one way or the other. In symbols, the research hypothesis is \(\mu_1 \neq \mu_2\) (“\(\neq\)” is the symbol for not equal); the null hypothesis is \(\mu_1 = \mu_2\).

When a research hypothesis predicts an effect but does not predict a direction for the effect, it is called a nondirectional hypothesis. To test the significance of a nondirectional hypothesis, you have to consider the possibility that the sample could be extreme at either tail of the comparison distribution. Thus, this is called a two-tailed test.

two-tailed test: hypothesis-testing procedure for a nondirectional hypothesis; the situation in which the region of the comparison distribution in which the null hypothesis would be rejected is divided between the two sides (tails) of the distribution.


### Determining Cutoff Scores with Two-Tailed Tests

There is a special complication in a two-tailed test. You have to divide the significance percentage between the two tails. For example, with a 5% significance level, you reject a null hypothesis only if the sample is so extreme that it is in either the top 2.5% or the bottom 2.5% of the comparison distribution. This keeps the overall level of significance at a total of 5%.

Note that a two-tailed test makes the cutoff Z scores for the 5% level +1.96 and −1.96. For a one-tailed test at the 5% level, the cutoff is not so extreme: only +1.64 or −1.64. But with a one-tailed test, only one side of the distribution is considered. These situations are shown in Figure 4–6a.

Using the 1% significance level, a two-tailed test (.5% at each tail) has cutoffs of +2.58 and −2.58, while a one-tailed test’s cutoff is either +2.33 or −2.33. These situations are shown in Figure 4–6b. The Z score cutoffs for one-tailed and two-tailed tests for the .05 and .01 significance levels are also summarized in Table 4–2.

Figure 4–6 Significance level cutoffs for one-tailed and two-tailed tests: (a) .05 significance level; (b) .01 significance level. (The one-tailed tests in these examples assume the prediction was for a high score. You could instead have a one-tailed test where the prediction is for the lower, left tail.) [Panel (a): one-tailed cutoff at Z = 1.64; two-tailed cutoffs at Z = −1.96 and 1.96, with .025 in each tail. Panel (b): one-tailed cutoff at Z = 2.33; two-tailed cutoffs at Z = −2.58 and 2.58, with .005 in each tail.]


Table 4–2 One-Tailed and Two-Tailed Cutoff Z Scores for the .05 and .01 Significance Levels

| Significance Level | One-Tailed Test | Two-Tailed Test |
|---|---|---|
| .05 | −1.64 or 1.64 | −1.96 and 1.96 |
| .01 | −2.33 or 2.33 | −2.58 and 2.58 |
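The cutoffs in Table 4–2 can be reproduced from the standard normal curve. The sketch below is one way to do that, assuming Python's standard-library `statistics.NormalDist`; the helper name `cutoff_z` is hypothetical. The only difference between the two kinds of test is how much area is placed in each tail: all of the significance level for a one-tailed test, half of it for a two-tailed test.

```python
from statistics import NormalDist

def cutoff_z(alpha, two_tailed):
    """Positive cutoff Z score for significance level alpha (hypothetical helper).

    A two-tailed test splits alpha evenly between the tails, so each tail
    holds alpha / 2; a one-tailed test puts all of alpha in a single tail.
    """
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

for alpha in (0.05, 0.01):
    one = cutoff_z(alpha, two_tailed=False)
    two = cutoff_z(alpha, two_tailed=True)
    print(f"alpha = {alpha}: one-tailed {one:.2f} (or -{one:.2f}), "
          f"two-tailed -{two:.2f} and {two:.2f}")
```

Rounded to two decimals, the values agree with the table. The unrounded one-tailed cutoff at the .05 level is closer to 1.645, which is why some tables list 1.65 instead of 1.64.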

### When to Use One-Tailed or Two-Tailed Tests

If the researcher decides in advance to use a one-tailed test, then the sample’s score does not need to be so extreme to be significant compared to what would be needed with a two-tailed test. Yet there is a price. If the result is extreme in the direction opposite to what was predicted, no matter how extreme, the result cannot be considered statistically significant.

In principle, you plan to use a one-tailed test when you have a clearly directional hypothesis and a two-tailed test when you have a clearly nondirectional hypothesis. In practice, the decision is not so simple. Even when a theory clearly predicts a particular result, the actual result may come out opposite to what you expected. Sometimes, the opposite may be more interesting than what you had predicted. (For example, what if, as in all the fairy tales about wish-granting genies and fish, receiving $10 million and being able to fulfill almost any desire had made that individual miserable?) By using one-tailed tests, we risk having to ignore possibly important results.

For these reasons, researchers disagree about whether one-tailed tests should be used, even when there is a clearly directional hypothesis. To be safe, many researchers use two-tailed tests for both nondirectional and directional hypotheses. If the two-tailed test is significant, then the researcher looks at the result to see the direction and considers the study significant in that direction. In practice, always using two-tailed tests is a conservative procedure because the cutoff scores are more extreme for a two-tailed test and so it is less likely that a two-tailed test will give a significant result. Thus, if you do get a significant result with a two-tailed test, you are more confident about the conclusion. In fact, in most psychology research articles, unless the researcher specifically states that a one-tailed test was used, it is assumed that the test was two-tailed.

In practice, however, our experience is that most research results are either so extreme that they will be significant whether you use a one-tailed or two-tailed test or so far from extreme that they would not be significant in either kind of test. But what happens when a result is less certain? The researcher’s decision about one- or two-tailed tests now can make a big difference. In this situation the researcher tries to use the type of test that will give the most accurate and noncontroversial conclusion. The idea is to let nature, not a researcher’s decisions, determine the conclusion as much as possible. Further, whenever a result is less than completely clear one way or the other, most researchers are not comfortable drawing strong conclusions until more research is done.
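To see when the choice of test matters, it can help to run a few Z scores through both kinds of test at the .05 level. The sketch below is illustrative only: the function `is_significant` and the four Z scores are made-up values chosen to cover the cases described above, and the one-tailed test assumes the prediction was for a high score.

```python
from statistics import NormalDist

def is_significant(z, alpha=0.05, tails="two"):
    """Hypothetical helper: is Z beyond the cutoff for the chosen test?

    The one-tailed version assumes a predicted HIGH score (upper tail only).
    """
    std_normal = NormalDist()
    if tails == "two":
        return abs(z) >= std_normal.inv_cdf(1 - alpha / 2)   # either tail counts
    return z >= std_normal.inv_cdf(1 - alpha)                # upper tail only

# Made-up Z scores: borderline, clearly extreme, clearly not extreme,
# and extreme in the direction opposite to the prediction.
for z in (1.80, 3.00, 0.50, -3.00):
    one = is_significant(z, tails="one")
    two = is_significant(z, tails="two")
    print(f"Z = {z:+.2f}: one-tailed significant = {one}, two-tailed significant = {two}")
```

Only the borderline score (Z = 1.80) and the score in the unpredicted direction (Z = −3.00) give different answers under the two tests; the clearly extreme and clearly unremarkable scores come out the same either way.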

### Example of Hypothesis Testing with a Two-Tailed Test

Here is one more fictional example, this time using a two-tailed test. Clinical psychologists at a residential treatment center have developed a new type of therapy to reduce depression that they believe is more effective than the current therapy. However, as with any treatment, it could make patients’ depression worse. Thus, the clinical psychologists make a nondirectional hypothesis.

The psychologists randomly select an incoming patient to receive the new form of therapy instead of the usual therapy. (In a real study, of course, more than one patient would be selected, but let’s assume that only one person has been trained to do the new therapy and she has time to treat only one patient.) After 4 weeks, the patient fills out a standard depression scale that is given automatically to all patients after 4 weeks. The standard scale has been given at this treatment center for a long time. Thus, the psychologists know in advance the distribution of depression scores at 4 weeks for those who receive the usual therapy: it follows a normal curve with a mean of 69.5 and a standard deviation of 14.1. [These figures correspond roughly to the depression scores found in a national survey of 75,000 psychiatric patients given a widely used standard test (Dahlstrom et al., 1986).] This distribution is shown in Figure 4–7.

Figure 4–7 Distribution of depression scores at 4 weeks after admission for diagnosed depressed psychiatric patients receiving the standard therapy (fictional data). [Normal curve with a mean of 69.5; Z scores of −2, −1, 0, +1, and +2 correspond to depression scores of 41.3, 55.4, 69.5, 83.6, and 97.7.]

The clinical psychologists then carry out the five steps of hypothesis testing.

❶ Restate the question as a research hypothesis and a null hypothesis about the populations. There are two populations of interest:

Population 1: Patients diagnosed as depressed who receive the new therapy.

Population 2: Patients diagnosed as depressed in general (who receive the usual therapy).

The research hypothesis is that when measured on depression 4 weeks after admission, patients who receive the new therapy (Population 1) will on the average score differently from patients who receive the current therapy (Population 2). In symbols, the research hypothesis is \(\mu_1 \neq \mu_2\). The opposite of the research hypothesis, the null hypothesis, is that patients who receive the new therapy will have the same average depression level as the patients who receive the usual therapy. (That is, the depression level measured after 4 weeks will have the same mean for Populations 1 and 2.) In symbols, the null hypothesis is \(\mu_1 = \mu_2\).

Tip for success: Remember that the research hypothesis and null hypothesis must always be complete opposites. Researchers specify the research hypothesis and this determines the null hypothesis that goes with it.

❷ Determine the characteristics of the comparison distribution. If the null hypothesis is true, the distributions of Populations 1 and 2 are the same. We know the distribution of Population 2 (it is the one shown in Figure 4–7). Thus, we can use Population 2 as our comparison distribution. As noted, it follows a normal curve, with \(\mu = 69.5\) and \(\sigma = 14.1\).

❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. The clinical psychologists select the 5% significance level. They have made a nondirectional hypothesis and will therefore use a two-tailed test. Thus, they will reject the null hypothesis only if the patient’s depression score is in either the top or bottom 2.5% of the comparison distribution. In terms of Z scores, these cutoffs are +1.96 and −1.96 (see Figure 4–6 and Table 4–2).

❹ Determine your sample’s score on the comparison distribution. The patient who received the new therapy was measured 4 weeks after admission. The patient’s score on the depression scale was 41, which is a Z score on the comparison distribution of −2.02. That is, \(Z = (X - M)/SD = (41 - 69.5)/14.1 = -2.02\). Figure 4–8 shows the distribution of Population 2 for this study, with the upper and lower 2.5% areas shaded; the depression score of the sample patient is also shown.

❺ Decide whether to reject the null hypothesis. A Z score of −2.02 is slightly more extreme than a Z score of −1.96, which is where the lower 2.5% of the comparison distribution begins. Notice in Figure 4–8 that the Z score of −2.02 falls within the shaded area in the left tail of the comparison distribution. This Z score of −2.02 is a result so extreme that it is unlikely to have occurred if this patient were from a population no different from Population 2. Therefore, the clinical psychologists reject the null hypothesis. The result is statistically significant, and it supports the research hypothesis that depressed patients receiving the new therapy have different depression levels than depressed patients in general who receive the usual therapy.

Figure 4–8 Distribution of depression scores with upper and lower 2.5% shaded and showing the sample patient who received the new therapy (fictional data). [The sample patient’s depression score of 41 (Z = −2.02) lies just beyond the lower cutoff Z score of −1.96; the upper cutoff Z score is 1.96.]

Tip for success: When carrying out the five steps of hypothesis testing, always draw a figure like Figure 4–8. Be sure to include the cutoff score(s) and shade the appropriate tail(s). If the sample score falls inside a shaded tail region, you can reject the null hypothesis and the result is statistically significant. If the sample score does not fall inside a shaded tail region, you cannot reject the null hypothesis.
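The depression example can be checked with a short two-tailed version of the same calculation. As before, this is a sketch under assumptions rather than the clinicians' actual procedure: `two_tailed_z_test` is a hypothetical name, and the cutoff is computed with the standard library's `statistics.NormalDist` instead of being read from Table 4–2.

```python
from statistics import NormalDist

def two_tailed_z_test(score, mu, sigma, alpha=0.05):
    """Two-tailed test of one score against a known normal population.

    Hypothetical helper; mu and sigma describe the comparison distribution.
    """
    z = (score - mu) / sigma                       # Step 4: the sample's Z score
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)   # Step 3: alpha / 2 in each tail
    reject = abs(z) >= cutoff                      # Step 5: extreme in either tail?
    return z, cutoff, reject

z, cutoff, reject = two_tailed_z_test(41, mu=69.5, sigma=14.1)
print(f"Z = {z:.2f}, cutoffs = -{cutoff:.2f} and +{cutoff:.2f}, reject = {reject}")
```

Taking the absolute value of Z is what makes the test two-tailed: a score far below the mean, like this patient's, counts as extreme just as a score far above the mean would.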


How are you doing?

1. What is a nondirectional hypothesis test?

2. What is a two-tailed test?

3. Why do you use a two-tailed test when testing a nondirectional hypothesis?

4. What is the advantage of using a one-tailed test when your theory predicts a particular direction of result?

5. Why might you use a two-tailed test even when your theory predicts a particular direction of result?

6. A researcher predicts that making people hungry will affect how they do on a coordination test. A randomly selected person is asked not to eat for 24 hours before taking a standard coordination test and gets a score of 400. For people in general of this age group and gender, tested under normal conditions, coordination scores are normally distributed with a mean of 500 and a standard deviation of 40. Using the .01 significance level, what should the researcher conclude?

Answers

1. A nondirectional hypothesis test is a hypothesis test in which you do not predict a particular direction of difference.

2. A two-tailed test is one in which the overall percentage for the cutoff is divided between the two tails of the comparison distribution. A two-tailed test is used to test the significance of a nondirectional hypothesis.

3. You use a two-tailed test when testing a nondirectional hypothesis because an extreme result in either direction supports the research hypothesis.

4. The cutoff for a one-tailed test is not so extreme; thus, if your result comes out in the predicted direction, it is more likely to be significant. The cutoff is not so extreme because the entire percentage (say 5%) is put in one tail instead of being divided between two tails.

5. It lets you count as significant an extreme result in either direction; if you used a one-tailed test and the result came out opposite to the prediction, it could not be called statistically significant.

6. The cutoffs are +2.58 and −2.58. The sample person’s Z score is (400 − 500)/40 = −2.5. The result is not significant; the study is inconclusive.

## Controversy: Should Significance Tests Be Banned?

In recent years, there has been a major controversy about significance testing itself, with a concerted movement on the part of a small but vocal group of psychologists to ban significance tests completely! This is a radical suggestion with far-reaching implications: for at least half a century, nearly every research study in psychology has used significance tests. There probably has been more written in the major psychology journals in the last dozen years or so about this controversy than ever before in history about any issue having to do with statistics.

The discussion has gotten so heated that one article began as follows:

It is not true that a group of radical activists held 10 statisticians and six editors hostage at the . . . convention of the American Psychological Society and chanted, “Support the total test ban!” and “Nix the null!” (Abelson, 1997, p. 12)


Since this is by far the most important controversy in years regarding statistics as used in psychology, we discuss the issues in at least three different places. In this chapter we focus on some basic challenges to hypothesis testing. In Chapters 5 and 6, we cover other topics that relate to aspects of hypothesis testing that you will learn about in those chapters.

Before discussing this controversy, you should be reassured that you are not learning about hypothesis testing for nothing. Whatever happens in the future, you absolutely have to understand hypothesis testing to make sense of virtually every research article published in the past. Further, in spite of the controversy that has raged for more than a decade, it is extremely rare to see new articles that do not use significance testing. Thus, it is doubtful that any major shifts will occur in the near future. Finally, even if hypothesis testing is completely abandoned, the alternatives (which involve procedures you will learn about in Chapters 5 and 6) require understanding virtually all of the logic and procedures we are covering here.

So what is the big controversy? Some of the debate concerns subtle points of logic. For example, one issue relates to whether it makes sense to worry about rejecting the null hypothesis when a hypothesis of no effect whatsoever is extremely unlikely to be true. Another issue is about the foundation of hypothesis testing in terms of populations and samples, since in most experiments the samples we use are not randomly selected from any definable population. We discussed some points relating to this issue in Chapter 3. Finally, some have questioned the appropriateness of concluding that if the data are inconsistent with the null hypothesis, this should be counted as evidence for the research hypothesis. This controversy becomes rather technical, but our own view is that, given recent considerations of the issues, the way researchers in psychology use hypothesis testing is reasonable (Balluerka et al., 2005; Iacobucci, 2005; Nickerson, 2000).

However, the biggest complaint against significance tests, and the one that has received almost universal agreement, is that they are misused (Balluerka et al., 2005). In fact, opponents of significance tests argue that even if there were no other problems with the tests, they should be banned simply because they are so often and so badly misused. They are misused in two main ways: one we can consider now; the other must wait until we have covered a topic you learn in Chapter 6.

A major misuse of significance tests is the tendency for researchers to decide that if a result is not significant, the null hypothesis is shown to be true (see Box 4–1). We have emphasized that when you can’t reject the null hypothesis, the results are simply inconclusive. The error of concluding the null hypothesis is true from failing to reject it is extremely serious, because important theories and methods may be considered false just because a particular study did not get strong enough results. [You learn in Chapter 6 that it is quite easy for a true research hypothesis not to come out significant just because there were too few people in the study or the measures were not very accurate. In fact, Hunter (1997) argues that in about 60% of psychology studies, we are likely to get nonsignificant results even when the research hypothesis is actually true.]
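A small calculation shows why a nonsignificant result cannot be read as proof of the null hypothesis. The sketch below reuses the comparison distribution from the happiness example (mean 70, standard deviation 10, one-tailed test at the 5% level) and then assumes, purely for illustration, that the research hypothesis is true and the money raises average happiness to 80. That assumed mean is hypothetical and is not a figure from any study discussed here.

```python
from statistics import NormalDist

mu_null, sigma, alpha = 70, 10, 0.05   # comparison distribution from the happiness example
assumed_true_mean = 80                 # ASSUMPTION: hypothetical real effect of one SD

# Raw-score cutoff for rejecting the null hypothesis (upper 5% of the comparison distribution)
cutoff_score = NormalDist(mu_null, sigma).inv_cdf(1 - alpha)

# Chance that one person drawn from the shifted population scores at or above that cutoff
p_significant = 1 - NormalDist(assumed_true_mean, sigma).cdf(cutoff_score)

print(f"cutoff score = {cutoff_score:.1f}")
print(f"chance of a significant result although the effect is real = {p_significant:.2f}")
```

Under these assumptions the study would reach significance only about one time in four, so most runs of this one-person study would be inconclusive even though the effect is real.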

What should be done? The general consensus seems to be that we should keep significance tests, but better train our students not to misuse them (hence the emphasis on these points in this book). We should not, as it were, throw the baby out with the bathwater. To address this controversy, the American Psychological Association (APA) established a committee of eminent psychologists renowned for their statistical expertise. The committee met over a two-year period, circulated a preliminary report, and considered reactions to it from a large number of researchers. In the end, they strongly condemned various misuses of significance testing of the kind we have been discussing, but they left its use up to the decision of each researcher. In their report they concluded:

Some had hoped that this task force would vote to recommend an outright ban on the use of significance tests in psychology journals. Although this might eliminate some abuses, the committee thought there were enough counterexamples (e.g., Abelson, 1997) to justify forbearance. (Wilkinson & Task Force on Statistical Inference, 1999, pp. 602–603)

Balluerka and colleagues (2005) reviewed the arguments for and against significance testing. Their conclusion, with which we agree (as do probably most psychology researchers), is that “. . . rigorous research activity requires the use of . . . [significance testing] in the appropriate context, the complementary use of other methods which provide information about aspects not addressed by . . . [significance testing], and adherence to a series of recommendations which promote its rational use in psychological research” (p. 55).

BOX 4–1 Jacob Cohen, the Ultimate New Yorker: Funny, Pushy, Brilliant, and Kind

New Yorkers can be proud of Jacob Cohen, who single-handedly introduced to behavioral and social scientists some of our most important statistical tools. Never worried about being popular (although he was), he almost single-handedly forced the current debate over significance testing, which he liked to joke was entrenched like a “secular religion.” About the asterisk that accompanies a significant result, he said the religion must be “of Judeo-Christian derivation, as it employs as its most powerful icon a six-pointed cross” (1990, p. 1307).

Jacob entered graduate school at New York University (NYU) in clinical psychology in 1947 and three years later had a master’s and a doctorate. He then worked in rather lowly roles for the Veterans Administration, doing research on various practical topics, until he returned to NYU in 1959. There he became a very famous faculty member because of his creative, offbeat ideas about statistics. Amazingly, he made his contributions having no mathematics training beyond high school algebra.

But a lack of formal training may have been Jacob Cohen’s advantage because he emphasized looking at data and thinking about them, not just applying a standard analysis. In particular, he demonstrated that the standard methods were not working very well, especially for the “soft” fields of psychology such as clinical, personality, and social psychology. Many of his ideas were hailed as great breakthroughs. Starting in the 1990s he really began to force the issue of the mindless use of significance testing. But he still used humor to tease behavioral and social scientists for their failure to see the problems inherent in the arbitrary yes-no decision feature of null hypothesis testing. For example, he liked to remind everyone that significance testing came out of Sir Ronald Fisher’s work in agriculture (see Box 9–1), in which the decisions were yes-no matters such as whether a crop needed manure. He pointed out that behavioral and social scientists “do not deal in manure, at least not knowingly” (Cohen, 1990, p. 1307)! He really disliked the fact that Fisher-style decision making is used to determine the fate of not only doctoral dissertations, research funds, publications, and promotions, “but whether to have a baby just now” (1990, p. 1307). And getting more serious, he charged that significance testing’s “arbitrary unreasonable tyranny has led to data fudging of varying degrees of subtlety, from grossly altering data to dropping cases where there ‘must have been’ errors” (p. 1307).

Cohen was active in many social causes, especially desegregation in the schools and fighting discrimination in police departments. He cared passionately about everything he did. He was deeply loved. And he suffered from major depression, becoming incapacitated by it four times in his life.

Got troubles? Got no more math than high school algebra? It doesn’t have to stop you from contributing to science.


Hypothesis Tests in Research Articles

In general, hypothesis testing is reported in research articles using one of the specific methods of hypothesis testing you learn in later chapters. For each result of interest, the researcher usually first indicates whether the result was statistically significant. (Note that, as with the first of the following examples, the researcher will not necessarily use the word significant; so look out for other indicators, such as reporting that scores on a variable decreased, increased, or were associated with scores on another variable.) Next, the researcher usually gives the symbol associated with the specific method used in figuring the probability that the result would have occurred if the null hypothesis was true, such as t, F, r, or χ² (see Chapters 7 to 13). Finally, there will be an indication of the significance level, such as p < .05 or p < .01. (The researcher will usually also provide other information, such as the mean and standard deviation of sample scores.) For example, in a study of competitive Scrabble players, Halpern and Wai (2007) reported: “Contrary to expectations, the number of correctly defined words correlated significantly with participants’ official Scrabble rating, r(21) = .45, p < .05, showing a moderate relationship (Cohen & Cohen, 1983), with higher-rated players defining more words correctly.” There is a lot here that you will learn about in later chapters, but the key thing to understand now about this result is the “p < .05.” This means that the probability of the results if the null hypothesis were true is less than .05 (5%).

When a result is close but does not reach the significance level chosen, it may be reported anyway as a “near significant trend” or as having “approached significance.” When a result is not even close to being extreme enough to reject the null hypothesis, it may be reported as “not significant,” or the abbreviation ns will be used. Finally, whether or not a result is significant, it is increasingly common for researchers to report the exact p level, such as p = .03 or p = .27 (these are given in computer outputs of hypothesis testing results). The p reported here is based on the proportion of the comparison distribution that is more extreme than the sample score, information that you could figure from the Z score for your sample and a normal curve table.
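As a rough illustration of where an exact p comes from, the sketch below converts a Z score into the proportion of a normal comparison distribution that is at least as extreme. It uses Python's standard-library `statistics.NormalDist` in place of a printed normal curve table. The function name `exact_p` and the example Z score of 2 are assumptions made for illustration; they are not values from any study discussed here.

```python
from statistics import NormalDist

def exact_p(z, tails=2):
    """Return the proportion of a standard normal curve at least as extreme as z.

    Assumes the comparison distribution is normal. For a one-tailed test
    (tails=1) this also assumes the score fell in the predicted direction.
    """
    # Area beyond |z| in one tail of the standard normal curve.
    one_tail_area = 1 - NormalDist().cdf(abs(z))
    return tails * one_tail_area

# Hypothetical sample Z score of 2 (illustration only).
print(round(exact_p(2.0, tails=2), 4))  # 0.0455
print(round(exact_p(2.0, tails=1), 4))  # 0.0228
```

A result would then be called significant when this exact p is less than the significance level chosen in advance (for example, .05).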

A researcher will usually note if he or she used a one-tailed test. When reading research articles, assume the researcher used a two-tailed test if nothing is said otherwise. Even though a researcher has chosen a significance level in advance, such as .05, the researcher may note that results meet more rigorous standards. Thus, in the same article, you may see some results noted as “p < .05,” others as “p < .01,” and still others as “p < .001.”

Finally, often researchers show hypothesis testing results only as asterisks (stars) in a table of results. In such tables, a result with an asterisk means it is significant, while a result without an asterisk is not. For example, Table 4–3 shows the results of part of a study by Bohnert and colleagues (2007) comparing various aspects of social adjustment to college of male and female college students during the summer before their first year of college (Time 1) and 10 months later (Time 2). The table gives figures for means, standard deviations, and t statistics; the “t(83)” is about details of the specific hypothesis testing procedure used in this study called a t test, which you will learn in Chapters 7 and 8. The important things to look at now are the asterisks (and the notes at the bottom of the table that go with them). The asterisks tell you the significance levels for the various comparisons. For example, females had a higher level of friendship quality at Time 1 (M = 2.82) than males (M = 2.49); thus there are three asterisks at the end of the row for this result, which the note at the bottom tells you means that the probability of getting this big a difference if the null hypothesis was true is less than one in a thousand (.001). At Time 1, males reported being more lonely (M = 39.30) than females (M = 34.78), but you can see that there was no significant gender difference in loneliness at Time 2 (the means were 37.88 and 34.71, and the lack of an asterisk in this row indicates that these were not different enough to be significant in this study). At Time 2, females again reported a significantly higher level of friendship quality (M = 3.21) than males (M = 2.84); the asterisks show that the difference was significant at the .001 (one in a thousand) level.
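The asterisk convention described above can be sketched as a small lookup. This is only an illustration of the usual *, **, *** notation at the .05, .01, and .001 levels; the function name `significance_stars` and the p values in the loop are hypothetical and are not taken from Table 4–3.

```python
def significance_stars(p):
    """Map a p value to the asterisk notation commonly used in results tables."""
    if p < .001:
        return "***"
    if p < .01:
        return "**"
    if p < .05:
        return "*"
    return ""  # no asterisk: not significant at the .05 level

# Hypothetical p values (illustration only).
for p in (.0004, .008, .03, .27):
    print(f"p = {p}: {significance_stars(p)!r}")
```

Always check the note under a particular table, since an article may define its asterisks differently.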

In reporting results of significance testing, researchers rarely talk explicitly about the research hypothesis or the null hypothesis, nor do they describe any of the other steps of the process in detail. It is assumed that readers of psychology research understand all of this very well.

Table 4–3 Means and Standard Deviation for Main Study Variables by Gender

| Variable | Total (n = 85) M | Total (n = 85) SD | Males (n = 31) M | Males (n = 31) SD | Females (n = 54) M | Females (n = 54) SD | t(83) |
|---|---|---|---|---|---|---|---|
| Adolescence (Time 1) | | | | | | | |
| Friendship quality | 2.70 | 0.40 | 2.49 | 0.46 | 2.82 | 0.32 | 13.98*** |
| Loneliness | 36.39 | 8.71 | 39.30 | 9.98 | 34.78 | 7.56 | 5.47* |
| Emerging adulthood (Time 2) | | | | | | | |
| Friendship quality | 3.10 | 0.48 | 2.84 | 0.57 | 3.21 | 0.38 | 11.31*** |
| Loneliness | 35.84 | 9.98 | 37.88 | 11.38 | 34.71 | 9.21 | 1.76 |
| Activities: Intensity | 8.09 | 8.27 | 10.00 | 10.19 | 7.18 | 7.18 | 0.98 |
| Activities: Breadth | 1.71 | 1.06 | 1.84 | 1.18 | 1.65 | 1.01 | 0.51 |

*p < .05. **p < .01. ***p < .001.

Source: Bohnert, A. M., Aikins, J. W., & Edidin, J. (2007). The role of organized activities in facilitating social adaptation across the transition to college. Journal of Adolescent Research, 22, 189–208. Sage Publications, Ltd. Reprinted by permission of Sage Publications, Thousand Oaks, London, and New Delhi.

Summary

1. Hypothesis testing considers the probability that the result of a study could have come about even if the experimental procedure had no effect. If this probability is low, the scenario of no effect is rejected and the hypothesis behind the experimental procedure is supported.

2. The expectation of an effect is the research hypothesis, and the hypothetical situation of no effect is the null hypothesis.

3. When a result (that is, a sample score) is so extreme that the result would be very unlikely if the null hypothesis were true, the researcher rejects the null hypothesis and describes the research hypothesis as supported. If the result is not that extreme, the researcher does not reject the null hypothesis, and the study is inconclusive.

4. Psychologists usually consider a result too extreme if it is less likely than 5% (that is, a significance level of p < .05) to have come about if the null hypothesis were true. Psychologists sometimes use a more stringent 1% (p < .01 significance level), or even .1% (p < .001 significance level), cutoff.


5. The cutoff percentage is the probability of the result being extreme in a predicted direction in a directional or one-tailed test. The cutoff percentages are the probability of the result being extreme in either direction in a nondirectional or two-tailed test.

6. The five steps of hypothesis testing are:
   ❶ Restate the question as a research hypothesis and a null hypothesis about the populations.
   ❷ Determine the characteristics of the comparison distribution.
   ❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected.
   ❹ Determine your sample’s score on the comparison distribution.
   ❺ Decide whether to reject the null hypothesis.

7. There has been much controversy about significance tests, including critiques of the basic logic and, especially, that they are often misused. One major way researchers misuse significance tests is by interpreting not rejecting the null hypothesis as demonstrating that the null hypothesis is true.

8. Research articles typically report the results of hypothesis testing by saying a result was or was not significant and giving the probability level cutoff (usually 5% or 1%) that the decision was based on.
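To make the one-tailed versus two-tailed cutoffs in point 5 concrete, here is a minimal sketch that figures cutoff Z scores from a significance level, assuming a normal comparison distribution. It uses Python's standard-library `statistics.NormalDist`; the function name `cutoff_z` and its parameters are assumptions for illustration. Values computed this way can differ in the last decimal place from those printed in a rounded normal curve table.

```python
from statistics import NormalDist

def cutoff_z(alpha, tails=2, predicted="high"):
    """Return the cutoff Z score(s) for a given significance level.

    tails=2 puts alpha / 2 in each tail; tails=1 puts all of alpha in the
    tail named by `predicted` ("high" or "low").
    """
    std_normal = NormalDist()  # standard normal curve: mean 0, SD 1
    if tails == 2:
        z = std_normal.inv_cdf(1 - alpha / 2)
        return (-z, z)
    z = std_normal.inv_cdf(1 - alpha)
    return (z,) if predicted == "high" else (-z,)

for alpha in (.05, .01):
    print(alpha, "one-tailed:", [round(z, 2) for z in cutoff_z(alpha, tails=1)])
    print(alpha, "two-tailed:", [round(z, 2) for z in cutoff_z(alpha, tails=2)])
```

Note that for the same significance level, the two-tailed cutoffs lie farther from the mean than the one-tailed cutoff, because the percentage is split between the two tails.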

Key Terms

- hypothesis testing (p. 107)
- hypothesis (p. 107)
- theory (p. 107)
- research hypothesis (p. 110)
- null hypothesis (p. 110)
- comparison distribution (p. 111)
- cutoff sample score (p. 111)
- conventional levels of significance (p < .05, p < .01) (p. 113)
- statistically significant (p. 113)
- directional hypothesis (p. 119)
- one-tailed test (p. 119)
- nondirectional hypothesis (p. 119)
- two-tailed test (p. 119)

Example Worked-Out Problems

A randomly selected individual, after going through an experimental treatment, has a score of 27 on a particular measure. The scores of people in general on this measure are normally distributed with a mean of 19 and a standard deviation of 4. The researcher predicts an effect, but does not predict a particular direction of effect. Using the 5% significance level, what should you conclude? Solve this problem explicitly using all five steps of hypothesis testing and illustrate your answer with a sketch showing the comparison distribution, the cutoff (or cutoffs), and the score of the sample on this distribution.

Answer

❶ Restate the question as a research hypothesis and a null hypothesis about the populations. There are two populations of interest:

Population 1: People who go through the experimental procedure.
Population 2: People in general (that is, people who do not go through the experimental procedure).

The research hypothesis is that Population 1 will score differently than Population 2 on the particular measure. The null hypothesis is that the two populations are not different on the measure.


Figure 4–9 Diagram for Example Worked-Out Problem showing comparison distribution, cutoffs (2.5% shaded area in each tail), and sample score. [Figure: normal curve with raw scores 11, 15, 19, 23, and 27 at Z scores −2, −1, 0, +1, and +2; cutoff Z scores at −1.96 and +1.96; sample participant at raw score = 27, Z score = 2.]

❷ Determine the characteristics of the comparison distribution: μ = 19, σ = 4, normally distributed.

❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. For a two-tailed test at the 5% level (2.5% at each tail), the cutoff scores are +1.96 and −1.96 (see Figure 4–6 or Table 4–2).

❹ Determine your sample’s score on the comparison distribution. Z = (27 − 19)/4 = 2.

❺ Decide whether to reject the null hypothesis. A Z score of 2 is more extreme than the cutoff Z of ±1.96. Reject the null hypothesis; the result is significant. The experimental procedure affects scores on this measure. The diagram is shown in Figure 4–9.
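The figuring in Steps ❷ through ❺ can be checked with a short script. This is a minimal sketch, not part of the worked answer: it assumes a two-tailed test of a single score against a known normal population, uses Python's standard-library `statistics.NormalDist` instead of a normal curve table, and the function name `single_score_z_test` is hypothetical.

```python
from statistics import NormalDist

def single_score_z_test(score, pop_mean, pop_sd, alpha=0.05):
    """Two-tailed Z test of one participant's score against a known normal population."""
    # Step 2: the comparison distribution is normal with mean pop_mean and SD pop_sd.
    # Step 3: two-tailed cutoff Z score, with alpha / 2 in each tail.
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 4: the sample's Z score on the comparison distribution.
    z = (score - pop_mean) / pop_sd
    # Step 5: reject the null hypothesis if Z is more extreme than either cutoff.
    reject = abs(z) > cutoff
    return z, cutoff, reject

z, cutoff, reject = single_score_z_test(score=27, pop_mean=19, pop_sd=4)
print(f"Z = {z:.2f}, cutoffs = ±{cutoff:.2f}, reject null: {reject}")
# Z = 2.00, cutoffs = ±1.96, reject null: True
```

Step ❶ (stating the populations and hypotheses) has no computation, so it appears only in the choice of a two-tailed test.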

Outline for Writing Essays for Hypothesis-Testing Problems Involving a Single Sample of One Participant and a Known Population

1. Describe the core logic of hypothesis testing. Be sure to explain terminology such as research hypothesis and null hypothesis, and explain the concept of providing support for the research hypothesis when the study results are strong enough to reject the null hypothesis.

2. Explain the concept of the comparison distribution. Be sure to mention that it is the distribution that represents the population situation if the null hypothesis is true. Note that the key characteristics of the comparison distribution are its mean, standard deviation, and shape.

3. Describe the logic and process for determining (using the normal curve) the cutoff sample scores on the comparison distribution at which you should reject the null hypothesis.

4. Describe how to figure the sample’s score on the comparison distribution.

5. Explain how and why the scores from Steps ❸ and ❹ of the hypothesis-testing process are compared. Explain the meaning of the result of this comparison with regard to the specific research and null hypotheses being tested.

Practice Problems

These problems involve figuring. Most real-life statistics problems are done on a computer with special statistical software. Even if you have such software, do these problems by hand to ingrain the method in your mind.

All data are fictional unless an actual citation is given.

Set I (for Answers to Set I Problems, see pp. 675–677)

1. Define the following terms in your own words: (a) hypothesis-testing procedure, (b) .05 significance level, and (c) two-tailed test.

2. When a result is not extreme enough to reject the null hypothesis, explain why it is wrong to conclude that your result supports the null hypothesis.

3. For each of the following, (a) say which two populations are being compared, (b) state the research hypothesis, (c) state the null hypothesis, and (d) say whether you should use a one-tailed or two-tailed test and why.

i. Do Canadian children whose parents are librarians score higher than Canadian children in general on reading ability?

ii. Is the level of income for residents of a particular city different from the level of income for people in the region?

iii. Do people who have experienced an earthquake have more or less self-confidence than the general population?

4. Based on the information given for each of the following studies, decide whether to reject the null hypothesis. For each, give (a) the Z-score cutoff (or cutoffs) on the comparison distribution at which the null hypothesis should be rejected, (b) the Z score on the comparison distribution for the sample score, and (c) your conclusion. Assume that all populations are normally distributed.

| Study | Population μ | Population σ | Sample Score | p | Tails of Test |
|---|---|---|---|---|---|
| A | 10 | 2 | 14 | .05 | 1 (high predicted) |
| B | 10 | 2 | 14 | .05 | 2 |
| C | 10 | 2 | 14 | .01 | 1 (high predicted) |
| D | 10 | 2 | 14 | .01 | 2 |
| E | 10 | 4 | 14 | .05 | 1 (high predicted) |

5. Based on the information given for each of the following studies, decide whether to reject the null hypothesis. For each, give (a) the Z-score cutoff (or cutoffs) on the comparison distribution at which the null hypothesis should be rejected, (b) the Z score on the comparison distribution for the sample score, and (c) your conclusion. Assume that all populations are normally distributed.

| Study | Population μ | Population σ | Sample Score | p | Tails of Test |
|---|---|---|---|---|---|
| A | 70 | 4 | 74 | .05 | 1 (high predicted) |
| B | 70 | 1 | 74 | .01 | 2 |
| C | 70 | 2 | 76 | .01 | 2 |
| D | 72 | 2 | 77 | .01 | 2 |
| E | 72 | 2 | 68 | .05 | 1 (low predicted) |

6. A psychologist studying the senses of taste and smell has carried out many studies in which students are given each of 20 different foods (apricot, chocolate, cherry, coffee, garlic, and so on). She administers each food by dropping a liquid on the tongue. Based on her past research, she knows that for students overall at the university, the mean number of the 20 foods that students can identify correctly is 14, with a standard deviation of 4, and the distribution of scores follows a normal curve. The psychologist wants to know whether people’s accuracy on this task has more to do with smell than with taste. In other words, she wants to test whether people do worse on the task when they are only able to taste the liquid compared to when they can both taste and smell it (note that this is a directional hypothesis). Thus, she sets up special procedures that keep a person from being able to use the sense of smell during the task. The psychologist then tries the procedure on one randomly selected student. This student is able to identify only 5 correctly. (a) Using the .05 significance level, what should the psychologist conclude? Solve this problem explicitly using all five steps of hypothesis testing and illustrate your answer with a sketch showing the comparison distribution, the cutoff (or cutoffs), and the score of the sample on this distribution. (b) Then explain your answer to someone who has never had a course in statistics (but who is familiar with mean, standard deviation, and Z scores).

7. A psychologist is working with people who have had a particular type of major surgery. This psychologist proposes that people will recover from the operation more quickly if friends and family are in the room with them for the first 48 hours after the operation. It is known that time to recover from this kind of surgery is normally distributed with a mean of 12 days and a standard deviation of 5 days. The procedure of having friends and family in the room for the period after the surgery is tried with a randomly selected patient. This patient recovers in 18 days. (a) Using the .01 significance level, what should the researcher conclude? Solve this problem explicitly using all five steps of hypothesis testing, and illustrate your answer with a sketch showing the comparison distribution, the cutoff (or cutoffs), and the score of the sample on this distribution. (b) Then explain your answer to someone who has never had a course in statistics (but who is familiar with mean, standard deviation, and Z scores).

8. What is the effect of going through a natural disaster on the attitude of police chiefs about the goodness of the people in their city? A researcher studying this expects a more positive attitude (because of the many acts of heroism and helping of neighbors), but a more negative attitude is also possible (because of looting and scams). It is known that, using a 1-to-10 scale (from 1 = extremely negative attitude to 10 = extremely positive attitude), in general police chiefs’ attitudes about the goodness of the people in their cities is normally distributed, with a mean of 6.5 and a standard deviation of 2.1. A major earthquake has just occurred in an isolated city, and shortly afterward the researcher is able to give the attitude questionnaire to the police chief of that city. The chief’s score is 8.2. (a) Using the .05 significance level, what should the researcher conclude? Solve this problem explicitly using all five steps of hypothesis testing and illustrate your answer with a sketch showing the comparison distribution, the cutoff (or cutoffs), and the score of the sample on this distribution. (b) Then explain your answer to someone who has never had a course in statistics (but who is familiar with mean, standard deviation, and Z scores).

9. Robins and John (1997) carried out a study on narcissism (self-love), comparing people who scored high versus low on a narcissism questionnaire. (An example item was, “If I ruled the world it would be a better place.”) They also had other questionnaires, including one that had an item about how many times the participant looked in the mirror on a typical day. In their results section, the researchers noted “. . . as predicted, high-narcissism individuals reported looking at themselves in the mirror more frequently than did low narcissism individuals (Ms = 5.7 vs. 4.8), . . . p < .05” (p. 39). Explain this result to a person who has never had a course in statistics. (Focus on the meaning of this result in terms of the general logic of hypothesis testing and statistical significance.)

10. Reber and Kotovsky (1997), in a study of problem solving, described one of their results comparing a specific group of participants within their overall control condition as follows: “This group took an average of 179 moves to solve the puzzle, whereas the rest of the control participants took an average of 74 moves, t(19) = 3.31, p < .01” (p. 183). Explain this result to a person who has never had a course in statistics. (Focus on the meaning of this result in terms of the general logic of hypothesis testing and statistical significance.)

Set II

11. List the five steps of hypothesis testing, and explain the procedure and logic of each.

12. When a result is significant, explain why it is wrong to say the result “proves” the research hypothesis.

13. For each of the following, (a) state which two populations are being compared, (b) state the research hypothesis, (c) state the null hypothesis, and (d) say whether you should use a one-tailed or two-tailed test and why.

i. In an experiment, people are told to solve a problem by focusing on the details. Is the speed of solving the problem different for people who get such instructions compared to the speed for people who are given no special instructions?

ii. Based on anthropological reports in which the status of women is scored on a 10-point scale, the mean and standard deviation across many cultures are known. A new culture is found in which there is an unusual family arrangement. The status of women is also rated in this culture. Do cultures with the unusual family arrangement provide higher status to women than cultures in general?

iii. Do people who live in big cities develop more stress-related conditions than people in general?

14. Based on the information given for each of the following studies, decide whether to reject the null hypothesis. For each, give (a) the Z-score cutoff (or cutoffs) on the comparison distribution at which the null hypothesis should be rejected, (b) the Z score on the comparison distribution for the sample score, and (c) your conclusion. Assume that all populations are normally distributed.

| Study | Population μ | Population σ | Sample Score | p | Tails of Test |
|---|---|---|---|---|---|
| A | 5 | 1 | 7 | .05 | 1 (high predicted) |
| B | 5 | 1 | 7 | .05 | 2 |
| C | 5 | 1 | 7 | .01 | 1 (high predicted) |
| D | 5 | 1 | 7 | .01 | 2 |

15. Based on the information given for each of the following studies, decide whether to reject the null hypothesis. For each, give (a) the Z-score cutoff (or cutoffs) on the comparison distribution at which the null hypothesis should be rejected, (b) the Z score on the comparison distribution for the sample score, and (c) your conclusion. Assume that all populations are normally distributed.

| Study | Population μ | Population σ | Sample Score | p | Tails of Test |
|---|---|---|---|---|---|
| A | 100.0 | 10.0 | 80 | .05 | 1 (low predicted) |
| B | 100.0 | 20.0 | 80 | .01 | 2 |
| C | 74.3 | 11.8 | 80 | .01 | 2 |
| D | 16.9 | 1.2 | 80 | .05 | 1 (low predicted) |
| E | 88.1 | 12.7 | 80 | .05 | 2 |

16. A researcher wants to test whether a certain sound will make rats do worse on learning tasks. It is known that an ordinary rat can learn to run a particular maze correctly in 18 trials, with a standard deviation of 6. (The number of trials to learn this maze is normally distributed.) The researcher now tries an ordinary rat in the maze, but with the sound. The rat takes 38 trials to learn the maze. (a) Using the .05 level, what should the researcher conclude? Solve this problem explicitly using all five steps of hypothesis testing, and illustrate your answer with a sketch showing the comparison distribution, the cutoff (or cutoffs), and the score of the sample on this distribution. (b) Then explain your answer to someone who has never had a course in statistics (but who is familiar with mean, standard deviation, and Z scores).

17. A family psychologist developed an elaborate training program to reduce the stress of childless men who marry women with adolescent children. It is known from previous research that such men, one month after moving in with their new wife and her children, have a stress level of 85 with a standard deviation of 15, and the stress levels are normally distributed. The training program is tried on one man randomly selected from all those in a particular city who during the preceding month have married a woman with an adolescent child. After the training program, this man’s stress level is 60. (a) Using the .05 level, what should the researcher conclude? Solve this problem explicitly using all five steps of hypothesis testing and illustrate your answer with a sketch showing the comparison distribution, the cutoff (or cutoffs), and the score of the sample on this distribution. (b) Then explain your answer to someone who has never had a course in statistics (but who is familiar with mean, standard deviation, and Z scores).


18. A researcher predicts that listening to music while solving math problems will make a particular brain area more active. To test this, a research participant has her brain scanned while listening to music and solving math problems, and the brain area of interest has a percentage signal change of 58. From many previous studies with this same math problems procedure (but not listening to music), it is known that the signal change in this brain area is normally distributed with a mean of 35 and a standard deviation of 10. (a) Using the .01 level, what should the researcher conclude? Solve this problem explicitly using all five steps of hypothesis testing, and illustrate your answer with a sketch showing the comparison distribution, the cutoff (or cutoffs), and the score of the sample on this distribution. (b) Then explain your answer to someone who has never had a course in statistics (but who is familiar with mean, standard deviation, and Z scores).

19. Pecukonis (1990), as part of a larger study, measured ego development (a measure of overall maturity) and ability to empathize with others among a group of 24 aggressive adolescent girls in a residential treatment center. The girls were divided into high- and low-ego development groups, and the empathy (“cognitive empathy”) scores of these two groups were compared. In his results section, Pecukonis reported, “The average score on cognitive empathy for subjects scoring high on ego development was 22.1 as compared with 16.3 for low scorers, . . . p < .005” (p. 68). Explain this result to a person who has never had a course in statistics. (Focus on the meaning of this result in terms of the general logic of hypothesis testing and statistical significance.)

20. In an article about antitobacco campaigns, Siegel and Biener (1997) discuss the results of a survey of tobacco usage and attitudes, conducted in Massachusetts in 1993 and 1995; Table 4–4 shows the results of this survey. Focusing on just the first line (the percentage smoking 25 cigarettes daily), explain what this result means to a person who has never had a course in statistics. (Focus on the meaning of this result in terms of the general logic of hypothesis testing and statistical significance.)

Table 4–4 Selected Indicators of Change in Tobacco Use, ETS Exposure, and Public Attitudes toward Tobacco Control Policies: Massachusetts, 1993–1995

| Indicator | 1993 | 1995 |
|---|---|---|
| Adult Smoking Behavior | | |
| Percentage smoking 25 cigarettes daily | 24 | 10* |
| Percentage smoking <15 cigarettes daily | 31 | 49* |
| Percentage smoking within 30 minutes of waking | 54 | 41 |
| Environmental Tobacco Smoke Exposure | | |
| Percentage of workers reporting a smoke free worksite | 53 | 65* |
| Mean hours of ETS exposure at work during prior week | 4.2 | 2.3* |
| Percentage of homes in which smoking is banned | 41 | 51* |
| Attitudes Toward Tobacco Control Policies | | |
| Percentage supporting further increase in tax on tobacco with funds earmarked for tobacco control | 78 | 81 |
| Percentage believing ETS is harmful | 90 | 84 |
| Percentage supporting ban on vending machines | 54 | 64* |
| Percentage supporting ban on support of sports and cultural events by tobacco companies | 59 | 53* |

*p < .05

Source: Siegel, M., & Biener, L. (1997). Evaluating the impact of statewide anti-tobacco campaigns: The Massachusetts and California tobacco control programs. Journal of Social Issues, 53, 147–168. Copyright © 1997 by Blackwell Publishing. Reprinted by permission of Blackwell Publishers Journals.

1. We are oversimplifying a bit to make the initial learning easier. The research hypothesis is that one population will walk earlier than the other, μ1 < μ2. Thus, to be precise, its opposite is that the other group will either walk at the same time or later. That is, the opposite of the research hypothesis in this example includes both no difference and a difference in the direction opposite to what we predicted. In terms of symbols, if our research hypothesis is μ1 < μ2, then its opposite is μ1 ≥ μ2 (the symbol “≥” means “greater than or equal to”). We discuss this issue in some detail later in the chapter.

2. In practice, since hypothesis testing is usually done on a computer, you have to decide in advance only on the cutoff probability. The computer prints out the exact probability of getting your result if the null hypothesis were true. You then just compare the printed-out probability to see if it is less than the cutoff probability level you set in advance. However, to understand what these probability levels mean, you need to learn the entire process, including how to figure the Z score for a particular cutoff probability.
