Exploring Statistics

Summarize the important material after reading the attached chapter-1 file.  Also, select one of the statisticians and identify the main philosophy and accomplishments.

Exploring Statistics
Tales of Distributions

12th Edition

Chris Spatz

Outcrop Publishers Conway, Arkansas

Exploring Statistics: Tales of Distributions
12th Edition
Chris Spatz

Cover design: Grace Oxley
Answer Key: Jill Schmidlkofer
Webmaster & Ebook: Fingertek Web Design, Tina Haggard
Managers: Justin Murdock, Kevin Spatz

Copyright © 2019 by Outcrop Publishers, LLC
All rights reserved. No part of this publication may be reproduced, distributed, or transmitted in any form or by any
means, including photocopying, recording, or other electronic or mechanical methods, without the prior written
permission of the publisher, except in the case of brief quotations embodied in critical reviews and certain other
noncommercial uses permitted by copyright law. For permission requests, contact [email protected] or
write to the publisher at the address below.

Outcrop Publishers
615 Davis Street
Conway, AR 72034
Email: [email protected]
Website: outcroppublishers.com
Library of Congress Control Number: [Applied for]

ISBN-13 (hardcover): 978-0-9963392-2-3
ISBN-13 (ebook): 978-0-9963392-3-0
ISBN-13 (study guide): 978-0-9963392-4-7

Examination copies are provided to academics and professionals to consider for adoption as a course textbook.
Examination copies may not be sold or transferred to a third party. If you adopt this textbook, please accept it as your
complimentary desk copy.

Ordering information:
Students and professors – visit exploringstatistics.com
Bookstores – email [email protected]

Photo Credits – Chapter 1
Karl Pearson – Courtesy of Wellcomeimages.org
Ronald A. Fisher – R.A. Fisher portrait, 0006973, Special Collections Research Center, North Carolina State
University Libraries, Raleigh, North Carolina
Jerzy Neyman – Paul R. Halmos Photograph Collection, e_ph 0223_01, Dolph Briscoe Center for American History,
The University of Texas at Austin
Jacob Cohen – New York University Archives, Records of the NYU Photo Bureau

Printed in the United States of America by Walsworth ®
1 2 3 4 5 6 7 24 23 22 21 20 19 18

Online study guide available at
http://exploringstatistics.com/studyguide.php

About The Author

Chris Spatz is at Hendrix College where he twice served as chair of
the Psychology Department. Dr. Spatz’s undergraduate education was
at Hendrix and his PhD in experimental psychology is from Tulane
University in New Orleans. He subsequently completed postdoctoral
fellowships in animal behavior at the University of California, Berkeley,
and the University of Michigan. Before returning to Hendrix to teach,
Spatz held positions at The University of the South and the University
of Arkansas at Monticello.

Spatz served as a reviewer for the journal Teaching of Psychology
for more than 20 years. He co-authored a research methods textbook,
wrote several chapters for edited books, and was a section editor for the
Encyclopedia of Statistics in Behavioral Science.

In addition to writing and publishing, Dr. Spatz enjoys the outdoors,
especially canoeing, camping, and gardening. He swims several times
a week (mode = 3). Spatz has been an opponent of high textbook prices for years, and he is
happy to be part of a new wave of authors who provide high-quality textbooks to students at
affordable prices.

Dedication

With love and affection,

this textbook is dedicated to

Thea Siria Spatz, Ed.D., CHES

CHAPTER 1
Introduction

OBJECTIVES FOR CHAPTER 1

After studying the text and working the problems in this chapter, you should be
able to:

1. Distinguish between descriptive and inferential statistics
2. Define population, sample, parameter, statistic, and variable as they are used in statistics
3. Distinguish between quantitative and categorical variables
4. Distinguish between continuous and discrete variables
5. Identify the lower and upper limits of a continuous variable
6. Identify four scales of measurement and distinguish among them
7. Distinguish between statistics and experimental design
8. Define independent variable, dependent variable, and extraneous variable and identify them in experiments
9. Describe statistics’ place in epistemology
10. List actions to take to analyze a data set
11. Identify a few events in the history of statistics

WE BEGIN OUR exploration of statistics with a trip to London. The year is 1900.
Walking into an office at University College London, we meet a tall, well-dressed man about
40 years old. He is Karl Pearson, Professor of
Applied Mathematics and Mechanics. I ask him
to tell us a little about himself and why he is an
important person. He seems authoritative, glad
to talk about himself. As a young man, he says,
he wrote essays, a play, and a novel, and he also
worked for women’s suffrage. These days, he is
excited about this new branch of biology called
genetics. He says he supervises lots of data
gathering.

[Photo: Karl Pearson]

Pearson, warming to our group, lectures us about the major problem in science—there
is no agreement on how to decide among competing theories. Fortunately, he just published
a new statistical method that provides an objective way to decide among competing theories,
regardless of the discipline. The method is called chi square.1 Pearson says, “Now, arguments
will be much fewer. Gather a thousand data points and calculate a chi square test. The result
gives everyone an objective way to determine whether or not the data fit the theory.”

Exploration Notes from a student: Exploration off to good start. Hit on a nice, easy-to-
remember date to start with, visited a founder of statistics, and had a statistic called chi square
described as a big deal.

Our next stop is Rothamsted Experiment Station just
north of London. Now the year is 1925. There are fields all
around the agricultural research facility, each divided into
many smaller plots. The growth in the fields seems quite
variable.

When we arrive at the office, the atmosphere is congenial. The
staff is having tea. There are two topics—a new baby and
a new book. We get introduced to Ronald Fisher, the chief
statistician. Fisher is a small man with thick glasses and red
hair.

He tells us about his new child2 and then motions to
a book on the table. Sneaking a peek, we read the title:
Statistical Methods for Research Workers. Fisher becomes
focused on his book, holding forth in an authoritative way.

He says the book explains how to conduct experiments
and that an experiment is just a comparison of two or more conditions. He tells us we don’t need
a thousand data points. He says that small samples, randomly selected, are the way for science
to progress. “With an experiment and my technique of analysis of variance,” he exclaims, “you
can determine why that field out there”—here he waves toward the window—“is so variable.
We can find out what makes some plots lush and some mimsy.” Analysis of variance,3 he says,
works in any discipline, not just agriculture.

Exploration Notes: Looks like statistics had some controversy in it.4 Also looks like
progress. Statistics is used for experiments, too, and not just for testing theories. And Fisher
says experiments can be used to compare anything. If that’s right, I can use statistics no matter
what I major in.

1 Chi square, which is explained in this book in Chapter 14, has been called one of the 20 most important inventions
in the 20th century (Hacking, 1984).
2 (in what will become a family with eight children).
3 explained in Chapters 11-13
4 The slight sniping I’ve built into this story is just a hint of the strong animosity between Fisher and Pearson.

[Photo: Ronald A. Fisher]

Next we go to Poland to visit Jerzy Neyman at his
office at the University of Warsaw. It is 1933. As we walk
in, he smiles, seems happy we’ve arrived, and makes us feel
completely welcome.

Motioning to an envelope on his desk, he tells us it holds
a manuscript that he and Egon Pearson5 wrote. “The problem
with Fisher’s analysis of variance test is that it focuses
exclusively on finding a difference between groups. Suppose
the statistical test doesn’t detect a difference. Does that prove
there is no difference? No, of course not. It may be that the test
was just not sensitive enough to detect the difference. Right?”

At his question, a few of us nod in agreement. Seeing
uncertainty, he notes, “Maybe a larger sample is needed to
find the difference, you see? Anyway, what we’ve done is
expand statistics to cover not just finding a difference, but
also what it means when the test doesn’t find a difference.
Our approach is what you people in your time will call null
hypothesis significance testing.”

Exploration Notes: Statistics seems like a work in progress. Changing. Now it is not just
about finding a difference but also about what it means not to find a difference. Also, looks like
null hypothesis significance testing is a phrase that might turn up on tests.

Our next trip is to libraries, say, anytime between 1940 and 2000. For this exploration, the task
is to examine articles in professional journals published in various disciplines. The disciplines
include anthropology, biology, chemistry, defense strategy, education, forestry, geology, health,
immunology, jurisprudence, manufacturing, medicine, neurology, ophthalmology, political
science, psychology, sociology, zoology, and others. I’m sure you get the idea—the whole range
of disciplines that use quantitative measures in their research. What this exploration produces is
the discovery that all of these disciplines rely on a data analysis technique called null hypothesis
significance testing (NHST).6 Many different statistical tests are employed. However, for all the
tests in all the disciplines, the phrase, “p < .05” turns up frequently.

Exploration Notes: It seems that all that earlier controversy has subsided and scientists in all sorts of disciplines have agreed that NHST is the way to analyze quantitative data. All of them seem to think that if there is a comparison to be made, applying NHST is a necessary step to get correct conclusions. All of them use “p < .05,” so I’ll have to be sure to find out exactly what that means.

5 Egon Pearson was Karl Pearson’s son.
6 Null hypothesis significance testing is first explained in Chapters 9 and 10.

[Photo: Jerzy Neyman]

Our next excursion is a 1962 visit with Jacob Cohen at New York University in New York City. He is holding his article about studies published in the Journal of Abnormal and Social Psychology, a leading psychology journal. He tells us that the NHST technique has problems. Also, he says we should be calculating an effect size statistic, which will show whether the differences observed in our experiments are large or small.

Exploration Notes: The idea of an effect size index makes a lot of sense. Just knowing there is a difference isn’t enough. How big is the difference? Wonder what “problems with NHST” is all about.

Back to the library for a final excursion to check out recent events. We come across a 2014 article by Geoff Cumming on the “new statistics.” We find things like, “avoid NHST and use better techniques” (p. 26) and “we should not trust any p value” (p. 13). This seems like awfully strong advice. Are researchers taking this advice? Looking through more of today’s research in journals in several fields, we find that most statistical analyses use NHST and there are many instances of “p < .05.”

Exploration Notes, Conclusion: These days, it looks like statistics is in transition again. There’s a lot of controversy out there about how to analyze data from experiments. The NHST approach is still very common, though, so it’s clear I must learn it.
But I want to be prepared for changes. I hope knowing NHST will be helpful for the future.7

7 Not only helpful, but necessary, I would say.

[Photo: Jacob Cohen]

Welcome to statistics at a time when the discipline is once again in transition. A well-established tradition (null hypothesis significance testing) has been in place for almost a century but is now under attack. New ways of thinking about data analysis are emerging, and along with them, a collection of statistics that do not include the traditional NHST approach. As for the immediate future, though, NHST remains the method most widely used by researchers in many fields. In addition, much of the thinking required for NHST is required for other approaches.

Our exploration tour is over, so I’ll quit supplying notes; they are your responsibility now. As your own experience probably shows, making up your own summary notes improves retention of what you read. In addition, I have a suggestion. Adopt a mindset that thinks growth. A student with a growth mindset expects to learn new things. When challenges arise, as they inevitably do, acknowledge them and figure out how to meet the challenge. A growth mindset treats ability as something to be developed (see Dweck, 2016). If you engage yourself in this course, you can expect to use what you learn for the rest of your life.

The main title of this book is “Exploring Statistics.” Exploring conveys the idea of uncovering something that was not apparent before. An attitude of searching, wondering, checking, and so forth is what I want to encourage. (Those who object to traditional NHST procedures are driven by this exploration motivation.) As for this book’s subtitle, “Tales of Distributions,” I’ll have more to say about it as we go along.

Disciplines that Use Quantitative Data

Which disciplines use quantitative data? The list is long and more variable than the list I gave earlier.
The examples and problems in this textbook, however, come from psychology, biology, sociology, education, medicine, politics, business, economics, forestry, and everyday life. Statistics is a powerful method for getting answers from data, and this makes it popular with investigators in a wide variety of fields.

Statistics is used in areas that might surprise you. As examples, statistics has been used to determine the effect of cigarette taxes on smoking among teenagers, the safety of a new surgical anesthetic, and the memory of young school-age children for pictures (which is as good as that of college students). Statistics show which diseases have an inheritance factor, how to improve short-term weather forecasts, and why giving intentional walks in baseball is a poor strategy. All these examples come from Statistics: A Guide to the Unknown, a book edited by Judith M. Tanur and others (1989). Written for those “without special knowledge of statistics,” this book has 29 essays on topics as varied as those above.

In American history, the authorship of 12 of The Federalist papers was disputed for a number of years. (The Federalist papers were 85 short essays written under the pseudonym “Publius” and published in New York City newspapers in 1787 and 1788. Written by James Madison, Alexander Hamilton, and John Jay, the essays were designed to persuade the people of the state of New York to ratify the Constitution of the United States.) To determine authorship of the 12 disputed papers, each was graded with a quantitative value analysis in which the importance of such values as national security, a comfortable life, justice, and equality was assessed. The value analysis scores were compared with value analysis scores of papers known to have been written by Madison and Hamilton (Rokeach, Homant, & Penner, 1970). Another study, by Mosteller and Wallace, analyzed The Federalist papers using the frequency of words such as by and to (reported in Tanur et al., 1989).
Both studies concluded that Madison wrote all 12 essays.

Here is an example from law. Rodrigo Partida was convicted of burglary in Hidalgo County, a border county in southern Texas. A grand jury rejected his motion for a new trial. Partida’s attorney filed suit, claiming that the grand jury selection process discriminated against Mexican-Americans. In the end (Castaneda v. Partida, 430 U.S. 482 [1977]), Justice Harry Blackmun of the U.S. Supreme Court wrote, regarding the number of Mexican-Americans on grand juries, “If the difference between the expected and the observed number is greater than two or three standard deviations, then the hypothesis that the jury drawing was random (is) suspect.” In Partida’s case, the difference was approximately 12 standard deviations, and the Supreme Court ruled that Partida’s attorney had presented prima facie evidence. (Prima facie evidence is so good that one side wins the case unless the other side rebuts the evidence, which in this case did not happen.) Statistics: A Guide to the Unknown includes two essays on the use of statistics by lawyers.

Inferential statistics: Method that uses sample evidence and probability to reach conclusions about unmeasurable populations.
Descriptive statistic: A number that conveys a particular characteristic of a set of data.
Mean: Arithmetic average; sum of scores divided by number of scores.

Gigerenzer et al. (2007), in their public interest article on health statistics, point out that lack of statistical literacy among both patients and physicians undermines the information exchange necessary for informed consent and shared decision making. The result is anxiety, confusion, and undue enthusiasm for testing and treatment.

Whatever your current interests or thoughts about your future as a statistician, I believe you will benefit from this course.
A successful statistics course teaches you to identify questions a set of data can answer; determine the statistical procedures that will provide the answers; carry out the procedures; and then, using plain English and graphs, tell the story the data reveal. The best way for you to acquire all these skills (especially the part about telling the story) is to engage statistics. Engaged students are easily recognized; they are prepared for exams, are not easily distracted while studying, and generally finish assignments on time.

Becoming an engaged student may not be so easy, but many have achieved it. Here are my recommendations. Read with the goal of understanding. Attend class. Do all the assignments (on time). Write down questions. Ask for explanations. Expect to understand. (Disclaimer: I’m not suggesting that you marry statistics, but just engage for this one course.)

Are you uncertain about whether your background skills are adequate for a statistics course? For most students, this is an unfounded worry. Appendix A, Getting Started, should help relieve your concerns.

What Do You Mean, “Statistics”?

The Oxford English Dictionary says that the word statistics came into use almost 250 years ago. At that time, statistics referred to a country’s quantifiable political characteristics, such as population, taxes, and area. Statistics meant “state numbers.” Tables and charts of those numbers turned out to be a very satisfactory way to compare different countries and to make projections about the future. Later, tables and charts proved useful to people studying trade (economics) and natural phenomena (science). Statistical thinking spread because it helped.

Today, two different techniques are called statistics. Descriptive statistics8 produce a number or a figure that summarizes or describes a set of data. You are already familiar with some descriptive statistics.
For example, you know about the arithmetic average, called the mean. You have probably known how to compute a mean since elementary school: just add up the numbers and divide the total by the number of entries. As you already know, the mean describes the central tendency of a set of numbers. The basic idea of descriptive statistics is simple: They summarize a set of data with one number or graph. This book covers about a dozen descriptive statistics.

8 Boldface words and phrases are defined in the margin and also in Appendix D, Glossary of Words.
9 A summary of this study can be found in Ellis (1938). The complete reference and all others in the text are listed in the References section at the back of the book.

The other statistical technique is inferential statistics. Inferential statistics use measurements from a sample to reach conclusions about a larger, unmeasured population. There is, of course, a problem with samples. Samples always depend partly on the luck of the draw; chance helps determine the particular measurements you get. If you have the measurements for the entire population, chance doesn’t play a part; all the variation in the numbers is “true” variation. But with samples, some of the variation is the true variation in the population and some is just the chance ups and downs that go with a sample. Inferential statistics was developed as a way to account for the effects of chance that come with sampling. This book will cover about a dozen and a half inferential statistics.

Here is a textbook definition: Inferential statistics is a method that takes chance factors into account when samples are used to reach conclusions about populations. Like most textbook definitions, this one condenses many elements into a short sentence. Because the idea of using samples to understand populations is perhaps the most important concept in this course, please pay careful attention when elements of inferential statistics are explained.
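
As a rough illustration of the two ideas above, the sketch below computes a mean exactly as described (add the scores, divide by how many there are) and then draws a few random samples to show how sample means wander around the population mean by chance. The population of 1,000 scores, the sample size of 25, and the seed are made-up assumptions for illustration; none of these numbers come from the text.

```python
import random

# Hypothetical population of 1,000 scores (invented for illustration only).
rng = random.Random(42)
population = [rng.gauss(100, 15) for _ in range(1000)]


def mean(scores):
    """Arithmetic average: sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)


# With the whole population measured, chance plays no part in this number.
population_mean = mean(population)

# Each sample of 25 depends partly on the luck of the draw,
# so the sample means differ from one another and from the population mean.
sample_means = [mean(rng.sample(population, 25)) for _ in range(5)]

print(f"population mean: {population_mean:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean: {m:.2f} (differs by {m - population_mean:+.2f})")
```

Running the sketch with a different seed gives different sample means, which is the kind of chance variation that inferential statistics is designed to take into account.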
Inferential statistics has proved to be a very useful method in scientific disciplines. Many other fields use inferential statistics, too, so I selected examples and problems from a variety of disciplines for this text and its auxiliary materials. Null hypothesis significance testing, which had a prominent place in our exploration tour, is an inferential statistics technique.

Here is an example from psychology that uses the NHST technique. Today, there is a lot of evidence that people remember the tasks they fail to complete better than the tasks they complete. This is known as the Zeigarnik effect. Bluma Zeigarnik asked participants in her experiment to do about 20 tasks, such as work a puzzle, make a clay figure, and construct a box from cardboard.9 For each participant, half the tasks were interrupted before completion. Later, when the participants were asked to recall the tasks they worked on, they listed more of the interrupted tasks (average about 7) than the completed tasks (about 4).

One good question to start with is, “Did interrupting make a big difference or a small difference?” In this case, interruption produced about three additional memory items compared to the completion condition. This is a 75% difference (3 additional items relative to a baseline of 4), which seems like a big change, given our experience with tests of memory. The question of “How big is the difference?” can often be answered by calculating an effect size index.

clue to the future
Calculating effect size indexes is first addressed in Chapter 5. It is also a topic in Chapters 9-14.

So, should you conclude that interruption improves memory? Not yet. It might be that interruption actually has no effect but that several chance factors happened to favor the interrupted tasks in Zeigarnik’s particular experiment. One way to meet this objection is to conduct the experiment again. Similar results would lend support to the conclusion that interruption improves memory. A less expensive way to meet the objection is to use inferential statistics such as NHST. NHST begins with the actual data from the experiment. It ends with a probability: the probability of obtaining data like those actually obtained if it is true that interruption has no effect on memory. If the probability is very small, you can conclude that interruption does affect memory. For Zeigarnik’s data, the probability was tiny.

Now for the conclusion. One version might be, “After completing about 20 tasks, memory for interrupted tasks (average about 7) was greater than memory for completed tasks (average about 4). The approximate 75% difference cannot be attributed to chance because chance by itself would rarely produce a difference between two samples as large as this one.” The words chance and rarely tell you that probability is an important element of inferential statistics.

My more complete answer to what I mean by “statistics” is Chapter 6 in 21st Century Psychology: A Reference Handbook (Spatz, 2008). This 8-page chapter summarizes in words (no formulas) the statistical concepts usually covered in statistics courses. This chapter can orient you as you begin your study of statistics and later provide a review after you finish your course.

clue to the future
The first part of this book is devoted to descriptive statistics (Chapters 2–6) and the second part to inferential statistics (Chapters 7–15). Inferential statistics is the more comprehensive of the two because it combines descriptive statistics, probability, and logic.

Statistics: A Dynamic Discipline

Many people continue to think of statistics as a collection of techniques that were developed long ago, that have not changed, and that will be the same in the future. That view is mistaken. Statistics is a dynamic discipline characterized by more than a little controversy. New techniques in both descriptive and inferential statistics continue to be developed.
Controversy continues too, as you saw at the end of our exploration tour. To get a feel for the issues when the controversy entered the mainstream, see Dillon (1999) or Spatz (2000) for nontechnical summaries. For more technical explanations, see Nickerson (2000). To read about current approaches, see Erceg-Hurn and Mirosevich (2008), Kline (2013), or Cumming (2014).

In addition to controversy over techniques, attitudes toward data analysis shifted in recent years. The shift has been toward the idea of exploring data to see what it reveals and away from using statistical analyses to nail down a conclusion. This shift owes much of its impetus to John Tukey (1915–2000), who promoted Exploratory Data Analysis (Lovie, 2005). Tukey invented techniques such as the boxplot (Chapter 5) that reveal several characteristics of a data set simultaneously.

Today, statistics is used in a wide variety of fields. Researchers start with a phenomenon, event, or process that they want to understand better. They make measurements that produce numbers. The numbers are manipulated according to the rules and conventions of statistics. Based on the outcome of the statistical analysis, researchers draw conclusions and then write the story of their new understanding of the phenomenon, event, or process. Statistics is just one tool that researchers use, but it is often an essential tool.

Some Terminology

Like most courses, statistics introduces you to many new words. In statistics, most of the terms are used over and over again. Your best move, when introduced to a new term, is to stop, read the definition carefully, and memorize it. As the term continues to be used, you will become more and more comfortable with it. Making notes is helpful.

Populations and Samples

Population: All measurements of a specified group.
Sample: Measurements of a subset of a population.

A population consists of all the scores of some specified group. A sample is a subset of a population. The population is the thing of interest. It is defined by the investigator and includes all cases. The following are some populations:

Family incomes of college students in the fall of 2017
Weights of crackers eaten by obese male students
Depression scores of Alaskans
Gestation times for human beings
Memory scores of human beings10

10 I didn’t pull these populations out of thin air; they are all populations that researchers have gathered data on. Studies of these populations will be described in this book.

Parameter: Numerical or nominal characteristic of a population.
Statistic: Numerical or nominal characteristic of a sample.
Variable: Something that exists in more than one amount or in more than one form.

Investigators are always interested in populations. However, as you can determine from these examples, populations can be so large that not all the members can be studied. The investigator must often resort to measuring a sample that is small enough to be manageable. A sample taken from the population of incomes of families of college students might include only 40 students. From the last population on the list, Zeigarnik used a sample of 164.

Most authors of research articles carefully explain the characteristics of the samples they use. Often, however, they do not identify the population, leaving that task to the reader. The answer to the question “What is the population?” depends on the specifics of a …