Published On: Wednesday, 14 December 2011
Posted by Muhammad Atif Saeed
Introduction to Statistics
What Are Statistics?
For many people, statistics means numbers: numerical facts, figures, or information. Statistics is about data. Data consist of information about statistical variables. There are two types of variables. Quantitative variables can be measured or described by numerical values, such as height. Categorical variables have values that are categories, such as type of pet. The data for categorical variables are usually summarized as counts or frequencies of the observations in each category.
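To make the distinction concrete, here is a minimal sketch in Python. The data are made up for illustration: the names `heights_cm` and `pets` and every value in them are assumptions, not figures from a real survey. It shows a quantitative variable being averaged and a categorical variable being counted.

```python
from collections import Counter
from statistics import mean

# Hypothetical data for five people (illustrative values only).
heights_cm = [160.0, 172.5, 181.0, 168.0, 175.5]  # quantitative variable
pets = ["dog", "cat", "dog", "fish", "dog"]       # categorical variable

# A quantitative variable supports arithmetic, such as taking a mean.
average_height = mean(heights_cm)

# A categorical variable is summarized by counting each category.
pet_counts = Counter(pets)

print(f"Average height: {average_height:.1f} cm")
for pet, count in pet_counts.items():
    print(f"{pet}: {count}")
```

Averaging the pet names would be meaningless, which is why categorical data are reported as counts or frequencies instead.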
Reports of industry production, baseball batting averages, government deficits, and so forth, are often called statistics. To be precise, these numbers are descriptive statistics because they are numerical data that describe phenomena. Descriptive statistics can be as simple as the number of children in each family along a city block or as complex as the annual report released by the U.S. Department of the Treasury.
Method of Statistical Inference
Statistics is also a method, a way of working with numbers to answer puzzling questions about both human and nonhuman phenomena. Questions answerable by using the “method” of statistics are many and varied: Which of several techniques is best for teaching reading to third-graders? Will a new medicine be more effective than the old one? Can you expect it to rain tomorrow? What is the probable outcome of the next presidential election? Which assembly-line process produces fewer faulty carburetors? How can a polling organization make an accurate prediction of a national election by questioning only a few thousand voters? And so on.
For our purposes, statistics is both a collection of numbers and/or pictures and a process: the art and science of making accurate guesses about outcomes involving numbers.
So, fundamentally, the goals of statistics are:
- To describe variables and data
- To make accurate inferences about groups based upon incomplete information
Types of Statistics
In these pages, we are concerned with two ways of representing descriptive statistics: numerical and pictorial.
Numerical statistics
Numerical statistics are numbers, but clearly, some numbers are more meaningful than others. For example, if you are offered a purchase price of $1 for an automobile on the condition that you also buy a second automobile, the price of the second automobile would be a major consideration (its price could be $1,000,000 or $1,000); thus, the average—or mean—of the two prices would be the important statistic.
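The arithmetic behind this example can be sketched in a few lines of Python. The helper name `mean_price` is an assumption introduced here for illustration; the prices are the ones from the example above.

```python
from statistics import mean

def mean_price(first_car: float, second_car: float) -> float:
    """Return the average price paid per automobile."""
    return mean([first_car, second_car])

first_car = 1  # the $1 offer

# Two possible prices for the required second automobile.
for second_car in (1_000, 1_000_000):
    average = mean_price(first_car, second_car)
    print(f"Second car at ${second_car:,}: mean price per car is ${average:,.2f}")
```

The $1 price tag is the same in both cases, but the mean shows how different the two deals really are.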
Pictorial statistics
Taking numerical data and presenting it in pictures or graphs is known as pictorial statistics. Showing data in graphic form can make complex and confusing information appear simpler and more straightforward. Different types of graphs are used for quantitative and categorical variables.
Steps in the Process
Making accurate guesses requires groundwork. The statistician must do the following in order to make an educated, better-than-chance hunch:
- Gather data (numerical information).
- Organize the data (sometimes in pictorial form).
- Analyze the data (using tests of significance and so forth).
This book shows you how to carry out precisely these procedures and how to use your analysis to draw an inference (an educated statistical guess) to solve a particular problem. While these steps may appear simple (and indeed some of them are), sound statistical method requires that you perform them in certain prescribed ways. Those ways are the heart and soul of statistics.
Making Predictions
Suppose that you decide to sell commemorative T-shirts at your town's centennial picnic. You know that you can make a tidy profit, but only if you can sell most of your supply of shirts because your supplier will not buy any of them back. How many shirts can you reasonably plan on selling?
Your first question, of course, is this: How many people will be attending the picnic? Suppose you know that 100,000 tickets to the event have been sold. How many T-shirts should you purchase from your supplier to sell on the day of the event? 10,000? 50,000? 100,000? 150,000? How many of each size—small, medium, large, extra-large? And the important question: How many T-shirts must you sell in order to make some profit for your time and effort and not be left with an inventory of thousands of unsold shirts?
Ideally, before you buy your inventory of T-shirts, you need to have an accurate idea of just how many ticket holders will want to purchase centennial T-shirts and which sizes they will want. But, obviously, you have neither the time nor the resources to ask all 100,000 people whether they plan to purchase commemorative T-shirts. If, however, you could locate a small number of those ticket holders—for example, 100—and get an accurate count of how many of those 100 would purchase a T-shirt, you would have a better idea of how many of the 100,000 ticket holders would be willing to buy one.
That is, of course, if the 100 ticket holders that you asked (called the sample) are not too different in their intentions to purchase T-shirts from the total 100,000 ticket holders (called the population). If the sample is indeed representative (typical) of the population, you could expect about the same percentage of T-shirt sales (and sizes) for the population as for the sample, all things being equal. So, if 50 of your sample of 100 people say they cannot wait to plunk down $10 for a centennial T-shirt, it would be reasonable to expect that you would sell about 50,000 T-shirts to your population of 100,000. (At just $1 profit per T-shirt, that is $50,000!)
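The projection in this paragraph is a simple scaling of the sample proportion. A minimal sketch in Python, using the figures from the example (the function name `projected_sales` is an assumption made for illustration), looks like this:

```python
def projected_sales(sample_buyers: int, sample_size: int, population_size: int) -> float:
    """Scale the sample's buying proportion up to the whole population."""
    proportion = sample_buyers / sample_size
    return proportion * population_size

# Figures from the T-shirt example.
expected_shirts = projected_sales(sample_buyers=50, sample_size=100, population_size=100_000)
profit_per_shirt = 1  # dollars
expected_profit = expected_shirts * profit_per_shirt

print(f"Projected sales: {expected_shirts:,.0f} shirts")
print(f"Projected profit: ${expected_profit:,.0f}")
```

The calculation is only as trustworthy as the assumption that the sample's proportion matches the population's.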
But before you start shopping for a yacht with your profits, remember that this prediction of total T-shirt sales relies heavily upon the sample being representative (similar to the population), which may not necessarily be the case with your sample. You may have inadvertently selected a sample that has more expendable income or contains a greater proportion of souvenir T-shirt enthusiasts or who knows what else. Are you reasonably certain that the intentions of the sample of 100 ticket holders reflect the intentions of the 100,000 ticket holders? If not, you may quite possibly be stuck with tens of thousands of centennial T-shirts and no profit to splurge on a yacht.
You can see why choosing a random sample is a critical part of the process of statistics. Even with careful sampling methods, our conclusions are still educated guesses. This is because one sample does not perfectly represent the population, and different samples may give different results.
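To see how different random samples can give different results, consider the following simulation sketch in Python. It rests on an assumption that is not part of the example: that exactly half of the 100,000 ticket holders would buy a shirt. The seed value and the function name `sample_percentage` are also arbitrary choices made for illustration.

```python
import random

random.seed(2011)  # arbitrary fixed seed so the run is repeatable

POPULATION_SIZE = 100_000
TRUE_BUYERS = 50_000  # assumption: exactly half the population would buy

# 1 = would buy a shirt, 0 = would not.
population = [1] * TRUE_BUYERS + [0] * (POPULATION_SIZE - TRUE_BUYERS)

def sample_percentage(sample_size: int = 100) -> float:
    """Draw a simple random sample and return the percentage of buyers in it."""
    sample = random.sample(population, sample_size)
    return 100 * sum(sample) / sample_size

# Five separate random samples of 100 ticket holders each.
estimates = [sample_percentage() for _ in range(5)]
for number, estimate in enumerate(estimates, start=1):
    print(f"Sample {number}: {estimate:.0f}% would buy")
```

Even though every sample is drawn fairly from the same population, the percentages will typically scatter around 50 rather than land on it exactly.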
Comparing Results
Making predictions is only one use of statistics. Suppose you have recently developed a new headache/pain remedy that you call Ache-Away. Should you produce Ache-Away in quantity and make it available to the public? That would depend upon, among other factors, whether Ache-Away is more effective than the old remedy. How can you determine that?
One way might be to administer the two remedies to two separate groups of people, one remedy per group, collect data on the results, and then statistically analyze the data to determine whether Ache-Away is more effective than the old remedy. And what if the results of this test showed Ache-Away to be more effective? How certain can you be that this particular test administration is indicative of all tests of these two remedies? Perhaps the group taking the old remedy (the control group) and the group taking Ache-Away (the treatment group) were so dissimilar that the results were due not to the pain remedies but to the differences between the groups.
It is possible that the results of this test are far off the results that you would get if you tried the test several more times. You certainly do not want to foist a questionable drug upon an unsuspecting public based upon atypical test results. How certain can you be that you can put your faith in the results of your tests? You can see that the problems of comparing headache remedy results can produce headaches of their own.
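A small simulation can show how much a single test might mislead. The sketch below, in Python, is built entirely on assumptions chosen for illustration: both remedies are given the same true relief rate of 60%, each group has 50 people, and the seed is arbitrary. None of these numbers come from a real trial.

```python
import random

random.seed(7)  # arbitrary fixed seed so the run is repeatable

TRUE_RELIEF_RATE = 0.6  # assumption: both remedies help 60% of people
GROUP_SIZE = 50         # assumption: 50 people per group

def observed_relief_rate(true_rate: float, group_size: int) -> float:
    """Simulate one group and return the fraction reporting relief."""
    relieved = sum(random.random() < true_rate for _ in range(group_size))
    return relieved / group_size

def run_trial() -> float:
    """Return (treatment rate - control rate) for one simulated test."""
    treatment = observed_relief_rate(TRUE_RELIEF_RATE, GROUP_SIZE)
    control = observed_relief_rate(TRUE_RELIEF_RATE, GROUP_SIZE)
    return treatment - control

# Repeat the same test five times.
differences = [run_trial() for _ in range(5)]
for number, diff in enumerate(differences, start=1):
    print(f"Test {number}: Ache-Away minus old remedy = {diff:+.2f}")
```

Because the two simulated remedies are identical by construction, any nonzero difference in a given test is due to chance alone, which is exactly the risk of trusting one atypical result.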
Probability
One of the most familiar uses of statistics is to determine the chance of some occurrence. For instance, what are the chances that it will rain tomorrow or that the Chicago Cubs will win a World Series? These kinds of probabilities, although interesting, are not the variety under discussion here. Rather, we are examining the probability in statistics that deals with classical theory and frequency theory: events that can be repeated over and over again, independently, and under the same conditions.
Coin tossing and card drawing are two such examples. A fair coin (one that is not weighted or fixed) has an equal chance of landing heads as landing tails. A typical deck of cards has 52 different cards—13 of each suit (hearts, clubs, diamonds, and spades)—and each card or suit has an equal chance of being drawn. This kind of event forms the basis of your understanding of probability and enables you to find solutions to everyday problems that seem far removed from coin tossing or card drawing.
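Both views of probability can be sketched briefly in Python. The classical view counts equally likely outcomes; the frequency view repeats the experiment many times. The number of tosses (10,000) and the seed are arbitrary choices made for this illustration.

```python
from fractions import Fraction
import random

# Classical view: favorable outcomes divided by equally likely outcomes.
p_heads = Fraction(1, 2)           # one of two sides of a fair coin
p_specific_card = Fraction(1, 52)  # one particular card out of 52
p_heart = Fraction(13, 52)         # 13 hearts out of 52 cards

print(f"P(heads) = {p_heads}")
print(f"P(one particular card) = {p_specific_card}")
print(f"P(heart) = {p_heart}")

# Frequency view: repeat the same experiment many times, independently.
random.seed(52)  # arbitrary fixed seed so the run is repeatable
tosses = [random.choice("HT") for _ in range(10_000)]
heads_share = tosses.count("H") / len(tosses)
print(f"Share of heads in {len(tosses):,} simulated tosses: {heads_share:.3f}")
```

With many repetitions, the observed share of heads should settle close to the classical value of one half.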
Probability helps us describe our conclusions in a way that takes the uncertainty from random sampling into account.