Reading Questions.
Prob 11.01. Two basic operations that you need to perform on probability models are these:
To illustrate by example, suppose that you are dealing with a probability model for an IQ test score that is a normal distribution with these parameters: mean = 100 and standard deviation = 15.
Percentile question: What is the percentile that corresponds to a test score of 120? Answer: 0.91 or, in other words, the 91st percentile.
Quantile question: What score will 95% of scores be less than or equal to? Answer: a score of 125.
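Both answers can be checked with R's normal-distribution functions. This is a minimal sketch, assuming the `pnorm` and `qnorm` functions used elsewhere in these problems; the variable names are chosen here purely for illustration.

```r
# Percentile question: what fraction of scores is at or below 120?
iq_percentile <- pnorm(120, mean = 100, sd = 15)
iq_percentile   # about 0.909

# Quantile question: at or below what score do 95% of scores fall?
iq_quantile <- qnorm(0.95, mean = 100, sd = 15)
iq_quantile     # about 124.7
```

Note the pattern: `pnorm` takes a score and returns a probability, while `qnorm` takes a probability and returns a score.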
Here are two very basic questions about percentile and quantile calculations:
Sometimes to answer more complicated questions, you need first to answer one or more percentile or quantile questions.
Answer the following questions, using the normal probability model with the parameters given above:
Prob 11.02. A coverage interval gives a range of values. The “level” of the interval is the probability that a random trial will fall inside that range. For example, in a 95% coverage interval, 95% of the trials will fall within the range.
To construct a coverage interval, you need to translate the level into two quantiles, one for the left side of the range and one for the right side. For example, a 50% coverage interval runs from the 0.25 quantile on the left to the 0.75 quantile on the right; a 60% coverage interval runs from 0.20 on the left to 0.80 on the right. The probabilities used in calculating the quantiles are set so that
A classroom of students was asked to calculate the left and right probabilities for various coverage intervals. Some of their answers were wrong. Explain what is wrong, if anything, for each of these answers.
A
| The difference between them isn’t 0.70 |
B
| They are not symmetrical. |
C
| Nothing is wrong. |
A
| The difference between them isn’t 0.95 |
B
| They are not symmetrical. |
C
| Nothing is wrong. |
A
| The difference between them isn’t 0.95 |
B
| They are not symmetrical. |
C
| Nothing is wrong. |
Prob 11.04. For each of the following probability models, calculate a 95% coverage interval. This means that you should specify a left value and a right value. The left value corresponds to a probability of 0.025 and the right value to a probability of 0.975.
Left side of interval:
Right side of interval:
Left side of interval:
Right side of interval:
Left side of interval:
Right side of interval:
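The 0.025 and 0.975 used in Prob 11.04 follow from splitting the leftover probability evenly between the two tails. The sketch below shows that arithmetic for any level; `coverage_probs` is a hypothetical helper written for this illustration, and the IQ model from Prob 11.01 is used only as an example model.

```r
# Hypothetical helper: translate a coverage level into left and right probabilities
coverage_probs <- function(level) {
  left <- (1 - level) / 2          # half of what is left over goes in each tail
  c(left = left, right = 1 - left)
}

coverage_probs(0.95)   # left = 0.025, right = 0.975

# Illustration: 95% coverage interval for the IQ model (mean 100, sd 15)
iq_interval <- qnorm(coverage_probs(0.95), mean = 100, sd = 15)
iq_interval            # roughly 70.6 to 129.4
```

The two probabilities are symmetric around 0.5 and differ by exactly the level, which is a quick check on any proposed pair.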
Prob 11.05. For each of these families of probability distributions, what are the parameters used to describe a specific distribution?
A
| Mean and Standard Deviation |
B
| Max and Min |
C
| Probability and Size |
D
| Average Number per Interval |
A
| Mean and Standard Deviation |
B
| Max and Min |
C
| Probability and Size |
D
| Average Number per Interval |
A
| Mean and Standard Deviation |
B
| Max and Min |
C
| Probability and Size |
D
| Average Number per Interval |
A
| Mean and Standard Deviation |
B
| Max and Min |
C
| Probability and Size |
D
| Average Number per Interval |
A
| Mean and Standard Deviation |
B
| Max and Min |
C
| Probability and Size |
D
| Average Number per Interval |
Prob 11.10. College admissions offices collect information about each year’s applicants, admitted students, and matriculated students. At one college, the admissions office knows from past years that 30% of admitted students will matriculate.
The admissions office explains to the administration each year that the results of the admissions process vary from year to year due to random sampling fluctuations. Each year’s results can be interpreted as a draw from a random process with a particular distribution.
Which family of probability distributions can best be used to model each of the following situations?
A
| Normal |
B
| Uniform |
C
| Binomial |
D
| Poisson |
E
| Exponential |
F
| Lognormal |
A
| Normal |
B
| Uniform |
C
| Binomial |
D
| Poisson |
E
| Exponential |
F
| Lognormal |
A
| Normal |
B
| Uniform |
C
| Binomial |
D
| Poisson |
E
| Exponential |
F
| Lognormal |
Prob 11.11. In 1898, Ladislaus von Bortkiewicz published The Law of Small Numbers, in which he demonstrated the applicability of the Poisson probability law. One example dealt with the number of Prussian cavalry soldiers who were kicked to death by their horses. The Prussian army monitored 10 cavalry corps for 20 years and recorded the number X of fatalities each year in each corps. There were a total of 10 × 20 = 200 one-year observations, as shown in the table:
Number of Deaths X | Number of Times X Deaths Were Observed |
0 | 109 |
1 | 65 |
2 | 22 |
3 | 3 |
4 | 1 |
A
| (109+65+22+3+1)/5 |
B
| (109+65+22+3+1)/200 |
C
| (0*109 + 1*65 + 2*22 + 3*3 + 4*1)/5 |
D
| (0*109 + 1*65 + 2*22 + 3*3 + 4*1)/200 |
E
| Can’t tell from the information given. |
A
| 112.67 64.29 16.22 5.11 1.63 |
B
| 108.67 66.29 20.22 4.11 0.63 |
C
| 102.67 70.29 22.22 6.11 0.63 |
D
| 106.67 68.29 17.22 6.11 1.63 |
Prob 11.12. Experience shows that the number of cars entering a park on a summer’s day is approximately normally distributed with mean 200 and variance 900. Find the probability that the number of cars entering the park is less than 195.
Prob 11.20. The graph shows a cumulative probability.
Prob 11.21. Ralph’s bowling scores in a single game are normally distributed with a mean of 120 and a standard deviation of 10.
Mean:
Standard deviation:
Mean:
Standard deviation:
Lucky Lolly bowls games with scores randomly distributed with a mean of 100 and standard deviation of 15.
A
| Lolly |
B
| Ralph |
C
| Equally likely |
D
| Can’t tell from the information given. |
A
| Lolly |
B
| Ralph |
C
| Equally likely |
D
| Can’t tell from the information given. |
Prob 11.22. Jim scores 700 on the mathematics part of the SAT. Scores on the SAT follow the normal distribution with mean 500 and standard deviation 100. Julie takes the ACT test of mathematical ability, which has mean 18 and standard deviation 6. She scores 24. If both tests measure the same kind of ability, who has the higher score?
A
| Jim |
B
| Julie |
C
| They are the same. |
D
| No way to tell. |
Prob 11.23. For each of the following, decide whether the random variable is binomial or not. Then choose the best answer from the set offered.
A
| It is binomial. |
B
| It’s not because the sample size is not fixed. |
C
| It’s not because the probability is not fixed for every individual component. |
D
| It’s not for both of the above reasons. |
A
| It is binomial. |
B
| It’s not because the sample size is not fixed. |
C
| It’s not because the probability is not fixed for every individual component. |
D
| It’s not for both of the above reasons. |
A
| It is binomial. |
B
| It’s not because the sample size is not fixed. |
C
| It’s not because the probability is not fixed for every individual component. |
D
| It’s not for both of the above reasons. |
A
| It is binomial. |
B
| It’s not because the sample size is not fixed. |
C
| It’s not because the probability is not fixed for every individual component. |
D
| It’s not for both of the above reasons. |
Observe the sex of the next 50 children born at a local hospital. Let x = # of girls among them.
A
| It is binomial. |
B
| It’s not because the sample size is not fixed. |
C
| It’s not because the probability is not fixed for every individual component. |
D
| It’s not for both of the above reasons. |
A couple decides to continue to have children until their first daughter. Let x = # of children the couple has.
A
| It is binomial. |
B
| It’s not because the sample size is not fixed. |
C
| It’s not because the probability is not fixed for every individual component. |
D
| It’s not for both of the above reasons. |
A
| It is binomial. |
B
| It’s not because the sample size is not fixed. |
C
| It’s not because the probability is not fixed for every individual component. |
D
| It’s not for both of the above reasons. |
Prob 11.30. Just before a referendum on a school budget, a local newspaper plans to poll 400 random voters out of 50,000 registered voters in an attempt to predict whether the budget will pass. Suppose that the budget actually has the support of 52% of voters.
A
| qbinom(.5,size=400,prob=.52) |
B
| pbinom(.52,size=400,prob=.50) |
C
| rnorm(400,mean=0.52,sd=.50) |
D
| pbinom(199,size=400,prob=0.52) |
E
| qnorm(.52,mean=400,sd=.50) |
A
| pbinom(250,size=400,prob=.50) |
B
| pbinom(249,size=400,prob=0.52) |
C
| 1-pbinom(250,size=400,prob=0.52) |
D
| qbinom(.5,size=400,prob=.52) |
E
| 1-qnorm(.52,mean=400,sd=.50) |
Prob 11.31. Here is a graph of a probability density.
Prob 11.32. A student is asked to calculate the probability that x = 3.5 when x is chosen from a normal distribution with the following parameters: mean=3, sd=5. To calculate the answer, he uses this command:
This is not right. Why not?
A
| He should have used pnorm. |
B
| The parameters are wrong. |
C
| The answer is zero since the variable x is continuous. |
D
| He should have used qnorm. |
Prob 11.33. A paint manufacturer has a daily production, x, that is normally distributed with a mean of 100,000 gallons and a standard deviation of 10,000 gallons. Management wants to create an incentive for the production crew when the daily production exceeds the 90th percentile of the distribution. You are asked to calculate the level of production at which management should pay the incentive bonus.
A
| qnorm(0.90,mean=100000,sd=10000) |
B
| pnorm(0.90,mean=100000,sd=10000) |
C
| qbinom(10000,size=100000,prob=0.9) |
D
| dnorm(0.90,mean=100000,sd=10000) |
Prob 11.34. Suppose that the height, in inches, of a randomly selected 25-year-old man is a normal random variable with standard deviation 2.5 inches. In the strange universe in which statistics problems are written, we don’t know the mean of this distribution but we do know that approximately 12.5% of 25-year-old men are taller than 6 feet 2 inches. Using this information, calculate the following.
Prob 11.35. In commenting on the “achievement gap” between different groups in the public school, the Saint Paul Public School Board released the following information:
Saint Paul Public Schools (SPPS) serve more than 42,000 students. Thirty percent are African American, 30% Asian, and 13% Hispanic. The stark reality is that reading scores for two-thirds of our district’s African American students fall below the national average, while reading scores for 90% of their white counterparts surpass it.
The point of this exercise is to translate this information into the point-score increase needed to bring African American students’ scores into alignment with the white students.
Imagine that the test scores for white students form a normal distribution with a mean of 100 and a standard deviation of 25. Suppose also that African American students have test scores that form a normal distribution with a standard deviation of 25. What would have to be the mean of the African American students’ test scores in order to match the information given by the School Board?
A
| pnorm(0.1,mean=100,sd=25) |
B
| pnorm(0.9,mean=100,sd=25) |
C
| qnorm(25,mean=100,sd=0.9) |
D
| qnorm(0.1,mean=100,sd=25) |
E
| qnorm(0.9,mean=100,sd=25) |
F
| qnorm(25,mean=100,sd=0.1) |
G
| rnorm(25,mean=100,sd=0.9) |
Start by proposing an answer; a guess will do. Look at the resulting response and use that to guide refining your proposal until you hit the target response: 0.667. When you are at the target, your proposal will be close to one of these:
It would be more informative if school districts gave the actual distribution of scores rather than the passing rate.
Prob 11.36. A manufacturer of electrical fiber optic cables produces spools that are 50,000 feet long. In the production process, flaws are introduced randomly. A study of the flaws indicates that, on average, there is 1 flaw per 10,000 feet.
A
| normal |
B
| uniform |
C
| binomial |
D
| exponential |
E
| poisson |
Prob 11.37. To help reduce speeding, local governments sometimes put up speed signs at locations where speeding is a problem. These signs measure the speed of each passing car and display that speed to the driver. In some countries, such as the UK, the devices are equipped with a camera that records an image of each speeding car, and a speeding ticket is sent to the registered owner.
At one location, the data recorded from such a device indicates that between 7 and 10 PM, 32% of cars are speeding and that 4.3 cars per minute pass the intersection, on average.
Which probability distributions can best be used to model each of the following situations for 7 to 10 PM?
A
| Normal |
B
| Uniform |
C
| Binomial |
D
| Poisson |
E
| Exponential |
F
| Lognormal |
A
| Normal |
B
| Uniform |
C
| Binomial |
D
| Poisson |
E
| Exponential |
F
| Lognormal |
A
| Normal |
B
| Uniform |
C
| Binomial |
D
| Poisson |
E
| Exponential |
F
| Lognormal |
A
| Normal |
B
| Uniform |
C
| Binomial |
D
| Poisson |
E
| Exponential |
F
| Lognormal |
Prob 11.40. As part of a test of the security inspection system in an airport, a government supervisor adds 5 suitcases with illegal materials to an otherwise ordinary shipping load, bringing the total to 150 suitcases.
In order to determine whether the shipment should be accepted, security officers randomly select 15 of the suitcases and X-ray them. If one or more of the suitcases is found to contain the materials, the entire shipment will be searched.
A
| Normal |
B
| Uniform |
C
| Binomial |
D
| Poisson |
E
| Exponential |
A
| 1-pnorm(5,mean=150,sd=15) |
B
| 1-pnorm(15,mean=150,sd=5) |
C
| 1-punif(5,min=0,max=150) |
D
| 1-punif(15,min=0,max=150) |
E
| 1-pbinom(0,size=15,prob=5/150) |
F
| 1-ppois(0,45/150) |
G
| 1-ppois(0,5/150) |
H
| Not enough information to tell. |
Prob 11.41. Do the best job you can answering this question. The information provided is not complete, but that’s the way things often are.
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in antinuclear rallies.
From “Judgments of and by representativeness,” in “Judgment under Uncertainty: Heuristics and Biases,” edited by Daniel Kahneman, Paul Slovic, and Amos Tversky. Cambridge; New York: Cambridge University Press, 1982.
Prob 11.42. According to the website http://www.wikihealth.com/Pregnancy, approximately 3.6% of pregnant women give birth on the predicted date (using the method that calculates gestational duration starting at the time of the last menstrual period). Assume that the probability of giving birth is a normal distribution whose mean is at the predicted date. The standard deviation quantifies the spread of gestational durations.
Using just the 3.6% “fact,” make an estimate of the standard deviation of all pregnancies assuming that pregnancies are distributed as a normal distribution centered on the predicted date. Hint: Think of the area under the distribution over the range that covers one 24-hour period.
Your answer should be in days. Select the closest value:
Prob 11.43. You have decided to become a shoe-maker. Contrary to popular belief, this is a highly competitive field and you had to take the Shoe-Maker Apprenticeship Trial (SAT) as part of your apprenticeship application.
According to this information, what is the percentile corresponding to your SAT score of 750?
A
| 1-pnorm(750,mean=700,sd=35) |
B
| pnorm(750,mean=700,sd=35) |
C
| qnorm(750,mean=700,sd=35) |
D
| 1-qnorm(750,mean=700,sd=35) |
E
| Not enough information to answer. |
A
| pnorm(.95,mean=700,sd=35) |
B
| pnorm(.95,mean=750,sd=35) |
C
| qnorm(.95,mean=700,sd=35) |
D
| qnorm(.95,mean=750,sd=35) |
E
| Not enough information to answer. |
Prob 11.44. Both the Poisson and binomial probability distributions describe a count of events.
The binomial distribution describes a series of identical discrete events with two possible outcomes to each event: yes or no, true or false, success or failure, and so on. The number of “success” or “true” or “yes” events in the series is given by the binomial distribution, so long as the individual events are independent of one another. An example is the number of heads that occur when flipping a coin ten times in a row. Each flip has a heads or tails outcome. The individual flips are independent.
There are two parameters to the binomial distribution: the number of events (“size”) and the probability of the outcome that will be counted as a success. For the distribution to be binomial, the number of events must be fixed ahead of time and the probability of success must be the same for each event. The outcome whose probability is represented by the binomial distribution is the number of successful events.
Example: You flip 10 fair coins and count the number of heads. In this case the size is 10 and the probability of success is 1∕2.
Counter-example: You flip coins and count the number of flips until the 10th head. This is not a binomial distribution because the size is not fixed.
The Poisson probability model is different. It describes a situation where the rate at which events happen is fixed but there is no fixed number of events.
Example: Cars come down the street in a random way but at an average rate of 3 per minute. The Poisson distribution describes the probability of seeing any given number of cars in one minute. Unlike the binomial distribution, there is no fixed number of events; potentially 50 cars could pass by in one minute (although this is very, very unlikely).
Both the Poisson and binomial distributions are discrete. You can’t have 5.5 heads in 10 flips of a coin. You can’t have 4.2 cars pass by in one minute. Because of this, the basic way to use the tabulated probabilities is as a probability assigned to each possible outcome.
Here is the table of outcomes for the number of heads in 6 flips of a fair coin:
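A table of this kind can be produced with `dbinom`. The following is a sketch, assuming R's built-in binomial functions; the data-frame layout and the rounding to three digits are choices made here for readability.

```r
# Probability of exactly 0, 1, ..., 6 heads in 6 flips of a fair coin
heads <- 0:6
probs <- dbinom(heads, size = 6, prob = 0.5)

# One row per possible outcome
data.frame(heads = heads, probability = round(probs, 3))
```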
So, the probability of exactly zero heads is 0.016, the probability of exactly 1 head is 0.094, and so on.
You may also be interested in the cumulative probability:
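The cumulative version uses `pbinom` in place of `dbinom`. Again this is only a sketch under the same assumptions (6 flips, fair coin), with illustrative variable names.

```r
# Probability of 0 or fewer, 1 or fewer, ..., 6 or fewer heads in 6 flips
heads <- 0:6
cum_probs <- pbinom(heads, size = 6, prob = 0.5)

data.frame(heads_or_fewer = heads, probability = round(cum_probs, 3))
```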
So, the probability of 1 head or fewer is 0.109, just the sum of the probability of exactly 0 heads and exactly one head.
Note that in both cases, there was no point in asking for the probability of more than 6 heads; six is the most that could possibly happen. If you do ask, the answer will be “zero”: it can’t happen.
Similarly, there is no point in asking for the probability of 3.5 heads; that can’t happen either.
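A short sketch of what happens when you ask anyway, assuming the same 6-flip fair-coin setup as above:

```r
# Asking for a non-integer number of heads.
# R returns 0 and issues a warning about the non-integer value.
p_fractional <- dbinom(3.5, size = 6, prob = 0.5)
p_fractional
```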
The software sensibly returns a probability of zero, but warns that you are asking something silly.
The Poisson distribution is similar, but different in important ways. If cars pass by a point randomly at an average rate of 3 per minute, here is the probability of seeing 0, 1, 2, ... cars in any randomly selected minute.
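These probabilities come from `dpois`. The sketch below stops at 9 cars only to keep the table short; that cutoff is a choice made here, since the Poisson model itself has no upper limit.

```r
# Probability of seeing exactly 0, 1, ..., 9 cars in one minute
# when the average rate is 3 cars per minute
cars <- 0:9
car_probs <- dpois(cars, lambda = 3)

data.frame(cars = cars, probability = round(car_probs, 4))
```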
So, there is a 5% chance of seeing no cars in one minute.
But unlike the binomial situation, where the maximum number of successful outcomes is fixed by the number of events, it’s possible for a very large number of cars to pass by.
For instance, there is roughly a 0.3% chance that 9 cars will pass by in one minute. That’s small, but it’s definitely non-zero.
The Poisson model, like the binomial, describes a situation where the outcome is a whole number of events. It makes little sense to ask for a fractional outcome. The probability of a fractional outcome is always zero.
Often one wants to consider a Poisson event over a longer or shorter interval than the one implicit in the specified rate. For example, when you say that the average rate of cars passing a spot is 3 per minute, the interval of one minute is implicit. Suppose, however, that you want to know the number of cars that might pass by a spot in one hour. To calculate this, you need to find the rate in terms of the new interval. Since one hour is 60 minutes, the rate of 3 per minute is equivalent to 180 per hour. You can use this to find the probability.
For example, the probability that 150 or fewer cars will pass by in one hour (when the average rate is 3 per minute) is given by a cumulative probability:
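A sketch of that calculation, assuming R's `ppois`; the variable names are illustrative, and the rate conversion is written out explicitly so the change of interval is visible.

```r
# Convert the per-minute rate to a per-hour rate
rate_per_minute <- 3
rate_per_hour <- rate_per_minute * 60   # 180 cars per hour

# Probability that 150 or fewer cars pass by in one hour
p_150_or_fewer <- ppois(150, lambda = rate_per_hour)
p_150_or_fewer
```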
It can be hard to remember whether the above means “150 or fewer” or “fewer than 150.” When in doubt, you can always make the situation explicit by using a non-integer argument:
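For instance, with the hourly rate of 180 from above, a non-integer argument placed just above or just below 150 removes the ambiguity. This is a sketch assuming R's `ppois`:

```r
# "150 or fewer": 150.1 or less includes 150, 149, ...
p_at_most_150 <- ppois(150.1, lambda = 180)

# "Fewer than 150": 149.9 or less includes 149, 148, ... but not 150
p_fewer_than_150 <- ppois(149.9, lambda = 180)

c(at_most_150 = p_at_most_150, fewer_than_150 = p_fewer_than_150)
```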
This works only when asking for cumulative probabilities, since 150.1 or less includes the integers 150, 149, and so on. Were you to ask for the probability of getting exactly 150.1 cars in one hour, using the dpois operator, the answer would be zero:
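A sketch of that last point, again assuming a rate of 180 cars per hour:

```r
# Probability of exactly 150.1 cars in one hour.
# R returns 0 and warns about the non-integer value.
p_exactly_fractional <- dpois(150.1, lambda = 180)
p_exactly_fractional
```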
For each of the following, figure out the computer statement with which you can compute the probability.
Prob 11.45. Government data indicates that the average hourly wage for manufacturing workers in the United States is $14. (Statistical Abstract of the United States, 2002) Suppose the distribution of the manufacturing wage rate nationwide can be approximated by a normal distribution. Suppose also that a worker did a nationwide job search and found that 15% of the jobs paid more than $15.30 per hour. To find the standard deviation of the hourly wage for manufacturing workers, what process should we try?
A
| qnorm(0.15, mean=14, sd=15.3) |
B
| Look for x such that pnorm(15.3, mean=14,sd=x) gives 0.85 |
C
| Calculate a z-score using 1.3 as the standard deviation |
D
| Not enough information is being given. |
Prob 11.46. In many social issues, policy recommendations are based on cases from the extremes of a distribution. Consider, for example, a news story (Minnesota Public Radio, March 19, 2006) comparing the high-school graduation rates of Native Americans (85%) and whites (97%). The disparity becomes more glaring when one compares high-school drop-out rates, 3% for whites and 15% for Native Americans.
One way to compare these two drop-out rates is simply to take the ratio: five times as many children in one group drop out as in the other. Or, one could claim that the graduation rate for one group is “only” a factor of 1.14 higher than that for the other. While both of these descriptions are accurate, neither of them has a unique claim to truth.
Another way to interpret the data is to imagine a student’s high-school experience as a point on a quantitative continuum. If the experience is below a threshold, the student does not graduate. We can imagine the outcome as being the sum of many contributions: quality of the school and teachers, support from family, peer influences, personality of the student, and so on.
Suppose that we model the high-school experience as a normal distribution with the same standard deviation for whites and Native Americans but with different means. For the sake of specificity, let the mean for whites be 100 with a standard deviation of 20.
This model is, of course, arbitrary. We don’t know that there is anything that corresponds to a quantitative high-school “experience,” and we certainly don’t know that even if there were it would be distributed according to a normal distribution. Nevertheless, this can be a helpful way to interpret data about the “extremes” when making comparisons of the means.
Enter the work you used to answer these questions in the box. You can cut and paste from the computer output, but make sure to indicate clearly and concisely what your answers are.
Prob 11.47. Geology Professor Karl Wirth studies the age of rocks as determined by ratios of isotopes. The figure shows the results of an age assay of rocks collected at seven sites. Because of the intrinsically random nature of radioactive decay, the measured age is a random variable and has been reported as a mean (in millions of years before the present) and a standard deviation (in the same units).
From the geology of the sites, four of them have been classified as “early stage” and three as “main stage.” The graph clearly indicates that the early stage rocks tend to be younger than the main stage rocks. But perhaps this is just the luck of the draw.
Professor Wirth wants to calculate a new random variable: the difference in mean ages between the early and main stage rocks. Since the age difference is a random variable, Prof. Wirth needs to know both the variable’s mean and its standard deviation.
To get you started on the calculation, here’s the formula for the difference in mean ages, Δ_{age} of the rocks from the two different stages.
To remind you, here are the arithmetic rules for the means and variances of random variables V and W when summed and multiplied by fixed constants a and b:
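In their standard form, the rules can be written as follows. The variance rule as stated here assumes that V and W are independent; the notation mean(·) and var(·) is chosen for this sketch.

```latex
\begin{align*}
\operatorname{mean}(aV + bW) &= a\,\operatorname{mean}(V) + b\,\operatorname{mean}(W) \\
\operatorname{var}(aV + bW)  &= a^{2}\,\operatorname{var}(V) + b^{2}\,\operatorname{var}(W)
\end{align*}
```

Because the constants are squared in the variance rule, a difference such as V − W (a = 1, b = −1) has variance var(V) + var(W): the variances add even though the means subtract. The standard deviation is the square root of the variance.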
Do calculations based on the above formulas to answer these questions:
Prob 11.48. Below are two different graphs of cumulative probability distributions. Using the appropriate graph, not the computer, estimate the items listed below. Your estimates are not expected to be perfect, but do mark on the graph to show the reasoning behind your answer: