Chapter 11 Problems      Statistical Modeling: A Fresh Approach (2/e)

• What is a probability model?
• What are some of the different probability models and what different situations do they describe?
• What is a “parameter” in a probability model? Give some examples.

Prob 11.01. Two basic operations that you need to perform on probability models are these:

percentile
Given a value, what is the probability (according to the model) of the outcome from one trial being that value or less?
quantile
Given a probability, what is the value whose percentile is that probability?

To illustrate by example, suppose that you are dealing with a probability model for an IQ test score that is a normal distribution with these parameters: mean = 100 and standard deviation = 15.

Percentile question: What is the percentile that corresponds to a test score of 120? Answer: 0.91 or, in other words, the 91st percentile.

> pnorm(120, mean=100,sd=15)
 0.9087888

Quantile question: What score will 95% of scores be less than or equal to? Answer: a score of 125.

> qnorm(0.95, mean=100, sd=15)
 124.6728
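
The two operations undo one another, which gives a quick way to check a calculation. The sketch below uses the same IQ model and feeds the result of a percentile calculation into a quantile calculation; the variable names are chosen here only for illustration.

```r
# Percentile and quantile calculations are inverses of one another.
# Same model as above: normal with mean = 100 and sd = 15.
p <- pnorm(120, mean = 100, sd = 15)     # percentile of a score of 120
score <- qnorm(p, mean = 100, sd = 15)   # quantile at that probability
score                                    # recovers the score of 120, up to rounding error
```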

Here are two very basic questions about percentile and quantile calculations:

(a)
True or False
The output of a percentile question will always be a probability, that is, a number between 0 and 1.
(b)
True or False
The output of a quantile question will always be a value, that is, something in the same units as the random variable.

Sometimes to answer more complicated questions, you need first to answer one or more percentile or quantile questions.

Answer the following questions, using the normal probability model with the parameters given above:

(a)
What’s the test score that almost everybody, say, 99% of people, will do better than?
• Which kind of calculation is this?
percentile  quantile
• What is the score?
55  65  75  95  115  125  135
(b)
To calculate a coverage interval on a probability model, you need to calculate two quantities: one for the left end of the interval and one for the right. Which type of calculation produces these quantities?
percentile  quantile
(c)
Calculate a 50% coverage interval on the test scores, that is the range from the 0.25 quantile to the 0.75 quantile:
• Left end of interval:
80  85  90  95  100  105  110  115  120
• Right end of interval:
80  85  90  95  100  105  110  115  120
(d)
Calculate an 80% coverage interval, that is the range from the 0.10 to the 0.90 quantile:
• Left end of interval:
52  69  73  81  85  89
• Right end of interval:
117  119  123  128  136
(e)
To calculate the probability of an outcome falling in a given range, you need to do two percentile calculations, one for each end of the range. Then you subtract the smaller probability from the larger. What is the probability of a test score falling between 100 and 120?
0.25  0.37  0.41  0.48  0.52  0.61  0.73
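
The subtraction described in part (e) can be sketched with a different range. The example below is only an illustration: the range of 85 to 115 and the variable names are choices made here, not part of the problem.

```r
# Probability of a score between 85 and 115 under the IQ model
# (normal, mean = 100, sd = 15). The range is an illustrative choice.
upper <- pnorm(115, mean = 100, sd = 15)   # P(score <= 115)
lower <- pnorm(85,  mean = 100, sd = 15)   # P(score <= 85)
in_range <- upper - lower                  # about 0.68: within one sd of the mean
in_range
```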

Prob 11.02. A coverage interval gives a range of values. The “level” of the interval is the probability that a random trial will fall inside that range. For example, in a 95% coverage interval, 95% of the trials will fall within the range.

To construct a coverage interval, you need to translate the level into two quantiles, one for the left side of the range and one for the right side. For example, a 50% coverage interval runs from the 0.25 quantile on the left to the 0.75 quantile on the right; a 60% coverage interval runs from 0.20 on the left to 0.80 on the right. The probabilities used in calculating the quantiles are set so that

• the difference between them is the level of the interval. For instance, 0.75 and 0.25 give a 50% interval.
• they are symmetric. That is, the left probability should be exactly as far from 0 as the right probability is from 1.
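
The two rules above can be turned into a short calculation. The helper below is a minimal sketch; `coverage_probs` is a hypothetical name introduced here, not a function from the text or from R.

```r
# Hypothetical helper (not from the text): turn a coverage level into
# the left and right probabilities used for the two quantile calculations.
coverage_probs <- function(level) {
  left <- (1 - level) / 2   # as far from 0 ...
  right <- 1 - left         # ... as the right probability is from 1
  c(left = left, right = right)
}

coverage_probs(0.50)   # 0.25 and 0.75, as in the text
coverage_probs(0.60)   # 0.20 and 0.80, as in the text
```

Passing the result to a quantile operator, for example `qnorm(coverage_probs(0.60), mean=100, sd=15)`, gives the two ends of the interval.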

A classroom of students was asked to calculate the left and right probabilities for various coverage intervals. Some of their answers were wrong. Explain what is wrong, if anything, for each of these answers.

(a)
For a 70% interval, the 0.20 and 0.90 quantiles

 A The difference between them isn’t 0.70 B They are not symmetrical. C Nothing is wrong.

(b)
For a 95% interval, the 0.05 and 0.95 quantiles.

 A The difference between them isn’t 0.95 B They are not symmetrical. C Nothing is wrong.

(c)
For a 95% interval, the 0.025 and 0.975 quantiles.

 A The difference between them isn’t 0.95 B They are not symmetrical. C Nothing is wrong.

Prob 11.04. For each of the following probability models, calculate a 95% coverage interval. This means that you should specify a left value and a right value. The left value corresponds to a probability of 0.025 and the right value to a probability of 0.975.

(a)
The number of cars driving along a highway in one hour, when the mean number of cars is 2000 per hour. Hint: Poisson model

Left side of interval:

1812  1904  1913  1928  1935

Right side of interval:

2064  2072  2077  2088  2151
(b)
The number of heads out of 100 flips of a fair coin. Hint: Binomial model.

Left side of interval:

36  38  40  42  44  46

Right side of interval:

54  56  58  60  62  64
(c)
The angle of a random spinner, ranging from 0 to 360 degrees. Hint: Uniform model.

Left side of interval:

9  15  25  36  42  60

Right side of interval:

300  318  324  335  345  351

Prob 11.05. For each of these families of probability distributions, what are the parameters used to describe a specific distribution?

(a)
Uniform distribution

 A Mean and Standard Deviation B Max and Min C Probability and Size D Average Number per Interval

(b)
Normal distribution

 A Mean and Standard Deviation B Max and Min C Probability and Size D Average Number per Interval

(c)
Exponential distribution

 A Mean and Standard Deviation B Max and Min C Probability and Size D Average Number per Interval

(d)
Poisson distribution

 A Mean and Standard Deviation B Max and Min C Probability and Size D Average Number per Interval

(e)
Binomial distribution

 A Mean and Standard Deviation B Max and Min C Probability and Size D Average Number per Interval

Prob 11.10. College admissions offices collect information about each year’s applicants, admitted students, and matriculated students. At one college, the admissions office knows from past years that 30% of admitted students will matriculate.

The admissions office explains to the administration each year that the results of the admissions process vary from year to year due to random sampling fluctuations. Each year’s results can be interpreted as a draw from a random process with a particular distribution.

Which family of probability distribution can best be used to model each of the following situations?

(a)
1500 students are offered admission. The number of students who will actually matriculate is:

 A Normal B Uniform C Binomial D Poisson E Exponential F Lognormal

(b)
The average SAT score of the admitted applicants:

 A Normal B Uniform C Binomial D Poisson E Exponential F Lognormal

(c)
The number of women in the matriculated class:

 A Normal B Uniform C Binomial D Poisson E Exponential F Lognormal

Prob 11.11. In 1898, Ladislaus von Bortkiewicz published The Law of Small Numbers, in which he demonstrated the applicability of the Poisson probability law. One example dealt with the number of Prussian cavalry soldiers who were kicked to death by their horses. The Prussian army monitored 10 cavalry corps for 20 years and recorded the number X of fatalities each year in each corps. There were a total of 10 × 20 = 200 one-year observations, as shown in the table:

 Number of Deaths X    Number of Times X Deaths Were Observed
         0                          109
         1                           65
         2                           22
         3                            3
         4                            1

(a)
From the data in the table, what was the mean number of deaths per year per cavalry corps?

 A (109+65+22+3+1)/5 B (109+65+22+3+1)/200 C (0*109 + 1*65 + 2*22 + 3*3 + 4*1)/5 D (0*109 + 1*65 + 2*22 + 3*3 + 4*1)/200 E Can’t tell from the information given.

(b)
Use this mean number of deaths per year and the Poisson probability law to determine the theoretical proportion of years that 0 deaths should occur in any given cavalry corps.
0.3128  0.4286  0.4662  0.5210  0.5434
(c)
Repeat the probability calculation for 1, 2, 3, and 4 deaths per year per cavalry corps. Multiply the probabilities by 200 to find the expected number of cavalry corps with each number of deaths. Which of these tables is closest to the theoretical values:

 A 112.67   64.29   16.22   5.11   1.63
 B 108.67   66.29   20.22   4.11   0.63
 C 102.67   70.29   22.22   6.11   0.63
 D 106.67   68.29   17.22   6.11   1.63

Prob 11.12. Experience shows that the number of cars entering a park on a summer’s day is approximately normally distributed with mean 200 and variance 900. Find the probability that the number of cars entering the park is less than 195.

(a)
Which type of calculation is this?
percentile  quantile
(b)
Do the calculation with the given parameters. (Watch out! Look carefully at the parameters and make sure they are in a standard form.)
0.3125  0.4338  0.4885  0.5237  0.6814  0.7163

Prob 11.20. The graph shows a cumulative probability.

(a)
Use the graph to estimate by eye the 20th percentile of the probability distribution. (Select the closest answer.)

0  2  4  6  8
(b)
Using the graph, estimate by eye the probability of a randomly selected x falling between 5 and 8. (Select the closest answer.)

0.05  0.25  0.50  0.75  0.95

Prob 11.21. Ralph’s bowling scores in a single game are normally distributed with a mean of 120 and a standard deviation of 10.

• He plays 5 games. What are the mean and standard deviation of his total score?

Mean:

10  120  240  360  600  710

Standard deviation:

20.14  21.69  22.36  24.31  24.71
• What are the mean and standard deviation of his average score for the 5 games?

Mean:

10  60  90  120  150  180

Standard deviation:

4.47  4.93  5.10  5.62  6.18

Lucky Lolly bowls games with scores randomly distributed with a mean of 100 and a standard deviation of 15.

• What is the z-score of 150 for Lolly?
1.00  2.00  2.33  3.00  3.33  7.66  120  150
• What is the z-score of 150 for Ralph?
1.00  2.00  2.33  3.00  3.33  7.66  120  150
• Is Lolly or Ralph more likely to score over 150?

 A Lolly B Ralph C Equally likely D Can’t tell from the information given.

• What is the z-score of 130 for Lolly?
1.00  2.00  2.33  3.00  3.33  7.66  120  150
• What is the z-score of 130 for Ralph?
1.00  2.00  2.33  3.00  3.33  7.66  120  150
• Is Lolly or Ralph more likely to score over 130?

 A Lolly B Ralph C Equally likely D Can’t tell from the information given.

Prob 11.22. Jim scores 700 on the mathematics part of the SAT. Scores on the SAT follow the normal distribution with mean 500 and standard deviation 100. Julie takes the ACT test of mathematical ability, which has mean 18 and standard deviation 6. She scores 24. If both tests measure the same kind of ability, who has the higher score?

 A Jim B Julie C They are the same. D No way to tell.

Prob 11.23. For each of the following, decide whether the random variable is binomial or not. Then choose the best answer from the set offered.

(a)
Number of aces in a draw of 10 cards from a shuffled deck with replacement.

 A It is binomial. B It’s not because the sample size is not fixed. C It’s not because the probability is not fixed for every individual component. D It’s not for both of the above reasons.

(b)
Number of aces in a draw of 10 cards from a shuffled deck without replacement.

 A It is binomial. B It’s not because the sample size is not fixed. C It’s not because the probability is not fixed for every individual component. D It’s not for both of the above reasons.

(c)
A broken typing machine has a probability of 0.05 of making a mistake on each character. Consider the number of erroneous characters in each sentence of a report.

 A It is binomial. B It’s not because the sample size is not fixed. C It’s not because the probability is not fixed for every individual component. D It’s not for both of the above reasons.

(d)
Suppose screws produced by a certain company will be defective with probability 0.01 independent of each other. The company sells the screws in a package of 10. The number of defective screws in a randomly selected pack.

 A It is binomial. B It’s not because the sample size is not fixed. C It’s not because the probability is not fixed for every individual component. D It’s not for both of the above reasons.

(e)

Observe the sex of the next 50 children born at a local hospital. Let x= # of girls among them.

 A It is binomial. B It’s not because the sample size is not fixed. C It’s not because the probability is not fixed for every individual component. D It’s not for both of the above reasons.

(f)

A couple decides to continue to have children until their first daughter. Let x = # of children the couple has.

 A It is binomial. B It’s not because the sample size is not fixed. C It’s not because the probability is not fixed for every individual component. D It’s not for both of the above reasons.

(g)
Jason buys a state lottery ticket every month using his favorite combination based on his birthday and his wife’s. x = # of times he wins a prize in one year.

 A It is binomial. B It’s not because the sample size is not fixed. C It’s not because the probability is not fixed for every individual component. D It’s not for both of the above reasons.

Prob 11.30. Just before a referendum on a school budget, a local newspaper plans to poll 400 random voters out of 50,000 registered voters in an attempt to predict whether the budget will pass. Suppose that the budget actually has the support of 52% of voters.

(a)
What is the probability that the newspaper’s sample will wrongly lead them to predict defeat, that is, less than 50% of the poll respondents will indicate support?

 A qbinom(.5,size=400,prob=.52) B pbinom(.52,size=400,prob=.50) C rnorm(400,mean=0.52,sd=.50) D pbinom(199,size=400,prob=0.52) E qnorm(.52,mean=400,sd=.50)

(b)
What is the probability that more than 250 of those 400 voters will support the budget?

 A pbinom(250,size=400,prob=.50) B pbinom(249,size=400,prob=0.52) C 1-pbinom(250,size=400,prob=0.52) D qbinom(.5,size=400,prob=.52) E 1-qnorm(.52,mean=400,sd=.50)

Prob 11.31. Here is a graph of a probability density.

(a)
Using the graph, estimate by eye the probability of a randomly selected x falling between 2 and 4. (Give the closest answer.)

0.05  0.25  0.50  0.75  0.95
(b)
Using the graph of probability density above, estimate by eye the probability of a randomly selected x being less than 2. (Give the closest answer.)

0.05  0.25  0.50  0.75  0.95

Prob 11.32. A student is asked to calculate the probability that x = 3.5 when x is chosen from a normal distribution with the following parameters: mean=3, sd=5. To calculate the answer, he uses this command:

> dnorm(3.5, mean=3, sd=5)
 0.0794

This is not right. Why not?

 A He should have used pnorm. B The parameters are wrong. C The answer is zero since the variable x is continuous. D He should have used qnorm.

Prob 11.33. A paint manufacturer has a daily production, x, that is normally distributed with a mean of 100,000 gallons and a standard deviation of 10,000 gallons. Management wants to create an incentive for the production crew when the daily production exceeds the 90th percentile of the distribution. At what level of production should management pay the incentive bonus?

 A qnorm(0.90,mean=100000,sd=10000) B pnorm(0.90,mean=100000,sd=10000) C qbinom(10000,size=100000,prob=0.9) D dnorm(0.90,mean=100000,sd=10000)

Prob 11.34. Suppose that the height, in inches, of a randomly selected 25-year-old man is a normal random variable with standard deviation 2.5 inches. In the strange universe in which statistics problems are written, we don’t know the mean of this distribution but we do know that approximately 12.5% of 25-year-old men are taller than 6 feet 2 inches. Using this information, calculate the following.

(a)
What’s the average height of 25-year-old men? That is, find the mean of a normal distribution with standard deviation of 2.5 inches such that 12.5% of the distribution lies at or above 74 inches.
68.54  70.13  71.12  73.82  74.11  75.23  75.88  76.14
(b)
Using this distribution, how tall should a man be in order to be in the tallest 5% of 25-year-old men?
68.54  70.13  71.12  73.82  74.11  75.23  75.88  76.14

Prob 11.35. In commenting on the “achievement gap” between different groups in the public school, the Saint Paul Public School Board released the following information:

Saint Paul Public Schools (SPPS) serve more than 42,000 students. Thirty percent are African American, 30% Asian, and 13% Hispanic. The stark reality is that reading scores for two-thirds of our district’s African American students fall below the national average, while reading scores for 90% of their white counterparts surpass it.

The point of this exercise is to translate this information into the point-score increase needed to bring African American students’ scores into alignment with the white students.

Imagine that the test scores for white students form a normal distribution with a mean of 100 and a standard deviation of 25. Suppose also that African American students have test scores that form a normal distribution with a standard deviation of 25. What would have to be the mean of the African American students’ test scores in order to match the information given by the School Board?

(a)
What is the score threshold for passing the test if 90% of white students pass? One of the following R commands will calculate it. Which one?

 A pnorm(0.1,mean=100,sd=25) B pnorm(0.9,mean=100,sd=25) C qnorm(25,mean=100,sd=0.9) D qnorm(0.1,mean=100,sd=25) E qnorm(0.9,mean=100,sd=25) F qnorm(25,mean=100,sd=0.1) G rnorm(25,mean=100,sd=0.9)

(b)
Using that threshold, what would be the mean score for African Americans such that two-thirds (66.7%) are below the threshold? Hint: If you knew the answer, then it would produce this result.
 0.667

Start by proposing an answer; a guess will do. Look at the resulting response and use that to guide refining your proposal until you hit the target response: 0.667. When you are at the target, your proposal will be close to one of these:

47  57  63  71  81
(c)
Suppose scores for the African American students were to increase by 15 points on average. What would be the failure rate (in percent)?
21  35  44  53  67  74
(d)
A common way to report the difference between two groups is the number of standard deviations that separate the means. How big is this for African American students in the Saint Paul Public Schools compared to whites (under the assumptions made for this problem)?
0.32  0.64  1.18  1.72  2.66  5.02

It would be more informative if school districts gave the actual distribution of scores rather than the passing rate.
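
The guess-and-refine search described in part (b) can be sketched as follows. The numbers are illustrative only and are not those of the problem: the sketch looks for the mean of a normal distribution with a standard deviation of 10 such that 80% of values fall below 50. The names `threshold`, `target`, `f`, and `found` are assumptions made for this example.

```r
# Illustrative numbers only (not those of the problem): find the mean m of a
# normal distribution with sd = 10 such that 80% of values fall below 50.
threshold <- 50
target <- 0.80

pnorm(threshold, mean = 45, sd = 10)   # first guess: result is below the target
pnorm(threshold, mean = 40, sd = 10)   # lower mean: result is above the target
pnorm(threshold, mean = 42, sd = 10)   # between the two: closer to the target

# The same search can be automated with uniroot(), which looks for the
# mean at which the difference from the target is zero.
f <- function(m) pnorm(threshold, mean = m, sd = 10) - target
found <- uniroot(f, interval = c(0, 100))
found$root
```

Guessing by hand shows how the response moves as the proposed mean changes; `uniroot()` is a convenience once that relationship is understood.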

Prob 11.36. A manufacturer of electrical fiber optic cables produces spools that are 50,000 feet long. In the production process, flaws are introduced randomly. A study of the flaws indicates that, on average, there is 1 flaw per 10,000 feet.

(a)
Which probability distribution describes how many flaws there will be in a spool of cable?

 A normal B uniform C binomial D exponential E poisson

(b)
What’s the probability that a 50,000 foot-long cable has 3 or fewer flaws? (Enter your answer to 3 decimal places, e.g., 0.456.)

# to within ±0.001

Prob 11.37. To help reduce speeding, local governments sometimes put up speed signs at locations where speeding is a problem. These signs measure the speed of each passing car and display that speed to the driver. In some countries, such as the UK, the devices are equipped with a camera that records an image of each speeding car, and a speeding ticket is sent to the registered owner.

At one location, the data recorded from such a device indicates that between 7 and 10 PM, 32% of cars are speeding and that 4.3 cars per minute pass the intersection, on average.

Which probability distributions can best be used to model each of the following situations for 7 to 10 PM?

(a)
The number of speeding cars in any 1 hour period.

 A Normal B Uniform C Binomial D Poisson E Exponential F Lognormal

(b)
The time that elapses between cars:

 A Normal B Uniform C Binomial D Poisson E Exponential F Lognormal

(c)
Out of 100 successive cars passing the device, the number that are speeding

 A Normal B Uniform C Binomial D Poisson E Exponential F Lognormal

(d)
The mean speed of 100 successive cars passing the device.

 A Normal B Uniform C Binomial D Poisson E Exponential F Lognormal

Prob 11.40. As part of a test of the security inspection system in an airport, a government supervisor adds 5 suitcases with illegal materials to an otherwise ordinary shipping load, bringing the total to 150 suitcases.

In order to determine whether the shipment should be accepted, security officers randomly select 15 of the suitcases and X-ray them. If one or more of the suitcases is found to contain the materials, the entire shipment will be searched.

1.
What probability model best applies here in describing the probability that one or more of the five added suitcases will be X-rayed?

 A Normal B Uniform C Binomial D Poisson E Exponential

2.
What is the probability that one or more of the five added suitcases will be X-rayed?

 A 1-pnorm(5,mean=150,sd=15) B 1-pnorm(15,mean=150,sd=5) C 1-punif(5,min=0,max=150) D 1-punif(15,min=0,max=150) E 1-pbinom(0,size=15,prob=5/150) F 1-ppois(0,45/150) G 1-ppois(0,5/150) H Not enough information to tell.

Prob 11.41. Do the best job you can answering this question. The information provided is not complete, but that’s the way things often are.

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in antinuclear rallies.

(a)
Rank the following possibilities from most likely to least likely, for instance “A B C D E”:
• Linda is a teacher.
• Linda works in a bookstore and takes yoga classes.
• Linda is a bank teller.
• Linda sells insurance.
• Linda is a bank teller and is active in the feminist movement.

(b)
Is there any relationship between the probabilities of the above items that you can be absolutely sure is true?

From “Judgments of and by representativeness,” in Judgment under Uncertainty: Heuristics and Biases, edited by Daniel Kahneman, Paul Slovic, and Amos Tversky. Cambridge; New York: Cambridge University Press, 1982.

Prob 11.42. According to the website http://www.wikihealth.com/Pregnancy, approximately 3.6% of pregnant women give birth on the predicted date (using the method that calculates gestational duration starting at the time of the last menstrual period). Assume that the probability of giving birth is a normal distribution whose mean is at the predicted date. The standard deviation quantifies the spread of gestational durations.

Using just the 3.6% “fact,” make an estimate of the standard deviation of all pregnancies assuming that pregnancies are distributed as a normal distribution centered on the predicted date. Hint: Think of the area under the distribution over the range that covers one 24-hour period.

8  9  10  11  12

Prob 11.43. You have decided to become a shoe-maker. Contrary to popular belief, this is a highly competitive field and you had to take the Shoe-Maker Apprenticeship Trial (SAT) as part of your apprenticeship application.

(a)
The Shoe-maker’s union has told you that among all the people taking the SAT, the mean score is 700 and the standard deviation is 35.

According to this information, what is the percentile corresponding to your SAT score of 750?

 A 1-pnorm(750,mean=700,sd=35) B pnorm(750,mean=700,sd=35) C qnorm(750,mean=700,sd=35) D 1-qnorm(750,mean=700,sd=35) E Not enough information to answer.

(b)
Your friend just told you that she scored at the 95th percentile, but she can’t remember her numerical score. Using the information from the Shoe-maker’s union, what was her score?

 A pnorm(.95,mean=700,sd=35) B pnorm(.95,mean=750,sd=35) C qnorm(.95,mean=700,sd=35) D qnorm(.95,mean=750,sd=35) E Not enough information to answer.

Prob 11.44. Both the Poisson and binomial probability distributions describe a count of events.

The binomial distribution describes a series of identical discrete events with two possible outcomes to each event: yes or no, true or false, success or failure, and so on. The number of “success” or “true” or “yes” events in the series is given by the binomial distribution, so long as the individual events are independent of one another. An example is the number of heads that occur when flipping a coin ten times in a row. Each flip has a heads or tails outcome. The individual flips are independent.

There are two parameters to the binomial distribution: the number of events (“size”) and the probability of the outcome that will be counted as a success. For the distribution to be binomial, the number of events must be fixed ahead of time and the probability of success must be the same for each event. The outcome whose probability is represented by the binomial distribution is the number of successful events.

Example: You flip 10 fair coins and count the number of heads. In this case the size is 10 and the probability of success is 1/2.

Counter-example: You flip coins and count the number of flips until the 10th head. This is not a binomial distribution because the size is not fixed.

The Poisson probability model is different. It describes a situation where the rate at which events happen is fixed but there is no fixed number of events.

Example: Cars come down the street in a random way but at an average rate of 3 per minute. The Poisson distribution describes the probability of seeing any given number of cars in one minute. Unlike the binomial distribution, there is no fixed number of events; potentially 50 cars could pass by in one minute (although this is very, very unlikely).

Both the Poisson and binomial distributions are discrete. You can’t have 5.5 heads in 10 flips of a coin. You can’t have 4.2 cars pass by in one minute. Because of this, the basic way to use the tabulated probabilities is as a probability assigned to each possible outcome.

Here is the table of outcomes for the number of heads in 6 flips of a fair coin:

> dbinom(0:6, size=6, prob=.5)
 0.016 0.094 0.234 0.313 0.234 0.094 0.016

So, the probability of exactly zero heads is 0.016, the probability of exactly 1 head is 0.094, and so on.

You may also be interested in the cumulative probability:

> pbinom(0:6, size=6, prob=.5)
 0.016 0.109 0.344 0.656 0.891 0.984 1.000

So, the probability of 1 head or fewer is 0.109, just the sum of the probability of exactly 0 heads and exactly one head.
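
The link between the two tables can be checked directly: each cumulative probability is a running sum of the individual probabilities. This is a small sketch, with variable names chosen here for illustration.

```r
# The cumulative probabilities are running sums of the individual probabilities.
individual <- dbinom(0:6, size = 6, prob = 0.5)   # P(exactly k heads), k = 0, ..., 6
cumulative <- pbinom(0:6, size = 6, prob = 0.5)   # P(k heads or fewer)
all.equal(cumsum(individual), cumulative)         # TRUE
```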

Note that in both cases, there was no point in asking for the probability of more than 6 heads; six is the most that could possibly happen. If you do ask, the answer will be “zero”: it can’t happen.

> dbinom(7, size=6, prob=.5)
 0

Similarly, there is no point in asking for the probability of 3.5 heads; that can’t happen either.

> dbinom(3.5, size=6, prob=.5)
 0
Warning message:
non-integer x = 3.500000

The software sensibly returns a probability of zero, but warns that you are asking something silly.

The Poisson distribution is similar, but different in important ways. If cars pass by a point randomly at an average rate of 3 per minute, here is the probability of seeing 0, 1, 2, ... cars in any randomly selected minute.

> dpois(0:6, 3)
 0.05 0.15 0.22 0.22 0.17 0.10 0.05

So, there is a 5% chance of seeing no cars in one minute.

But unlike the binomial situation, where the maximum number of successful outcomes is fixed by the number of events, it’s possible for a very large number of cars to pass by.

> dpois(0:9, 3)
 0.0498 0.1494 0.2240 0.2240 0.1680
 0.1008 0.0504 0.0216 0.0081 0.0027

For instance, there is about a 0.3% chance that 9 cars will pass by in one minute. That’s small, but it’s definitely non-zero.
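
Because there is no upper limit, some probability always remains beyond any finite count. A minimal sketch, using the same rate of 3 cars per minute (the variable name is an assumption made here):

```r
# Probability of more than 9 cars in one minute at an average rate of 3 per minute.
# ppois(9, 3) is the probability of 9 or fewer, so its complement is the upper tail.
more_than_9 <- 1 - ppois(9, 3)
more_than_9          # small but not zero
sum(dpois(0:9, 3))   # slightly less than 1; the remainder is the tail above 9
```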

The Poisson model, like the binomial, describes a situation where the outcome is a whole number of events. It makes little sense to ask for a fractional outcome. The probability of a fractional outcome is always zero.

> dpois(3.5, 3)
 0
Warning message:
non-integer x = 3.500000

Often one wants to consider a Poisson event over a longer or shorter interval than the one implicit in the specified rate. For example, when you say that the average rate of cars passing a spot is 3 per minute, the interval of one minute is implicit. Suppose, however, that you want to know the number of cars that might pass by a spot in one hour. To calculate this, you need to find the rate in terms of the new interval. Since one hour is 60 minutes, the rate of 3 per minute is equivalent to 180 per hour. You can use this to find the probability.

For example, the probability that 150 or fewer cars will pass by in one hour (when the average rate is 3 per minute) is given by a cumulative probability:

> ppois(150, 180)
 0.0122061
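
The rescaling step can be written out explicitly, which helps avoid arithmetic slips when the interval is less convenient than one hour. This is a sketch with variable names chosen here for illustration.

```r
# Rescale a Poisson rate to a new interval before using it.
rate_per_minute <- 3
minutes <- 60                                # one hour
rate_per_hour <- rate_per_minute * minutes   # 180 per hour
ppois(150, rate_per_hour)                    # same calculation as ppois(150, 180)
```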

It can be hard to remember whether the above means “150 or fewer” or “fewer than 150.” When in doubt, you can always make the situation explicit by using a non-integer argument:

> ppois(150.1, 180)  # includes 150
 0.0122061
> ppois(149.9, 180)  # excludes 150
 0.00991012

This works only when asking for cumulative probabilities, since 150.1 or less includes the integers 150, 149, and so on. Were you to ask for the probability of getting exactly 150.1 cars in one hour, using the dpois operator, the answer would be zero:

> dpois(150.1, 180)
 0
Warning message:
non-integer x = 150.100000
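
One way to confirm what the non-integer argument includes is to add up the individual probabilities. The sketch below uses the same rate of 180 per hour.

```r
# With a non-integer argument, the cumulative probability counts every whole
# number at or below that argument.
all.equal(ppois(150.1, 180), sum(dpois(0:150, 180)))   # includes 150
all.equal(ppois(149.9, 180), sum(dpois(0:149, 180)))   # excludes 150

# The two cumulative probabilities differ by the probability of exactly 150.
ppois(150.1, 180) - ppois(149.9, 180)
dpois(150, 180)
```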

For each of the following, figure out the computer statement with which you can compute the probability.

1.
If cars pass a point randomly at an average rate of 10 per minute, what is the probability of exactly 15 cars passing in one minute?

0.000  0.0026  0.035  0.053  0.087  0.263  0.334  0.337  0.437  0.559  0.915  0.951
2.
If cars pass a point randomly at an average rate of 10 per minute, what is the probability of 15 or fewer cars passing in one minute?

0.000  0.0026  0.035  0.053  0.087  0.263  0.334  0.337  0.437  0.559  0.915  0.951
3.
If cars pass a point randomly at an average rate of 10 per minute, what is the probability of 20 or fewer cars passing in two minutes?

0.000  0.0026  0.035  0.053  0.087  0.263  0.334  0.337  0.437  0.559  0.915  0.951
4.
If cars pass a point randomly at an average rate of 10 per minute, what is the probability of 1200 or fewer cars passing in two hours and ten minutes (that is, 130 minutes)?

0.000  0.0026  0.035  0.053  0.087  0.263  0.334  0.337  0.437  0.559  0.915  0.951
5.
A department at a small college has 5 faculty members. If those faculty are effectively random samples from a population of potential faculty that is 40% female, what is the probability that 1 or fewer of the five department members will be female?

0.000  0.0026  0.035  0.053  0.087  0.263  0.334  0.337  0.437  0.559  0.915  0.951
6.
What is the probability that 4 or more will be female?

0.000  0.0026  0.035  0.053  0.087  0.263  0.334  0.337  0.437  0.559  0.915  0.951

Prob 11.45. Government data indicate that the average hourly wage for manufacturing workers in the United States is $14 (Statistical Abstract of the United States, 2002). Suppose the distribution of the manufacturing wage rate nationwide can be approximated by a normal distribution. Suppose also that a worker did a nationwide job search and found that 15% of the jobs paid more than $15.30 per hour. To find the standard deviation of the hourly wage for manufacturing workers, what process should we try?

 A qnorm(0.15, mean=14, sd=15.3) B Look for x such that pnorm(15.3, mean=14,sd=x) gives 0.85 C Calculate a z-score using 1.3 as the standard deviation D Not enough information is being given.

Prob 11.46. In many social issues, policy recommendations are based on cases from the extremes of a distribution. Consider, for example, a news story (Minnesota Public Radio, March 19, 2006) comparing the high-school graduation rates of Native Americans (85%) and whites (97%). The disparity becomes more glaring when one compares high-school drop-out rates, 3% for whites and 15% for Native Americans.

One way to compare these two drop-out rates is simply to take the ratio: five times as many children in one group drop out as in the other. Or, one could claim that the graduation rate for one group is “only” a factor of 1.14 higher than that for the other. While both of these descriptions are accurate, neither of them has a unique claim to truth.

Another way to interpret the data is to imagine a student’s high-school experience as a point on a quantitative continuum. If the experience is below a threshold, the student does not graduate. We can imagine the outcome as being the sum of many contributions: quality of the school and teachers, support from family, peer influences, personality of the student, and so on.

Suppose that we model the high-school experience as a normal distribution with the same standard deviation for whites and Native Americans but with different means. For the sake of specificity, let the mean for whites be 100 with a standard deviation of 20.

• What is the threshold for graduation? Find a number such that 3% of whites are below this.
• Using the threshold you found above, find the mean for Native Americans such that 15% of children are below the threshold.
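
The two steps above can be sketched in R as follows. The first step is a quantile calculation. The second uses the fact that, with a standard deviation of 20, the threshold must lie `qnorm(0.15)` standard deviations from the unknown mean. The names `threshold` and `mean_na` are illustrative.

```r
# Threshold: the score that 3% of whites fall below (mean 100, sd 20).
threshold <- qnorm(0.03, mean = 100, sd = 20)

# For Native Americans the threshold must sit at the 15th percentile.
# With sd 20: threshold = mean_na + 20 * qnorm(0.15), so solve for mean_na.
mean_na <- threshold - 20 * qnorm(0.15)

# Check: the fraction below the threshold should be 0.15.
pnorm(threshold, mean = mean_na, sd = 20)
```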

This model is, of course, arbitrary. We don’t know that there is anything that corresponds to a quantitative high-school “experience,” and we certainly don’t know that even if there were it would be distributed according to a normal distribution. Nevertheless, this can be a helpful way to interpret data about the “extremes” when making comparisons of the means.

• Suppose, contrary to fact, that the drop-out rate for group A were 15% and that for group B were five times as high: 75%. If group A has a high-school experience with mean 100 and standard deviation 20, and group B has a standard deviation of 20, what should the mean of group B be to produce the higher drop-out rate?

Enter the work you used to answer these questions in the box. You can cut and paste from the computer output, but make sure to indicate clearly and concisely what your answers are.

Prob 11.47. Geology Professor Karl Wirth studies the age of rocks as determined by ratios of isotopes. The figure shows the results of an age assay of rocks collected at seven sites. Because of the intrinsically random nature of radioactive decay, the measured age is a random variable and has been reported as a mean (in millions of years before the present) and a standard deviation (in the same units). From the geology of the sites, four of them have been classified as “early stage” and three as “main stage.” The graph clearly indicates that the early stage rocks tend to be younger than the main stage rocks. But perhaps this is just the luck of the draw.

Professor Wirth wants to calculate a new random variable: the difference in mean ages between the early and main stage rocks. Since the age difference is a random variable, Prof. Wirth needs to know both the variable’s mean and its standard deviation.

To get you started on the calculation, here’s the formula for the difference in mean ages, Δage, of the rocks from the two different stages:

Δage = (M1 + M2 + M3)/3 − (E1 + E2 + E3 + E4)/4

where Mi is the age of a rock from the main stage and Ei is the age of a rock from the early stage.

To remind you, here are the arithmetic rules for the means and variances of random variables V and W when summed and multiplied by fixed constants a and b. The variance rule for a sum assumes that V and W are independent:

• mean( aV ) = a mean( V )
• var( aV ) = a² var( V )
• mean( aV + bW ) = a mean( V ) + b mean( W )
• var( aV + bW ) = a² var( V ) + b² var( W )

Do calculations based on the above formulas to answer these questions:

1.
What is the mean of Δage? What are the units?
2.
What is the variance of Δage? What are the units?
3.
What is the standard deviation of Δage? What are the units?
4.
A skeptic claims that the two stages do not differ in age. He points out, correctly, that since Δage is a random variable, there is some possibility that its value is zero. What is the z-score of the value 0 in the distribution of Δage?
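
The pattern of these calculations can be sketched in R. The assay values below are HYPOTHETICAL placeholders, not the numbers from the figure. The sketch also assumes that Δage is the average of the three main-stage ages minus the average of the four early-stage ages, and that the seven measurements are independent.

```r
# HYPOTHETICAL assay results (millions of years); not the values in the figure.
main_mean  <- c(250, 252, 251)
main_sd    <- c(2, 3, 2)
early_mean <- c(246, 248, 247, 245)
early_sd   <- c(2, 2, 3, 3)

# Mean of the difference: average of main minus average of early.
delta_mean <- mean(main_mean) - mean(early_mean)

# Variance: each main rock carries weight 1/3 and each early rock weight -1/4,
# so the variances are scaled by (1/3)^2 and (1/4)^2 and then summed.
delta_var <- sum(main_sd^2) / 3^2 + sum(early_sd^2) / 4^2
delta_sd  <- sqrt(delta_var)

# z-score of the value 0 in the distribution of the difference.
z_zero <- (0 - delta_mean) / delta_sd
```

The weights enter the variance squared, which is why the negative sign on the early-stage terms does not reduce the variance of the difference.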

Prob 11.48. Below are two different graphs of cumulative probability distributions. Using the appropriate graph, not the computer, estimate the items listed below. Your estimates are not expected to be perfect, but do mark on the graph to show the reasoning behind your answer:

(a)
The 75th percentile of a normal distribution with mean 2 and standard deviation 3.
(b)
When flipping 20 fair coins, the probability of getting 7 or fewer heads.
(c)
The probability of x being 1 standard deviation or more below the mean of a normal distribution.
(d)
The range that covers the most likely 90% of outcomes for the number of heads when flipping 20 fair coins.