Chapter 16 Problems      Statistical Modeling: A Fresh Approach (2/e)

• What’s the difference between a “link value” and a “probability value” in a logistic regression model? How are they related to one another?
• How does the logistic function serve to keep the fitted probability values always within the range 0 to 1?
• What is maximum likelihood and how is it used as a criterion to fit a model?

Prob 16.01. The graphs show the link values and the corresponding probability values for a logistic model where x is the explanatory variable.

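[Graphs not reproduced here: one panel plots the link values against x; the other, titled “Probability Values,” plots the probability values against x.]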

Use the graphs to look up answers to the following. Choose the closest possibility to what you see in the graphs.

• At what value of x is the link value 0?
-2  -1  0  1  2
• What probability corresponds to a link of 0?
0.0  0.1  0.5  0.9  1.0
• At what value of x is the link value 1?
-1.50  -0.75  0.00  1.25  1.75
• What probability corresponds to a link of 1?
0.0  0.25  0.50  0.75  1.00
• What probability corresponds to a link of -1?
0.0  0.25  0.50  0.75  1.00
• What probability corresponds to a link of ∞? (This isn’t on the graph.)
0.0  0.25  0.50  0.75  1.00
• What probability corresponds to a link of -∞? (This isn’t on the graph.)
0.0  0.25  0.50  0.75  1.00
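
As a rough illustration of how link values map to probability values, the sketch below applies the logistic transform, p = 1/(1 + e^(-link)), to a few arbitrary link values. The function name `logistic` and the link values are assumptions chosen for this example; they are not read from the graphs. R's built-in `plogis()` computes the same transform.

```r
# Logistic transform: converts a link value into a probability between 0 and 1.
logistic <- function(link) 1 / (1 + exp(-link))

# Arbitrary link values chosen for illustration (not taken from the graphs).
links <- c(-3, -0.5, 0.5, 3)
probs <- logistic(links)
print(round(probs, 3))

# The built-in plogis() gives the same results.
print(all.equal(probs, plogis(links)))
```

Whatever link value goes in, the result stays strictly between 0 and 1, and link values of equal size but opposite sign give probabilities that sum to 1.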

Prob 16.02. The NASA space shuttle Challenger had a catastrophic accident during launch on January 28, 1986. Photographic evidence from the launch showed that the accident resulted from a plume of hot flame from the side of one of the booster rockets which cut into the main fuel tank. US President Reagan appointed a commission to investigate the accident. The commission concluded that the jet was due to the failure of an O-ring gasket between segments of the booster rocket.

A NASA photograph showing the plume of flame from the side of the booster rocket during the Challenger launch.

An important issue for the commission was whether the accident was avoidable. Attention focused on the fact that the ground temperature at the time of launch was 31F, much lower than for any previous launch. Commission member and Nobel laureate physicist Richard Feynman famously demonstrated, using a glass of ice water and a C-clamp, that the O-rings were very inflexible when cold. But did the data available to NASA before the launch indicate a high risk of an O-ring failure?

Here is the information available at the time of Challenger’s launch from the previous shuttle launches:

 Flight     Temp  Damage      Flight     Temp  Damage
 STS-1      66    no          STS-2      70    yes
 STS-3      69    no          STS-4      80    NA
 STS-5      68    no          STS-6      67    no
 STS-7      72    no          STS-8      73    no
 STS-9      70    no          STS 41-B   57    yes
 STS 41-C   63    yes         STS 41-D   70    yes
 STS 41-G   78    no          STS 51-A   67    no
 STS 51-B   75    no          STS 51-C   53    yes
 STS 51-D   67    no          STS 51-F   81    no
 STS 51-G   70    no          STS 51-I   76    no
 STS 51-J   79    no          STS 61-A   75    yes
 STS 61-B   76    no          STS 61-C   58    yes

Using these data, you can fit a logistic model to estimate the probability of failure at any temperature.

> mod = glm(Damage ~ Temp, family='binomial')
> summary(mod)
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  15.0429     7.3786   2.039   0.0415
Temp         -0.2322     0.1082  -2.145   0.0320
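
For readers who want to reproduce the fit, here is a minimal sketch that enters the launch data by hand and refits the model. The data frame name `shuttle`, the column `Damaged`, and the 0/1 coding of the response are assumptions made here, since the command above does not show how the data were stored. The flight with a missing damage record (STS-4) is dropped by `glm()` under R's default handling of NA values.

```r
# Launch temperatures (F) and O-ring damage, transcribed in flight order
# from STS-1 through STS 61-C.
shuttle <- data.frame(
  Temp = c(66, 70, 69, 80, 68, 67, 72, 73, 70, 57, 63, 70,
           78, 67, 75, 53, 67, 81, 70, 76, 79, 75, 76, 58),
  Damage = c("no", "yes", "no", NA, "no", "no", "no", "no",
             "no", "yes", "yes", "yes", "no", "no", "no", "yes",
             "no", "no", "no", "no", "no", "yes", "no", "yes")
)

# Code the response as 1 (damage) or 0 (no damage); NA stays NA.
shuttle$Damaged <- as.numeric(shuttle$Damage == "yes")

# Logistic regression of damage on launch temperature.
mod <- glm(Damaged ~ Temp, data = shuttle, family = "binomial")
print(round(coef(mod), 4))
```

If the data are entered as above, the printed coefficients should be close to those in the summary output.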

Use the coefficients to find the link value for these launch temperatures:

• 70F (a typical launch temperature)
-2.4  -1.2  1.6  2.7  4.3  7.8  9.4
• 53F (the previous low temperature)
-2.4  -1.2  1.6  2.7  4.3  7.8  9.4
• 31F (the Challenger temperature)
-2.4  -1.2  1.6  2.7  4.3  7.8  9.4

Convert the link value to a probability value for the launch temperatures:

• 70F
0.08  0.23  0.83  0.94  0.985  0.9996  0.9999
• 53F
0.08  0.23  0.83  0.94  0.985  0.9996  0.9999
• 31F
0.08  0.23  0.83  0.94  0.985  0.9996  0.9999
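
The two steps above (coefficients to link value, link value to probability value) can be sketched as follows. The helper names `link_at` and `prob_at` are made up for this illustration, and the temperature of 60F is an arbitrary choice that is not one of the temperatures asked about.

```r
# Coefficients copied from the summary output of the fitted model.
intercept <- 15.0429
slope <- -0.2322

# Link value: intercept plus the Temp coefficient times the temperature.
link_at <- function(temp) intercept + slope * temp

# Probability value: the logistic transform of the link value.
prob_at <- function(temp) plogis(link_at(temp))

temp <- 60  # an arbitrary temperature chosen for illustration
cat("link:", link_at(temp), " probability:", round(prob_at(temp), 3), "\n")
```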

A more complete analysis of the situation would take into account the fact that there are multiple O-rings in each booster, while the Damage variable describes whether any O-ring failed. In addition, there were two O-rings on each booster segment, both of which would have to fail to create a leakage problem. Thus, the probabilities estimated from this model and these data do not accurately reflect the probability of a catastrophic accident.

Prob 16.04. George believes in astrology and wants to check whether a person’s sign influences whether they are left- or right-handed. With great effort, he collects data on 100 people, recording their dominant hand and their astrological sign. He builds a logistic model hand ~ sign. The deviance from the model hand ~ 1 is 102.8 on 99 degrees of freedom. Including the sign term in the model reduces the deviance to 63.8 on 88 degrees of freedom.

The sign term only reduced the degrees of freedom by 11 (that is, from 99 to 88) even though there are 12 astrological signs. Why?

 A There must have been one sign not represented among the 100 people in George’s sample.
 B sign is redundant with the intercept and so one level is lost.
 C hand uses up one degree of freedom.

According to theory, if sign were unrelated to hand, the 11 degrees of freedom ought to reduce the deviance by how much, on average?

 A 11∕99 × 102.8
 B 1∕11 × 102.8
 C to zero
 D None of the above.

Prob 16.05. This problem traces through some of the steps in fitting a model of a yes/no process. For specificity, pretend that the data are from observations of a random sample of teenaged drivers. The response variable is whether or not the driver was in an accident during one year (birthday to birthday). The explanatory variables are sex and age of the driver. The model being fit is accident ~ 1 + age + sex.

Here is a very small, fictitious set of data.

 Case  Age  Sex  Accident?
 1     17   F    Yes
 2     17   M    No
 3     18   M    Yes
 4     19   F    No

Even if it weren’t fictitious, it would be too small for any practical purpose. But it will serve to illustrate the principles of fitting.

In fitting the model, the computer compares the likelihoods of various candidate values for the coefficients, choosing those coefficients that maximize the likelihood of the model.

Consider these two different candidate coefficients:

 Candidate A Coefficients
 Intercept  age  sexF
 35         -2   -1

 Candidate B Coefficients
 Intercept  age  sexF
 35         -2   0

The link value is found by multiplying the coefficients by the values of the explanatory variables in the usual way.
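
To make “the usual way” concrete, here is a minimal sketch of the calculation. The function name `link_value` and the 20-year-old female driver are hypothetical, introduced only for illustration; she is not one of the four cases in the data.

```r
# Candidate coefficients from the table above.
candidate_A <- c(intercept = 35, age = -2, sexF = -1)
candidate_B <- c(intercept = 35, age = -2, sexF = 0)

# Link value: intercept + (age coefficient x age)
#             + (sexF coefficient x 1 for a female driver, 0 for a male driver).
link_value <- function(coefs, age, sex) {
  unname(coefs["intercept"] + coefs["age"] * age + coefs["sexF"] * (sex == "F"))
}

# A hypothetical 20-year-old female driver.
print(link_value(candidate_A, age = 20, sex = "F"))
print(link_value(candidate_B, age = 20, sex = "F"))
```

The sexF coefficient enters the sum only for female drivers; for male drivers that term is multiplied by 0.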

• Using the candidate A coefficients, what is the link value for case 1?

 A 35 - 2 × 17 - 0 = 1
 B 35 - 2 × 17 - 1 = 0
 C 35 - 2 × 18 - 1 = -2
 D 35 - 2 × 19 - 1 = -4

• Using the candidate B coefficients, what is the link value for case 3?

 A 35 - 2 × 18 - 0 = -1
 B 35 - 2 × 18 - 1 = -2
 C 35 - 2 × 18 + 1 = 0
 D 35 - 2 × 18 - 2 = -3

The link value is converted to a probability value by using the logistic transform.

• The link value under the candidate A coefficients for case 4 is 35 - 2 × 19 - 1 = -4. What is the corresponding probability value? (Hint: Plug in the link value to the logistic transform!)
0.004  0.018  0.027  0.047  0.172  0.261
• The link value under the candidate B coefficients for case 4 is 35 - 2 × 19 - 0 = -3. What is the corresponding probability value?
0.004  0.018  0.027  0.047  0.172  0.261

The probability value is converted to a likelihood by calculating the probability of the observed outcome according to the probability value. When the outcome is “Yes,” the likelihood is just the same as the probability value. But when the outcome is “No,” the likelihood is 1 minus the probability value.
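
The rule in the paragraph above can be written as a one-line function. This is only a sketch: the name `case_likelihood` and the probability values and outcomes below are hypothetical, not taken from the driver data.

```r
# Likelihood of one observed outcome, given the model's probability of "Yes".
case_likelihood <- function(p_yes, outcome) {
  ifelse(outcome == "Yes", p_yes, 1 - p_yes)
}

# Hypothetical probability values and observed outcomes, for illustration only.
p_yes   <- c(0.9, 0.9, 0.2)
outcome <- c("Yes", "No", "No")
print(case_likelihood(p_yes, outcome))
```

The same probability value of 0.9 gives a high likelihood when the outcome is “Yes” and a low one when the outcome is “No.”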

• The link value for case 3 using the candidate A coefficients is -1 and the corresponding probability value is 0.269. What is the likelihood of the observed value of case 3 under the candidate A coefficients?
0.000  0.269  0.500  0.731  1.000
• The link value for case 2 using the candidate A coefficients is 1 and the corresponding probability value is 0.731. What is the likelihood of the observed value of case 2 under the candidate A coefficients?
0.000  0.269  0.500  0.731  1.000

To compute the likelihood of the entire set of observations under the candidate coefficients, multiply together the likelihoods for all the cases. Do this calculation separately for the candidate A coefficients and the candidate B coefficients. Show your work. Say which of the two candidates gives the bigger likelihood.

In an actual fitting calculation, the computer goes through large numbers of candidate coefficients in a systematic way to find the candidate with the largest possible likelihood: the maximum likelihood candidate. Explain why it makes sense to choose the candidate with the maximum rather than the minimum likelihood.

Prob 16.10. The National Osteoporosis Risk Assessment (NORA)[?] studied about 200,000 postmenopausal women aged 50 years or older in the United States. When entering the study, 14,412 of these women had osteoporosis as defined by a bone-mineral density “T score.” In studying the risk factors for the development of osteoporosis, the researchers fit a logistic regression model.

The coefficients in a logistic regression model can be directly interpreted as the logarithm of an odds ratio, the “log odds ratio.” In presenting results from logistic regression, it’s common to exponentiate the coefficients, that is, to compute e^coef to produce a simple odds ratio.
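
As a small illustration of exponentiating coefficients, the sketch below converts some made-up log odds ratios into odds ratios. The coefficient values and the names `groupB`, `groupC`, and `reference` are hypothetical; they are not from the NORA model.

```r
# Hypothetical log odds ratios, as a logistic regression might report them.
# A reference level has a coefficient of 0 by construction.
log_odds_ratios <- c(reference = 0, groupB = 0.58, groupC = -0.24)

# Exponentiating turns each log odds ratio into an odds ratio.
odds_ratios <- exp(log_odds_ratios)
print(round(odds_ratios, 2))
```

A coefficient of 0 becomes an odds ratio of 1, positive coefficients become odds ratios above 1, and negative coefficients become odds ratios between 0 and 1.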

The table below shows the coefficients in odds ratio form from the NORA model. There were many explanatory variables in the model: age group, years since menopause, health status, etc. All of these were arranged to be categorical variables, so there is one coefficient for each level of each variable. As always, one level of each variable serves as a reference level; for instance, the age group 50-54 is the reference level for age. The odds ratio for the reference level is always given as 1.00, and the other odds ratios are always with respect to this reference. So, women in the 55-59 age group have odds of having osteoporosis that are 1.79 times as large as those of women in the 50-54 age group. In contrast, women who are 6-10 years since menopause have odds of having osteoporosis that are 0.79 times as large as those of women who are 5 or fewer years since menopause.

An odds ratio of 1 means that the group has the same probability value as the reference group. Odds ratios bigger than 1 mean the group is more likely to have osteoporosis than the reference group; odds ratios smaller than 1 mean the group is less likely to have the condition.

The 95% confidence interval on the odds ratio indicates the precision of the estimate from the available data. When the confidence interval for a coefficient includes 1.00, the null hypothesis that the population odds ratio is 1 cannot be rejected at a 0.05 significance level. For example, the odds ratio for a self-rated health status level of “very good” is 1.04 compared to those in “excellent” health. But the confidence interval, 0.97 to 1.13, includes 1.00, indicating that the evidence is weak that women in very good health have a different risk of developing osteoporosis compared to women in excellent health.

For some variables, e.g., “college education or higher,” no reference level is given. This is simply because the variable has just two levels. The other level serves as the reference.

 Variable and level                   Odds Ratio (95% CI)

 Age group (years)
   50-54                              1.00 (Referent)
   55-59                              1.79 (1.56-2.06)
   60-64                              3.84 (3.37-4.37)
   65-69                              5.94 (5.24-6.74)
   70-74                              9.54 (8.42-10.81)
   75-79                              14.34 (12.64-16.26)
   ≥ 80                               22.56 (19.82-25.67)
 Years since menopause
   ≤ 5                                1.00 (Referent)
   6-10                               0.79 (0.70-0.89)
   11-15                              0.83 (0.76-0.91)
   16-20                              0.96 (0.89-1.03)
   21-25                              1.01 (0.95-1.08)
   26-30                              1.02 (0.95-1.09)
   31-35                              1.10 (1.03-1.19)
   36-40                              1.14 (1.05-1.24)
   ≥ 41                               1.24 (1.14-1.35)
 College educ or higher               0.91 (0.87-0.94)
 Self-rated health status
   Excellent                          1.00 (Referent)
   Very good                          1.04 (0.97-1.13)
   Good                               1.23 (1.14-1.33)
   Fair/poor                          1.62 (1.50-1.76)
 Fracture history
   Hip                                1.96 (1.75-2.20)
   Wrist                              1.90 (1.77-2.03)
   Spine                              1.34 (1.17-1.54)
   Rib                                1.43 (1.32-1.56)
 Maternal history of osteoporosis     1.08 (1.01-1.17)
 Maternal history of fracture         1.16 (1.11-1.22)
 Race/ethnicity
   White                              1.00 (Referent)
   African American                   0.55 (0.48-0.62)
   Native American                    0.97 (0.82-1.14)
   Hispanic                           1.31 (1.19-1.44)
   Asian                              1.56 (1.32-1.85)
 Body mass index, kg/m²
   ≤ 23                               1.00 (Referent)
   23.01-25.99                        0.46 (0.44-0.48)
   26.00-29.99                        0.27 (0.26-0.28)
   ≥ 30                               0.16 (0.15-0.17)
 Current medication use
   Cortisone                          1.63 (1.47-1.81)
   Diuretics                          0.81 (0.76-0.85)
 Estrogen use
   Former                             0.77 (0.73-0.80)
   Current                            0.27 (0.25-0.28)
 Cigarette smoking
   Former                             1.14 (1.10-1.19)
   Current                            1.58 (1.48-1.68)
 Regular Exercise
   Regular                            0.86 (0.82-0.89)
 Alcohol use, drinks/wk
   None                               1.00 (Referent)
   1-6                                0.85 (0.80-0.90)
   7-13                               0.76 (0.69-0.83)
   ≥ 14                               0.62 (0.54-0.71)
 Technology
   Heel x-ray                         1.00 (Referent)
   Forearm x-ray                      2.86 (2.75-2.99)
   Finger x-ray                       4.86 (4.56-5.18)
   Heel ultrasound                    0.79 (0.70-0.90)

Since all the variables were included simultaneously in the model, the various coefficients can be interpreted as indicating partial change: the odds ratio comparing the given level to the reference level for each variable, adjusting for all the other variables as if they had been held constant.

• For which ethnicity are women least likely to have osteoporosis?
White  African.American  Native.American  Hispanic  Asian
• Is regular exercise (compared to no regular exercise) associated with a greater or lesser risk of having osteoporosis?
greater  lesser  same
• Is current cigarette smoking (compared to never having smoked) associated with a greater or lesser risk of having osteoporosis?
greater  lesser  same
• The body mass index (BMI) is a measure of overweight. For adults, a BMI greater than 25 is considered overweight (although this is controversial) and a BMI greater than 30 is considered “obese.” Are women with BMI ≥ 30 (compared to those with BMI ≤ 23) at greater, lesser, or the same risk of having osteoporosis?
greater  lesser  same
• There are different technologies for detecting osteoporosis. Since the model adjusts for all the other risk factors, it seems fair to interpret the odds ratios for the different technologies as indicating how sensitive each technology is in detecting osteoporosis.

Which technology is the most sensitive?

heel.x-ray  forearm.x-ray  finger.x-ray  heel.ultrasound

Which technology is the least sensitive?

heel.x-ray  forearm.x-ray  finger.x-ray  heel.ultrasound
• In combining the odds ratios of multiple variables, you can multiply the individual odds ratios. For instance, the odds of osteoporosis for a woman in very good health with a body mass index of 24 are 1.04 × 0.46 times as large as those for a woman in excellent health with a BMI of 23 or less (the reference levels for the variables involved). (If log odds ratios were used, rather than the odds ratios themselves, the values would be added, not multiplied.)

What is the odds ratio of a woman having osteoporosis who is in fair/poor health, drinks 7-13 drinks per week, and is Asian?

 A 0.76 × 0.27 × 1.00
 B 1.62 × 0.27 × 1.56
 C 1.62 × 0.76 × 1.56
 D 0.76 × 1.62 × 1.00

• The two variables “age” and “years since menopause” are likely to be somewhat collinear. Explain why. What effect might this collinearity have on the width of the confidence intervals for the coefficients associated with those variables? If you were recommending to remove one of the variables in the list of potential risk factors, which one would it be?
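
Returning to the rule for combining odds ratios stated in the questions above, the short sketch below checks that multiplying odds ratios is the same as adding log odds ratios. It uses the two values from the worked example in the text (very good health and a BMI of 24); the variable names are chosen here for illustration.

```r
# Odds ratios from the worked example: "very good" health, BMI 23.01-25.99.
or_health <- 1.04
or_bmi <- 0.46

# Combined odds ratio relative to the reference levels: multiply.
combined <- or_health * or_bmi
print(combined)

# Equivalent calculation on the log scale: add, then exponentiate.
combined_via_logs <- exp(log(or_health) + log(or_bmi))
print(all.equal(combined, combined_via_logs))
```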

Notice that the table gives no intercept coefficient. The intercept corresponds to the probability of having osteoporosis when belonging to the reference level of each of the explanatory variables. Without knowing this, you cannot use the coefficients to calculate the absolute risk of osteoporosis in the different conditions. Instead, the odds ratios in the table tell about relative risk. Gigerenzer [??] points out that physicians and patients often have difficulty interpreting relative risks and encourages information to be presented in absolute terms.

To illustrate, in the group from whom the NORA subjects were drawn, the absolute risk of osteoporosis was 72 in 1000 patients. This corresponds to an odds of osteoporosis of 72/(1000 - 72) = 0.0776. Now consider a woman taking cortisone. According to the table, this increases her odds of osteoporosis by a factor of 1.63, to 0.0776 × 1.63 = 0.126. Translating this back into an absolute risk means converting from odds into probability. The probability will be 0.126/(1 + 0.126) = 0.112, or, in other words, an absolute risk of 112 in 1000 patients.
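
The arithmetic in the paragraph above can be retraced with two small conversion functions. The function names `prob_to_odds` and `odds_to_prob` are made up for this sketch; the numbers are those given in the text.

```r
# Convert between probability (absolute risk) and odds.
prob_to_odds <- function(p) p / (1 - p)
odds_to_prob <- function(odds) odds / (1 + odds)

baseline_risk <- 72 / 1000                       # absolute risk in the group
baseline_odds <- prob_to_odds(baseline_risk)     # about 0.0776
cortisone_odds <- baseline_odds * 1.63           # odds ratio for cortisone
cortisone_risk <- odds_to_prob(cortisone_odds)   # about 0.112

print(round(c(baseline_odds, cortisone_odds, cortisone_risk), 4))
```

Note that the odds ratio is applied on the odds scale, and only the final odds are converted back to a probability.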

Now suppose the woman was taking cortisone to treat arthritis. Knowing the absolute risk (an increase of 40 women per 1000) puts the woman and her physician in a better position to compare the positive effects of cortisone for arthritis to the negative effects in terms of osteoporosis.

[This problem is based on an item used in a test of the statistical expertise of medical residents reported in [?].]

Prob 16.12. The concept of residuals does not cleanly apply to yes/no models because the model value is a probability (of a yes outcome), whereas the actual observation is the outcome itself. It would be silly to try to compute a difference between “yes” and a probability like 0.8. After all, what could it mean to calculate (yes - 0.8)^2?

In fitting ordinary linear models, the criterion used to select the best coefficients for any given model design is “least squares,” minimizing the sum of squared residuals. The corresponding criterion in fitting yes/no models (and many other types of models) is “maximum likelihood.”

The word “likelihood” has a very specific and technical meaning in statistics; it’s not just a synonym for “chance” or “probability.” A likelihood is the probability of the outcome according to a specific model.

To illustrate, here is an example of some yes-no observations and the model values of two different models.

        Model A          Model B          Observed
 Case   p(Yes)  p(No)    p(Yes)  p(No)    Outcome
 1      0.7     0.3      0.4     0.6      Yes
 2      0.6     0.4      0.8     0.2      No
 3      0.1     0.9      0.3     0.7      No
 4      0.5     0.5      0.9     0.1      Yes

Likelihood always refers to a given model, so there are two likelihoods here: one for Model A and another for Model B. The likelihood for each case under Model A is the probability of the observed outcome according to the model. For example, the likelihood under Model A for case 1 is 0.7, because that is the model value of the observed outcome “Yes” for that case. The likelihood of case 2 under Model A is 0.4: that is the probability of “No” for case 2 under Model A.

• What is the likelihood under Model A for case 3?
0.1  0.3  0.5  0.7  0.9
• What is the likelihood under Model B for case 3?
0.1  0.3  0.5  0.7  0.9
• What is the likelihood under Model A for case 4?
0.1  0.3  0.5  0.7  0.9
• What is the likelihood under Model B for case 4?
0.1  0.3  0.5  0.7  0.9

The likelihood for the whole set of observations combines the likelihoods of the individual cases: multiply them all together. This is justified if the cases are independent of one another, which is usually assumed and is sensible when the cases result from random sampling or random assignment to an experimental treatment.
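
As a sketch of this multiplication, the code below computes the likelihood for a small made-up set of cases. The probabilities and outcomes are hypothetical and deliberately differ from Models A and B in the table. The log-likelihood on the last lines is an addition for illustration: it carries the same information as the product but is a sum of logs.

```r
# Hypothetical model probabilities of "Yes" and observed outcomes for three cases.
p_yes   <- c(0.8, 0.3, 0.6)
outcome <- c("Yes", "No", "Yes")

# Per-case likelihood: the probability the model assigns to what was observed.
per_case <- ifelse(outcome == "Yes", p_yes, 1 - p_yes)

# Likelihood of the whole set, assuming independent cases: the product.
likelihood <- prod(per_case)
print(likelihood)

# The same quantity on the log scale: a sum of logs.
log_likelihood <- sum(log(per_case))
print(log_likelihood)
```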

• What is the likelihood under Model A for the whole set of cases?

 A 0.3 × 0.4 × 0.9 × 0.5
 B 0.7 × 0.6 × 0.9 × 0.5
 C 0.3 × 0.4 × 0.1 × 0.5
 D 0.7 × 0.4 × 0.9 × 0.5
 E 0.7 × 0.4 × 0.1 × 0.5

• What is the likelihood under Model B for the whole set of cases?

 A 0.4 × 0.8 × 0.3 × 0.9
 B 0.4 × 0.2 × 0.3 × 0.9
 C 0.6 × 0.2 × 0.3 × 0.9
 D 0.4 × 0.2 × 0.7 × 0.9
 E 0.4 × 0.2 × 0.3 × 0.1