24 Optimization

To “optimize” means to make something as good as possible with the available resources. Optimization problems are common in science, logistics, industry, and any other area where one seeks the best solution to a problem. Some everyday examples:

When to harvest trees being grown for lumber. Harvest too soon and you might be losing out on the prime growing years. On the other hand, wait too long and trees will show slow or zero growth.
Walking up too steep a slope is tiring and slows you down; that is why hiking trails have switchbacks. However, when the switchbacks are too shallow, it takes a long time to cover the distance. What’s the most efficient angle to enable hikers to get up the hill in the shortest time.
How much salt to add to a stew. Stews can be too salty, or they can be not salty enough. Somewhere in the middle is the optimum.

24.1 Structure of the problem

In an optimization problem, there is one or more input quantities whose value you have to choose. The amount of salt; the years to wait from planting to harvesting a tree; the angle of the trail with respect to the slope. We will call this the decision quantity.

Similarly, there is one or more output quantity that you value and want to make as good as possible. The taste of the stew; the amount of usable wood harvested; the time it takes to walk up the hill. The output quantities are called the objectives.

This chapter deals with optimization problems that involve only a single objective. Problems with multiple objectives are among the most interesting and important in real-world decision making. Single-objective optimization techniques are a component of the more complex decision making, but they are a good place to get started.

The model that relates inputs to the objective output is the objective function. Solving an optimization problem—once the modeling phase is complete—amounts to finding a value for the decision quantity (the input to the objective function) that produces the best output from the objective function.

Sometimes the objective is something that you want to minimize, make as small as possible. For instance, in the hiking trail problem, we seek to minimize the time it takes to walk up the trail. Sometimes you want to maximize the objective, as in the wood-harvest problem where the objective is to harvest the most wood per year.

Mathematically, maximization and minimization are the same thing. Every minimization problem can be turned into a maximization problem by putting a negative sign in front of the objective function. To simplify the discussion, in talking about finding the solution to an optimization problem we will imagine that the goal is to maximize. But keep in mind that many circumstances in the real world, “best” can mean minimization.

Recall from Section 6.7 that there are two components to the task of maximization or minimization. The argmax is the input to the objective function which produces the largest output. The maximum is the value of that output.¹ Argmin and minimum are the words used in a situation where you seek the smallest value of the objective function.

Once you have found the argmax you can plug that value into the objective function to find the output value. That value is the maximum.

People often talk about “finding the maximum.” This is misleading. Instead, the idea is to find the input to the objective function—that is, the argmax—that produces the maximum output.

To illustrate the setup of an optimization problem, imagine yourself in the situation of a contest to see who can shoot a tennis ball the farthest into a field with a slingshot. During the contest, you will adjust the vertical angle of launch, place the ball into the slingshot’s cradle, pull back as far as possible, and let go. To win the contest, you need to optimize how you launch the ball.

The objective is to maximize the distance traveled by the ball. The objective function models the distance travelled as a function of the quantities you can control, for instance the vertical angle of launch or the amount by which you pull back the slingshot. For simplicity, we will imagine that the slingshot is pulled back by a standard amount, producing a velocity of the ball at release of $v_{0}$ . You will win or lose based on the angle of launch you choose.

Before you head out into the field to experiment, let’s prepare by constructing the objective function. Using some principles of physics and mathematics (which you may not yet understand), we will model how far the ball will travel (horizontally) as a function of the angle of launch $θ$ and the initial velocity $v_{0}$ .

The mathematics of such problems involves an area called differential equations, an important part of calculus which we will come to later in the course. Since you don’t have the tools yet, we will just state a simple model of how long the ball stays in the air.

$duration (v_{0}, θ) = 2 v_{0} \sin (θ) / g$ $g$ is the acceleration due to gravity, which is about $9.8 m s^{- 2}$ , assuming that the contest is being held on Earth.

The horizontal distance travelled by the tennis ball will be $hdist (v_{0}, θ) = \cos (θ) v_{0} duration (v_{0}, θ) = 2 v_{0}^{2} \cos (θ) \sin (θ) / g$ Our objective function is hdist(), and we seek to find the argmax. The input $v_{0}$ is (we have assumed) fixed, so the only decision quantity is the angle $θ$ .

The best choice of $θ$ will make the quantity $\cos (θ) \sin (θ)$ as large as possible. So in finding the argmax, we don’t need to be concerned with $v_{0}$ or $g$ .

Finding the argmax can be accomplished simply by plotting the function $\cos (θ) \sin (θ)$ . We will implement the function so that the input is in units of degrees.

Figure 24.1: In the simple model of a tennis ball launched at an angle $θ$ from the horizontal, the distance travelled is $2 v_{0}^{2} / g$ times $\cos (θ) \sin (θ)$ .

You can see that the maximum value is about 0.5 and that this occurs at an argmax $θ$ that is a little bit less than 50 $^{\circ}$ .

Zooming in on the $θ$ axis let’s you find the argmax with more precision:

Figure 24.2: Zooming in on the argmax of the objective function. It is important to look at the scale of the vertical axis. Any value of $θ$ between about 40 and 50 gives a close approximation to the maximum.

From the graph, especially the zoomed-in version, you can read off the argmax as $θ = 45^{\circ}$ .

Finding the argmax solves the problem. You may also want to present your solution by reporting the value of hdist() when the argmax is given as input. You can read off the graph that the maximum of $\cos (θ) \sin (θ)$ is 0.5 at $θ = 45^{\circ}$ , so overall the distance will be $v_{0}^{2} / g$

24.2 Interpreting the argmax

The graphical solution given to the slingshot problem is entirely satisfactory. Whether that solution will win the contest depends on whether the model we built for the objective function is correct. We have left out, for instance, air resistence, which is potentially important.

Solving the optimization problem has prepared us to test the result in the field. Perhaps we will find that the real-world optimum angle is somewhat steeper or shallower than $θ = 45^{\circ}$ .

Besides the argmax, another important quantity to read from the graph in Figure 24.1 is the precision of the argmax. In strict mathematical terms, the argmax for the tennis-ball problem is exactly 45 degrees at which point $\cos (θ) \sin (θ) = 0.5$ . Suppose, however, that the ball were launched at only 40 degrees. Five degrees difference is apparent to the eye, but the result will be essentially the same as for 45 degrees: $\cos (θ) \sin (θ) = 0.492$ . The same is true for a launch angle of 50 degrees. For both “sub-optimal” launch angles, the output is within 2 percent of the 45-degree result. It is easy to imagine that a factor outside the scope of the simple model—the wind, for instance—could change the result by as much or more than 2 percent, so a practical report of the argmax should reasonable be “40 to 50 degrees” rather than “exactly 45 degrees.”

Contests are won or lost by margins of less than 1%, so you should not casually deviate from the argmax. On the other hand, $45^{\circ}$ is the argmax of the model. Reality may deviate from the model. For instance, suppose that air resistance or wind might might affect distance by 1%. That is. the real-world result might deviate by as much as 1% of the model value. If so, we shouldn’t expect the real-world argmax to be any closer to 45 $^{\circ}$ than $\pm 5^{\circ}$ ; anywhere in that domain interval generates an output that is within 1% of the maximum output for the model.

24.3 Derivatives and optimization

We are now going to reframe the search for the argmax and its interpretation in terms of derivatives of the objective function with respect to the decision quantity ( $θ$ in the slingshot problem). For a function with one input, this will not be an improvement from the look-at-the-graph technique to find the argmax. However, a genuine reason to use derivatives is to set us up in the future to solve problems with more than one input, where it is hard to draw or interpret a graph. Also, describing functions in the language of derivatives can help us think more clearly about aspects of the problem, such as the precision of the argmax.

With a graph such as Figure 24.1, it is easy to find the argmax; common sense carries the day. So it won’t be obvious at first why we will take the following approach:

Let’s denote an argmax of the objective function $f (x)$ by $x^{⋆}$ . Let’s look at the derivative $\partial_{x} f (x)$ in the neighborhood of $x^{⋆}$ . Referring to Figure 24.1, where $x^{⋆} = 45^{\circ}$ , you may be able to see that $\partial_{x} f (x^{⋆})$ is zero; the line tangent to the function’s graph at $x^{⋆}$ is horizontal.

Seen another way, the slope of $f (x)$ to the left of $x^{⋆}$ is positive. Moving a tiny bit to the right (that is, increasing $x$ by a very small amount) increases the output $f (x)$ . On the other hand, just to the right of $x^{⋆}$ , the slope of $f (x)$ is negative; as you reach the top of a hill and continue on, you will be going downhill. So the derivative function is positive on one side of $x^{⋆}$ and negative on the other, suggesting that it crosses zero at the argmax.

Common sense is correct: Walk uphill to get to the peak, walk downhill to move away from the peak. At the top of a smooth hill, the terrain is level. (Since our modeling functions are smooth, so must be the hills that we visualize the functions with.)

Inputs $x^{⋆}$ such that $\partial_{x} f (x^{⋆}) = 0$ are called critical points. Why not call them simply argmaxes? Because a the slope will also be zero at an argmin. And it is even possible to have the slope be zero at a point that is neither an argmin or an argmax.

At this point, we know that values $x^{⋆}$ that give $\partial_{x} f (x^{⋆}) = 0$ are “critical points,” but we haven’t said how to figure out whether a given critical point is an argmax, an argmin, or neither. This is where the behavior of $\partial_{x} f (x)$ near $x = x^{⋆}$ is important. If $x^{⋆}$ is an argmax, then $\partial_{x} f (x)$ will be positive to the left of $x^{⋆}$ and negative to the right of $x^{⋆}$ ; walk up the hill to get to $x^{⋆}$ , at the top the hill is flat, and just past the top the hill has a negative slope.

For an argmin, changing $x$ from less than $x^{⋆}$ to greater than $x^{⋆}$ ; you will be walking down into the valley, then level at the very bottom $x = x^{⋆}$ , then back up the other side of the valley after you pass $x = x^{⋆}$ . Figure 24.3 shows the situation.

Figure 24.3: Top row: An objective function near an argmax (left) and an argmin (right). Bottom row: The derivative of the objective function. A horizontal line (orange) marks zero on the vertical axis.

The bottom row of graphs in Figure 24.3 shows the derivative of the objective function $f (x)$ , that is, $\partial_{x} f (x)$ . You can see that for the argmax of $f (x)$ , the derivative $\partial_{x} f (x)$ is positive to the left and negative to the right. Similarly, near the argmin of $f (x)$ , the derivative $\partial_{x} f (x)$ is negative to the left and positive to the right.

Stated another way, the derivative $\partial_{x} f (x)$ has a negative slope just to the left of an argmin and a positive slope to the left of an argmax.

The second derivative of the objective function $f (x)$ at a critical point $x^{⋆}$ is what tells us whether the critical point is an argmax, an argmin, or neither.

Critical point $x^{⋆}$	$\partial_{x} f (x^{⋆})$	$\partial_{x x} f (x^{⋆})$
argmax	0	negative
argmin	0	positive
neither	0	0

Throughout Block 2, we have translated features of functions that are evident on a graph into the language of derivatives:

The slope of a function $f (x)$ at any input $x$ is the value of the derivative function $\partial_{x} f (x)$ at that same $x$ .
The concavity of a function $f (x)$ at any input is the slope of the derivative function, that is, $\partial_{x x} f (x)$ .
Putting (i) and (ii) together, we get that the concavity of a function $f (x)$ at any input $x$ is the value of the second derivative function, that is, $\partial_{x x} f (x)$ .
At an argmax $x^{⋆}$ of $f (x)$ , the value of the derivative function $\partial_{x} f (x^{⋆})$ is zero and the value of the second derivative function $\partial_{x x} f (x^{⋆})$ is negative. The situation at an argmin is along the same lines, the derivative of the objective function is zero and the second derivative is positive.

What’s the critical point?

You’re familiar with the quadratic polynomial: $g (x) = a_{0} + a_{1} x + a_{2} x^{2}$ The graph of a quadratic polynomial is a parabola, which might be concave up or concave down. A parabola has only one critical point, which might be an argmin or an argmax.

Let’s find the critical point. We know that the critical point is $x^{⋆}$ such that $\partial_{x} g (x^{⋆}) = 0$ . Since we know how to differentiate a power law, we can see that $\partial_{x} g (x) = a_{1} + 2 a_{2} x$ and, more specifically, at the critical point $x^{⋆}$ the derivative will be

$a_{1} + 2 a_{2} x^{⋆} = 0$

The above is an equation, not a definition. It says that whatever $x^{⋆}$ happens to be, the quantity $a_{1} + 2 a_{2} x^{⋆}$ must be zero. Using plain old algebra, we can find the location of the critical point $x^{⋆} = - \frac{a_{1}}{2 a_{2}}$

Math in the World: The demand “curve”

In economics, a monopoly or similar arrangement can set the price for a good or commodity. Monopolists can set the price at a level that generates the most income for themselves.

Figure 24.4: Demand as a *function* of price, as first published by Antoine-Augustin Cournot in 1836. Source)

In 1836, early economist Antoine-Augustin Cournot published a theory of revenue versus demand based on his conception that demand will be a monotonically decreasing function of price. (That is, higher price means lower demand.) we will write as $Demand (p)$ demand as a function of price.

The revenue generated at price $p$ is $R (p) \equiv p Demand (p)$ : price times demand.

To find the revenue-maximizing demand, differentiate $R (p)$ with respect to $p$ and find the argmax $p^{⋆}$ at with $\partial_{p} R (p^{⋆}) = 0) .$ This can be done with the product rule.

$\partial_{p} R (p) = p \partial_{p} Demand (p) + Demand (p)$

At the argmax $p^{⋆}$ we have: $p^{⋆} \partial_{p} Demand (p^{⋆}) + Demand (p^{⋆}) = 0 \overset{solving for p^{⋆}}{⟹} p^{⋆} = - \frac{Demand (p^{⋆})}{\partial_{p} Demand (p^{⋆})}$

If the monopolist knows the demand function $D (p)$ , finding the revenue maximizing price is a simple matter. But in general, the monopolist does not know the demand function in advance. Instead, an informed guess is made to set the initial price $p_{0}$ . Measuring sales $D (p_{0})$ gives one point on the demand curve. Then, try another price $p_{1}$ . This gives another point on the demand curve as well as an estimate

$\partial_{p} D (p_{0}) = \frac{D (p_{1}) - D (p_{0})}{p_{1} - p_{0}}$

Now the monopolist is set to model the demand curve as a straight-line function and easily to find $p^{⋆}$ for the model. For instance, if the demand function is modeled as $D_{1} (p) = a + b p$ , the optimal price will be $p_{1}^{⋆} = - \frac{a + b p^{⋆}}{b}$ which can be solved as $p_{1}^{⋆} = - a / 2 b$ .

$p_{1}^{⋆}$ is just an estimate of the optimum price. Still, the monopolist can try out that price, giving a third data point for the demand function. The new data can lead to a better model of the demand function. With the better estimate, find a new a argmax $p_{2}^{⋆}$ . This sort of iterative process for finding an argmax of a real-world function is very common in practice.

24.4 Be practical!

Decision making is about choosing among alternatives. In some engineering or policy contexts, this can mean finding a value for an input that will produce the “best” outcome. For those who have studied calculus, it is natural to believe that calculus-based techniques for optimization are the route to making the decision.

We emphasize that the optimization techniques covered in this chapter are only part of a broader set of techniques for real-world decision-making problems. In particular, most policy contexts involve multiple objectives. For example, in designing a car one goal is to make it cheap to manufacture, another to make it attractive, and still another to make it safe. These different objectives are often at odds with one another. In Block 4 of this text, we will discuss some calculus techniques that help policy-makers in multi-objective settings.

For now, sticking with the idealized (and often unrealistic) setting of maximizing a single objective, with one or more inputs. Recall the setting for calculus-type maximization. You have a function with one or more inputs, say, $f (x)$ or $g (x, y)$ or, often, $h (x, y, z, \dots)$ where $\dots$ might be standing for tens or hundreds or thousands of inputs or more.

If you can graph the function (feasible for one- or two-input functions), you can often easily scan the graph by eye to find the peak. The basis of the calculus techniques is the observation that, at the argmax of a smooth function, the derivative of the function is 0.

For example, consider a style problem that often appears in calculus textbooks. Suppose you have been tasked to design a container for a large volume V of liquid. The design specifications call for the weight of the container to be as little as possible. (This is a minimization problem, then.) In classical textbook fashion, the specifications might also stipulate that the container is to be a cylinder made out of a particular metal of a particular thickness.

The above is a lovely geometry/calculus problem. Whether it is relevant to any genuine, real-world problem is another question.

Using the notation in the diagram, the volume and surface area of the cylinder is $V (r, h) \equiv π r^{2} h and A (r, h) \equiv 2 π r^{2} + 2 π r h$

Minimizing the weight of the cylinder is our objective (according to the problem statement) and the weight is proportional to the surface area. Since the volume $V$ is given (according to the problem statement), we want to re-write the area function to use volume:

$h (r, V) \equiv V / π r^{2} ⟹ A (r, V) = 2 π r^{2} + 2 π r V / π r^{2} = 2 π r^{2} + 2 V / r$

Suppose $V$ is specified as 1000 liters. A good first step is to choose appropriate units for $r$ to make sure the formula for $A (r, V)$ is dimensionally consistent. Suppose we choose $r$ in cm. Then we want $V$ in cubic centimeters (cc). 1000 liters is 1,000,000 cc. Now we can plot a slice of the area function:

A <- makeFun(2*pi*r^2 + 2*V/r ~ r, V=1000000)
slice_plot(A(r) ~ r, bounds(r=c(10, 100))) %>%
  gf_labs(x = "radius (cm)", y = "Surface area of container (square cm)")

As always, the function’s derivative is zero at the optimal $r$ . In the graph, the argmin is near $r = 50$ cm at which point the minimum is about 50,000 cm $^{2}$ . Since $h (r, V) = V / π r^{2}$ , the required height of cylinder will be near $10^{6} / π 50^{2} = 127$ cm.

In calculus courses, the goal is often to find a formula for the optimal radius as a function of $V$ . So we differentiate the objective function—that is, the area function for any $V$ and $r$ with respect to $r$ ,

$\partial_{r} A (r, V) = 4 π r - 2 V / r^{2}$

Setting this to zero (which will be true at the optimal $r^{⋆}$ ) we can solve for $r^{⋆}$ in terms of $V$ :

$4 π r^{⋆} - 2 \frac{V}{{[r^{⋆}]}^{2}} = 0 ⟹ 4 π r^{⋆} = 2 \frac{V}{{[r^{⋆}]}^{2}} ⟹ {[r^{⋆}]}^{3} = \frac{1}{2 π} V ⟹ r^{⋆} = \sqrt[3]{V / 2 π}$

For $V = 1, 000, 000$ cm $^{3}$ , this gives $r^{⋆} = 54.1926$ cm which in turn implies that the corresponding height $h^{⋆} = V / π (r^{⋆})^{2} = 108.3852$ cm.

We’ve presented the optimum $r^{⋆}$ and $h^{⋆}$ to the nearest micron. (Does that make sense to you? Think about it for a moment before reading on.)

In modeling, a good rule of thumb is this: “If you don’t know what a sensible precision is for reporting your result, you don’t yet have a complete grasp of the problem.” Here are two reasonable ways to sort out a suitable precision.

Solve a closely related problem that would have been equivalent for many practical purposes.
Calculate how much the input can deviate from the argmax while producing a trivial change in the output of the objective function.

Approach (2) is always at hand, since you already know the objective function. Let’s graph the objective function near $r = 54.1926$ …

Look carefully at the axes scales. Deviating from the mathematical optimum by about 5cm (that is, 50,000 microns) produces a change in the output of the objective function by about 400 units out of 55,000. In other words, about 0.7%.

It is true that $r^{⋆} = 54.1926$ cm gives the “best” outcome. And sometimes such precision is warranted. For example, improving the speed of an elite marathon racer by even 0.1% would give her a 7 second advantage: often the difference between silver and gold!

What’s different is that you know exactly the ultimate objective of a marathon: finish faster. But you may not know the ultimate objective of the system your “optimal” tank will be a part of. For instance, your tank may be part of an external fuel pod on an aircraft. Certainly the aircraft designers want the tank to be as light as possible. But they also want to reduce drag as much as possible. A 54 cm diameter tube has about 17% more drag than a 50 cm tube. It is probably well worth increasing weight by 0.7% to save that much drag.

In reporting the results from an optimization problem, you ought to give the decision maker all relevant information. That might be as simple as including the above graph in your report.

We mentioned another technique for getting a handle on what precision is meaningful: (1) solve a closely related problem. It can requires some insight and creativity to frame the new problem. For instance, large capacity tanks often are shaped like a lozenge: a cylinder with hemi-spherical ends.

Using $h$ for the length of the cylindrical portion of the tank, and $r$ for the radius, the volume and surface area are:

$V (r, h) = π r^{2} h + \frac{4}{3} π r^{3} and A (r, h) = 2 π r h + 4 π r^{2}$

Again, $V$ is specified as 1000 liters. As detailed in Exercise 24.18, the surface area of this 1000-liter tank is about 48,400 cm $^{2}$ . This is more than 10% less than for the cylindrical tank.

24.5 Exercises

Exercise 24.02

The simple model of the distance travelled by a tennis ball after launch from a slingshot is $hdist (v_{0}, θ) = 2 v_{0}^{2} \cos (θ) \sin (θ) / g$ where $θ$ is the launch angle, measured from the horizontal, $v_{0}$ is the initial velocity, and $g$ is the acceleration due to gravity.

Part A What is the dimension of $v_{0}$ ?

L T L / T L / T $^{2}$ L $^{2}$ / T $^{2}$

Part B What is the dimension of $g$ ?

M L M L / T L / T $^{2}$ M L / T $^{2}$

Part C What is the dimension of $θ$ ?

L L^2 T/L It is dimensionless

Part D What is the dimension of $\sin (θ$ )?

L L^2 T/L It is dimensionless

Part E What is the dimension of $2 v_{0}^{2} \cos (θ) \sin (θ) / g$ ?

L L^2 L/T L/T $^{2}$

Part F Suppose the initial velocity of the ball is $v_{0} = 10$ meters/second. Since we are on the surface of Earth, $g = 9.8$ meters/second-squared. At the optimal launch angle $θ$ , how far does the model predict the ball will travel?

10 meters 25 meters 50 meters 75 meters

Exercise 24.04

Part A Consider the function $f (x) \equiv x^{3}$ . Confirm that the value of the derivative $\partial_{x} f (x = 0)$ and so $x^{⋆} = 0$ is a critical point. Which sort of critical point is $x^{⋆} = 0$ ? (Hint: Draw the graph of $f (x)$ near $x = 0$ to see what’s going on.)

An argmax An argmin Neither

Part B Still working with the function $f (x) \equiv x^{3}$ , find the value of the second-derivative $\partial_{x x} f (x^{⋆})$ evaluated at the critical point $x = x^{⋆} = 0$ . Which of these is $\partial_{x x} f (x = 0)$ ?

Negative Positive Zero

Exercise 24.06

The graph shows three different functions labeled (A), (B), and (C).

Part A Function (A) is

concave down non-concave concave up

Part B Function (B) is

concave down non-concave concave up

Part C Function (C) is

concave down non-concave concave up

Part D The negative of function (A) is

concave down non-concave concave up

The graph shows a function $wave (t) \equiv \sin (2 π t / 4)$ and labels four input values $t$ .

Part E For what values of the input $t$ is the function concave up?

$t = A$ and $t = D$ $t = A$ and $t = C$ $t = C$ and $t = D$ none of the above

Part F For what values of the input $t$ is the function non-concave?

$t = A$ $t = B$ $t = C$ none of the above

Part G Where is the function steepest?

$t = A$ $t = B$ $t = C$ $t = D$

Exercise 24.08

Here is a smooth function marked at a few points. Your task is, at each point, to estimate the value of the derivative, the sign of the second derivative, and the radius of the circle that would nicely match the function in a small region around each point. (Remember, we are asking for the radius of the circle, which is half the diameter.)

To simplify things, here is a table giving seven different combinations of the quantities you are to estimate. Some of them correctly match one of the labeled points, some do not. All you need to do is choose which is the correct set of quantities for each labeled point.

row	value of 1st deriv	sign of 2nd deriv	radius
i	-0.3	pos	0.25
ii	2.1	near 0	2000
iii	-1.4	neg	12
iv	0.3	neg	0.3
v	2.1	pos	0.1
vi	1.3	neg	3
vii	0.5	pos	1

Part A Which row from the table best matches the function at point A?

i ii iii iv v vi vii

Part B Which row from the table best matches the function at point B?

i ii iii iv v vi vii

Part C Which row from the table best matches the function at point C?

i ii iii iv v vi vii

Part D Which row from the table best matches the function at point D?

i ii iii iv v vi vii

Part E Which row from the table best matches the function at point E?

i ii iii iv v vi vii

Exercise 24.10

You and your pet dog Swimmer often go to the beach and walk along the water’s edge. You throw a ball down the beach, but at an angle so it lands in the water. Swimmer goes to work. She runs down the beach (fast) and then plunges into the water, heading toward the ball. She can run fast on the beach: 400 m/minute. But she swims rather slower: 50 m/min.

Suppose you threw the ball to a point about 50 meters down the beach and 10 meters out in the water. The overall distance to the ball is therefore $\sqrt{50^{2} + 10^{2}} \approx 51$ meters. If Swimmer entered the water immediately, she would take about a minute to reach the ball (51 m / 50 m/min). Swimmer can get to the ball faster by running down the beach a bit and then turning into the water. If Swimmer ran all 50 meters down the beach and then turned to swim the 10 meters, it would take her (50/400 + 10/50) minutes, about one-third of a minute.

Figure 24.5: Swimmer’s optimal path to the ball consists of running $x$ meters along the shore, then swimming diagonally to the ball.

Can Swimmer do better? You can set up the calculation like this. Imagine $x$ to be the distance down the beach that Swimmer runs. The time to run this distance will be $x / 400$ . The distance remaining to the ball can be found by the Pythagorean theorem. One leg of the triangle has length $(50 - x)$ , the other has length 10 m. So, the length of the third side is $\sqrt{(50 - x)^{2} + 10^{2}}$ . For instance, if $x$ were 45, the distance to swim in the water would be $\sqrt{(50 - 45)^{2} + 10^{2}} = 11.2$ m. Divide this distance by 50 m/min to get the time spent in the water.

distance_in_water <- makeFun( __your_pythagorean_formula ~ x)
time_to_ball <- makeFun(x/400 + distance_in_water(x)/50 ~ x)

Time_to_ball() takes one argument, the distance $x$ Swimmer runs down the beach before turning into the water. Use a SANDBOX to find the distance that calculus-savvy Swimmer runs down the beach before turning into the water, if Swimmer’s goal is to get to the ball as fast as possible.

Part A What’s the optimal running distance for Swimmer?

46.75 47.5 48.75 49.75

Here’s a news story about a mathematician’s dog on the shore of Lake Michigan. It is not plausible that Swimmer has been trained in calculus. Perhaps the way Swimmer solves the running distance problem is simply to graph time_to_ball(x) ~ x over a suitable domain and find the argmax by eye!

Exercise 24.12

If you’re skeptical that a dog might do a calculus problem before running to fetch a ball, consider the path taken by a photon. “Fermat’s Principle” is that light takes the path of least time. To illustrate, consider the problem of a photon traveling from a point A to a point B, as in the diagram. The shortest path between the two points is a straight line. Along this straight-line path, the time taken by the photon will be the distance divided by the speed of light.

The diagram shows another path consisting of two segments, one of length $l_{1}$ and the other $l_{2}$ . Obviously, the two-segment path is longer than the straight-line path. But according to Fermat’s principle, light “prefers” the longer path if the time taken to traverse it is shorter. This phenomenon is called refraction.

The reason the indirect path might be shorter is that the speed of light differs in different physical media. Light traveling in a vacuum famously has a speed of about 300,000 km per second. In air, the speed is smaller by a factor of 1/1.003. In water, the speed is smaller still: the factor is 1/1.3.

Imagine that the blue zone of the diagram is water and the clear zone air. The time for the photon to travel from point A to B is proportional to $1.003 l_{1} + 1.3 l_{2}$ .

To see the path taken by light, let’s imagine that point A is $(x = 0, y = 10)$ and point B is $(x = 20, y = - 10)$ , and that the boundary between water and air is at $y = 0$ . We will place the point P at $(x, 0)$ . The total time taken for light to traverse this path is 1.003 dist(A to P) + 1.3 dist(P to B).

Part A Which of these formulas gives the total time it takes for light to traverse the path from A to P at relative speed 1/1.003 and then the path from P to B at relative speed 1/1.3? A is located at $(0, 10)$ , B is located at $(20, - 10)$ , and P is located at $(x, 0)$

$1.003 \sqrt{(x - 0)^{2} + (0 - 10)^{2}} + 1.3 \sqrt{(20 - x)^{2} + (- 10 - 0)^{2}}$
$\sqrt{(x - 0)^{2} + (0 - 10)^{2}} / 1.003 + \sqrt{(20 - x)^{2} + (- 10 - 0)^{2}} / 1.3$
$1.003 \sqrt{(x - 10)^{2} + (0 - 0)^{2}} + 1.3 \sqrt{(- 10 - x)^{2} + (50 - 0)^{2}}$

Implement the calculation of total_time() in R, then use a graph to find the argmin.

total_time <- makeFun( your_formula ~ x)
slice_plot(total_time(x) ~ x, bounds(x=0:20))

# For the next problem
dx_time  <- D(total_time(x) ~ x)
dxx_time <- D(total_time(x) ~ x & x)

Part B What value of $x$ (that is, the argmin) minimizes the travel time of light between points A and B? (Choose the best answer)

10.52 11.02 12.22 12.50 13.21 14.94

Part C Suppose that instead of being water, the blue area was glass. The speed of light in glass is roughly 1/1.5 times as big as in vacuum. What value of $x$ minimizes the travel time of light between points A and B? (Choose the best answer)

13.60 14.58 14.85 15.54

Exercise 24.18

Return to the problem of finding the optimal radius of a cylindrical tank with spherical ends. The point is to choose the sphere radius $r$ and the cylinder length $h$ that minimize the surface area of the tank while enclosing a specified volume. In terms of $h$ and $r$ , the volume and surface area are:

$V (r, h) = π r^{2} h + \frac{4}{3} π r^{3} and A (r, h) = 2 π r h + 4 π r^{2}$

Part A Which of these is correct? (Hint: Only one of the answers is dimensionally consistent.)

$h (r, V) = (V - 4 π r^{3} / 3) / π r^{2}$
$h (r, V) = (V - 4 π r^{2} / 2) / π r^{2}$
$h (r, V) = (V - 4 π r^{3} / 3) / π r^{3}$
$h (r, V) = \sqrt{(V - 4 π r^{3} / 2) / π r^{3}}$

Part B Which of these is the correct expression for $A (r, V)$

$A (r, V) = 2 V / r + \frac{8 π}{3} r^{2}$
$A (r, V) = 2 V / r + \frac{4 π}{3} r^{2}$
$A (r, V) = V / r + \frac{20 π}{3} r^{2}$
$A (r, V) = V / r + 3 π r^{2}$

Part C Find $\partial_{r} A (r, V)$ and set to zero. Solve for $r^{⋆}$ in terms of $V$ . Which of these is correct?

$r^{⋆} = \sqrt[3]{\frac{3}{4 π} V}$
$r^{⋆} = \sqrt[3]{\frac{4}{3 π} V}$
$r^{⋆} = \sqrt[3]{\frac{3 π}{4} V}$
$r^{⋆} = \sqrt[3]{\frac{3}{2 π} V}$

Find the optimum value of $r$ to minimize $A (r, V)$ when $V = 1000$ liters.

Part D What is the optimal value of $r$ in cm to a precision of one micron?

6.2035 46.0351 52.0351 62.0351

Use a sandbox to plot a graph of $A (r, V)$ versus r, setting $V = 1000$ liters.

Part E From the graph of $A (r, V)$ versus $r$ at $V = 1000$ liters, read off a range of $r$ that produces $A$ no worse than 1% greater than the minimum. How wide is that range, approximately?

$\pm 0.1$ cm $\pm 1$ cm $\pm 5$ cm $\pm 10$ cm

Another word for an “input” is “argument.” Argmax is the contraction of argument producing the maximum output.↩︎