24 Optimization
To “optimize” means to make something as good as possible with the available resources. Optimization problems are common in science, logistics, industry, and any other area where one seeks the best solution to a problem. Some everyday examples:
- When to harvest trees being grown for lumber. Harvest too soon and you might be losing out on the prime growing years. On the other hand, wait too long and trees will show slow or zero growth.
- Walking up too steep a slope is tiring and slows you down; that is why hiking trails have switchbacks. However, when the switchbacks are too shallow, it takes a long time to cover the distance. What’s the most efficient angle to enable hikers to get up the hill in the shortest time.
- How much salt to add to a stew. Stews can be too salty, or they can be not salty enough. Somewhere in the middle is the optimum.
24.1 Structure of the problem
In an optimization problem, there is one or more input quantities whose value you have to choose. The amount of salt; the years to wait from planting to harvesting a tree; the angle of the trail with respect to the slope. We will call this the decision quantity.
Similarly, there is one or more output quantity that you value and want to make as good as possible. The taste of the stew; the amount of usable wood harvested; the time it takes to walk up the hill. The output quantities are called the objectives.
This chapter deals with optimization problems that involve only a single objective. Problems with multiple objectives are among the most interesting and important in real-world decision making. Single-objective optimization techniques are a component of the more complex decision making, but they are a good place to get started.
The model that relates inputs to the objective output is the objective function. Solving an optimization problem—once the modeling phase is complete—amounts to finding a value for the decision quantity (the input to the objective function) that produces the best output from the objective function.
Sometimes the objective is something that you want to minimize, make as small as possible. For instance, in the hiking trail problem, we seek to minimize the time it takes to walk up the trail. Sometimes you want to maximize the objective, as in the wood-harvest problem where the objective is to harvest the most wood per year.
Mathematically, maximization and minimization are the same thing. Every minimization problem can be turned into a maximization problem by putting a negative sign in front of the objective function. To simplify the discussion, in talking about finding the solution to an optimization problem we will imagine that the goal is to maximize. But keep in mind that many circumstances in the real world, “best” can mean minimization.
Recall from Section 6.7 that there are two components to the task of maximization or minimization. The argmax is the input to the objective function which produces the largest output. The maximum is the value of that output.1 Argmin and minimum are the words used in a situation where you seek the smallest value of the objective function.
Once you have found the argmax you can plug that value into the objective function to find the output value. That value is the maximum.
People often talk about “finding the maximum.” This is misleading. Instead, the idea is to find the input to the objective function—that is, the argmax—that produces the maximum output.
To illustrate the setup of an optimization problem, imagine yourself in the situation of a contest to see who can shoot a tennis ball the farthest into a field with a slingshot. During the contest, you will adjust the vertical angle of launch, place the ball into the slingshot’s cradle, pull back as far as possible, and let go. To win the contest, you need to optimize how you launch the ball.
The objective is to maximize the distance traveled by the ball. The objective function models the distance travelled as a function of the quantities you can control, for instance the vertical angle of launch or the amount by which you pull back the slingshot. For simplicity, we will imagine that the slingshot is pulled back by a standard amount, producing a velocity of the ball at release of
Before you head out into the field to experiment, let’s prepare by constructing the objective function. Using some principles of physics and mathematics (which you may not yet understand), we will model how far the ball will travel (horizontally) as a function of the angle of launch
The mathematics of such problems involves an area called differential equations, an important part of calculus which we will come to later in the course. Since you don’t have the tools yet, we will just state a simple model of how long the ball stays in the air.
The horizontal distance travelled by the tennis ball will be
The best choice of
Finding the argmax can be accomplished simply by plotting the function
You can see that the maximum value is about 0.5 and that this occurs at an argmax
Zooming in on the
From the graph, especially the zoomed-in version, you can read off the argmax as
Finding the argmax solves the problem. You may also want to present your solution by reporting the value of hdist() when the argmax is given as input. You can read off the graph that the maximum of
24.2 Interpreting the argmax
The graphical solution given to the slingshot problem is entirely satisfactory. Whether that solution will win the contest depends on whether the model we built for the objective function is correct. We have left out, for instance, air resistence, which is potentially important.
Solving the optimization problem has prepared us to test the result in the field. Perhaps we will find that the real-world optimum angle is somewhat steeper or shallower than
Besides the argmax, another important quantity to read from the graph in Figure 24.1 is the precision of the argmax. In strict mathematical terms, the argmax for the tennis-ball problem is exactly 45 degrees at which point
Contests are won or lost by margins of less than 1%, so you should not casually deviate from the argmax. On the other hand,
24.3 Derivatives and optimization
We are now going to reframe the search for the argmax and its interpretation in terms of derivatives of the objective function with respect to the decision quantity (
With a graph such as Figure 24.1, it is easy to find the argmax; common sense carries the day. So it won’t be obvious at first why we will take the following approach:
Let’s denote an argmax of the objective function
Seen another way, the slope of
Common sense is correct: Walk uphill to get to the peak, walk downhill to move away from the peak. At the top of a smooth hill, the terrain is level. (Since our modeling functions are smooth, so must be the hills that we visualize the functions with.)
Inputs
At this point, we know that values
For an argmin, changing
The bottom row of graphs in Figure 24.3 shows the derivative of the objective function
Stated another way, the derivative
The second derivative of the objective function
Critical point |
||
---|---|---|
argmax | 0 | negative |
argmin | 0 | positive |
neither | 0 | 0 |
Throughout Block 2, we have translated features of functions that are evident on a graph into the language of derivatives:
- The slope of a function
at any input is the value of the derivative function at that same . - The concavity of a function
at any input is the slope of the derivative function, that is, . - Putting (i) and (ii) together, we get that the concavity of a function
at any input is the value of the second derivative function, that is, . - At an argmax
of , the value of the derivative function is zero and the value of the second derivative function is negative. The situation at an argmin is along the same lines, the derivative of the objective function is zero and the second derivative is positive.
What’s the critical point?
You’re familiar with the quadratic polynomial:
Let’s find the critical point. We know that the critical point is
The above is an equation, not a definition. It says that whatever
24.4 Be practical!
Decision making is about choosing among alternatives. In some engineering or policy contexts, this can mean finding a value for an input that will produce the “best” outcome. For those who have studied calculus, it is natural to believe that calculus-based techniques for optimization are the route to making the decision.
We emphasize that the optimization techniques covered in this chapter are only part of a broader set of techniques for real-world decision-making problems. In particular, most policy contexts involve multiple objectives. For example, in designing a car one goal is to make it cheap to manufacture, another to make it attractive, and still another to make it safe. These different objectives are often at odds with one another. In Block 4 of this text, we will discuss some calculus techniques that help policy-makers in multi-objective settings.
For now, sticking with the idealized (and often unrealistic) setting of maximizing a single objective, with one or more inputs. Recall the setting for calculus-type maximization. You have a function with one or more inputs, say,
If you can graph the function (feasible for one- or two-input functions), you can often easily scan the graph by eye to find the peak. The basis of the calculus techniques is the observation that, at the argmax of a smooth function, the derivative of the function is 0.
For example, consider a style problem that often appears in calculus textbooks. Suppose you have been tasked to design a container for a large volume V of liquid. The design specifications call for the weight of the container to be as little as possible. (This is a minimization problem, then.) In classical textbook fashion, the specifications might also stipulate that the container is to be a cylinder made out of a particular metal of a particular thickness.
The above is a lovely geometry/calculus problem. Whether it is relevant to any genuine, real-world problem is another question.
Using the notation in the diagram, the volume and surface area of the cylinder is
Minimizing the weight of the cylinder is our objective (according to the problem statement) and the weight is proportional to the surface area. Since the volume
Suppose
<- makeFun(2*pi*r^2 + 2*V/r ~ r, V=1000000)
A slice_plot(A(r) ~ r, bounds(r=c(10, 100))) %>%
gf_labs(x = "radius (cm)", y = "Surface area of container (square cm)")
As always, the function’s derivative is zero at the optimal
In calculus courses, the goal is often to find a formula for the optimal radius as a function of
Setting this to zero (which will be true at the optimal
For
We’ve presented the optimum
In modeling, a good rule of thumb is this: “If you don’t know what a sensible precision is for reporting your result, you don’t yet have a complete grasp of the problem.” Here are two reasonable ways to sort out a suitable precision.
- Solve a closely related problem that would have been equivalent for many practical purposes.
- Calculate how much the input can deviate from the argmax while producing a trivial change in the output of the objective function.
Approach (2) is always at hand, since you already know the objective function. Let’s graph the objective function near
Look carefully at the axes scales. Deviating from the mathematical optimum by about 5cm (that is, 50,000 microns) produces a change in the output of the objective function by about 400 units out of 55,000. In other words, about 0.7%.
It is true that
What’s different is that you know exactly the ultimate objective of a marathon: finish faster. But you may not know the ultimate objective of the system your “optimal” tank will be a part of. For instance, your tank may be part of an external fuel pod on an aircraft. Certainly the aircraft designers want the tank to be as light as possible. But they also want to reduce drag as much as possible. A 54 cm diameter tube has about 17% more drag than a 50 cm tube. It is probably well worth increasing weight by 0.7% to save that much drag.
In reporting the results from an optimization problem, you ought to give the decision maker all relevant information. That might be as simple as including the above graph in your report.
We mentioned another technique for getting a handle on what precision is meaningful: (1) solve a closely related problem. It can requires some insight and creativity to frame the new problem. For instance, large capacity tanks often are shaped like a lozenge: a cylinder with hemi-spherical ends.
Using
Again,
24.5 Exercises
Exercise 24.02
The simple model of the distance travelled by a tennis ball after launch from a slingshot is
Part A What is the dimension of
L T L / T L / T
Part B What is the dimension of
M L M L / T L / T
Part C What is the dimension of
L L^2 T/L It is dimensionless
Part D What is the dimension of
L L^2 T/L It is dimensionless
Part E What is the dimension of
L L^2 L/T L/T
Part F Suppose the initial velocity of the ball is
10 meters 25 meters 50 meters 75 meters
Exercise 24.04
Part A Consider the function
An argmax An argmin Neither
Part B Still working with the function
Negative Positive Zero
Exercise 24.06
The graph shows three different functions labeled (A), (B), and (C).
Part A Function (A) is
concave down non-concave concave up
Part B Function (B) is
concave down non-concave concave up
Part C Function (C) is
concave down non-concave concave up
Part D The negative of function (A) is
concave down non-concave concave up
The graph shows a function
Part E For what values of the input
Part F For what values of the input
Part G Where is the function steepest?
Exercise 24.08
Here is a smooth function marked at a few points. Your task is, at each point, to estimate the value of the derivative, the sign of the second derivative, and the radius of the circle that would nicely match the function in a small region around each point. (Remember, we are asking for the radius of the circle, which is half the diameter.)
To simplify things, here is a table giving seven different combinations of the quantities you are to estimate. Some of them correctly match one of the labeled points, some do not. All you need to do is choose which is the correct set of quantities for each labeled point.
row | value of 1st deriv | sign of 2nd deriv | radius |
---|---|---|---|
i | -0.3 | pos | 0.25 |
ii | 2.1 | near 0 | 2000 |
iii | -1.4 | neg | 12 |
iv | 0.3 | neg | 0.3 |
v | 2.1 | pos | 0.1 |
vi | 1.3 | neg | 3 |
vii | 0.5 | pos | 1 |
Part A Which row from the table best matches the function at point A?
i ii iii iv v vi vii
Part B Which row from the table best matches the function at point B?
i ii iii iv v vi vii
Part C Which row from the table best matches the function at point C?
i ii iii iv v vi vii
Part D Which row from the table best matches the function at point D?
i ii iii iv v vi vii
Part E Which row from the table best matches the function at point E?
i ii iii iv v vi vii
Exercise 24.10
You and your pet dog Swimmer often go to the beach and walk along the water’s edge. You throw a ball down the beach, but at an angle so it lands in the water. Swimmer goes to work. She runs down the beach (fast) and then plunges into the water, heading toward the ball. She can run fast on the beach: 400 m/minute. But she swims rather slower: 50 m/min.
Suppose you threw the ball to a point about 50 meters down the beach and 10 meters out in the water. The overall distance to the ball is therefore
Can Swimmer do better? You can set up the calculation like this. Imagine
<- makeFun( __your_pythagorean_formula ~ x)
distance_in_water <- makeFun(x/400 + distance_in_water(x)/50 ~ x) time_to_ball
Time_to_ball()
takes one argument, the distance
Part A What’s the optimal running distance for Swimmer?
46.75 47.5 48.75 49.75
Here’s a news story about a mathematician’s dog on the shore of Lake Michigan. It is not plausible that Swimmer has been trained in calculus. Perhaps the way Swimmer solves the running distance problem is simply to graph time_to_ball(x) ~ x
over a suitable domain and find the argmax by eye!
Exercise 24.12
If you’re skeptical that a dog might do a calculus problem before running to fetch a ball, consider the path taken by a photon. “Fermat’s Principle” is that light takes the path of least time. To illustrate, consider the problem of a photon traveling from a point A to a point B, as in the diagram. The shortest path between the two points is a straight line. Along this straight-line path, the time taken by the photon will be the distance divided by the speed of light.
The diagram shows another path consisting of two segments, one of length
The reason the indirect path might be shorter is that the speed of light differs in different physical media. Light traveling in a vacuum famously has a speed of about 300,000 km per second. In air, the speed is smaller by a factor of 1/1.003. In water, the speed is smaller still: the factor is 1/1.3.
Imagine that the blue zone of the diagram is water and the clear zone air. The time for the photon to travel from point A to B is proportional to
To see the path taken by light, let’s imagine that point A is
Part A Which of these formulas gives the total time it takes for light to traverse the path from A to P at relative speed 1/1.003 and then the path from P to B at relative speed 1/1.3? A is located at
Implement the calculation of total_time()
in R, then use a graph to find the argmin.
<- makeFun( your_formula ~ x)
total_time slice_plot(total_time(x) ~ x, bounds(x=0:20))
# For the next problem
<- D(total_time(x) ~ x)
dx_time <- D(total_time(x) ~ x & x) dxx_time
Part B What value of
10.52 11.02 12.22 12.50 13.21 14.94
Part C Suppose that instead of being water, the blue area was glass. The speed of light in glass is roughly 1/1.5 times as big as in vacuum. What value of
13.60 14.58 14.85 15.54
Exercise 24.18
Return to the problem of finding the optimal radius of a cylindrical tank with spherical ends. The point is to choose the sphere radius
Part A Which of these is correct? (Hint: Only one of the answers is dimensionally consistent.)
Part B Which of these is the correct expression for
Part C Find
Find the optimum value of
Part D What is the optimal value of
6.2035 46.0351 52.0351 62.0351
Use a sandbox to plot a graph of
Part E From the graph of
Another word for an “input” is “argument.” Argmax is the contraction of argument producing the maximum output.↩︎