23 Derivatives of assembled functions

In Chapter 19 we used the rules associated with evanescent $h$ , that is, $lim_{h \to 0}$ , to confirm our claims about the derivatives of many of the pattern-book functions. We will call these rules h-theory for short. This chapter will use h-theory to find algebraic rules to calculate the derivatives of linear combinations of functions, products of functions, and composition of functions. Remarkably, we can figure out these rules without specifying which functions are being combined. So the rules can be written in terms of abstractions: $f ()$ , $g ()$ , and $h ()$ . Later, we will apply those rules to specific functions, to show how the rules are used in practical work.

23.1 Using the rules

When you encounter a function that you want to differentiate, you first have to examine the function to decide which rule you want to apply. In the following, we will use the names $f ()$ and $g ()$ , but in practice the functions will often be basic modeling functions, for instance $e^{k x}$ or $\sin (\frac{2 π}{P} t)$ , etc.

Step 1: Identify f() and g()

We will write the rules in terms of two function names, $f ()$ and $g ()$ , which can stand for any functions whatsoever. It is rare to see the product or the composition written explicitly as $f (x) g (x)$ or $f (g (x))$ . Instead, you are given something like $e^{x} \ln (x)$ . The first step in differentiating the product or composition is to identify what $f ()$ and $g ()$ are individually.

In general, $f ()$ and $g ()$ might be complicated functions, themselves involving linear combinations, products, and composition. But to get started, we will practice with cases where they are simple, pattern-book functions.

Step 2: Find f’() and g’()

For differentiating either products or compositions, you will need to identify both $f ()$ and $g ()$ (the first step) and then compute the derivatives $\partial_{x} f ()$ and $\partial_{x} g ()$ . That is, you will write down four functions.

Step 3: Apply the relevant rule

Recall from Chapter ?sec-fun-assembling that we will be working with three important forms for creating new functions out of existing functions:

Linear combinations, e.g. $a f (x) + b g (x)$
Products of functions, e.g. $f (x) g (x)$
Compositions of functions, e.g. $f (g (x))$

23.2 Differentiating linear combinations

Linear combination is one of the ways in which we make new functions from existing functions. As you recall, linear combination involves scaling functions and then adding the scaled functions as in $a f (x) + b g (x)$ , a linear combination of $f (x)$ and $g (x)$ . We can easily use $h$ to show the result of differentiating a linear combination of functions. First, let’s figure out what $\partial_{x} a f (x)$ is, going back to writing $\partial_{x}$ in terms of a slope function:

$\partial_{x} a f (x) = \frac{a f (x + h) - a f (x)}{h} = a \frac{f (x + h) - f (x)}{h} = a \partial_{x} f (x)$

In other words, if we know the derivative $\partial_{x} f (x)$ , we can easily find the derivative of $a f ()$ . Notice that even though $h$ was used in the derivation, it appears nowhere in the result $\partial_{x} a f (x) = a \partial_{x} f (x)$ . The $h$ is the solvent that gets the paint on the wall and evaporates once its job is done.

Now consider the derivative of the sum of two functions, $f (x)$ and $g (x)$ :

$\begin{array}{rcl} \partial_{x} [f (x) + g (x)] & = & \frac{[f (x + h) + g (x + h)] - [f (x) + g (x)]}{h} \\ & = & \frac{[f (x + h) - f (x)] + [g (x + h) - g (x)]}{h} \\ & = & \frac{[f (x + h) - f (x)]}{h} + \frac{[g (x + h) - g (x)]}{h} \\ & = & \partial_{x} f (x) + \partial_{x} g (x) \end{array}$

Because of how $\partial_{x}$ can be “passed through” a linear combination, mathematicians say that differentiation is a linear operator. Consider this new fact about differentiation as a down payment on what will eventually become a complete theory telling us how to differentiate a product of two functions or the composition of two functions. We will lay out the $h$ -theory based algebra of this in the next two sections.

We can summarize the h-theory result for linear combinations this way:

The derivative of a linear combination is the linear combination of the derivatives.

That is:

$\partial_{x} [a f (x) + b g (x)] = a f^{'} (x) + b g^{'} (x)$

as well as

$\partial_{x} [a f (x) + b g (x) + c h (x) + \dots] = a f^{'} (x) + b g^{'} (x) + c h^{'} (x) + \dots$
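A quick numerical check can make the rule concrete. The sketch below is only an illustration in Python, not part of the h-theory derivation: `slope()` is a hypothetical helper that uses a small but nonzero $h$, and the scalars and the choice of $\sin()$ and $\exp()$ as stand-ins for $f()$ and $g()$ are arbitrary assumptions.

```python
import math

def slope(F, x, h=1e-6):
    """Average rate of change of F over [x, x + h]; h is small but not zero."""
    return (F(x + h) - F(x)) / h

a, b = 3.0, -2.0            # arbitrary scalars, chosen only for illustration
f, g = math.sin, math.exp   # stand-ins for the pronoun functions f() and g()

def combo(x):
    """The linear combination a*f(x) + b*g(x)."""
    return a * f(x) + b * g(x)

x0 = 0.7
slope_of_combo = slope(combo, x0)                        # differentiate the combination
combo_of_slopes = a * slope(f, x0) + b * slope(g, x0)    # combine the derivatives
print(slope_of_combo, combo_of_slopes)
```

The two printed numbers should agree to many digits, and both should sit close to $a \cos(x_0) + b e^{x_0}$, the value the symbolic rule predicts.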

The derivative of a polynomial is a polynomial of a lower order.

Consider the polynomial $h (x) = a x^{0} + b x^{1} + c x^{2}$ . The derivative is

$\partial_{x} h (x) = a \cdot 0 + b \cdot 1 + c \cdot 2 x = b + 2 c x$
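The same bookkeeping can be sketched for a polynomial of any order. The function name `poly_derivative` and the convention that position $k$ in the list holds the coefficient on $x^k$ are assumptions made for this illustration.

```python
def poly_derivative(coeffs):
    """coeffs[k] multiplies x**k; return the coefficient list of the derivative.

    Each term c * x**k becomes k * c * x**(k - 1), so the constant term drops out
    and every other coefficient moves down one position.
    """
    return [k * c for k, c in enumerate(coeffs)][1:]

# a = 5, b = 7, c = 2, that is, 5 + 7x + 2x^2
print(poly_derivative([5, 7, 2]))   # [7, 4], meaning 7 + 4x
```

The result has one fewer coefficient than the input, which is the sense in which the derivative is a polynomial of lower order.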

23.3 Product rule for multiplied functions

The question at hand is how to compute the derivative $\partial_{x} f (x) g (x)$ . Of course, you can always use numerical differentiation. But let’s look at the problem from the point of view of symbolic differentiation. And since $f (x)$ and $g (x)$ are just pronoun functions, we will assume you are starting out already knowing the derivatives $\partial_{x} f (x)$ and $\partial_{x} g (x)$ .

This situation arises particularly when $f (x)$ and $g (x)$ are pattern-book functions for which you already have memorized $\partial_{x} f (x)$ and $\partial_{x} g (x)$ or are basic modeling functions whose derivatives you will memorize in Section @ref(basic-derivs).

The purpose of this section is to derive the formula for $\partial_{x} f (x) g (x)$ in terms of $f (x)$ , $g (x)$ , $\partial_{x} f (x)$ and $\partial_{x} g (x)$ . This formula is called the product rule. The point of showing a derivation of the product rule is to let you see how the logic of evanescent $h$ plays a role. In practice, everyone simply memorizes the rule, which has a beautiful, symmetric form:

Product rule: $\partial_{x} [f (x) g (x)] = [\partial_{x} f (x)] g (x) + f (x) [\partial_{x} g (x)]$

and is even prettier in Lagrange notation (where $\partial_{x} f (x)$ is written $f^{'}$ ): ${[f g]}^{'} = f^{'} g + g^{'} f$

As with all derivatives, the product rule is based on the instantaneous rate of change $F^{'} (x) \equiv lim_{h \to 0} \frac{F (x + h) - F (x)}{h}$ introduced in ?sec-instantaneous-rate-of-change.

We also need two other statements about $h$ and functions:

(1) The derivative $F^{'} (x)$ is the slope of $F ()$ at input $x$ . Taking a step of size $h$ from $x$ will induce a change of output of $h F^{'} (x)$ , so $F (x + h) = F (x) + h F^{'} (x) .$
(2) Any result of the form $h F (x)$ , where $F (x)$ is finite, gives 0 in the limit. More precisely, $\lim_{h \to 0} h F (x) = 0$ .

As before, we will put the standard $lim_{h \to 0}$ disclaimer against dividing by $h$ until there are no such divisions at all, at which point we can safely use the equality $h = 0$ .

Suppose the function $F (x) \equiv f (x) g (x)$ , a product of the two functions $f (x)$ and $g (x)$ .

$F^{'} (x) = \partial_{x} [f (x) g (x)] \equiv lim_{h \to 0} \frac{f (x + h) g (x + h) - f (x) g (x)}{h}$

We will replace $g (x + h)$ with its equivalent $g (x) + h g^{'} (x)$ , giving

$= lim_{h \to 0} \frac{f (x + h) [g (x) + h g^{'} (x)] - f (x) g (x)}{h}$

$g (x)$ appears in both terms in the numerator, once multiplied by $f (x + h)$ and once by $f (x)$ . Collecting those terms gives:

$= lim_{h \to 0} \frac{[f (x + h) - f (x)] g (x) + [f (x + h) h g^{'} (x)]}{h}$

This has two bracketed terms added together over a common denominator. Let’s split them into separate terms and, in the second, replace $f (x + h)$ with its equivalent $f (x) + h f^{'} (x)$ :

$= \lim_{h \to 0} \underbrace{\left[ \frac{f (x + h) - f (x)}{h} \right]}_{f^{'} (x)} g (x) + \lim_{h \to 0} \frac{[f (x) + h f^{'} (x)] h g^{'} (x)}{h}$

The first term is $g (x)$ multiplied by the familiar form for the derivative of $f (x)$ . Expanding the second term gives:

$= f^{'} (x) g (x) + lim_{h \to 0} \frac{f (x) h g^{'} (x)}{h} + lim_{h \to 0} \frac{h f^{'} (x) h g^{'} (x)}{h}$

In each of the last two terms there is an $h / h$ involved. This is safely set to 1, since the $lim_{h \to 0}$ implies that $h$ will not be exactly zero. There remain no divisions by $h$ so we can drop the $lim_{h \to 0}$ in favor of $h = 0$ :

$= f^{'} (x) g (x) + f (x) g^{'} (x) + h f^{'} (x) g^{'} (x)$

$= f^{'} (x) g (x) + g^{'} (x) f (x)$

The last step relies on statement (2) above.

Some people find it easier to read the rule in Lagrange shorthand, where $f$ and $g$ stand for $f (x)$ and $g (x)$ respectively, and $f^{'}$ (“f-prime”) and $g^{'}$ (“g-prime”) stand for $\partial f ()$ and $\partial g ()$ .

Lagrange shorthand: $\partial [f \times g] = [f \times g]^{'} = f^{'} g + g^{'} f$

The expression $\partial_{x} x^{3}$ is the same as $\partial_{x} [x x^{2}]$ . Since we already know $\partial_{x} x$ (it is 1) and $\partial_{x} x^{2}$ (it is $2 x$ ) let’s apply the product rule to find $\partial_{x} x^{3}$ :

$\partial [x \times x^{2}] = [\partial x] \times x^{2} + [\partial x^{2}] \times x = 1 \times x^{2} + 2 x \times x = 3 x^{2}$
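The recipe of identifying $f()$ and $g()$, writing down their derivatives, and assembling the four functions can be sketched in a few lines of Python. The helper name `product_rule` is hypothetical; it simply encodes $f' g + f g'$ for whatever functions are handed to it.

```python
def product_rule(f, df, g, dg):
    """Return the derivative of x -> f(x) * g(x), given f, g and their derivatives."""
    return lambda x: df(x) * g(x) + f(x) * dg(x)

# The example from the text: x^3 viewed as the product x * x^2
d_cube = product_rule(
    lambda x: x, lambda x: 1.0,         # f(x) = x and its derivative
    lambda x: x**2, lambda x: 2 * x,    # g(x) = x^2 and its derivative
)
print(d_cube(2.0), 3 * 2.0**2)   # both are 12.0
```

Any pair of functions with known derivatives can be substituted for the two used here.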

Occasionally, mathematics gives us a situation where being more general produces simplicity.

In the case of function products, the generalization is from products of two functions $f (x) \cdot g (x)$ to products of more than two functions, e.g. $u (x) \cdot v (x) \cdot w (x)$ .

The product rule here takes a form that makes the overall structure much clearer:

$\begin{array}{rl} \partial_{x} [u (x) \cdot v (x) \cdot w (x)] = & [\partial_{x} u (x)] \cdot v (x) \cdot w (x) \ + \\ & u (x) \cdot [\partial_{x} v (x)] \cdot w (x) \ + \\ & u (x) \cdot v (x) \cdot [\partial_{x} w (x)] \end{array}$

In the Lagrange shorthand, the pattern is even more evident:

${[u \cdot v \cdot w]}^{'} = u^{'} \cdot v \cdot w + u \cdot v^{'} \cdot w + u \cdot v \cdot w^{'}$
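The pattern of differentiating one factor at a time while leaving the others alone extends to any number of factors. Here is a minimal sketch, assuming the factors and their derivatives are supplied as two parallel lists; the function name `product_derivative` and the particular choice of $u$, $v$, and $w$ are illustrative assumptions.

```python
import math

def product_derivative(funcs, derivs, x):
    """Sum of terms; term i uses the derivative of factor i and leaves the rest alone."""
    total = 0.0
    for i in range(len(funcs)):
        term = derivs[i](x)
        for j, factor in enumerate(funcs):
            if j != i:
                term *= factor(x)
        total += term
    return total

# Assumed example: u(x) = x, v(x) = sin(x), w(x) = exp(x)
funcs = [lambda x: x, math.sin, math.exp]
derivs = [lambda x: 1.0, math.cos, math.exp]

x0 = 0.9
by_rule = product_derivative(funcs, derivs, x0)

# Compare with a centered finite difference of the product itself
h = 1e-6
F = lambda x: x * math.sin(x) * math.exp(x)
by_slope = (F(x0 + h) - F(x0 - h)) / (2 * h)
print(by_rule, by_slope)
```

The two printed values should agree closely; the finite difference is only a sanity check, not a replacement for the symbolic rule.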

23.4 Chain rule for function composition

A function composition, as described in Section 9.2, involves inserting the output of one function (the “interior function”) as the input of the other function (the “exterior function”). As we so often do, we will be using pronouns a lot. A list might help keep things straight:

There are two functions involved in a composition. Generically, we call them $f (y)$ and $g (x)$ . In the composition $f (g (x))$ , the exterior function is $f ()$ and the interior function is $g ()$ .
Each of the two functions $f ()$ and $g ()$ has an input. In our examples, we use $y$ to stand for the input to the exterior function and $x$ for the input to the interior function.
As with all rules for differentiation, we will need to compute the derivatives of the functions involved, each with respect to its own input. So these will be $\partial_{y} f (y)$ and $\partial_{x} g (x)$ .

A reason to use different pronouns for the inputs to $f ()$ and $g ()$ is to remind us that the output $g (x)$ is in general not the same kind of quantity as the input $x$ . In a function composition, the $f ()$ function will take the output $g (x)$ as input. But since $g (x)$ is not necessarily the same kind of thing as $x$ , why would we want to use the same name for the input to $f ()$ as we use for the input to $g ()$ ?

With this distinction between the names of the inputs, we can be even more explicit about the composition, writing $f (y = g (x))$ instead of $f (g (x))$ . Had we used the pronoun $x$ for the input to $f ()$ , our explicit statement, although technically correct, would be confusing: $f (x = g (x))$ !

With all these pronouns in mind, here is the chain rule for the derivative $\partial_{x} f (g (x))$ :

$\partial_{x} [f (g (x))] = [\partial_{y} f] (g (x)) \times [\partial_{x} g (x)]$

Or, using the Lagrange prime notation, where $^{'}$ stands for the derivative of a function with respect to its input, we have

Lagrange shorthand: $[f (g)]^{'} = f^{'} (g) \times g^{'}$
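To see the pronouns at work, here is a small Python sketch of the chain rule. The helper name `chain_rule` and the example functions (exterior $\sin(y)$, interior $x^2$) are assumptions chosen for illustration. Note that the derivative of the exterior function is evaluated at $g(x)$, not at $x$.

```python
import math

def chain_rule(df, g, dg):
    """Derivative of x -> f(g(x)): f' evaluated at g(x), times g'(x)."""
    return lambda x: df(g(x)) * dg(x)

# Assumed example: exterior f(y) = sin(y), interior g(x) = x**2
d_comp = chain_rule(math.cos, lambda x: x**2, lambda x: 2 * x)

# Centered finite difference of the composition, for comparison
x0, h = 1.3, 1e-6
by_slope = (math.sin((x0 + h)**2) - math.sin((x0 - h)**2)) / (2 * h)
print(d_comp(x0), by_slope)
```

The symbolic result here is $\cos(x^2) \times 2x$, and the finite-difference estimate should land very close to it.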

23.5 Rates per time

In news and policy discussions, you will often hear about “inflation rate” or “birth rate” or “interest rate” or “investment rate of return.” In each case, there is a function of time combined with a derivative of that function, with the general form $\frac{\partial_{t} f (t)}{f (t)} .$

Inflation rate: The function is cost_of_living( $t$ ). The derivative is the rate of change with respect to time in the cost of living: $\partial_{t}$ cost_of_living( $t$ ).
Birth rate: The function is population( $t$ ). The derivative is $\partial_{t}$ population( $t$ ), or at least that component of the overall $\partial_{t}$ population( $t$ ) that is related to births. (Other components are deaths and the balance of in-migration and out-migration.)
Interest rate: The function is account_balance( $t$ ) and the derivative is $\partial_{t}$ account_balance( $t$ ).
Investment returns: The function is net_worth( $t$ ) and the derivative is $\partial_{t}$ net_worth( $t$ ).

In all these cases, the “rate” is not merely “per time” as would be the case for $\partial_{t} f (t)$ . Instead the rate is “per unit of the whole per time.” For birth rate, the “whole” is the population. The birth rate is the number of births in a year divided by the population itself. Birth rates are often stated with the phrase “per capita per year.”

“Per capita” is Latin. It translates to “by head.” Its modern sense is “per unit of population.” Of course, the “unit of population” is a person.

Notice the two uses of “per” in the phrase: “births per capita per year.” A proportional rate is two rates in one. Births per capita is a proportion of the population. Births per year is an average rate with respect to time. But “births per capita per year” is a rate in the proportion with respect to time.

The rate word “per” also appears as part of “percent,” which literally means “per hundred.” A “percentage change” is the amount of change divided by the base amount. Confusingly, perhaps, “percentage change” is often truncated to the shorter “percent.” This is the case with inflation rates, interest rates, and rates of return on investment. The interest rate on a credit-card debt is stated as a proportion of the current debt; all that is packed into the word “percent.” The interest rate itself is the “proportion of the current debt per year”: two rates in one.

Similarly for an inflation rate. “Inflation” is stated as the change in prices divided by the current price: a proportional change. “Inflation rate” is the proportional change per unit of time, where the “whole” is current prices and the rate is change in current prices per year divided by current prices.

Thanks to the chain rule, there is a shortcut way of writing proportional rates per time. Exactly equivalent to the ratio $\frac{\partial_{t} f (t)}{f (t)}$ is $\partial_{t} \ln (f (t)) .$

Derivatives of logarithms appear often in fields such as economics or finance, where it is common to consider the logarithm of the economic quantity to render changes as percent of the whole.
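The equivalence between the ratio and the derivative of the logarithm can be checked numerically. The sketch below assumes a hypothetical account balance growing at 5% per year; the function `balance()`, the helper `slope()`, and all the numbers are invented for illustration.

```python
import math

def slope(F, t, h=1e-6):
    """Centered finite-difference estimate of the derivative of F at t."""
    return (F(t + h) - F(t - h)) / (2 * h)

def balance(t):
    """Hypothetical account: 1000 units growing at a proportional rate of 0.05 per year."""
    return 1000.0 * math.exp(0.05 * t)

t0 = 3.0
ratio = slope(balance, t0) / balance(t0)                  # rate of change divided by the whole
log_slope = slope(lambda t: math.log(balance(t)), t0)     # derivative of the logarithm
print(ratio, log_slope)
```

Both numbers should come out very close to 0.05, whatever value of `t0` is used, which is what makes the logarithm convenient for reading off proportional rates.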

Math in the World: Linear or logarithmic axes?

It always pays to look carefully at the axes in a graph. For instance, consider Figure 23.1 which shows the cumulative number of COVID cases during a period in 2020, early in the pandemic.

Figure 23.1: Growth in the number of Coronavirus cases in Italy and the US early in the pandemic.

The two panels in Figure 23.1 show the same data about growing numbers of coronavirus cases, the left graph on linear axes, the right on the now-familiar semi-log axes.

Most people are excellent at comparing slopes, even if they find it difficult or tedious to quantify a slope with a number and units. For instance, a glance suffices to show that in the left graph, well through mid-March the red curve (Italy) is steeper on any given date than the blue curve (US). Correspondingly, the number of people with coronavirus was growing faster (per day) in Italy.

The right graph tells a different story: up until about March 1, the Italian cases were increasing faster than the US cases. Afterwards, the US sees a larger growth rate than Italy until, around March 19, the US growth rate is substantially larger than the Italy growth rate.

The previous two paragraphs and their corresponding graphs seem to contradict one another. But they are both accurate, truthful depictions of the same events. What’s different between the two graphs is that the left shows one kind of rate and the right shows another kind of rate. In the left, the slope is new-cases-per-day, the output of the derivative function

left graph: $\partial_{t} \text{cumulative\_cases} (t)$ .

On the right, the slope is the proportional increase in cases per day, that is,

right graph: $\frac{\partial_{t} \text{cumulative\_cases} (t)}{\text{cumulative\_cases} (t)}$ .

From the chain rule, we know that

$\partial_{t} [\ln (f (t))] = \frac{\partial_{t} f (t)}{f (t)} .$

Since the right graph is on semi-log axes, the slope we perceive visually is $\partial_{t} [\ln (f (t))]$ . That is an obscure-looking bunch of notation until the chain rule reveals it to be the rate of change at time $t$ divided by the value at time $t$ .

The derivation of the chain rule relies on two closely related statements which are expressions of the idea that near any value $x$ a function can be expressed as a linear approximation with the slope equal to the derivative of the function:

(1) $g (x + h) = g (x) + h g^{'} (x)$
(2) $f (y + ϵ) = f (y) + ϵ f^{'} (y)$ , which is the same thing as (1) but uses $y$ as the argument name and $ϵ$ to stand for the small quantity we usually write with an $h$ .

We will now look at $\partial_{x} f (g (x))$ by writing down the fundamental definition of the derivative. This, of course, involves the disclaimer $lim_{h \to 0}$ until we are sure that there is no division by $h$ involved.

$\partial_{x} [f (g (x))] \equiv lim_{h \to 0} \frac{f (g (x + h)) - f (g (x))}{h}$

Let’s examine closely the expression $f (g (x + h))$ . Applying rule (1) above turns it into $\lim_{h \to 0} f (g (x) + h g^{'} (x))$ . Now apply rule (2), substituting $g (x)$ for $y$ and $h g^{'} (x)$ for $ϵ$ , giving

$lim_{h \to 0} f (g (x + h)) = lim_{h \to 0} [f (g (x)) + h g^{'} (x) f^{'} (g (x))]$

We will substitute the right-hand side of this equation for $f (g (x + h))$ in the definition $\partial_{x} f (g (x)) \equiv \lim_{h \to 0} \frac{f (g (x + h)) - f (g (x))}{h}$ , giving

$\partial_{x} f (g (x)) \equiv lim_{h \to 0} \frac{f (g (x)) + h g^{'} (x) f^{'} (g (x)) - f (g (x))}{h}$

In the numerator, $f (g (x))$ appears twice and cancels itself out. That leaves a single term with an $h$ in the numerator and an $h$ in the denominator. Those $h$ ’s cancel out, at the same time obviating the need for $\lim_{h \to 0}$ and leaving us with the chain rule:

$\partial_{x} f (g (x)) \equiv lim_{h \to 0} \frac{h g^{'} (x) f^{'} (g (x))}{h} = f^{'} (g (x)) g^{'} (x)$

Use the chain rule to find the derivative $\partial_{x} e^{2 x}$ .

Recognize that $g (x) \equiv 2 x$ is the interior function in $e^{2 x}$ and $f (y) \equiv \exp (y)$ is the exterior function. Thus $\partial_{x} e^{2 x} = f^{'} (g (x)) g^{'} (x) = \exp (g (x)) \times 2 = 2 e^{2 x} .$ Happily, this is the same result that the product rule gives for $\partial_{x} e^{2 x}$ .

Recognizing $e^{2 x}$ as $e^{x} \times e^{x}$ , we can apply the product rule: $\partial_{x} [e^{x} \times e^{x}] = [\partial_{x} e^{x}] \times e^{x} + e^{x} \times [\partial_{x} e^{x}] = e^{x} e^{x} + e^{x} e^{x} = 2 e^{2 x} .$

The chain rule can be used in a clever way to find a formula for $\partial_{x} \ln (x)$ .

We’ve already seen that the logarithm is the inverse function to the exponential, and vice versa. That is:

$e^{\ln (y)} = y \ \text{ and } \ \ln (e^{y}) = y$

Since $\ln (e^{y})$ is the same function as $y$ , the derivative $\partial_{y} \ln (e^{y}) = \partial_{y} y = 1$ .

Let’s differentiate the second form using the chain rule:

$\partial_{y} \ln (e^{y}) = [\partial_{y} \ln] (e^{y}) \times e^{y} = 1$ , giving $[\partial_{y} \ln] (e^{y}) = \frac{1}{e^{y}} = \text{recip} (e^{y})$ .

Whatever the function $\partial_{x} \ln ()$ might be, it takes its input and produces as output the reciprocal of that input. In other words:

$\partial_{x} \ln (x) = \frac{1}{x} .$

Knowing that $\partial_{x} \ln (x) = 1 / x$ and the chain rule, we are in a position to demonstrate the power-law rule $\partial_{x} x^{p} = p x^{p - 1}$ . The key is to use the identity $e^{\ln (x)} = x$ .

$\partial_{x} x^{p} = \partial_{x} {[e^{\ln (x)}]}^{p}$

The rules of exponents allow us to recognize ${[e^{\ln (x)}]}^{p} = e^{p \ln (x)}$ . Thus, $x^{p}$ can be seen as a composition of the exponential function onto the logarithm function.

Applying the chain rule to this composition gives

$\partial_{x} e^{p \ln (x)} = e^{p \ln (x)} \partial_{x} [p \ln (x)] = e^{p \ln (x)} \frac{p}{x} .$ Of course, we already know that $e^{p \ln (x)} = x^{p}$ , so we have

$\partial_{x} x^{p} = x^{p} \frac{p}{x} = p x^{p - 1} .$
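The derivation can be traced step by step in code. The sketch below assumes $x > 0$ so that $\ln(x)$ is defined; the function name `d_power` and the test values are illustrative choices, not anything prescribed by the text.

```python
import math

def d_power(x, p):
    """Derivative of x**p for x > 0, from the chain rule applied to exp(p * ln(x))."""
    exterior = math.exp(p * math.log(x))   # e^{p ln(x)}, which equals x**p
    interior = p / x                       # derivative of p * ln(x)
    return exterior * interior

for x, p in [(2.0, 3), (4.0, 0.5), (1.5, -2)]:
    print(d_power(x, p), p * x**(p - 1))   # the two columns should agree closely
```

The second column is the power-law rule $p x^{p-1}$ computed directly, so agreement between the columns mirrors the algebra above.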

$\partial_{x} [\sin (a x + b)] = [\partial_{x} \sin] (a x + b) \times \partial_{x} [a x + b] = \cos (a x + b) \times a$ .

23.6 Derivatives of the basic modeling functions

The basic modeling functions are the same as the pattern-book functions, but with bare $x$ replaced by $line (x)$ . In other words, each of the basic modeling functions is a composition of the corresponding pattern-book function with $line (x)$ . Consequently, the derivatives of the basic modeling functions can be found using the chain rule.

Suppose $f ()$ is one of our pattern-book functions. Then $\partial_{x} f (a x + b) = a f^{'} (a x + b)$ where $a$ is the derivative with respect to $x$ of $a x + b$ .

Here are the steps for differentiating a basic modeling function $f (a x + b)$ where $f ()$ is one of the pattern-book functions:

Step 1: Identify the particular pattern-book function $f ()$ and write down its derivative $f^{'}$ . For example, if $f ()$ is $\sin ()$ , then $f^{'} ()$ is $\cos ()$ .
Step 2: Find the derivative of the linear interior function. If the function is $a x + b$ , then the derivative is $a$ . If the interior function is $\frac{2 π}{P} (t - t_{0})$ , the derivative is $\frac{2 π}{P}$ .
Step 3: Write down the original function $f (a x + b)$ but replace $f$ with $f^{'}$ and pre-multiply by the derivative of the interior function. For instance, $\partial_{x} f (a x + b) = a f^{'} (a x + b)$ Another example: $\partial_{t} \sin (\frac{2 π}{P} (t - t_{0})) = \frac{2 π}{P} \cos (\frac{2 π}{P} (t - t_{0}))$
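The three steps above can be sketched as a small lookup-and-assemble routine. Everything named here (the `PATTERN_BOOK` table, the keys, and `d_modeling`) is a hypothetical illustration, and the table covers only a few of the pattern-book functions.

```python
import math

# Step 1 material: a few pattern-book functions paired with their derivatives
PATTERN_BOOK = {
    "sin": (math.sin, math.cos),
    "exp": (math.exp, math.exp),
    "ln": (math.log, lambda y: 1.0 / y),
    "square": (lambda y: y**2, lambda y: 2 * y),
    "recip": (lambda y: 1 / y, lambda y: -1 / y**2),
}

def d_modeling(name, a, b):
    """Derivative of f(a*x + b), where f is the pattern-book function called `name`."""
    _, df = PATTERN_BOOK[name]                     # Step 1: look up f'
    d_interior = a                                 # Step 2: derivative of a*x + b
    return lambda x: d_interior * df(a * x + b)    # Step 3: a * f'(a*x + b)

print(d_modeling("sin", 2.0, 0.0)(0.0))   # 2 * cos(0) = 2.0
print(d_modeling("ln", 3.0, 1.0)(1.0))    # 3 / (3*1 + 1) = 0.75
```

For the sinusoid written as $\sin(\frac{2π}{P}(t - t_0))$, the same routine applies with $a = 2π/P$ and $b = -2π t_0 / P$.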

By convention, there are different ways of writing $line (x)$ for the different pattern-book functions, for instance:

Pattern-book function $⟶$	Basic modeling
$\sin (x) ⟶$	$\sin (2 π [x - x_{0}] / P)$
$\exp (x) ⟶$	$\exp (k x)$
$x^{2} ⟶$	${[m x + b]}^{2}$
$1 / x ⟶$	$1 / [m x + b]$
$\ln (x) ⟶$	$\ln (a x + b)$

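The rule for the derivative of any basic modeling function $f (\text{line} (x))$ is $\partial_{x} f (\text{line} (x)) = [\partial_{x} \text{line} (x)] \times f^{'} (\text{line} (x))$ , where $f^{'}$ is the derivative of the pattern-book function with respect to its own input.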

To illustrate:

$\partial_{x} e^{k x} = k e^{k x}$ where $line (x) = k x$ .
$\partial_{x} \sin (2 π (x - x_{0}) / P) = \frac{2 π}{P} \cos (2 π (x - x_{0}) / P)$ where $line (x) = 2 π (x - x_{0}) / P$ .
$\partial_{x} (m x + b)^{2} = m \times 2 (m x + b) = 2 m^{2} x + 2 m b$ where $line (x) = m x + b$ .
$\partial_{x} reciprocal (m x + b) = \partial_{x} \frac{1}{m x + b} = - \frac{m}{(m x + b)^{2}}$ where $line (x) = m x + b$ and we use the fact that $\partial_{x} reciprocal (x) = - 1 / x^{2}$
$\partial_{x} \ln (a x + b) = a / (a x + b)$
$\partial_{x} \text{pnorm} (x, \text{mean}, \text{sd}) = \text{dnorm} (x, \text{mean}, \text{sd})$ .
$\partial_{x} \text{dnorm} (x, \text{mean}, \text{sd}) = - \frac{x - \text{mean}}{\text{sd}^{2}} \text{dnorm} (x, \text{mean}, \text{sd})$
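The last two entries, for the sigmoid and the hump, can be checked numerically. This sketch uses Python's standard-library `statistics.NormalDist`, whose `cdf()` and `pdf()` methods play the roles of `pnorm()` and `dnorm()`; the parameter values and the evaluation point are arbitrary assumptions.

```python
from statistics import NormalDist

mean, sd = 2.0, 1.5               # arbitrary parameters, for illustration only
nd = NormalDist(mu=mean, sigma=sd)
x, h = 2.8, 1e-5

# Centered finite differences of the sigmoid (cdf) and the hump (pdf)
d_pnorm = (nd.cdf(x + h) - nd.cdf(x - h)) / (2 * h)
d_dnorm = (nd.pdf(x + h) - nd.pdf(x - h)) / (2 * h)

# What the rules in the list say the derivatives should be
rule_pnorm = nd.pdf(x)
rule_dnorm = -(x - mean) / sd**2 * nd.pdf(x)

print(d_pnorm, rule_pnorm)
print(d_dnorm, rule_dnorm)
```

Each printed pair should agree to several decimal places.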

You will be using the derivatives of the basic modeling functions so often, that you should practice and practice until you can write the derivative at a glance.

There are many possible implementations of the general concept of hump functions and sigmoid functions. This book uses $dnorm ()$ for the hump and $pnorm ()$ for the sigmoid.

The names $dnorm$ and $pnorm$ are worth remarking on. As we’ve said before, $dnorm ()$ is called the gaussian function in many fields of science and engineering. It is also a centrally important function in statistics, where it is called the normal function. (That is how important it is: it is just “normal.”) You may also have heard the normal function described as a “bell-shaped curve.”

In statistical nomenclature, $dnorm ()$ is called the “normal probability density function (PDF)” and $pnorm ()$ is called the “normal cumulative distribution function (CDF).” That is way too wordy for our purposes. For brevity, we have adopted the R names for those functions: dnorm() and pnorm().

Owing to the origin of the names $dnorm$ and $pnorm$ , we are writing the parameters of the functions, mean and sd, using the computer language notation. The pattern-book functions are just $dnorm (x)$ and $pnorm (x)$ , without listing the parameters. But the basic modeling functions, with parameters, are written $dnorm (x, mean, sd)$ and $pnorm (x, mean, sd)$ . This violates the convention that the basic modeling functions are the composition of the pattern-book functions with $line (x)$ . But $dnorm ()$ does not work this way because, by convention, the amplitude of the peak of $dnorm ()$ changes with the input parameter sd. That is not true for any other basic modeling function.

Composition or product?

There is one family of functions for which function composition accomplishes the same thing as multiplying functions: the power-law family.

Consider, for instance, the function $h (x) \equiv {[3 x]}^{4}$ . Let’s let $g (x) \equiv 3 x$ and $f (y) \equiv y^{4}$ . With these definitions, $h (x) = f (g (x))$ .

Recognizing that $\partial_{y} f (y) = 4 y^{3}$ and $\partial_{x} g (x) = 3$ , the chain rule gives

$\partial_{x} h (x) = \underbrace{4 g (x)^{3}}_{f^{'} (g (x))} \times \underbrace{3}_{g^{'} (x)} = \underbrace{4 (3 x)^{3}}_{f^{'} (g (x))} \times 3 = 4 \cdot 3^{4} \times x^{3} = 324 x^{3}$

Another way to look at the same function is as $g (x)$ multiplied by itself 3 times: $h (x) = g (x) \cdot g (x) \cdot g (x) \cdot g (x)$ . This is a product of 4 factors. Applying the product rule gives

$\begin{array}{rcl} \partial_{x} h (x) & = & g^{'} (x) \cdot g (x) \cdot g (x) \cdot g (x) \ + \\ & & g (x) \cdot g^{'} (x) \cdot g (x) \cdot g (x) \ + \\ & & g (x) \cdot g (x) \cdot g^{'} (x) \cdot g (x) \ + \\ & & g (x) \cdot g (x) \cdot g (x) \cdot g^{'} (x) \end{array}$

Since multiplication is commutative, all four terms are the same, each being $g^{'} (x) \cdot g (x)^{3} = 3 \cdot (3 x)^{3} = 3^{4} x^{3}$ . The sum of all four is therefore $4 \times 3^{4} x^{3} = 324 x^{3}$ .

These are two long-winded ways of getting to the result. For most people, differentiating power-law functions algebraically is simplified by using the rules of exponentiation rather than the product or chain rule. Here, $h (x) \equiv {[3 x]}^{4} = 3^{4} x^{4}$ so $\partial_{x} h (x)$ is easily handled as a scalar ( $3^{4}$ ) times a function $x^{4}$ . Consequently, applying the rule for differentiating power laws,

$\partial_{x} h (x) = 3^{4} \times \partial_{x} x^{4} = 3^{4} \times 4 x^{3} = 324 x^{3}$

As another example, take $h (x) \equiv \sqrt[4]{x^{3}}$ . This is, of course, the composition $f (g (x))$ where $f (y) \equiv y^{1 / 4}$ and $g (x) \equiv x^{3}$ . Applying the chain rule to find $\partial_{x} h (x)$ will work (of course!), but is more work than applying the rules of exponentiation followed by a simple power-law differentiation.

$h (x) = \sqrt[4]{x^{3}} = x^{3 / 4} \ \text{ so } \ \partial_{x} h (x) = \frac{3}{4} x^{(3 / 4 - 1)} = \frac{3}{4} x^{- 1 / 4}$
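For the first example in this section, $h(x) \equiv [3x]^4$, the three routes to the derivative can be laid side by side. This is a minimal sketch evaluated at the arbitrarily chosen input $x = 2$; integer arithmetic is used so that the comparison is exact.

```python
def g(x):
    """Interior function g(x) = 3x, whose derivative is the constant 3."""
    return 3 * x

x = 2   # arbitrary evaluation point

# Chain rule: f'(g(x)) * g'(x) with f(y) = y**4
chain = 4 * g(x)**3 * 3

# Product rule: four terms, each with one factor replaced by g'(x) = 3
product = sum(3 * g(x) * g(x) * g(x) for _ in range(4))

# Rules of exponents first, then the power-law rule: 3**4 * 4 * x**3
power = 3**4 * 4 * x**3

print(chain, product, power)   # 2592 2592 2592, that is, 324 * 2**3
```

All three agree, so the choice among them is a matter of convenience rather than correctness.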

23.7 Exponentials and logarithms (optional)

The natural logarithm function, $\ln (x)$ , is one of our basic modeling functions. As you know, there are other logarithmic functions. The one most often used is the logarithm-base-10, written $\log_{10} (x)$ or log10(x). Ten is an integer, and a nice number to use in arithmetic. So in practice, it is sensible to use $\log_{10} ()$ . (Indeed, $\log_{10} ()$ is the digit() function, introduced in Chapter ?sec-magnitudes).

The “natural” in the “natural logarithm” means something different.

The base of the natural logarithm is the number called Euler’s constant and written $e$ . As a celebrity number, $e$ is right up there with $π$ and $i$ . Just as $π$ has a decimal expansion that is infinitely long—the familiar $π = 3.14159265358979 . . .$ —Euler’s constant has an infinitely long decimal representation: $e = 2.71828182845905 . . .$

It is not obvious why $e = 2.71828182845905 . . .$ should be called “natural” by mathematicians. The reasons are:

$\ln (x)$ is the inverse of $e^{x}$ , which is special for being invariant under differentiation: $\partial_{x} e^{x} = e^{x}$ .
The derivative $\partial_{x} \ln (x)$ has a particularly simple form, namely, $1 / x$ .

Let’s look at the log-base-10 and its computer-savvy cousin log-base-2. The very definition of logarithms means that both 10 and 2 can be written $10 = e^{\ln (10)} \ \text{ and } \ 2 = e^{\ln (2)}$ . This implies that the base-10 and base-2 exponential functions can be written in terms of Euler’s constant $e$ :

$10^{x} = {[e^{\ln (10)}]}^{x} = e^{\ln (10) x} \ \text{ and } \ 2^{x} = {[e^{\ln (2)}]}^{x} = e^{\ln (2) x}$

Calculating $\partial_{x} 10^{x}$ or $\partial_{x} 2^{x}$ is a matter of applying the chain rule:

$\partial_{x} [10^{x}] = \partial_{x} [e^{\ln (10) x}] = e^{\ln (10) x} \times \ln (10) \approx 10^{x} \times 2.3026$

and

$\partial_{x} [2^{x}] = \partial_{x} [e^{\ln (2) x}] = e^{\ln (2) x} \times \ln (2) \approx 2^{x} \times 0.6931$

Like $e^{x}$ , the derivatives of $10^{x}$ and $2^{x}$ are proportional to themselves. For $e^{x}$ the constant of proportionality is 1, a very natural number indeed.
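The constants of proportionality can be estimated numerically for any base. In this sketch the function name `proportionality`, the evaluation point, and the step size are all assumptions made for illustration.

```python
import math

def proportionality(base, x0=1.5, h=1e-6):
    """Estimate (derivative of base**x) / (base**x) at x0 with a centered difference."""
    slope = (base**(x0 + h) - base**(x0 - h)) / (2 * h)
    return slope / base**x0

for base in (2.0, math.e, 10.0):
    # Second column: numerical estimate; third column: ln(base) from the chain rule
    print(base, proportionality(base), math.log(base))
```

The estimate should match $\ln(\text{base})$ closely in each row, and only for base $e$ does it come out as 1.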

23.8 Drill

Part 1 Which of the derivative rules should you use to find $\partial_{t} e^{t^{2}} ?$

The constant multiplier rule
The linear combination rule
The product rule
The chain rule
No rule needed, it is so basic.

Part 2 Which of the derivative rules should you use to find $\partial_{t} e^{x^{2}} ?$

The constant multiplier rule
The linear combination rule
The product rule
The chain rule
No rule needed, it is so basic.

Part 3 Which of the derivative rules should you use to find $\partial_{t} e^{t} \sin (t) ?$

The constant multiplier rule
The linear combination rule
The product rule
The chain rule
No rule needed, it is so basic.

Part 4 Which of the derivative rules should you use to find $\partial_{t} e^{t} \sin (x) ?$

The constant multiplier rule
The linear combination rule
The product rule
The chain rule
No rule needed, it is so basic.

Part 5 Which of the derivative rules should you use to find $\partial_{t} \ln (t) ?$

The constant multiplier rule
The linear combination rule
The product rule
The chain rule
No rule needed, it is so basic.

Part 6 Which of the derivative rules should you use to find $\partial_{t} t e^{- t} ?$

The constant multiplier rule
The linear combination rule
The product rule
The chain rule
No rule needed, it is so basic.

Part 7 Which of the derivative rules should you use to find $\partial_{x} 37 x^{5} ?$

The constant multiplier rule
The linear combination rule
The product rule
The chain rule
No rule needed, it is so basic.

Part 8 Which of the derivative rules should you use to find $\partial_{x} 19 ?$

The constant multiplier rule
The linear combination rule
The product rule
The chain rule
No rule needed, it is so basic.

Part 9 Which of the derivative rules should you use to find $\partial_{x} 15 x^{2} - 3 x + 7 \ln (x) ?$

The constant multiplier rule
The linear combination rule
The product rule
The chain rule
No rule needed, it is so basic.

Part 10 What is $\partial_{x} 15 x^{2} - 3 x + 7 \ln (x)$ ?

$30 x - 3 - 7 / x$
$30 x - 3 + 7 / x$
$15 x - 3 + 7 / x$
$30 x - 3 x + 7 / x$

Part 11 What is $\partial_{t} e^{k} + \ln (e^{2}) - t$ ?

$k e^{k} + 2 / e - t$
0
-1
$e^{k} + 1 / e$

Part 12 What is $\partial_{x} \ln (x) / x^{2}$ ? (Hint: You can write the function in a simpler way.)

$- 2 x^{- 3} (1 / x - 1)$
$- 2 x^{- 3} \ln (x)$
$- 2 x^{- 1} \ln (x)$
$x^{- 3} (1 - 2 \ln (x))$

Part 13 Which of these is $\partial_{t} (\ln (6) + t^{4} - e^{t})$ ?

$\frac{1}{6} + 4 t^{3} - e^{t}$
$\frac{1}{6} + 4 t^{3} - e^{- t}$
$4 t^{3} - e^{- t}$
$4 t^{3} - e^{t}$

Part 14 Which of these is $\partial_{u} (\frac{1}{u^{6}} - π^{3} + 4 u^{3} + e)$ ?

$- 6 u^{- 7} - 3 π^{2} + 4 u^{3}$
$- 6 u^{- 5} - 3 π^{2} + 12 u^{2} + \frac{1}{e}$
$- 6 u^{- 7} + 12 u^{2}$
$- 6 u^{- 5} + 12 u^{2} + \frac{1}{e}$

Part 15 Which of these is $\partial_{v} (\sqrt[4]{v^{7}} + e^{7} - 4 - \frac{3 v^{6}}{v^{2}})$ ?

$\frac{7}{4} \frac{1}{v^{4}} + 7 e^{6} - \frac{18 v^{5}}{2 v}$
$\frac{7}{4} v^{\frac{3}{4}} - 12 v^{3}$
$\frac{4}{7} v^{\frac{- 3}{7}} - \frac{18 v^{5}}{2 v}$
$\frac{7}{4} v^{\frac{3}{4}} + e^{7} - 12 v^{3}$

Part 16 What is $\partial_{t} (4 \sin (2 π t) - 5)$ ?

$8 \cos (2 π t)$
$4 π \cos (2 π t)$
$4 \cos (2 π t) - 5$
$8 π \cos (2 π t)$

Part 17 What is $\partial_{t} (7 + 8 t^{2} + 3 t^{4})$ ?

$4 t + 12 t^{2}$ $8 t + 4 t^{3}$ $16 t + 12 t^{3}$ $16 t^{2} + 9 t^{3}$

Part 18 The derivative $\partial_{x} dnorm (x) = - x dnorm (x)$ . What is $\partial_{x} dnorm (\frac{x^{2}}{4}) ?$

$- \frac{x^{2}}{2} dnorm (\frac{x^{2}}{4})$
$- \frac{x}{2} dnorm (\frac{x^{2}}{4})$
$- \frac{x^{3}}{8} dnorm (\frac{x^{2}}{4})$
$- \frac{x}{8} dnorm (\frac{x^{2}}{4})$

Part 19 What is $\partial_{t} (6 t - 3 t^{2} + 2 t^{4})$ ?

$6 - 3 t + 8 t^{2}$ $6 - 3 t + 6 t^{3}$ $6 - 6 t + 8 t^{3}$ $- 3 t + 6 t^{3}$

Part 20 What is $\partial_{t} \ln (t^{2} + 1)$ ?

$2 t \ln (t^{2} + 1)$ $1 / t^{2} + 1$ $\frac{2 t}{t^{2} + 1}$ $1 / 2 t$

Part 21 For the function $g (t) \equiv \sin (\frac{2 π}{P} (t - t_{0}))$ is the interior function linear?

Yes No

Part 22 For the function $g (P) \equiv \sin (\frac{2 π}{P} (t - t_{0}))$ is the interior function linear?

Yes No

Part 23 For the function $h (u) \equiv \ln (a^{2} u - \sqrt{b})$ is the interior function linear?

Yes No

Part 24 For the function $f (w) \equiv e^{k w}$ , is the interior function linear?

Yes No

Part 25 Saying “the interior function is linear” is not an entirely complete statement. A full statement is “the interior function is linear in terms of the input $x$ ” or “in terms of the input $u$ ” or whatever name we choose to use for the input.
Is the expression $V x + U$ linear in terms of $U$ ?

Yes No

Part 26 Saying “the interior function is linear” is not an entirely complete statement. A full statement is “the interior function is linear in terms of the input $x$ ” or “in terms of the input $u$ ” or whatever name we choose to use for the input.
Is the expression $V x^{2} + U$ linear in terms of $U$ ?

Yes No

Part 27 Saying “the interior function is linear” is not an entirely complete statement. A full statement is “the interior function is linear in terms of the input $x$ ” or “in terms of the input $u$ ” or whatever name we choose to use for the input.
Is the expression $V x^{2} + U$ linear in terms of $x$ ?

Yes No

23.9 Exercises

Exercise 23.01

Section 23.1 explains that in differentiating a linear combination of two functions, or a product of two functions, or one function composed with another, your first task is to identify the two functions $f ()$ and $g ()$ involved. Second, compute the derivative of each of those functions on its own: $\partial_{x} f (x)$ and $\partial_{x} g (x)$ .

Carry out these two tasks for each of the combined functions shown in the table. (The first row has been done for you as an example.)

Combination	$f ()$	$g ()$	$\partial_{x} f ()$	$\partial_{x} g ()$
$e^{x} \ln (x)$	$\ln (x)$	$e^{x}$	$recip$ (that is $1 / x$ )	$e^{x}$
$\sin (e^{x})$
$x + x^{2}$
$1 / \sin (x)$
$pnorm (x)^{2}$
$\sqrt{pnorm (x)}$
$pnorm (x^{2})$
$pnorm (\sin (x))$
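As a rough illustration of the two tasks for the example row only, the sketch below uses base R's symbolic differentiator `stats::D()` (which is not the mosaicCalc `D()` used elsewhere in this book). The names `f_expr`, `g_expr`, `df_expr`, and `dg_expr` are our own, chosen for this illustration.

```r
# Task 1: identify the two functions in e^x * ln(x), as in the example row.
f_expr <- quote(log(x))   # f(x) = ln(x)
g_expr <- quote(exp(x))   # g(x) = e^x

# Task 2: differentiate each function on its own.
# stats::D() is base R's symbolic differentiator.
df_expr <- stats::D(f_expr, "x")   # the reciprocal: 1/x
dg_expr <- stats::D(g_expr, "x")   # exp(x)

print(df_expr)
print(dg_expr)

# Spot-check all four functions at an arbitrary input value.
x <- 2
c(f = eval(f_expr), g = eval(g_expr), df = eval(df_expr), dg = eval(dg_expr))
```

The remaining rows are left for you to fill in by hand. A symbolic tool like this is best used afterwards, to check your work.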

Exercise 23.02

For each of the following, say whether the function is a composition $f (g (x))$ or a product $f (x) g (x)$ , or neither.

Part A What sort of combination is $h_{1} (x) \equiv \ln (x) e^{x}$ ?

product composition neither

Part B What sort of combination is $h_{2} (x) \equiv \sin (x) \cos (x)$ ?

product composition neither

Part C What sort of combination is $h_{3} (x) \equiv \sin (\ln (x))$ ?

product composition neither

Part D What sort of combination is $h_{4} (x) \equiv e^{\ln (x)}$ ?

product composition neither

Part E What sort of combination is $h_{5} (x) \equiv \sin (x) - dnorm (x)$ ?

product composition neither

Part F What sort of combination is $h_{6} (x) \equiv e^{x^{2}}$ ?

product composition neither

Part G What sort of combination is $h_{7} (x) \equiv pnorm (x^{2})$ ?

product composition neither

Part H What sort of combination is $h_{8} (x) \equiv pnorm (x) dnorm (x)$ ?

product composition neither

Part I What sort of combination is $h_{9} (x) \equiv 1 / \sin (x)$ ?

product composition neither

Exercise 23.03

Consider this function, $F (t)$ , which is a linear combination of three time-shifted sigmoids.

As you know, the derivative of a sigmoid $pnorm (t)$ is a gaussian with the same center and standard deviation.
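If you would like to see the sigmoid-gaussian relationship numerically, here is a minimal base-R sketch. The center (3) and standard deviation (2) are arbitrary values picked for illustration, and `num_deriv()` is a hypothetical helper that uses a central finite difference rather than any function from this book.

```r
# Central-difference approximation to the derivative of fun at t.
num_deriv <- function(fun, t, h = 1e-5) {
  (fun(t + h) - fun(t - h)) / (2 * h)
}

# A time-shifted sigmoid and the gaussian with the same center and sd.
# The values 3 and 2 are arbitrary illustrative choices.
center <- 3
width  <- 2
sigmoid  <- function(t) pnorm(t, mean = center, sd = width)
gaussian <- function(t) dnorm(t, mean = center, sd = width)

# Compare the numerical derivative of the sigmoid with the gaussian.
t_vals  <- seq(-5, 10, by = 0.5)
max_gap <- max(abs(num_deriv(sigmoid, t_vals) - gaussian(t_vals)))
max_gap   # should be very close to zero
```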

Part A How many gaussians will be in $\partial_{t} F (t)$ ?

2 3 6 none

The following figure shows several functions. One of them is $\partial_{t} F (t)$ .

Part B Which function is the actual derivative of $F (t)$ ? (Hints: The vertical axis is important as is the value of $dnorm (0)$ .)

A B C D

Part C Of the functions (1), (2), (3), and (4) below, which function is the second derivative of $F (t)$ ? (Hints: The vertical axis is important as is the value of $dnorm (0)$ .)

(1) (2) (3) (4)

Exercise 23.04

In function compositions of the form $f (g (x))$ , the function $f ()$ is called the exterior function and $g ()$ is called the interior function.

Part A In $\cos (\ln (x))$ which is the interior function?

$\ln ()$
$\cos ()$
$\sin ()$
None of the above
It is not a function composition

Part B In $1 / \sin (x)$ which is the exterior function?

$recip ()$
$\cos ()$
$\sin ()$
None of the above
It is not a function composition

Part C In $\sin (\frac{2 π}{P} (t - t_{0}))$ which is the exterior function?

$t - t_{0}$
$\frac{2 π}{P}$
$\frac{2 π}{P} t$
$\frac{2 π}{P} (t - t_{0})$
$\sin ()$
None of the above
It is not a function composition

Part D In $\sin (2 π (t - t_{0}) / P)$ which is the interior function?

$t - t_{0}$
$2 π / P$
$2 π t / P$
$2 π (t - t_{0}) / P$
$\sin ()$
None of the above
It is not a function composition.

Part E In $\sin (x) dnorm (x^{2})$ , which is the interior function?

$x^{2}$
$x$
$dnorm (x^{2})$
None of the above
It is not a function composition.

Exercise 23.06

Compare the functions $f_{1} (x) \equiv dnorm (x, mn, sd)$ and $f_{2} (x) \equiv dnorm ([x - mn] / sd)$ by plotting them in a sandbox.

To construct the plot, you will have to pick specific values for $mn$ and $sd$ . Make sure that you use the same $mn$ and $sd$ when constructing $f_{1} ()$ and $f_{2} ()$ . For instance:

f1 <- makeFun(dnorm(x, mn, sd) ~ x, mn=2, sd=3)
f2 <- makeFun(dnorm( (x-mn) / sd) ~ x, mn=2, sd=3)

Part A When $sd = 1$ , are the two functions the same?

Yes
Yes, but only if $mn = 1$
Yes, but only if $mn = 0$
No

Part B When $sd \neq 1$ , for any given mean, the two functions are not the same. What’s the relationship between $f_{1} (x)$ and $f_{2} (x)$ ?

$f_{2} (x) = sd \cdot f_{1} (x)$
$f_{1} (x) = sd \cdot f_{2} (x)$
$f_{1} (x) = sd^{2} \cdot f_{2} (x)$
$f_{2} (x) = sd^{2} \cdot f_{1} (x)$

Exercise 23.08

Pilots of commercial passenger aircraft take the comfort of their passengers into account when flying. In transitioning from level flight onto the descent path for landing, for example, pilots take care that the vertical component of acceleration isn’t so great that passengers feel the plane “falling out from under them.”

A simple model of the descent path is a sigmoid function. Suppose that the descent starts from an altitude of $A = 20{,}000$ feet at a distance of 30,000 feet from the end of the runway. A reasonable model for the vertical component of the flight path is $altitude (x) \equiv A \, pnorm (x, mn = 30000 / 2, sd = 30000 / 6)$ . Notice that the parameter “mean” is set to be half the distance to the runway, and the parameter “sd” is set to be a third of that. This ensures that the start and end of the descent will involve flight that is close to level.

The vertical acceleration is the second derivative of altitude with respect to time. But notice that altitude() is a function of distance from the runway, not time.

To treat altitude as a function of time, we need to write “distance from the runway” as a function of time. Let’s set $t = 0$ to be the time when the plane begins its descent, when it is 30,000 feet from the end of the runway. Distance from the runway will be $x (t) = 30000 - v t$ where $v$ is the plane’s velocity. Composing altitude() onto $x (t)$ gives a new function, alt(), whose input is time:

$alt (t) \equiv altitude (x (t)) = altitude (30000 - v t)$

Suppose that the aircraft is flying at $v = 200$ miles-per-hour, which is $200 \frac{miles}{hour} \frac{1 hour}{3600 s} \frac{5280 ft}{1 mile} = 293.3 \frac{ft}{s}$ . At that speed, it will take a little more than 100 seconds for the aircraft to reach the runway.
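The unit conversion and the travel time can be checked with a few lines of base R. This is only an arithmetic sketch, and the variable names are our own.

```r
# Convert 200 miles per hour to feet per second.
v_mph       <- 200
ft_per_mile <- 5280
s_per_hour  <- 3600
v_fps <- v_mph * ft_per_mile / s_per_hour
round(v_fps, 1)    # 293.3

# Time to cover the 30,000 ft between the start of descent and the runway.
distance_ft <- 30000
time_s <- distance_ft / v_fps
round(time_s, 1)   # 102.3, a little more than 100 seconds
```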

Using a sandbox, plot the function alt( $t$ ), choosing a domain for $t$ that lets you see the whole descent path.

alt <- makeFun(20000 * pnorm(30000 - v * t, 30000/2, 30000/6) ~ t, v = 293.3)
slice_plot(alt(t) ~ t, bounds(t=0:110))

Compute the second derivative $\partial_{t t} alt (t)$ to find the vertical component of acceleration of the aircraft. (Important note: Due to a bug in R, use numD() rather than D() to compute the second derivative.)

Graph the second derivative over the appropriate domain and look for the most extreme values of acceleration.

dd_alt <- numD(alt(t) ~ t + t)
slice_plot(dd_alt(t) ~ t, bounds(t=0:110))

From the graph, read off the maximum vertical acceleration during the descent.

Part A What are the units of vertical acceleration shown in the graph?

feet-per-second
feet-per-second-squared
miles-per-hour-squared

A rule of thumb is that a vertical acceleration up to $5 ft s^{- 2}$ is acceptable in terms of passenger comfort. Regrettably, the descent path we described does not meet the standard! So we have to re-design the descent path. Since both the altitude and velocity are set, the only parameter you can change is the distance from the foot of the runway where descent commences. Of course, the parameters “mean” and “sd” need to be set accordingly.

Part B How far from the foot of the runway should descent begin to stay within the $5 ft s^{- 2}$ acceleration constraint? Pick the shortest distance that satisfies the constraint.

40,000 ft 50,000 ft 60,000 ft 70,000 ft 80,000 ft

For reflection: A new hire at the airline’s operations center proposes to model the descent as a straight-line function rather than a sigmoid. He points out that the second derivative of a straight-line function is always 0, so the passengers would feel no acceleration at all! Explain to this newbie what’s wrong with his idea.

Exercise 23.09

In an earlier exercise (E9e7c6) you constructed models $D (t)$ of the availability of a drug in the bloodstream for three different pill-taking regimens: every six hours, every eight hours, and a double dose to start followed by a single dose every eight hours. The model for a single, isolated pill is zero before the pill is taken, then exponential decay from the level of the pill dose after the pill is taken. Like this:

pill <- makeFun(ifelse(t < 0, 0, exp(-k * t)) ~ t, k = log(2)/3)

The parameter $k$ has been set to represent a drug with a half-life of three hours.
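To see why $k = \ln (2) / 3$ corresponds to a three-hour half-life, you can evaluate the model at multiples of three hours. The sketch below re-defines `pill()` as an ordinary base-R function, an assumption made only so that the snippet runs without mosaicCalc.

```r
# Base-R version of the single-pill model (no mosaicCalc needed).
pill <- function(t, k = log(2) / 3) {
  ifelse(t < 0, 0, exp(-k * t))
}

pill(-1)           # 0: before the pill is taken
pill(0)            # 1: the full dose at the moment the pill is taken
pill(c(3, 6, 9))   # halves every three hours: about 0.5, 0.25, 0.125
```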

The model for the entire regimen is a linear combination of time-shifted single pills, e.g.

regimen8 <- makeFun(A*pill(t) + A*pill(t-8) + A*pill(t-16) + A*pill(t-24) + A*pill(t-32) ~ t, A=1)

From graphs of the functions themselves it is easy to check whether the availability ever falls below the therapeutic threshold (which we stipulated is 0.25). For instance, the eight-hour regimen with a dose of A=1 does fall below the threshold during the first day. So a larger dose is needed than A=1.

The derivative $\partial_{t} regimen8 (t)$ tells the instantaneous rate at which the drug is being administered to and eliminated from the patient’s body.

For each of the three regimens, construct $\partial_{t} regimen (t)$ . Ignoring the glitches due to discontinuity at the times the pills are consumed, which of the three regimens has the lowest average rate of drug elimination?

Exercise 23.10

Recall from Section 9.2 the Lorenz curve used to describe income inequality. The Lorenz curve shows the fraction of total income versus population fraction.

Figure 23.2: A Lorenz curve (blue) fitted to income data from the US in 2009. (See Figure 11.15.)

Since the population is arranged from poorest to richest along the horizontal axis, Lorenz curves must be both monotonically increasing and concave up. That is, any Lorenz function $L (P)$ , where $P$ is the population fraction, must satisfy these criteria:

$L (0) = 0$
$L (1) = 1$ that is, the aggregate fraction of income earned by the entire population is 100%.
$\partial_{P} L (P) > 0$ that is, monotonically increasing
$\partial_{P P} L (P) > 0$ that is, concave up.
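As a concrete illustration of the four criteria, consider the hypothetical candidate $L (P) = (P + P^{2}) / 2$ . It is not fitted to any data and was chosen only because its derivatives are easy to write by hand. A minimal base-R sketch that checks each criterion on a grid of population fractions:

```r
# A hypothetical candidate Lorenz function and its hand-computed derivatives.
L   <- function(P) (P + P^2) / 2
dL  <- function(P) (1 + 2 * P) / 2       # first derivative
ddL <- function(P) rep(1, length(P))     # second derivative is the constant 1

# Check the four criteria on a grid covering 0 <= P <= 1.
P <- seq(0, 1, by = 0.01)
checks <- c(
  starts_at_zero = L(0) == 0,
  ends_at_one    = L(1) == 1,
  increasing     = all(dL(P) > 0),
  concave_up     = all(ddL(P) > 0)
)
checks
```

A grid check like this can expose a function that fails a criterion, but it is not a proof that a function satisfies one.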

Consider a function $H (P) \equiv L_{1} (L_{2} (P))$ which is the composition of two Lorenz curves.

A. Use the composition rule to show that $H (P)$ is monotonically increasing. (Hint: calculate $\partial_{P} H (P)$ and show that it must be positive.)
B. Using both the composition and product rules, calculate $\partial_{P P} H (P)$ and show that $H (P)$ must be concave up.

Exercise 23.12

The formula for the function $dnorm (x)$ is

$dnorm (x) \equiv \frac{1}{\sqrt{2 π}} \exp (- \frac{x^{2}}{2}) .$

A. Use the chain rule to find $\partial_{x} dnorm (x)$ .

B. Confirm from your answer to (A) that there is another formula for $\partial_{x} dnorm (x)$ , namely $\partial_{x} dnorm (x) = - x dnorm (x) .$

C. Use the product rule to find $\partial_{x x} dnorm (x)$ .

D. From your answer to (C), compute the 3rd derivative $\partial_{x x x} dnorm (x)$ .

E. Let’s generalize the pattern. Each of the previous derivatives has been a polynomial—let’s call it $p_{n} (x)$ for the $n$ th derivative—times $dnorm (x)$ . Knowing $p_{n} (x)$ , we can easily find $p_{n + 1} (x)$ :

$p_{n + 1} (x) = - x p_{n} (x) + \partial_{x} p_{n} (x)$

We know $p_{1} (x) = - x$ so $p_{2} (x) = x^{2} - 1$ . In turn, this tells us $p_{3} (x) = 3 x - x^{3}$ . Find:

$p_{4} (x)$
$p_{5} (x)$
$p_{6} (x)$
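One way to check your hand calculations is to automate the recursion. The base-R sketch below stores each polynomial as a vector of coefficients in ascending powers. The helpers `poly_deriv()`, `next_p()`, and `poly_eval()` are hypothetical names introduced here, not functions from any package. The sketch reproduces only the $p_{2}$ and $p_{3}$ already stated above, then compares $p_{2} (x) dnorm (x)$ with a finite-difference second derivative of R's built-in `dnorm()`.

```r
# Polynomials are stored as coefficient vectors in ascending powers:
# c(a0, a1, a2) stands for a0 + a1*x + a2*x^2.

# Derivative of a polynomial given by its coefficients.
poly_deriv <- function(p) {
  if (length(p) == 1) return(0)
  p[-1] * seq_len(length(p) - 1)
}

# One step of the recursion p_{n+1}(x) = -x * p_n(x) + d/dx p_n(x).
next_p <- function(p) {
  minus_x_p <- c(0, -p)   # multiplying by -x shifts every power up by one
  dp <- poly_deriv(p)
  dp <- c(dp, rep(0, length(minus_x_p) - length(dp)))   # pad to equal length
  minus_x_p + dp
}

# Evaluate a coefficient-vector polynomial at each value in x.
poly_eval <- function(p, x) {
  sapply(x, function(xi) sum(p * xi^(seq_along(p) - 1)))
}

p1 <- c(0, -1)     # p_1(x) = -x
p2 <- next_p(p1)   # -1, 0, 1      i.e. x^2 - 1
p3 <- next_p(p2)   #  0, 3, 0, -1  i.e. 3x - x^3

# Numerical cross-check: the second derivative of dnorm(x) against p_2(x) * dnorm(x).
x <- seq(-3, 3, by = 0.25)
h <- 1e-4
second_diff <- (dnorm(x + h) - 2 * dnorm(x) + dnorm(x - h)) / h^2
max_gap <- max(abs(second_diff - poly_eval(p2, x) * dnorm(x)))
max_gap   # small, limited by finite-difference error
```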

Exercise 23.14

Confirm using algebraic manipulation the differentiation rule for a product of three functions:

${[u \cdot v \cdot w]}^{'} = u^{'} \cdot v \cdot w + u \cdot v^{'} \cdot w + u \cdot v \cdot w^{'}$

Here, $u$ is shorthand for $u (x)$ , and $u^{'}$ is shorthand for $\partial_{x} u (x)$ , and similarly for $v$ and $w$ .

Hint: $[u \cdot v \cdot w] = u \cdot [v \cdot w]$ . So a product of three functions can be seen as a product $u \cdot h$ where $h \equiv v \cdot w$ .
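Before working through the algebra, it can be reassuring to test the rule numerically on a specific case. The sketch below picks $u = \sin (x)$ , $v = e^{x}$ , and $w = x^{2}$ as arbitrary illustrative choices and compares the right-hand side of the rule with a central finite difference of the product. Agreement on a grid is a sanity check only, not the algebraic confirmation the exercise asks for.

```r
# Three arbitrary example functions and their hand-computed derivatives.
u <- function(x) sin(x);  du <- function(x) cos(x)
v <- function(x) exp(x);  dv <- function(x) exp(x)
w <- function(x) x^2;     dw <- function(x) 2 * x

# The product itself and the right-hand side of the three-function rule.
product <- function(x) u(x) * v(x) * w(x)
rule <- function(x) {
  du(x) * v(x) * w(x) + u(x) * dv(x) * w(x) + u(x) * v(x) * dw(x)
}

# Central finite difference of the product, compared with the rule.
x <- seq(-2, 2, by = 0.1)
h <- 1e-5
finite_diff <- (product(x + h) - product(x - h)) / (2 * h)
max_gap <- max(abs(finite_diff - rule(x)))
max_gap   # should be very close to zero
```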