@@ -281,10 +281,10 @@ That was just a re-arrangement. Now, let's require that
281281$$\lambda^\prime = -\frac{df}{du}^\ast \lambda - \left(\frac{dg}{du} \right)^\ast$$
282282$$\lambda(T) = 0$$
283283
284- This means that the boundary term of the integration by parts is zero, and also one of those integral terms are perfectly zero.
284+ This means that one of the boundary terms of the integration by parts is zero, and also one of those integrals is perfectly zero.
285285Thus, if $\lambda$ satisfies that equation, then we get:
286286
287- $$\frac{dG}{dp} = \lambda^\ast(t_0)\frac{dG}{du} (t_0) + \int_{t_0}^T \left(g_p + \lambda^\ast f_p \right)dt$$
287+ $$\frac{dG}{dp} = \lambda^\ast(t_0)\frac{du(t_0)}{dp} + \int_{t_0}^T \left(g_p + \lambda^\ast f_p \right)dt$$
288288
289289which gives us our adjoint derivative relation.
290290
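As a minimal sketch of this relation, assume the toy problem $u^\prime = pu$, $u(0)=u_0$, with $g(u,p,t) = u$ (so $g_u = 1$, $f_u = p$, $f_p = u$, $g_p = 0$, and the boundary term vanishes because $u(t_0)$ does not depend on $p$); the problem choice and tolerances are illustrative assumptions, not part of the original text:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy problem (assumed for illustration): u' = p*u, u(0) = u0, G(p) = int_0^T u dt,
# so g(u,p,t) = u, f(u,p,t) = p*u, g_u = 1, f_u = p, f_p = u, g_p = 0.
p, u0, T = 0.5, 1.0, 2.0

# Forward pass: solve the ODE, keeping a dense (continuous) interpolation of u(t).
fwd = solve_ivp(lambda t, u: p * u, (0.0, T), [u0],
                dense_output=True, rtol=1e-10, atol=1e-10)
u = lambda t: fwd.sol(t)[0]

# Reverse pass: lambda' = -f_u* lambda - g_u*, lambda(T) = 0, integrated from T to 0.
# We append mu' = -lambda * f_p with mu(T) = 0, so mu(0) = int_0^T lambda f_p dt = dG/dp
# (the boundary term lambda*(t0) du(t0)/dp vanishes since u(0) is independent of p).
def reverse_rhs(t, y):
    lam, mu = y
    return [-p * lam - 1.0, -lam * u(t)]

bwd = solve_ivp(reverse_rhs, (T, 0.0), [0.0, 0.0], rtol=1e-10, atol=1e-10)
dGdp_adjoint = bwd.y[1, -1]

# Closed form for u = u0*exp(p*t): G = u0*(exp(p*T)-1)/p, differentiated in p.
dGdp_exact = u0 * (T * np.exp(p * T) / p - (np.exp(p * T) - 1) / p**2)
print(abs(dGdp_adjoint - dGdp_exact))  # small
```

The quadrature state `mu` is the "single value appended to the reverse ODE" idea in miniature: the gradient accumulates alongside $\lambda$ in one backward solve.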
@@ -296,8 +296,8 @@ in which case
296296
297297$$g_u(t_i) = 2(d_i - u(t_i,p))$$
298298
299- at the data points $(t_i,d_i)$. Therefore, the derivative of an ODE solution
300- with respect to a cost function is given by solving for $\lambda^\ast$ using an
299+ at the data points $(t_i,d_i)$. Therefore, the derivative of the cost function with respect to
300+ the parameters is obtained by solving for $\lambda^\ast$ using an
301301ODE for $\lambda^T$ in reverse time, and then using that to calculate $\frac{dG}{dp}$.
302302Note that $\frac{dG}{dp}$ can be calculated simultaneously by appending a single
303303value to the reverse ODE, since we can simply define the new ODE term as
@@ -327,15 +327,15 @@ on-demand. There are three ways which this can be done:
327327 numerically this is unstable and thus not always recommended (ODEs are
328328 reversible, but ODE solver methods are not necessarily going to generate the
329329 same exact values or trajectories in reverse!)
330- 2. If you solve the forward ODE and receive a continuous solution $u(t)$, you
331- can interpolate it to retrieve the values at any given the time reverse pass
330+ 2. If you solve the forward ODE and receive a continuous solution $u(t)$, you
331+ can interpolate it to retrieve the values at any time at which the reverse pass
332332 needs the $\frac{df}{du}$ Jacobian. This is fast but memory-intensive.
3333333. Every time you need a value $u(t)$ during the backpass, you re-solve the
334334 forward ODE to $u(t)$. This is expensive! Thus one can instead use
335- *checkpoints*, i.e. save at finitely many time points during the forward
335+ *checkpoints*, i.e. save at a smaller number of time points during the forward
336336 pass, and use those as starting points for the $u(t)$ calculation.
337337
338- Alternative strategies can be investigated, such as an interpolation which
338+ Alternative strategies can be investigated, such as an interpolation that
339339stores values in a compressed form.
340340
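Strategy 3 can be sketched as follows; the test ODE $u^\prime = pu$, the number of checkpoints, and the tolerances are illustrative assumptions:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Checkpointing sketch (assumed toy problem u' = p*u): store u at a handful of
# times during the forward pass, then reconstruct u(t) on demand during the
# reverse pass by re-solving from the nearest earlier checkpoint.
p, u0, T = 0.5, 1.0, 2.0
f = lambda t, u: p * u

ckpt_times = np.linspace(0.0, T, 5)   # a small number of saved time points
ckpt_vals = []
u = u0
for t0, t1 in zip(ckpt_times[:-1], ckpt_times[1:]):
    ckpt_vals.append(u)
    u = solve_ivp(f, (t0, t1), [u], rtol=1e-10, atol=1e-10).y[0, -1]
ckpt_vals.append(u)

def u_at(t):
    """Recompute u(t) by integrating forward from the nearest checkpoint <= t."""
    i = np.searchsorted(ckpt_times, t, side="right") - 1
    if ckpt_times[i] == t:
        return ckpt_vals[i]
    return solve_ivp(f, (ckpt_times[i], t), [ckpt_vals[i]],
                     rtol=1e-10, atol=1e-10).y[0, -1]

print(abs(u_at(1.3) - u0 * np.exp(p * 1.3)))  # small
```

Each `u_at` call costs at most one sub-interval solve, trading the memory of a full dense solution for extra forward work.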
341341### The vjp and Neural Ordinary Differential Equations
@@ -348,11 +348,11 @@ backpass
348348$$\lambda^\prime = -\frac{df}{du}^\ast \lambda - \left(\frac{dg}{du} \right)^\ast$$
349349$$\lambda(T) = 0$$
350350
351- can be improved by noticing $\frac{df}{du}^\ast \lambda $ is a vjp, and thus it
351+ can be improved by noticing $\lambda^\ast \frac{df}{du}$ is a vjp, and thus it
352352can be calculated using $\mathcal{B}_f^{u(t)}(\lambda^\ast)$, i.e. reverse-mode
353353AD on the function $f$. If $f$ is a neural network, this means that the reverse
354354ODE is defined through successive backpropagation passes of that neural network.
355- The result is a derivative with respect to the cost function of the parameters
355+ The result is a derivative of the cost function with respect to the parameters
356356defining $f$ (either a model or a neural network), which can then be used to
357357fit the data ("train").
358358
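A hand-rolled sketch of this vjp, assuming a tiny two-layer network $f(u) = W_2 \tanh(W_1 u)$ chosen purely for illustration, shows how $\lambda^\ast \frac{df}{du}$ is computed by backpropagation without ever forming the Jacobian:

```python
import numpy as np

# vjp sketch for f(u) = W2 @ tanh(W1 @ u), an assumed toy "neural network":
# compute lam^T (df/du) by pulling lam back through each layer (one matvec per
# layer) instead of assembling the full Jacobian.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(3, 4))

def f(u):
    return W2 @ np.tanh(W1 @ u)

def vjp(u, lam):
    """Return lam^T @ (df/du) via the chain rule."""
    h = W1 @ u
    v = W2.T @ lam                 # back through the output layer
    v = (1 - np.tanh(h)**2) * v    # back through the elementwise tanh
    return W1.T @ v                # back through the input layer

u, lam = rng.normal(size=3), rng.normal(size=3)

# Check against the explicitly assembled Jacobian J = W2 @ diag(sech^2(W1 u)) @ W1.
J = W2 @ np.diag(1 - np.tanh(W1 @ u)**2) @ W1
print(np.max(np.abs(vjp(u, lam) - lam @ J)))  # ~0
```

This is exactly one backpropagation pass through $f$; when $f$ is a neural network, the reverse ODE's right-hand side calls this at every step.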
@@ -385,7 +385,7 @@ spline:
385385
386386
387387If that's the case, one can use the fit spline in order to estimate the derivative
388- at each point. Since the ODE is defined as $u^\prime = f(u,p,t)$, one then then
388+ at each point. Since the ODE is defined as $u^\prime = f(u,p,t)$, one can then
389389use the cost function
390390
391391$$C(p) = \sum_{i=1}^N \Vert\tilde{u}^{\prime}(t_i) - f(u(t_i),p,t_i)\Vert$$
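A minimal sketch of this collocation fit, assuming noise-free data from $u^\prime = pu$ with a true $p = 0.5$ (all values here are illustrative assumptions):

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.optimize import minimize_scalar

# Collocation sketch: fit a spline to data from u' = p*u (true p assumed 0.5),
# differentiate the spline, and minimize
#   C(p) = sum_i || u_tilde'(t_i) - f(u_tilde(t_i), p, t_i) ||^2.
p_true, u0 = 0.5, 1.0
t = np.linspace(0.0, 2.0, 21)
d = u0 * np.exp(p_true * t)        # noise-free "data" for the sketch

spl = CubicSpline(t, d)
du = spl.derivative()(t)           # spline estimate of u'(t_i)

cost = lambda p: np.sum((du - p * spl(t))**2)
p_fit = minimize_scalar(cost, bounds=(0.0, 2.0), method="bounded").x
print(abs(p_fit - p_true))  # small
```

Note that no ODE solve appears anywhere: the spline supplies both $\tilde{u}(t_i)$ and $\tilde{u}^\prime(t_i)$, which is what makes this approach cheap.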