A detailed tutorial in Python featuring the simple nonlinear pendulum
optimal control
indirect methods
numerical methods
shooting method
tutorial
pendulum
Author
Willem Esterhuizen
Published
April 18, 2026
Abstract
This article shows how to solve a continuous-time optimal control problem via indirect single shooting, from analysing the problem with Pontryagin’s maximum principle to implementing a shooting algorithm that solves the resulting boundary value problem. Detailed numerics with code are also provided; get it all from this repo.
When faced with a continuous-time optimal control problem (OCP) it is rare to find a closed-form expression of its solution, so one must resort to numerical approximation via direct or indirect methods.
In direct methods, you discretise the problem’s dynamics to obtain a finite-dimensional nonlinear programme (NLP), which you can then solve with standard NLP solvers. This is the approach used in, for example, this article on optimal path planning for a wheeled robot.
In indirect methods, you first analyse the continuous-time problem using tools like Pontryagin’s maximum principle (the PMP) to obtain a boundary value problem (BVP). You then solve this BVP with numerical tools, for instance with (various) shooting methods.
Both methods have their strengths and weaknesses; see Table 1. Direct methods require less theoretical analysis upfront, are easier to set up, are excellent at dealing with state and control constraints, and tend to be more robust with respect to the initial guess. However, the resulting nonlinear programmes can be large and computationally heavy, especially when the time horizon is long and the discretisation very fine.
Indirect methods usually produce very accurate solutions, can be significantly faster with a good initial guess, and are well-suited for problems with long time horizons. However, they often require deep analysis and can be sensitive to the initial guess.
Table 1: Direct versus indirect methods

| Feature                     | Direct Methods                  | Indirect Methods              |
|-----------------------------|---------------------------------|-------------------------------|
| Ease of setup               | High                            | Low                           |
| Constraint handling         | Excellent                       | Challenging                   |
| Robustness to initial guess | Good                            | Poor                          |
| Solution accuracy           | Good (discretisation dependent) | Excellent (when it converges) |
| Speed (long horizons)       | Slower                          | Faster                        |
| Analytical insight          | Limited                         | High                          |
1.1 Article outline
This article goes through all the steps involved in solving an optimal control problem via indirect single shooting. The introductory section is finished off with a statement of Pontryagin’s maximum principle for easy reference. Section 2 covers the details of the nonlinear pendulum we will consider throughout the article. Section 3 presents a full analysis of an energy optimal control problem, showing how one eventually obtains a boundary value problem. Section 4 then describes how this boundary value problem can be solved via indirect single shooting. Section 5 presents some clarifying numerical experiments, and the article is concluded with remarks in Section 6. See Section 7 for references.
1.2 Pontryagin’s maximum principle
Here follows a rigorous statement of the PMP for the particular problem we will consider. The formulation has a purely integral cost functional (thus, it is a Lagrange problem), no state and control constraints, and the dynamics and running cost are both smooth. For more general formulations and different flavours of OCPs, good references include L. Cesari (2012) (my favourite), L. Pontryagin et al. (1965), J. Macki and A. Strauss (2012) and D. Liberzon (2011).
Consider the following OCP, \[
\begin{align}
\min\limits_{(\mathbf{x}, \mathbf{u})\in\mathrm{AC}\times\mathcal{M}} \quad & J(\mathbf{x},\mathbf{u}):= \int_0^T \ell(\mathbf{x}(s), \mathbf{u}(s)) \, \mathrm{d}s,\\
\mathrm{subject\ to:}\quad & \dot{\mathbf{x}}(t) = f(\mathbf{x}(t),\mathbf{u}(t)), & \mathrm{a.e.}\,\, t \in[0,T], & \quad (1) \\
\quad & \mathbf{x}(0) = \mathbf{x}^0, & & \quad (2) \\
\quad & \mathbf{x}(T) = \mathbf{x}^{T}, & & \quad (3) \\
\quad & \mathbf{x}(t) \in \mathbb{R}^n, & \forall t \in[0,T], & \quad (4) \\
\quad & \mathbf{u}(t) \in \mathbb{R}^m, & \mathrm{a.e.}\,\, t \in[0,T], & \quad (5)
\end{align}
\] where \(t\geq 0\) denotes time, \(T\geq 0\) is the time horizon, \(\mathbf{x}:= (x_1, x_2,\dots, x_n)^\top\in\mathbb{R}^n\) is the state and \(\mathbf{u}:=(u_1, u_2,\dots, u_m)^\top\in\mathbb{R}^m\) is the control. The initial and target states are indicated by \(\mathbf{x}^0\in\mathbb{R}^n\) and \(\mathbf{x}^T\in\mathbb{R}^n\), respectively.
With \(T\geq 0\), \[\mathrm{AC}([0,T];\mathbb{R}^n) := \{f:[0,T] \rightarrow \mathbb{R}^n : f \text{ absolutely continuous } \},\] and \[\mathcal{M}([0,T];\mathbb{R}^m) := \{f:[0,T] \rightarrow \mathbb{R}^m : f \text{ is Lebesgue measurable }\}.\] For brevity, we’ll sometimes use \(\mathrm{AC}:= \mathrm{AC}([0,T];\mathbb{R}^n)\) and \(\mathcal{M}:= \mathcal{M}([0,T];\mathbb{R}^m)\), the time horizon, \(T\), and dimension, \(m\) or \(n\), being clear from context. The abbreviation “\(\mathrm{a.e.}\,\, t \in[0,T]\)” stands for “almost every \(t\) in \([0,T]\)”, meaning that a property holds for all \(t\) in the interval \([0,T]\), except possibly on a set of Lebesgue measure zero.
Note
Throughout this article we’ll work with absolutely continuous state trajectories and Lebesgue-measurable control functions, as these are the standard classes used in rigorous optimal control theory.
If you are unfamiliar with these concepts, you can safely replace “absolutely continuous” with “continuous” for the state, and “Lebesgue-measurable” with “piecewise continuous” for the control. This approximation is sufficient for understanding the core ideas. I highly recommend Appendix C of E. Sontag (2013) if you would like to delve into these topics.
The function \(f:\mathbb{R}^n\times \mathbb{R}^m\rightarrow\mathbb{R}^n\) is the dynamics, assumed to be \(C^\infty\) and \(J:\mathrm{AC}\times\mathcal{M}\rightarrow \mathbb{R}_{\geq 0}\) is the cost functional, where \(\ell:\mathbb{R}^n\times\mathbb{R}^m\rightarrow\mathbb{R}_{\geq 0}\) is the running cost, assumed to be \(C^\infty\) and integrable.
Let all admissible pairs be denoted by \(\Omega\), \[
\Omega := \{(\mathbf{x}, \mathbf{u}) \in \mathrm{AC}\times \mathcal{M}: \ (1) - (5) \ \mathrm{ hold}\}.
\] Throughout we’ll assume that there exists a pair \((\mathbf{x}^\star, \mathbf{u}^\star)\in\Omega\) that minimises the cost functional, that is, \[
J(\mathbf{x}^\star, \mathbf{u}^\star) \leq J(\mathbf{x}, \mathbf{u}),\quad \forall\, (\mathbf{x}, \mathbf{u}) \in \Omega.
\] We’ll call \((\mathbf{x}^\star, \mathbf{u}^\star)\) an optimal pair. We also assume that the optimal control, \(\mathbf{u}^\star\), is bounded.
To state the PMP, we need to introduce the co-state, \(\boldsymbol{\lambda}:= (\lambda_1,\lambda_2,\dots,\lambda_n) \in \mathbb{R}^n\) and the Hamiltonian function, \(H:\mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R}^m \rightarrow \mathbb{R}\), \[
H(\mathbf{x}, \boldsymbol{\lambda}, \mathbf{u}) := \boldsymbol{\lambda}^\top f(\mathbf{x}, \mathbf{u}) + \ell(\mathbf{x}, \mathbf{u}).
\] The partial derivatives of the Hamiltonian with respect to the state and control (column vectors) are denoted by: \[
H_{\mathbf{x}}(\mathbf{x}, \boldsymbol{\lambda}, \mathbf{u}) := \frac{\partial H}{\partial \mathbf{x}}(\mathbf{x}, \boldsymbol{\lambda}, \mathbf{u}), \qquad
H_{\mathbf{u}}(\mathbf{x}, \boldsymbol{\lambda}, \mathbf{u}) := \frac{\partial H}{\partial \mathbf{u}}(\mathbf{x}, \boldsymbol{\lambda}, \mathbf{u}).
\]
Statement of the PMP
Pontryagin’s maximum principle (fixed states & horizon, integral cost)
Suppose \((\mathbf{x}^\star, \mathbf{u}^\star)\) is an optimal pair. Then:
There exists a nonzero co-state, \(\boldsymbol{\lambda}^{\star}\in \mathrm{AC}([0,T];\mathbb{R}^n)\), which satisfies the co-state equation, \[
\dot{\boldsymbol{\lambda}}^{\star}(t) = -H_{\mathbf{x}}(\mathbf{x}^{\star}(t),\boldsymbol{\lambda}^{\star}(t), \mathbf{u}^{\star}(t)),\quad \mathrm{a.e.}\quad t\in[0,T].
\]
For a.e. \(t\in[0,T]\) the Hamiltonian as a function of \(\mathbf{u}\) is minimised, \[
H(\mathbf{x}^{\star}(t),\boldsymbol{\lambda}^\star(t),\mathbf{u}^{\star}(t)) = \min_{\mathbf{u}\in \mathbb{R}^m} H(\mathbf{x}^{\star}(t),\boldsymbol{\lambda}^\star(t),\mathbf{u}),\quad \mathrm{a.e.}\quad t\in[0,T].
\]
Moreover, because the initial and final states, \(\mathbf{x}^{\star}(0)\) and \(\mathbf{x}^{\star}(T)\), are fixed, the initial and final co-states, \(\boldsymbol{\lambda}^{\star}(0)\) and \(\boldsymbol{\lambda}^{\star}(T)\), are free.
Note that I have used \(\mathbf{u}\) to indicate a control function, \(\mathbf{u}\in\mathcal{M}\), as well as an \(m\)-dimensional control vector, \(\mathbf{u}\in\mathbb{R}^m\), but this will hopefully not be confusing.
Note that the co-states being “free” does not mean that they can take on any value. It means that they are not known beforehand, and that their values must be determined to solve the OCP. This is exactly the boundary value problem that we will solve via shooting.
2 The pendulum’s equations of motion
Throughout the article we’ll consider the well-known differential equations of a forced pendulum, \[
\begin{align}
\dot x_1(t) &= x_2(t), & \quad (6)\\
\dot x_2(t) &= -\frac{g}{L} \sin(x_1(t)) - b x_2(t) + u(t), & \quad (7)
\end{align}
\] where \(t\geq 0\) indicates time, \(x_1\in\mathbb{R}\) is pendulum’s angle, \(x_2\in\mathbb{R}\) is its angular velocity and \(u\in\mathbb{R}\) is the applied torque, which we can control, see Figure 1. The pendulum’s length is denoted by \(L>0\), \(b\geq 0\) is a coefficient modelling losses due to friction and \(g = 9.81\) is the acceleration due to gravity on Earth.
We’ll let \(\mathbf{x}:= (x_1, x_2)^\top\in\mathbb{R}^2\), and define \(f:\mathbb{R}^2\times\mathbb{R}\rightarrow \mathbb{R}^2\) as, \[
f(\mathbf{x},u) :=
\left(
\begin{array}{c}
x_2 \\
-\frac{g}{L} \sin(x_1) - b x_2 + u \\
\end{array}
\right)
\] so that we may write the system (6)-(7) as, \[
\dot{\mathbf{x}}(t) = f(\mathbf{x}(t), u(t)),\quad t\geq 0.
\] A controlled equilibrium point, \(\mathbf{x}^{\mathrm{eq}}:=(x_1^{\mathrm{eq}}, x_2^{\mathrm{eq}})^\top\in\mathbb{R}^2\), is a state for which there exists an input, \(u^{\mathrm{eq}}\in\mathbb{R}\), such that \(f(\mathbf{x}^{\mathrm{eq}}, u^{\mathrm{eq}}) = 0\).
Figure 1: The nonlinear pendulum
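To make the notion of a controlled equilibrium concrete, here is a quick numerical check (a small sketch; the helper name `equilibrium_input` is my own, not from the article's code). Setting \(x_2^{\mathrm{eq}} = 0\) makes the first row of \(f = 0\) hold and kills the friction term, and the second row then gives \(u^{\mathrm{eq}} = \frac{g}{L}\sin(x_1^{\mathrm{eq}})\).

```python
import numpy as np

def equilibrium_input(x1_eq, g=9.81, L=1.0):
    """Torque that holds the pendulum at rest at angle x1_eq.

    The first row of f = 0 forces x2_eq = 0, so the friction term drops
    out and the second row gives u_eq = (g / L) * sin(x1_eq).
    """
    return (g / L) * np.sin(x1_eq)

# Hanging down (x1 = 0) and upright (x1 = pi) need no holding torque;
# holding the pendulum horizontal needs the full gravitational torque g/L.
print(equilibrium_input(0.0))        # 0.0
print(equilibrium_input(np.pi))      # ~0.0 (up to floating-point error)
print(equilibrium_input(np.pi / 2))  # 9.81
```

Note in particular that the upright position, \((\pi, 0)^\top\), is a controlled equilibrium with \(u^{\mathrm{eq}} = 0\); it is the target of the optimal control problem below.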
3 The energy optimal problem
Consider the pendulum, (6)-(7). Given an initial state, \(\mathbf{x}^0\), a controlled equilibrium, \(\mathbf{x}^{\mathrm{eq}}\), and a time horizon, \(T>0\), find the pair \((\mathbf{x}^\star, u^\star)\) that minimises the energy of the control signal, \(\frac{1}{2}\int_0^T u^2(s)\,\mathrm{d}s\) (the factor of \(\frac{1}{2}\) merely simplifies the algebra and does not change the minimiser).
Referring to Section 1.2, we have a problem where the initial and final states are fully specified, the control is unrestricted, the cost functional has no terminal cost, and the time horizon is fixed.
The problem’s Hamiltonian reads, \[
H(\mathbf{x},\boldsymbol{\lambda},u) = \lambda_1 x_2 + \lambda_2\left( -\frac{g}{L}\sin(x_1) - bx_2 + u \right) + \frac{1}{2}u^2,
\] and the optimal control, \(u^\star\), must satisfy, \[
u^\star(t) = \arg\min_{u \in \mathbb{R}} H(\mathbf{x}(t), \boldsymbol{\lambda}(t), u)\quad \mathrm{a.e.}\,\, t \in [0,T].
\] Because the control is unrestricted we can simply consider the equation: \[
H_{u}(\mathbf{x}(t),\boldsymbol{\lambda}(t),u) = \lambda_2(t) + u = 0,
\] and conclude that, \[
u^{\star}(t) = -\lambda_2(t), \quad \,\, t \in [0,T].
\] The co-state system reads, \[
\dot{\boldsymbol{\lambda}}(t) = -H_{\mathbf{x}}(\mathbf{x}(t),\boldsymbol{\lambda}(t),u(t)) =
\left(
\begin{array}{c}
\frac{g}{L}\cos(x_1(t))\lambda_2(t)\\
-\lambda_1(t) + b \lambda_2(t)
\end{array}
\right),
\] with \(\boldsymbol{\lambda}(0)\) and \(\boldsymbol{\lambda}(T)\) free.
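If you'd like to double-check these hand-computed derivatives mechanically, SymPy can do it. This is a sanity-check sketch only, not part of the solver, and the symbol names are my own:

```python
import sympy as sp

x1, x2, lam1, lam2, u, g, L, b = sp.symbols('x1 x2 lam1 lam2 u g L b')

# Hamiltonian of the energy-optimal problem
H = lam1 * x2 + lam2 * (-(g / L) * sp.sin(x1) - b * x2 + u) + u**2 / 2

# Co-state dynamics: lambda_dot = -H_x
lam1_dot = -sp.diff(H, x1)
lam2_dot = -sp.diff(H, x2)
print(sp.simplify(lam1_dot))   # equals (g/L)*lam2*cos(x1)
print(sp.simplify(lam2_dot))   # equals -lam1 + b*lam2

# Stationarity in u: H_u = 0 gives the optimal control
u_star = sp.solve(sp.diff(H, u), u)[0]
print(u_star)                  # equals -lam2
```

Both printed expressions match the co-state system above, and the stationarity condition recovers \(u^\star = -\lambda_2\).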
The boundary value problem
Our analysis via the PMP concludes that to solve the energy optimal problem we only need to determine the initial and final co-states. More specifically, we need to solve the following boundary value problem (BVP).
\[
\mathrm{BVP}
\begin{cases}
\mathrm{find}: & (\boldsymbol{\lambda}^0)^{\star}, (\boldsymbol{\lambda}^T)^{\star}\in\mathbb{R}^2\quad & & \\
\mathrm{such\, that}: & \dot{\mathbf{x}}(t) = f(\mathbf{x}(t),u(t)), \quad & \,\, \forall\, t \in[0,T], & \quad (8)\\
\quad &\dot{\boldsymbol{\lambda}}(t) = -H_{\mathbf{x}}(\mathbf{x}(t), \boldsymbol{\lambda}(t), u(t)), \quad & \,\,\forall\, t \in[0,T], & \quad (9)\\
\quad &\mathbf{x}(0) = \mathbf{x}^0, & & \quad (10)\\
\quad & \boldsymbol{\lambda}(0) = (\boldsymbol{\lambda}^0)^{\star}, & & \quad (11)\\
\quad & u(t) = -\lambda_2(t), \quad & \,\, \forall\, t \in[0,T], & \quad (12)\\
\quad & \mathbf{x}(T) = \mathbf{x}^{\mathrm{eq}}, & & \\
\quad & \boldsymbol{\lambda}(T) = (\boldsymbol{\lambda}^T)^{\star}. & &
\end{cases}
\] Recall that \(\mathbf{x}^0\), \(\mathbf{x}^{\mathrm{eq}}\) and \(T\) are given, fixed values. We’ll let \((\boldsymbol{\lambda}^0)^\star\) and \((\boldsymbol{\lambda}^T)^\star\) indicate the initial and final co-states that solve the BVP. Given an arbitrary \(\mathbf{x}^0\) and \(\boldsymbol{\lambda}^0\), let \(\mathbf{x}(t;\mathbf{x}^0, \boldsymbol{\lambda}^0)\) denote the state at time \(t\in[0,T]\) obtained from integrating (8)-(9) from \(t=0\) with \(\mathbf{x}(0) = \mathbf{x}^0\), \(\boldsymbol{\lambda}(0) = \boldsymbol{\lambda}^0\) and \(u(t) = -\lambda_2(t)\).
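The flow map \(\mathbf{x}(t;\mathbf{x}^0, \boldsymbol{\lambda}^0)\) is easy to sketch with SciPy's `solve_ivp` (the article's solver later uses CasADi with RK4 instead; the function names here are mine, and the parameter values are example choices). A nice sanity check: with \(\boldsymbol{\lambda}^0 = \mathbf{0}\) the co-state stays identically zero, so \(u \equiv 0\) and the flow reduces to the unforced, damped pendulum, which can never exceed its initial amplitude.

```python
import numpy as np
from scipy.integrate import solve_ivp

g, L, b = 9.81, 1.0, 0.1  # example parameter values

def extended_rhs(t, z):
    """State (8) and co-state (9) dynamics with u = -lambda_2, i.e. (12)."""
    x1, x2, lam1, lam2 = z
    u = -lam2
    return [x2,
            -(g / L) * np.sin(x1) - b * x2 + u,
            (g / L) * np.cos(x1) * lam2,
            -lam1 + b * lam2]

def flow(T, x0, lam0):
    """x(T; x0, lam0): the state at time T of the extended system."""
    sol = solve_ivp(extended_rhs, (0.0, T), [*x0, *lam0],
                    rtol=1e-9, atol=1e-9)
    return sol.y[:2, -1]

# Zero initial co-state => zero control: the damped pendulum just swings
# and loses energy, so |x1| stays below the initial amplitude pi/2.
xT = flow(5.0, [np.pi / 2, 0.0], [0.0, 0.0])
print(xT)
```

Shooting, described next, is nothing more than root-finding on this flow map with respect to \(\boldsymbol{\lambda}^0\).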
4 Indirect single shooting
Loosely speaking, the idea behind solving a BVP via shooting is as follows. You start by guessing the value of \((\boldsymbol{\lambda}^0)^{\star}\), call this guess \(\hat{\boldsymbol{\lambda}}^0\), and integrate the extended system (8)-(12) from \(t=0\) to \(t=T\) (i.e., you shoot). You then see how close the resulting state at \(t=T\) is to the desired target, \(\mathbf{x}^{\mathrm{eq}}\). Then you adjust your original guess \(\hat{\boldsymbol{\lambda}}^0\), shoot again, and iterate in this manner until you are happy that the resulting final state is close enough to the target. There are various ways to “adjust” your guess; see, for example, Chapter 7 of A. Bryson and Y-C. Ho (1975). A common way is via Newton’s method, which we will code up in the numerics section. The optimal pair \((\mathbf{x}^\star, u^\star)\) (or rather, a good enough approximation), which is what we actually seek, is then given by the state and control trajectories generated by the obtained initial co-state. (For our problem, we don’t really care about the final co-state, \((\boldsymbol{\lambda}^T)^\star\).) Note that, depending on the problem and the information you have, you might instead want to shoot backwards, starting from the final co-state.
More concretely for our problem, define the shooting function, \(\mathbf{S}:\mathbb{R}^2\rightarrow \mathbb{R}^2\), \[
\mathbf{S}({\hat{\boldsymbol{\lambda}}^0}) :=
\left(
\begin{array}{c}
x_1(T;\mathbf{x}^{0},\hat{\boldsymbol{\lambda}}^0) - x_1^{\mathrm{eq}}\\
x_2(T;\mathbf{x}^{0},\hat{\boldsymbol{\lambda}}^0) - x_2^{\mathrm{eq}}
\end{array}
\right).
\] This tells us how far away from the desired equilibrium point we are if we solve the system (8)-(12) with the initial co-state, \(\hat{\boldsymbol{\lambda}}^0\). Let \(\mathbf{J}_{\mathbf{S}}(\hat{\boldsymbol{\lambda}}^0)\in\mathbb{R}^{2\times 2}\) denote the Jacobian of \(\mathbf{S}\) w.r.t. \(\boldsymbol{\lambda}^0\), evaluated at an arbitrary \(\hat{\boldsymbol{\lambda}}^0\in\mathbb{R}^2\), that is, \[
\mathbf{J}_{\mathbf{S}}(\hat{\boldsymbol{\lambda}}^0) = \frac{\partial \mathbf{S}}{\partial \boldsymbol{\lambda}^0}(\hat{\boldsymbol{\lambda}}^0) =
\left(
\begin{array}{cc}
\frac{\partial x_1(T;\mathbf{x}^0,\hat{\boldsymbol{\lambda}}^0)}{\partial \lambda_1^0} & \frac{\partial x_1(T;\mathbf{x}^0,\hat{\boldsymbol{\lambda}}^0)}{\partial \lambda_2^0} \\
\frac{\partial x_2(T;\mathbf{x}^0,\hat{\boldsymbol{\lambda}}^0)}{\partial \lambda_1^0} & \frac{\partial x_2(T;\mathbf{x}^0,\hat{\boldsymbol{\lambda}}^0)}{\partial \lambda_2^0}
\end{array}
\right),
\] and let \(\mathbf{J}_{\mathbf{S}}^{-1}(\hat{\boldsymbol{\lambda}}^0)\) denote the inverse of this matrix.
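In the numerics below we'll obtain \(\mathbf{J}_{\mathbf{S}}\) exactly via CasADi's algorithmic differentiation, but it's worth knowing that a finite-difference approximation is a common, tool-free alternative. Here is a minimal sketch; the toy map `S_toy` merely stands in for the real shooting function (which would require integrating (8)-(12)), and both names are mine:

```python
import numpy as np

def fd_jacobian(S, lam, eps=1e-6):
    """Central-difference approximation of the Jacobian of S at lam."""
    lam = np.asarray(lam, dtype=float)
    m, n = S(lam).size, lam.size
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps  # perturb the j-th component only
        J[:, j] = (S(lam + e) - S(lam - e)) / (2 * eps)
    return J

# Toy stand-in for the shooting function
def S_toy(lam):
    return np.array([lam[0]**2 + lam[1], np.sin(lam[1])])

print(fd_jacobian(S_toy, [1.0, 0.0]))  # close to [[2, 1], [0, 1]]
```

For the real shooting function each column costs two forward integrations of the extended system, which is why algorithmic differentiation (as in the CasADi code below) is usually preferable.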
Input: Initial guess, \(\boldsymbol{\lambda}^0_{\mathrm{init}}\in\mathbb{R}^2\), tolerance, \(\varepsilon>0\), and damping factor, \(\alpha\in(0,1]\).
Output: Approximate solution, \((\boldsymbol{\lambda}^0)^{\varepsilon}\), to (BVP).
1. Let \(\hat{\boldsymbol{\lambda}}^0\gets\boldsymbol{\lambda}^0_{\mathrm{init}}\).
2. Integrate (8)-(12) to get \(\mathbf{x}(T;\mathbf{x}^0,\hat{\boldsymbol{\lambda}}^0)\) and thus \(\mathbf{S}(\hat{\boldsymbol{\lambda}}^0)\).
3. If \(\|\mathbf{S}(\hat{\boldsymbol{\lambda}}^0)\|>\varepsilon\):
3.1 Let \(\hat{\boldsymbol{\lambda}}^0 \gets \hat{\boldsymbol{\lambda}}^0 - \alpha\mathbf{J}_{\mathbf{S}}^{-1}(\hat{\boldsymbol{\lambda}}^0) \mathbf{S}(\hat{\boldsymbol{\lambda}}^0)\).
3.2 Go to 2.
4. Let \((\boldsymbol{\lambda}^0)^{\varepsilon} \gets \hat{\boldsymbol{\lambda}}^0\).
The state-control pair, \((\mathbf{x}, \mathbf{u})\), that satisfies (8)-(12) with \((\boldsymbol{\lambda}^0)^{\varepsilon}\) is then an approximate solution to the original OCP.
4.1 Some comments
Step 3.1 is a damped Newton step, where \(\alpha\) determines the damping. Having this scaling sometimes helps with convergence, thus making the approach more robust.
Probably the biggest weakness of the shooting method, as presented here, is that convergence can be very sensitive to the initial guess for the co-state. Moreover, as with our specific OCP, you often have no physical intuition about what the co-state means, so you really are stabbing in the dark. In the numerics section to come, I often struggled to find a good starting \(\boldsymbol{\lambda}^0_{\mathrm{init}}\), as it changes with every adjustment you make to the problem, such as the initial and target states and the horizon length. I also had to play around with the damping, \(\alpha\). This process, unfortunately, can be a bit of an art. A useful technique I sometimes use is to find the co-state for an easy problem, and then use that solution as a warm start for a slightly different problem. I then perturb the problem slowly until I arrive at the one I originally cared about (this is a type of homotopy method).
Note that the inverse of the Jacobian, \(\mathbf{J}_{\mathbf{S}}^{-1}\), must exist at every iteration of the Newton update, which is of course not guaranteed for every optimal control problem. The Jacobian can become singular when perturbations of the initial costate (or other shooting variables) no longer produce independent variations in the terminal state. This can happen, for example, when the associated extremal trajectory contains a so-called conjugate point, when boundary conditions are not independent, or when singular arcs or abnormal extremals occur. Moreover, for some problems the Jacobian is not square. In this case, one may instead employ the Gauss–Newton method, which replaces the inverse of the Jacobian with its Moore–Penrose pseudoinverse.
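Here is a minimal sketch of that Gauss–Newton variant on a toy over-determined residual (three equations, two unknowns; nothing pendulum-specific, and all names are mine):

```python
import numpy as np

# Toy residual with a non-square (3 x 2) Jacobian
def residual(lam):
    return np.array([lam[0] - 1.0,
                     lam[1] - 2.0,
                     lam[0] + lam[1] - 3.0])

def jacobian(lam):
    return np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0]])

# Gauss-Newton: replace the Jacobian inverse in the Newton update with
# the Moore-Penrose pseudoinverse (a least-squares step).
lam = np.zeros(2)
for _ in range(10):
    lam = lam - np.linalg.pinv(jacobian(lam)) @ residual(lam)

print(lam)  # [1. 2.] -- the point where the residual vanishes
```

Because this toy residual is affine and consistent, a single pseudoinverse step already lands on the solution; for a genuine shooting function the iteration would, at best, converge locally.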
If you are interested in some of these issues as well as more advanced techniques, you can consult J. Bonnans (2013), M. Aronna, F. Bonnans, and P. Martinon (2013) and their references.
5 Numerics
In this section we’ll code up a solver for the energy-optimal problem and run some experiments.
5.1 Coding up a solver in Python
First, we need to set up some symbolic variables and functions:

```python
import casadi as ca
import numpy as np


def create_shooting_residual_and_jacobian_inv(params):
    # we need to use `p` because `lambda` is taken
    p_init = ca.MX.sym('p_init', 2)            # symbolic unknown: initial co-state
    z0 = ca.vertcat(params['x_init'], p_init)  # initial augmented state

    # Discretised dynamics function
    z_sym = ca.MX.sym('z', 4)
    z_next, _ = get_next_step_pendulum_extended(z_sym, params)
    F = ca.Function('F', [z_sym], [z_next], ['z'], ['z_next'])

    # Forward simulation (still symbolic!)
    z = z0
    N = int(np.floor(params['T'] / params['h']))
    for k in range(N):
        z = F(z)

    # Final residual
    x1_final = z[0]
    x2_final = z[1]
    S = ca.vertcat(x1_final - params['x_target'][0],
                   x2_final - params['x_target'][1])

    # Take the symbolic Jacobian dS/dp_init (a 2 x 2 matrix) and invert it
    Jacobian_of_S_wrt_p_init = ca.jacobian(S, p_init)
    J_inv = ca.inv(Jacobian_of_S_wrt_p_init)
    jac_inv_fun = ca.Function('jac_inv', [p_init], [J_inv],
                              ['p_init'], ['J_inv'])

    # CasADi function for numeric evaluation of the residual later
    S_fun = ca.Function('residual', [p_init], [S], ['p_init'], ['S'])

    return S_fun, jac_inv_fun
```

The helper functions implement one RK4 step of the extended system (8)-(9), with the optimal control \(u = -\lambda_2\) substituted in:

```python
def get_next_step_pendulum_extended(z, params, simulate_flag=False):
    h = params['h']
    p2 = z[3]
    u = -p2  # control held constant over the step (zero-order hold)

    # RK4
    k1 = extended_system_pendulum(z, u, params, simulate_flag)
    k2 = extended_system_pendulum(z + h*(k1/2), u, params, simulate_flag)
    k3 = extended_system_pendulum(z + h*(k2/2), u, params, simulate_flag)
    k4 = extended_system_pendulum(z + h*k3, u, params, simulate_flag)
    return z + (h/6)*(k1 + 2*k2 + 2*k3 + k4), u


def extended_system_pendulum(z, u, params, simulate_flag):
    x1 = z[0]
    x2 = z[1]
    p1 = z[2]
    p2 = z[3]
    if simulate_flag is False:
        # Symbolic (CasADi) right-hand side
        return ca.vertcat(
            x2,
            -(params['g']/params['L']) * ca.sin(x1) - params['b'] * x2 + u,
            params['g']/params['L'] * ca.cos(x1) * p2,
            -p1 + params['b'] * p2
        )
    else:
        # Numeric right-hand side, used when simulating the obtained solution
        return np.array([
            x2,
            -(params['g']/params['L']) * np.sin(x1) - params['b'] * x2 + u,
            params['g']/params['L'] * np.cos(x1) * p2,
            -p1 + params['b'] * p2
        ]).reshape(4, 1)
```
We now have two symbolic functions, S_fun and jac_inv_fun, implementing the shooting function, \(\mathbf{S}\), and inverse of the Jacobian, \(\mathbf{J}_{\mathbf{S}}^{-1}\), respectively.
Next, we easily code up Newton’s method. Recall that we’ve set the initial and target states to be \(\mathbf{x}^0 = (0,0)^\top\) and \(\mathbf{x}^T = (\pi,0)^\top\).
```python
p_init = np.array([[0], [0]])  # seems to work best as an initial guess
tol = 1e-6
p = p_init
alpha = 0.1
i = 0
no_iter = 200

while (np.linalg.norm(S_fun(p).full()) > tol) and (i < no_iter):
    step = ca.mtimes(jac_inv_fun(p), S_fun(p))
    p = p - alpha * step
    i += 1

print(f'Iterations = {i}')
print(f'p_init optimal = {p}')
print(f'||S|| = {np.linalg.norm(S_fun(p).full())}')
```
It’s interesting how the optimal control lets the pendulum swing a bit before driving it to the target state. Let’s see how the solution behaves with a long horizon.
```python
params = {
    'g': 9.81,
    'b': 0.1,
    'L': 1,
    'T': 15,                              # long horizon
    'h': 0.1,
    'x_init': np.array([[0], [0]]),       # hanging down
    'x_target': np.array([[ca.pi], [0]])  # upright position
}
S_fun, jac_inv_fun = create_shooting_residual_and_jacobian_inv(params)

p_init = np.array([[0], [0]])  # seems to work best as an initial guess
tol = 1e-6
p = p_init
alpha = 0.1
i = 0
no_iter = 200

while (np.linalg.norm(S_fun(p).full()) > tol) and (i < no_iter):
    step = ca.mtimes(jac_inv_fun(p), S_fun(p))
    p = p - alpha * step
    i += 1

print(f'Iterations = {i}')
print(f'p_init optimal = {p}')
print(f'||S|| = {np.linalg.norm(S_fun(p).full())}')
```
Wow, this is maybe a little counter-intuitive. One might expect the optimal solution for a long horizon to simply be a scaled version of some short-horizon solution. But no!
Instead, the optimiser reveals that for longer horizons it is optimal to first let the pendulum oscillate, gently pumping the swing, before applying the final control action that takes it to the upright target. The state-space curve beautifully illustrates this: instead of a direct path towards the target, we see a bunch of loops before the trajectory converges. Evidently, slowly pumping the swing costs less control energy than “brute forcing” the system with a scaled short-horizon solution.
Of course, the conditions of the PMP are only necessary, so we might have found a pair \((\mathbf{x}, \mathbf{u})\) that is merely locally optimal. More investigation might be required, but I’ll leave that to the reader “as an exercise” :).
6 Conclusion
We’ve gone through, in detail, the steps of solving a continuous-time optimal control problem via indirect single shooting. We considered the nonlinear pendulum, defined the optimal control problem we wanted to solve, analysed it via Pontryagin’s maximum principle to obtain a boundary value problem, and then solved that numerically via Newton’s method. The code we wrote can easily be adapted to other systems.
7 References
A. Bryson and Y-C. Ho. 1975. Applied Optimal Control: Optimization, Estimation, and Control. Taylor & Francis.
D. Liberzon. 2011. Calculus of Variations and Optimal Control Theory: A Concise Introduction. Princeton University Press.
E. Sontag. 2013. Mathematical Control Theory: Deterministic Finite Dimensional Systems. Vol. 6. Springer Science & Business Media.
J. Bonnans. 2013. “The Shooting Approach to Optimal Control Problems.” IFAC Proceedings Volumes 46 (11): 281–92.
J. Macki and A. Strauss. 2012. Introduction to Optimal Control Theory. Springer Science & Business Media.
L. Cesari. 2012. Optimization—Theory and Applications: Problems with Ordinary Differential Equations. Springer Science & Business Media.
L. Pontryagin, V. Boltyanskii, R. Gamkrelidze, and E. Mishchenko. 1965. The Mathematical Theory of Optimal Processes. John Wiley & Sons.
M. Aronna, F. Bonnans, and P. Martinon. 2013. “A Shooting Algorithm for Optimal Control Problems with Singular Arcs.” Journal of Optimization Theory and Applications 158 (2): 419–59.