A detailed tutorial in Python featuring the simple nonlinear pendulum
optimal control
indirect methods
numerical methods
shooting method
tutorial
pendulum
Author
Willem Esterhuizen
Published
April 18, 2026
Abstract
This article shows how to solve a continuous-time optimal control problem via indirect single shooting, from analysing the problem with Pontryagin’s maximum principle to implementing a shooting algorithm that solves the resulting boundary value problem. Detailed numerics with code are also provided; get it all from this repo.
When faced with a continuous-time optimal control problem (OCP) it is rare to find a closed-form expression of its solution, so one must resort to numerical approximation via direct or indirect methods.
In direct methods, you discretise the problem’s dynamics to obtain a finite-dimensional nonlinear programme (NLP), which you can then solve with standard NLP solvers. This is the approach used in, for example, this article on optimal path planning for a wheeled robot.
In indirect methods, you first analyse the continuous-time problem using tools like Pontryagin’s maximum principle (the PMP) to obtain a boundary value problem (BVP). You then solve this BVP with numerical tools, for instance with (various) shooting methods.
Both methods have their strengths and weaknesses; see Table 1. Direct methods require less theoretical analysis upfront, are easier to set up, are excellent at dealing with state and control constraints, and tend to be more robust with respect to the initial guess. However, the resulting nonlinear programmes can be large and computationally heavy, especially when the time horizon is long and the discretisation very fine.
Indirect methods usually produce very accurate solutions, can be significantly faster with a good initial guess, and are well-suited for problems with long time horizons. However, they often require deep analysis and can be sensitive to the initial guess.
Table 1: Direct versus indirect methods

| Feature                     | Direct Methods                  | Indirect Methods              |
|-----------------------------|---------------------------------|-------------------------------|
| Ease of setup               | High                            | Low                           |
| Constraint handling         | Excellent                       | Challenging                   |
| Robustness to initial guess | Good                            | Poor                          |
| Solution accuracy           | Good (discretisation dependent) | Excellent (when it converges) |
| Speed (long horizons)       | Slower                          | Faster                        |
| Analytical insight          | Limited                         | High                          |
1.1 Article outline
This article goes through all the steps involved in solving an optimal control problem via indirect single shooting. The introductory section is finished off with a statement of Pontryagin’s maximum principle for easy reference. Section 2 covers the details of the nonlinear pendulum we will consider throughout the article. Section 3 presents a full analysis of an energy optimal control problem, showing how one eventually obtains a boundary value problem. Section 4 then describes how this boundary value problem can be solved via indirect single shooting. Section 5 presents some clarifying numerical experiments, and the article is concluded with remarks in Section 6. See Section 7 for references.
1.2 Pontryagin’s maximum principle
Here follows a rigorous statement of the PMP for the particular problem we will consider. The formulation has a purely integral cost functional (thus, it is a Lagrange problem), no state and control constraints, and the dynamics and running cost are both smooth. For more general formulations and different flavours of OCPs, good references include L. Cesari (2012) (my favourite), L. Pontryagin et al. (1965), J. Macki and A. Strauss (2012) and D. Liberzon (2011).
Consider the following OCP, \[
\begin{align}
\min\limits_{(\mathbf{x}, \mathbf{u})\in\mathrm{AC}\times\mathcal{M}} \quad & J(\mathbf{x},\mathbf{u}):= \int_0^T \ell(\mathbf{x}(s), \mathbf{u}(s)) \, \mathrm{d}s,\\
\mathrm{subject\ to:}\quad & \dot{\mathbf{x}}(t) = f(\mathbf{x}(t),\mathbf{u}(t)), & \mathrm{a.e.}\,\, t \in[0,T], & \quad (1) \\
\quad & \mathbf{x}(0) = \mathbf{x}^0, & & \quad (2) \\
\quad & \mathbf{x}(T) = \mathbf{x}^{T}, & & \quad (3) \\
\quad & \mathbf{x}(t) \in \mathbb{R}^n, & \forall t \in[0,T], & \quad (4) \\
\quad & \mathbf{u}(t) \in \mathbb{R}^m, & \mathrm{a.e.}\,\, t \in[0,T], & \quad (5)
\end{align}
\] where \(t\geq 0\) denotes time, \(T\geq 0\) is the time horizon, \(\mathbf{x}:= (x_1, x_2,\dots, x_n)^\top\in\mathbb{R}^n\) is the state and \(\mathbf{u}:=(u_1, u_2,\dots, u_m)^\top\in\mathbb{R}^m\) is the control. The initial and target states are indicated by \(\mathbf{x}^0\in\mathbb{R}^n\) and \(\mathbf{x}^T\in\mathbb{R}^n\), respectively.
With \(T\geq 0\), \[\mathrm{AC}([0,T];\mathbb{R}^n) := \{f:[0,T] \rightarrow \mathbb{R}^n : f \text{ absolutely continuous } \},\] and \[\mathcal{M}([0,T];\mathbb{R}^m) := \{f:[0,T] \rightarrow \mathbb{R}^m : f \text{ is Lebesgue measurable }\}.\] For brevity, we’ll sometimes use \(\mathrm{AC}:= \mathrm{AC}([0,T];\mathbb{R}^n)\) and \(\mathcal{M}:= \mathcal{M}([0,T];\mathbb{R}^m)\), the time horizon, \(T\), and dimension, \(m\) or \(n\), being clear from context. The abbreviation “\(\mathrm{a.e.}\,\, t \in[0,T]\)” stands for “almost every \(t\) in \([0,T]\)”, meaning that a property holds for all \(t\) in the interval \([0,T]\), except possibly on a set of Lebesgue measure zero.
Note
Throughout this article we’ll work with absolutely continuous state trajectories and Lebesgue-measurable control functions, as these are the standard classes used in rigorous optimal control theory.
If you are unfamiliar with these concepts, you can safely replace “absolutely continuous” with “continuous” for the state, and “Lebesgue-measurable” with “piecewise continuous” for the control. This approximation is sufficient for understanding the core ideas. I highly recommend Appendix C of E. Sontag (2013) if you would like to delve into these topics.
The function \(f:\mathbb{R}^n\times \mathbb{R}^m\rightarrow\mathbb{R}^n\) is the dynamics, assumed to be \(C^\infty\) and \(J:\mathrm{AC}\times\mathcal{M}\rightarrow \mathbb{R}_{\geq 0}\) is the cost functional, where \(\ell:\mathbb{R}^n\times\mathbb{R}^m\rightarrow\mathbb{R}_{\geq 0}\) is the running cost, assumed to be \(C^\infty\) and integrable.
Let all admissible pairs be denoted by \(\Omega\), \[
\Omega := \{(\mathbf{x}, \mathbf{u}) \in \mathrm{AC}\times \mathcal{M}: \ (1) - (5) \ \mathrm{ hold}\}.
\] Throughout we’ll assume that there exists a pair \((\mathbf{x}^\star, \mathbf{u}^\star)\in\Omega\) that minimises the cost functional, that is, \[
J(\mathbf{x}^\star, \mathbf{u}^\star) \leq J(\mathbf{x}, \mathbf{u}),\quad \forall\, (\mathbf{x}, \mathbf{u}) \in \Omega.
\] We’ll call \((\mathbf{x}^\star, \mathbf{u}^\star)\) an optimal pair. We also assume that the optimal control, \(\mathbf{u}^\star\), is bounded.
To state the PMP, we need to introduce the co-state, \(\boldsymbol{\lambda}:= (\lambda_1,\lambda_2,\dots,\lambda_n) \in \mathbb{R}^n\) and the Hamiltonian function, \(H:\mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R}^m \rightarrow \mathbb{R}\), \[
H(\mathbf{x}, \boldsymbol{\lambda}, \mathbf{u}) := \boldsymbol{\lambda}^\top f(\mathbf{x}, \mathbf{u}) + \ell(\mathbf{x}, \mathbf{u}).
\] The partial derivatives of the Hamiltonian with respect to the state and control (column vectors) are denoted by: \[
H_{\mathbf{x}}(\mathbf{x}, \boldsymbol{\lambda}, \mathbf{u}) := \frac{\partial H}{\partial \mathbf{x}}(\mathbf{x}, \boldsymbol{\lambda}, \mathbf{u}), \qquad
H_{\mathbf{u}}(\mathbf{x}, \boldsymbol{\lambda}, \mathbf{u}) := \frac{\partial H}{\partial \mathbf{u}}(\mathbf{x}, \boldsymbol{\lambda}, \mathbf{u}).
\]
Statement of the PMP
Pontryagin’s maximum principle (fixed states & horizon, integral cost)
Suppose \((\mathbf{x}^\star, \mathbf{u}^\star)\) is an optimal pair. Then:
There exists a nonzero co-state, \(\boldsymbol{\lambda}^{\star}\in \mathrm{AC}([0,T];\mathbb{R}^n)\), which satisfies the co-state equation, \[
\dot{\boldsymbol{\lambda}}^{\star}(t) = -H_{\mathbf{x}}(\mathbf{x}^{\star}(t),\boldsymbol{\lambda}^{\star}(t), \mathbf{u}^{\star}(t)),\quad \mathrm{a.e.}\quad t\in[0,T].
\]
For a.e. \(t\in[0,T]\) the Hamiltonian as a function of \(\mathbf{u}\) is minimised, \[
H(\mathbf{x}^{\star}(t),\boldsymbol{\lambda}^\star(t),\mathbf{u}^{\star}(t)) = \min_{\mathbf{u}\in \mathbb{R}^m} H(\mathbf{x}^{\star}(t),\boldsymbol{\lambda}^\star(t),\mathbf{u}),\quad \mathrm{a.e.}\quad t\in[0,T].
\]
Moreover, because the initial and final states, \(\mathbf{x}^{\star}(0)\) and \(\mathbf{x}^{\star}(T)\), are fixed, the initial and final co-states, \(\boldsymbol{\lambda}^{\star}(0)\) and \(\boldsymbol{\lambda}^{\star}(T)\), are free.
Note that I have used \(\mathbf{u}\) to indicate a control function, \(\mathbf{u}\in\mathcal{M}\), as well as an \(m\)-dimensional control vector, \(\mathbf{u}\in\mathbb{R}^m\), but this will hopefully not be confusing.
Note that the co-states being “free” does not mean that they can take on any value. It means that they are not known beforehand, and that their values must be determined to solve the OCP. This is exactly the boundary value problem that we will solve via shooting.
2 The pendulum’s equations of motion
Throughout the article we’ll consider the well-known differential equations of a forced pendulum, \[
\begin{align}
\dot x_1(t) &= x_2(t), & \quad (6)\\
\dot x_2(t) &= -\frac{g}{L} \sin(x_1(t)) - b x_2(t) + u(t), & \quad (7)
\end{align}
\] where \(t\geq 0\) indicates time, \(x_1\in\mathbb{R}\) is pendulum’s angle, \(x_2\in\mathbb{R}\) is its angular velocity and \(u\in\mathbb{R}\) is the applied torque, which we can control, see Figure 1. The pendulum’s length is denoted by \(L>0\), \(b\geq 0\) is a coefficient modelling losses due to friction and \(g = 9.81\) is the acceleration due to gravity on Earth.
We’ll let \(\mathbf{x}:= (x_1, x_2)^\top\in\mathbb{R}^2\), and define \(f:\mathbb{R}^2\times\mathbb{R}\rightarrow \mathbb{R}^2\) as, \[
f(\mathbf{x},u) :=
\left(
\begin{array}{c}
x_2 \\
-\frac{g}{L} \sin(x_1) - b x_2 + u \\
\end{array}
\right)
\] so that we may write the system (6)-(7) as, \[
\dot{\mathbf{x}}(t) = f(\mathbf{x}(t), u(t)),\quad t\geq 0.
\] A controlled equilibrium point, \(\mathbf{x}^{\mathrm{eq}}:=(x_1^{\mathrm{eq}}, x_2^{\mathrm{eq}})^\top\in\mathbb{R}^2\), is a state for which there exists an input, \(u^{\mathrm{eq}}\in\mathbb{R}\), such that \(f(\mathbf{x}^{\mathrm{eq}}, u^{\mathrm{eq}}) = 0\).
Figure 1: The nonlinear pendulum
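To make the notion of a controlled equilibrium concrete, here is a quick numerical check (a small sketch; the helper name `equilibrium_input` is my own, not from the article's code). Setting \(x_2^{\mathrm{eq}} = 0\) makes the first row of \(f = 0\) hold and kills the friction term, and the second row then gives \(u^{\mathrm{eq}} = \frac{g}{L}\sin(x_1^{\mathrm{eq}})\).

```python
import numpy as np

def equilibrium_input(x1_eq, g=9.81, L=1.0):
    """Torque that holds the pendulum at rest at angle x1_eq.

    The first row of f = 0 forces x2_eq = 0, so the friction term drops
    out and the second row gives u_eq = (g / L) * sin(x1_eq).
    """
    return (g / L) * np.sin(x1_eq)

# Hanging down (x1 = 0) and upright (x1 = pi) need no holding torque;
# holding the pendulum horizontal needs the full gravitational torque g/L.
print(equilibrium_input(0.0))        # 0.0
print(equilibrium_input(np.pi))      # ~0.0 (up to floating-point error)
print(equilibrium_input(np.pi / 2))  # 9.81
```

Note in particular that the upright position, \((\pi, 0)^\top\), is a controlled equilibrium with \(u^{\mathrm{eq}} = 0\); it is the target of the optimal control problem below.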
3 The energy optimal problem
Consider the pendulum, (6)-(7). Given an initial state, \(\mathbf{x}^0\), a controlled equilibrium, \(\mathbf{x}^{\mathrm{eq}}\), and a time horizon, \(T>0\), find the pair \((\mathbf{x}^\star, u^\star)\) that minimises the energy of the control signal, \(\frac{1}{2}\int_0^T u^2(s)\,\mathrm{d}s\) (the factor of \(\frac{1}{2}\) merely simplifies the algebra and does not change the minimiser).
Referring to Section 1.2, we have a problem where the initial and final states are fully specified, the control is unrestricted, the cost functional has no terminal cost, and the time horizon is fixed.
The problem’s Hamiltonian reads, \[
H(\mathbf{x},\boldsymbol{\lambda},u) = \lambda_1 x_2 + \lambda_2\left( -\frac{g}{L}\sin(x_1) - bx_2 + u \right) + \frac{1}{2}u^2,
\] and the optimal control, \(u^\star\), must satisfy, \[
u^\star(t) = \arg\min_{u \in \mathbb{R}} H(\mathbf{x}(t), \boldsymbol{\lambda}(t), u)\quad \mathrm{a.e.}\,\, t \in [0,T].
\] Because the control is unrestricted we can simply consider the equation: \[
H_{u}(\mathbf{x}(t),\boldsymbol{\lambda}(t),u) = \lambda_2(t) + u = 0,
\] and conclude that, \[
u^{\star}(t) = -\lambda_2(t), \quad \,\, t \in [0,T].
\] The co-state system reads, \[
\dot{\boldsymbol{\lambda}}(t) = -H_{\mathbf{x}}(\mathbf{x}(t),\boldsymbol{\lambda}(t),u(t)) =
\left(
\begin{array}{c}
\frac{g}{L}\cos(x_1(t))\lambda_2(t)\\
-\lambda_1(t) + b \lambda_2(t)
\end{array}
\right),
\] with \(\boldsymbol{\lambda}(0)\) and \(\boldsymbol{\lambda}(T)\) free.
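If you'd like to double-check these hand-computed derivatives mechanically, SymPy can do it. This is a sanity-check sketch only, not part of the solver, and the symbol names are my own:

```python
import sympy as sp

x1, x2, lam1, lam2, u, g, L, b = sp.symbols('x1 x2 lam1 lam2 u g L b')

# Hamiltonian of the energy-optimal problem
H = lam1 * x2 + lam2 * (-(g / L) * sp.sin(x1) - b * x2 + u) + u**2 / 2

# Co-state dynamics: lambda_dot = -H_x
lam1_dot = -sp.diff(H, x1)
lam2_dot = -sp.diff(H, x2)
print(sp.simplify(lam1_dot))   # equals (g/L)*lam2*cos(x1)
print(sp.simplify(lam2_dot))   # equals -lam1 + b*lam2

# Stationarity in u: H_u = 0 gives the optimal control
u_star = sp.solve(sp.diff(H, u), u)[0]
print(u_star)                  # equals -lam2
```

Both printed expressions match the co-state system above, and the stationarity condition recovers \(u^\star = -\lambda_2\).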
The boundary value problem
Our analysis via the PMP concludes that to solve the energy optimal problem we only need to determine the initial and final co-states. More specifically, we need to solve the following boundary value problem (BVP).
\[
\mathrm{BVP}
\begin{cases}
\mathrm{find}: & (\boldsymbol{\lambda}^0)^{\star}, (\boldsymbol{\lambda}^T)^{\star}\in\mathbb{R}^2\quad & & \\
\mathrm{such\, that}: & \dot{\mathbf{x}}(t) = f(\mathbf{x}(t),u(t)), \quad & \,\, \forall\, t \in[0,T], & \quad (8)\\
\quad &\dot{\boldsymbol{\lambda}}(t) = -H_{\mathbf{x}}(\mathbf{x}(t), \boldsymbol{\lambda}(t), u(t)), \quad & \,\,\forall\, t \in[0,T], & \quad (9)\\
\quad &\mathbf{x}(0) = \mathbf{x}^0, & & \quad (10)\\
\quad & \boldsymbol{\lambda}(0) = (\boldsymbol{\lambda}^0)^{\star}, & & \quad (11)\\
\quad & u(t) = -\lambda_2(t), \quad & \,\, \forall\, t \in[0,T], & \quad (12)\\
\quad & \mathbf{x}(T) = \mathbf{x}^{\mathrm{eq}}, & & \\
\quad & \boldsymbol{\lambda}(T) = (\boldsymbol{\lambda}^T)^{\star}. & &
\end{cases}
\] Recall that \(\mathbf{x}^0\), \(\mathbf{x}^{\mathrm{eq}}\) and \(T\) are given, fixed values. We’ll let \((\boldsymbol{\lambda}^0)^\star\) and \((\boldsymbol{\lambda}^T)^\star\) indicate the initial and final co-states that solve the BVP. Given an arbitrary \(\mathbf{x}^0\) and \(\boldsymbol{\lambda}^0\), let \(\mathbf{x}(t;\mathbf{x}^0, \boldsymbol{\lambda}^0)\) denote the state at time \(t\in[0,T]\) obtained from integrating (8)-(9) from \(t=0\) with \(\mathbf{x}(0) = \mathbf{x}^0\), \(\boldsymbol{\lambda}(0) = \boldsymbol{\lambda}^0\) and \(u(t) = -\lambda_2(t)\).
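The flow map \(\mathbf{x}(t;\mathbf{x}^0, \boldsymbol{\lambda}^0)\) is easy to sketch with SciPy's `solve_ivp` (the article's solver later uses CasADi with RK4 instead; the function names here are mine, and the parameter values are example choices). A nice sanity check: with \(\boldsymbol{\lambda}^0 = \mathbf{0}\) the co-state stays identically zero, so \(u \equiv 0\) and the flow reduces to the unforced, damped pendulum, which can never exceed its initial amplitude.

```python
import numpy as np
from scipy.integrate import solve_ivp

g, L, b = 9.81, 1.0, 0.1  # example parameter values

def extended_rhs(t, z):
    """State (8) and co-state (9) dynamics with u = -lambda_2, i.e. (12)."""
    x1, x2, lam1, lam2 = z
    u = -lam2
    return [x2,
            -(g / L) * np.sin(x1) - b * x2 + u,
            (g / L) * np.cos(x1) * lam2,
            -lam1 + b * lam2]

def flow(T, x0, lam0):
    """x(T; x0, lam0): the state at time T of the extended system."""
    sol = solve_ivp(extended_rhs, (0.0, T), [*x0, *lam0],
                    rtol=1e-9, atol=1e-9)
    return sol.y[:2, -1]

# Zero initial co-state => zero control: the damped pendulum just swings
# and loses energy, so |x1| stays below the initial amplitude pi/2.
xT = flow(5.0, [np.pi / 2, 0.0], [0.0, 0.0])
print(xT)
```

Shooting, described next, is nothing more than root-finding on this flow map with respect to \(\boldsymbol{\lambda}^0\).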
4 Indirect single shooting
Loosely speaking, the idea behind solving a BVP via shooting is as follows. You start by guessing the value of \((\boldsymbol{\lambda}^0)^{\star}\), call this guess \(\hat{\boldsymbol{\lambda}}^0\), and integrate the extended system (8)-(12) from \(t=0\) to \(t=T\) (i.e., you shoot). You then see how close the resulting state at \(t=T\) is to the desired target, \(\mathbf{x}^{\mathrm{eq}}\). Then you adjust your original guess \(\hat{\boldsymbol{\lambda}}^0\), shoot again, and iterate in this manner until you are happy that the resulting final state is close enough to the target. There are various ways to “adjust” your guess; see, for example, Chapter 7 of A. Bryson and Y-C. Ho (1975). A common way is via Newton’s method, which we will code up in the numerics section. The optimal pair \((\mathbf{x}^\star, u^\star)\) (or rather, a good enough approximation), which is what we actually seek, is then given by the state and control trajectories generated by the obtained initial co-state. (For our problem, we don’t really care about the final co-state, \((\boldsymbol{\lambda}^T)^\star\).) Note that, depending on the problem and the information you have, you might instead want to shoot backwards, starting from the final co-state.
More concretely for our problem, define the shooting function, \(\mathbf{S}:\mathbb{R}^2\rightarrow \mathbb{R}^2\), \[
\mathbf{S}({\hat{\boldsymbol{\lambda}}^0}) :=
\left(
\begin{array}{c}
x_1(T;\mathbf{x}^{0},\hat{\boldsymbol{\lambda}}^0) - x_1^{\mathrm{eq}}\\
x_2(T;\mathbf{x}^{0},\hat{\boldsymbol{\lambda}}^0) - x_2^{\mathrm{eq}}
\end{array}
\right).
\] This tells us how far away from the desired equilibrium point we are if we solve the system (8)-(12) with the initial co-state, \(\hat{\boldsymbol{\lambda}}^0\). Let \(\mathbf{J}_{\mathbf{S}}(\hat{\boldsymbol{\lambda}}^0)\in\mathbb{R}^{2\times 2}\) denote the Jacobian of \(\mathbf{S}\) w.r.t. \(\boldsymbol{\lambda}^0\), evaluated at an arbitrary \(\hat{\boldsymbol{\lambda}}^0\in\mathbb{R}^2\), that is, \[
\mathbf{J}_{\mathbf{S}}(\hat{\boldsymbol{\lambda}}^0) = \frac{\partial \mathbf{S}}{\partial \boldsymbol{\lambda}^0}(\hat{\boldsymbol{\lambda}}^0) =
\left(
\begin{array}{cc}
\frac{\partial x_1(T;\mathbf{x}^0,\hat{\boldsymbol{\lambda}}^0)}{\partial \lambda_1^0} & \frac{\partial x_1(T;\mathbf{x}^0,\hat{\boldsymbol{\lambda}}^0)}{\partial \lambda_2^0} \\
\frac{\partial x_2(T;\mathbf{x}^0,\hat{\boldsymbol{\lambda}}^0)}{\partial \lambda_1^0} & \frac{\partial x_2(T;\mathbf{x}^0,\hat{\boldsymbol{\lambda}}^0)}{\partial \lambda_2^0}
\end{array}
\right),
\] and let \(\mathbf{J}_{\mathbf{S}}^{-1}(\hat{\boldsymbol{\lambda}}^0)\) denote the inverse of this matrix.
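In the numerics below we'll obtain \(\mathbf{J}_{\mathbf{S}}\) exactly via CasADi's algorithmic differentiation, but it's worth knowing that a finite-difference approximation is a common, tool-free alternative. Here is a minimal sketch; the toy map `S_toy` merely stands in for the real shooting function (which would require integrating (8)-(12)), and both names are mine:

```python
import numpy as np

def fd_jacobian(S, lam, eps=1e-6):
    """Central-difference approximation of the Jacobian of S at lam."""
    lam = np.asarray(lam, dtype=float)
    m, n = S(lam).size, lam.size
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps  # perturb the j-th component only
        J[:, j] = (S(lam + e) - S(lam - e)) / (2 * eps)
    return J

# Toy stand-in for the shooting function
def S_toy(lam):
    return np.array([lam[0]**2 + lam[1], np.sin(lam[1])])

print(fd_jacobian(S_toy, [1.0, 0.0]))  # close to [[2, 1], [0, 1]]
```

For the real shooting function each column costs two forward integrations of the extended system, which is why algorithmic differentiation (as in the CasADi code below) is usually preferable.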
Input: Initial guess, \(\boldsymbol{\lambda}^0_{\mathrm{init}}\in\mathbb{R}^2\), tolerance, \(\varepsilon>0\), and damping factor, \(\alpha\in(0,1]\).
Output: Approximate solution, \((\boldsymbol{\lambda}^0)^{\varepsilon}\), to (BVP).
1. Let \(\hat{\boldsymbol{\lambda}}^0\gets\boldsymbol{\lambda}^0_{\mathrm{init}}\).
2. Integrate (8)-(12) to get \(\mathbf{x}(T;\mathbf{x}^0,\hat{\boldsymbol{\lambda}}^0)\) and thus \(\mathbf{S}(\hat{\boldsymbol{\lambda}}^0)\).
3. If \(\|\mathbf{S}(\hat{\boldsymbol{\lambda}}^0)\|>\varepsilon\):
3.1 Let \(\hat{\boldsymbol{\lambda}}^0 \gets \hat{\boldsymbol{\lambda}}^0 - \alpha\mathbf{J}_{\mathbf{S}}^{-1}(\hat{\boldsymbol{\lambda}}^0) \mathbf{S}(\hat{\boldsymbol{\lambda}}^0)\).
3.2 Go to 2.
4. Let \((\boldsymbol{\lambda}^0)^{\varepsilon} \gets \hat{\boldsymbol{\lambda}}^0\).
The state-control pair, \((\mathbf{x}, \mathbf{u})\), that satisfies (8)-(12) with \((\boldsymbol{\lambda}^0)^{\varepsilon}\) is then an approximate solution to the original OCP.
4.1 Some comments
Step 3.1 is a damped Newton step, where \(\alpha\) determines the damping. Having this scaling sometimes helps with convergence, thus making the approach more robust.
Probably the biggest weakness of the shooting method, as presented here, is that convergence can be very sensitive to the initial guess for the co-state. Moreover, as with our specific OCP, you often have no physical intuition about what the co-state means, so you really are stabbing in the dark. In the numerics section to come, I often struggled to find a good starting \(\boldsymbol{\lambda}^0_{\mathrm{init}}\), as it changes with every adjustment you make to the problem, such as the initial and target states and the horizon length. I also had to play around with the damping, \(\alpha\). This process, unfortunately, can be a bit of an art. A useful technique I sometimes use is to find the co-state for an easy problem, and then use that solution as a warm start for a slightly different problem. I then perturb the problem slowly until I arrive at the one I originally cared about (this is a type of homotopy method).
Note that the inverse of the Jacobian, \(\mathbf{J}_{\mathbf{S}}^{-1}\), must exist at every iteration of the Newton update, which is of course not guaranteed for every optimal control problem. The Jacobian can become singular when perturbations of the initial costate (or other shooting variables) no longer produce independent variations in the terminal state. This can happen, for example, when the associated extremal trajectory contains a so-called conjugate point, when boundary conditions are not independent, or when singular arcs or abnormal extremals occur. Moreover, for some problems the Jacobian is not square. In this case, one may instead employ the Gauss–Newton method, which replaces the inverse of the Jacobian with its Moore–Penrose pseudoinverse.
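Here is a minimal sketch of that Gauss–Newton variant on a toy over-determined residual (three equations, two unknowns; nothing pendulum-specific, and all names are mine):

```python
import numpy as np

# Toy residual with a non-square (3 x 2) Jacobian
def residual(lam):
    return np.array([lam[0] - 1.0,
                     lam[1] - 2.0,
                     lam[0] + lam[1] - 3.0])

def jacobian(lam):
    return np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0]])

# Gauss-Newton: replace the Jacobian inverse in the Newton update with
# the Moore-Penrose pseudoinverse (a least-squares step).
lam = np.zeros(2)
for _ in range(10):
    lam = lam - np.linalg.pinv(jacobian(lam)) @ residual(lam)

print(lam)  # [1. 2.] -- the point where the residual vanishes
```

Because this toy residual is affine and consistent, a single pseudoinverse step already lands on the solution; for a genuine shooting function the iteration would, at best, converge locally.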
If you are interested in some of these issues as well as more advanced techniques, you can consult J. Bonnans (2013), M. Aronna, F. Bonnans, and P. Martinon (2013) and their references.
5 Numerics
In this section we’ll code up a solver for the energy-optimal problem and run some experiments.
5.1 Coding up a solver in Python
First, we need to set up some symbolic variables and functions:

```python
import casadi as ca
import numpy as np


def create_shooting_residual_and_jacobian_inv(params):
    # we need to use `p` because `lambda` is taken
    p_init = ca.MX.sym('p_init', 2)            # symbolic unknown: initial co-state
    z0 = ca.vertcat(params['x_init'], p_init)  # initial augmented state

    # Discretised dynamics function
    z_sym = ca.MX.sym('z', 4)
    z_next, _ = get_next_step_pendulum_extended(z_sym, params)
    F = ca.Function('F', [z_sym], [z_next], ['z'], ['z_next'])

    # Forward simulation (still symbolic!)
    z = z0
    N = int(np.floor(params['T'] / params['h']))
    for k in range(N):
        z = F(z)

    # Final residual
    x1_final = z[0]
    x2_final = z[1]
    S = ca.vertcat(x1_final - params['x_target'][0],
                   x2_final - params['x_target'][1])

    # Take the symbolic Jacobian dS/dp_init (a 2 x 2 matrix) and invert it
    Jacobian_of_S_wrt_p_init = ca.jacobian(S, p_init)
    J_inv = ca.inv(Jacobian_of_S_wrt_p_init)
    jac_inv_fun = ca.Function('jac_inv', [p_init], [J_inv],
                              ['p_init'], ['J_inv'])

    # CasADi function for numeric evaluation of the residual later
    S_fun = ca.Function('residual', [p_init], [S], ['p_init'], ['S'])

    return S_fun, jac_inv_fun
```

The helper functions implement one RK4 step of the extended system (8)-(9), with the optimal control \(u = -\lambda_2\) substituted in:

```python
def get_next_step_pendulum_extended(z, params, simulate_flag=False):
    h = params['h']
    p2 = z[3]
    u = -p2  # control held constant over the step (zero-order hold)

    # RK4
    k1 = extended_system_pendulum(z, u, params, simulate_flag)
    k2 = extended_system_pendulum(z + h*(k1/2), u, params, simulate_flag)
    k3 = extended_system_pendulum(z + h*(k2/2), u, params, simulate_flag)
    k4 = extended_system_pendulum(z + h*k3, u, params, simulate_flag)
    return z + (h/6)*(k1 + 2*k2 + 2*k3 + k4), u


def extended_system_pendulum(z, u, params, simulate_flag):
    x1 = z[0]
    x2 = z[1]
    p1 = z[2]
    p2 = z[3]
    if simulate_flag is False:
        # Symbolic (CasADi) right-hand side
        return ca.vertcat(
            x2,
            -(params['g']/params['L']) * ca.sin(x1) - params['b'] * x2 + u,
            params['g']/params['L'] * ca.cos(x1) * p2,
            -p1 + params['b'] * p2
        )
    else:
        # Numeric right-hand side, used when simulating the obtained solution
        return np.array([
            x2,
            -(params['g']/params['L']) * np.sin(x1) - params['b'] * x2 + u,
            params['g']/params['L'] * np.cos(x1) * p2,
            -p1 + params['b'] * p2
        ]).reshape(4, 1)
```
We now have two symbolic functions, S_fun and jac_inv_fun, implementing the shooting function, \(\mathbf{S}\), and inverse of the Jacobian, \(\mathbf{J}_{\mathbf{S}}^{-1}\), respectively.
Next, we easily code up Newton’s method. Recall that we’ve set the initial and target states to be \(\mathbf{x}^0 = (0,0)^\top\) and \(\mathbf{x}^T = (\pi,0)^\top\).
```python
p_init = np.array([[0], [0]])  # seems to work best as an initial guess
tol = 1e-6
p = p_init
alpha = 0.1
i = 0
no_iter = 200

while (np.linalg.norm(S_fun(p).full()) > tol) and (i < no_iter):
    step = ca.mtimes(jac_inv_fun(p), S_fun(p))
    p = p - alpha * step
    i += 1

print(f'Iterations = {i}')
print(f'p_init optimal = {p}')
print(f'||S|| = {np.linalg.norm(S_fun(p).full())}')
```
It’s interesting how the optimal control lets the pendulum swing a bit before driving it to the target state. Let’s see how the solution behaves with a long horizon.
```python
params = {
    'g': 9.81,
    'b': 0.1,
    'L': 1,
    'T': 15,                              # long horizon
    'h': 0.1,
    'x_init': np.array([[0], [0]]),       # hanging down
    'x_target': np.array([[ca.pi], [0]])  # upright position
}
S_fun, jac_inv_fun = create_shooting_residual_and_jacobian_inv(params)

p_init = np.array([[0], [0]])  # seems to work best as an initial guess
tol = 1e-6
p = p_init
alpha = 0.1
i = 0
no_iter = 200

while (np.linalg.norm(S_fun(p).full()) > tol) and (i < no_iter):
    step = ca.mtimes(jac_inv_fun(p), S_fun(p))
    p = p - alpha * step
    i += 1

print(f'Iterations = {i}')
print(f'p_init optimal = {p}')
print(f'||S|| = {np.linalg.norm(S_fun(p).full())}')
```
Wow, this is maybe a little counter-intuitive. One might expect the optimal solution for a long horizon to simply be a scaled version of some short-horizon solution. But no!
Instead, the optimiser reveals that for longer horizons it is optimal to first let the pendulum oscillate, gently pumping the swing, before applying the final control action that takes it to the upright target. The state-space curve beautifully illustrates this: instead of a direct path towards the target, we see a bunch of loops before the trajectory converges. Evidently, slowly pumping the swing costs less control energy than “brute forcing” the system with a scaled short-horizon solution.
Of course, the conditions of the PMP are only necessary, so we might have found a pair \((\mathbf{x}, \mathbf{u})\) that is merely locally optimal. More investigation might be required, but I’ll leave that to the reader “as an exercise” :).
6 Conclusion
We’ve gone through, in detail, the steps of solving a continuous-time optimal control problem via indirect single shooting. We considered the nonlinear pendulum, defined the optimal control problem we wanted to solve, analysed it via Pontryagin’s maximum principle to obtain a boundary value problem, and then solved that numerically via Newton’s method. The code we wrote can easily be adapted to other systems.
7 References
A. Bryson and Y-C. Ho. 1975. Applied Optimal Control: Optimization, Estimation, and Control. Taylor & Francis.
D. Liberzon. 2011. Calculus of Variations and Optimal Control Theory: A Concise Introduction. Princeton University Press.
E. Sontag. 2013. Mathematical Control Theory: Deterministic Finite Dimensional Systems. Vol. 6. Springer Science & Business Media.
J. Bonnans. 2013. “The Shooting Approach to Optimal Control Problems.” IFAC Proceedings Volumes 46 (11): 281–92.
J. Macki and A. Strauss. 2012. Introduction to Optimal Control Theory. Springer Science & Business Media.
L. Cesari. 2012. Optimization—Theory and Applications: Problems with Ordinary Differential Equations. Springer Science & Business Media.
L. Pontryagin, V. Boltyanskii, R. Gamkrelidze, and E. Mishchenko. 1965. The Mathematical Theory of Optimal Processes. John Wiley & Sons.
M. Aronna, F. Bonnans, and P. Martinon. 2013. “A Shooting Algorithm for Optimal Control Problems with Singular Arcs.” Journal of Optimization Theory and Applications 158 (2): 419–59.