using OrdinaryDiffEq, Plots
λ, x0 = 0.5, 1.0
decay(x, p, t) = -λ * x
prob = ODEProblem(decay, x0, (0.0, 6.0))
sol = solve(prob, Tsit5(); saveat = 0.05)
plot(sol.t, sol.u; label = "Tsit5 (numerical)", lw = 2,
xlabel = "t", ylabel = "x(t)",
title = "Exponential decay: ẋ = -λx, λ = $λ")
plot!(sol.t, x0 .* exp.(-λ .* sol.t);
label = "exact: x₀·e^(-λt)", ls = :dash, c = :black)Unit 3: Scientific Machine Learning and Physics-Informed Machine Learning
Unit 2 gave us a function-approximation view of ML — pick a function class, minimise a loss, generalise. This unit takes that picture and adds physics to it, in three flavours: hybrid models, conservation-aware architectures, and equation discovery. The unit closes with a conceptual preview of physics-informed machine learning (PIML) — the framing that Units 5–9 implement against the AIMS capstone.
Before any of that, an ODE refresher — the dynamics language we need to start meaningful Sci-ML.
3.1 An ODE refresher
An ordinary differential equation (ODE) is a relation between a function and its derivatives in one independent variable (typically time t):
\dot{\mathbf{x}}(t) \;=\; f(\mathbf{x}(t), t),\qquad \mathbf{x}(0) \;=\; \mathbf{x}_0.
We’ll meet partial differential equations (PDEs) — derivatives in more than one variable — in Unit 6. For now everything is an ODE, and everything is an initial-value problem (IVP): given \mathbf{x}_0 at t = 0, integrate forward.
Vocabulary worth keeping straight:
- \dot{\mathbf{x}} is the time derivative d\mathbf{x}/dt.
- Order — highest derivative appearing. A second-order scalar ODE \ddot{x} = g(x, \dot{x}) rewrites as a first-order system in (x, \dot{x}). Throughout the unit we assume that’s been done.
- Autonomous if f has no explicit t dependence (\dot{\mathbf{x}} = f(\mathbf{x})); non-autonomous otherwise.
- Linear if f(\mathbf{x}, t) = A(t)\,\mathbf{x} + b(t); nonlinear otherwise. Linear systems have explicit solutions via matrix exponentials; nonlinear systems generally do not.
The next three subsubsections walk three ODEs in increasing complexity — exponential decay, the undamped harmonic oscillator, and a second-order system with damping. Each comes with Julia and Python code so the ecosystem fluency built in Unit 2 carries over.
The simplest ODE: exponential decay
A scalar, first-order, linear, autonomous ODE — the canonical warm-up:
\dot x \;=\; -\lambda\, x, \qquad x(0) = x_0, \qquad \lambda > 0,
with exact solution x(t) = x_0\, e^{-\lambda t}. It models radioactive decay, RC-circuit discharge, drug clearance from a bloodstream — anything where the rate of change is proportional to the current amount. The point of starting here is that we can compare numerical integration to a known closed-form answer, which makes solver choices concrete.
In Julia, using the OrdinaryDiffEq ecosystem that the rest of the workshop standardises on:
In Python with scipy.integrate.solve_ivp (the SciPy equivalent of OrdinaryDiffEq):
units/unit_03/scripts/decay_python.py
import numpy as np
from scipy.integrate import solve_ivp
lam, x0 = 0.5, 1.0
sol = solve_ivp(lambda t, x: -lam * x,
t_span=(0.0, 6.0), y0=[x0],
method="RK45", dense_output=True)
t = np.linspace(0, 6, 121)
print("max |numerical - exact| =",
float(np.max(np.abs(sol.sol(t)[0] - x0 * np.exp(-lam * t)))))The max error is \sim 10^{-6} for both solvers at default tolerances — a useful baseline before we ask anything harder.
A second-order system: the harmonic oscillator
The simplest non-trivial second-order ODE — a mass on a spring,
\ddot{x} + \omega^2 x = 0 \quad\Longleftrightarrow\quad \begin{pmatrix} \dot{x} \\ \dot{v} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -\omega^2 & 0 \end{pmatrix} \begin{pmatrix} x \\ v \end{pmatrix},
with v = \dot{x}. Energy E = \tfrac{1}{2}(v^2 + \omega^2 x^2) is conserved exactly; the phase portrait — trajectories in the (x, v) plane — is a family of concentric ellipses. This is the example the Hamiltonian / Lagrangian neural networks of §3.3 are designed to handle without drifting.
using OrdinaryDiffEq, Plots
ω = 2π # one cycle per second
oscillator(u, p, t) = [u[2], -ω^2 * u[1]]
u0s = [[1.0, 0.0], [1.5, 0.0], [2.0, 0.0]]
tspan = (0.0, 2.0)
plt = plot(xlabel = "x", ylabel = "v",
title = "Harmonic oscillator phase portrait",
aspect_ratio = :equal, legend = :outerright)
for u0 in u0s
sol = solve(ODEProblem(oscillator, u0, tspan), Tsit5();
saveat = 0.01)
plot!(plt, sol[1, :], sol[2, :], label = "x(0) = $(u0[1])")
end
pltOrdinaryDiffEq.Tsit5() is a 5th-order explicit Runge–Kutta solver — a sensible default for non-stiff problems. For stiff systems (e.g. fast diffusion coupled to slow advection) one picks an implicit solver like QNDF() or Rodas5P() instead. Choosing the solver is most of classical numerics; in Unit 4 we’ll let the network learn the dynamics directly and side-step the choice.
Adding damping: the three regimes
Real oscillators bleed energy. Add a velocity-proportional drag 2\zeta\omega\dot x and the equation becomes
\ddot{x} + 2\zeta\omega\, \dot{x} + \omega^2 x = 0,
with damping ratio \zeta \geq 0 — the dimensionless number that controls how the oscillator decays:
- \zeta < 1 — underdamped. Oscillation with an exponential envelope e^{-\zeta\omega t}. The default for, e.g., a car suspension or a ringing bell.
- \zeta = 1 — critically damped. Fastest return to zero with no overshoot. The design target for, e.g., a closing door or a feedback controller.
- \zeta > 1 — overdamped. Slow exponential return; no oscillation. Heavy honey on a spring.
A single Julia script overlays the three regimes:
using OrdinaryDiffEq, Plots
ω = 2π
function damped(u, p, t)
ζ = p[1]
[u[2], -ω^2 * u[1] - 2ζ * ω * u[2]]
end
u0 = [1.0, 0.0]
ts = (0.0, 3.0)
plt = plot(xlabel = "t", ylabel = "x(t)",
title = "Damped oscillator: ẍ + 2ζω ẋ + ω²x = 0",
legend = :topright)
for (ζ, label) in [(0.1, "ζ=0.1 (under)"),
(1.0, "ζ=1.0 (critical)"),
(2.0, "ζ=2.0 (over)")]
sol = solve(ODEProblem(damped, u0, ts, [ζ]), Tsit5(); saveat = 0.01)
plot!(plt, sol.t, [u[1] for u in sol.u], label = label, lw = 2)
end
pltSame model in Python — same shape:
units/unit_03/scripts/damped_python.py
import numpy as np
from scipy.integrate import solve_ivp
omega = 2 * np.pi
def damped(t, u, zeta):
return [u[1], -omega**2 * u[0] - 2 * zeta * omega * u[1]]
for zeta in (0.1, 1.0, 2.0):
sol = solve_ivp(damped, (0.0, 3.0), [1.0, 0.0],
args=(zeta,), t_eval=np.linspace(0, 3, 301))
print(f"ζ={zeta}: x(3.0) = {sol.y[0, -1]:+.4f}")Notice \zeta = 1 minimises the time to “land” near zero — a fact control engineers exploit. We’ll come back to damped oscillators when we discuss the inverse problem in Unit 4: recovering \zeta from a noisy trajectory is the simplest possible “physics parameter from data” problem and a useful sanity check before attempting the AIMS capstone in Units 8–10.
Why ODEs matter for the rest of the unit
Every Sci-ML idea in §§3.2–3.5 starts from an ODE picture:
- Hybrid models replace part of f with a neural network.
- Hamiltonian / Lagrangian NNs parametrise an energy function and derive f from it.
- SINDy assumes f is a sparse combination of library terms and recovers it from data.
- PINNs turn the IVP itself into a loss — they approximate the solution map t \mapsto \mathbf{x}(t) by a neural network.
Keep the picture \dot{\mathbf{x}} = f(\mathbf{x}, t) in mind as we go through each.
3.2 The SciML landscape
Hybrid physics + ML
A hybrid (or universal) model uses a known physical equation for the parts of the dynamics we understand and a neural network for the parts we don’t. Concretely, swap f in \dot{\mathbf{x}} = f(\mathbf{x}, t) for a sum of known and learned pieces:
\dot{\mathbf{x}} = \underbrace{f_{\text{phys}}(\mathbf{x}, t)}_{\text{known}} \;+\; \underbrace{N_\theta(\mathbf{x}, t)}_{\text{learned}}.
Example: a population model with known logistic growth and an unknown immigration term modelled by a small NN; train \theta against observed time series. The benefits over pure ML — data efficiency and out-of-distribution extrapolation. The benefit over pure physics — flexibility where the physics is genuinely unknown. Universal differential equations (UDEs) in Unit 4 are exactly this construction.
Surrogate models for expensive simulations
A surrogate replaces an expensive PDE / ODE solver with a fast learned approximation. Train on a dataset of (input, solver output) pairs; deploy in design optimisation, uncertainty quantification, or real-time inference. The catch: surrogates are only trustworthy inside the distribution they were trained on. Out-of-distribution inputs can produce confident-but-wrong outputs.
Conservation, symmetries, and inductive biases
A neural network that cannot violate conservation of mass, energy, or momentum will generalise better than one that has to learn the constraint from data. Inductive biases that enforce conservation:
- Hard architectural constraints — output a flux, then take its divergence; output an energy, then take its gradient (HNNs in §3.3 work this way).
- Soft loss penalties — add a violation term to the loss.
- Equivariant architectures — networks whose outputs transform correctly under symmetries (translation, rotation, gauge).
The pattern recurs throughout the unit: the more physics we bake in, the less data we need.
3.3 Hamiltonian and Lagrangian neural networks
The construction
A Hamiltonian Neural Network (HNN) (Greydanus et al., 2019) parametrises the Hamiltonian H_\theta(q, p) of a mechanical system with a neural network, then derives the dynamics via Hamilton’s equations with autodiff:
\dot{q} \;=\; \frac{\partial H_\theta}{\partial p}, \qquad \dot{p} \;=\; -\,\frac{\partial H_\theta}{\partial q}.
Energy is exactly conserved by construction — no soft penalty needed — because \dot H = \nabla H \cdot \dot{\mathbf{x}} = 0 for the symplectic flow. Train on observed trajectories by minimising the residual
\mathcal{L}(\theta) \;=\; \sum_i \bigl\| (\dot q_i, \dot p_i) - (\partial_p H_\theta, -\partial_q H_\theta)(q_i, p_i) \bigr\|^2.
Lagrangian Neural Networks (LNNs) (Cranmer et al., 2020) do the analogous trick with the Lagrangian L_\theta(q, \dot q) and the Euler–Lagrange equations.
A pendulum sanity check
The simple pendulum has the closed-form Hamiltonian H(q, p) = \tfrac{1}{2} p^2 + (1 - \cos q). An HNN trained on a hundred noisy trajectory points typically recovers an energy-conserving model that drifts an order of magnitude less than a vanilla MLP fit to the same data. The win is sharpest on long extrapolation horizons.
Code placeholder — units/unit_03/scripts/hnn_pendulum.jl (TODO).
3.4 Equation discovery: SINDy
The next three subsubsections cover (a) the sparse-regression idea, (b) how to choose the candidate-term library, and (c) a worked Lorenz example in Julia and Python. We then close with when SINDy beats — and loses to — black-box function approximators.
The sparse regression idea
SINDy (Sparse Identification of Nonlinear Dynamics) (Brunton et al., 2016) asks: instead of fitting a black-box neural network to data, can we directly recover the governing equation by assuming it’s a short symbolic expression? The construction:
- Observe time-series data X = [\mathbf{x}(t_1), \mathbf{x}(t_2), \ldots].
- Numerically differentiate to estimate \dot X (the hard part — noise destroys naive finite differences; total-variation regularised differentiation or smoothing splines help).
- Build a library matrix of candidate right-hand-side terms evaluated at each sample, \Theta(\mathbf{x}) \;=\; [1,\ x_1,\ x_2,\ \ldots,\ x_1^2,\ x_1 x_2,\ \ldots,\ \sin x_1,\ \cos x_1,\ \ldots].
- Solve the sparse regression \dot X = \Theta(X)\,\Xi for a sparse coefficient matrix \Xi. The nonzero entries are the equation.
The sparsity is enforced either by \ell_1 regularisation (LASSO) or by STLSQ — Sequentially Thresholded Least Squares: ordinary least squares → set small coefficients to zero → refit on the survivors → repeat until stable. STLSQ is what the original Brunton paper proposed and what the workshop script uses.
Why SINDy is popular. Compared to fitting a neural network to the same dynamics, SINDy offers:
- Interpretability. The output is a human-readable equation, not a 10-layer black box. Domain experts can sanity-check it.
- Extrapolation. A 3-term equation is far more likely to be correct outside its training regime than a network that memorised the training trajectory.
- Data efficiency. Hundreds of samples are usually enough, vs. tens of thousands for a deep model on the same task.
- Compatibility with downstream physics. Once you have an equation, you can hand it to a stiff ODE solver, do stability analysis, derive control laws — any classical tool works.
Compared to physics-informed neural networks (next units), SINDy is the right move when you don’t know the equation but suspect it’s sparse. PINNs assume the equation is given and solve it; SINDy discovers the equation in the first place. They’re complementary, not competing.
Building a library
A standard polynomial library up to degree 3 plus a few trigonometric terms is the default. Domain knowledge guides which terms to include; including too many makes the regression unstable (many candidate terms ⇒ overfitting), too few makes recovery impossible (the true terms aren’t in the basis). Numerical differentiation of \dot X from time-series X is a separate art (total-variation regularised differentiation, smoothed splines).
Worked example: recovering Lorenz
Given trajectories of the Lorenz system
\dot x = \sigma(y - x),\quad \dot y = x(\rho - z) - y,\quad \dot z = xy - \beta z,
with moderate observation noise, SINDy with a degree-2 polynomial library typically recovers the exact equations and the parameters (\sigma, \rho, \beta) \approx (10, 28, 8/3) — the canonical demonstration that a sparse prior plus enough data is sometimes all you need.
A bare-bones STLSQ implementation (no DataDrivenDiffEq dependency) makes the recipe explicit: build the library, threshold the OLS coefficients, re-fit on the surviving terms, repeat.
# SINDy on the Lorenz system.
#
# Simulate a Lorenz trajectory, add observation noise, and recover the
# symbolic equations by sparse regression against a degree-2 polynomial
# library — the worked example in unit_03.qmd §3.4.
#
# Algorithm: sequentially thresholded least squares (STLSQ), the
# canonical SINDy update from Brunton, Proctor & Kutz (2016).
# Derivatives are obtained by second-order central differences; the
# qmd notes that derivative estimation is a separate art and we keep
# this script simple by relying on a dense save interval.
#
# Run via ./build.sh execute 3 (writes output to ../output/sindy_lorenz.md).
using OrdinaryDiffEq, Random, LinearAlgebra, Printf, Statistics
# ── 1. simulate Lorenz ─────────────────────────────────────────────────
σ_true, ρ_true, β_true = 10.0, 28.0, 8/3
function lorenz!(du, u, p, t)
x, y, z = u
du[1] = σ_true * (y - x)
du[2] = x * (ρ_true - z) - y
du[3] = x * y - β_true * z
end
u0 = [-8.0, 7.0, 27.0]
tspan = (0.0, 20.0)
dt = 0.002
sol = solve(ODEProblem(lorenz!, u0, tspan), Tsit5();
saveat = dt, abstol = 1e-10, reltol = 1e-10)
X_clean = permutedims(reduce(hcat, sol.u)) # N × 3
# ── 2. add observation noise ───────────────────────────────────────────
rng = Random.MersenneTwister(0)
noise_level = 0.01 # 1 % of per-axis std
σ_obs = noise_level .* vec(std(X_clean; dims = 1))'
X = X_clean .+ σ_obs .* randn(rng, size(X_clean))
# ── 3. central-difference derivatives, trim end points ─────────────────
Ẋ = (X[3:end, :] .- X[1:end-2, :]) ./ (2 * dt)
Xc = X[2:end-1, :]
# ── 4. degree-2 polynomial feature library ─────────────────────────────
# columns: 1, x, y, z, x², xy, xz, y², yz, z²
function poly_library(X)
x, y, z = X[:, 1], X[:, 2], X[:, 3]
hcat(ones(length(x)), x, y, z,
x.^2, x.*y, x.*z, y.^2, y.*z, z.^2)
end
labels = ["1", "x", "y", "z", "x^2", "x*y", "x*z", "y^2", "y*z", "z^2"]
Θ = poly_library(Xc)
# ── 5. STLSQ — sequentially thresholded least squares ──────────────────
function stlsq(Θ, Ẋ, λ; n_iters = 15)
Ξ = Θ \ Ẋ
for _ in 1:n_iters
small = abs.(Ξ) .< λ
Ξ[small] .= 0
for j in axes(Ξ, 2)
keep = .!small[:, j]
sum(keep) == 0 && continue
Ξ[keep, j] = Θ[:, keep] \ Ẋ[:, j]
end
end
Ξ
end
λ = 0.1
Ξ = stlsq(Θ, Ẋ, λ)
# ── 6. report ──────────────────────────────────────────────────────────
state_names = ["dx/dt", "dy/dt", "dz/dt"]
println("Lorenz SINDy demo")
println("samples used : ", size(Xc, 1))
println("observation noise : ", noise_level * 100, " % of per-axis std")
println("STLSQ threshold λ : ", λ)
println()
println("True system:")
println(" dx/dt = -10·x + 10·y")
println(" dy/dt = 28·x - y - x·z")
println(" dz/dt = x·y - (8/3)·z")
println()
println("Recovered equations:")
for j in 1:3
terms = String[]
for i in eachindex(labels)
c = Ξ[i, j]
c == 0 && continue
push!(terms, @sprintf("%+0.3f·%s", c, labels[i]))
end
println(" $(state_names[j]) = ", isempty(terms) ? "0" : join(terms, " "))
end
println()
@printf("σ̂ ≈ %0.3f (true %0.3f)\n", -Ξ[2, 1], σ_true)
@printf("ρ̂ ≈ %0.3f (true %0.3f)\n", Ξ[2, 2], ρ_true)
@printf("β̂ ≈ %0.3f (true %0.3f)\n", -Ξ[4, 3], β_true)Captured output from ./build.sh execute 3:
Lorenz SINDy demo
samples used : 9999
observation noise : 1.0 % of per-axis std
STLSQ threshold λ : 0.1
True system:
dx/dt = -10·x + 10·y
dy/dt = 28·x - y - x·z
dz/dt = x·y - (8/3)·z
Recovered equations:
dx/dt = -9.995·x +9.995·y
dy/dt = +27.971·x -0.990·y -0.999·x*z
dz/dt = -2.667·z +1.000·x*y
σ̂ ≈ 9.995 (true 10.000)
ρ̂ ≈ 27.971 (true 28.000)
β̂ ≈ 2.667 (true 2.667)
The same workflow in Python uses the pysindy package — the canonical SINDy implementation, maintained by the Brunton group:
units/unit_03/scripts/sindy_lorenz_python.py
# pip install pysindy
import numpy as np
from scipy.integrate import solve_ivp
import pysindy as ps
sigma, rho, beta = 10.0, 28.0, 8.0 / 3.0
def lorenz(t, u): return [sigma * (u[1] - u[0]),
u[0] * (rho - u[2]) - u[1],
u[0] * u[1] - beta * u[2]]
t = np.linspace(0, 10, 5001)
sol = solve_ivp(lorenz, (t[0], t[-1]), [-8.0, 7.0, 27.0],
t_eval=t, rtol=1e-9, atol=1e-9)
X = sol.y.T
model = ps.SINDy(
feature_library=ps.PolynomialLibrary(degree=2),
optimizer=ps.STLSQ(threshold=0.1),
)
model.fit(X, t=t)
model.print()PySINDy bundles all the choices we made explicit in the Julia script (library, optimiser, differentiation) into a scikit-learn style estimator; the trade-off is less visibility into the inner loop. Both produce the same Lorenz equation, recovered to ~3 significant figures.
When discovery beats approximation
SINDy wins when
- the underlying equation is genuinely sparse in your library,
- you care about interpretability — a 3-term equation is more useful than a 10-layer network, and
- you have enough clean trajectory data to estimate derivatives.
It loses when the dynamics are intrinsically high-dimensional, the right basis isn’t in your library, or the noise drowns the derivatives.
3.5 Proper Orthogonal Decomposition (POD)
Dynamical systems often look high-dimensional but live on a low-dimensional manifold — a turbulent flow field has millions of grid cells but the coherent motion (vortices, waves) is described by a handful of modes. Proper Orthogonal Decomposition (POD) — also called Principal Component Analysis when applied to snapshot matrices — extracts those modes systematically. It’s the classical workhorse for reduced-order modelling (ROM) and the first thing to reach for when you suspect your system has hidden low-dimensional structure.
The snapshot SVD
Collect N snapshots of the state \mathbf{x}_k \in \mathbb{R}^M (e.g. M = number of grid cells, N = number of time samples) into a snapshot matrix:
X \;=\; \bigl[\mathbf{x}_1 \;|\; \mathbf{x}_2 \;|\; \ldots \;|\; \mathbf{x}_N\bigr] \;\in\; \mathbb{R}^{M \times N}.
Compute its singular value decomposition,
X \;=\; U \Sigma V^\top, \qquad U \in \mathbb{R}^{M \times r},\ \Sigma \in \mathbb{R}^{r \times r},\ V \in \mathbb{R}^{N \times r},
with \Sigma = \mathrm{diag}(\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_r > 0). The columns of U are the POD modes — an orthonormal basis for the snapshots, ranked by how much variance they explain. Keep the first k \ll r modes and you have a k-dimensional approximation:
\mathbf{x}(t) \;\approx\; U_k \, \mathbf{a}(t), \qquad \mathbf{a}(t) = U_k^\top \mathbf{x}(t) \;\in\; \mathbb{R}^k.
The decay of \sigma_i tells you whether the reduction will work. Fast decay → a few modes capture almost everything → ROM is viable. Slow decay → genuinely high-dimensional → POD will lose detail.
Worked example: damped oscillator chain
A chain of M = 50 masses coupled by springs, given a localised initial push and lightly damped, is a useful test bench. The state-space dimension is 100 (positions + velocities) but a handful of normal modes carry almost all the energy:
units/unit_03/scripts/pod_chain.jl
using OrdinaryDiffEq, LinearAlgebra, Plots
# 50-mass chain: x¨ᵢ = ω²(xᵢ₋₁ - 2xᵢ + xᵢ₊₁) - 2ζω x˙ᵢ (free at endpoints)
M, ω, ζ = 50, 2π, 0.02
function chain!(du, u, p, t)
x, v = @views u[1:M], u[M+1:end]
du[1:M] .= v
@inbounds for i in 1:M
l = i == 1 ? 0.0 : x[i - 1]
r = i == M ? 0.0 : x[i + 1]
du[M + i] = ω^2 * (l - 2x[i] + r) - 2ζ * ω * v[i]
end
end
u0 = zeros(2M); u0[M ÷ 2] = 1.0 # localised initial displacement
sol = solve(ODEProblem(chain!, u0, (0.0, 8.0)), Tsit5(); saveat = 0.02)
# Snapshot matrix: rows = positions, columns = time samples
X = reduce(hcat, [u[1:M] for u in sol.u])
U, Σ, V = svd(X)
println("first 10 singular values (relative):")
foreach(i -> println(" σ_$i / σ_1 = ", round(Σ[i] / Σ[1]; digits = 4)),
1:10)
# Reconstruction error vs number of modes kept
k_range = 1:15
errs = [norm(X - U[:, 1:k] * Diagonal(Σ[1:k]) * V[:, 1:k]') / norm(X)
for k in k_range]
plot(k_range, errs; lw = 2, marker = :circle,
xlabel = "modes kept k", ylabel = "relative reconstruction error",
title = "POD spectrum: 50-mass damped chain", yaxis = :log)Typical behaviour: ~6 modes recover >99\% of the variance even though the system has 100 state-space dimensions. The first mode is the symmetric breathing of the chain, the second is the first-overtone shape, and so on — modes hierarchically encoded by spatial frequency, exactly as a Fourier basis would predict for this linear system.
Same workflow in Python — scipy.linalg.svd is the SVD; the rest is numpy:
units/unit_03/scripts/pod_chain_python.py
import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import svd
M, omega, zeta = 50, 2 * np.pi, 0.02
def chain(t, u):
x, v = u[:M], u[M:]
du = np.zeros_like(u)
du[:M] = v
for i in range(M):
l = 0.0 if i == 0 else x[i - 1]
r = 0.0 if i == M - 1 else x[i + 1]
du[M + i] = omega**2 * (l - 2 * x[i] + r) - 2 * zeta * omega * v[i]
return du
u0 = np.zeros(2 * M); u0[M // 2] = 1.0
sol = solve_ivp(chain, (0.0, 8.0), u0, t_eval=np.linspace(0, 8, 401))
X = sol.y[:M, :] # (M, N) positions
U, Sigma, Vt = svd(X, full_matrices=False)
print("first 6 singular values (relative):",
np.round(Sigma[:6] / Sigma[0], 4))
print(f"variance captured by k=6 modes: "
f"{(Sigma[:6]**2).sum() / (Sigma**2).sum():.4f}")POD’s role in modern Sci-ML
POD by itself is linear and deterministic — it pre-dates deep learning by half a century. But it shows up everywhere in contemporary work:
- As a preprocessor. Reduce 1M-dimensional snapshots to 100 POD coefficients, then learn the evolution of those coefficients with a small neural ODE or transformer. Modern weather emulators (Pangu-Weather, GraphCast) lean on this idea.
- As a sanity check. If POD already captures 99% of variance in 5 modes, you don’t need deep learning — a linear ROM is enough.
- As a comparison baseline. Any nonlinear surrogate should beat the POD-truncation baseline; if it doesn’t, the nonlinearity isn’t pulling its weight.
POD’s nonlinear cousins — autoencoders, dynamic mode decomposition (DMD), Koopman operators — generalise the idea: project the dynamics onto a learned (possibly nonlinear) low-dimensional manifold, then evolve there. We won’t go deep into them in the workshop, but the POD picture is the right mental starting point.
3.6 Physics-informed machine learning: a conceptual preview
The core idea
A neural network u_\theta(\mathbf{x}, t) approximates the solution of a PDE (or ODE). The loss has a physics term — the equation residual evaluated by autodiff at scattered collocation points — plus boundary / initial conditions and (optionally) a data term:
\mathcal{L}(\theta) \;=\; \lambda_r\,\mathcal{L}_{\text{PDE}} + \lambda_b\,\mathcal{L}_{\text{BC}} + \lambda_i\,\mathcal{L}_{\text{IC}} + \lambda_d\,\mathcal{L}_{\text{data}}.
Minimising it simultaneously fits the data and respects the equation. No grid, no time-stepping, no mesh.
Mechanically the residual is the part that’s new: for the oscillator at the top of this unit, \mathcal{L}_{\text{PDE}}(\theta) = \frac{1}{N_r}\sum_i \bigl| \ddot u_\theta(t_i) + \omega^2 u_\theta(t_i) \bigr|^2, with the second derivative computed by autodiff straight through u_\theta.
What makes PIML different from other Sci-ML methods
- vs surrogates: PIML doesn’t need a precomputed dataset — the equation itself supervises the network.
- vs equation discovery: PIML assumes the equation is known and asks for its solution (or its parameters via the inverse problem).
- vs hybrid models: PIML is a special case where the entire solution field is the “ML part” and the equation residual is the “physics part”.
Where this lands in the workshop
Unit 4 takes the ODE picture from §3.1 and adds learnable terms — the Neural-ODE / UDE picture from §3.2. Unit 5 implements vanilla PINNs on the oscillator above and a small collection of ODE / PDE toys. Unit 6 provides the PDE foundations. Unit 7 addresses where naïve PINNs fail and how the modern fixes close the gap. Units 8–10 assemble the AIMS capstone — forward (Part I) and inverse (Part II).