using Plots
L = 1.0; nx = 101; α = 0.01; T = 5.0
dx = L / (nx - 1)
dt = 0.4 * dx^2 / α # just under the CFL stability limit
nt = ceil(Int, T / dt)
x = range(0, L, length=nx)
u = exp.(-200 .* (x .- 0.75).^2)
u[1] = 0.25; u[nx] = 0.25
u_new = similar(u)
for n in 1:nt
for i in 2:(nx-1)
u_new[i] = u[i] + α * dt / dx^2 * (u[i+1] - 2u[i] + u[i-1])
end
u_new[1] = 0.25; u_new[nx] = 0.25
u, u_new = u_new, u
end
plot(x, u, label="t = $T", xlabel="x", ylabel="u(x, t)", lw=2)Unit 6: PDE Bootcamp
Formal foundations of PDEs: classification (elliptic, parabolic, hyperbolic), boundary and initial conditions, well-posedness. The three canonical mechanisms — diffusion, advection, reaction — appear as the building blocks for everything that follows. The unit closes with finite-difference numerics and the linearised shallow water equations as a worked nonlinear example we revisit in the capstone.
6.1 PDE Classification and Well-Posedness
We work in physical space-time throughout. Spatial coordinates are x, y, z; time is t. The unknown u(x, y, z, t) stands in for whatever physical field a given equation models — temperature, pressure, concentration, displacement, electric potential. The Laplacian is
\nabla^2 u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2};
in 1D it reduces to u_{xx}, in 2D to u_{xx} + u_{yy}. Vector fields (velocity \mathbf{u}(x, y, z, t), etc.) appear in the systems at the bottom of the table below.
Common PDE Models at a Glance
A map of the PDEs we’ll meet, grouped by character. Most of classical physics lives in this table; the rest of the unit is about how to read, classify, and solve them.
| Name | Equation | Type | Typical application |
|---|---|---|---|
| Laplace | \nabla^2 u = 0 | Elliptic | Steady-state temperature, electrostatic potential |
| Poisson | \nabla^2 u = f | Elliptic | Potential with sources (gravity, charge) |
| Helmholtz | \nabla^2 u + k^2 u = 0 | Elliptic | Standing waves, frequency-domain wave problems |
| Heat / diffusion | u_t = \alpha\,\nabla^2 u | Parabolic | Heat conduction, mass diffusion |
| Reaction–diffusion | u_t = \alpha\,\nabla^2 u + f(u) | Parabolic (nonlinear) | Pattern formation, population dynamics |
| Advection | u_t + c\,\partial_x u = 0 | Hyperbolic (1st order) | Pure transport in a flow |
| Wave | u_{tt} = c^2\,\nabla^2 u | Hyperbolic | Vibrations, sound, electromagnetic waves |
| Advection–diffusion | u_t + c\,\partial_x u = \alpha\,u_{xx} | Mixed (parabolic + 1st-order hyperbolic) | Pollutant or heat in a flow |
| Burgers | u_t + u\,u_x = \nu\,u_{xx} | Nonlinear | Shock formation; simple fluid model |
| Shallow water | \eta_t + H\,\nabla\!\cdot\!\mathbf{u} = 0,\;\;\mathbf{u}_t + g\,\nabla\eta = 0 | Hyperbolic system | Ocean and atmospheric flow (revisited in §6.8) |
Order, Linearity, and Type
A partial differential equation (PDE) is a relationship between an unknown function of several variables and its partial derivatives. In general form,
F\!\left(x_1, \ldots, x_n,\, u,\, \frac{\partial u}{\partial x_1}, \ldots, \frac{\partial^2 u}{\partial x_1^2}, \ldots\right) = 0.
The order of a PDE is the highest derivative that appears. We focus on second-order PDEs in two variables — they cover most of the physics relevant to PINNs.
Elliptic, Parabolic, Hyperbolic
The general second-order linear PDE in two variables takes the form
a\,u_{xx} + 2b\,u_{xy} + c\,u_{yy} + \text{lower-order terms} = 0,
with coefficients that may depend on position. The character of solutions depends entirely on the discriminant \Delta = b^2 - ac:
| Type | Condition | Behaviour | Canonical example |
|---|---|---|---|
| Elliptic | b^2 - ac < 0 | Steady-state, smooth | Laplace: u_{xx} + u_{yy} = 0 |
| Parabolic | b^2 - ac = 0 | Diffusion, smoothing | Heat: u_t = \alpha\,u_{xx} |
| Hyperbolic | b^2 - ac > 0 | Wave propagation | Wave: u_{tt} = c^2\,u_{xx} |
Classification matters because it determines what boundary/initial conditions are needed for well-posedness, how information propagates, and which numerical methods are appropriate.
Boundary and Initial Conditions
The four common boundary-condition types:
- Dirichlet: prescribe the value of u on the boundary (“the wall is held at temperature T_{\text{wall}}”).
- Neumann: prescribe the normal derivative \partial_n u (“heat flux through the wall is q”).
- Robin (mixed): a linear combination \alpha u + \beta\,\partial_n u = g. Common when the boundary exchanges with an external medium.
- Periodic: u(x_{\text{left}}) = u(x_{\text{right}}) — used to model infinite domains by tiling.
For time-dependent problems, an initial condition u(\mathbf{x}, 0) = u_0(\mathbf{x}) fixes the state at t = 0. Parabolic and hyperbolic PDEs need ICs; elliptic problems do not (they’re steady-state).
Well-Posedness
Hadamard’s three criteria. A problem is well-posed if (1) a solution exists, (2) it is unique, (3) it depends continuously on the data (initial conditions, boundary conditions, source terms). Mismatching boundary conditions to PDE type produces ill-posed problems — for example, prescribing both u and \partial_n u on the same boundary of an elliptic equation generally has no solution. PINNs inherit this constraint: an ill-posed problem will train to a residual minimum that is not the (non-existent) classical solution.
Strong and weak solutions
A strong (or classical) solution of a PDE has as many classical derivatives as the equation needs, and satisfies the equation pointwise at every point of the domain. The heat equation u_t = \alpha u_{xx} needs u_t and u_{xx} to exist and be continuous; if you have a u in C^{2,1} that satisfies the equation at every (x, t), that’s a strong solution.
Most physical PDEs don’t admit strong solutions in general. As soon as you have shocks (Burgers, the compressible Euler equations), corners in the domain (an L-shaped boundary for Laplace), or rough initial data (a step function for the heat equation), the requisite pointwise derivatives don’t exist everywhere. The way out is to relax what “solving” means.
A weak (or generalised) solution satisfies the PDE on average against every test function. Concretely, multiply u_t = \alpha u_{xx} by a smooth compactly-supported test function v(x, t), integrate by parts to push the spatial derivative onto v, and demand the resulting integral identity holds for every v:
\int_\Omega \int_0^T u\, v_t \, dt\, dx \;=\; \alpha \int_\Omega \int_0^T u_x\, v_x \, dt\, dx \qquad \forall v.
A weak solution only needs u to have one spatial derivative in an L^2 sense — much weaker than C^{2}. Any strong solution is automatically a weak one; the reverse is not true.
Why this matters here:
- The finite-element method (§6.6) is built directly on the weak form — it discretises the integral identity, not the pointwise PDE.
- PINNs sit somewhere in between. They enforce the residual pointwise at collocation points (strong-form-ish), but the loss aggregates over many points, which behaves like a weak form. This is one reason PINNs handle moderately rough solutions better than naive finite differences but still struggle with genuine shocks.
We won’t develop the weak / Sobolev theory rigorously — that’s a graduate course in itself. But the distinction is worth knowing about every time someone tells you “this PDE has a solution”: in what sense?
6.2 Explicit Solution Techniques
Before we approximate a PDE numerically (let alone with a neural network), it pays to know which problems admit a closed-form solution and how those solutions are derived. The classical techniques aren’t museum pieces: they’re the source of the correct answers against which every numerical scheme — finite differences, finite volumes, finite elements, PINNs — is benchmarked. The next three subsubsections walk the three main tools: separation of variables, Fourier series, and Laplace transforms. Each is presented at a level that yields one concrete worked solution.
Separation of variables: heat equation on a rod
The cleanest example. Take the heat equation on a finite rod of length L with both ends held at zero:
u_t \;=\; \alpha\, u_{xx}, \qquad 0 < x < L,\ t > 0,
with Dirichlet boundaries u(0, t) = u(L, t) = 0 and initial profile u(x, 0) = f(x).
Step 1 — assume a separated form u(x, t) = X(x)\,T(t). Substitute:
X(x)\,T'(t) = \alpha\, X''(x)\, T(t) \quad\Longleftrightarrow\quad \frac{T'(t)}{\alpha\, T(t)} = \frac{X''(x)}{X(x)} = -\lambda,
where the constant -\lambda (the separation constant) must hold because the left side depends only on t, the right only on x.
Step 2 — solve each ODE. The spatial eigenproblem X'' + \lambda X = 0 with X(0) = X(L) = 0 has solutions only when \lambda = \lambda_n = (n\pi/L)^2 for n = 1, 2, 3, \ldots, namely X_n(x) = \sin(n\pi x / L). The time equation T' = -\alpha\lambda_n T has T_n(t) = e^{-\alpha\lambda_n t}.
Step 3 — superpose. Linearity of the PDE means any sum u(x, t) = \sum_{n=1}^{\infty} b_n \sin(n\pi x/L)\, e^{-\alpha (n\pi/L)^2 t} is also a solution. The initial condition fixes the coefficients:
f(x) \;=\; \sum_{n=1}^{\infty} b_n \sin(n\pi x / L) \qquad\Longrightarrow\qquad b_n \;=\; \frac{2}{L}\int_0^L f(x)\,\sin(n\pi x / L)\, dx.
The complete solution is
\boxed{\; u(x, t) \;=\; \sum_{n=1}^{\infty} b_n\, e^{-\alpha (n\pi/L)^2 t}\, \sin\!\bigl(n\pi x / L\bigr). \;}
Three observations that recur all the way through the rest of the course:
- High-frequency modes decay fastest. The mode n decays like e^{-\alpha n^2 \pi^2 t / L^2} — a factor of n^2 in the exponent. This is why diffusion smooths: sharp features (high n) die quickly; the long-wavelength shape survives. It is also why MLP-PINNs (Unit 5 §5.6) struggle to fit sharp initial data — the loss surface is dominated by the slow low-frequency modes (a different reason, but a related one called spectral bias).
- Linearity of the PDE → superposition of eigenmodes. Nonlinear PDEs (Burgers, Navier–Stokes, the full shallow-water equations) don’t admit this decomposition. That’s the structural reason they’re harder.
- The eigenmodes are determined by the boundary conditions. Dirichlet boundaries gave us \sin(n\pi x / L). Neumann boundaries would give \cos; periodic would give complex exponentials. Everything downstream cascades from the BC choice.
Fourier series: the eigenmode toolbox
The expansion f(x) = \sum_n b_n \sin(n\pi x / L) from Step 3 is a Fourier sine series. More generally, a function on [-L, L] has a full Fourier series
f(x) \;=\; \frac{a_0}{2} + \sum_{n=1}^{\infty}\Bigl[a_n \cos\!\bigl(n\pi x / L\bigr) + b_n \sin\!\bigl(n\pi x / L\bigr)\Bigr], \qquad \begin{aligned} a_n &= \tfrac{1}{L} \int_{-L}^{L} f(x)\cos(n\pi x/L)\, dx, \\ b_n &= \tfrac{1}{L} \int_{-L}^{L} f(x)\sin(n\pi x/L)\, dx. \end{aligned}
The Fourier coefficients are the coordinates of f in the orthonormal basis of trig functions — the same way you’d write a vector in \mathbb{R}^n in terms of basis vectors. Three uses that turn up across the rest of the workshop:
- Solve linear PDEs by mode-by-mode evolution. Take the Fourier transform of the initial condition, multiply each mode by its time-dependent factor (exponential decay for heat, sinusoidal oscillation for wave), then inverse-transform. This is the basis of spectral methods.
- Diagnose numerical schemes. Von Neumann stability analysis (covered in §6.4) studies how an FD scheme amplifies each Fourier mode. The CFL condition falls out automatically.
- Diagnose PINN failure modes. The “spectral bias” of MLPs (slow learning of high-frequency components, Rahaman et al.
- and the Fourier-feature embedding fix (Tancik et al. 2020; details in Unit 7) are stated and understood in the Fourier basis.
Laplace transforms: turning ODEs into algebra
For initial-value problems (parabolic and hyperbolic PDEs in time), the Laplace transform \mathcal{L}\{f(t)\}(s) = \int_0^\infty f(t)\,e^{-st}\, dt converts time derivatives into multiplication:
\mathcal{L}\{f'(t)\}(s) \;=\; s\,F(s) - f(0), \qquad \mathcal{L}\{f''(t)\}(s) \;=\; s^2\,F(s) - s\, f(0) - f'(0).
Apply it to a PDE in t and you get an ODE in the remaining spatial variables, parameterised by s. Solve the spatial ODE, inverse-transform back to t.
The canonical worked example is the 1-D heat equation on the half-line x > 0 with a suddenly imposed boundary temperature u(0, t) = u_0 and u(x, 0) = 0. Laplace-transforming in t:
s\, U(x, s) \;=\; \alpha\, U_{xx}(x, s), \quad U(0, s) = u_0 / s, \quad U(x, s) \to 0 \text{ as } x \to \infty.
A second-order ODE in x with constant coefficients. Solving gives U(x, s) = (u_0/s)\, e^{-x\sqrt{s/\alpha}}, and the inverse Laplace transform yields the famous closed-form:
u(x, t) \;=\; u_0 \,\operatorname{erfc}\!\left(\frac{x}{2\sqrt{\alpha t}}\right).
The pattern — transform → solve algebraic / lower-order problem → inverse-transform — is exactly what the Fourier method does in the spatial direction. They’re two faces of the same idea: expand in eigenfunctions of the differential operator.
When closed forms run out
Almost everything in this course fails one or more of the conditions for these methods to work:
- Nonlinearity kills separation of variables and superposition (Burgers, Navier–Stokes, the shallow-water equations as soon as you re-introduce the advective \mathbf{u}\!\cdot\!\nabla\mathbf{u} term).
- Variable coefficients (\alpha = \alpha(x), H = H(x, y) as in Unit 1’s bay) prevent the spatial eigenproblem from having sinusoidal eigenfunctions.
- Irregular geometries (the Moreton Bay coastline, anything with a hole in it) defeat any analytic eigenbasis.
That’s why the rest of this unit — and the rest of the course — is about numerical methods. But the classical methods give us the test problems against which every numerical method is judged. The §6.4 FD code below converges to the separation-of-variables solution; the SWE PINN in Unit 7 is benchmarked against a high-resolution FD reference; the column PINN in Unit 9 starts its sanity-checks on a constant-\alpha problem with a known erfc solution.
6.3 The Five Canonical PDEs
Most of classical mathematical physics rests on five equations. They reappear (sometimes with changed names, always with the same structure) in heat conduction, electromagnetism, quantum mechanics, fluid flow, finance, biology, traffic. This section takes each in turn and tells you what it physically describes, not just what it looks like.
The Heat (diffusion) equation — smoothing
\frac{\partial u}{\partial t} \;=\; \alpha\,\nabla^2 u.
Parabolic. Time-dependent. The fundamental “things smear out” equation. Imagine a rod with a hot spot in the middle; heat diffuses outward until the temperature equilibrates.
Why second-derivative-in-space? The second derivative is curvature. At a hill in the profile, u_{xx} < 0, so u_t < 0 — the hill drops. At a valley, u_{xx} > 0, the valley fills. Curvature drives change, and the curvature itself relaxes as the profile flattens. That single line of reasoning explains why the heat equation:
- Smooths fast. Sharp features have huge curvature → decay fast. Smooth features have low curvature → decay slowly.
- Has infinite propagation speed. A localised hot spot instantly raises the temperature everywhere (by an exponentially small amount). Not physically realistic at relativistic scales, but accurate for everyday heat / mass diffusion.
- Loses information. The smoothing is irreversible — starting from u(x, T) you cannot in general recover u(x, 0). This makes the backward heat equation catastrophically ill-posed and a classic test case for regularisation theory.
Where it shows up beyond temperature. Chemical diffusion (Fick’s law), Brownian motion (the Fokker–Planck equation in its simplest form), option pricing (Black–Scholes is heat- equation-shaped), ocean / atmosphere turbulent mixing (Unit 8 §8.3 with \alpha = \kappa). Same equation, different units.
The Laplace equation — steady-state equilibrium
\nabla^2 u \;=\; 0.
Elliptic. No time variable. What happens when the heat equation has been running forever and everything has equilibrated: u_t = 0 means \nabla^2 u = 0. The Laplace equation describes steady states of diffusive processes.
The physical content is the mean-value property: the solution at any interior point equals the average of its neighbours on any small sphere around it. No interior maxima or minima — the extrema live on the boundary (maximum principle). This is why Laplace solutions look “stretched-membrane smooth”: rubber sheet pinned at the boundary, no source forces in the interior.
Where it shows up.
- Steady-state heat conduction (no sources, no time).
- Electrostatic potential in a charge-free region.
- Incompressible irrotational flow (the velocity potential).
- Conformal mapping in complex analysis (harmonic functions are the real parts of holomorphic functions).
- Steady-state diffusion of any chemical species.
The Poisson equation — Laplace with a source
\nabla^2 u \;=\; -f(\mathbf{x}).
Elliptic. Identical to Laplace except for the right-hand-side source f. The source physically deposits the quantity (charge, heat, mass) that the diffusion-and-equilibrium process then spreads out. Mean-value-property still holds up to a correction proportional to the source intensity.
Where it shows up.
- Electrostatics with charges: \nabla^2 \varphi = -\rho / \varepsilon_0.
- Gravitational potential with masses: \nabla^2 \Phi = 4\pi G\, \rho.
- Steady-state heat conduction with internal heat sources.
- Stream function ↔︎ vorticity in 2-D incompressible flow: \nabla^2 \psi = -\omega.
The Laplace / Poisson pair is the workhorse of “what’s the steady-state response to this source distribution?”
The Wave equation — propagation
\frac{\partial^2 u}{\partial t^2} \;=\; c^2\,\nabla^2 u.
Hyperbolic. Time-dependent, but unlike heat: time and space play similar roles. This is the equation for anything that oscillates and propagates: vibrating strings, sound waves, electromagnetic waves, water waves (linearised), seismic waves.
Why second-derivative-in-time? Newton’s F = ma — the acceleration u_{tt} of a string element is proportional to the net force, which is proportional to the curvature u_{xx} of the string (the spatial second derivative measures how much the neighbours pull this point up or down). The constant c^2 — the wave speed squared — is the ratio of restoring tension to inertia.
Three consequences that distinguish the wave equation from the heat equation:
- Finite propagation speed c. Information travels along characteristics x \pm c\,t = \text{const}; the “light cone” in 1+1 dimensions. A disturbance at (x_0, 0) cannot affect (x_1, t) unless |x_1 - x_0| \leq c\,t.
- No smoothing. A sharp initial profile stays sharp forever (in fact it splits into a left-moving and a right-moving copy — d’Alembert’s formula).
- Energy is conserved. \frac{d}{dt} \int (u_t^2 + c^2 u_x^2)\, dx = 0 — the wave equation is Hamiltonian. Numerical schemes that don’t respect this conservation drift over long simulations.
Where it shows up.
- Acoustics: u = pressure perturbation, c = sound speed.
- Optics / electromagnetism: each component of Maxwell’s electric and magnetic fields satisfies a wave equation in vacuum.
- Seismology: P-waves and S-waves are wave-equation solutions in elastic media.
- The Unit 1 shallow-water surge — once we drop the drag term, what’s left is exactly a 2-D wave equation \eta_{tt} = \nabla\!\cdot\!(gH\,\nabla\eta) for the surface elevation (see §6.8).
The Advection equation — pure transport
\frac{\partial u}{\partial t} \;+\; c\,\frac{\partial u}{\partial x} \;=\; 0.
Hyperbolic, first-order. The simplest possible PDE: “a thing moves at speed c without changing shape”. The exact solution is u(x, t) = u_0(x - c t) — the initial profile sliding rigidly.
Information travels along characteristics. The lines x - c t = \text{const} in space-time are the trajectories of the solution’s “DNA” — the value of u on a characteristic is the same as at t = 0 at the point that characteristic came from. This is the cleanest case of the method of characteristics for first-order PDEs.
Three useful facts:
- No smoothing, no diffusion. A step function stays a step function forever. Real fluid models have a small diffusive term that gradually rounds the corners — Burgers’ equation u_t + u u_x = \nu u_{xx} adds nonlinearity and diffusion; in the inviscid limit it forms shocks.
- CFL constraint in numerics is sharp: |c|\,\Delta t / \Delta x \leq 1 for first-order upwind. Information from any cell must be able to reach the next cell within one time step.
- The advective u\,u_x term in Navier–Stokes / shallow water is what makes those equations nonlinear — and what makes them genuinely hard. Linearised models (including the Unit 1 shallow-water solver) drop this term and become tractable as wave equations.
Where it shows up. Anywhere stuff is transported: tracer dye in a river, contaminant plume in groundwater, traffic density on a freeway, smoke advected by wind, sea-surface temperature carried by ocean currents.
Combining mechanisms — reaction-diffusion and friends
Real models stack diffusion, advection, and pointwise reaction (a source term f(u) that depends on u itself — not just on \mathbf{x}):
| Equation | Mechanisms | Example application |
|---|---|---|
| u_t = \alpha\,u_{xx} | Diffusion | Heat conduction |
| u_t + c\,u_x = 0 | Advection | Pollutant in a river |
| u_t + c\,u_x = \alpha\,u_{xx} | Advection + diffusion | Nutrient dispersal in a flow |
| u_t = \alpha\,u_{xx} + f(u) | Reaction-diffusion | Coral larval settlement; FitzHugh-Nagumo neuron; Turing patterns |
| u_{tt} = c^2\,\nabla^2 u - \gamma u_t | Wave with damping | Vibrating string with friction |
Each row inherits the qualitative behaviour of the parent mechanisms — diffusion smooths, advection translates, reaction sources/sinks. The mixed cases are where the modelling judgement lives: which terms to keep, which to drop, which constants to estimate. That’s the choice the capstone column PDE Unit 8 §8.2 makes for the oceanographic setting.
6.4 Finite Differences and Stability Analysis
The classical workhorse — and the method actually used by Unit 1’s shallow-water solver. The next three subsubsections cover the discretisation rules, the stability constraint they impose (CFL, von Neumann), and a worked benchmark against the separation-of-variables answer.
Discretisation in Space and Time
Replace continuous derivatives with algebraic approximations on a grid:
- Forward time difference: \partial_t u \approx (u_i^{n+1} - u_i^n)/\Delta t
- Central second-derivative: \partial_x^2 u \approx (u_{i+1} - 2u_i + u_{i-1})/\Delta x^2
The forward-time, central-space (FTCS) scheme is the simplest explicit approach for the heat equation.
CFL, Stability, Numerical Diffusion
The Courant–Friedrichs–Lewy (CFL) condition ensures the numerical domain of dependence contains the physical one — i.e., information can’t propagate faster in the physics than the grid can pass it along:
- Diffusion: \alpha\,\Delta t / \Delta x^2 \leq 0.5
- Advection / wave: |c|\,\Delta t / \Delta x \leq 1
Violating CFL causes exponential blow-up. Where does the CFL number come from? Von Neumann analysis: substitute a discrete Fourier mode u_i^n = G^n\, e^{i k x_i} into the scheme, solve for the amplification factor |G(k)|, and demand |G(k)| \leq 1 for every k. Working through this for FTCS on the heat equation yields exactly the \alpha\,\Delta t / \Delta x^2 \leq 1/2 bound above. The Fourier-mode analysis from §6.2 is the analytic tool that makes the numerical stability constraint predictable.
Two other artefacts of low-order schemes to be aware of:
- Numerical diffusion. First-order upwind for advection smears sharp features even at stable step sizes. The “diffusion” is real (it shows up in the modified equation) but artificial. Higher-order schemes (Lax–Wendroff, WENO, discontinuous Galerkin) reduce it at the cost of new artefacts (oscillations near shocks).
- Numerical dispersion. Different Fourier modes travel at slightly different speeds in the discrete scheme even if the physics is non-dispersive. Long-time simulations of wave-like problems accumulate phase errors. Spectral methods don’t suffer this; FD does.
Worked Example: Heat Equation vs the Exact Solution
A useful self-test for any FD scheme: pose the heat equation with the same Dirichlet BCs as the separation-of-variables solution from §6.2 (homogeneous Dirichlet, u(x, 0) = \sin(\pi x / L), L = 1), run the FTCS scheme above, and compare against the closed form u(x, t) = \sin(\pi x)\, e^{-\alpha \pi^2 t}. For \alpha = 0.01, \Delta x = 0.01, \Delta t = 0.4\,\Delta x^2 / \alpha, the FTCS solution agrees with the exact answer to \sim 10^{-3} over t \in [0, 5] — better than the rounded plot you’d see by eye. The script in §6.4 above is one line away from this benchmark (swap the IC).
6.5 Finite Volumes
Finite differences (§6.4) discretise the equation — replace derivatives with neighbour differences. Finite volumes (FV) discretise the conservation statement — for a chosen control volume V, the rate of change of \int_V u\,dV equals the net flux through \partial V. Two consequences:
- Local conservation is exact. A flux that leaves cell i must enter cell i \pm 1 because they share the face. Mass / momentum / energy can’t drift, by construction. FD doesn’t have this property — small errors can accumulate over long runs.
- The mesh can be unstructured. Control volumes are just polygons or polyhedra; you don’t need a Cartesian grid. This makes FV the obvious choice for irregular domains (river networks, coastlines, blood vessels).
The 1-D conservation-law example
For a generic 1-D conservation law u_t + f(u)_x = 0, divide the x-axis into cells of width \Delta x centred at x_i and define \bar u_i^n \approx \frac{1}{\Delta x}\int_{x_{i-1/2}}^{x_{i+1/2}} u(x, t_n)\,dx — the cell average. Integrating the PDE over the cell:
\frac{\bar u_i^{n+1} - \bar u_i^n}{\Delta t} \;=\; -\,\frac{F_{i+1/2}^n - F_{i-1/2}^n}{\Delta x},
where F_{i\pm 1/2} are the numerical fluxes at the cell faces — typically computed from a Riemann solver in compressible flow, or a simple upwind / Lax–Friedrichs formula for scalar advection. Choice of numerical flux is what distinguishes the zoo of FV schemes (Godunov, Roe, HLL, HLLC, AUSM, etc.).
The 2-D shallow-water solver Unit 1 uses is technically a finite-difference scheme on a staggered grid, but it can be re-derived as a finite-volume scheme on the same staggered cells — the bookkeeping is more natural in FV terms (the mass-continuity equation literally is “flux in minus flux out”).
When you reach for finite volumes
- Conservation laws (Euler, shallow water, Maxwell in conservation form, magnetohydrodynamics).
- Shocks and discontinuities — FV with a TVD limiter handles them without spurious oscillations; FD struggles.
- Unstructured grids around complex geometry.
Production tools that use FV: OpenFOAM, ANSYS Fluent, Star-CCM+, ClimateMachine.jl, the Trixi.jl Julia stack.
6.6 Finite Elements
Finite elements (FE) approach a PDE the way physics undergrads think about it: not as “set derivatives equal to algebraic expressions on a grid” but as “find a function in a chosen function space that satisfies the equation in a weak sense”. The three ideas:
- Weak (variational) formulation. Multiply the PDE by a smooth test function v(\mathbf{x}) and integrate by parts. Where FD/FV ask the PDE to hold pointwise (at grid centres or cell averages), FE asks it to hold on average against every test function in some space. For the Poisson problem -\nabla^2 u = f on \Omega with u = 0 on \partial\Omega, integrating by parts gives the weak form \int_\Omega \nabla u \cdot \nabla v\,d\mathbf{x} \;=\; \int_\Omega f\, v\, d\mathbf{x} \qquad \forall v \in H^1_0(\Omega).
- Galerkin discretisation. Pick a finite-dimensional subspace V_h \subset H^1_0(\Omega) with basis \{\phi_1, \ldots, \phi_N\} — typically piecewise polynomials on a triangulation of \Omega. Look for u_h = \sum_j u_j \phi_j that satisfies the weak form for every v = \phi_i. This collapses to a linear system: \underbrace{\Bigl[\int_\Omega \nabla\phi_i \cdot \nabla \phi_j\,d\mathbf{x}\Bigr]}_{K_{ij}\text{ — stiffness matrix}} \cdot \mathbf{u} \;=\; \underbrace{\Bigl[\int_\Omega f\,\phi_i\,d\mathbf{x}\Bigr]}_{b_i\text{ — load vector}}.
- Assembly. Compute K and b by looping over elements (the triangles), evaluating local contributions, and adding them in. Solve K\mathbf{u} = b with a direct or iterative solver. Done.
Why anyone bothers
- Arbitrary geometry. Triangulate any shape — coastlines, airframes, organs — to as much detail as you want; FE handles it natively. FD’s regular grid can’t.
- Higher-order accuracy on smooth solutions. Cubic or higher basis polynomials give O(\Delta x^p) accuracy with p the polynomial degree — FD/FV typically cap at second order in practice.
- Rigorous error theory. The standard FE convergence proofs (Céa’s lemma, Galerkin orthogonality, Bramble–Hilbert) give you a-priori and a-posteriori error bounds that justify adaptive mesh refinement.
When you reach for finite elements
- Elliptic problems on irregular domains (structural mechanics, electrostatics, steady heat conduction).
- Coupled multiphysics where you need the same mesh to carry multiple fields with different orders (Stokes flow with Taylor-Hood, electromagnetics with Nédélec elements).
- Problems where guaranteed conservation across a complex interface matters and FV’s cell-by-cell bookkeeping is too rigid — e.g. mixed/hybrid FE for porous-media flow.
Production tools that use FE: COMSOL, ANSYS Mechanical / Mapdl, Abaqus, FEniCS, deal.II, Gridap.jl, Ferrite.jl.
6.7 FD vs FV vs FE — when to reach for which
A quick comparison table. None of these is “best” — they pay for their strengths with different weaknesses, and the right choice is dictated by the problem.
| Finite Differences | Finite Volumes | Finite Elements | |
|---|---|---|---|
| Discretises | the PDE (pointwise) | the conservation statement (cell averages) | the weak form (against test functions) |
| Mesh | structured Cartesian | structured or unstructured | unstructured, complex geometry |
| Conservation | not automatic | exact, by construction | weak (needs special elements) |
| Best for | smooth solutions on simple geometry, PINN-style residuals | conservation laws, shocks, fluid dynamics | elliptic problems, complex geometry, multiphysics |
| Worst for | irregular geometry, shocks | mathematical analysis, very high order | hyperbolic problems with shocks |
| Typical tools | hand-rolled, Trixi.jl | OpenFOAM, Fluent, Trixi.jl | COMSOL, FEniCS, Gridap.jl, Ferrite.jl |
Where PINNs sit
A PINN is meshless: it parameterises the solution as a network u_\theta(\mathbf{x}, t) and asks for the PDE residual to be small at scattered collocation points. The most natural comparison is to FD — both discretise the PDE pointwise — but PINNs trade structured-grid efficiency for autodiff-exact derivatives at arbitrary coordinates, no mesh, and a differentiable forward map (the key enabler for Unit 7 inverse problems). PINNs are not yet competitive with FV for shocks (smooth-MLP ansatz rounds them off) or with FE for complex elliptic problems (no provable error bounds), so the classical methods aren’t going anywhere.
6.8 Revisiting Unit 1: the shallow-water solver in detail
Unit 1 showed a complete linearised shallow-water solver as a black box: input ψ(t), output gauge timeseries plus snapshots of \eta. Now that we have classical PDE language, we can unpack what that solver does, why each ingredient was needed, and what kinds of numerical issues necessarily came up.
The system, in PDE language
The 2-D linearised shallow-water equations Unit 1 used are
\underbrace{\eta_t + \nabla\!\cdot\!(H\,\mathbf{u}) = 0}_{\text{continuity}}, \qquad \underbrace{\mathbf{u}_t + g\,\nabla\eta = -b\,\mathbf{u}}_{\text{momentum}}.
By the §6.1 classification this is a hyperbolic system — the characteristic speeds are \pm \sqrt{g\,H}, propagating information through the bay at \sim 8–15 m/s depending on depth. Two implications follow immediately:
- Both ICs and BCs are needed. Hyperbolic systems propagate information along characteristics, so a well-posed problem needs both an initial state (provided: \eta = 0, \mathbf{u} = 0 everywhere at t = 0) and boundary data on the inflow characteristics (provided: the Dirichlet ψ(t) at the river mouth, no-flux at solid coasts).
- CFL is set by the fastest wave. \Delta t \leq \Delta x / \max\sqrt{g\,H}. Unit 1 uses \Delta x = 500 m and max depth ~50 m → c_{\max} \approx 22 m/s, so \Delta t \lesssim 22.7 s. The chosen \Delta t = 12 s sits comfortably inside that bound.
The staggered Arakawa-C grid
Unit 1’s solver lives on an Arakawa-C grid: \eta at cell centres, u at vertical cell faces, v at horizontal cell faces. The reason matters. On a co-located grid — all variables at cell centres — central differences of \nabla \eta in the momentum equation skip the immediate neighbours, producing a checkerboard mode that the scheme cannot damp. The staggered layout makes \nabla \eta in the momentum equation use adjacent face-to-face differences, killing the spurious mode at source. This is one of the oldest fixes in geophysical fluid dynamics (Arakawa & Lamb 1977) and it’s why every operational ocean / atmosphere model uses some flavour of staggered grid.
The eastern sponge layer
A finite computational domain has hard boundaries the real ocean doesn’t. Without intervention, a surge wave reaching the eastern edge would reflect back into the bay, contaminating the gauge signal. Unit 1 fixes this with a Rayleigh-damped sponge: a strip of cells where an extra term -\sigma(x)\,\eta is added to the continuity equation, with \sigma ramping smoothly from 0 inside the domain to a large value at the edge.
In PDE terms the sponge approximates a radiation (open) boundary condition — outgoing energy leaves, incoming energy is zero. Doing this exactly for a multi-frequency wave equation is hard (perfectly-matched layers, PMLs, are the modern best-practice solution). The Rayleigh sponge is the cheap and cheerful approximation that nonetheless gets you 99% of the way there.
What can still go wrong
Even with a staggered grid + sponge + correct CFL, a hyperbolic FD solver has a few characteristic failure modes worth flagging:
- Numerical dispersion of short-wavelength surges. The 500 m grid resolves features down to maybe 1–2 km; anything sharper gets travel-speed errors. Unit 1’s two-pulse surge is broad enough that this is invisible — a sharp injected impulse would show frequency-dependent travel times.
- Numerical diffusion from the upwind treatment of any remaining advective terms. Not an issue in Unit 1’s linearised setting (no advection), but the moment we add a real surge current the standard 1st-order upwind would smear it.
- Reflection from imperfect sponges. The Rayleigh sponge has a small impedance mismatch with the inner domain; ~1% of the incoming wave reflects. Long-time simulations or noise-sensitive inverse problems will see it. PMLs help but cost more.
Why a PINN at all?
Given that the FD solver in Unit 1 works fine for the forward problem, why would we ever reach for a PINN? Three reasons, foreshadowing Units 7 / 9:
- Inverse problems. The FD forward solve is the easy part. Inverting backward — given gauge data, recover the source ψ(t) — needs gradients of the forward map with respect to the inputs. PINNs have those by construction (they’re trained by gradient descent through autodiff); the FD solver needs the adjoint-method machinery of Unit 4 §4.2.
- Continuous-resolution inference. The FD solution lives on a fixed 500 m grid. A trained PINN can evaluate \eta(x, y, t) at any continuous coordinate — useful for upscaling, sub-grid interpolation, or fitting irregular gauge layouts.
- Sparse-data regimes. When the data are sparse enough that the FD inverse problem is severely ill-posed, the PINN’s smooth-function prior can stabilise the reconstruction in a way the FD adjoint cannot.
The three reasons above are not “PINNs are faster” or “PINNs are more accurate” — for the Unit 1 forward problem they’re neither. They’re: PINNs offer a different kind of access to the solution. Unit 7 makes that access actually work in practice.