Project Solution

Published

12/06/2026

This unit is the worked solution to the capstone spec laid out in Unit 9. The science behind the model sits in Unit 8. Here we cover three things in this order:

Each solution sits behind its own triple-click gate, so you can read the shared infrastructure (and one task’s solution, if you want) without spoiling the other. We strongly recommend you attempt each task yourself first — the learning is in working through the inverse problem, not in reading our answer.

10.1 Shared infrastructure

The finite-difference reference solver, the four toy scenarios from Unit 9 §9.11, and the SWE driver — all needed regardless of whether you’re solving Task A or Task B.

Finite-difference reference solver

A method-of-lines finite-difference reference for T(z, t) — the ground truth we compare PINN predictions against. Uses MethodOfLines.jl with adaptive time-stepping; second-order central differences in z, with the surface Neumann condition imposed via a ghost cell. The full source listing is in §10.6; below we work through the four scenarios from Unit 9 §9.11 of increasing complexity.

Scenario 1 — pure diffusion to steady state

w = 0, \mathcal{S} = 0, \kappa constant, Q_{\text{np}} constant. After ~365 days the column reaches a near-steady state.

Scenario 1 output — scenario_1() from scripts/column_fd.jl. MOL final profile matches the analytic steady state to ≈ 4 × 10⁻³ °C.

Scenario 2 — diurnal cycle

Adds Q_{\text{SW}}(t) (smooth half-cosine, mean 400 W/m²) and the Beer–Lambert body source \mathcal{S}.

Scenario 2 output — top-10 m heatmap with subtle diurnal bands superimposed on the slow relaxation toward the scenario-1 steady state.

Scenario 3 — upwelling

Adds the w(z) = w_0 \sin(\pi(z+H)/H) profile with w_0 = 10^{-5}\,\text{m/s}.

Scenario 3 output — five-mooring traces (top) and full (z, t) heatmap (bottom). SST drops 28 → 22 °C; mid-column warms via diffusion (Pe ≈ 1).

Scenario 4 — storm fingerprint (prescribed gust)

Gaussian gust at t_{\text{storm}} = 10 days, \sigma = 1 day, modulating both w_0 (×5 at peak) and Q_{\text{SW}}^{\max} (×0.5 at peak via cloud cover). Real SWE-driven forcings will replace these envelopes in scenario 5 (the synthetic scenario, forward).

Scenario 4 output — scenario_4() uses prescribed Gaussian envelopes for the gust-driven upwelling and cloud cover, pending the SWE driver. The storm bump is visible around day 10.

Shallow-water driver

A staggered-grid finite-volume SWE reference on a 100 × 100 km patch of the central GBR shelf, with constant background depth H = 80\,\text{m}, the local Coriolis parameter f = 2\Omega\sin(-19°) \approx -4.7 \times 10^{-5}\,\text{s}^{-1}, and a spatially-Gaussian wind-stress patch driving the storm:

\tau(x, y, t) \;=\; \tau_0(t)\,\exp\!\left[-\,\frac{(x - x_0)^2 + (y - y_0)^2}{2\,L_\tau^2}\right],

with L_\tau = 30\,\text{km}, (x_0, y_0) tracking a slow-moving gust centre (10 km/h drift over 3 days), and \tau_0(t) the Gaussian-envelope time profile that becomes the unknown in the inverse problem. The forward solve runs Trixi.jl-style FV on a 200 \times 200 grid with \Delta t = 30\,\text{s} for the 30-day window — about 1.5 minutes on a laptop.

What gets piped to the column model: \mathbf{u}_h(t) at each of the three mooring locations, from which w(z, t) = -(z + H)\, \nabla_h\!\cdot\!\mathbf{u}_h(t) (incompressibility + linear-in-z ansatz from Unit 9 §9.2) gives the column’s vertical advection. This replaces the prescribed Gaussian w_0(t) of toy scenario 4 with a physically self-consistent w(t) that includes the Coriolis turning of the wind-driven current, the propagation delay from gust centre to each mooring, and the geometric spreading that makes Site C see a smaller w_{\max} than Site A despite being further from coast.

The driver is in scripts/swe_driver.jl; its output data/w_drivers.csv has columns t, w_A, w_B, w_C at hourly cadence.

Note✏️ Section exercise — predict scenario 1’s steady state exactly

Scenario 1 has an analytic answer; derive it before peeking at the plot. With w = 0, \mathcal{S} = 0, constant \kappa, constant Q_{\text{np}} = -Q_{\text{cool}}, and the two BCs (\kappa\,\partial_z T|_0 = Q_{\text{np}}/(\rho_0 c_p), T(-H) = T_{\text{deep}}):

  1. Show the steady solution is the straight line T(z) = T_{\text{deep}} + \frac{Q_{\text{np}}}{\rho_0\, c_p\, \kappa}\,(z + H), and compute the surface temperature for H = 100 m, \kappa = 10^{-3}\,\text{m}^2/\text{s}, Q_{\text{cool}} = 200\,\text{W/m}^2, T_{\text{deep}} = 18\,°\mathrm{C}. (You should get a surface colder than the deep reservoir — make sure you can explain why steady cooling does that.)
  2. Estimate how long the column takes to reach that steady state (T_\kappa = H^2/\kappa) and reconcile it with the “~365 days” in the scenario-1 caption.
  3. If you have the repo cloned, run scenario_1() from units/unit_10/scripts/column_fd.jl and check the FD endpoint against your line.

💡 Hint

Steady + no advection + no source + constant κ means \partial_z(\kappa\,\partial_z T) = 0 — a straight line. The Neumann BC hands you the slope directly; the Dirichlet BC pins the intercept at z=-H. If your surface comes out warmer than the deep, check the sign of Q_{np}.

10.2 Solution to Task A

🔒 Solution to Task A — triple-click to reveal 0 / 3 clicks

Spec: Unit 9 §9.9. Site A (Cleveland Bay, H = 15 m), one driver to recover, ~30 min on a laptop CPU.

The full pipeline runs in three Julia scripts under units/unit_10/scripts/: task_a_forward_pinn.jl (forward sanity check), task_a_inverse_pinn.jl (joint inverse), task_a_diagnostics.jl (residual histograms + heat-budget closure + plots). Each one is ~120 lines of Lux + Zygote + Optimisers code following the skeleton of Unit 5 §5.3 — short enough to read end-to-end.

Step 1 — Forward PINN (sanity check before inversion)

A single MLP T_\theta(z, t) with 4 hidden layers of 32 neurons each (\tanh activations). The forward problem isn’t the goal, but solving it first proves that the architecture + loss weights can reproduce the FD reference before we ask the network to also recover an unknown driver.

Hard BC enforcement at z = -H via

T_\theta(z, t) = T_{\text{deep}} + (z + H)\,N_\theta(z, t),

so the bottom Dirichlet BC drops out of the loss for free (Unit 7 §7.3). The surface flux stays as a soft penalty \lambda_b \bigl(\rho c_p \kappa\, \partial_z T_\theta - Q_{\text{np}}\bigr)^2 because making it hard requires an ansatz that fights the diurnal cycle.

Loss (forward, fixed \tau):

\mathcal{L}_{\text{fwd}}(\theta) \;=\; \lambda_r\,\frac{1}{N_r}\!\sum_{i=1}^{N_r} \bigl|r_\theta(z_i, t_i)\bigr|^2 \;+\; \lambda_b\,\frac{1}{N_b}\!\sum_{j=1}^{N_b} \bigl|\rho c_p \kappa\, \partial_z T_\theta(0, t_j) - Q_{\text{np}}(t_j)\bigr|^2 \;+\; \lambda_i \bigl|T_\theta(z, 0) - T_0(z)\bigr|^2.

with r_\theta = \partial_t T_\theta + w\,\partial_z T_\theta - \partial_z(\kappa\,\partial_z T_\theta) - \mathcal{S}, N_r = 2000 collocation points (Latin-hypercube on [-H, 0] \times [0, T_f]), N_b = 200 surface points (uniform in t), and weights \lambda_r = 1, \lambda_b = 50, \lambda_i = 100.

Training schedule: Adam(1e-3) for 2 000 iterations → L-BFGS for 500. The Adam phase escapes the random initialisation; L-BFGS drives the residual the rest of the way. Wall-clock: ~3 minutes on an M2 MacBook.

Pass criterion. \|T_\theta - T_{\text{FD}}\|_{L^2} < 0.05\,°\text{C} on a 100×100 grid. If this fails, stop and diagnose before turning on the inverse loop — debugging the inverse with a broken forward is a rabbit hole.

Step 2 — Inverse PINN: recover \tau(t)

Now \tau(t) becomes unknown and we add a second network for it:

  • T_\theta(z, t) — same 4×32 MLP as before (~3 000 parameters).
  • \tau_\phi(t) — a small 2×16 MLP (~600 parameters) for the wind-stress envelope. Initialised near zero so the optimiser doesn’t fight a bad guess.

Both networks see the same forward residual; the difference from §10.2 Step 1 is that the column equation now uses w(z, t; \tau_\phi(t)) (the SWE-derived w scales with \tau) and \kappa(z, t; \tau_\phi(t)) (a simple wind-mixing closure \kappa_m \mapsto \kappa_m(1 + c |\tau_\phi|)).

Joint inverse loss:

\mathcal{L}(\theta, \phi) \;=\; \underbrace{\lambda_r\,\mathcal{L}_{\text{PDE}}(\theta, \phi)}_{\text{physics}} \;+\; \underbrace{\lambda_d\,\frac{1}{N_d}\!\sum_k\!\bigl|T_\theta(z_k, t_k) - T_{\text{obs}}(z_k, t_k)\bigr|^2}_{\text{data}} \;+\; \underbrace{\lambda_{\text{reg}}\,\int_0^{T_f}\!|\tau_\phi'(t)|^2\,dt}_{H^1 \text{ smoothness prior on } \tau} \;+\; \mathcal{L}_{\text{BC}} + \mathcal{L}_{\text{IC}}.

Hand-tuned weights: \lambda_r = 1, \lambda_d = 100, \lambda_{\text{reg}} = 10^{-2}. The data weight is high because the dataset is small (5 sensors × 720 hourly samples = 3 600 points) — without it, the optimiser satisfies the residual perfectly with \tau_\phi \equiv 0. The H^1 smoothness prior is the only thing keeping \tau_\phi from absorbing the 0.05\,°\text{C} observation noise; without it, the recovered envelope shows visible “ringing” at the single-hour-sample frequency.

Why these weights? Quick ablation:

\lambda_d \lambda_{\text{reg}} Recovered peak Comment
1 10^{-2} 5% of truth residual dominates; \tau collapses
100 0 130% of truth noise-fit; visible high-frequency ringing
100 10^{-2} 88% of truth the chosen operating point
100 10^{-1} 70% of truth over-smoothed; peak suppressed

Training schedule: Adam(1e-3) for 5 000 iterations → L-BFGS for 1 000. Wall-clock: ~15 minutes on an M2 MacBook.

Step 3 — Recovered \tau(t) vs truth

Plot the recovered \hat\tau(t) against the synthetic ground truth. The gust envelope is recovered with:

  • Peak-amplitude error ~12% — within the §9.9.3 success criterion of \leq 15\%.
  • Timing offset < 2 hours on the storm-day peak.
  • Rising edge slightly smoothed by the H^1 penalty — this is the inverse-problem cost of regularisation.
  • Tail over-relaxed by ~5% — the classic Tikhonov bias toward zero in the absence of data constraint.

Step 4 — Mechanism partition (the §9.1 answer)

The recovered \hat\tau(t) alone doesn’t tell you which of the three storm hypotheses dominated — it bundles all three through the column model (Unit 9 §9.1 “How the inverse answers the mechanism question” callout). The discriminator: re-run the forward solver with \hat\tau in hand, accumulate the three contributions to the temperature change at z = -10\,\text{m} (two-thirds down Cleveland Bay’s 15 m column, just below the typical thermocline) over the storm window:

\Delta T(t) \;=\; \underbrace{\int_0^t -\,w\,\partial_z T_\theta\, ds}_{\Delta T_{\text{adv}}(t)} \;+\; \underbrace{\int_0^t \partial_z(\kappa\,\partial_z T_\theta)\, ds}_{\Delta T_{\text{mix}}(t)} \;+\; \underbrace{\int_0^t \mathcal{S}\, ds}_{\Delta T_{\text{flux}}(t)}.

On Cleveland Bay (Pe ≪ 1), the expected partition over the storm 4-day window is roughly:

Mechanism Storm-window contribution
Vertical advection (upwelling) ~15% of total cooling
Turbulent mixing ~70% (dominant — diffusion-dominated regime)
Reduced surface heating ~15%

If your recovered partition lands within ±10% of these numbers, the inversion has correctly identified that wind-driven mixing dominates at Cleveland Bay. The plot task_a_partition.png from task_a_diagnostics.jl shows the three integrated curves vs t.

Step 5 — Diagnostics

Three checks that ship with task_a_diagnostics.jl:

  1. Residual histogram vs t. Bin the residual r_\theta(z_i, t_i) by its t_i value into 30 time bins; plot the mean and 90th percentile per bin. For Task A this should decrease roughly uniformly in time, confirming no causality violation (Unit 7 §7.5). A spike in the late bins is the signature of the failure mode that §10.3 Task B fixes with causal training.
  2. Heat-budget closure at each sensor depth. Evaluate each term of the column PDE on a dense (z_k, t) grid; the sum should be \sim 10^{-3}\,\text{W/m}^2 — well below the surface-flux scale of \sim 200\,\text{W/m}^2. A non-closing budget means the residual loss isn’t actually being minimised pointwise (often a sign of stale gradients in the inverse loop).
  3. Cross-validation on a held-out 24-hour window. Refit with the data from days 20–21 removed; predict that window from the trained network and the recovered \tau. Predictions should stay within \pm 0.05\,°\text{C} of the FD reference — the same magnitude as the observation noise.

What we didn’t do here (and why)

  • No Fourier features — the diurnal signal on a 15 m column is weak enough that the smooth-MLP ansatz handles it.
  • No causal training — the column is shallow enough that residual-decay-in-time is already uniform without it.
  • No adaptive loss weights — manual tuning is feasible because there are only 4 weights and the dynamic range is modest.
  • No GPU — by design.

Each of these choices fails at Task B’s scale; the gradient-balancing / Fourier / causal / hard-IC machinery in §10.3 is what closes the gap.

Files

10.3 Solution to Task B

🔒 Solution to Task B — triple-click to reveal 0 / 3 clicks

Spec: Unit 9 §9.10. Three moorings jointly, H = 100 m, modern PINN toolkit, GPU at full scale. Here we develop on CPU and queue the GPU launch.

The Task A recipe — a single small MLP with hand-tuned weights — breaks at Task B’s scale for three concrete reasons:

  1. The deep (H = 100 m) column has a much wider range of temporal scales (diurnal at the surface, 30-day at depth), so the smooth-MLP ansatz can’t fit the high-frequency surface signal and the slow deep relaxation simultaneously.
  2. The three-site joint loss has six per-site loss components whose magnitudes differ by 2–3 orders of magnitude — hand tuning isn’t feasible.
  3. Forward-over-forward second derivatives at N_r = 50\,000 collocation points per site are CPU-bound and stall L-BFGS.

The four modern-PINN fixes from Unit 7 §7.3 each address one of these failure modes, in the same order: Fourier features (1), adaptive loss weighting (2), causal training (1 again, time direction), and GPU vectorisation (3).

Step 1 — Forward PINN with the full modern toolkit

  • Network: 6-layer MLP, 128 neurons / layer, \tanh.
  • Fourier feature embedding at the input: \gamma(z, t) = [\sin(B\,(z, t)), \cos(B\,(z, t))] with B \sim \mathcal{N}(0, \sigma_B^2 I) — two banks, \sigma_B tuned per band (one for the diurnal cycle \sim 1/86400\,\text{s}, one for the storm envelope \sim 1/(3 \times 86400)\,\text{s}). This breaks the MLP’s spectral bias (Tancik et al. 2020).
  • Hard BC at z = -H as in Task A; hard IC: T_\theta(z, t) = T_0(z) + t\,N_\theta(z, t) so the IC loss term drops out entirely. Now only the residual + surface-flux BC + data loss compete for the optimiser’s attention.
  • Adaptive loss weighting (McClenny & Braga-Neto 2022) — gradient-balancing recipe: \lambda_{\text{term}}^{(k+1)} \leftarrow (1 - \alpha)\,\lambda_{\text{term}}^{(k)} + \alpha\,\frac{\max\|\nabla \mathcal{L}_{\text{residual}}\|_\infty}{\overline{\|\nabla \mathcal{L}_{\text{term}}\|}_2} with \alpha = 0.1 updated every 100 iterations.
  • Causal training (Wang et al. 2022) — temporal weight \omega(t_i) = \exp(-\epsilon\,\sum_{j<i}\!r^2(z_j, t_j)) so a t_i collocation only contributes after the t_{<i} residual has converged.

Step 2 — Joint three-site inverse

Three column networks T^{(i)}_\theta(z, t), i \in \{A, B, C\}, each with the architecture above and separate parameters. Shared: a single 4-layer × 64-neuron MLP \tau_\phi(t) for the storm wind-stress envelope all three sites feel.

Joint loss:

\mathcal{L} \;=\; \sum_{i \in \{A, B, C\}}\!\!\Bigl[ \lambda^{(i)}_r\,\mathcal{L}^{(i)}_{\text{PDE}} + \lambda^{(i)}_d\,\mathcal{L}^{(i)}_{\text{data}} + \lambda^{(i)}_b\,\mathcal{L}^{(i)}_{\text{BC}} \Bigr] \;+\; \lambda_{\text{reg}}\,\int |\tau_\phi'(t)|^2\, dt.

The site-specific weights \lambda^{(i)}_{r,d,b} are all learnt by gradient-balancing — the three regimes (Pe \ll 1, Pe \sim 1, Pe \gg 1) produce residual / data scales that differ by ~10^2 across sites, so a static \lambda^{(i)} = 1 collapses to “fit Site C only”.

Why the shared \tau_\phi matters: each site contributes a different observable consequence of the same physical stress. Site A (diffusion-dominated) sees the stress via \kappa modulation — its data constrains the amplitude. Site B (intermediate) sees the timing of the upwelling pulse. Site C (advection-dominated) sees the shape of the deep cooling. Combining all three is what cuts the recovery error from ~12% A or ~15% B alone down to ~3–5% jointly.

Step 3 — CPU sub-scale prototypes (what we actually run here)

The full 3-site H = 100 m / N_r = 50\,000 launch is GPU-class. Without one, the unit walks the first three of the four sub-scale prototype steps from §9.10.2:

  1. Sub-scale prototype on Task A’s column. Run the full modern-toolkit architecture (Fourier + hard BC + adaptive + causal) on Task A’s 15 m / single-site domain with N_r = 5\,000. ~30 min on M2 MacBook. Recovers \tau(t) at ~12% peak error — exactly Task A’s accuracy at ~1.5× the training cost. The point is to confirm the modern toolkit doesn’t hurt on the easy problem (which would suggest the adaptive weights or Fourier features are mis-tuned).

  2. Two-site joint inverse, Sites A + B, H = 60 m. Medium architecture (4 × 64 neurons + Fourier + hard BC + adaptive weights, no causal training yet), N_r = 10\,000 per site. ~2 h on CPU. Recovered \hat\tau(t) tracks truth to ~7% peak-amplitude error — meaningfully better than either site’s decoupled inverse (Site A alone ~12%, Site B alone ~15%). The cross-site constraint is doing real work — this is the headline finding of the CPU prototype.

  3. GPU-launch checklist. A one-page document of the changes needed for the full run; see Files below.

Step 4 — Recovered \tau(t) from the two-site CPU prototype

The two-site joint inversion produces a \hat\tau(t) noticeably sharper than Task A’s single-site recovery:

Feature Task A (1 site, H{=}15m) Step 3.2 (2 sites, H{=}60m)
Peak amplitude error ~12% ~7%
Storm timing error ~2 h ~45 min
Trailing tail relaxation over-relaxed 5% matches truth
Rising-edge smoothing visible barely visible

Site B’s stronger thermocline signature dominates the storm-day amplitude constraint; Site A’s noise floor sets the lower bound on the recovered envelope. Removing either site sends the error back up.

Step 5 — Mechanism partition across the three sites

The §9.1 multi-site story made quantitative. Run the forward solver at each site with the recovered \hat\tau(t) from the two-site sub-scale prototype (Sites A + B) and project the unknown Site C contribution forward; partition the storm-window cooling at each site’s diagnostic depth:

Site Depth Pe regime Adv (upwelling) Mix Surface flux
A — Cleveland Bay z = -10\,\text{m} \ll 1 ~15% ~70% ~15%
B — Davies Reef z = -30\,\text{m} \sim 1 ~45% ~40% ~15%
C — Myrmidon Reef z = -50\,\text{m} \gg 1 ~75% ~15% ~10%

The qualitative shift across sites is the headline finding: at Cleveland Bay the storm cooling is mixing-driven; at Myrmidon it’s upwelling-driven; Davies Reef sits in the transition. Three moorings → three distinct storm-response fingerprints from a single shared wind-stress event. That’s what the joint inversion delivers that no single-site analysis can.

Step 6 — Predicted GPU performance at full 3-site scale

Based on the sub-scale prototype scaling and reported PINN GPU benchmarks:

Metric CPU sub-scale (2-site, H = 60m) GPU full-scale (3-site, H = 100m)
Forward PINN ~2 h ~8 min
Inverse PINN ~12 h ~25 min
\hat\tau peak error ~7% ~3–5% (extrapolated)
Residual variance 1.4× target 1.0× target (with causal)

The CPU runs prove the recipe works at reduced scale; the GPU run is what makes it operationally usable for production mooring networks (a research group with three real moorings can re-run this monthly on overnight GPU time).

Open questions for the full run

  • Will causality violation re-appear at Myrmidon Reef (advection-dominated, \mathrm{Pe} \gg 1)? The two-site prototype excluded Myrmidon precisely because its longer characteristic timescale stresses the causal-training weights. Possibly need a per-site causal scheduler.
  • Will gradient-balancing converge on a stable weight ratio? Empirically yes on the sub-scale; theoretically not guaranteed.
  • How well does the recovered \tau(t) correlate with the independent SWE-driver w(t) inferred from local wind observations? This is the validation step that turns the recovered synthetic answer into something an oceanographer trusts.

Files

10.4 A Python parallel with DeepXDE

DeepXDE is the Python equivalent of NeuralPDE.jl: a high-level interface that takes a PDE definition and a collocation strategy, produces a trained network. Here’s the Task A forward column problem (Cleveland Bay, H = 15 m, fixed \tau) as a DeepXDE sketch — same architecture (4 × 32 tanh MLP), same hard-BC ansatz at the deep reservoir, same Adam → L-BFGS schedule:

units/unit_10/scripts/task_a_forward_deepxde.py
# pip install deepxde[torch]
import numpy as np
import deepxde as dde

H = 15.0;  Tf = 30.0 * 86400.0
T_deep = 22.0
def kappa_z(z):  return 1e-3 * np.exp(z / 5.0) + 1e-5     # mixed-layer profile
def w_z(z, t):   return 0.0                              # forward problem: no wind
def Q_np(t):     return -120.0                           # net cooling (W/m^2)
def S_z(z, t):   return 400.0 / 4.0e6 * np.exp(z / 8.0)  # body source

# Geometry: z in [-H, 0], t in [0, Tf]
geom     = dde.geometry.Interval(-H, 0.0)
timedom  = dde.geometry.TimeDomain(0.0, Tf)
geomtime = dde.geometry.GeometryXTime(geom, timedom)

def pde(zt, T):
    z, t = zt[:, 0:1], zt[:, 1:2]
    dT_dt = dde.grad.jacobian(T, zt, j=1)
    dT_dz = dde.grad.jacobian(T, zt, j=0)
    d2T_dz2 = dde.grad.hessian(T, zt, component=0, i=0, j=0)
    kappa = kappa_z(z)
    return dT_dt + w_z(z, t) * dT_dz - kappa * d2T_dz2 - S_z(z, t)

# Hard BC at the deep reservoir via output transform.
def output_transform(zt, T):
    z = zt[:, 0:1]
    return T_deep + (z + H) * T          # T(-H, t) = T_deep automatically

# Soft surface flux at z = 0.
def surface_flux(zt, T, _):
    dT_dz_top = dde.grad.jacobian(T, zt, j=0)
    rho, cp = 1025.0, 3990.0
    return rho * cp * kappa_z(zt[:, 0:1]) * dT_dz_top - Q_np(zt[:, 1:2])
bc_surface = dde.icbc.OperatorBC(
    geomtime, surface_flux,
    lambda zt, on_boundary: on_boundary and np.isclose(zt[0], 0.0),
)

# IC: initial thermocline tanh profile
def T0(z): return 25.0 + 3.0 * np.tanh((z + 5.0) / 2.0)
ic = dde.icbc.IC(geomtime, lambda zt: T0(zt[:, 0:1]),
                 lambda _, on_initial: on_initial)

data = dde.data.TimePDE(
    geomtime, pde, [bc_surface, ic],
    num_domain=2000, num_boundary=200, num_initial=200,
)
net = dde.nn.FNN([2] + [32] * 4 + [1], "tanh", "Glorot uniform")
net.apply_output_transform(output_transform)
model = dde.Model(data, net)
model.compile("adam", lr=1e-3); model.train(iterations=2000)
model.compile("L-BFGS"); model.train()

Available as scripts/task_a_forward_deepxde.py. Runtime: ~5 min on a laptop CPU with the PyTorch backend.

The structural one-to-one with the Julia version:

Concern Julia (NeuralPDE.jl) Python (DeepXDE)
Geometry IntervalDomain × TimeDomain Interval × TimeDomain
Residual Differential(t) + symbolic AD dde.grad.jacobian / hessian
Hard BC ansatz manual product form apply_output_transform
Soft BC bcs vector OperatorBC
Optimiser Optimization.solve(Adam → LBFGS) model.compile/train twice
AD backend Zygote + ForwardDiff PyTorch / JAX / TF

Cross-ecosystem comparison

Strengths and weaknesses, honestly:

  • Julia stack — composes cleanly with classical SciML solvers, faster autodiff for stiff problems, smaller ML community.
  • Python stack — broader ML community support, more ergonomic when extending with bespoke loss terms, larger ecosystem of pre-trained models and tutorials.

Neither is strictly better. Pick by who’s using it and what they already know.

Note✏️ Section exercise — make the DeepXDE column feel the sun

The DeepXDE sketch above runs with a constant net cooling and no wind. Bring it one step closer to the real spec: replace Q_np(t) with the diurnal model from Unit 9 §9.5 (Q_{\text{SW}} half-cosine entering through the body source S_z, Q_{\text{cool}} = 200\,\text{W/m}^2 steady) and retrain. Two implementation traps to find and report: (a) the body source must now depend on t, which changes the pde closure — make sure the half-cosine is written with backend-friendly ops; (b) the time scale of the diurnal forcing vs the 30-day domain creates exactly the multi-scale problem Fourier features exist for — compare the trained network’s surface trace against the expected daily wiggle and say whether the plain MLP caught it. Then port the same change to the NeuralPDE.jl version and note which framework made the edit easier and why.

💡 Hint

Two rules keep you out of the traps: every op inside DeepXDE’s pde() must be a backend tensor op (torch.clamp/cos/exp via dde.backend, never np.maximum), and a 30-oscillation forcing on a smooth MLP is a spectral-bias problem — check the surface trace’s FFT at 1 cpd before trusting the fit. In NeuralPDE.jl the same change is a symbolic one-liner in eq.

10.5 Where to go from here

Beyond the workshop:

  • Real mooring data — replace the synthetic observations with AIMS records.
  • Coupling with operational ocean models — ROMS, MOM6 — for realistic horizontal forcings.
  • Uncertainty quantification on the recovered driver, via Bayesian PINNs or ensembles.
  • Transfer to other reef sites with different climatologies.

The PINN software ecosystem you’d reach for next is catalogued in Unit 7 §7.6; the broader PIML field is mapped in the references.

10.6 Reference solver source: scripts/column_fd.jl

The MethodOfLines finite-difference reference for T(z, t), kept as a sidecar script (see SETUP.md) so quarto render stays cheap. Run with ./build.sh execute 10; outputs land in output/. Folded for brevity; expand to read.

# Reference 1D vertical heat-transport column solver (finite-difference,
# method-of-lines). See unit_09.qmd for the model specification.
#
# Scenarios (per unit_09.qmd §9.11):
#   1. pure diffusion to steady state
#   2. diurnal cycle (Q_SW + body source S)
#   3. add upwelling (w(z))
#   4. storm fingerprint (gust modulates w, κ, Q_SW around day 10)
#
# Run a scenario:
#   include("column_fd.jl"); r = run_scenario(scenario_1());
#   plot_scn1(r)

using ModelingToolkit, MethodOfLines, OrdinaryDiffEq
using OrdinaryDiffEqBDF: QNDF  # umbrella OrdinaryDiffEq no longer re-exports BDF solvers
using DomainSets: ClosedInterval
using CairoMakie
using Printf

# ── reference parameters (unit_09.qmd §9.6) ────────────────────────────
const PARAMS = (
    H         = 100.0,    # column depth (m)
    ρ         = 1025.0,   # seawater density (kg/m³)
    cp        = 4000.0,   # specific heat (J/(kg·K))
    α         = 2e-4,     # thermal expansion (1/K)
    T_surface = 28.0,     # initial SST (°C)
    T_deep    = 18.0,     # deep reservoir (°C)
    z_t       = -30.0,    # thermocline depth (m)
    δ_t       = 5.0,      # thermocline sharpness (m)
    QSW_max   = 800.0,    # peak noon shortwave (W/m²)
    Q_cool    = 200.0,    # non-penetrative net cooling (W/m²)
    ζ         = 10.0,     # shortwave penetration scale (m)
    κ_b       = 1e-5,     # background diffusivity (m²/s)
    κ_m       = 1e-3,     # mixed-layer diffusivity (m²/s)
    h_m       = 20.0,     # mixed-layer scale (m)
    w0        = 1e-5,     # peak upwelling (m/s)
    τ_d       = 86400.0,  # diurnal period (s)
    # storm scenario: Gaussian gust around day 10, ~3 days wide.
    t_storm   = 10 * 86400.0,
    σ_storm   = 1.0 * 86400.0,
    w0_storm  = 5e-5,     # gust-driven upwelling (5× baseline)
    cloud_amp = 0.5,      # peak SW reduction during gust
)

storm_envelope(t, p) = exp(-((t - p.t_storm) / p.σ_storm)^2)

# ── initial profile ─────────────────────────────────────────────────────
T0(z, p=PARAMS) = 0.5*(p.T_surface + p.T_deep) +
                  0.5*(p.T_surface - p.T_deep) * tanh((z - p.z_t)/p.δ_t)

# ── vertical velocity profiles ──────────────────────────────────────────
w_zero(z, t, p)       = 0.0
w_upwelling(z, t, p)  = p.w0 * sin(π * (z + p.H) / p.H)
w_storm(z, t, p)      = p.w0_storm * storm_envelope(t, p) * sin(π * (z + p.H) / p.H)

# ── eddy diffusivity profiles ───────────────────────────────────────────
κ_const(z, t, p)   = p.κ_m
κ_profile(z, t, p) = p.κ_b + (p.κ_m - p.κ_b) * exp(z / p.h_m)

# ── surface forcing ─────────────────────────────────────────────────────
# Smooth diurnal: use (1 + cos)/2 instead of max(0, cos) to keep the RHS
# differentiable for MTK. Same daily mean energy as max(0, cos)·(2/π)·QSW_max
# would give, but a smooth half-cosine — close enough for the toy model.
QSW_diurnal(t, p) = p.QSW_max * 0.5 * (1.0 + cos(2π * t / p.τ_d))
QSW_storm(t, p)   = p.QSW_max * (1.0 - p.cloud_amp * storm_envelope(t, p)) *
                    0.5 * (1.0 + cos(2π * t / p.τ_d))
Q_np_steady(t, p) = -p.Q_cool

# Body source from Beer–Lambert: I(z,t) = Q_SW(t) e^{z/ζ},
# S = (1/(ρcp)) ∂I/∂z = Q_SW(t)/(ζ ρ cp) · e^{z/ζ}.
S_off(z, t, p)     = 0.0
S_diurnal(z, t, p) = QSW_diurnal(t, p) * exp(z / p.ζ) / (p.ζ * p.ρ * p.cp)
S_storm(z, t, p)   = QSW_storm(t, p)   * exp(z / p.ζ) / (p.ζ * p.ρ * p.cp)

# ── PDE builder ─────────────────────────────────────────────────────────
function build_pde(scn, p=PARAMS)
    @parameters z t
    @variables T(..)
    Dt = Differential(t)
    Dz = Differential(z)

    # Substitute scenario profiles symbolically so MOL sees closed-form
    # expressions in (z, t).
    w_sym  = scn.w(z, t, p)
    κ_sym  = scn.κ(z, t, p)
    S_sym  = scn.S(z, t, p)
    Qnp_sym = scn.Q_np(t, p)
    κ_top  = scn.κ(0.0, t, p)

    eq = Dt(T(z, t)) + w_sym * Dz(T(z, t)) ~
         Dz(κ_sym * Dz(T(z, t))) + S_sym

    domains = [z  ClosedInterval(-p.H, 0.0),
               t  ClosedInterval(0.0, scn.Tf)]

    bcs = [
        T(z, 0.0)   ~ T0(z, p),
        T(-p.H, t)  ~ p.T_deep,
        κ_top * Dz(T(0.0, t)) ~ Qnp_sym / (p.ρ * p.cp),
    ]

    @named pde_system = PDESystem(eq, bcs, domains, [z, t], [T(z, t)])
    return pde_system, z, t, T
end

# ── run dispatcher ──────────────────────────────────────────────────────
function run_scenario(scn; dz=1.0, p=PARAMS)
    pde_system, z, t, T = build_pde(scn, p)
    disc = MOLFiniteDifference([z => dz], t)
    prob = discretize(pde_system, disc)
    @info "solving scenario: $(scn.name) (Tf = $(scn.Tf/86400) days)"
    sol = solve(prob, scn.alg; saveat=scn.saveat, abstol=1e-8, reltol=1e-6)
    (; sol, z, t, T, scn)
end

# ── scenario configs ────────────────────────────────────────────────────
# Tf chosen long enough to reach (near-)steady state: T_κ = H²/κ_m ≈ 116 days,
# so ~3·T_κ ≈ 350 days gets us to within a few percent.
scenario_1() = (
    name   = "diffusion to steady state",
    w      = w_zero,
    κ      = κ_const,
    S      = S_off,
    Q_np   = Q_np_steady,
    Tf     = 365 * 86400.0,
    saveat = 10 * 86400.0,
    alg    = QNDF(),
)

scenario_2() = (
    name   = "diurnal cycle",
    w      = w_zero,
    κ      = κ_const,
    S      = S_diurnal,
    Q_np   = Q_np_steady,
    Tf     = 10 * 86400.0,
    saveat = 600.0,
    alg    = QNDF(),
)

scenario_3() = (
    name   = "upwelling",
    w      = w_upwelling,
    κ      = κ_const,
    S      = S_diurnal,
    Q_np   = Q_np_steady,
    Tf     = 30 * 86400.0,
    saveat = 3600.0,
    alg    = QNDF(),
)

# Scenario 4 (storm): Gaussian gust centred at t_storm. The gust spikes
# upwelling (w_storm) and dims the surface SW (cloud cover via S_storm).
# Real SWE-driven forcings would replace these envelopes; this is the
# prescribed-driver stub used in §9.11 task 5.
scenario_4() = (
    name   = "storm fingerprint",
    w      = w_storm,
    κ      = κ_const,
    S      = S_storm,
    Q_np   = Q_np_steady,
    Tf     = 20 * 86400.0,
    saveat = 3600.0,
    alg    = QNDF(),
)

# ── analytic check for scenario 1 ───────────────────────────────────────
# Steady state of (κ T_z)_z = 0 with T(-H)=T_deep and κ T_z|₀ = Q_np/(ρcp):
#   T(z) = T_deep − (Q_cool/(κ_m ρ cp)) · (z + H)
# (Using Q_np = -Q_cool < 0, so the surface is cooler than the deep
# reservoir — heat flows up from depth to be lost at the air-sea
# interface.)
T_analytic_scn1(z, p=PARAMS) =
    p.T_deep - p.Q_cool / (p.κ_m * p.ρ * p.cp) * (z + p.H)

# ── plotting ────────────────────────────────────────────────────────────
# Extract the (Nz, Nt) matrix and grids from a run_scenario result,
# regardless of MOL's internal axis ordering. We declared PDESystem with
# ivs = [z, t], so sol[z] and sol[t] are the canonical grids; we transpose
# the matrix to (Nz, Nt) if MOL returned it in (Nt, Nz) form.
function _grids(r)
    zg = r.sol[r.z]
    tg = r.sol[r.t]
    Tg = r.sol[r.T(r.z, r.t)]
    Tg = size(Tg, 1) == length(zg) ? Tg : permutedims(Tg)
    (zg, tg, Tg)
end

"Final-time T(z) vs analytic steady state."
function plot_scn1(r; outpath=joinpath(@__DIR__, "..", "output", "scn1_steady.png"))
    zg, tg, Tg = _grids(r)
    Tf = Tg[:, end]
    fig = Figure(size=(700, 500))
    ax = Axis(fig[1,1], xlabel="T (°C)", ylabel="z (m)",
              title="Scenario 1: pure diffusion to (near-)steady state")
    lines!(ax, T_analytic_scn1.(zg), zg;
           linestyle=:dash, linewidth=2, label="analytic")
    lines!(ax, Tf, zg; linewidth=2, label="MOL final")
    lines!(ax, T0.(zg), zg;
           color=:gray, linewidth=1, label="IC")
    axislegend(ax, position=:rb)
    save(outpath, fig)
    err = maximum(abs, Tf .- T_analytic_scn1.(zg))
    @info "saved $outpath; max |T_MOL − T_analytic| = $(round(err, sigdigits=3)) °C"
    fig
end

"Near-surface (z, t) heatmap for scenario 2."
function plot_scn2(r; zmax=10.0,
                   outpath=joinpath(@__DIR__, "..", "output", "scn2_diurnal.png"))
    zg, tg, Tg = _grids(r)
    keep = zg .>= -zmax
    fig = Figure(size=(900, 450))
    ax = Axis(fig[1,1], xlabel="t (days)", ylabel="z (m)",
              title="Scenario 2: diurnal warm layer")
    hm = heatmap!(ax, tg ./ 86400.0, zg[keep], Tg[keep, :]')
    Colorbar(fig[1,2], hm, label="T (°C)")
    save(outpath, fig)
    @info "saved $outpath"
    fig
end

"Five-mooring-depth time series + full (z,t) heatmap for scenarios 3–4."
function plot_mooring(r; depths_m=(2.0, 10.0, 30.0, 60.0, 90.0),
                      outpath=joinpath(@__DIR__, "..", "output", "$(replace(r.scn.name, ' '=>'_'))_mooring.png"))
    zg, tg, Tg = _grids(r)
    days = tg ./ 86400.0
    fig = Figure(size=(1000, 700))
    ax1 = Axis(fig[1,1], xlabel="t (days)", ylabel="T (°C)",
               title="Mooring traces — $(r.scn.name)")
    for d in depths_m
        zi = argmin(abs.(zg .+ d))  # z = -d
        lines!(ax1, days, Tg[zi, :]; label="z = −$(d) m")
    end
    axislegend(ax1, position=:rt)
    ax2 = Axis(fig[2,1], xlabel="t (days)", ylabel="z (m)",
               title="T(z, t)")
    hm = heatmap!(ax2, days, zg, Tg')
    Colorbar(fig[2,2], hm, label="T (°C)")
    save(outpath, fig)
    @info "saved $outpath"
    fig
end

# ── entry point when run as script ──────────────────────────────────────
# Used by ./build.sh execute 10 to populate units/unit_10/output/.
if abspath(PROGRAM_FILE) == @__FILE__
    r1 = run_scenario(scenario_1());  plot_scn1(r1)
    r2 = run_scenario(scenario_2());  plot_scn2(r2)
    r3 = run_scenario(scenario_3());  plot_mooring(r3)
    r4 = run_scenario(scenario_4());  plot_mooring(r4)
end

Captured stdout from ./build.sh execute 10:

[ Info: solving scenario: diffusion to steady state (Tf = 365.0 days)
[ Info: saved /Users/uqjnazar/git/PIML/Julia_PINN_training_2026/units/unit_11/scripts/../output/scn1_steady.png; max |T_MOL − T_analytic| = 0.00407 °C
[ Info: solving scenario: diurnal cycle (Tf = 10.0 days)
[ Info: saved /Users/uqjnazar/git/PIML/Julia_PINN_training_2026/units/unit_11/scripts/../output/scn2_diurnal.png
[ Info: solving scenario: upwelling (Tf = 30.0 days)
[ Info: saved /Users/uqjnazar/git/PIML/Julia_PINN_training_2026/units/unit_11/scripts/../output/upwelling_mooring.png
[ Info: solving scenario: storm fingerprint (Tf = 20.0 days)
[ Info: saved /Users/uqjnazar/git/PIML/Julia_PINN_training_2026/units/unit_11/scripts/../output/storm_fingerprint_mooring.png