Project Solution

Published

26/06/2026

This unit is the worked solution to the capstone spec laid out in Unit 9. The science behind the model sits in Unit 8. Here we cover three things in this order:

Shared infrastructure (§10.1 — open). The finite-difference reference solver, the toy-task ladder (Unit 9 §9.11), and the SWE driver. Both Task A and Task B start from these.
Solution to Task A (§10.2 — locked, triple-click to unlock). Single-site CPU-friendly inverse problem.
Solution to Task B (§10.3 — locked, triple-click to unlock). Three-site joint inverse with the modern PINN toolkit, plus the sub-scale CPU prototype steps.

Each solution sits behind its own triple-click gate, so you can read the shared infrastructure (and one task’s solution, if you want) without spoiling the other. We strongly recommend you attempt each task yourself first — the learning is in working through the inverse problem, not in reading our answer.

10.1 Shared infrastructure: the forward solver and scenarios

Before either solution, here are the building blocks both tasks rely on: the finite-difference forward solver (the ground-truth T(z,t) the PINNs are scored against — and the thing that generates the mooring data), the five toy scenarios of Unit 9 §9.11, and the SWE driver.

This section is deliberately not behind a solution gate like §10.2–§10.3: it is shared setup you need to start either task, not the answer to one. The gated sections below are the actual inverse solutions.

The five scenarios here are the §9.11 toy-task ladder, in the same order and numbering — §9.11 describes each step and what mechanism it isolates; this section works it with the solver and shows the figure. Each one just switches on one more term of the column PDE:

#	§9.11 ladder step	what it adds
1	pure diffusion, steady forcing	\kappa\,\partial_{zz}T only → linear steady profile
2	+ diurnal cycle	time-varying Q_{\text{SW}}(t) + body source
3	+ upwelling	advection w\,\partial_z T (κ stays constant)
4	+ storm gust (prescribed)	a Gaussian storm modulating w and cloud
5	+ SWE-driven storm	w(t) from a shallow-water solve (the design, §10.5)

Scenarios 1–4 ship with worked figures below; scenario 5 (the full SWE driver) is the design for the headline extension (§10.5), not shipped with the solution.

Finite-difference reference solver

A method-of-lines finite-difference reference for T(z, t) — the ground truth we compare PINN predictions against. Uses MethodOfLines.jl with adaptive time-stepping; second-order central differences in z, with the surface Neumann condition imposed via a ghost cell. The full source listing is in §10.6; below we work through scenarios 1–4 of the Unit 9 §9.11 ladder (scenario 5, the SWE driver, follows as the design).

These four scenario plots all run on the generic reference column (the §9.6 reference parameters, H = 100 m) so each mechanism shows cleanly in isolation — they are not the per-site mooring records (Sites A/B/C). Those site records are produced separately, each at its own depth, by generate_mooring_csvs.jl (which runs this same solver through the scenario-4 storm per site).

Scenario 1 — pure diffusion to steady state (§9.11 step 1)

w = 0, \mathcal{S} = 0, \kappa constant, Q_{\text{np}} constant. After ~365 days the column reaches a near-steady state.

Scenario 1: pure diffusion to a (near-)steady state, from `scenario_1()` in `scripts/column_fd.jl`. The grey IC is the initial \tanh thermocline: a warm (~28 °C) mixed layer over an 18 °C deep reservoir, with the sharp transition near z=-30 m. With no advection, no source, and a constant surface cooling Q_{\text{np}}=-200 W/m², diffusion erases that thermocline and the column relaxes toward the **straight-line** steady profile. Heat conducts upward, down the temperature gradient, from the fixed deep reservoir and is lost at the surface, so the surface ends up *colder* (~13 °C) than the deep (18 °C). The blue **MOL final** curve is the finite-difference solution after ~365 days; the dashed black **analytic** line T(z)=T_{\text{deep}}+\tfrac{Q_{\text{np}}}{\rho_0 c_p\kappa_m}(z+H) (equivalently T_{\text{deep}}-\tfrac{Q_{\text{cool}}}{\rho_0 c_p\kappa_m}(z+H), with Q_{\text{cool}}=-Q_{\text{np}}=200 W/m²) is drawn on top of it. The two are indistinguishable: they agree to \approx 4\times10^{-3} °C, so the dashes ride directly along the blue line, and *that overlap is the validation* of the solver. The line follows because the steady state has no advection or source, so \partial_z(\kappa\partial_z T)=0 gives a straight line whose slope the surface flux sets and whose intercept the deep reservoir pins.
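The numbers in this caption can be checked by evaluating the steady line directly. This is a minimal sketch, not the code in `scripts/column_fd.jl`: the constants are the reference parameters from §10.6, except `KAPPA_M = 1e-3` m²/s, which is an assumed value (the text does not state it) chosen because it reproduces the ~13 °C surface temperature quoted above.

```python
# Analytic steady state for scenario 1: d/dz(kappa dT/dz) = 0 with
# T(-H) = T_deep and a constant surface flux rho*cp*kappa*dT/dz = Q_np.
H = 100.0        # column depth (m)
RHO = 1025.0     # seawater density (kg/m^3)
CP = 3990.0      # specific heat (J/(kg K))
T_DEEP = 18.0    # deep reservoir temperature (deg C)
Q_NP = -200.0    # surface flux, negative means cooling (W/m^2)
KAPPA_M = 1e-3   # ASSUMED constant diffusivity (m^2/s), not given in the text


def steady_profile(z):
    """Straight-line steady temperature at depth z (z = -H at the bottom, 0 at the surface)."""
    slope = Q_NP / (RHO * CP * KAPPA_M)  # dT/dz in K/m, negative for surface cooling
    return T_DEEP + slope * (z + H)


if __name__ == "__main__":
    for depth in (-100.0, -50.0, 0.0):
        print(f"z = {depth:6.1f} m   T = {steady_profile(depth):6.2f} C")
```

With a different diffusivity the slope changes in inverse proportion, so the surface temperature is a sensitive check on whichever \kappa_m the solver actually uses.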

Scenario 2 — diurnal cycle (§9.11 step 2)

Adds Q_{\text{SW}}(t) (smooth half-cosine, mean 400 W/m²) and the Beer–Lambert body source \mathcal{S}.

Scenario 2 output — near-surface traces (top) make the daily warm/cool cycle plain: ten diurnal oscillations ride on the slow relaxation toward the scenario-1 steady state, with the amplitude largest at z=-1 m and damped by z=-7 m. The top-10 m (z,t) heatmap (bottom) shows the same field.
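The two forcing terms that scenario 2 switches on can be sketched as follows. The Beer–Lambert form S = Q_SW/(ρ₀ cₚ ζ)·e^{z/ζ} matches the comment in the DeepXDE listing of §10.4 and the penetration scale ζ = 10 m comes from §10.6; the raised-cosine shape of `q_sw` is an assumption about what "smooth half-cosine, mean 400 W/m²" means, chosen so the peak equals `QSW_max` = 800 W/m².

```python
import math

DAY = 86400.0            # seconds per day
QSW_MAX = 800.0          # peak noon shortwave (W/m^2)
RHO, CP = 1025.0, 3990.0 # seawater density and specific heat
ZETA = 10.0              # shortwave penetration scale (m)


def q_sw(t):
    """ASSUMED raised-cosine diurnal cycle: 0 at midnight, QSW_MAX at noon, mean QSW_MAX / 2."""
    return 0.5 * QSW_MAX * (1.0 - math.cos(2.0 * math.pi * t / DAY))


def body_source(z, t):
    """Beer-Lambert heating rate (K/s) at depth z <= 0 and time t (s)."""
    return q_sw(t) / (RHO * CP * ZETA) * math.exp(z / ZETA)


if __name__ == "__main__":
    noon = 0.5 * DAY
    for depth in (-1.0, -7.0, -30.0):
        print(f"z = {depth:5.1f} m   S(noon) = {body_source(depth, noon):.3e} K/s")
```

The exponential decay with depth is why the diurnal amplitude in the figure is largest at z = -1 m and already damped by z = -7 m.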

Scenario 3 — upwelling (§9.11 step 3)

Adds an upwelling profile w(z) = w_0 \sin(\pi(z+H)/H) with w_0 = 10^{-5}\,\text{m/s}. (This toy scenario uses a \sin shape to show the mechanism cleanly on the generic column; the capstone’s own w is the linear-in-z profile derived in Unit 9 §9.2.)

Scenario 3 output — five-mooring traces (top) and full (z, t) heatmap (bottom). SST drops 28 → 22 °C; mid-column warms via diffusion (Pe ≈ 1).
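A short sketch of the scenario-3 upwelling profile and the Péclet number quoted in the caption. The profile and w_0 are from the text; the definition Pe = w_0 H / \kappa and the value `KAPPA = 1e-3` m²/s are assumptions, picked because they give Pe ≈ 1 on the H = 100 m reference column.

```python
import math

H = 100.0      # column depth (m)
W0 = 1e-5      # peak upwelling speed (m/s)
KAPPA = 1e-3   # ASSUMED constant diffusivity (m^2/s); not stated in this section


def w_profile(z):
    """Upwelling speed at depth z: zero at the bottom and the surface, W0 at mid-depth."""
    return W0 * math.sin(math.pi * (z + H) / H)


def peclet():
    """Advection-to-diffusion ratio w0 * H / kappa (one common definition, assumed here)."""
    return W0 * H / KAPPA


if __name__ == "__main__":
    print("w at mid-depth:", w_profile(-H / 2.0), "m/s")
    print("Peclet number :", peclet())
```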

Scenario 4 — storm fingerprint (prescribed gust) (§9.11 step 4)

Gaussian gust at t_{\text{storm}} = 10 days, \sigma = 1 day, modulating both w_0 (×5 at peak) and Q_{\text{SW}}^{\max} (×0.5 at peak via cloud cover). In scenario 5 (the SWE-driven design described below, not shipped), forcings from the shallow-water solve would replace these prescribed envelopes.

Scenario 4 output — `scenario_4()` uses prescribed Gaussian envelopes for the gust-driven upwelling and cloud cover, pending the SWE driver. The storm bump is visible around day 10.
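The prescribed envelopes can be sketched in a few lines. The storm time, width, and the peak factors (×5 and ×0.5) are from the text; the specific forms `1 + 4 g(t)` and `1 - 0.5 g(t)` are one assumed way of hitting those peak factors and may differ from what `scenario_4()` does internally.

```python
import math

DAY = 86400.0
T_STORM = 10.0 * DAY   # storm centre (s)
SIGMA = 1.0 * DAY      # storm width (s)
W0 = 1e-5              # background upwelling amplitude (m/s)
QSW_MAX = 800.0        # clear-sky peak shortwave (W/m^2)


def gust(t):
    """Unit-height Gaussian envelope centred on the storm."""
    return math.exp(-0.5 * ((t - T_STORM) / SIGMA) ** 2)


def w0_storm(t):
    """ASSUMED form: upwelling amplitude rises to 5x the background at the storm peak."""
    return W0 * (1.0 + 4.0 * gust(t))


def qsw_max_storm(t):
    """ASSUMED form: cloud cover halves the shortwave peak at the storm peak."""
    return QSW_MAX * (1.0 - 0.5 * gust(t))


if __name__ == "__main__":
    for day in (7, 9, 10, 11, 13):
        t = day * DAY
        print(f"day {day:2d}: w0 = {w0_storm(t):.2e} m/s, QSW_max = {qsw_max_storm(t):6.1f} W/m^2")
```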

Shallow-water driver (scenario-5 forcing — the design)

The shipped column scripts use the prescribed Gaussian upwelling/cloud envelopes of toy scenario 4 (above). The natural next step — scenario 5 — replaces those hand-drawn envelopes with a physically self-consistent horizontal current from a shallow-water solve. We describe that bridge here as the design; wiring it into the column model is the headline extension in §10.5 rather than something the capstone solution ships.

The driver would be a staggered-grid finite-volume SWE reference on a 100 × 100 km patch of the central GBR shelf, with constant background depth H = 80\,\text{m}, the local Coriolis parameter f = 2\Omega\sin(-19°) \approx -4.7 \times 10^{-5}\,\text{s}^{-1}, and a spatially-Gaussian wind-stress patch driving the storm:

\tau(x, y, t) \;=\; \tau_0(t)\,\exp\!\left[-\,\frac{(x - x_0)^2 + (y - y_0)^2}{2\,L_\tau^2}\right],

with L_\tau = 30\,\text{km}, (x_0, y_0) tracking a slow-moving gust centre (10 km/h drift over 3 days), and \tau_0(t) the Gaussian-envelope time profile that becomes the unknown in the inverse problem.
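As a check on the quoted Coriolis value and the shape of the wind-stress patch, here is a minimal sketch. The rotation rate `OMEGA` is the standard value for Earth; `tau0` is passed in as a plain number because its time profile is the unknown of the inverse problem, and the function names are illustrative rather than taken from any shipped script.

```python
import math

OMEGA = 7.2921e-5   # Earth's rotation rate (rad/s)
L_TAU = 30.0e3      # wind-stress patch width (m)


def coriolis(lat_deg):
    """Coriolis parameter f = 2 * Omega * sin(latitude), in 1/s."""
    return 2.0 * OMEGA * math.sin(math.radians(lat_deg))


def wind_stress(x, y, x0, y0, tau0):
    """Spatially Gaussian stress patch centred on (x0, y0); all lengths in metres."""
    r2 = (x - x0) ** 2 + (y - y0) ** 2
    return tau0 * math.exp(-r2 / (2.0 * L_TAU ** 2))


if __name__ == "__main__":
    print(f"f at 19 S: {coriolis(-19.0):.3e} 1/s")
    print("stress one patch width from the centre:", wind_stress(L_TAU, 0.0, 0.0, 0.0, 1.0))
```

One patch width from the gust centre the stress has fallen to e^{-1/2} of its peak, about 61%.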

What would get piped to the column model: \mathbf{u}_h(t) at each of the three mooring locations, from which w(z, t) = -(z + H)\, \nabla_h\cdot \mathbf{u}_h(t) (incompressibility + linear-in-z ansatz from Unit 9 §9.2) gives the column’s vertical advection. This is what replaces the prescribed Gaussian w_0(t) of toy scenario 4 with a physically self-consistent w(t) that carries the Coriolis turning of the wind-driven current, the propagation delay from gust centre to each mooring, and the geometric spreading that makes Site C see a smaller w_{\max} than Site A despite being further from coast — exactly the inter-site contrast the joint inverse (§10.3) leans on. The same linearised shallow-water machinery is already worked end-to-end in Unit 1, which is the cleanest starting point if you build it.
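The hand-off from the SWE solve to the column is a single formula, sketched here under stated assumptions: the divergence is estimated with central differences on neighbouring grid values (a generic stencil, not necessarily the one a staggered-grid solver would use), and H = 80 m is the shelf depth of the design above. All function and argument names are hypothetical.

```python
H_SHELF = 80.0   # constant background depth of the SWE patch (m)


def divergence_central(u_east, u_west, v_north, v_south, dx, dy):
    """Central-difference estimate of the horizontal divergence du/dx + dv/dy (1/s)."""
    return (u_east - u_west) / (2.0 * dx) + (v_north - v_south) / (2.0 * dy)


def vertical_velocity(z, div_uh, depth=H_SHELF):
    """w(z, t) = -(z + H) * div_h(u_h): zero at the bed (z = -H), linear in z above it."""
    return -(z + depth) * div_uh


if __name__ == "__main__":
    # Illustrative numbers only: a weakly convergent current on a 1 km grid.
    div = divergence_central(0.099, 0.101, 0.0, 0.0, dx=1000.0, dy=1000.0)
    for z in (-80.0, -40.0, 0.0):
        print(f"z = {z:6.1f} m   w = {vertical_velocity(z, div):.2e} m/s")
```

A convergent horizontal flow (negative divergence) gives upward w throughout the column, largest at the surface, which is the linear-in-z shape of Unit 9 §9.2.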

10.2 Solution to Task A


Spec: Unit 9 §9.9. Site B (Davies Reef, H = 60 m, advection + diffusion), one driver to recover.

The solution is a single script, task_a_inverse_pinn.jl (~230 lines, on the Unit 5 §5.3 Lux + Zygote + Optimisers skeleton). It synthesises the mooring data from a finite-difference reference solve of the column PDE with a known storm \tau^\star(t), samples five sensor depths hourly and adds noise, then trains a PINN to recover \tau(t) from those noisy traces, so the recovery error of \hat\tau against \tau^\star is a real number we can score.

This is a self-contained twin experiment: the script does not read the committed mooring_*.csv (those are the “look at realistic data” artifact of §9.1) — it plants its own \tau^\star and recovers it, so the truth is known. For self-containment it hand-codes the finite-difference forward solve rather than calling MethodOfLines.jl; the column model (same w(z), \kappa, S) is the one the inverse uses, so forward and inverse stay consistent.

So the script reads as two clearly-marked parts. Part 1 — generate the twin’s data: plant a known \tau^\star, forward-solve, sample the sensors, add noise. This is test-data generation, not the answer — it looks like it lives “inside the solution,” and it does, on purpose. Part 2 — the inverse PINN: it sees only that noisy data and recovers \hat\tau; the planted \tau^\star is used at the very end only to score the result. Generating the data in-file is exactly what makes the error measurable: with a real mooring you would not know \tau^\star, so you could never report a “9.6% recovery error.” This is the standard way to validate an inverse method before trusting it on real data.

Step 1 — The problem: source recovery

The storm modulates the column’s heating: a known surface-weighted vertical shape S(\zeta) (Beer–Lambert-like) times an unknown time signal \tau(t). Non-dimensionalising the column — temperature \tilde T = (T-T_\text{deep})/\Delta T (with \Delta T the fixed surface-to-deep temperature scale, \approx 3\,°C at Site B; this is a temperature scale, not a timestep \Delta t), depth \zeta = (z+H)/H, and time \tau_t = t/T_f — gives

\partial_{\tau}\tilde T = -W_\text{adv}\,w(\zeta)\,\partial_\zeta\tilde T + \mathrm{Pe}\,\partial_{\zeta\zeta}\tilde T + S(\zeta)\,\tau(t), \quad \tilde T(0,\tau)=0,\ \ \partial_\zeta\tilde T(1,\tau)=0,\ \ \tilde T(\zeta,0)=0,

with the upwelling shape w(\zeta)=\zeta (zero at depth, maximal at the surface). The unknown is the scalar-in-time forcing \tau(t) (one storm bump); S(\zeta), Pe, W_\text{adv} and the BCs are given. (Two roles for \tau to keep straight: as a derivative or subscript — \partial_\tau, \tau_t = t/T_f — it is non-dimensional time; the standalone \tau(t) is the storm forcing, the unknown we recover. And the coefficient here written \mathrm{Pe} is the normalised diffusion term, =1; the actual Péclet number is W_\text{adv}\approx1.2 — see §9.6.) This source-recovery form is the most identifiable inverse: the source over-determines \tau(t) across all depths at each instant, and a \tau\equiv0 field cannot fit source-driven data — so the network can’t hide the forcing inside the temperature field (the failure mode a mixing- or coefficient-recovery setup hits when the column sits near steady state). Two networks:

\tilde T_\theta(\zeta,\tau) — the temperature field (weights \theta), with the hard IC and deep BC baked into the ansatz \tilde T = \zeta\,\tau\,N_\theta(\zeta,\tau), so \tilde T(\zeta,0)=0 and \tilde T(0,\tau)=0 hold for free (Unit 7 §7.3). The insulating surface flux \partial_\zeta\tilde T(1,\tau)=0 stays a soft penalty.
\tau_\phi(\tau) — a small MLP for the storm envelope (signed).

Derivatives use the finite-difference-in-input stencil (no nested AD, Unit 5 §5.3), so a single reverse-mode pass keeps each training step cheap — cheap enough that the whole inverse trains on a CPU.
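The two ideas above, the hard-constraint ansatz and derivatives taken by finite differences in the network inputs, can be illustrated without any ML library. This is a sketch only: `toy_net` is a hypothetical smooth stand-in for N_\theta, and the step `h = 1e-3` is an assumed value, not the one used in `task_a_inverse_pinn.jl`.

```python
import math


def toy_net(zeta, tau):
    """Hypothetical stand-in for N_theta; any smooth function works for this illustration."""
    return math.sin(2.0 * zeta) * math.cos(3.0 * tau) + 1.5


def t_tilde(zeta, tau, net=toy_net):
    """Hard-constrained field: exactly zero at zeta = 0 (deep BC) and at tau = 0 (IC)."""
    return zeta * tau * net(zeta, tau)


def fd_derivatives(f, zeta, tau, h=1e-3):
    """Central differences in the inputs: returns (dT/dtau, dT/dzeta, d2T/dzeta2)."""
    d_tau = (f(zeta, tau + h) - f(zeta, tau - h)) / (2.0 * h)
    d_zeta = (f(zeta + h, tau) - f(zeta - h, tau)) / (2.0 * h)
    d_zz = (f(zeta + h, tau) - 2.0 * f(zeta, tau) + f(zeta - h, tau)) / (h * h)
    return d_tau, d_zeta, d_zz


if __name__ == "__main__":
    print("deep BC:", t_tilde(0.0, 0.7), " IC:", t_tilde(0.4, 0.0))
    print("derivatives at (0.4, 0.3):", fd_derivatives(t_tilde, 0.4, 0.3))
```

Each derivative costs a handful of extra forward evaluations, so in a real training loop one reverse-mode pass through these evaluations gives the gradient with respect to the weights and no second-order AD is required.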

Step 2 — The loss, and the lesson it teaches

\mathcal{L}(\theta,\phi) = \underbrace{\lambda_r\,\mathcal{L}_\text{PDE}}_{\lambda_r=1} + \underbrace{\lambda_d\,\tfrac{1}{N_d}\!\sum_k|\tilde T_\theta(\zeta_k,\tau_k)-\tilde T_{\text{obs},k}|^2}_{\lambda_d=6000} + \underbrace{\lambda_b\,\mathcal{L}_\text{surf-BC}}_{\lambda_b=10} + \underbrace{\lambda_\text{reg}\!\int_0^1|\tau_\phi'|^2\,d\tau}_{\lambda_\text{reg}=10^{-5}}.

The recovered forcing \tau_\phi enters only the PDE residual, so it regresses to (\partial_\tau\tilde T + W_\text{adv}\,w\,\partial_\zeta\tilde T - \mathrm{Pe}\,\partial_{\zeta\zeta}\tilde T)/S — it is driven by derivatives of the temperature field. Here is the trap: the column low-pass-filters \tau (response time ≈ storm width), so a network can match the noisy data in value with a too-smooth \tilde T whose derivatives — and hence the recovered storm peak — collapse well below truth. The cure is not fancier collocation; it is the loss weights. A large data weight \lambda_d forces \tilde T to honour the sharp data, which keeps its derivatives sharp and lifts the recovered peak; the tiny H^1 weight \lambda_\text{reg} only removes single-hour-sample ringing. A weight sweep makes the effect concrete:

\lambda_d	\lambda_\text{reg}	Recovered peak	Comment
100	10^{-2}	8% of truth (92% err)	too-smooth \tilde T; storm collapses
500	10^{-4}	56%	sharper, but data weight still too low
2000	3\times10^{-5}	78%	approaching the floor
6000	10^{-5}	90% (9.6% err)	the chosen operating point

At this sensor noise (storm SNR ≈ 29), ~10% peak error is close to the deconvolution floor: pushing \lambda_d harder buys little and starts re-introducing ringing. The remaining gap to truth is the inverse-problem reality at this noise level, not a tuning failure.
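To make the weighting concrete, here is a sketch of how the four terms of the Step 2 loss combine at the chosen operating point. It assumes each term is a mean of squared residuals and that the H^1 integral is approximated on a uniform time grid; the shipped script may discretise these differently, and the function names are illustrative.

```python
def mean_sq(values):
    """Mean of squared entries; assumed form of the PDE, data, and BC terms."""
    return sum(v * v for v in values) / len(values)


def h1_penalty(tau_vals, dt):
    """Discrete approximation of the integral of |tau'|^2 on a uniform grid with spacing dt."""
    return sum(((b - a) / dt) ** 2 for a, b in zip(tau_vals, tau_vals[1:])) * dt


def total_loss(pde_res, data_misfit, bc_res, tau_vals, dt,
               lam_r=1.0, lam_d=6000.0, lam_b=10.0, lam_reg=1e-5):
    """Weighted sum with the operating-point weights from the sweep table as defaults."""
    return (lam_r * mean_sq(pde_res)
            + lam_d * mean_sq(data_misfit)
            + lam_b * mean_sq(bc_res)
            + lam_reg * h1_penalty(tau_vals, dt))


if __name__ == "__main__":
    # Illustrative magnitudes only: a 0.01 data misfit outweighs a 0.1 PDE residual.
    print(total_loss([0.1], [0.01], [0.0], [0.0, 0.0], dt=1.0))
```

With \lambda_d = 6000 a data misfit ten times smaller than the PDE residual still contributes sixty times more to the loss, which is the mechanism that keeps \tilde T sharp.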

Step 3 — Results

The operating point trains a 32-wide × 4-deep field network for 12 000 Adam steps — about four minutes on a CPU. Measured recovery:

Peak-amplitude error 9.6% — comfortably inside the §9.9.3 criterion of \leq 15\%.
Storm-day timing exact (well under the 2-hour bar).
Forward field — reconstructed as T = T_\text{deep} + \Delta T\,\tilde T_\theta, it matches the FD reference to \|T - T_\text{FD}\|_{L^2} = 0.002\,°\text{C}, far below the 0.015\,°\text{C} sensor noise the twin injects.
Whole-envelope relative L^2 ≈ 0.09, with the rising edge slightly smoothed and the tail over-relaxed by a few percent — the classic Tikhonov pull toward zero, the cost of the H^1 regulariser.
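The three envelope scores in this list can be computed as below. These are assumed definitions (peak error as the relative difference of the maxima, timing as the offset between the two argmax times, and a discrete relative L^2 norm); the precise criteria are those of §9.9.3, which this section does not reproduce.

```python
import math


def peak_amplitude_error(tau_hat, tau_true):
    """Relative difference between recovered and true peak values."""
    return abs(max(tau_hat) - max(tau_true)) / max(tau_true)


def peak_time_offset(t, tau_hat, tau_true):
    """Time of the recovered peak minus time of the true peak (same units as t)."""
    i_hat = max(range(len(t)), key=lambda i: tau_hat[i])
    i_true = max(range(len(t)), key=lambda i: tau_true[i])
    return t[i_hat] - t[i_true]


def relative_l2(tau_hat, tau_true):
    """Discrete relative L2 error of the whole envelope."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(tau_hat, tau_true)))
    den = math.sqrt(sum(b * b for b in tau_true))
    return num / den


if __name__ == "__main__":
    hours = [float(h) for h in range(0, 481)]                       # 20 days, hourly
    truth = [math.exp(-0.5 * ((h - 240.0) / 24.0) ** 2) for h in hours]
    recovered = [0.9 * v for v in truth]                            # illustrative 10% under-shoot
    print(peak_amplitude_error(recovered, truth),
          peak_time_offset(hours, recovered, truth),
          relative_l2(recovered, truth))
```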

Site B clears the bar on a CPU because its 60 m column with upwelling carries a sharp, depth-structured storm fingerprint (the sensor fits below): the surface sensor swings ~0.15 nondim while the deep sensors lag and damp, so a single mooring already over-determines \tau(t). The shallower diffusion-only site washes that structure out — which is exactly why it is the harder, less informative inverse, and is kept only as the “easy site” contrast in the Task B joint study (§10.3). No GPU is needed for Task A; a GPU only lets you widen the network or iterate faster.

Recovered storm forcing \hat\tau(t) vs truth (left) and the five-sensor temperature fits over the noisy data (right), from `task_a_inverse_pinn.jl` on Site B. Peak error 9.6%, storm day exact, forward field within 0.002 °C of the FD reference — all on a CPU in about four minutes.

Forward sanity check (shipped)

Before debugging an inverse on top of a possibly-broken forward, certify the forward solve. column_pinn_gpu.jl does exactly this on the column’s pure-diffusion relaxation, where a single-eigenmode initial condition has an exact closed-form solution \tilde T^*(\zeta,\tau) = g\,\zeta + A\sin(\tfrac{\pi}{2}\zeta)\,e^{-\mathrm{Pe}(\pi/2)^2\tau} to score against. Measured: the CPU sub-scale fit reaches \|\tilde T_\theta-\tilde T^*\|_{L^2} = 6\times10^{-4}, the A10G full-scale run (30× the collocation, ~85× the throughput) 2\times10^{-3} — both far inside any noise scale, confirming the field operator and the FD-in-input derivative stencil before any inversion.
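It is worth confirming that the closed form really solves the pure-diffusion problem before scoring a network against it. The sketch below checks the PDE residual and both boundary conditions numerically; the values of `G` and `A` are illustrative assumptions (the text does not give them), and `PE = 1` is the normalised diffusion coefficient from Step 1.

```python
import math

PE = 1.0   # normalised diffusion coefficient
G = 0.2    # ASSUMED surface-gradient constant g
A = 0.5    # ASSUMED amplitude of the single eigenmode


def t_exact(zeta, tau):
    """Closed-form relaxation: linear steady part plus one decaying eigenmode."""
    decay = math.exp(-PE * (0.5 * math.pi) ** 2 * tau)
    return G * zeta + A * math.sin(0.5 * math.pi * zeta) * decay


def pde_residual(zeta, tau, h=1e-3):
    """dT/dtau - Pe * d2T/dzeta2, both by central differences; should be near zero."""
    d_tau = (t_exact(zeta, tau + h) - t_exact(zeta, tau - h)) / (2.0 * h)
    d_zz = (t_exact(zeta + h, tau) - 2.0 * t_exact(zeta, tau) + t_exact(zeta - h, tau)) / (h * h)
    return d_tau - PE * d_zz


def surface_gradient(tau, h=1e-3):
    """dT/dzeta at zeta = 1; equals G because the eigenmode is flat at the surface."""
    return (t_exact(1.0 + h, tau) - t_exact(1.0 - h, tau)) / (2.0 * h)


if __name__ == "__main__":
    print("residual at (0.3, 0.2):", pde_residual(0.3, 0.2))
    print("deep BC:", t_exact(0.0, 0.7), " surface gradient:", surface_gradient(0.5))
```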

Optional extensions (not shipped as code)

The inverse script plus that forward check answer §9.9. To go further, these are good exercises — they don’t ship here:

Mechanism partition (the §9.1 question). The recovered \hat\tau(t) gives the storm’s strength and timing, not which physical mechanism dominated the cooling. Re-running the forward solver with \hat\tau and accumulating the advection / mixing / surface-flux contributions to \Delta T(z=-30\,\text{m}) would partition this site’s response — but the interesting result is the contrast across regimes, which needs Task B’s three sites (§10.3), not one column.
Diagnostics. Residual-vs-time histograms (a causality check, Unit 7 §7.5), heat-budget closure at each sensor depth, and cross-validation on a held-out 24-hour window.

Files

scripts/task_a_inverse_pinn.jl — source-recovery inverse PINN: FD reference solve, synthetic mooring data, joint \tilde T_\theta + \tau_\phi training; clears the §9.9.3 targets on a CPU in about four minutes.
scripts/column_pinn_gpu.jl — forward sanity check: pure-diffusion forward PINN scored against the closed-form solution, CPU sub-scale + A10G full-scale.
scripts/column_fd.jl — the finite-difference column solver used as the forward reference.

10.3 Solution to Task B


Spec: Unit 9 §9.10. Three moorings jointly, H = 100 m, GPU at full scale (the modern-PINN toolkit is an optional scale-up). Here we develop on CPU and queue the GPU launch.

The Task A recipe — a single small MLP with hand-tuned weights — breaks at Task B’s scale for three concrete reasons:

The deep (H = 100 m) column has a much wider range of temporal scales (diurnal at the surface, 30-day at depth), so the smooth-MLP ansatz can’t fit the high-frequency surface signal and the slow deep relaxation simultaneously.
The three-site joint loss has six per-site loss components whose magnitudes differ by 2–3 orders of magnitude — hand tuning isn’t feasible.
Forward-over-forward second derivatives at N_r = 50\,000 collocation points per site are CPU-bound and stall L-BFGS.

The four modern-PINN fixes from Unit 7 §7.3 each address one of these failure modes (numbered as above): Fourier features (1), adaptive loss weighting (2), causal training (1 again, in the time direction), and GPU vectorisation (3).

Step 1 — The field network and the modern-PINN toolkit

Network: 6-layer MLP, 64 neurons / layer, \tanh.
Fourier feature embedding at the input: \gamma(z, t) = [\sin(B\,(z, t)), \cos(B\,(z, t))] with B \sim \mathcal{N}(0, \sigma_B^2 I), in two banks with \sigma_B tuned per band (one for the diurnal cycle, \sim (1/86400)\,\text{s}^{-1}, one for the storm envelope, \sim (1/(3 \times 86400))\,\text{s}^{-1}). This breaks the MLP’s spectral bias (Tancik et al. 2020).
Hard BC at z = -H as in Task A; hard IC: T_\theta(z, t) = T_0(z) + t\,N_\theta(z, t) so the IC loss term drops out entirely. Now only the residual + surface-flux BC + data loss compete for the optimiser’s attention.
Adaptive loss weighting (self-adaptive weights, McClenny & Braga-Neto 2022, are one option). The gradient-balancing recipe written here follows the learning-rate-annealing update of Wang, Teng & Perdikaris (2021): \lambda_{\text{term}}^{(k+1)} \leftarrow (1 - \alpha)\,\lambda_{\text{term}}^{(k)} + \alpha\,\frac{\max\|\nabla \mathcal{L}_{\text{residual}}\|_\infty}{\overline{\|\nabla \mathcal{L}_{\text{term}}\|}_2} with \alpha = 0.1, updated every 100 iterations.
Causal training (Wang et al. 2022) — temporal weight \omega(t_i) = \exp(-\epsilon\,\sum_{j<i}\!r^2(z_j, t_j)) so a t_i collocation only contributes after the t_{<i} residual has converged.

This is the menu the literature offers for the deep multi-scale column. The shipped solution is deliberately more conservative: both the sub-scale prototype and the GPU full-scale inverse below use hard IC/deep BC + the proven static-weight recipe (Step 2), which is what converges cleanly at this scale. Fourier features earn their keep once the column is deep enough to carry genuinely separated time-scales; adaptive weighting and causal training are the next layer the inverse can lean on — Step 6’s open questions flag where causal still needs care. We flag the menu so you know what to reach for, not because all four are needed to clear the §9.10 bar.
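For readers who want to see the shape of the menu items, here is a library-free sketch of three of them, following the formulas as written in the list above. Nothing here is taken from the shipped scripts (which do not use these pieces): the function names are hypothetical, `B` is passed in as explicit rows rather than sampled, and the per-time-slice residual losses are assumed to be already computed.

```python
import math


def fourier_features(z, t, B):
    """gamma(z, t) = [sin(B (z, t)), cos(B (z, t))] for rows (b_z, b_t) of B."""
    phases = [b_z * z + b_t * t for b_z, b_t in B]
    return [math.sin(p) for p in phases] + [math.cos(p) for p in phases]


def ema_weight_update(lam, grad_res_max, grad_term_mean, alpha=0.1):
    """One gradient-balancing step: move lam toward the ratio of gradient magnitudes."""
    return (1.0 - alpha) * lam + alpha * grad_res_max / grad_term_mean


def causal_weights(residual_by_time, eps=1.0):
    """w_i = exp(-eps * sum_{j<i} L_j): a time slice switches on once earlier slices are small."""
    weights, running = [], 0.0
    for loss_j in residual_by_time:
        weights.append(math.exp(-eps * running))
        running += loss_j
    return weights


if __name__ == "__main__":
    print(fourier_features(-5.0, 3600.0, [(0.1, 1.0 / 86400.0)]))
    print(ema_weight_update(1.0, grad_res_max=10.0, grad_term_mean=2.0))
    print(causal_weights([1.0, 2.0, 0.5]))
```

The first causal weight is always 1, and later weights stay near zero until the accumulated earlier residual drops, which is what enforces the time ordering.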

Step 2 — Joint three-site inverse

Three column networks T^{(i)}_\theta(z, t), i \in \{A, B, C\}, each with the architecture above and separate parameters. Shared: a single 3-layer × 48-neuron MLP \tau_\phi(t) for the storm wind-stress envelope all three sites feel.

Joint loss:

\mathcal{L} \;=\; \sum_{i \in \{A, B, C\}}\!\!\Bigl[ \lambda^{(i)}_r\,\mathcal{L}^{(i)}_{\text{PDE}} + \lambda^{(i)}_d\,\mathcal{L}^{(i)}_{\text{data}} + \lambda^{(i)}_b\,\mathcal{L}^{(i)}_{\text{BC}} \Bigr] \;+\; \lambda_{\text{reg}}\,\int |\tau_\phi'(t)|^2\, dt.

The site-specific weights \lambda^{(i)}_{r,d,b} matter because the three regimes (Péclet \ll 1, \sim 1, \gg 1) produce residual / data scales that differ across sites. The literature learns them by gradient-balancing; in practice, at the scale we run, the proven static recipe from Task A — a large data weight \lambda_d = 6000, a tiny H^1 smoothing \lambda_\text{reg} = 10^{-5}, and hard IC + deep BC — is what converges cleanly, and is what task_b_joint_inverse.jl ships. (A naively-tuned adaptive scheme can run away on this deconvolution — over-weighting the noisy data until the recovered \tau blows up; gradient-balancing is the right next step only once the static baseline is trustworthy.)
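The structure of the joint loss, per-site terms summed plus one regulariser on the shared envelope, can be sketched as follows. The dictionary layout, the mean-square form of each term, and the uniform-grid H^1 approximation are assumptions made for illustration, not the layout of `task_b_joint_inverse.jl`.

```python
def mean_sq(values):
    """Mean of squared entries; assumed form of each per-site loss term."""
    return sum(v * v for v in values) / len(values)


def joint_loss(sites, tau_vals, dt, lam_reg=1e-5):
    """Sum the weighted PDE, data, and BC terms over sites, then add the shared H1 penalty."""
    total = 0.0
    for site in sites.values():
        total += (site["lam_r"] * mean_sq(site["pde"])
                  + site["lam_d"] * mean_sq(site["data"])
                  + site["lam_b"] * mean_sq(site["bc"]))
    h1 = sum(((b - a) / dt) ** 2 for a, b in zip(tau_vals, tau_vals[1:])) * dt
    return total + lam_reg * h1


if __name__ == "__main__":
    # Illustrative residuals; every site uses the static recipe lam_d = 6000.
    one_site = {"lam_r": 1.0, "lam_d": 6000.0, "lam_b": 10.0,
                "pde": [0.1], "data": [0.01], "bc": [0.0]}
    sites = {"A": dict(one_site), "B": dict(one_site), "C": dict(one_site)}
    print(joint_loss(sites, tau_vals=[0.0, 0.0, 0.0], dt=0.5))
```

Only `tau_vals` is shared across the loop: each site keeps its own residuals and weights, while the single envelope is penalised once.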

Why the shared \tau_\phi matters: each site contributes a different observable consequence of the same physical stress. Diffusion-only Site A constrains the storm amplitude through the mixing response; advection-dominated Site C sees the shape of the deep upwelling pulse; Site B sits between. The single shared \tau_\phi must satisfy all three at once — and that joint constraint is what cuts the recovered-peak error from ~20% (Site A alone) or ~12% (Site C alone) down to ~7% jointly, measured on the GPU full-scale run (Step 6).

Step 3 — CPU sub-scale prototype (what we actually run here)

The full 3-site H = 100 m / N_r = 50\,000 launch is GPU-class. Without one, task_b_subscale_prototype.jl runs the two-site joint inverse at CPU sub-scale: Site A (15 m, diffusion-only) and Site B (60 m, advection + diffusion) sharing one storm envelope \tau(t). It deliberately uses the same proven static-weight recipe as Task A — large data weight \lambda_d = 6000, tiny H^1 smoothing \lambda_\text{reg} = 10^{-5}, hard IC + hard deep BC baked into \tilde T = \zeta\,\tau\,N, and the FD-in-input derivative stencil — extended minimally to two sites with a single shared \tau-network. (The GPU full-scale run in Step 6 uses the same static recipe at three sites and a wider network — the Fourier / adaptive / causal machinery of Step 1 is the menu you reach for when the deep H=100 m column forces genuinely separated time-scales, not something this problem needs to clear the bar.) It performs three inversions — each ~4–7 min on a CPU at 12 000 Adam steps, N_r = 4\,000 per site:

Site A alone — peak-amplitude error ~16%.
Site B alone — ~11%. Site B’s deeper, advection-sharpened thermocline carries more storm information than Cleveland Bay’s shallow diffusion-only column, so it recovers the envelope better on its own.
Joint A + B — ~7%, with the storm day recovered exactly. The shared \tau(t) must satisfy both columns at once, so the cross-site constraint cuts the peak error below either single site. This is the headline finding — and it is a real, reproducible measurement, not an extrapolation.

A one-page GPU-launch checklist (the deltas for the full 3-site run) is in Files below.

Step 4 — Recovered \tau(t) from the two-site CPU prototype

The two-site joint inversion produces a \hat\tau(t) measurably sharper than either single-site recovery:

Recovered storm envelope \hat\tau(t) from `task_b_subscale_prototype.jl`. Either mooring alone under- or over-shoots the peak (Site A ~16%, Site B ~11%); the joint inversion with one shared \tau(t) tracks the truth to ~7% and locks the storm day exactly.

Feature	Site A alone (H{=}15m)	Site B alone (H{=}60m)	Joint A+B
Peak-amplitude error	~16%	~11%	~7%
Storm-day timing	exact	exact	exact
Whole-envelope rel. L^2	~0.14	~0.10	~0.07

Site B’s stronger thermocline signature dominates the storm-day amplitude constraint; Site A pins the quieter shoulders of the envelope. Removing either site sends the joint error back up — the two moorings genuinely constrain different parts of the same signal.

Step 5 — Mechanism partition across the three sites

The §9.1 multi-site story made quantitative. For the toy nondim column the three right-hand-side terms — vertical advection -W_\text{adv}\,w(\zeta)\,\partial_\zeta\tilde T, vertical mixing \mathrm{Pe}\,\partial_{\zeta\zeta}\tilde T, and the storm source S(\zeta)\,\tau(t) — are the mechanisms. Integrating each term’s magnitude over the storm window at each site’s diagnostic depth (from the FD reference, computed in task_b_joint_inverse.jl) gives a real partition:

Site	Depth	Péclet W_\text{adv}/\mathrm{Pe}	Advection	Mixing	Storm source
A — Cleveland Bay	z = -10\,\text{m}	0	0%	39%	61%
B — Davies Reef	z = -30\,\text{m}	\sim 1.2	15%	20%	65%
C — Myrmidon Reef	z = -50\,\text{m}	\sim 4	27%	20%	53%

The headline is the trend, and it is real: the advection share climbs from 0\% at diffusion-only Cleveland Bay to \sim\!27\% at advection-dominated Myrmidon, while the directly-forced storm source stays the largest single term (this is, after all, a source-recovery problem). Cleveland Bay’s storm response is mixing-and-forcing only; Myrmidon’s carries a substantial upwelling signature; Davies Reef sits in between. Three moorings → three distinct storm-response fingerprints from one shared wind-stress event. That contrast is what the joint inversion exploits and no single-site analysis can see.
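The partition itself is a small calculation: integrate the magnitude of each right-hand-side term over the storm window and normalise. A minimal sketch, assuming the three term time series at the diagnostic depth are already available on a uniform time grid (the function name and inputs are hypothetical, and the numbers in the usage example are made up):

```python
def mechanism_partition(advection, mixing, source, dt):
    """Percentage share of each term's time-integrated magnitude over the window."""
    totals = {
        "advection": sum(abs(v) for v in advection) * dt,
        "mixing": sum(abs(v) for v in mixing) * dt,
        "source": sum(abs(v) for v in source) * dt,
    }
    grand = sum(totals.values())
    if grand == 0.0:
        raise ValueError("all terms are zero over the window")
    return {name: 100.0 * value / grand for name, value in totals.items()}


if __name__ == "__main__":
    # Made-up series for a diffusion-only site: advection is identically zero.
    shares = mechanism_partition([0.0, 0.0], [1.0, -1.0], [2.0, 1.0], dt=0.5)
    print(shares)
```

Because magnitudes are integrated, terms that change sign during the storm still count toward their mechanism's share, and the three shares always sum to 100%.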

Step 6 — GPU full-scale run (measured)

Running task_b_joint_inverse.jl on the course GPU hub (NVIDIA A10G) takes the recipe to all three sites at the wider 64\times5 network. These are measured, not extrapolated:

Run	Config	Wall-clock	\hat\tau peak error
CPU 2-site prototype	2 sites, 32\times4, 12k steps	~7 min (joint)	6.8%
GPU 3-site full-scale (A10G)	3 sites, 64\times5, 3k steps	~5 min (joint), ~10 min (all four inversions)	6.7%

The per-site GPU breakdown tells the joint-inversion story directly: Site A alone recovers the peak to ~20%, Site B to ~15%, Site C to ~12% — and the shared-\tau joint of all three lands at 6.7%, better than any single column. The advection-dominated Site C does best alone (it carries the most storm information) yet still loses to the joint.

The headline is that the GPU does the harder three-site / wider-network / deeper-column problem in less wall-clock than the CPU spent on the two-site prototype — that is what “GPU buys you full scale” means here. (The raw training-step throughput behind that is benchmarked in Unit 7 §7.6 ▸ Why a GPU changes which PINNs you can afford: on the A10G the batched collocation pass runs tens of times faster than the CPU.) A research group with three real moorings can re-run this on overnight GPU time.

Open questions for the full run

Will causality violation re-appear at Myrmidon Reef (advection-dominated, \mathrm{Pe} \gg 1)? The CPU prototype is kept to two sites just to stay small and fast; the shipped 3-site run includes Myrmidon and clears the bar with static weights. Its longer characteristic timescale is the open risk a per-site causal scheduler would address in a deeper scale-up.
Will gradient-balancing converge on a stable weight ratio? The static recipe is stable at sub-scale; gradient-balancing itself is untested here (Step 2 notes a naive adaptive scheme can run away).
How well does the recovered \tau(t) correlate with the independent SWE-driver w(t) inferred from local wind observations? This is the validation step that turns the recovered synthetic answer into something an oceanographer trusts.

Files

scripts/task_b_subscale_prototype.jl — two-site CPU sub-scale joint inverse (the headline prototype)
scripts/task_b_joint_inverse.jl — three-site GPU full-scale joint trainer + mechanism partition
scripts/task_b_gpu_launch.md — GPU-launch checklist

10.4 A Python parallel with DeepXDE

DeepXDE is the Python equivalent of NeuralPDE.jl: a high-level interface that takes a PDE definition and a collocation strategy and produces a trained network. Here’s the Task A forward column problem (Davies Reef, H = 60 m, fixed \tau) as a DeepXDE sketch, with the same architecture (4 × 32 tanh MLP), the same hard-BC ansatz at the deep reservoir, and the same Adam → L-BFGS schedule. It conveys the shape of a DeepXDE port, not a byte-for-byte match to the Julia twin: the forcing constants are round illustrative values and the diffusion is written non-conservatively (\kappa\,\partial_{zz}T) for brevity:

units/unit_10/scripts/task_a_forward_deepxde.py

# pip install deepxde torch   (select the backend with DDE_BACKEND=pytorch)
import numpy as np
import torch
import deepxde as dde

H = 60.0;  Tf = 30.0 * 86400.0
T_deep = 22.0
# kappa_z and S_z are evaluated on backend tensors inside pde() and surface_flux(),
# so they use torch.exp: np.exp fails on tensors that require grad.
def kappa_z(z):  return 1e-3 * torch.exp(z / 20.0) + 1e-5    # mixed-layer profile (Site B)
def w_z(z, t):   return 5e-5 * (z + H) / H               # upwelling profile (Site B): linear in z (SWE), m/s
def Q_np(t):     return -120.0                           # net cooling (W/m^2)
def S_z(z, t):   return 400.0 / 4.0e6 / 8.0 * torch.exp(z / 8.0)  # body source S = Q_SW/(ρ₀cₚζ)·e^(z/ζ), ζ=8 m

# Geometry: z in [-H, 0], t in [0, Tf]
geom     = dde.geometry.Interval(-H, 0.0)
timedom  = dde.geometry.TimeDomain(0.0, Tf)
geomtime = dde.geometry.GeometryXTime(geom, timedom)

def pde(zt, T):
    z, t = zt[:, 0:1], zt[:, 1:2]
    dT_dt = dde.grad.jacobian(T, zt, j=1)
    dT_dz = dde.grad.jacobian(T, zt, j=0)
    d2T_dz2 = dde.grad.hessian(T, zt, component=0, i=0, j=0)
    kappa = kappa_z(z)
    return dT_dt + w_z(z, t) * dT_dz - kappa * d2T_dz2 - S_z(z, t)

# Hard BC at the deep reservoir via output transform.
def output_transform(zt, T):
    z = zt[:, 0:1]
    return T_deep + (z + H) * T          # T(-H, t) = T_deep automatically

# Soft surface flux at z = 0.
def surface_flux(zt, T, _):
    dT_dz_top = dde.grad.jacobian(T, zt, j=0)
    rho, cp = 1025.0, 3990.0
    return rho * cp * kappa_z(zt[:, 0:1]) * dT_dz_top - Q_np(zt[:, 1:2])
bc_surface = dde.icbc.OperatorBC(
    geomtime, surface_flux,
    lambda zt, on_boundary: on_boundary and np.isclose(zt[0], 0.0),
)

# IC: initial thermocline tanh profile (Site B: thermocline at z = -25 m)
def T0(z): return 25.0 + 3.0 * np.tanh((z + 25.0) / 5.0)
ic = dde.icbc.IC(geomtime, lambda zt: T0(zt[:, 0:1]),
                 lambda _, on_initial: on_initial)

data = dde.data.TimePDE(
    geomtime, pde, [bc_surface, ic],
    num_domain=2000, num_boundary=200, num_initial=200,
)
net = dde.nn.FNN([2] + [32] * 4 + [1], "tanh", "Glorot uniform")
net.apply_output_transform(output_transform)
model = dde.Model(data, net)
model.compile("adam", lr=1e-3); model.train(iterations=2000)
model.compile("L-BFGS"); model.train()

Available as scripts/task_a_forward_deepxde.py. Runtime: ~5 min on a CPU with the PyTorch backend.

The structural one-to-one with the Julia version:

Concern	Julia (`NeuralPDE.jl`)	Python (`DeepXDE`)
Geometry	`IntervalDomain` × `TimeDomain`	`Interval` × `TimeDomain`
Residual	`Differential(t)` + symbolic AD	`dde.grad.jacobian` / `hessian`
Hard BC ansatz	manual product form	`apply_output_transform`
Soft BC	`bcs` vector	`OperatorBC`
Optimiser	`Optimization.solve(Adam → LBFGS)`	`model.compile/train` twice
AD backend	Zygote + ForwardDiff	PyTorch / JAX / TF

Cross-ecosystem comparison

Strengths and weaknesses, honestly:

Julia stack — composes cleanly with classical SciML solvers, faster autodiff for stiff problems, smaller ML community.
Python stack — broader ML community support, more ergonomic when extending with bespoke loss terms, larger ecosystem of pre-trained models and tutorials.

Neither is strictly better. Pick by who’s using it and what they already know.

10.5 Where to go from here

Beyond the workshop:

Scenario 5 — couple in the shallow-water driver. Replace the prescribed scenario-4 envelopes with a physically self-consistent w(t) from the SWE solve sketched in §10.1, feeding each mooring’s horizontal divergence into its column. The linearised shallow-water machinery in Unit 1 is the cleanest starting point.
Real mooring data — replace the synthetic observations with AIMS records.
Coupling with operational ocean models — ROMS, MOM6 — for realistic horizontal forcings.
Uncertainty quantification on the recovered driver, via Bayesian PINNs or ensembles.
Transfer to other reef sites with different climatologies.

The PINN software ecosystem you’d reach for next is catalogued in Unit 7 §7.6; the broader PIML field is mapped in the references.

10.6 Reference solver source: `scripts/column_fd.jl`

This is the MethodOfLines finite-difference reference for T(z, t). It is kept as a standalone script (see SETUP.md) so that quarto render stays cheap. Run it with ./build.sh execute 10; outputs land in output/.

# Reference 1D vertical heat-transport column solver (finite-difference,
# method-of-lines). See unit_09.qmd for the model specification.
#
# Scenarios (per unit_09.qmd §9.11):
#   1. pure diffusion to steady state
#   2. diurnal cycle (Q_SW + body source S)
#   3. add upwelling (w(z))
#   4. storm fingerprint (gust modulates w and Q_SW around day 10; κ stays constant)
#
# Run a scenario:
#   include("column_fd.jl"); r = run_scenario(scenario_1());
#   plot_scn1(r)

using ModelingToolkit, MethodOfLines, OrdinaryDiffEq
using OrdinaryDiffEqBDF: QNDF  # umbrella OrdinaryDiffEq no longer re-exports BDF solvers
using DomainSets: ClosedInterval
using CairoMakie
using Printf

# ── reference parameters (unit_09.qmd §9.6) ────────────────────────────
const PARAMS = (
    H         = 100.0,    # column depth (m)
    ρ         = 1025.0,   # seawater density (kg/m³)
    cp        = 3990.0,   # specific heat (J/(kg·K))
    α         = 2e-4,     # thermal expansion (1/K)
    T_surface = 28.0,     # initial SST (°C)
    T_deep    = 18.0,     # deep reservoir (°C)
    z_t       = -30.0,    # thermocline depth (m)
    δ_t       = 5.0,      # thermocline sharpness (m)
    QSW_max   = 800.0,    # peak noon shortwave (W/m²)
    Q_cool    = 200.0,    # non-penetrative net cooling (W/m²)
    ζ         = 10.0,     # shortwave penetration scale (m)
    κ_b       = 1e-5,     # background diffusivity (m²/s)
    κ_m       = 1e-3,     # mixed-layer diffusivity (m²/s)
    h_m       = 20.0,     # mixed-layer scale (m)
    w0        = 1e-5,     # peak upwelling (m/s)
    τ_d       = 86400.0,  # diurnal period (s)
    # storm scenario: Gaussian gust around day 10, ~3 days wide
    # (full width where the envelope exceeds ~10% of its peak).
    t_storm   = 10 * 86400.0,
    σ_storm   = 1.0 * 86400.0,
    w0_storm  = 5e-5,     # gust-driven upwelling (5× baseline)
    cloud_amp = 0.5,      # peak SW reduction during gust
)

storm_envelope(t, p) = exp(-((t - p.t_storm) / p.σ_storm)^2)

# ── initial profile ─────────────────────────────────────────────────────
T0(z, p=PARAMS) = 0.5*(p.T_surface + p.T_deep) +
                  0.5*(p.T_surface - p.T_deep) * tanh((z - p.z_t)/p.δ_t)

# ── vertical velocity profiles ──────────────────────────────────────────
w_zero(z, t, p)       = 0.0
w_upwelling(z, t, p)  = p.w0 * sin(π * (z + p.H) / p.H)
w_storm(z, t, p)      = p.w0_storm * storm_envelope(t, p) * sin(π * (z + p.H) / p.H)

# ── eddy diffusivity profiles ───────────────────────────────────────────
κ_const(z, t, p)   = p.κ_m
κ_profile(z, t, p) = p.κ_b + (p.κ_m - p.κ_b) * exp(z / p.h_m)

# ── surface forcing ─────────────────────────────────────────────────────
# Smooth diurnal: use (1 + cos)/2 instead of max(0, cos) to keep the RHS
# differentiable for MTK. The daily mean differs from the clipped cosine
# (QSW_max/2 here versus QSW_max/π), which is close enough for the toy model.
QSW_diurnal(t, p) = p.QSW_max * 0.5 * (1.0 + cos(2π * t / p.τ_d))
QSW_storm(t, p)   = p.QSW_max * (1.0 - p.cloud_amp * storm_envelope(t, p)) *
                    0.5 * (1.0 + cos(2π * t / p.τ_d))
Q_np_steady(t, p) = -p.Q_cool

# Body source from Beer–Lambert: I(z,t) = Q_SW(t) e^{z/ζ},
# S = (1/(ρcp)) ∂I/∂z = Q_SW(t)/(ζ ρ cp) · e^{z/ζ}.
S_off(z, t, p)     = 0.0
S_diurnal(z, t, p) = QSW_diurnal(t, p) * exp(z / p.ζ) / (p.ζ * p.ρ * p.cp)
S_storm(z, t, p)   = QSW_storm(t, p)   * exp(z / p.ζ) / (p.ζ * p.ρ * p.cp)

# ── PDE builder ─────────────────────────────────────────────────────────
function build_pde(scn, p=PARAMS)
    @parameters z t
    @variables T(..)
    Dt = Differential(t)
    Dz = Differential(z)

    # Substitute scenario profiles symbolically so MOL sees closed-form
    # expressions in (z, t).
    w_sym  = scn.w(z, t, p)
    κ_sym  = scn.κ(z, t, p)
    S_sym  = scn.S(z, t, p)
    Qnp_sym = scn.Q_np(t, p)
    κ_top  = scn.κ(0.0, t, p)

    eq = Dt(T(z, t)) + w_sym * Dz(T(z, t)) ~
         Dz(κ_sym * Dz(T(z, t))) + S_sym

    domains = [z ∈ ClosedInterval(-p.H, 0.0),
               t ∈ ClosedInterval(0.0, scn.Tf)]

    bcs = [
        T(z, 0.0)   ~ T0(z, p),
        T(-p.H, t)  ~ p.T_deep,
        κ_top * Dz(T(0.0, t)) ~ Qnp_sym / (p.ρ * p.cp),
    ]

    @named pde_system = PDESystem(eq, bcs, domains, [z, t], [T(z, t)])
    return pde_system, z, t, T
end

# ── run dispatcher ──────────────────────────────────────────────────────
function run_scenario(scn; dz=1.0, p=PARAMS)
    pde_system, z, t, T = build_pde(scn, p)
    disc = MOLFiniteDifference([z => dz], t)
    prob = discretize(pde_system, disc)
    @info "solving scenario: $(scn.name) (Tf = $(scn.Tf/86400) days)"
    sol = solve(prob, scn.alg; saveat=scn.saveat, abstol=1e-8, reltol=1e-6)
    (; sol, z, t, T, scn)
end

# ── scenario configs ────────────────────────────────────────────────────
# Tf chosen long enough to reach (near-)steady state: T_κ = H²/κ_m ≈ 116 days,
# so ~3·T_κ ≈ 350 days gets us to within a few percent.
scenario_1() = (
    name   = "diffusion to steady state",
    w      = w_zero,
    κ      = κ_const,
    S      = S_off,
    Q_np   = Q_np_steady,
    Tf     = 365 * 86400.0,
    saveat = 10 * 86400.0,
    alg    = QNDF(),
)

scenario_2() = (
    name   = "diurnal cycle",
    w      = w_zero,
    κ      = κ_const,
    S      = S_diurnal,
    Q_np   = Q_np_steady,
    Tf     = 10 * 86400.0,
    saveat = 600.0,
    alg    = QNDF(),
)

scenario_3() = (
    name   = "upwelling",
    w      = w_upwelling,
    κ      = κ_const,
    S      = S_diurnal,
    Q_np   = Q_np_steady,
    Tf     = 30 * 86400.0,
    saveat = 3600.0,
    alg    = QNDF(),
)

# Scenario 4 (storm): Gaussian gust centred at t_storm. The gust spikes
# upwelling (w_storm) and dims the surface SW (cloud cover via S_storm).
# Real SWE-driven forcings would replace these envelopes; this is the
# prescribed-driver stub of §9.11 step 4.
scenario_4() = (
    name   = "storm fingerprint",
    w      = w_storm,
    κ      = κ_const,
    S      = S_storm,
    Q_np   = Q_np_steady,
    Tf     = 30 * 86400.0,
    saveat = 3600.0,
    alg    = QNDF(),
)

# ── analytic check for scenario 1 ───────────────────────────────────────
# Steady state of (κ T_z)_z = 0 with T(-H)=T_deep and κ T_z|₀ = Q_np/(ρcp):
#   T(z) = T_deep − (Q_cool/(κ_m ρ cp)) · (z + H)
# (Using Q_np = -Q_cool < 0, so the surface is cooler than the deep
# reservoir — heat flows up from depth to be lost at the air-sea
# interface.)
T_analytic_scn1(z, p=PARAMS) =
    p.T_deep - p.Q_cool / (p.κ_m * p.ρ * p.cp) * (z + p.H)

# ── plotting ────────────────────────────────────────────────────────────
# Extract the (Nz, Nt) matrix and grids from a run_scenario result,
# regardless of MOL's internal axis ordering. We declared PDESystem with
# ivs = [z, t], so sol[z] and sol[t] are the canonical grids; we transpose
# the matrix to (Nz, Nt) if MOL returned it in (Nt, Nz) form.
function _grids(r)
    zg = r.sol[r.z]
    tg = r.sol[r.t]
    Tg = r.sol[r.T(r.z, r.t)]
    Tg = size(Tg, 1) == length(zg) ? Tg : permutedims(Tg)
    (zg, tg, Tg)
end

"Final-time T(z) vs analytic steady state."
function plot_scn1(r; outpath=joinpath(@__DIR__, "..", "output", "scn1_steady.png"))
    zg, tg, Tg = _grids(r)
    Tf = Tg[:, end]
    fig = Figure(size=(700, 500))
    ax = Axis(fig[1,1], xlabel="T (°C)", ylabel="z (m)",
              title="Scenario 1: pure diffusion to (near-)steady state")
    # Draw the MOL solution first (solid), then the analytic steady state
    # as a dashed overlay ON TOP, in a contrasting colour — otherwise the
    # solid MOL line hides the dashes entirely (the two agree to ~4e-3 °C,
    # so the visible dashes riding on the orange line ARE the validation).
    lines!(ax, Tf, zg; linewidth=3, label="MOL final")
    lines!(ax, T_analytic_scn1.(zg), zg;
           linestyle=:dash, linewidth=2, color=:black, label="analytic")
    lines!(ax, T0.(zg), zg;
           color=:gray, linewidth=1, label="IC")
    axislegend(ax, position=:rb)
    save(outpath, fig)
    err = maximum(abs, Tf .- T_analytic_scn1.(zg))
    @info "saved $outpath; max |T_MOL − T_analytic| = $(round(err, sigdigits=3)) °C"
    fig
end

"Scenario 2: near-surface traces (showing the diurnal cycle) over the (z, t) heatmap."
function plot_scn2(r; zmax=10.0,
                   outpath=joinpath(@__DIR__, "..", "output", "scn2_diurnal.png"))
    zg, tg, Tg = _grids(r)
    days = tg ./ 86400.0
    keep = zg .>= -zmax
    fig = Figure(size=(900, 640))

    # Top: a few near-surface traces — the daily warm/cool cycle is obvious here
    # even though it is washed out by the slow relaxation in the heatmap below.
    ax1 = Axis(fig[1,1], xlabel="t (days)", ylabel="T (°C)",
               title="Scenario 2: diurnal warm layer — near-surface traces")
    for (d, col) in ((1.0, :firebrick), (3.0, :darkorange), (7.0, :seagreen))
        zi = argmin(abs.(zg .+ d))
        lines!(ax1, days, Tg[zi, :]; color=col, linewidth=1.6, label="z = −$(Int(d)) m")
    end
    axislegend(ax1, position=:rt, framevisible=true)

    ax2 = Axis(fig[2,1], xlabel="t (days)", ylabel="z (m)", title="T(z, t) — top $(Int(zmax)) m")
    hm = heatmap!(ax2, days, zg[keep], Tg[keep, :]')
    Colorbar(fig[2,2], hm, label="T (°C)")
    rowsize!(fig.layout, 1, Relative(0.42))
    save(outpath, fig)
    @info "saved $outpath"
    fig
end

"Five-mooring-depth time series + full (z,t) heatmap for scenarios 3–4."
function plot_mooring(r; depths_m=(2.0, 10.0, 30.0, 60.0, 90.0),
                      outpath=joinpath(@__DIR__, "..", "output", "$(replace(r.scn.name, ' '=>'_'))_mooring.png"))
    zg, tg, Tg = _grids(r)
    days = tg ./ 86400.0
    fig = Figure(size=(1000, 700))
    ax1 = Axis(fig[1,1], xlabel="t (days)", ylabel="T (°C)",
               title="Mooring traces — $(r.scn.name)")
    for d in depths_m
        zi = argmin(abs.(zg .+ d))  # z = -d
        lines!(ax1, days, Tg[zi, :]; label="z = −$(d) m")
    end
    axislegend(ax1, position=:rt)
    ax2 = Axis(fig[2,1], xlabel="t (days)", ylabel="z (m)",
               title="T(z, t)")
    hm = heatmap!(ax2, days, zg, Tg')
    Colorbar(fig[2,2], hm, label="T (°C)")
    save(outpath, fig)
    @info "saved $outpath"
    fig
end

# ── entry point when run as script ──────────────────────────────────────
# Used by ./build.sh execute 10 to populate units/unit_10/output/.
if abspath(PROGRAM_FILE) == @__FILE__
    r1 = run_scenario(scenario_1());  plot_scn1(r1)
    r2 = run_scenario(scenario_2());  plot_scn2(r2)
    r3 = run_scenario(scenario_3());  plot_mooring(r3)
    r4 = run_scenario(scenario_4());  plot_mooring(r4)
end

Running it produces:

[ Info: solving scenario: diffusion to steady state (Tf = 365.0 days)
[ Info: saved units/unit_10/output/scn1_steady.png; max |T_MOL − T_analytic| = 0.00407 °C
[ Info: solving scenario: diurnal cycle (Tf = 10.0 days)
[ Info: saved units/unit_10/output/scn2_diurnal.png
[ Info: solving scenario: upwelling (Tf = 30.0 days)
[ Info: saved units/unit_10/output/upwelling_mooring.png
[ Info: solving scenario: storm fingerprint (Tf = 30.0 days)
[ Info: saved units/unit_10/output/storm_fingerprint_mooring.png
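
The scenario 1 numbers above can be cross-checked without the Julia stack. The sketch below, in plain Python, recomputes the diffusive timescale H²/κ_m, the analytic steady surface temperature, and an estimate of how much of the slowest diffusive mode is left after 365 days. The modal estimate is our own addition, not part of column_fd.jl: it assumes the deviation from steady state decays as a cosine series with the bottom value pinned and zero deviation flux at the surface, and it integrates with a simple midpoint rule. Under those assumptions the leftover comes out at roughly 4e-3 °C, the same size as the logged max |T_MOL − T_analytic|, which suggests the logged error is mostly unfinished relaxation rather than discretisation error.

```python
import math

# Reference parameters copied from PARAMS in column_fd.jl.
H, RHO, CP = 100.0, 1025.0, 3990.0
T_SURFACE, T_DEEP = 28.0, 18.0
Z_T, DELTA_T = -30.0, 5.0
Q_COOL, KAPPA_M = 200.0, 1e-3
DAY = 86400.0

def t_initial(z):
    """Tanh thermocline initial profile (same formula as T0 in the script)."""
    return (0.5 * (T_SURFACE + T_DEEP)
            + 0.5 * (T_SURFACE - T_DEEP) * math.tanh((z - Z_T) / DELTA_T))

def t_steady(z):
    """Analytic steady state for constant kappa and surface cooling Q_COOL."""
    return T_DEEP - Q_COOL / (KAPPA_M * RHO * CP) * (z + H)

def slowest_mode_residual(t_final, n=2000):
    """Surface amplitude of the slowest decaying mode at time t_final.

    Assumed mode shape: cos(pi z / (2H)), zero at z = -H and flat at z = 0.
    The projection integral uses the midpoint rule on n cells.
    """
    k = math.pi / (2.0 * H)
    dz = H / n
    integral = 0.0
    for i in range(n):
        z = -H + (i + 0.5) * dz
        integral += (t_initial(z) - t_steady(z)) * math.cos(k * z) * dz
    amplitude = 2.0 / H * integral
    return amplitude * math.exp(-KAPPA_M * k * k * t_final)

diffusion_days = H ** 2 / KAPPA_M / DAY
surface_steady = t_steady(0.0)
residual = slowest_mode_residual(365.0 * DAY)

print(f"H^2/kappa_m        = {diffusion_days:.1f} days")
print(f"steady T(z = 0)    = {surface_steady:.2f} deg C")
print(f"slow-mode leftover = {residual:.2e} deg C after 365 days")
```

If the estimate holds, extending Tf would shrink the logged error further, whereas refining dz alone would change it little.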
