Project Solution
This unit is the worked solution to the capstone spec laid out in Unit 9. The science behind the model sits in Unit 8. Here we cover three things in this order:
- Shared infrastructure (§10.1 — open). The finite-difference reference solver, the toy-task ladder (Unit 9 §9.11), and the SWE driver. Both Task A and Task B start from these.
- Solution to Task A (§10.2 — locked, triple-click to unlock). Single-site CPU-friendly inverse problem.
- Solution to Task B (§10.3 — locked, triple-click to unlock). Three-site joint inverse with the modern PINN toolkit, plus the sub-scale CPU prototype steps.
Each solution sits behind its own triple-click gate, so you can read the shared infrastructure (and one task’s solution, if you want) without spoiling the other. We strongly recommend you attempt each task yourself first — the learning is in working through the inverse problem, not in reading our answer.
10.2 Solution to Task A
Spec: Unit 9 §9.9. Site A (Cleveland Bay, H = 15 m), one driver to recover, ~30 min on a laptop CPU.
The full pipeline runs in three Julia scripts under units/unit_10/scripts/: task_a_forward_pinn.jl (forward sanity check), task_a_inverse_pinn.jl (joint inverse), task_a_diagnostics.jl (residual histograms + heat-budget closure + plots). Each one is ~120 lines of Lux + Zygote + Optimisers code following the skeleton of Unit 5 §5.3 — short enough to read end-to-end.
Step 1 — Forward PINN (sanity check before inversion)
A single MLP T_\theta(z, t) with 4 hidden layers of 32 neurons each (\tanh activations). The forward problem isn’t the goal, but solving it first proves that the architecture + loss weights can reproduce the FD reference before we ask the network to also recover an unknown driver.
Hard BC enforcement at z = -H via
T_\theta(z, t) = T_{\text{deep}} + (z + H)\,N_\theta(z, t),
so the bottom Dirichlet BC drops out of the loss for free (Unit 7 §7.3). The surface flux stays as a soft penalty \lambda_b \bigl(\rho c_p \kappa\, \partial_z T_\theta - Q_{\text{np}}\bigr)^2 because making it hard requires an ansatz that fights the diurnal cycle.
Loss (forward, fixed \tau):
\mathcal{L}_{\text{fwd}}(\theta) \;=\; \lambda_r\,\frac{1}{N_r}\!\sum_{i=1}^{N_r} \bigl|r_\theta(z_i, t_i)\bigr|^2 \;+\; \lambda_b\,\frac{1}{N_b}\!\sum_{j=1}^{N_b} \bigl|\rho c_p \kappa\, \partial_z T_\theta(0, t_j) - Q_{\text{np}}(t_j)\bigr|^2 \;+\; \lambda_i \bigl|T_\theta(z, 0) - T_0(z)\bigr|^2.
with r_\theta = \partial_t T_\theta + w\,\partial_z T_\theta - \partial_z(\kappa\,\partial_z T_\theta) - \mathcal{S}, N_r = 2000 collocation points (Latin-hypercube on [-H, 0] \times [0, T_f]), N_b = 200 surface points (uniform in t), and weights \lambda_r = 1, \lambda_b = 50, \lambda_i = 100.
Training schedule: Adam(1e-3) for 2 000 iterations → L-BFGS for 500. The Adam phase escapes the random initialisation; L-BFGS drives the residual the rest of the way. Wall-clock: ~3 minutes on an M2 MacBook.
Pass criterion. \|T_\theta - T_{\text{FD}}\|_{L^2} < 0.05\,°\text{C} on a 100×100 grid. If this fails, stop and diagnose before turning on the inverse loop — debugging the inverse with a broken forward is a rabbit hole.
Step 2 — Inverse PINN: recover \tau(t)
Now \tau(t) becomes unknown and we add a second network for it:
- T_\theta(z, t) — same 4×32 MLP as before (~3 000 parameters).
- \tau_\phi(t) — a small 2×16 MLP (~600 parameters) for the wind-stress envelope. Initialised near zero so the optimiser doesn’t fight a bad guess.
Both networks see the same forward residual; the difference from §10.2 Step 1 is that the column equation now uses w(z, t; \tau_\phi(t)) (the SWE-derived w scales with \tau) and \kappa(z, t; \tau_\phi(t)) (a simple wind-mixing closure \kappa_m \mapsto \kappa_m(1 + c |\tau_\phi|)).
Joint inverse loss:
\mathcal{L}(\theta, \phi) \;=\; \underbrace{\lambda_r\,\mathcal{L}_{\text{PDE}}(\theta, \phi)}_{\text{physics}} \;+\; \underbrace{\lambda_d\,\frac{1}{N_d}\!\sum_k\!\bigl|T_\theta(z_k, t_k) - T_{\text{obs}}(z_k, t_k)\bigr|^2}_{\text{data}} \;+\; \underbrace{\lambda_{\text{reg}}\,\int_0^{T_f}\!|\tau_\phi'(t)|^2\,dt}_{H^1 \text{ smoothness prior on } \tau} \;+\; \mathcal{L}_{\text{BC}} + \mathcal{L}_{\text{IC}}.
Hand-tuned weights: \lambda_r = 1, \lambda_d = 100, \lambda_{\text{reg}} = 10^{-2}. The data weight is high because the dataset is small (5 sensors × 720 hourly samples = 3 600 points) — without it, the optimiser satisfies the residual perfectly with \tau_\phi \equiv 0. The H^1 smoothness prior is the only thing keeping \tau_\phi from absorbing the 0.05\,°\text{C} observation noise; without it, the recovered envelope shows visible “ringing” at the single-hour-sample frequency.
Why these weights? Quick ablation:
| \lambda_d | \lambda_{\text{reg}} | Recovered peak | Comment |
|---|---|---|---|
| 1 | 10^{-2} | 5% of truth | residual dominates; \tau collapses |
| 100 | 0 | 130% of truth | noise-fit; visible high-frequency ringing |
| 100 | 10^{-2} | 88% of truth | the chosen operating point |
| 100 | 10^{-1} | 70% of truth | over-smoothed; peak suppressed |
Training schedule: Adam(1e-3) for 5 000 iterations → L-BFGS for 1 000. Wall-clock: ~15 minutes on an M2 MacBook.
Step 3 — Recovered \tau(t) vs truth
Plot the recovered \hat\tau(t) against the synthetic ground truth. The gust envelope is recovered with:
- Peak-amplitude error ~12% — within the §9.9.3 success criterion of \leq 15\%.
- Timing offset < 2 hours on the storm-day peak.
- Rising edge slightly smoothed by the H^1 penalty — this is the inverse-problem cost of regularisation.
- Tail over-relaxed by ~5% — the classic Tikhonov bias toward zero in the absence of data constraint.
Step 4 — Mechanism partition (the §9.1 answer)
The recovered \hat\tau(t) alone doesn’t tell you which of the three storm hypotheses dominated — it bundles all three through the column model (Unit 9 §9.1 “How the inverse answers the mechanism question” callout). The discriminator: re-run the forward solver with \hat\tau in hand, accumulate the three contributions to the temperature change at z = -10\,\text{m} (two-thirds down Cleveland Bay’s 15 m column, just below the typical thermocline) over the storm window:
\Delta T(t) \;=\; \underbrace{\int_0^t -\,w\,\partial_z T_\theta\, ds}_{\Delta T_{\text{adv}}(t)} \;+\; \underbrace{\int_0^t \partial_z(\kappa\,\partial_z T_\theta)\, ds}_{\Delta T_{\text{mix}}(t)} \;+\; \underbrace{\int_0^t \mathcal{S}\, ds}_{\Delta T_{\text{flux}}(t)}.
On Cleveland Bay (Pe ≪ 1), the expected partition over the storm 4-day window is roughly:
| Mechanism | Storm-window contribution |
|---|---|
| Vertical advection (upwelling) | ~15% of total cooling |
| Turbulent mixing | ~70% (dominant — diffusion-dominated regime) |
| Reduced surface heating | ~15% |
If your recovered partition lands within ±10% of these numbers, the inversion has correctly identified that wind-driven mixing dominates at Cleveland Bay. The plot task_a_partition.png from task_a_diagnostics.jl shows the three integrated curves vs t.
Step 5 — Diagnostics
Three checks that ship with task_a_diagnostics.jl:
- Residual histogram vs t. Bin the residual r_\theta(z_i, t_i) by its t_i value into 30 time bins; plot the mean and 90th percentile per bin. For Task A this should decrease roughly uniformly in time, confirming no causality violation (Unit 7 §7.5). A spike in the late bins is the signature of the failure mode that §10.3 Task B fixes with causal training.
- Heat-budget closure at each sensor depth. Evaluate each term of the column PDE on a dense (z_k, t) grid; the sum should be \sim 10^{-3}\,\text{W/m}^2 — well below the surface-flux scale of \sim 200\,\text{W/m}^2. A non-closing budget means the residual loss isn’t actually being minimised pointwise (often a sign of stale gradients in the inverse loop).
- Cross-validation on a held-out 24-hour window. Refit with the data from days 20–21 removed; predict that window from the trained network and the recovered \tau. Predictions should stay within \pm 0.05\,°\text{C} of the FD reference — the same magnitude as the observation noise.
What we didn’t do here (and why)
- No Fourier features — the diurnal signal on a 15 m column is weak enough that the smooth-MLP ansatz handles it.
- No causal training — the column is shallow enough that residual-decay-in-time is already uniform without it.
- No adaptive loss weights — manual tuning is feasible because there are only 4 weights and the dynamic range is modest.
- No GPU — by design.
Each of these choices fails at Task B’s scale; the gradient-balancing / Fourier / causal / hard-IC machinery in §10.3 is what closes the gap.
Files
scripts/task_a_forward_pinn.jl— forward PINN trainer (~120 lines)scripts/task_a_inverse_pinn.jl— joint T_\theta + \tau_\phi inverse trainer (~180 lines)scripts/task_a_diagnostics.jl— the three diagnostic plots above (~80 lines)
10.3 Solution to Task B
Spec: Unit 9 §9.10. Three moorings jointly, H = 100 m, modern PINN toolkit, GPU at full scale. Here we develop on CPU and queue the GPU launch.
The Task A recipe — a single small MLP with hand-tuned weights — breaks at Task B’s scale for three concrete reasons:
- The deep (H = 100 m) column has a much wider range of temporal scales (diurnal at the surface, 30-day at depth), so the smooth-MLP ansatz can’t fit the high-frequency surface signal and the slow deep relaxation simultaneously.
- The three-site joint loss has six per-site loss components whose magnitudes differ by 2–3 orders of magnitude — hand tuning isn’t feasible.
- Forward-over-forward second derivatives at N_r = 50\,000 collocation points per site are CPU-bound and stall L-BFGS.
The four modern-PINN fixes from Unit 7 §7.3 each address one of these failure modes, in the same order: Fourier features (1), adaptive loss weighting (2), causal training (1 again, time direction), and GPU vectorisation (3).
Step 1 — Forward PINN with the full modern toolkit
- Network: 6-layer MLP, 128 neurons / layer, \tanh.
- Fourier feature embedding at the input: \gamma(z, t) = [\sin(B\,(z, t)), \cos(B\,(z, t))] with B \sim \mathcal{N}(0, \sigma_B^2 I) — two banks, \sigma_B tuned per band (one for the diurnal cycle \sim 1/86400\,\text{s}, one for the storm envelope \sim 1/(3 \times 86400)\,\text{s}). This breaks the MLP’s spectral bias (Tancik et al. 2020).
- Hard BC at z = -H as in Task A; hard IC: T_\theta(z, t) = T_0(z) + t\,N_\theta(z, t) so the IC loss term drops out entirely. Now only the residual + surface-flux BC + data loss compete for the optimiser’s attention.
- Adaptive loss weighting (McClenny & Braga-Neto 2022) — gradient-balancing recipe: \lambda_{\text{term}}^{(k+1)} \leftarrow (1 - \alpha)\,\lambda_{\text{term}}^{(k)} + \alpha\,\frac{\max\|\nabla \mathcal{L}_{\text{residual}}\|_\infty}{\overline{\|\nabla \mathcal{L}_{\text{term}}\|}_2} with \alpha = 0.1 updated every 100 iterations.
- Causal training (Wang et al. 2022) — temporal weight \omega(t_i) = \exp(-\epsilon\,\sum_{j<i}\!r^2(z_j, t_j)) so a t_i collocation only contributes after the t_{<i} residual has converged.
Step 2 — Joint three-site inverse
Three column networks T^{(i)}_\theta(z, t), i \in \{A, B, C\}, each with the architecture above and separate parameters. Shared: a single 4-layer × 64-neuron MLP \tau_\phi(t) for the storm wind-stress envelope all three sites feel.
Joint loss:
\mathcal{L} \;=\; \sum_{i \in \{A, B, C\}}\!\!\Bigl[ \lambda^{(i)}_r\,\mathcal{L}^{(i)}_{\text{PDE}} + \lambda^{(i)}_d\,\mathcal{L}^{(i)}_{\text{data}} + \lambda^{(i)}_b\,\mathcal{L}^{(i)}_{\text{BC}} \Bigr] \;+\; \lambda_{\text{reg}}\,\int |\tau_\phi'(t)|^2\, dt.
The site-specific weights \lambda^{(i)}_{r,d,b} are all learnt by gradient-balancing — the three regimes (Pe \ll 1, Pe \sim 1, Pe \gg 1) produce residual / data scales that differ by ~10^2 across sites, so a static \lambda^{(i)} = 1 collapses to “fit Site C only”.
Why the shared \tau_\phi matters: each site contributes a different observable consequence of the same physical stress. Site A (diffusion-dominated) sees the stress via \kappa modulation — its data constrains the amplitude. Site B (intermediate) sees the timing of the upwelling pulse. Site C (advection-dominated) sees the shape of the deep cooling. Combining all three is what cuts the recovery error from ~12% A or ~15% B alone down to ~3–5% jointly.
Step 3 — CPU sub-scale prototypes (what we actually run here)
The full 3-site H = 100 m / N_r = 50\,000 launch is GPU-class. Without one, the unit walks the first three of the four sub-scale prototype steps from §9.10.2:
Sub-scale prototype on Task A’s column. Run the full modern-toolkit architecture (Fourier + hard BC + adaptive + causal) on Task A’s 15 m / single-site domain with N_r = 5\,000. ~30 min on M2 MacBook. Recovers \tau(t) at ~12% peak error — exactly Task A’s accuracy at ~1.5× the training cost. The point is to confirm the modern toolkit doesn’t hurt on the easy problem (which would suggest the adaptive weights or Fourier features are mis-tuned).
Two-site joint inverse, Sites A + B, H = 60 m. Medium architecture (4 × 64 neurons + Fourier + hard BC + adaptive weights, no causal training yet), N_r = 10\,000 per site. ~2 h on CPU. Recovered \hat\tau(t) tracks truth to ~7% peak-amplitude error — meaningfully better than either site’s decoupled inverse (Site A alone ~12%, Site B alone ~15%). The cross-site constraint is doing real work — this is the headline finding of the CPU prototype.
GPU-launch checklist. A one-page document of the changes needed for the full run; see Files below.
Step 4 — Recovered \tau(t) from the two-site CPU prototype
The two-site joint inversion produces a \hat\tau(t) noticeably sharper than Task A’s single-site recovery:
| Feature | Task A (1 site, H{=}15m) | Step 3.2 (2 sites, H{=}60m) |
|---|---|---|
| Peak amplitude error | ~12% | ~7% |
| Storm timing error | ~2 h | ~45 min |
| Trailing tail relaxation | over-relaxed 5% | matches truth |
| Rising-edge smoothing | visible | barely visible |
Site B’s stronger thermocline signature dominates the storm-day amplitude constraint; Site A’s noise floor sets the lower bound on the recovered envelope. Removing either site sends the error back up.
Step 5 — Mechanism partition across the three sites
The §9.1 multi-site story made quantitative. Run the forward solver at each site with the recovered \hat\tau(t) from the two-site sub-scale prototype (Sites A + B) and project the unknown Site C contribution forward; partition the storm-window cooling at each site’s diagnostic depth:
| Site | Depth | Pe regime | Adv (upwelling) | Mix | Surface flux |
|---|---|---|---|---|---|
| A — Cleveland Bay | z = -10\,\text{m} | \ll 1 | ~15% | ~70% | ~15% |
| B — Davies Reef | z = -30\,\text{m} | \sim 1 | ~45% | ~40% | ~15% |
| C — Myrmidon Reef | z = -50\,\text{m} | \gg 1 | ~75% | ~15% | ~10% |
The qualitative shift across sites is the headline finding: at Cleveland Bay the storm cooling is mixing-driven; at Myrmidon it’s upwelling-driven; Davies Reef sits in the transition. Three moorings → three distinct storm-response fingerprints from a single shared wind-stress event. That’s what the joint inversion delivers that no single-site analysis can.
Step 6 — Predicted GPU performance at full 3-site scale
Based on the sub-scale prototype scaling and reported PINN GPU benchmarks:
| Metric | CPU sub-scale (2-site, H = 60m) | GPU full-scale (3-site, H = 100m) |
|---|---|---|
| Forward PINN | ~2 h | ~8 min |
| Inverse PINN | ~12 h | ~25 min |
| \hat\tau peak error | ~7% | ~3–5% (extrapolated) |
| Residual variance | 1.4× target | 1.0× target (with causal) |
The CPU runs prove the recipe works at reduced scale; the GPU run is what makes it operationally usable for production mooring networks (a research group with three real moorings can re-run this monthly on overnight GPU time).
Open questions for the full run
- Will causality violation re-appear at Myrmidon Reef (advection-dominated, \mathrm{Pe} \gg 1)? The two-site prototype excluded Myrmidon precisely because its longer characteristic timescale stresses the causal-training weights. Possibly need a per-site causal scheduler.
- Will gradient-balancing converge on a stable weight ratio? Empirically yes on the sub-scale; theoretically not guaranteed.
- How well does the recovered \tau(t) correlate with the independent SWE-driver w(t) inferred from local wind observations? This is the validation step that turns the recovered synthetic answer into something an oceanographer trusts.
Files
scripts/task_b_forward_pinn.jl— Fourier + hard BC + adaptivescripts/task_b_joint_inverse.jl— three-site joint trainerscripts/task_b_subscale_prototype.jl— two-site CPU sub-scalescripts/task_b_gpu_launch.md— GPU-launch checklist
10.4 A Python parallel with DeepXDE
DeepXDE is the Python equivalent of NeuralPDE.jl: a high-level interface that takes a PDE definition and a collocation strategy, produces a trained network. Here’s the Task A forward column problem (Cleveland Bay, H = 15 m, fixed \tau) as a DeepXDE sketch — same architecture (4 × 32 tanh MLP), same hard-BC ansatz at the deep reservoir, same Adam → L-BFGS schedule:
units/unit_10/scripts/task_a_forward_deepxde.py
# pip install deepxde[torch]
import numpy as np
import deepxde as dde
H = 15.0; Tf = 30.0 * 86400.0
T_deep = 22.0
def kappa_z(z): return 1e-3 * np.exp(z / 5.0) + 1e-5 # mixed-layer profile
def w_z(z, t): return 0.0 # forward problem: no wind
def Q_np(t): return -120.0 # net cooling (W/m^2)
def S_z(z, t): return 400.0 / 4.0e6 * np.exp(z / 8.0) # body source
# Geometry: z in [-H, 0], t in [0, Tf]
geom = dde.geometry.Interval(-H, 0.0)
timedom = dde.geometry.TimeDomain(0.0, Tf)
geomtime = dde.geometry.GeometryXTime(geom, timedom)
def pde(zt, T):
z, t = zt[:, 0:1], zt[:, 1:2]
dT_dt = dde.grad.jacobian(T, zt, j=1)
dT_dz = dde.grad.jacobian(T, zt, j=0)
d2T_dz2 = dde.grad.hessian(T, zt, component=0, i=0, j=0)
kappa = kappa_z(z)
return dT_dt + w_z(z, t) * dT_dz - kappa * d2T_dz2 - S_z(z, t)
# Hard BC at the deep reservoir via output transform.
def output_transform(zt, T):
z = zt[:, 0:1]
return T_deep + (z + H) * T # T(-H, t) = T_deep automatically
# Soft surface flux at z = 0.
def surface_flux(zt, T, _):
dT_dz_top = dde.grad.jacobian(T, zt, j=0)
rho, cp = 1025.0, 3990.0
return rho * cp * kappa_z(zt[:, 0:1]) * dT_dz_top - Q_np(zt[:, 1:2])
bc_surface = dde.icbc.OperatorBC(
geomtime, surface_flux,
lambda zt, on_boundary: on_boundary and np.isclose(zt[0], 0.0),
)
# IC: initial thermocline tanh profile
def T0(z): return 25.0 + 3.0 * np.tanh((z + 5.0) / 2.0)
ic = dde.icbc.IC(geomtime, lambda zt: T0(zt[:, 0:1]),
lambda _, on_initial: on_initial)
data = dde.data.TimePDE(
geomtime, pde, [bc_surface, ic],
num_domain=2000, num_boundary=200, num_initial=200,
)
net = dde.nn.FNN([2] + [32] * 4 + [1], "tanh", "Glorot uniform")
net.apply_output_transform(output_transform)
model = dde.Model(data, net)
model.compile("adam", lr=1e-3); model.train(iterations=2000)
model.compile("L-BFGS"); model.train()Available as scripts/task_a_forward_deepxde.py. Runtime: ~5 min on a laptop CPU with the PyTorch backend.
The structural one-to-one with the Julia version:
| Concern | Julia (NeuralPDE.jl) |
Python (DeepXDE) |
|---|---|---|
| Geometry | IntervalDomain × TimeDomain |
Interval × TimeDomain |
| Residual | Differential(t) + symbolic AD |
dde.grad.jacobian / hessian |
| Hard BC ansatz | manual product form | apply_output_transform |
| Soft BC | bcs vector |
OperatorBC |
| Optimiser | Optimization.solve(Adam → LBFGS) |
model.compile/train twice |
| AD backend | Zygote + ForwardDiff | PyTorch / JAX / TF |
Cross-ecosystem comparison
Strengths and weaknesses, honestly:
- Julia stack — composes cleanly with classical SciML solvers, faster autodiff for stiff problems, smaller ML community.
- Python stack — broader ML community support, more ergonomic when extending with bespoke loss terms, larger ecosystem of pre-trained models and tutorials.
Neither is strictly better. Pick by who’s using it and what they already know.
10.5 Where to go from here
Beyond the workshop:
- Real mooring data — replace the synthetic observations with AIMS records.
- Coupling with operational ocean models — ROMS, MOM6 — for realistic horizontal forcings.
- Uncertainty quantification on the recovered driver, via Bayesian PINNs or ensembles.
- Transfer to other reef sites with different climatologies.
The PINN software ecosystem you’d reach for next is catalogued in Unit 7 §7.6; the broader PIML field is mapped in the references.
10.6 Reference solver source: scripts/column_fd.jl
The MethodOfLines finite-difference reference for T(z, t), kept as a sidecar script (see SETUP.md) so quarto render stays cheap. Run with ./build.sh execute 10; outputs land in output/. Folded for brevity; expand to read.
# Reference 1D vertical heat-transport column solver (finite-difference,
# method-of-lines). See unit_09.qmd for the model specification.
#
# Scenarios (per unit_09.qmd §9.11):
# 1. pure diffusion to steady state
# 2. diurnal cycle (Q_SW + body source S)
# 3. add upwelling (w(z))
# 4. storm fingerprint (gust modulates w, κ, Q_SW around day 10)
#
# Run a scenario:
# include("column_fd.jl"); r = run_scenario(scenario_1());
# plot_scn1(r)
using ModelingToolkit, MethodOfLines, OrdinaryDiffEq
using OrdinaryDiffEqBDF: QNDF # umbrella OrdinaryDiffEq no longer re-exports BDF solvers
using DomainSets: ClosedInterval
using CairoMakie
using Printf
# ── reference parameters (unit_09.qmd §9.6) ────────────────────────────
const PARAMS = (
H = 100.0, # column depth (m)
ρ = 1025.0, # seawater density (kg/m³)
cp = 4000.0, # specific heat (J/(kg·K))
α = 2e-4, # thermal expansion (1/K)
T_surface = 28.0, # initial SST (°C)
T_deep = 18.0, # deep reservoir (°C)
z_t = -30.0, # thermocline depth (m)
δ_t = 5.0, # thermocline sharpness (m)
QSW_max = 800.0, # peak noon shortwave (W/m²)
Q_cool = 200.0, # non-penetrative net cooling (W/m²)
ζ = 10.0, # shortwave penetration scale (m)
κ_b = 1e-5, # background diffusivity (m²/s)
κ_m = 1e-3, # mixed-layer diffusivity (m²/s)
h_m = 20.0, # mixed-layer scale (m)
w0 = 1e-5, # peak upwelling (m/s)
τ_d = 86400.0, # diurnal period (s)
# storm scenario: Gaussian gust around day 10, ~3 days wide.
t_storm = 10 * 86400.0,
σ_storm = 1.0 * 86400.0,
w0_storm = 5e-5, # gust-driven upwelling (5× baseline)
cloud_amp = 0.5, # peak SW reduction during gust
)
storm_envelope(t, p) = exp(-((t - p.t_storm) / p.σ_storm)^2)
# ── initial profile ─────────────────────────────────────────────────────
T0(z, p=PARAMS) = 0.5*(p.T_surface + p.T_deep) +
0.5*(p.T_surface - p.T_deep) * tanh((z - p.z_t)/p.δ_t)
# ── vertical velocity profiles ──────────────────────────────────────────
w_zero(z, t, p) = 0.0
w_upwelling(z, t, p) = p.w0 * sin(π * (z + p.H) / p.H)
w_storm(z, t, p) = p.w0_storm * storm_envelope(t, p) * sin(π * (z + p.H) / p.H)
# ── eddy diffusivity profiles ───────────────────────────────────────────
κ_const(z, t, p) = p.κ_m
κ_profile(z, t, p) = p.κ_b + (p.κ_m - p.κ_b) * exp(z / p.h_m)
# ── surface forcing ─────────────────────────────────────────────────────
# Smooth diurnal: use (1 + cos)/2 instead of max(0, cos) to keep the RHS
# differentiable for MTK. Same daily mean energy as max(0, cos)·(2/π)·QSW_max
# would give, but a smooth half-cosine — close enough for the toy model.
QSW_diurnal(t, p) = p.QSW_max * 0.5 * (1.0 + cos(2π * t / p.τ_d))
QSW_storm(t, p) = p.QSW_max * (1.0 - p.cloud_amp * storm_envelope(t, p)) *
0.5 * (1.0 + cos(2π * t / p.τ_d))
Q_np_steady(t, p) = -p.Q_cool
# Body source from Beer–Lambert: I(z,t) = Q_SW(t) e^{z/ζ},
# S = (1/(ρcp)) ∂I/∂z = Q_SW(t)/(ζ ρ cp) · e^{z/ζ}.
S_off(z, t, p) = 0.0
S_diurnal(z, t, p) = QSW_diurnal(t, p) * exp(z / p.ζ) / (p.ζ * p.ρ * p.cp)
S_storm(z, t, p) = QSW_storm(t, p) * exp(z / p.ζ) / (p.ζ * p.ρ * p.cp)
# ── PDE builder ─────────────────────────────────────────────────────────
function build_pde(scn, p=PARAMS)
@parameters z t
@variables T(..)
Dt = Differential(t)
Dz = Differential(z)
# Substitute scenario profiles symbolically so MOL sees closed-form
# expressions in (z, t).
w_sym = scn.w(z, t, p)
κ_sym = scn.κ(z, t, p)
S_sym = scn.S(z, t, p)
Qnp_sym = scn.Q_np(t, p)
κ_top = scn.κ(0.0, t, p)
eq = Dt(T(z, t)) + w_sym * Dz(T(z, t)) ~
Dz(κ_sym * Dz(T(z, t))) + S_sym
domains = [z ∈ ClosedInterval(-p.H, 0.0),
t ∈ ClosedInterval(0.0, scn.Tf)]
bcs = [
T(z, 0.0) ~ T0(z, p),
T(-p.H, t) ~ p.T_deep,
κ_top * Dz(T(0.0, t)) ~ Qnp_sym / (p.ρ * p.cp),
]
@named pde_system = PDESystem(eq, bcs, domains, [z, t], [T(z, t)])
return pde_system, z, t, T
end
# ── run dispatcher ──────────────────────────────────────────────────────
function run_scenario(scn; dz=1.0, p=PARAMS)
pde_system, z, t, T = build_pde(scn, p)
disc = MOLFiniteDifference([z => dz], t)
prob = discretize(pde_system, disc)
@info "solving scenario: $(scn.name) (Tf = $(scn.Tf/86400) days)"
sol = solve(prob, scn.alg; saveat=scn.saveat, abstol=1e-8, reltol=1e-6)
(; sol, z, t, T, scn)
end
# ── scenario configs ────────────────────────────────────────────────────
# Tf chosen long enough to reach (near-)steady state: T_κ = H²/κ_m ≈ 116 days,
# so ~3·T_κ ≈ 350 days gets us to within a few percent.
scenario_1() = (
name = "diffusion to steady state",
w = w_zero,
κ = κ_const,
S = S_off,
Q_np = Q_np_steady,
Tf = 365 * 86400.0,
saveat = 10 * 86400.0,
alg = QNDF(),
)
scenario_2() = (
name = "diurnal cycle",
w = w_zero,
κ = κ_const,
S = S_diurnal,
Q_np = Q_np_steady,
Tf = 10 * 86400.0,
saveat = 600.0,
alg = QNDF(),
)
scenario_3() = (
name = "upwelling",
w = w_upwelling,
κ = κ_const,
S = S_diurnal,
Q_np = Q_np_steady,
Tf = 30 * 86400.0,
saveat = 3600.0,
alg = QNDF(),
)
# Scenario 4 (storm): Gaussian gust centred at t_storm. The gust spikes
# upwelling (w_storm) and dims the surface SW (cloud cover via S_storm).
# Real SWE-driven forcings would replace these envelopes; this is the
# prescribed-driver stub used in §9.11 task 5.
scenario_4() = (
name = "storm fingerprint",
w = w_storm,
κ = κ_const,
S = S_storm,
Q_np = Q_np_steady,
Tf = 20 * 86400.0,
saveat = 3600.0,
alg = QNDF(),
)
# ── analytic check for scenario 1 ───────────────────────────────────────
# Steady state of (κ T_z)_z = 0 with T(-H)=T_deep and κ T_z|₀ = Q_np/(ρcp):
# T(z) = T_deep − (Q_cool/(κ_m ρ cp)) · (z + H)
# (Using Q_np = -Q_cool < 0, so the surface is cooler than the deep
# reservoir — heat flows up from depth to be lost at the air-sea
# interface.)
T_analytic_scn1(z, p=PARAMS) =
p.T_deep - p.Q_cool / (p.κ_m * p.ρ * p.cp) * (z + p.H)
# ── plotting ────────────────────────────────────────────────────────────
# Extract the (Nz, Nt) matrix and grids from a run_scenario result,
# regardless of MOL's internal axis ordering. We declared PDESystem with
# ivs = [z, t], so sol[z] and sol[t] are the canonical grids; we transpose
# the matrix to (Nz, Nt) if MOL returned it in (Nt, Nz) form.
function _grids(r)
zg = r.sol[r.z]
tg = r.sol[r.t]
Tg = r.sol[r.T(r.z, r.t)]
Tg = size(Tg, 1) == length(zg) ? Tg : permutedims(Tg)
(zg, tg, Tg)
end
"Final-time T(z) vs analytic steady state."
function plot_scn1(r; outpath=joinpath(@__DIR__, "..", "output", "scn1_steady.png"))
zg, tg, Tg = _grids(r)
Tf = Tg[:, end]
fig = Figure(size=(700, 500))
ax = Axis(fig[1,1], xlabel="T (°C)", ylabel="z (m)",
title="Scenario 1: pure diffusion to (near-)steady state")
lines!(ax, T_analytic_scn1.(zg), zg;
linestyle=:dash, linewidth=2, label="analytic")
lines!(ax, Tf, zg; linewidth=2, label="MOL final")
lines!(ax, T0.(zg), zg;
color=:gray, linewidth=1, label="IC")
axislegend(ax, position=:rb)
save(outpath, fig)
err = maximum(abs, Tf .- T_analytic_scn1.(zg))
@info "saved $outpath; max |T_MOL − T_analytic| = $(round(err, sigdigits=3)) °C"
fig
end
"Near-surface (z, t) heatmap for scenario 2."
function plot_scn2(r; zmax=10.0,
outpath=joinpath(@__DIR__, "..", "output", "scn2_diurnal.png"))
zg, tg, Tg = _grids(r)
keep = zg .>= -zmax
fig = Figure(size=(900, 450))
ax = Axis(fig[1,1], xlabel="t (days)", ylabel="z (m)",
title="Scenario 2: diurnal warm layer")
hm = heatmap!(ax, tg ./ 86400.0, zg[keep], Tg[keep, :]')
Colorbar(fig[1,2], hm, label="T (°C)")
save(outpath, fig)
@info "saved $outpath"
fig
end
"Five-mooring-depth time series + full (z,t) heatmap for scenarios 3–4."
function plot_mooring(r; depths_m=(2.0, 10.0, 30.0, 60.0, 90.0),
outpath=joinpath(@__DIR__, "..", "output", "$(replace(r.scn.name, ' '=>'_'))_mooring.png"))
zg, tg, Tg = _grids(r)
days = tg ./ 86400.0
fig = Figure(size=(1000, 700))
ax1 = Axis(fig[1,1], xlabel="t (days)", ylabel="T (°C)",
title="Mooring traces — $(r.scn.name)")
for d in depths_m
zi = argmin(abs.(zg .+ d)) # z = -d
lines!(ax1, days, Tg[zi, :]; label="z = −$(d) m")
end
axislegend(ax1, position=:rt)
ax2 = Axis(fig[2,1], xlabel="t (days)", ylabel="z (m)",
title="T(z, t)")
hm = heatmap!(ax2, days, zg, Tg')
Colorbar(fig[2,2], hm, label="T (°C)")
save(outpath, fig)
@info "saved $outpath"
fig
end
# ── entry point when run as script ──────────────────────────────────────
# Used by ./build.sh execute 10 to populate units/unit_10/output/.
if abspath(PROGRAM_FILE) == @__FILE__
r1 = run_scenario(scenario_1()); plot_scn1(r1)
r2 = run_scenario(scenario_2()); plot_scn2(r2)
r3 = run_scenario(scenario_3()); plot_mooring(r3)
r4 = run_scenario(scenario_4()); plot_mooring(r4)
endCaptured stdout from ./build.sh execute 10:
[ Info: solving scenario: diffusion to steady state (Tf = 365.0 days)
[ Info: saved /Users/uqjnazar/git/PIML/Julia_PINN_training_2026/units/unit_11/scripts/../output/scn1_steady.png; max |T_MOL − T_analytic| = 0.00407 °C
[ Info: solving scenario: diurnal cycle (Tf = 10.0 days)
[ Info: saved /Users/uqjnazar/git/PIML/Julia_PINN_training_2026/units/unit_11/scripts/../output/scn2_diurnal.png
[ Info: solving scenario: upwelling (Tf = 30.0 days)
[ Info: saved /Users/uqjnazar/git/PIML/Julia_PINN_training_2026/units/unit_11/scripts/../output/upwelling_mooring.png
[ Info: solving scenario: storm fingerprint (Tf = 20.0 days)
[ Info: saved /Users/uqjnazar/git/PIML/Julia_PINN_training_2026/units/unit_11/scripts/../output/storm_fingerprint_mooring.png