Unit 9: Project Specification

Published

12/06/2026

The capstone project, written out concretely. The science behind every term sits in Unit 8; the solution — finite-difference reference, forward PINN, inverse PINN — is in Unit 10. This unit is the spec: every equation, every parameter, every toy task you’ll need to reproduce the workshop.

9.1 The synthetic scenario

You are a fictional oceanographer with three thermistor chains deployed along a cross-shelf transect of the central Great Barrier Reef, offshore from Townsville. Each chain samples temperature at five depths, hourly, for 30 days. A storm passes through on day 10, and across the three sites the deeper sensors record signatures that do not line up with each other — one site cools quickly and recovers, another shows a deep slow cooling, and the third looks almost untouched at the surface.

Synthetic but plausibly-sited mooring locations along the central GBR, drawn on real OSM coastlines (CartoDB Voyager tiles). Panel (a) is the regional cross-shelf view of all three moorings — Cleveland Bay (inshore) through Davies Reef (mid-shelf) to Myrmidon Reef (outer shelf). Panel (b) zooms in on Cleveland Bay so Site A’s coastal context near Townsville is legible.
| Site | Name | Lat, Lon | Depth | Sensor depths (m) | Dominant regime |
|---|---|---|---|---|---|
| A | Cleveland Bay | 19.20°S, 146.81°E | 15 m | 1, 4, 8, 12, 14 | tidal mixing, \mathrm{Pe}\ll 1 |
| B | Davies Reef | 18.83°S, 147.647°E | 60 m | 2, 10, 25, 45, 58 | classical thermocline, \mathrm{Pe}\sim 1 |
| C | Myrmidon Reef | 18.27°S, 147.39°E | 100 m | 2, 15, 40, 70, 95 | advection-dominated, \mathrm{Pe}\gg 1 |

All three sites are AIMS-monitored locations on the central GBR; the coordinates are sensible (Davies Reef’s real AIMS tower sits at 18°49'31''S, 147°38'50''E) but the time series are synthetic — generated by running a per-site finite-difference column model through a common storm scenario with \sigma = 0.05\,°\mathrm{C} observation noise. Source: units/unit_10/scripts/generate_mooring_csvs.jl; data: units/unit_10/data/mooring_{A,B,C}.csv plus units/unit_10/data/sites_metadata.csv.

Three hypotheses for the deeper-sensor cooling, applied at each site:

  1. The storm intensified upwelling, pumping cold water up from below.
  2. The storm intensified vertical mixing, drawing heat away from the surface faster.
  3. The storm caused reduced surface heating (cloud cover, evaporation).

You want a model simple enough to test these hypotheses against each mooring’s data. The 1D column below — coupled to a 2D shallow-water driver — is that model. The capstone solves it three times, once per site, with each site giving a qualitatively different fingerprint of the same event.

Important: How the inverse answers the mechanism question

Tasks A and B don’t recover three separate drivers — they recover one scalar wind-stress envelope \tau(t) that implicitly couples all three mechanisms through the column model: Ekman pumping sets w(z, t; \tau), the wind-mixing closure modulates \kappa_m(t; \tau), and the cloud-cover correlation reduces Q_{\text{SW}}^{\max}(t; \tau) during the storm.

So the recovered \hat\tau(t) doesn’t by itself answer “which hypothesis dominated”. The mechanism-discrimination step is the partition deliverable in §9.9.5 / §9.10.5: run the calibrated forward model with \hat\tau, decompose the cooling at a diagnostic depth (default: z = -30\,\text{m}, near the thermocline) into the three terms

\underbrace{\Delta T_{\text{adv}}(t)}_{\int -w\,\partial_z T\, dt} \;+\; \underbrace{\Delta T_{\text{mix}}(t)}_{\int \partial_z(\kappa\,\partial_z T)\, dt} \;+\; \underbrace{\Delta T_{\text{flux}}(t)}_{\int \mathcal{S} + \text{surface BC}\, dt} \;=\; \Delta T_{\text{total}}(t),

and report which integrated contribution dominates over the storm window. The single-driver inversion plus the partition plot is what turns “we recovered \tau” into a statement of the form “the storm cooling at Davies Reef was 65% upwelling, 30% mixing, 5% reduced heating” (illustrative numbers).
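The partition can be sketched as a time integral of each tendency term at the diagnostic depth. The snippet below is a minimal illustration, not the Unit 10 implementation: the function names `cumulative` and `partition` are hypothetical, and the three tendency series are placeholder Gaussian pulses (not model output) whose amplitudes were picked to mirror the illustrative 65/30/5 split. In a real run the three series would come from the calibrated forward model evaluated with \hat\tau.

```julia
# Cumulative time integral of a tendency series (K/s) sampled every dt seconds,
# using a simple left-endpoint rule.
cumulative(tend, dt) = cumsum(tend .* dt)

# Integrated contribution of each term over `window` (a vector of indices),
# plus each term's share of the total change.
function partition(adv, mix, flux, dt, window)
    dT = (adv = sum(adv[window]) * dt,
          mix = sum(mix[window]) * dt,
          flux = sum(flux[window]) * dt)
    total = dT.adv + dT.mix + dT.flux
    share = map(x -> x / total, dT)
    return dT, total, share
end

# Placeholder tendencies (NOT model output): one Gaussian pulse on day 10,
# with hypothetical amplitudes chosen to mirror the 65/30/5 example.
dt = 3600.0                               # hourly output, s
t = collect(0:719) .* dt                  # 30 days of hourly samples
pulse = @. exp(-((t - 10 * 86400) / 86400)^2)
adv = -6.5e-6 .* pulse                    # K/s
mix = -3.0e-6 .* pulse
flux = -0.5e-6 .* pulse

window = findall(x -> 8 * 86400 <= x <= 13 * 86400, t)   # storm window
dT, total, share = partition(adv, mix, flux, dt, window)
dT_adv_curve = cumulative(adv, dt)        # ΔT_adv(t), ready to plot

println("total change over window: ", round(total, digits = 3), " K")
println("shares: ", map(x -> round(x, digits = 2), share))
```

Shares are only easy to read when all three terms have the same sign over the window; if one term warms while the others cool, report the signed integrals instead of percentages.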

If you want a direct mechanism-discrimination inverse instead (recover \tau_w, \tau_\kappa, \tau_Q separately), that’s a genuine extension and a stretch goal; see the open questions in §9.10.6.

Tip: Decoupled per-site vs. joint three-site inversion

The simplest reading of the capstone is three independent inverse problems: each site recovers its own local wind-stress envelope \tau_i(t) from its own mooring data. §9.9 (Task A) below sets this up for Site A (Cleveland Bay).

An extended problem couples all three sites by sharing a single storm-event forcing \tau(t) (an atmospheric input felt across the whole ~100 km transect), and recovers it jointly from all three moorings’ data. The joint problem is better posed (three complementary regimes constrain one signal) and tells one coherent story instead of three parallel ones. Both versions are in scope for the workshop — the decoupled version is the baseline; the joint version is a stretch goal documented as a closing exercise.

Note ✏️ Section exercise: read the mooring CSVs before modelling them

Load units/unit_10/data/mooring_{A,B,C}.csv and do the look-before-you-leap pass every inverse problem deserves:

  1. Plot all five depths per site on shared axes (15 traces, three panels). Mark day 10.
  2. From the raw traces alone — no model — write down for each site: which sensor cools most during the storm window, the approximate lag between surface and deepest response, and whether the site recovers by day 30.
  3. Estimate the noise floor empirically: difference each trace at one-hour lag and compute the standard deviation of the increments in a calm window (days 1–5). Do you recover something near the advertised \sigma = 0.05\,°\mathrm{C} — and why is the raw increment SD an over-estimate?

💡 Hint

CSV.read(path, DataFrame) loads a file; the columns are time_hours plus one per sensor depth. For the noise floor, take diff(df.T_col) over a calm window and apply std. The factor between increment-SD and σ comes from \mathrm{Var}(X_{t+1}-X_t) = 2\sigma^2 for white noise. Think about which sensor’s calm window is least calm.
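The white-noise factor in the hint can be checked on stand-in data before touching the real CSVs. The sketch below is an assumption-laden illustration: `trace` is a synthetic constant-plus-noise series, not a column from mooring_A.csv, and the 27 °C baseline is arbitrary.

```julia
using Random, Statistics

Random.seed!(42)                       # reproducible stand-in data
sigma_true = 0.05                      # advertised observation noise, °C
n_calm = 5 * 24                        # hourly samples in days 1–5

# Stand-in for one sensor in a calm window: constant signal + white noise.
trace = 27.0 .+ sigma_true .* randn(n_calm)

increments = diff(trace)               # one-hour lag differences
sd_inc = std(increments)               # estimates sqrt(2) * σ for white noise
sigma_hat = sd_inc / sqrt(2)           # Var(X[t+1] - X[t]) = 2σ²

println("increment SD = ", round(sd_inc, digits = 4))
println("sigma_hat    = ", round(sigma_hat, digits = 4))
```

On real traces any genuine hour-to-hour temperature change adds to the increment variance, so this estimate is an upper bound on the sensor noise rather than a clean measurement of it.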

9.2 The coupled model

The 1D column is the central object. A 2D shallow-water solver lives alongside it to drive the column with a realistic vertical velocity, but does not itself receive any feedback from the column (the SWE solve is offline). The three pieces:

1D vertical heat transport

Imagine a vertical pipe of seawater, 100 m tall, sitting under the sea surface. Heat enters from the top (sun, atmosphere). Cold water can be pushed up from the bottom (currents elsewhere). Inside the pipe, turbulence stirs warm and cold water together. We track only one thing: the temperature T as a function of depth z and time t. We ignore everything horizontal — no horizontal currents inside the pipe, no horizontal temperature variation. One column, by itself.

2D shallow water for horizontal flow

The linearised shallow water equations on a 2D domain produce horizontal velocity \mathbf{u}_h(x, y, t) in response to a wind-stress forcing. We use these as the driver of the column’s upwelling, not as a coupled-PDE PINN problem (that’s the path de Wolff et al. (2021) found brittle). The SWE solve is offline; its output is a time series at the mooring location.

Coupling: w from \nabla_h\!\cdot\!\mathbf{u}_h

By incompressibility, \partial_z w = -\nabla_h\!\cdot\!\mathbf{u}_h. With w(-H) = 0 and the SWE flow uniform in depth,

w(z, t) = -(z + H)\,\nabla_h\!\cdot\!\mathbf{u}_h(t).

A simple linear-in-z vertical velocity profile, computed offline from the SWE run and fed as a time series into the column model.
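As a minimal sketch of the coupling formula, the function below maps a divergence value to w at a given depth. The name `w_from_divergence` and the constant divergence value are hypothetical and chosen only to show the shape of the profile; a real run would feed in the time series from the offline SWE solve.

```julia
# Vertical velocity from a depth-uniform horizontal divergence (1/s),
# obtained by integrating ∂w/∂z = -div upward from w(-H) = 0.
w_from_divergence(z, div; H = 100.0) = -(z + H) * div

H = 100.0
div_uh = -1.0e-7                 # 1/s, hypothetical constant convergence
zs = [-H, -50.0, 0.0]
ws = [w_from_divergence(z, div_uh; H = H) for z in zs]

for (z, w) in zip(zs, ws)
    println("z = ", z, " m   w = ", w, " m/s   (", w * 86400, " m/day)")
end
```

Because the SWE flow is taken as uniform in depth, w grows linearly from zero at the bed to its largest magnitude at the surface.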

Note ✏️ Section exercise: build w(z,t) from a divergence trace

Implement the coupling formula. Take a synthetic divergence time series mimicking a storm response, \nabla_h\!\cdot\!\mathbf{u}_h(t) = -2\times10^{-7}\, \exp[-((t - 10\,\text{d})/1\,\text{d})^2]\;\text{s}^{-1}, over 30 days, and compute w(z, t) = -(z + H)\,\nabla_h\!\cdot\!\mathbf{u}_h(t) for the H = 100 m column. Plot w at z = -10, -50, -90 m. Check three things: w(-H, t) = 0 always; the sign during the storm (negative divergence = convergence… or is it? — work out whether this trace gives upwelling or downwelling); and the peak |w| in m/day at mid-column. How does that peak compare with the reference w_0 \sim 1 m/day, and what does the linear-in-z shape assume physically about the horizontal flow?

💡 Hint

Implement w(z,t) = -(z+H)\,\nabla_h\!\cdot\mathbf{u}_h(t) directly and evaluate at three depths. For the sign, integrate continuity \partial_z w = -\nabla_h\!\cdot\mathbf{u}_h upward from w(-H)=0 rather than guessing. Convert to m/day (×86 400) before comparing with w_0.

9.3 The column equation

\underbrace{\frac{\partial T}{\partial t}}_{\text{(a) change in time}} \;+\; \underbrace{w(z,t)\,\frac{\partial T}{\partial z}}_{\text{(b) advection}} \;=\; \underbrace{\frac{\partial}{\partial z}\!\left(\kappa(z,t)\,\frac{\partial T}{\partial z}\right)}_{\text{(c) diffusion}} \;+\; \underbrace{\mathcal{S}(z,t)}_{\text{(d) sun heating}}

on the domain z \in [-H, 0], t \in [0, T_f], with z = 0 at the surface and z = -H at the bottom. The four terms (a)–(d) were unpacked term-by-term in Unit 8 §8.3–§8.4; here we’ll just write them down.

Note: Sign convention for w

w = \mathrm{d}z/\mathrm{d}t. With z measured upward from the surface (so z < 0 inside the ocean), w > 0 is upward motion of water. Upwelling — cold deep water rising — therefore corresponds to w > 0. The advection term w \, \partial_z T then has the right physical sign.

Note ✏️ Section exercise: get the signs right, once and forever

Sign errors in the advection term are the single most common bug in student column models. Settle yours with a two-minute argument and a five-minute check:

  1. Argument. In a stably stratified column (\partial_z T > 0: warmer above), upwelling is w > 0. Show from the equation (term (b) moved to the right-hand side: \partial_t T = -w\,\partial_z T + \ldots) that this cools every fixed depth — as cold deep water should.
  2. Check. Solve the pure-advection column \partial_t T = -w_0\,\partial_z T with a first-order upwind scheme, w_0 = 10^{-4} m/s (exaggerated for visibility), the §9.4 tanh initial profile, for 2 days. Confirm the thermocline translates upward and every sensor depth cools. Then flip the sign of w_0 and confirm downwelling warms the interior instead.

💡 Hint

For the numerical check, use first-order upwind with the stencil direction set by the flow: w > 0 carries information from below, so with the index i increasing upward the difference is (T[i] - T[i-1])/dz. Keep \Delta t \le \Delta z/|w|. If your thermocline moves the wrong way, you’ve either flipped the stencil or the sign in the update, which is exactly the bug the exercise inoculates against.
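One way to lay out the upwind stencil from the hint is sketched below. It assumes a uniform grid whose index increases upward (z[1] = -H, z[end] = 0) and holds the inflow cell fixed; the helper names `upwind_step!` and `run_advection` are hypothetical, and this is a hand-rolled check rather than the Unit 10 solver.

```julia
# First-order upwind step for ∂T/∂t = -w ∂T/∂z on a grid with index
# increasing upward (z[1] = -H is the bottom, z[end] = 0 the surface).
function upwind_step!(T, w, dz, dt)
    Told = copy(T)
    n = length(T)
    if w > 0                       # upward flow: information arrives from below
        for i in 2:n
            T[i] = Told[i] - w * dt / dz * (Told[i] - Told[i-1])
        end                        # T[1] is the inflow value and stays fixed
    else                           # downward flow: information arrives from above
        for i in 1:n-1
            T[i] = Told[i] - w * dt / dz * (Told[i+1] - Told[i])
        end                        # T[n] is the inflow value and stays fixed
    end
    return T
end

# Reference values from §9.4 / §9.6; w0 exaggerated as in the exercise.
H, Ts, Td, zt, width = 100.0, 28.0, 18.0, -30.0, 5.0
T0(z) = 0.5 * (Ts + Td) + 0.5 * (Ts - Td) * tanh((z - zt) / width)

dz = 1.0
z = collect(-H:dz:0.0)
w0 = 1.0e-4                        # m/s
dt = 0.5 * dz / abs(w0)            # CFL number 0.5
nsteps = round(Int, 2 * 86400 / dt)

function run_advection(w)
    T = T0.(z)
    for _ in 1:nsteps
        upwind_step!(T, w, dz, dt)
    end
    return T
end

T_up = run_advection(+w0)          # upwelling
T_down = run_advection(-w0)        # downwelling
i40 = findfirst(==(-40.0), z)      # a depth just below the initial thermocline
println("initial T(-40 m) = ", T0(-40.0))
println("after upwelling  = ", T_up[i40])
println("after downwelling = ", T_down[i40])
```

With a CFL number below one, each update is a convex combination of a cell and its upstream neighbour, which is why the scheme cannot overshoot and why the sign of the response is unambiguous.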

9.4 Boundary conditions

Top, z = 0: surface heat flux

\kappa\,\frac{\partial T}{\partial z}\bigg|_{z=0} \;=\; \frac{Q_{\text{np}}(t)}{\rho_0\, c_p}.

The downward diffusive heat flux into the ocean from the air–sea interface equals the non-penetrative part of the surface heat budget, divided by the volumetric heat capacity of seawater (so units come out as K·m/s, matching \kappa\, \partial_z T). With z measured upward, \partial_z T > 0 corresponds to a warmer surface, and \kappa\, \partial_z T is the downward heat flux — positive Q_{\text{np}} heats the ocean by driving the surface temperature up.

  • Q_{\text{np}}(t) — net non-penetrative heat flux at the surface. Sum of: longwave radiation, sensible heat, latent heat (evaporation). Units: W/m². Positive means heat into the ocean.
  • \rho_0 — reference seawater density, \approx 1025\,\text{kg/m}^3.
  • c_p — specific heat of seawater, \approx 3990\,\text{J/(kg·K)}.

The penetrating shortwave Q_{\text{SW}} is not in this BC — it lives inside \mathcal{S}(z, t) as a body source instead.

Bottom, z = -H: cold reservoir

T(-H, t) \;=\; T_{\text{deep}}.

A fixed-temperature reservoir, as motivated in Unit 8 §8.5. Alternative: zero-flux, \partial_z T |_{-H} = 0, if you prefer an insulating floor.

Initial condition

T(z, 0) \;=\; T_0(z), \qquad T_0(z) = \tfrac12 (T_{\text{surface}} + T_{\text{deep}}) + \tfrac12 (T_{\text{surface}} - T_{\text{deep}})\, \tanh\!\bigl((z - z_t)/\delta_t\bigr),

with z_t \approx -30\,\text{m} sitting the thermocline at 30 m depth and \delta_t \approx 5\,\text{m} setting its sharpness.

Note ✏️ Section exercise: what does the bottom BC actually cost?

The spec offers two bottom boundary conditions: Dirichlet (T(-H) = T_{\text{deep}}) and zero-flux (\partial_z T|_{-H} = 0). They are not interchangeable — find out how much it matters:

  1. For the steady-state pure-diffusion problem (constant \kappa, constant Q_{\text{np}} < 0, no advection, no source), derive the steady profile under each bottom BC. One of the two has no steady state at all — which, and why? (Think about where the heat extracted at the surface comes from.)
  2. For a 30-day run, estimate how far up from the bottom the choice of BC can influence the solution (\sqrt{\kappa_b\, \times 30\,\text{d}} with the background diffusivity). Which of the five Site B sensor depths (2, 10, 25, 45, 58 m on a 60 m column) could tell the difference?
  3. Conclude in one sentence: for the capstone inverse problem, is the bottom-BC choice a risk to the recovered \hat\tau(t)?

💡 Hint

Part 1: integrate the steady PDE over the column and ask where the surface heat extraction is resupplied from under each BC — one of the two has no supply. Part 2 is one square root with the background \kappa_b (that’s what governs near the floor). Part 3 follows from comparing part 2’s reach against the sensor depths.

9.5 Forcing functions

Surface heat flux

A minimal diurnal model:

Q_{\text{SW}}(t) \;=\; Q_{\text{SW}}^{\max}\, \max\!\bigl(0,\, \cos(2\pi t / \tau_d)\bigr), \qquad Q_{\text{np}}(t) \;=\; -\,Q_{\text{cool}}.

  • \tau_d = 86400\,\text{s} — one day.
  • Q_{\text{SW}}^{\max} — peak noon shortwave (e.g. 800\,\text{W/m}^2).
  • Q_{\text{cool}} — steady net cooling from longwave + evaporation (e.g. 200\,\text{W/m}^2).

Penetrating shortwave (the body source)

Sunlight is absorbed exponentially with depth (Beer–Lambert):

I(z, t) \;=\; Q_{\text{SW}}(t)\,e^{z/\zeta}, \qquad \mathcal{S}(z, t) \;=\; \frac{1}{\rho_0\, c_p}\,\frac{\partial I}{\partial z}.

  • \zeta — light penetration scale (e.g. \zeta \approx 10\,\text{m} for a single-band model; refined two-band Paulson–Simpson splits this into a 0.35 m red band and a 23 m blue–green band).
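The two formulas above translate directly into code. The sketch below uses the single-band reference values of §9.6 and adds one consistency check that is not in the spec: the body source integrated over the column should equal the irradiance absorbed between the surface and the bed. The function names are illustrative, and t = 0 is local noon under the cosine convention used here.

```julia
rho0, cp = 1025.0, 3990.0                 # kg/m³, J/(kg·K)
tau_d, Qmax, zeta = 86400.0, 800.0, 10.0  # s, W/m², m

# Diurnal shortwave at the surface; t = 0 is local noon in this convention.
Q_SW(t) = Qmax * max(0.0, cos(2π * t / tau_d))

# Downward irradiance at depth z ≤ 0 (W/m²) and the heating rate (K/s),
# i.e. S = (1 / (ρ0 cp)) ∂I/∂z with I = Q_SW * exp(z / ζ).
irradiance(z, t) = Q_SW(t) * exp(z / zeta)
S(z, t) = Q_SW(t) * exp(z / zeta) / (zeta * rho0 * cp)

# Consistency check at noon: column-integrated source vs absorbed irradiance.
H, dz = 100.0, 0.01
zmid = [-H + (k - 0.5) * dz for k in 1:round(Int, H / dz)]   # midpoint rule
heating = sum(S(z, 0.0) for z in zmid) * dz * rho0 * cp      # W/m²
absorbed = irradiance(0.0, 0.0) - irradiance(-H, 0.0)        # W/m²

println("integrated source: ", heating, " W/m²")
println("absorbed irradiance: ", absorbed, " W/m²")
```

The same check is a cheap guard against sign slips in \mathcal{S}: a source with the wrong sign integrates to minus the absorbed irradiance.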

Vertical velocity (upwelling)

Prescribe a simple profile pinned to zero at top and bottom:

w(z) \;=\; w_0\, \sin\!\bigl(\pi\,(z+H)/H\bigr),

with w_0 > 0 giving upward motion. A typical magnitude is w_0 \sim 10^{-5}\,\text{m/s}, i.e. about 1 m/day.

Eddy diffusivity

Three closures in increasing realism (pick one per experiment):

  1. Constant: \kappa = \kappa_0.
  2. Profile: \kappa(z) = \kappa_b + (\kappa_m - \kappa_b)\,e^{z/h_m} — large near the surface (mixed layer of scale h_m), small at depth.
  3. Stratification-dependent: \kappa(z, t) = \kappa_b + \kappa_0 / (1 + 5\,\mathrm{Ri})^2, with the Richardson number \mathrm{Ri} = N^2 / (\partial_z U)^2, buoyancy frequency N^2 = \alpha g\, \partial_z T, and a prescribed shear \partial_z U. Mixing is suppressed in stably stratified water (see Unit 8 §8.5).
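
The velocity profile and the closures can be written as small functions, sketched below with the §9.6 reference values as keyword defaults. Two things here are assumptions rather than part of the spec: the default \kappa_0 in the Richardson closure (§9.6 does not list one, so \kappa_m is reused) and the clipping of negative N^2 to zero for unstable stratification. The value g = 9.81 m/s² is the standard gravitational acceleration.

```julia
# Sine upwelling profile, zero at z = -H and z = 0.
w_profile(z; w0 = 1.0e-5, H = 100.0) = w0 * sin(π * (z + H) / H)

# Closure 2: exponential mixed-layer profile.
kappa_profile(z; kb = 1.0e-5, km = 1.0e-3, hm = 20.0) = kb + (km - kb) * exp(z / hm)

# Closure 3: Richardson-number closure. dTdz is the local ∂T/∂z (K/m),
# dUdz a prescribed shear (1/s). The default k0 is an assumption.
function kappa_ri(dTdz, dUdz; kb = 1.0e-5, k0 = 1.0e-3, alpha = 2.0e-4, g = 9.81)
    N2 = alpha * g * dTdz
    Ri = max(N2, 0.0) / dUdz^2       # clip unstable stratification (assumption)
    return kb + k0 / (1 + 5 * Ri)^2
end

println(w_profile(-50.0))                 # mid-column, H = 100 m
println(w_profile(-7.5; H = 15.0))        # mid-column of the Site A column
println(kappa_profile(-20.0))             # one e-folding depth below the surface
println(kappa_ri(1.0, 0.01))              # 1 K/m, the peak gradient of the tanh profile
```

Passing H as a keyword keeps the per-site dependence explicit, which matters because the same depth sits at a different phase of the sine in a 15 m, 60 m or 100 m column.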

Note ✏️ Section exercise: plot every forcing before you trust it

One figure, four panels, using the reference values of §9.6: (a) Q_{\text{SW}}(t) over 3 days; (b) the Beer–Lambert source \rho_0 c_p\,\mathcal{S}(z) at noon with \zeta = 10 m; (c) the w(z) sine profile; (d) all three \kappa closures overlaid (use \mathrm{Ri} from the §9.4 initial profile and \partial_z U = 0.01\,\text{s}^{-1} for closure 3). Then read three numbers off your own figure: the depth at which the noon body source has dropped to 10% of its surface value, w at the Site B thermocline (z = -30 m of H = 60… careful — the sine profile depends on the site’s H), and the factor between closure 2’s \kappa at the surface and at -60 m. Any surprises versus what you assumed while reading?

💡 Hint

Four small plot panels assembled with plot(p1, p2, p3, p4, layout = (2,2)). Two traps built in: the sine profile’s argument uses the site’s H, and closure 3 needs the Ri formula from the §9.4 initial profile (Solution 8.5 has the sech² derivative). Log-x for the κ panel or you’ll see nothing.

9.6 Reference parameter values and dimensionless groups

| Symbol | Value | Units | Meaning |
|---|---|---|---|
| H | 100 | m | column depth |
| T_f | 2.592\!\times\!10^{6} (30 days) | s | simulation horizon |
| \rho_0 | 1025 | kg/m³ | seawater density |
| c_p | 3990 | J/(kg·K) | specific heat |
| \alpha | 2\!\times\!10^{-4} | 1/K | thermal expansion |
| T_{\text{surface}} | 28 | °C | initial SST |
| T_{\text{deep}} | 18 | °C | deep reservoir |
| z_t,\, \delta_t | -30, 5 | m | thermocline depth, width |
| Q_{\text{SW}}^{\max} | 800 | W/m² | peak noon SW |
| Q_{\text{cool}} | 200 | W/m² | non-penetrative net cooling |
| \zeta | 10 | m | shortwave penetration scale |
| \kappa_b | 10^{-5} | m²/s | background diffusivity |
| \kappa_m | 10^{-3} | m²/s | mixed-layer diffusivity |
| h_m | 20 | m | mixed-layer scale |
| w_0 | 10^{-5} | m/s | peak upwelling (~1 m/day) |

The accompanying dimensionless groups — \mathrm{Pe} = w_0 H / \kappa_m \approx 1, T_\kappa = H^2/\kappa_m \approx 116 days, \tau_d / T_\kappa \sim 10^{-2} — are tabulated and interpreted in Unit 8 §8.6.
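The three groups follow from the table in a few lines of arithmetic, reproduced here as a quick check with the reference values only (site-specific depths and diffusivities will move the numbers).

```julia
H, w0, km, tau_d = 100.0, 1.0e-5, 1.0e-3, 86400.0   # m, m/s, m²/s, s

Pe = w0 * H / km              # Péclet number: advection vs diffusion
T_kappa = H^2 / km            # diffusive timescale of the full column, s
ratio = tau_d / T_kappa       # diurnal period relative to diffusive timescale

println("Pe = ", Pe)
println("T_kappa = ", T_kappa / 86400, " days")
println("tau_d / T_kappa = ", ratio)
```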

Note ✏️ Section exercise: the daily heat budget, by hand

Does the reference parameter set heat or cool the column on an average day? Integrate over 24 hours:

  1. Total shortwave in: \int_0^{\tau_d} Q_{\text{SW}}^{\max}\max(0, \cos(2\pi t/\tau_d))\,dt — evaluate it analytically (the positive half of a cosine integrates to \tau_d/\pi).
  2. Total non-penetrative out: Q_{\text{cool}} \times \tau_d.
  3. Net daily heat input in J/m², and the implied warming rate of a 20 m mixed layer in K/day (\Delta T = E_{\text{net}} / (\rho_0 c_p h_m)).

Cross-check your K/day against toy-task 2’s claim that the bulk profile “barely changes” over 30 days — is the net warming over a month small compared to the 10 K surface-to-deep contrast? Finally: by what fraction must Q_{\text{SW}}^{\max} drop (storm-cloud scenario) before the daily budget goes negative?

💡 Hint

The positive half of a cosine integrates to amplitude × period/π, so no numerics are needed. Then it’s bookkeeping: net J/m² per day, divided by \rho_0 c_p h_m for K/day. The threshold question inverts the same formula: set net = 0 and solve for Q_{\text{SW}}^{\max}.

9.7 The shared forward problem

Given all four drivers (w, \kappa, Q_{\text{np}}, Q_{\text{SW}}) as time series, predict T(z, t) on the 30-day window.

Solved two ways, both in Unit 10:

  1. A MethodOfLines.jl finite-difference reference (ground truth).
  2. A forward PINN trained against the column equation residual, the surface-flux BC, the bottom Dirichlet BC, and the IC — with the modern fixes from Unit 7 (hard BC, causal training, Fourier features).

The toy-task ladder of §9.11 builds up to the forward problem in five stages. Both Task A and Task B inverse problems start from this same forward solver.

9.8 Which task is for which audience?

The capstone comes in two parallel versions. Pick by your hardware, your time budget, and what you want out of the exercise.

| | Task A (§9.9) | Task B (§9.10) |
|---|---|---|
| Audience | self-paced learner, one-day workshop, no GPU | research project, semester-long student, industrial prototyping |
| Hardware | laptop CPU (any modern M-series or recent x86) | NVIDIA GPU for the full run; CPU for sub-scale prototypes |
| Time budget | ~30 min once data is generated | ~30 min on GPU per run; ~30 h on CPU |
| Geometry | one site (Cleveland Bay, H = 15 m) | three sites jointly (Cleveland Bay + Davies Reef + Myrmidon Reef, H up to 100 m) |
| PINN toolkit | small MLP, hard bottom BC, soft surface-flux BC, hand-tuned weights | Fourier features + hard BC + adaptive weights + causal training |
| Parameter count | ~5 000 | ~200 000 |
| What it demonstrates | the full forward-PINN + inverse-PINN pipeline | operational engineering: joint inversion, modern fixes, GPU scaling |

Both tasks share §9.7 (the forward solve) and the §9.11 toy ladder. The remainder of this unit gives each task’s full spec.

9.9 Task A — single-site inverse on a laptop (CPU)

The introductory capstone. Recover the storm wind-stress envelope \tau(t) from a single mooring’s data using a small PINN that trains in 10–30 minutes on a laptop CPU.

The goal is to see the full forward-PINN + inverse-PINN pipeline end-to-end without waiting for a GPU. We deliberately keep everything small enough that an L-BFGS run finishes in coffee-break time on commodity hardware.

9.9.1 Spec

  • Site. Cleveland Bay only (H = 15 m, \mathrm{Pe} \ll 1 — diffusion-dominated, the simplest of the three regimes). 5 sensors at z = -1, -4, -8, -12, -14 m.
  • Data. data/mooring_A.csv (synthetic, \sigma = 0.05\,°\text{C} Gaussian noise; 5 sensors × 720 hourly samples over 30 days = 3 600 data points).
  • Unknown. A single scalar function \tau(t) — the local wind-stress envelope — over the 30-day window.
  • Given. \kappa(z) (profile closure), Q_{\text{np}}(t), Q_{\text{SW}}(t), T_0(z), T_{\text{deep}}, all of the reference parameter values from §9.6.
  • Networks. T_\theta(z, t) as a 4-layer 32-neuron MLP (\tanh); \tau_\phi(t) as a 2-layer 16-neuron MLP. ~5 000 parameters total.
  • Collocation. N_r = 2000 residual points, N_b = 200 BC points, N_d = 5 \text{ sensors} \times 720 \text{ samples} = 3600 data points.
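
A quick way to sanity-check the network sizes is to count weights and biases directly. The sketch below assumes that "n-layer" means n hidden layers and that the inputs are (z, t) for the temperature network and t for the envelope network; both are readings of the spec, not statements from it. Under those assumptions the total comes out at a few thousand, the same order as the quoted ~5 000, and a different layer-counting convention shifts it accordingly.

```julia
# Number of weights + biases in a fully connected network whose layer widths
# (input, hidden..., output) are listed in `sizes`.
n_params(sizes) = sum(sizes[i] * sizes[i+1] + sizes[i+1] for i in 1:length(sizes)-1)

# Assumption: "n-layer" counts hidden layers.
T_net = [2, 32, 32, 32, 32, 1]      # (z, t) -> T
tau_net = [1, 16, 16, 1]            # t -> tau

println("T network:   ", n_params(T_net))
println("tau network: ", n_params(tau_net))
println("total:       ", n_params(T_net) + n_params(tau_net))
```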

9.9.2 Workflow

  1. Generate the synthetic data (if not already present) with scripts/generate_mooring_csvs.jl --site A — produces data/mooring_A.csv.
  2. Solve the forward problem with a small PINN against the MethodOfLines.jl reference (§9.7). Hard BC at z = -H via T_\theta(z, t) = T_{\text{deep}} + (z + H)\,N_\theta(z, t). Train Adam(1e-3) for 2 000 iterations then L-BFGS for 500.
  3. Set up the joint inverse loss \mathcal{L} = \lambda_r \mathcal{L}_{\text{PDE}} + \lambda_d \mathcal{L}_{\text{data}} + \lambda_{\text{reg}} \int |\tau_\phi'|^2\,dt + \mathcal{L}_{\text{BC}} + \mathcal{L}_{\text{IC}} with \lambda_r = 1, \lambda_d = 100, \lambda_{\text{reg}} = 10^{-2}.
  4. Train the inverse PINN. Adam(1e-3) for 5 000 iterations then L-BFGS for 1 000.
  5. Diagnose — residual histogram vs t (Unit 7 §7.5) and heat-budget closure at each depth.
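
Two ingredients of the workflow are easy to isolate without any PINN library: the hard-BC ansatz from step 2 and the weighted loss from step 3. The sketch below is illustrative only. `N_standin` is an arbitrary smooth function standing in for the network output, the envelope `tau` is a hypothetical Gaussian, and the individual loss values passed to `total_loss` are made-up numbers; in a real run they would come from the residual, data, BC and IC evaluations.

```julia
# Hard bottom BC: T(z, t) = T_deep + (z + H) * N(z, t) equals T_deep at z = -H
# for ANY network output N. N_standin is a placeholder, not a trained network.
T_deep, H = 18.0, 15.0
N_standin(z, t) = 0.3 * sin(z) + 0.1 * cos(t / 86400)
T_hard(z, t; N = N_standin) = T_deep + (z + H) * N(z, t)

# Discrete H¹ seminorm of tau on a uniform time grid:
# ∫ |tau'|² dt ≈ Σ (Δtau / Δt)² Δt = Σ Δtau² / Δt.
h1_penalty(tau, dt) = sum(abs2, diff(tau)) / dt

# Weighted sum from step 3; the individual loss values are supplied by the caller.
function total_loss(L_pde, L_data, L_reg, L_bc, L_ic; lr = 1.0, ld = 100.0, lreg = 1.0e-2)
    return lr * L_pde + ld * L_data + lreg * L_reg + L_bc + L_ic
end

t = collect(0.0:3600.0:30 * 86400.0)
tau = @. 0.2 * exp(-((t - 10 * 86400) / 86400)^2)   # hypothetical envelope
println("T at the bottom: ", T_hard(-H, 5.0e4))
println("H1 penalty:      ", h1_penalty(tau, 3600.0))
println("total loss:      ", total_loss(1.0e-3, 2.5e-3, h1_penalty(tau, 3600.0), 0.0, 0.0))
```

Because the bottom condition is built into the ansatz, \mathcal{L}_{\text{BC}} only needs to carry the surface-flux condition.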

9.9.3 Success criteria

| Metric | Target |
|---|---|
| Forward PINN L^2 vs FD reference | < 0.05\,°\text{C} |
| Recovered \hat\tau peak-amplitude error | < 15% |
| Storm-day timing error | < 2 h |
| Residual histogram monotonically decreasing in t | yes |

9.9.4 Expected runtime

| Stage | Wall-clock (M2 MacBook) |
|---|---|
| MethodOfLines FD reference (1 site, 30 days) | ~5 s |
| Forward PINN training (Adam 2 000 + L-BFGS 500) | ~3 min |
| Inverse PINN training (Adam 5 000 + L-BFGS 1 000) | ~15 min |

Total: under 30 minutes from a cold cache. Re-running to retune \lambda_r / \lambda_d adds ~10 minutes per pass.

9.9.5 Deliverables

  1. Plot of \hat\tau(t) overlaid on the synthetic truth.
  2. Mechanism partition plot at z = -10\,\text{m} (Site A’s diagnostic depth, two-thirds down its 15 m column) showing the three integrated cooling contributions \Delta T_{\text{adv}}(t), \Delta T_{\text{mix}}(t), \Delta T_{\text{flux}}(t) over the storm window. This is what actually answers the §9.1 question “which hypothesis dominated” for Cleveland Bay.
  3. Residual-histogram-vs-time diagnostic.
  4. Heat-budget closure plot for each sensor depth.
  5. A one-paragraph honest assessment of where the recovery fails (typically the rising edge — smoothed by the H^1 penalty — and the tail — over-relaxed).

9.9.6 What you don’t do here

  • No cross-site coupling.
  • No Fourier-feature embedding.
  • No adaptive loss weighting.
  • No GPU.

Those land in Task B.

9.10 Task B — three-site joint inverse (GPU first steps)

The advanced capstone. Recover a single shared wind-stress envelope \tau(t) from all three moorings jointly, with the full modern-PINN toolkit, on the full H = 100 m / 30-day domain. CPU runtime is hours; GPU runtime is minutes. We develop the CPU sub-scale prototypes here and write the GPU-launch checklist.

Task B exploits the three mooring regimes — Cleveland Bay’s diffusion-dominated column constrains \kappa, Davies Reef’s mid-Pe column constrains the timing, and Myrmidon’s advection-dominated column constrains \tau’s amplitude. A single coherent storm signal explains all three.

9.10.1 Spec

  • Sites. A + B + C jointly. Three column PINNs T^{(i)}_\theta(z, t), i \in \{A, B, C\}. Shared parameters for the wind-stress envelope \tau_\phi(t); separate parameters for each site’s temperature network.
  • Data. data/mooring_{A,B,C}.csv, totalling 3 × 3 600 = 10 800 data points.
  • Unknown. \tau(t) — one shared scalar function, three moorings’ worth of data constraining it.
  • Networks. T^{(i)}_\theta(z, t) as a 6-layer 128-neuron MLP per site; \tau_\phi(t) as a 4-layer 64-neuron MLP. ~200 000 parameters total.
  • Modern PINN toolkit (Unit 7 §7.3). Fourier-feature embedding \gamma(z, t) = [\sin(B\,(z,t)^\top), \cos(B\,(z,t)^\top)], with B a frequency matrix, for the diurnal cycle; hard constraints for the BC at z = -H and for the IC; adaptive loss weighting (gradient-balancing) for the per-site residual and BC losses; causal time training.
  • Collocation. N_r = 50\,000 per site, N_d = 3\,600 per site (5 sensors × 720 hourly samples); 10 800 data points across the three sites combined.
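
The Fourier-feature embedding in the toolkit bullet can be sketched without any deep-learning library. The version below draws a random Gaussian frequency matrix B once and keeps it fixed, which is one common choice and an assumption here; the function name `fourier_features`, the number of features, the scale, and the nondimensionalisation of the inputs (z/H and t in days) are all illustrative rather than taken from Unit 7.

```julia
using Random

# Random Fourier-feature map: x = (z, t) -> [sin(B x); cos(B x)].
# B is m×2; the scale of its entries sets which frequencies are easy to fit.
function fourier_features(m; scale = 1.0, rng = Random.default_rng())
    B = scale .* randn(rng, m, 2)          # sampled once, then held fixed
    return (z, t) -> begin
        proj = B * [z, t]
        vcat(sin.(proj), cos.(proj))
    end
end

# Assumed nondimensional inputs: z / H in [-1, 0], t in days.
rng = MersenneTwister(0)
embed = fourier_features(16; scale = 2π, rng = rng)
features = embed(-30.0 / 100.0, 10.25)

println("embedding length: ", length(features))
```

The embedded vector, not the raw (z, t) pair, is what the first dense layer of each site's temperature network would see.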

9.10.2 Workflow

  1. Generate the three sites’ synthetic data with scripts/generate_mooring_csvs.jl (all three).
  2. Sub-scale prototype on Task A’s column. Run Task B’s architecture (Fourier features, hard BC, adaptive weights) on the single 15 m column with N_r = 5\,000. ~30 min on CPU. Sanity check: should match Task A’s accuracy at ~1.5× the training cost (the modern fixes shouldn’t hurt on easy problems).
  3. Two-site joint inverse on A + B (H up to 60 m). Medium architecture (4-layer × 64-neuron), N_r = 10\,000 per site. ~2 h on CPU. Lets you see the joint-vs-decoupled improvement and tune cross-site loss weights.
  4. GPU-launch checklist. Document the changes needed for the full 3-site run:
    • move ps, inputs, and data to the device returned by Lux.gpu_device() (for example dev = gpu_device(); ps = ps |> dev),
    • Reactant.@compile the training step,
    • keep the same Adam → L-BFGS schedule,
    • JAX equivalents (jinns / Equinox) if your team prefers the JAX stack (Unit 7 §7.6).
  5. Queue the full run on a GPU when available — the code doesn’t change, only the device and the collocation count.

9.10.3 Success criteria

Metric Sub-scale on CPU Full scale on GPU (predicted)
Sub-scale prototype matches Task A yes, < 1.2× error
Two-site joint \hat\tau peak error < 7%
Full 3-site joint \hat\tau peak error 3–5%
Residual histogram monotone in t, all three sites yes

9.10.4 Expected runtime

| Stage | CPU (M2 MacBook) | GPU (A100 / H100) |
|---|---|---|
| Sub-scale prototype (Task A column, Task B architecture) | ~30 min | ~2 min |
| Two-site joint inverse (A + B, H = 60 m) | ~2 h | ~5 min |
| Forward PINN, joint 3-site, full scale | ~6 h | ~8 min |
| Inverse PINN, joint 3-site, full scale | ~24 h | ~25 min |

The CPU full-scale column is not a typo: the run is technically possible, but a turnaround of roughly 30 h per forward-plus-inverse experiment makes iterative development impractical. Prototype on CPU at reduced scale, then run at full scale on GPU.

9.10.5 Deliverables

  1. Sub-scale prototype results on Task A’s column with Task B’s architecture.
  2. Two-site joint-inverse \hat\tau(t) overlaid on the synthetic truth, with the per-site decoupled inverses on the same axes for comparison.
  3. Mechanism partition plots at all three sites (z = -10\,\text{m} for Site A, z = -30\,\text{m} for Site B, z = -50\,\text{m} for Site C — each one near or just below the local thermocline). The partition should show qualitatively different mechanism weights across the three sites — that’s the §9.1 multi-site story made quantitative.
  4. The GPU-launch checklist as a .md document.
  5. A written assessment of what changes at the full 3-site, H = 100 m scale: predicted accuracy, predicted training cost on GPU, what could still go wrong (causality violation at Myrmidon, BC imbalance from the two-orders-of-magnitude gap between \kappa at the surface, \kappa_m = 10^{-3} m²/s, and at depth, \kappa_b = 10^{-5} m²/s).

9.10.6 Open questions for the full GPU run

These are honest “we don’t know yet” items the participant should report on if they get GPU time:

  • Will causality violation re-appear at Myrmidon Reef (advection-dominated, \mathrm{Pe} \gg 1)? Possibly need a per-site causal scheduler.
  • Will gradient-balancing converge on a stable weight ratio? Empirically yes on the sub-scale; theoretically not guaranteed.
  • How well does the recovered \tau(t) correlate with the independent SWE-driver w(t) inferred from local wind observations? This is the validation step that turns the recovered synthetic answer into something an oceanographer trusts.
  • Stretch goal — direct mechanism inversion. Instead of one shared \tau(t) implicitly coupling all three mechanisms, expand the inverse to recover three separate scalar functions \tau_w(t), \tau_\kappa(t), \tau_Q(t) (driving upwelling, mixing, and surface-flux modulation independently). This makes the §9.1 hypothesis discrimination direct rather than partition-inferred, at the cost of a three-fold-larger inverse problem (worse conditioning, more regularisation tuning). A good test of whether the joint three-site data is informative enough to separate three coupled drivers.

9.11 Toy-task ladder (shared)

Build up to the forward problem in stages. Each step isolates one mechanism, so when the full storm scenario lands you can read each contribution off the trace. Plots and code for these scenarios live in Unit 10. Task A and Task B both start from these.

  1. Pure diffusion, steady forcing. Set w = 0, \mathcal{S} = 0, \kappa constant, Q_{\text{np}} constant. Solve to steady state. Expected: a linear T(z) profile balancing surface flux against the deep reservoir. Sanity check: analytic solution exists.

  2. Add the diurnal cycle. Turn on time-varying Q_{\text{SW}}(t) and the body source \mathcal{S}. Keep w = 0, \kappa constant. Expected: a diurnal warm layer in the top few metres that warms in the afternoon and erodes overnight. Bulk profile barely changes over 30 days.

  3. Add upwelling. Turn on w(z) with w_0 = 10^{-5}\,\text{m/s}. Expected: cold water from depth invades the bulk; SST drops slowly over weeks. The diurnal warm layer survives but sits on top of cooler water.

  4. Vary mixing. Switch \kappa from constant to the profile closure with mixed-layer scale h_m. Try h_m = 5, 20, 50 m. Expected: deeper mixed layer \Rightarrow thicker but cooler warm layer at the surface, smoother thermocline.

  5. The synthetic scenario, forward. Run a single wind event in the 2D shallow-water model — a Gaussian gust passing over the mooring on day 10, lasting 3 days. Read off the time series w(t), |\boldsymbol{\tau}|(t), and Q_{\text{SW}}^{\max}(t) (cloud cover tied to the gust) at the mooring location, and feed them into the 1D column. Plot temperature traces at the five mooring depths. Expected: deeper sensors cool first (upwelling fingerprint), surface stays close to baseline because cloud and stress cancel. Compare to “decoupled” runs that perturb only one driver at a time.
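
The first rung of the ladder has an analytic answer, T(z) = T_{\text{deep}} + \frac{Q_{\text{np}}}{\rho_0 c_p \kappa}(z + H), which makes it a convenient test for any solver. The sketch below relaxes a hand-rolled explicit finite-volume scheme to steady state and compares it with that line. It is a stand-in for illustration, not the MethodOfLines.jl reference from Unit 10; the shallow Site A column is used so the relaxation is short, and taking the constant \kappa equal to \kappa_m is an assumption.

```julia
# Explicit finite-volume diffusion with a Dirichlet bottom and a flux top BC.
# T holds cell-centre values ordered bottom to top; F holds κ ∂T/∂z at faces.
function relax!(T, nsteps; kappa, dz, dt, T_bottom, flux_top)
    n = length(T)
    F = zeros(n + 1)
    for _ in 1:nsteps
        F[1] = kappa * (T[1] - T_bottom) / (dz / 2)   # T = T_bottom at z = -H
        for i in 2:n
            F[i] = kappa * (T[i] - T[i-1]) / dz
        end
        F[n+1] = flux_top                             # κ ∂T/∂z = Q_np / (ρ0 cp) at z = 0
        for i in 1:n
            T[i] += dt * (F[i+1] - F[i]) / dz
        end
    end
    return T
end

rho0, cp = 1025.0, 3990.0
H, kappa = 15.0, 1.0e-3            # Site A depth; constant κ = κ_m (assumption)
T_deep, Q_np = 18.0, -200.0        # °C, W/m² (Q_np = -Q_cool)

n = 30
dz = H / n
zc = [-H + (i - 0.5) * dz for i in 1:n]        # cell centres
dt = 0.25 * dz^2 / kappa                       # inside the explicit stability limit
nsteps = round(Int, 10 * H^2 / kappa / dt)     # about ten diffusive timescales

T_num = relax!(fill(T_deep, n), nsteps; kappa = kappa, dz = dz, dt = dt,
               T_bottom = T_deep, flux_top = Q_np / (rho0 * cp))
T_ana = @. T_deep + Q_np / (rho0 * cp * kappa) * (zc + H)

println("max |T_num - T_ana| = ", maximum(abs.(T_num .- T_ana)), " °C")
```

With steady net cooling and no shortwave source the steady profile is colder at the surface than at depth, which is fine for a pure-diffusion sanity check but is a reminder that this rung is a numerical test, not a physical ocean state.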