Unit 5 - Machine Learning, Statistics, and Optimization
In this unit we explore some deep learning, machine learning, statistics, and optimization libraries in Julia. We also use Makie as an alternative to the plots used in the previous units.
1 Makie and AlgebraOfGraphics
Makie.jl is a powerful and high-performance plotting ecosystem for Julia. It stands out for its speed, interactivity, and a unified API across multiple backends, including GPU-accelerated plotting with GLMakie.jl and publication-quality vector graphics with CairoMakie.jl. While Makie provides a detailed, imperative API for fine-grained control, AlgebraOfGraphics.jl (AoG) is a declarative layer built on top of it. AoG implements a “grammar of graphics” inspired by R’s ggplot2, allowing users to build complex statistical visualizations from tidy data with concise and composable code. This contrasts with other major players in the Julia ecosystem, such as Plots.jl which we used in most other units.
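1.1 AlgebraOfGraphics
The AlgebraOfGraphics.jl package ships with the Palmer penguins example data, which we use below: a faceted scatter plot of bill measurements with fitted regression lines overlaid.

```julia
using AlgebraOfGraphics, CairoMakie

dat = AlgebraOfGraphics.penguins()

spec = data(dat) *
    mapping(
        :bill_length_mm => "Bill length (mm)",
        :bill_depth_mm => "Bill depth (mm)",
        color = :species => "Species",
        row = :sex, col = :island,
    ) *
    (visual(Scatter, alpha = 0.3) + linear())
draw(spec)
```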
This example illustrates the core "grammar" of AlgebraOfGraphics, where plots are constructed by combining different specification components. The process typically follows this pattern:
1. data(): Specifies the source dataset, which is usually a table-like object.
2. mapping(): Defines the aesthetic mappings. It links columns from the data (e.g., :bill_length_mm) to visual roles such as the x and y axes, color, or faceting dimensions (row and col).
3. Visualization layer: This determines how the data is represented. It can be a direct geometric object from visual() (like visual(Scatter) or visual(Contour)), or a statistical transformation like linear() or density(), which first processes the data and then plots the result.
These components are chained together with the * operator to build a single plot specification. To overlay multiple layers on the same axes (for example, scatter points and a regression line), you combine them with the + operator, as seen in visual(Scatter, ...) + linear(). This entire chain creates a declarative plot object, which is then passed to the draw() function to be rendered into a final image.
Here is another example:
```julia
spec = data(dat) *
    mapping(:bill_length_mm, color = :species) *
    (histogram(normalization = :pdf, bins = 20) * visual(alpha = 0.5) +
     AlgebraOfGraphics.density() * visual(linewidth = 3))
draw(spec, axis = (; title = "Bill Length Distribution by Species"))
```
1.2 General Makie
Sometimes we want to use Makie without AoG. Here is some Makie code that does not use AoG:
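```julia
using CairoMakie

seconds = 0:0.1:2
measurements = [8.2, 8.4, 6.3, 9.5, 9.1, 10.5, 8.6, 8.2, 10.5, 8.5, 7.2,
                8.8, 9.7, 10.8, 12.5, 11.6, 12.1, 12.1, 15.1, 14.7, 13.1]

f = Figure()
ax = Axis(f[1, 1],
    title = "Experimental data and exponential fit",
    xlabel = "Time (seconds)",
    ylabel = "Value",
)
scatter!(ax, seconds, measurements, color = :tomato, label = "Measurements")
lines!(ax, seconds, exp.(seconds) .+ 7,
    color = :tomato, linestyle = :dash, label = "f(x) = exp(x) + 7")
axislegend(position = :rb)
f
```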
The core philosophy of Makie revolves around a few key objects:
Figure: This is the top-level container or canvas for your entire visualization. It doesn’t contain the plot data itself, but rather holds all the different components, such as axes, legends, and colorbars, arranging them in a grid-based layout. You create it once with f = Figure().
Axis: This is the actual plotting area where data is visualized, complete with x/y axes, labels, and a title. An Axis is placed within the Figure’s layout system. The syntax f[1, 1] creates an Axis in the first row and first column of the figure’s grid, making it easy to compose complex multi-plot figures.
Plotting Functions (scatter!, lines!): These functions add visual elements to a specified Axis.
This layered approach separates the overall scene layout (Figure) from the individual plot contents (Axis). A key feature of Makie is its backend system. The same code for creating figures and axes works regardless of the output format. By starting your script with using CairoMakie, the figure is rendered into high-quality static formats like SVG, PDF, or PNG, perfect for publications. If you had used using GLMakie instead, the exact same code would produce a window that you can pan, zoom, and rotate in real-time, making it ideal for data exploration.
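For example (a minimal sketch, assuming both backends are installed in the active environment), you can switch backends within a single session with each backend's activate! function:

```julia
using GLMakie
GLMakie.activate!()     # subsequent figures open in an interactive window

using CairoMakie
CairoMakie.activate!()  # subsequent figures render as static vector graphics (SVG/PDF/PNG)
```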
Here is another example:
```julia
x = 0:0.1:4π
y_sin = sin.(x)
y_cos = cos.(x)

f = Figure(size = (800, 400))

ax1 = Axis(f[1, 1], title = "Sine Wave", xlabel = "x", ylabel = "sin(x)")
lines!(ax1, x, y_sin, color = :blue, label = "sin")

ax2 = Axis(f[1, 2], title = "Cosine Wave", xlabel = "x", ylabel = "cos(x)")
scatter!(ax2, x, y_cos, color = :red, markersize = 4, label = "cos")

Label(f[0, 1:2], "Side-by-Side Plots", fontsize = 22, tellwidth = false)
f
```
2 Deep Learning (and a bit of SciML)
Julia started out with several deep learning libraries before Flux.jl emerged as the dominant one. See also the Flux Docs. More recently, an alternative, Lux.jl, emerged to work better with the SciML ecosystem. See also the Lux Docs.
Let us also note a very lightweight neural networks package, SimpleChains.jl. We'll also use one of the SciML packages, DiffEqFlux.jl. See Liquet, Moka, and Nazarathy (2024) for an introductory deep learning text.
Flux vs. Lux: At its core, the distinction between Flux and Lux is a design philosophy choice between a stateful and a stateless API, which profoundly impacts how they interact with Julia’s automatic differentiation (AD) systems like Zygote.jl. Flux embodies a stateful, object-oriented approach. A Flux.Chain or Flux.Dense object is a mutable struct that encapsulates both the computational logic (the layer’s function) and the trainable parameters (weights and biases). When you compute a gradient, Zygote must traverse this complex struct to find the arrays to differentiate. This is often called “implicit parameter” handling, as the parameters are implicitly tied to the model object. While convenient, this can sometimes be fragile for complex models or when performing non-standard operations like differentiating with respect to hyperparameters.
Lux, in contrast, implements a stateless, functional design. A Lux layer is an immutable struct containing only the model’s architecture and hyperparameters. The trainable parameters (params) and non-trainable state (e.g., running means in BatchNorm, state) are stored separately, typically in a NamedTuple. The model’s forward pass is an explicit function model(x, params, state). When computing gradients, Zygote receives the params data structure directly, which is a much cleaner and more robust target for differentiation. This “explicit parameter” approach makes Lux highly composable and eliminates an entire class of mutation-related bugs, which is critical for advanced scientific machine learning (SciML) applications where models might be part of a differential equation solver.
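As a minimal sketch of this difference (layer sizes and data here are arbitrary), the same two-layer network looks as follows in each framework:

```julia
using Flux, Lux, Random

x = rand(Float32, 2, 16)   # a batch of 16 two-dimensional inputs

# Flux: stateful -- the parameters live inside the model object
flux_model = Flux.Chain(Flux.Dense(2 => 8, tanh), Flux.Dense(8 => 1))
y_flux = flux_model(x)     # call signature: model(x)

# Lux: stateless -- parameters and state are created separately and passed explicitly
lux_model = Lux.Chain(Lux.Dense(2 => 8, tanh), Lux.Dense(8 => 1))
ps, st = Lux.setup(Xoshiro(0), lux_model)   # NamedTuples of parameters and state
y_lux, st = lux_model(x, ps, st)            # call signature: model(x, params, state)
```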
When contrasted with mainstream Python frameworks, Flux.jl is philosophically very similar to PyTorch’s nn.Module API. Both are stateful, with the model object holding its own parameters (e.g., layer.weight in PyTorch). The call signature model(x) is identical, and the overall user experience is designed for convenience and familiarity. Keras takes this stateful abstraction even further with its high-level model.fit() API, though its underlying layers are conceptually similar to Flux and PyTorch.
Lux.jl, on the other hand, is philosophically aligned with Google’s JAX. JAX also enforces a functional, stateless paradigm where model logic (the apply function) is separate from the model parameters (“pytrees”). This explicit separation of parameters from the code that acts on them is what gives both Lux and JAX their power for complex gradient-based optimization tasks, such as meta-learning or taking derivatives through physics simulators. Therefore, the choice isn’t just about syntax; it’s about selecting a paradigm: the convenient, stateful approach of Flux/PyTorch or the robust, functional, and explicit approach of Lux/JAX.
3 General Machine Learning
The main general machine learning package in Julia is MLJ (A Machine Learning Framework for Julia). It collects hundreds of ML models from other packages under one roof. A brief MLJ intro is here. The tutorials page has dozens of worked examples.
In MLJ a model is an object that only serves as a container for the hyperparameters of the model. A machine is an object wrapping both a model and data and can contain information on the trained model; it does not fit the model by itself. However, it does check that the model is compatible with the scientific type of the data and will warn you otherwise.
The cheatsheet is also very useful. As our brief introduction to MLJ, let's consider a few key elements from the cheatsheet; follow the MLJ tutorials for more detailed examples.
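As a minimal sketch of this workflow on synthetic data (using KNNRegressor from NearestNeighborModels.jl, as in the unit's machine-construction examples):

```julia
using MLJ, NearestNeighborModels

X, y = make_regression(200, 3)    # synthetic regression data: 200 rows, 3 features

model = KNNRegressor(K = 5)       # a model: just a container of hyperparameters
mach = machine(model, X, y)       # a machine: wraps the model together with the data

train, test = partition(eachindex(y), 0.8, rng = 42)
fit!(mach, rows = train)          # learning happens here
yhat = predict(mach, rows = test) # predictions for the held-out rows
```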
Some models are registered by more than one package. For example, querying info("RidgeRegressor") without specifying the package raises an error:

```
ArgumentError: Ambiguous model type name. Use pkg=... .
The model RidgeRegressor is provided by these packages:
["MLJScikitLearnInterface", "MLJLinearModels", "MultivariateStats"].
```

Specifying the package, e.g. info("RidgeRegressor", pkg="MultivariateStats"), resolves the ambiguity.
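4.2 Hypothesis Tests
HypothesisTests.jl implements the standard statistical tests. The output below comes from comparing two samples with a pooled (equal-variance) two-sample t-test, first computed manually and then via EqualVarianceTTest. Here is a sketch of the computation; the unit reads the two samples from machine1.csv and machine2.csv in its data directory, so adjust the paths for your own data:

```julia
using CSV, DataFrames, Distributions, HypothesisTests, Statistics

data1 = CSV.read("../data/machine1.csv", DataFrame; header = false)[:, 1]
data2 = CSV.read("../data/machine2.csv", DataFrame; header = false)[:, 1]

xBar1, s1, n1 = mean(data1), std(data1), length(data1)
xBar2, s2, n2 = mean(data2), std(data2), length(data2)
delta0 = 0   # hypothesized mean difference under H0

# Pooled standard deviation, t statistic, and two-sided p-value
sP = sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
testStatistic = (xBar1 - xBar2 - delta0) / (sP * sqrt(1/n1 + 1/n2))
pVal = 2 * ccdf(TDist(n1 + n2 - 2), abs(testStatistic))

println("Manually calculated test statistic: ", testStatistic)
println("Manually calculated p-value: ", pVal, "\n")
println(EqualVarianceTTest(data1, data2, delta0))
```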
The manual computation and the packaged test agree:

```
Manually calculated test statistic: 4.5466542394674425
Manually calculated p-value: 5.9493058655043084e-5

Two sample t-test (equal variance)
----------------------------------
Population details:
    parameter of interest:   Mean difference
    value under h_0:         0
    point estimate:          2.00881
    95% confidence interval: (1.113, 2.905)

Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           <1e-04

Details:
    number of observations:   [20,18]
    t-statistic:              4.5466542394674425
    degrees of freedom:       36
    empirical standard error: 0.44182145832893077
```

The p-value can also be extracted directly:

```julia
pvalue(EqualVarianceTTest(data1, data2, delta0))
```

```
5.9493058655043084e-5
```
4.3 Mixed Models
```julia
using MixedModels, DataFrames, RDatasets

# Load the sleepstudy dataset from lme4 (same as in R)
df = dataset("lme4", "sleepstudy")
first(df, 5)

# Fit a linear mixed model:
# Reaction ~ Days + (Days | Subject)
# Days: fixed effect, (Days | Subject): random slope/intercept by Subject
model = fit(MixedModel, @formula(Reaction ~ 1 + Days + (1 + Days | Subject)), df)
println(model)
```
```
Linear mixed model fit by maximum likelihood
 Reaction ~ 1 + Days + (1 + Days | Subject)
    logLik   -2 logLik        AIC       AICc        BIC
 -875.9697   1751.9393  1763.9393  1764.4249  1783.0971

Variance components:
            Column     Variance  Std.Dev.  Corr.
Subject  (Intercept)  565.51067  23.78047
         Days          32.68212   5.71683  +0.08
Residual              654.94145  25.59182
 Number of obs: 180; levels of grouping factors: 18

  Fixed-effects parameters:
──────────────────────────────────────────────────
                Coef.  Std. Error      z  Pr(>|z|)
──────────────────────────────────────────────────
(Intercept)  251.405      6.63226  37.91    <1e-99
Days          10.4673     1.50224   6.97    <1e-11
──────────────────────────────────────────────────
```
5 Selected topics from Optimization
Julia is a neat language for optimization. See for example Kochenderfer and Wheeler (2019).
The JuliaNLSolvers organization provides a few key packages; we'll focus on Optim.jl, "univariate and multivariate optimization in Julia", which covers unconstrained continuous optimization.
The JuMP.jl package is from a slightly different world of constrained (operations-research style) optimization. See Lubin et al. (2023).
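The running example is a Rosenbrock-style test function, f(x, y) = (1 - x)^2 + 5(y - x^2)^2, which is non-negative and attains its minimum of 0 at (x, y) = (1, 1). Minimizing it with the gradient-free Nelder–Mead method:

```julia
using Optim

rosenbrock(x, y) = (1 - x)^2 + 5(y - x^2)^2
rosenbrock(x::AbstractVector) = rosenbrock(x...)   # vector method, as expected by optimize

result = optimize(rosenbrock, [10.0, 10.0], NelderMead())
println("Nelder–Mead minimizer: ", Optim.minimizer(result))
println("Minimum value: ", Optim.minimum(result))
```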
Note however that we don’t need to supply the gradient if we don’t have it. We can just let optimize use automatic differenatiation.
result =optimize(rosenbrock, [10.0, 10.0], BFGS()) # no gradient given so uses autodiffprintln("Gradient descent minimizer:", Optim.minimizer(result))println("Minimum value:", Optim.minimum(result))
Now here is a small linear program (maximize x + 2y subject to x + y ≤ 5 with x, y ≥ 0) solved using the DSL (domain-specific language) of JuMP. An early blog post about DSLs is here.
```julia
using JuMP, GLPK

model = JuMP.Model(GLPK.Optimizer)
@variable model x ≥ 0
@variable model y ≥ 0
@constraint model x + y ≤ 5
@objective model Max (x + 2y)
optimize!(model)

println("Optimal x = ", value(x))
println("Optimal y = ", value(y))
println("Optimal objective value = ", objective_value(model))
```
```
Optimal x = 0.0
Optimal y = 5.0
Optimal objective value = 10.0
```
JuMP can do much more and interfaces with state-of-the-art commercial mixed-integer linear programming solvers. There are many resources available for JuMP; see Materials for learning JuMP as a start.
6 Additional online resources
A Machine Learning Fundamentals tutorial by PumasAI.
A Machine Learning Unit in a University of Queensland course (stay on the semester of that link, not "current semester", which doesn't have that unit).
7 Exercises
1. Consider this dataframe, df:
```julia
using DataFrames, Random, Distributions

Random.seed!(123)                # for reproducibility
n = 100
μ = [5.0, 13.0, -2.0]            # means for x, y, z
Σ = [4.0 3.0 1.0;                # 3×3 covariance matrix, making the variables correlated
     3.0 9.0 2.0;
     1.0 2.0 6.0]
mvnorm = MvNormal(μ, Σ)
data = rand(mvnorm, n)
df = DataFrame(x = data[1, :], y = data[2, :], z = data[3, :])
```
Use AlgebraOfGraphics.jl to plot a scatter plot of y versus x, with the color of the points representing the value of z.
2. Consider the example Flux.jl code on the Flux.jl GitHub README. Try to run that code (also replacing model with the commented-out model). Do the plotting with Makie instead of Plots.
3. Now consider the Flux quickstart page. Try to reproduce that code, disabling CUDA if it is not relevant (i.e., if you don't have a CUDA GPU).
4. Move on to Lux.jl. Consider the "Native Julia & Zygote" example on the Lux.jl GitHub README page. Run that example.
5. Consider Lab 8 - Tree-based models from DataScienceTutorials.jl. Try to reproduce Lab 8. As you can see, there are many other tutorials/labs, so if you prefer a different one, carry out that one.
6. Go to the documentation for HypothesisTests.jl. Choose a test that you either know or that interests you, and try to run its code on some example data you generate.
Liquet, Benoit, Sarat Moka, and Yoni Nazarathy. 2024. Mathematical Engineering of Deep Learning. CRC Press. https://deeplearningmath.org/.
Lubin, Miles, Oscar Dowson, Joaquim Dias Garcia, Joey Huchette, Benoît Legat, and Juan Pablo Vielma. 2023. "JuMP 1.0: Recent Improvements to a Modeling Language for Mathematical Optimization." Mathematical Programming Computation 15: 581–89. https://doi.org/10.1007/s12532-023-00239-3.
Nazarathy, Yoni, and Hayden Klok. 2021. Statistics with Julia: Fundamentals for Data Science, Machine Learning and Artificial Intelligence. Springer Nature. https://statisticswithjulia.org/.