Unit 5 - Machine Learning, Statistics, and Optimization

In this unit we explore some deep learning, machine learning, statistics, and optimization libraries in Julia. We also use Makie as an alternative to the plots used in the previous units.

1 Makie and AlgebraOfGraphics

Makie.jl is a powerful and high-performance plotting ecosystem for Julia. It stands out for its speed, interactivity, and a unified API across multiple backends, including GPU-accelerated plotting with GLMakie.jl and publication-quality vector graphics with CairoMakie.jl. While Makie provides a detailed, imperative API for fine-grained control, AlgebraOfGraphics.jl (AoG) is a declarative layer built on top of it. AoG implements a “grammar of graphics” inspired by R’s ggplot2, allowing users to build complex statistical visualizations from tidy data with concise and composable code. This contrasts with other major players in the Julia ecosystem, such as Plots.jl which we used in most other units.

1.1 AlgebraOfGraphics

The AlgebraOfGraphics.jl package comes with some data examples. See also this useful Pumas AoG tutorial.

using AlgebraOfGraphics
dat = AlgebraOfGraphics.penguins()
@show typeof(dat)
@show keys(dat)
@show [length(d) for d in dat];

typeof(dat) = @NamedTuple{species::Vector{String}, island::Vector{String}, bill_length_mm::Vector{Float64}, bill_depth_mm::Vector{Float64}, flipper_length_mm::Vector{Int64}, body_mass_g::Vector{Int64}, sex::Vector{String}}
keys(dat) = (:species, :island, :bill_length_mm, :bill_depth_mm, :flipper_length_mm, :body_mass_g, :sex)
[length(d) for d = dat] = [333, 333, 333, 333, 333, 333, 333]

Here is how we plot:

using CairoMakie

spec =
    data(dat) *
    mapping(
        :bill_length_mm => "Bill length (mm)",
        :bill_depth_mm => "Bill depth (mm)",
        color = :species => "Species",
        row = :sex,
        col = :island,
    ) *(visual(Scatter, alpha = 0.3) + linear())

draw(spec)

Here is another example:

data(dat) * mapping(
                    :bill_length_mm => "Bill length (mm)",
                    color = :species => "Species",
                    row = :sex,
                    col = :island,
                    ) * AlgebraOfGraphics.density(bandwidth=0.7) |> draw

And another example:

data(dat) * mapping(
                    :bill_length_mm => "Bill length (mm)",
                    :bill_depth_mm => "Bill depth (mm)",
                    row = :sex,
                    col = :island,
                    ) * AlgebraOfGraphics.density() |> draw

Or a variant:

data(dat) * mapping(
                    :bill_length_mm => "Bill length (mm)",
                    :bill_depth_mm => "Bill depth (mm)",
                    row = :sex,
                    col = :island,
                    ) * AlgebraOfGraphics.density() * visual(Contour) |> draw

The examples above illustrate the core “grammar” of AlgebraOfGraphics, where plots are constructed by combining different specification components. The process typically follows this pattern:

data(): Specifies the source dataset, which is usually a table-like object.
mapping(): Defines the aesthetic mappings. It links columns from the data (e.g., :bill_length_mm) to visual roles such as the x and y axes, color, or faceting dimensions (row and col).
Visualization Layer: This determines how the data is represented. It can be a direct geometric object from visual() (like visual(Scatter) or visual(Contour)), or a statistical transformation like linear() or density(), which first processes the data and then plots the result.

These components are chained together with the * operator to build a single plot specification. To overlay multiple layers on the same axes (for example, scatter points and a regression line), you combine them with the + operator, as seen in visual(Scatter, ...) + linear(). This entire chain creates a declarative plot object, which is then passed to the draw() function to be rendered into a final image.

Here is another example:

spec = data(dat) *
       mapping(:bill_length_mm, color = :species) *
       (histogram(normalization=:pdf, bins = 20)* visual(alpha=0.5) + AlgebraOfGraphics.density()*visual(linewidth=3))

draw(spec, axis=(; title="Bill Length Distribution by Species"))

1.2 General Makie

Sometimes we want to use Makie without AoG. Here is some Makie code that does not use AoG:


seconds = 0:0.1:2
measurements = [8.2, 8.4, 6.3, 9.5, 9.1, 10.5, 8.6, 8.2, 10.5, 8.5, 7.2,
        8.8, 9.7, 10.8, 12.5, 11.6, 12.1, 12.1, 15.1, 14.7, 13.1]

f = Figure()
ax = Axis(f[1, 1],
    title = "Experimental data and exponential fit",
    xlabel = "Time (seconds)",
    ylabel = "Value",
)
scatter!(
    ax,
    seconds,
    measurements,
    color = :tomato,
    label = "Measurements"
)
lines!(
    ax,
    seconds,
    exp.(seconds) .+ 7,
    color = :tomato,
    linestyle = :dash,
    label = "f(x) = exp(x) + 7",
)
axislegend(position = :rb)
f

The core philosophy of Makie revolves around a few key objects:

Figure: This is the top-level container or canvas for your entire visualization. It doesn’t contain the plot data itself, but rather holds all the different components, such as axes, legends, and colorbars, arranging them in a grid-based layout. You create it once with f = Figure().
Axis: This is the actual plotting area where data is visualized, complete with x/y axes, labels, and a title. An Axis is placed within the Figure’s layout system. The syntax f[1, 1] creates an Axis in the first row and first column of the figure’s grid, making it easy to compose complex multi-plot figures.
Plotting Functions (scatter!, lines!): These functions add visual elements to a specified Axis.

This layered approach separates the overall scene layout (Figure) from the individual plot contents (Axis). A key feature of Makie is its backend system. The same code for creating figures and axes works regardless of the output format. By starting your script with using CairoMakie, the figure is rendered into high-quality static formats like SVG, PDF, or PNG, perfect for publications. If you had used using GLMakie instead, the exact same code would produce a window that you can pan, zoom, and rotate in real-time, making it ideal for data exploration.

Here is another example:

x = 0:0.1:4π
y_sin = sin.(x)
y_cos = cos.(x)

f = Figure(size = (800, 400))

ax1 = Axis(f[1, 1],
    title = "Sine Wave",
    xlabel = "x",
    ylabel = "sin(x)",
)
lines!(ax1, x, y_sin, color = :blue, label = "sin")

ax2 = Axis(f[1, 2],
    title = "Cosine Wave",
    xlabel = "x",
    ylabel = "cos(x)",
)
scatter!(ax2, x, y_cos, color = :red, markersize=4, label = "cos")

Label(f[0, 1:2], "Side-by-Side Plots", fontsize = 22, tellwidth=false)
f

2 Deep Learning (and a bit of SciML)

Julia started with a few deep learning libraries until Flux.jl emerged. See also Flux Docs. More recently, an alternative, Lux.jl emerged to work better with the SciML ecosystem. See also the Lux Docs.

Let us also note a very light-weight neural networks package, SimpleChains.jl. We’ll also use on of the SciML packages, DiffEqFlux.jl. See Liquet, Moka, and Nazarathy (2024) as an introductory deep learning text.

Flux vs. Lux: At its core, the distinction between Flux and Lux is a design philosophy choice between a stateful and a stateless API, which profoundly impacts how they interact with Julia’s automatic differentiation (AD) systems like Zygote.jl. Flux embodies a stateful, object-oriented approach. A Flux.Chain or Flux.Dense object is a mutable struct that encapsulates both the computational logic (the layer’s function) and the trainable parameters (weights and biases). When you compute a gradient, Zygote must traverse this complex struct to find the arrays to differentiate. This is often called “implicit parameter” handling, as the parameters are implicitly tied to the model object. While convenient, this can sometimes be fragile for complex models or when performing non-standard operations like differentiating with respect to hyperparameters.

Lux, in contrast, implements a stateless, functional design. A Lux layer is an immutable struct containing only the model’s architecture and hyperparameters. The trainable parameters (params) and non-trainable state (e.g., running means in BatchNorm, state) are stored separately, typically in a NamedTuple. The model’s forward pass is an explicit function model(x, params, state). When computing gradients, Zygote receives the params data structure directly, which is a much cleaner and more robust target for differentiation. This “explicit parameter” approach makes Lux highly composable and eliminates an entire class of mutation-related bugs, which is critical for advanced scientific machine learning (SciML) applications where models might be part of a differential equation solver.

When contrasted with mainstream Python frameworks, Flux.jl is philosophically very similar to PyTorch’s nn.Module API. Both are stateful, with the model object holding its own parameters (e.g., layer.weight in PyTorch). The call signature model(x) is identical, and the overall user experience is designed for convenience and familiarity. Keras takes this stateful abstraction even further with its high-level model.fit() API, though its underlying layers are conceptually similar to Flux and PyTorch.

Lux.jl, on the other hand, is philosophically aligned with Google’s JAX. JAX also enforces a functional, stateless paradigm where model logic (the apply function) is separate from the model parameters (“pytrees”). This explicit separation of parameters from the code that acts on them is what gives both Lux and JAX their power for complex gradient-based optimization tasks, such as meta-learning or taking derivatives through physics simulators. Therefore, the choice isn’t just about syntax; it’s about selecting a paradigm: the convenient, stateful approach of Flux/PyTorch or the robust, functional, and explicit approach of Lux/JAX.

We’ll use MLDatasets.jl and focus on the (quite simple) MNIST Digits example. The deep learning examples are in Jupyter notebooks.

2.1 Simple Linear Model Example

This notebook uses a linear model (!) to create a predictor of MNIST digits. It compares the one-vs-rest and one-vs-one approaches: Basic_MNIST.ipynb.

2.2 Flux example

This notebook uses Flux.jl and a simple dense neural network for MNIST: Flux_MNIST.ipynb.

2.3 Lux Example

This notebook uses Lux.jl and a simple dense neural network for MNIST: Lux_MNIST.ipynb.

2.4 DiffEqFlux Example

This example is taken from here:

using ComponentArrays, Lux, DiffEqFlux, OrdinaryDiffEq, Optimization, OptimizationOptimJL,
      OptimizationOptimisers, Random, CairoMakie

rng = Xoshiro(0)
u0 = Float32[2.0; 0.0]
datasize = 30
tspan = (0.0f0, 1.5f0)
tsteps = range(tspan[1], tspan[2]; length = datasize)

function trueODEfunc(du, u, p, t)
    true_A = [-0.1 2.0; -2.0 -0.1]
    du .= ((u .^ 3)'true_A)'
end

prob_trueode = ODEProblem(trueODEfunc, u0, tspan)
ode_data = Array(solve(prob_trueode, Tsit5(); saveat = tsteps))

dudt2 = Chain(x -> x .^ 3, Dense(2, 50, tanh), Dense(50, 2))
p, st = Lux.setup(rng, dudt2)
prob_neuralode = NeuralODE(dudt2, tspan, Tsit5(); saveat = tsteps)

function predict_neuralode(p)
    Array(prob_neuralode(u0, p, st)[1])
end

function loss_neuralode(p)
    pred = predict_neuralode(p)
    loss = sum(abs2, ode_data .- pred)
    return loss
end

function callback(state, l; doplot = false)
    print(l, ", ")
    if doplot
        pred = predict_neuralode(state.u)
        fig = Figure()
        ax = Axis(fig[1, 1], xlabel = "Time", ylabel = "Value")
        scatter!(ax, tsteps, ode_data[1, :]; label = "data")
        scatter!(ax, tsteps, pred[1, :]; label = "prediction")
        axislegend(ax)
        display(fig)
    end
    return false
end

pinit = ComponentArray(p)
callback((; u = pinit), loss_neuralode(pinit); doplot = true)

adtype = Optimization.AutoZygote()
optf = Optimization.OptimizationFunction((x, p) -> loss_neuralode(x), adtype)
optprob = Optimization.OptimizationProblem(optf, pinit)

result_neuralode = Optimization.solve(
    optprob, OptimizationOptimisers.Adam(0.05); callback = callback, maxiters = 300)

optprob2 = remake(optprob; u0 = result_neuralode.u)
result_neuralode2 = Optimization.solve(
    optprob2, Optim.BFGS(; initial_stepnorm = 0.01); callback, allow_f_increases = false)

callback((; u = result_neuralode2.u), loss_neuralode(result_neuralode2.u); doplot = true)

WARNING: using ComponentArrays.Axis in module Notebook conflicts with an existing identifier.
115.86056, 115.86056, 134.13478, 99.27806, 92.631744, 89.32249, 87.03288, 84.992805, 82.56437, 79.733955, 76.605064, 72.43982, 67.24375, 62.426205, 56.62334, 52.968777, 50.59937, 48.60809, 47.050667, 45.781147, 44.217678, 42.224262, 39.132874, 37.385983, 35.354633, 30.657675, 27.552977, 24.289154, 28.888393, 26.439344, 23.452856, 24.173042, 20.335749, 16.750864, 19.882008, 13.474715, 15.677546, 14.592738, 10.787688, 10.283253, 10.47135, 6.976977, 8.148048, 7.684682, 5.0752974, 7.853937, 4.6504197, 5.6793876, 5.8493757, 4.663749, 5.11066, 5.335164, 4.249895, 4.745072, 4.877847, 4.1434326, 4.3647695, 4.431662, 3.7633865, 3.9526281, 3.8927193, 3.6582022, 3.482847, 3.6429076, 3.1337278, 3.026495, 3.1401317, 2.884744, 2.7306743, 2.8843727, 2.5295622, 2.488722, 2.5253596, 2.3218603, 2.2727168, 2.2997744, 2.0985456, 2.069312, 2.0395, 1.9122494, 1.9510022, 1.8424008, 1.7759846, 1.7676568, 1.6780218, 1.6742858, 1.5994627, 1.5557271, 1.5346466, 1.4773846, 1.4721413, 1.4207536, 1.3910285, 1.3510042, 1.3411163, 1.2909039, 1.2796357, 1.2438791, 1.2203825, 1.1836278, 1.1660525, 1.1538435, 1.1320317, 1.1010377, 1.0794004, 1.0631014, 1.0553845, 1.0295786, 1.0133698, 0.99885875, 0.9842892, 0.9666816, 0.9613096, 0.9507694, 0.93365717, 0.9284953, 0.9069326, 0.90321034, 0.8856393, 0.879971, 0.8599536, 0.8462339, 0.8355609, 0.84087324, 0.8173478, 0.80280185, 0.78141195, 0.77342945, 0.76979727, 0.7483904, 0.7353541, 0.7295765, 0.7214811, 0.7112537, 0.69648445, 0.6832652, 0.6723806, 0.6670661, 0.65291715, 0.63783807, 0.6266792, 0.63179326, 0.61964136, 0.60361636, 0.6026382, 0.5850269, 0.57379824, 0.5699867, 0.5539907, 0.5460671, 0.5296639, 0.5232015, 0.5225693, 0.5192741, 0.5131874, 0.48888302, 0.48784995, 0.47871432, 0.46344092, 0.4669458, 0.4565755, 0.4488549, 0.44148058, 0.44882107, 0.42548576, 0.43642145, 0.43590072, 0.4119142, 0.43060243, 0.41323835, 0.39990056, 0.4253479, 0.3849039, 0.39344123, 0.38493973, 0.3653167, 0.38206127, 0.353074, 0.35787612, 0.35500547, 0.34035003, 0.33399943, 0.32762185, 0.33065453, 0.32440192, 0.31472772, 0.3190172, 0.30963275, 0.30287227, 0.29831418, 0.29639074, 0.28874856, 0.29036063, 0.28648403, 0.2829639, 0.2731951, 0.27705285, 0.29097742, 0.26612306, 0.25417143, 0.25803947, 0.24864127, 0.249155, 0.25415653, 0.23944299, 0.23744342, 0.2420137, 0.22449794, 0.22979261, 0.2321689, 0.21591376, 0.22628528, 0.22349681, 0.21692131, 0.20226137, 0.21796982, 0.205704, 0.20392878, 0.19666553, 0.18788384, 0.18807001, 0.18086141, 0.17958675, 0.18903327, 0.19421081, 0.18690945, 0.17050318, 0.16996078, 0.19524816, 0.2552351, 0.22599345, 0.17719351, 0.15787858, 0.1851592, 0.21344554, 0.16388732, 0.15500925, 0.21472119, 0.19194318, 0.14128166, 0.16334958, 0.17855924, 0.1493615, 0.1472201, 0.13651636, 0.14861795, 0.13756898, 0.15237975, 0.13183147, 0.14330631, 0.12348657, 0.14159173, 0.12714928, 0.11740889, 0.17089419, 0.18422517, 0.15555565, 0.12045679, 0.20886058, 0.24334925, 0.13514352, 0.14265928, 0.21140467, 0.13450685, 0.134889, 0.18795334, 0.109340444, 0.14491211, 0.19715367, 0.12983625, 0.11954098, 0.19701083, 0.11142535, 0.121585265, 0.15633593, 0.11355085, 0.10851991, 0.11992353, 0.0953752, 0.09401859, 0.14190839, 0.11044401, 0.088110834, 0.12628628, 0.08605299, 0.14973553, 0.30578187, 0.11835455, 0.14985424, 0.39052835, 0.09176212, 0.27156615, 0.38147852, 0.11263011, 0.5215897, 0.36455545, 0.24102668, 0.36766487, 0.12849475, 0.2701803, 0.11366359, 0.08605299, 0.08605346, 0.08208099, 0.08203324, 0.08071, 0.079000525, 0.06256753, 0.039137863, 0.034342945, 0.0330068, 0.029055059, 0.02905285, 0.02905285, 0.029023279, 0.029023279, 0.029018687, 0.028973049, 0.028973049, 0.02853902, 0.027750617, 0.025740305, 0.023225943, 0.02196586, 0.021556687, 0.020449013, 0.0190624, 0.01741115, 0.01586616, 0.01576992, 0.014503866, 0.013820826, 0.013656317, 0.013373488, 0.012766233, 0.012722027, 0.012293727, 0.011933266, 0.011610765, 0.010612816, 0.009992124, 0.009781772, 0.009768074, 0.009254279, 0.009104394, 0.008982644, 0.008873009, 0.008614004, 0.00856949, 0.008373931, 0.0082250405, 0.007934544, 0.0076639983, 0.0076155746, 0.0075750737, 0.0074945134, 0.0071252296, 0.0071037184, 0.0068901354, 0.0061981, 0.0061859163, 0.0061170436, 0.0060347216, 0.005966138, 0.005963063, 0.0059527047, 0.005734471, 0.005575459, 0.0055513526, 0.005462523, 0.005462523, 0.005462523, 0.005462523,

false

3 General Machine Learning

The main general machine learning package in Julia is MLJ - A Machine Learning Framework for Julia. Less popular, and older (probably not to use) are ScikitLearn.jl and Knet.jl.

MLJ collects hundreds of ML models of other packages under one roof. A brief MLJ intro is here. The tutorials page has dozens of worked examples.

In MLJ a model is an object that only serves as a container for the hyperparameters of the model. A machine is an object wrapping both a model and data and can contain information on the trained model; it does not fit the model by itself. However, it does check that the model is compatible with the scientific type of the data and will warn you otherwise.

The cheatsheet is also very useful. As our brief introduction to MLJ let’s consider a few key elements from the cheatsheet. Follow the MLJ tutorials for more detailed examples.

See the MLJ version:

using MLJ
MLJ_VERSION

v"0.20.8"

Retrieves registry metadata for a specific model:

info("PCA")

(name = "PCA",
 package_name = "MultivariateStats",
 is_supervised = false,
 abstract_type = MLJModelInterface.Unsupervised,
 constructor = nothing,
 deep_properties = (),
 docstring = "```\nPCA\n```\n\nA model type for constructing a pca, ...",
 fit_data_scitype =
     Tuple{ScientificTypesBase.Table{<:AbstractVector{<:ScientificTypesBase.Continuous}}},
 human_name = "pca",
 hyperparameter_ranges = (nothing, nothing, nothing, nothing),
 hyperparameter_types =
     ("Int64", "Symbol", "Float64", "Union{Nothing, Real, Vector{Float64}}"),
 hyperparameters = (:maxoutdim, :method, :variance_ratio, :mean),
 implemented_methods =
     [:clean!, :fit, :fitted_params, :inverse_transform, :transform],
 inverse_transform_scitype =
     ScientificTypesBase.Table{<:AbstractVector{<:ScientificTypesBase.Continuous}},
 is_pure_julia = true,
 is_wrapper = false,
 iteration_parameter = nothing,
 load_path = "MLJMultivariateStatsInterface.PCA",
 package_license = "MIT",
 package_url = "https://github.com/JuliaStats/MultivariateStats.jl",
 package_uuid = "6f286f6a-111f-5878-ab1e-185364afe411",
 predict_scitype = ScientificTypesBase.Unknown,
 prediction_type = :unknown,
 reporting_operations = (),
 reports_feature_importances = false,
 supports_class_weights = false,
 supports_online = false,
 supports_training_losses = false,
 supports_weights = false,
 target_in_fit = false,
 transform_scitype =
     ScientificTypesBase.Table{<:AbstractVector{<:ScientificTypesBase.Continuous}},
 input_scitype =
     ScientificTypesBase.Table{<:AbstractVector{<:ScientificTypesBase.Continuous}},
 target_scitype = ScientificTypesBase.Unknown,
 output_scitype =
     ScientificTypesBase.Table{<:AbstractVector{<:ScientificTypesBase.Continuous}})

Some models are in multiple packages:

info("RidgeRegressor")

ArgumentError: Ambiguous model type name. Use pkg=... .
The model RidgeRegressor is provided by these packages:
 ["MLJScikitLearnInterface", "MLJLinearModels", "MultivariateStats"].

Stacktrace:
 [1] handle(name::String; pkg::Nothing, interactive::Bool)
   @ MLJModels ~/.julia/packages/MLJModels/nxeCf/src/model_search.jl:111
 [2] handle
   @ ~/.julia/packages/MLJModels/nxeCf/src/model_search.jl:97 [inlined]
 [3] info(name::String; kwargs::@Kwargs{})
   @ MLJModels ~/.julia/packages/MLJModels/nxeCf/src/model_search.jl:154
 [4] top-level scope
   @ /work/julia-ml/Julia_ML_training/Julia_ML_training/unit5/unit_5.qmd:284

So we specify the package:

info("RidgeRegressor", pkg="MultivariateStats")

(name = "RidgeRegressor",
 package_name = "MultivariateStats",
 is_supervised = true,
 abstract_type = MLJModelInterface.Deterministic,
 constructor = nothing,
 deep_properties = (),
 docstring = "```\nRidgeRegressor\n```\n\nA model type for construct...",
 fit_data_scitype =
     Tuple{ScientificTypesBase.Table{<:AbstractVector{<:ScientificTypesBase.Continuous}}, AbstractVector{ScientificTypesBase.Continuous}},
 human_name = "ridge regressor",
 hyperparameter_ranges = (nothing, nothing),
 hyperparameter_types = ("Union{Real, AbstractVecOrMat}", "Bool"),
 hyperparameters = (:lambda, :bias),
 implemented_methods = [:clean!, :fit, :fitted_params, :predict],
 inverse_transform_scitype = ScientificTypesBase.Unknown,
 is_pure_julia = true,
 is_wrapper = false,
 iteration_parameter = nothing,
 load_path = "MLJMultivariateStatsInterface.RidgeRegressor",
 package_license = "MIT",
 package_url = "https://github.com/JuliaStats/MultivariateStats.jl",
 package_uuid = "6f286f6a-111f-5878-ab1e-185364afe411",
 predict_scitype = AbstractVector{ScientificTypesBase.Continuous},
 prediction_type = :deterministic,
 reporting_operations = (),
 reports_feature_importances = false,
 supports_class_weights = false,
 supports_online = false,
 supports_training_losses = false,
 supports_weights = false,
 target_in_fit = true,
 transform_scitype = ScientificTypesBase.Unknown,
 input_scitype =
     ScientificTypesBase.Table{<:AbstractVector{<:ScientificTypesBase.Continuous}},
 target_scitype = AbstractVector{ScientificTypesBase.Continuous},
 output_scitype = ScientificTypesBase.Unknown)

We can retrieve the model document string for the classifier, without loading model code:

doc("DecisionTreeClassifier", pkg="DecisionTree") # try this yourself

List metadata of every registered model:

models()

234-element Vector{NamedTuple{(:name, :package_name, :is_supervised, :abstract_type, :constructor, :deep_properties, :docstring, :fit_data_scitype, :human_name, :hyperparameter_ranges, :hyperparameter_types, :hyperparameters, :implemented_methods, :inverse_transform_scitype, :is_pure_julia, :is_wrapper, :iteration_parameter, :load_path, :package_license, :package_url, :package_uuid, :predict_scitype, :prediction_type, :reporting_operations, :reports_feature_importances, :supports_class_weights, :supports_online, :supports_training_losses, :supports_weights, :target_in_fit, :transform_scitype, :input_scitype, :target_scitype, :output_scitype)}}:
 (name = ABODDetector, package_name = OutlierDetectionNeighbors, ... )
 (name = ABODDetector, package_name = OutlierDetectionPython, ... )
 (name = ARDRegressor, package_name = MLJScikitLearnInterface, ... )
 (name = AdaBoostClassifier, package_name = MLJScikitLearnInterface, ... )
 (name = AdaBoostRegressor, package_name = MLJScikitLearnInterface, ... )
 (name = AdaBoostStumpClassifier, package_name = DecisionTree, ... )
 (name = AffinityPropagation, package_name = Clustering, ... )
 (name = AffinityPropagation, package_name = MLJScikitLearnInterface, ... )
 (name = AgglomerativeClustering, package_name = MLJScikitLearnInterface, ... )
 (name = AutoEncoder, package_name = BetaML, ... )
 ⋮
 (name = TomekUndersampler, package_name = Imbalance, ... )
 (name = UnivariateBoxCoxTransformer, package_name = MLJModels, ... )
 (name = UnivariateDiscretizer, package_name = MLJModels, ... )
 (name = UnivariateFillImputer, package_name = MLJModels, ... )
 (name = UnivariateStandardizer, package_name = MLJModels, ... )
 (name = UnivariateTimeTypeToContinuous, package_name = MLJModels, ... )
 (name = XGBoostClassifier, package_name = XGBoost, ... )
 (name = XGBoostCount, package_name = XGBoost, ... )
 (name = XGBoostRegressor, package_name = XGBoost, ... )

lists models with a specific phrase in the model or package name:

models("tree")

44-element Vector{NamedTuple{(:name, :package_name, :is_supervised, :abstract_type, :constructor, :deep_properties, :docstring, :fit_data_scitype, :human_name, :hyperparameter_ranges, :hyperparameter_types, :hyperparameters, :implemented_methods, :inverse_transform_scitype, :is_pure_julia, :is_wrapper, :iteration_parameter, :load_path, :package_license, :package_url, :package_uuid, :predict_scitype, :prediction_type, :reporting_operations, :reports_feature_importances, :supports_class_weights, :supports_online, :supports_training_losses, :supports_weights, :target_in_fit, :transform_scitype, :input_scitype, :target_scitype, :output_scitype)}}:
 (name = ABODDetector, package_name = OutlierDetectionNeighbors, ... )
 (name = BaggingClassifier, package_name = MLJScikitLearnInterface, ... )
 (name = BaggingRegressor, package_name = MLJScikitLearnInterface, ... )
 (name = COFDetector, package_name = OutlierDetectionNeighbors, ... )
 (name = DBSCAN, package_name = Clustering, ... )
 (name = DNNDetector, package_name = OutlierDetectionNeighbors, ... )
 (name = DecisionTreeClassifier, package_name = BetaML, ... )
 (name = DecisionTreeClassifier, package_name = DecisionTree, ... )
 (name = DecisionTreeRegressor, package_name = BetaML, ... )
 (name = DecisionTreeRegressor, package_name = DecisionTree, ... )
 ⋮
 (name = RandomForestRegressor, package_name = BetaML, ... )
 (name = RandomForestRegressor, package_name = DecisionTree, ... )
 (name = RandomForestRegressor, package_name = MLJScikitLearnInterface, ... )
 (name = SMOTENC, package_name = Imbalance, ... )
 (name = SRRegressor, package_name = SymbolicRegression, ... )
 (name = StableForestClassifier, package_name = SIRUS, ... )
 (name = StableForestRegressor, package_name = SIRUS, ... )
 (name = StableRulesClassifier, package_name = SIRUS, ... )
 (name = StableRulesRegressor, package_name = SIRUS, ... )

An example of ingesting data:

using RDatasets
channing = dataset("boot", "channing")
y, X = unpack(channing, ==(:Exit); rng=123)

(Int32[1000, 971, 953, 1093, 996, 1019, 932, 895, 1041, 1045  …  1006, 1028, 1003, 948, 1023, 926, 848, 1063, 989, 906], 462×4 DataFrame
 Row │ Sex     Entry  Time   Cens  
     │ Cat…    Int32  Int32  Int32 
─────┼─────────────────────────────
   1 │ Female    863    137      0
   2 │ Female    922     49      0
   3 │ Male      953      0      0
   4 │ Female   1003     90      1
   5 │ Male      893    103      0
   6 │ Female   1007     12      1
   7 │ Female    834     98      0
   8 │ Male      866     29      0
  ⋮  │   ⋮       ⋮      ⋮      ⋮
 456 │ Female    935     13      0
 457 │ Female    934     89      0
 458 │ Female    895     31      0
 459 │ Female    829     19      0
 460 │ Female   1024     39      1
 461 │ Female    852    137      0
 462 │ Male      898      8      0
                   447 rows omitted)

train, valid, test = partition(eachindex(y), 0.7, 0.2, rng=1234) # for 70:20:10 ratio

([287, 249, 295, 224, 454, 190, 368, 140, 177, 118  …  27, 191, 13, 317, 41, 263, 112, 84, 200, 461], [103, 88, 104, 274, 366, 223, 17, 426, 259, 286  …  22, 227, 85, 169, 121, 115, 149, 160, 316, 293], [283, 180, 347, 361, 11, 37, 289, 193, 337, 18  …  395, 229, 330, 142, 7, 211, 86, 120, 131, 290])

Machine construction (supervised):

using NearestNeighborModels
X, y = make_regression(1_000, 5) # synthetic data for regression
model = KNNRegressor(K=1)
mach_supervised = machine(model, X, y)

untrained Machine; caches model-specific representations of data
  model: KNNRegressor(K = 1, …)
  args: 
    1:  Source @767 ⏎ ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}}
    2:  Source @286 ⏎ AbstractVector{ScientificTypesBase.Continuous}

Machine construction (unsupervised):

model = OneHotEncoder()
mach_unsupervised = machine(model, X)

untrained Machine; caches model-specific representations of data
  model: OneHotEncoder(features = Symbol[], …)
  args: 
    1:  Source @966 ⏎ ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}}

Fitting a machine (learning)

fit!(mach_supervised, rows=1:100, verbosity=2, force=false)

[ Info: Training machine(KNNRegressor(K = 1, …), …).

trained Machine; caches model-specific representations of data
  model: KNNRegressor(K = 1, …)
  args: 
    1:  Source @767 ⏎ ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}}
    2:  Source @286 ⏎ AbstractVector{ScientificTypesBase.Continuous}

fit!(mach_unsupervised, rows=1:100, verbosity=2, force=false)

[ Info: Training machine(OneHotEncoder(features = Symbol[], …), …).

trained Machine; caches model-specific representations of data
  model: OneHotEncoder(features = Symbol[], …)
  args: 
    1:  Source @966 ⏎ ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}}

Prediction

predict(mach_supervised, rows=1:100)

100-element Vector{Float64}:
  0.49475789263607484
 -0.3123867525332733
 -0.2435290476384374
  0.19082356787823562
  0.16718424929116968
  0.005267407872798613
 -0.27416937708675215
 -0.2372265499875274
  0.6692490419147414
  0.4434811989216849
  ⋮
 -0.32457772034400606
 -0.295700488173373
  0.1828006035389508
 -0.9794116996923148
  0.45411671428993605
  0.0541118513140998
  0.7160715077281082
  0.8652856458015692
  0.5530351960411644

4 Selected topics from Statistics

See the JuliaStats organization. You can also see Nazarathy and Klok (2021). Let’s touch on the following statistics packages:

4.1 GLM

Here is a basic GLM example:

using DataFrames, AlgebraOfGraphics, CairoMakie
using GLM: lm, coef, @formula

# Simulated dataset: Linear relationship
x = randn(100)
y = 0.7*x.^2 + 2x .+ 1 + 0.5*randn(100)
df = DataFrame(x = x, y = y)

# Fit a linear model
model = lm(@formula(y ~ x + x^2), df)
coefs = coef(model)

# Predicted line
xs = range(minimum(df.x), maximum(df.x), length=100)
ys = coefs[1] .+ coefs[2] .* xs + coefs[3] .* xs.^2

df_pred = DataFrame(x = xs, y = ys)

plt = data(df) * mapping(:x, :y) * visual(Scatter) +
      data(df_pred) * mapping(:x, :y) * visual(Lines)

draw(plt)

4.2 Hypothesis Tests

using CSV, Distributions, HypothesisTests

data1 = CSV.read("../data/machine1.csv", header=false, DataFrame)[:,1]
data2 = CSV.read("../data/machine2.csv", header=false, DataFrame)[:,1]
xBar1, s1, n1 = mean(data1), std(data1), length(data1)
xBar2, s2, n2 = mean(data2), std(data2), length(data2)
delta0 = 0

sP = sqrt( ( (n1-1)*s1^2 + (n2-1)*s2^2 ) / (n1 + n2 - 2) )
testStatistic = ( xBar1-xBar2 - delta0 ) / ( sP * sqrt( 1/n1 + 1/n2) )
pVal = 2*ccdf(TDist(n1+n2 -2), abs(testStatistic))

println("Manually calculated test statistic: ", testStatistic)
println("Manually calculated p-value: ", pVal, "\n")
println(EqualVarianceTTest(data1, data2, delta0))

Manually calculated test statistic: 4.5466542394674425
Manually calculated p-value: 5.9493058655043084e-5

Two sample t-test (equal variance)
----------------------------------
Population details:
    parameter of interest:   Mean difference
    value under h_0:         0
    point estimate:          2.00881
    95% confidence interval: (1.113, 2.905)

Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           <1e-04

Details:
    number of observations:   [20,18]
    t-statistic:              4.5466542394674425
    degrees of freedom:       36
    empirical standard error: 0.44182145832893077

pvalue(EqualVarianceTTest(data1, data2, delta0))

5.9493058655043084e-5

4.3 Mixed Models

using MixedModels, DataFrames, RDatasets

# Load sleepstudy dataset from lme4 (same as in R)
df = dataset("lme4", "sleepstudy")
first(df, 5)

# Fit a linear mixed model:
# Reaction ~ Days + (Days | Subject)
# Days: fixed effect, (Days | Subject): random slope/intercept by Subject
model = fit(MixedModel,
    @formula(Reaction ~ 1 + Days + (1 + Days | Subject)),
    df
)

println(model)

Linear mixed model fit by maximum likelihood
 Reaction ~ 1 + Days + (1 + Days | Subject)
   logLik   -2 logLik     AIC       AICc        BIC    
  -875.9697  1751.9393  1763.9393  1764.4249  1783.0971

Variance components:
            Column    Variance Std.Dev.   Corr.
Subject  (Intercept)  565.51067 23.78047
         Days          32.68212  5.71683 +0.08
Residual              654.94145 25.59182
 Number of obs: 180; levels of grouping factors: 18

  Fixed-effects parameters:
──────────────────────────────────────────────────
                Coef.  Std. Error      z  Pr(>|z|)
──────────────────────────────────────────────────
(Intercept)  251.405      6.63226  37.91    <1e-99
Days          10.4673     1.50224   6.97    <1e-11
──────────────────────────────────────────────────

5 Selected topics from Optimization

Julia is a neat language for optimization. See for example Kochenderfer and Wheeler (2019).

The JuliaNLSolvers organization provides a few key packages where we’ll focus on Optim.jl which is “Univariate and multivariate optimization in Julia”. This is unconstrained continuous optimization.

The Jump.jl package is from a slightly different world of constrained (operations research style) optimization. See Lubin et al. (2023).

5.1 Some Optim.jl examples

Here is a Rosenbrock function.

\[ f(x, y) = (1 - x)^2 + 5 \, (y - x^2)^2 \]

Observe that \(f(x,y) \ge 0\) and \(f(1,1) = 0\). Hence a minimizer is \((x,y) = (1,1)\).

Here is the function in Julia:

rosenbrock(x, y) =  (1 - x)^2 + 5(y - x^2)^2

rosenbrock (generic function with 1 method)

Let’s first minimize via the gradient-free Nelder–Mead method:

using Optim

# let's make another method
rosenbrock(x::AbstractVector) = rosenbrock(x...)

result = optimize(rosenbrock, [10.0, 10.0], NelderMead())

println("Gradient-free result:")
println(" minimizer: ", Optim.minimizer(result))
println(" minimum value: ", Optim.minimum(result))

Gradient-free result:
 minimizer: [0.9999603237941868, 0.9999139955858123]
 minimum value: 1.7955517264105925e-9

Now let’s use BFGS which requires gradient information. First what is the gradient? As an illustration let’s use Symbolics.jl

using Symbolics

@variables x1 x2
rosenbrock_expr = (1 - x1)^2 + 10 * (x2 - x1^2)^2
rosenbrock_gradient_expr = Symbolics.gradient(rosenbrock_expr, [x1, x2])

2-element Vector{Num}:
 -2(1 - x1) - 40x1*(x2 - (x1^2))
                 20(x2 - (x1^2))

To see it nicely use Latexify.jl:

using Latexify
latexify(rosenbrock_gradient_expr)

\[\begin{equation} \left[ \begin{array}{c} - 2 \left( 1 - \mathtt{x1} \right) - 40 \mathtt{x1} \left( \mathtt{x2} - \mathtt{x1}^{2} \right) \\ 20 \left( \mathtt{x2} - \mathtt{x1}^{2} \right) \\ \end{array} \right] \end{equation}\]

Here is a Julia function of the gradient using build_function:

rosenbrock_gradient = eval(build_function(rosenbrock_gradient_expr, x1, x2)[1])
methods(rosenbrock_gradient)

# 1 method for anonymous function #8:

(::Main.Notebook.var"#8#9")(x1, x2) in Main.Notebook

@show rosenbrock_gradient(1, 1) # Needs to be 0 at a minimum
@show rosenbrock_gradient(1.01, 1.01); # Needs to prbably not be 0

rosenbrock_gradient(1, 1) = [0, 0]
rosenbrock_gradient(1.01, 1.01) = [0.4280399999999999, -0.20199999999999996]

function rosenbrock_gradient_vec!(storage, x)
    gx1, gx2 = rosenbrock_gradient(x...)
    storage[1] = gx1
    storage[2] = gx2
end

result = optimize(rosenbrock, rosenbrock_gradient_vec!, [10.0, 10.0], BFGS())

println("Gradient descent minimizer:", Optim.minimizer(result))
println("Minimum value:", Optim.minimum(result))

Gradient descent minimizer:[1.0000000000002782, 1.000000000000422]
Minimum value:1.6763954906954623e-25

Note however that we don’t need to supply the gradient if we don’t have it. We can just let optimize use automatic differenatiation.

result = optimize(rosenbrock, [10.0, 10.0], BFGS()) # no gradient given so uses autodiff

println("Gradient descent minimizer:", Optim.minimizer(result))
println("Minimum value:", Optim.minimum(result))

Gradient descent minimizer:[0.9999999998087856, 0.9999999997482931]
Minimum value:1.2200398486028414e-19

Note that in all these cases, we can also inspect the result of optimize:

result

 * Status: success

 * Candidate solution
    Final objective value:     1.220040e-19

 * Found with
    Algorithm:     BFGS

 * Convergence measures
    |x - x'|               = 3.80e-07 ≰ 0.0e+00
    |x - x'|/|x'|          = 3.80e-07 ≰ 0.0e+00
    |f(x) - f(x')|         = 6.53e-14 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 5.35e+05 ≰ 0.0e+00
    |g(x)|                 = 2.26e-09 ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    33
    f(x) calls:    95
    ∇f(x) calls:   95

Note also that Optimization.jl wraps Optim.jl and other packages. See the Optimization.jl docs.

5.2 A very simple example with JuMP

Let’s consider this linear programming problem:

\[ \begin{align*} \text{Maximize} \quad & x + 2y \\ \text{subject to} \quad & x + y \leq 5 \\ & x \geq 0 \\ & y \geq 0 \end{align*} \]

Here is a manual illustration of this problem:

vertices_x, vertices_y = [0, 5, 0], [0, 0, 5]

fig = Figure(size = (600, 600))
ax = Axis(fig[1,1]; xlabel="x", ylabel="y", title="Feasible Region and Objective")
poly!(ax, vertices_x, vertices_y, color = (:dodgerblue, 0.3), strokecolor=:black, strokewidth=1, label="Feasible Region")
lines!(ax, [0,5], [5,0], color=:black, linewidth=2, label="x + y = 5")
lines!(ax, [0,5.5],[0,0], color=:red, linestyle=:dash, label="y = 0")
lines!(ax, [0,0],[0,5.5], color=:green, linestyle=:dash, label="x = 0")
for c in 0:2:10
    xs = 0:0.1:5
    ys_obj = (c .- xs)./2
    mask = (ys_obj .>= 0) .& (ys_obj .<= 5)
    lines!(ax, xs[mask], ys_obj[mask], color=:purple, linestyle=:dot)
end
arrows!(ax, [1.0], [2.0], [0.5], [1.0], color = :purple, linewidth=3, arrowsize=20)
text!(ax, "Objective↑", position = (1.6, 3), color=:purple)

axislegend(ax; position=:rt)
fig

Now here it is solved using the DSL (domain specific language) ofJuMP. An early blog post about DSLs is here.

using JuMP, GLPK

model = JuMP.Model(GLPK.Optimizer)

@variable model x ≥ 0
@variable model y ≥ 0
@constraint model x + y ≤ 5
@objective model  Max (x + 2y)

optimize!(model)
println("Optimal x = ", value(x))
println("Optimal y = ", value(y))
println("Optimal objective value = ", objective_value(model))

WARNING: using JuMP.@variables in module Notebook conflicts with an existing identifier.
Optimal x = 0.0
Optimal y = 5.0
Optimal objective value = 10.0

JuMP can do much more and interface with state of the art commerical mixed integer linear programming solves. There are many resources available for JuMP. See Materials for learning JuMP as a start.

6 Additional online resources

A Machine Learning Fundamentals tutorial by A PumasAI.
A Machine Learning Unit in a University of Queensland Course (stay on the semester of that link - and not “current semester” which doesn’t have that unit).

7 Exercises

Consider this dataframe, df:

using DataFrames, Random, Distributions

Random.seed!(123)  # for reproducibility

n = 100
μ = [5.0, 13.0, -2.0]  # Means for x, y, z
Σ = [4.0  3.0  1.0;    # 3×3 covariance matrix, making variables correlated
      3.0  9.0  2.0;
      1.0  2.0  6.0]
mvnorm = MvNormal(μ, Σ)
data = rand(mvnorm, n)
df = DataFrame(x = data[1, :], y = data[2, :], z = data[3, :])

Use AlgebraOfGraphics.jl to plot a scatter plot of y versus x, with the color of the points representing the value of z.

Consider the example Flux.jl code on the Flux.jl GitHub README. Try to run that code (also replacing model with the commented out model). Do the plotting with Makie instead of Plots.
Now consider the Flux quickstart page. Try and reproduce that code, disabling CUDA if not relevant (if you don’t have a CUDA GPU).
Move onto Lux.jl. Consider the “Native Julia & Zygote” example on the Lux.jl GitHub README page. Run that example.
Now visit the examples directory of Lux.jl. Find one or two examples and try to reproduce them.
Consider Lab 8 - Tree-based models from DataScienceTutorials.jl. Try to reproduce Lab 8. As you can see there are many other tutorials/labs, so if you prefer a different one, carry out that one.
Consider the Incremental Training with MLJFlux tutorial. Execute the code in this tutorial.
Consider the Statistics with Julia book code example dealing with linear regression. Reproduce this code and study the output by inspecting model.
Go to the documentation for HypothesisTests.jl. Choose some test that you either know or interests you, and try to run the code on some example data you generate.
Reproduce the code from the MixedModels.jl quickstart.
Study the tips and tricks of Optim.jl. Try to reproduce some of the code.
Study the Getting started with JuMP page and try to reproduce the code there.

References

Kochenderfer, Mykel J, and Tim A Wheeler. 2019. Algorithms for Optimization. Mit Press. https://mitpress.mit.edu/9780262039420/algorithms-for-optimization/.

Liquet, Benoit, Sarat Moka, and Yoni Nazarathy. 2024. Mathematical Engineering of Deep Learning. CRC Press. https://deeplearningmath.org/.

Lubin, Miles, Oscar Dowson, Joaquim Dias Garcia, Joey Huchette, Benoît Legat, and Juan Pablo Vielma. 2023. “JuMP 1.0: Recent Improvements to a Modeling Language for Mathematical Optimization.” Mathematical Programming Computation 15: 581–89. https://doi.org/10.1007/s12532-023-00239-3.

Nazarathy, Yoni, and Hayden Klok. 2021. Statistics with Julia: Fundamentals for Data Science, Machine Learning and Artificial Intelligence. Springer Nature. https://statisticswithjulia.org/.

A Crash Course on the Julia Language and Ecosystem

1 Makie and AlgebraOfGraphics

1.1 AlgebraOfGraphics

1.2 General Makie

2 Deep Learning (and a bit of SciML)

2.1 Simple Linear Model Example

2.2 Flux example

2.3 Lux Example

2.4 DiffEqFlux Example

3 General Machine Learning

See the MLJ version:

Retrieves registry metadata for a specific model:

Some models are in multiple packages:

So we specify the package:

We can retrieve the model document string for the classifier, without loading model code:

List metadata of every registered model:

lists models with a specific phrase in the model or package name:

An example of ingesting data:

Machine construction (supervised):

Machine construction (unsupervised):

Fitting a machine (learning)

Prediction

4 Selected topics from Statistics

4.1 GLM

4.2 Hypothesis Tests

4.3 Mixed Models

5 Selected topics from Optimization

5.1 Some Optim.jl examples

5.2 A very simple example with JuMP

6 Additional online resources

7 Exercises

References