Accessing eReefs data from the AIMS server

Basic access with OPeNDAP

Learn the basics of extracting eReefs data from the AIMS server with OPeNDAP in .

In this tutorial we will look at how to access eReefs data directly from the AIMS THREDDS server in R.

This server hosts aggregated eReefs model data in netCDF file format and offers access to the data files via OPeNDAP, HTTP Server, and Web Map Service (WMS). While we could download the data files manually via the HTTPServer link, this approach is cumbersome when downloading multiple files, given their large size. Thankfully, OPeNDAP provides a way to access the data files over the internet and extract only the data we want.

For example, say we want the daily mean surface temperature at a single location for the last 30 days. If we were to download the 30 individual daily aggregated netCDF files, with each file ~ 350 Mb, this would require us to download over 10 Gb of data just to get 300 numbers. The vast majority of this data would be irrelevant to our needs as the netCDF files contain data for a range of variables, at a range of depths, for many, many locations. However, with OPeNDAP, we can extract the daily mean values directly from the server without downloading any unneeded data.

Motivating problem

We will extract the daily mean water temperature for the 10th of December 2022 at 1.5 m depth across the entire scope of the eReefs model. We will then save and plot this data. This example will introduce the basics of how to connect to files on the server and extract the data we want.

R packages

library(RNetCDF) # working with netcdf files (incl. via OPeNDAP)
library(raster) # creating and manipuling rasters
library(rgdal) # package for geospatial analysis

While the ncdf4 package is commonly used to work with NetCDF files in R, it does not offer compatibility with OPeNDAP for Windows (only Mac and Linux). For this reason we will use the RNetCDF package which offers similar functionality and Windows compatibility with OPeNDAP. Note that if you are using Mac or Linux and wish to use ncdf4, the functions used herein have obvious analogues; for example ncdf4::nc_open() vs. RNetCDF::open.nc().

Connect to a file on the server

First we need to find the right NetCDF file on the server. The available eReefs data NetCDF files are listed in the AIMS THREDDS Server catalogue. We will navigate to the eReefs 4 km Hydrodynamic Model daily aggregated data for the month of December 2022 and copy the OPeNDAP data URL.

input_file <- "https://thredds.ereefs.aims.gov.au/thredds/dodsC/ereefs/gbr4_v2/daily-monthly/EREEFS_AIMS-CSIRO_gbr4_v2_hydro_daily-monthly-2022-12.nc"

We can then open a connection to this file using the RNetCDF::open.nc function.

dailyAggDec22.nc <- open.nc(input_file)

If you wish to download NetCDF files from the server you can click the HTTPServer link instead of OPeNDAP. The file can then be loaded into R by specifying the path: open.nc("<path to downloaded file>").

Extract data

Now that we have an open connection to a file on the server we need to extract the daily mean temperature at 1.5m depth for the 10th of December.

From the summary output above we can see that the variable corresponding to temperature is: \(\texttt{ temp(longitude, latitude, k, time)}\).

The dimensions for temperature are in brackets. This means that there is a temperature value for every combination of longitude, latitude, depth (k) and time. We can now see why these NetCDF files are so large.

To extract data from the file we will use the function

RNetCDF::var.get.nc(ncfile, variable, start=NA, count=NA, ...)

We need to give the function:

  • ncfile: a NetCDF file connection; in our case dailyAggDec22.nc.
  • variable: the name or id of the data variable we wish to extract; in our case "temp".
  • start: a vector of indices of where to start getting data, one for each dimension of the variable. Since we have \(\texttt{temp(longitude, latitude, k, time)}\) we need to tell the function where to start getting data along each of the four dimensions.
  • count: similar to start, but specifying the number of temperature values to extract along each dimension.

Let’s look at how to construct our start and count vectors.

The default values of start and count are NA, in which case all data for the given variable will be extracted.

Depth: Starting with depth is easy because we have a constant value of interest (1.5 m). The index k corresponds to different depths as shown in the table below, where we see that for the 4km models k=16 maps to a depth of 1.5 m.

Table of eReefs depths corresponding to index k
Index k (R) Index k (Python) Hydrodynamic 1km model Hydrodynamic & BioGeoChemical 4km models
1 0 -140.00 -145.00
2 1 -120.00 -120.00
3 2 -103.00 -103.00
4 3 -88.00 -88.00
5 4 -73.00 -73.00
6 5 -60.00 -60.00
7 6 -49.00 -49.00
8 7 -39.50 -39.50
9 8 -31.00 -31.00
10 9 -24.00 -23.75
11 10 -18.00 -17.75
12 11 -13.00 -12.75
13 12 -9.00 -8.80
14 13 -5.25 -5.55
15 14 -2.35 -3.00
16 15 -0.50 -1.50
17 16 NA -0.50

Time: Since we have the daily aggregated data for December 2022, and are interested only in a single day (the 10th), time is also a constant value. From the summary output we can see we have 31 time indexs, these correspond to the day of the month, therefore we want time=10.

Longitude and latitude: We want temperatures for every available longitude and latitude so we can plot the data across the entire spatial range of the eReefs model. Therefore we want to start at index 1 and count for the entire length of latitude and longitude. To get the lengths we could note the values from the summary output, where we see \(\texttt{longitude = 491}\) and \(\texttt{latitude = 723}\). However we could also get the lengths programatically.

lon <- var.get.nc(dailyAggDec22.nc, "longitude")
lat <- var.get.nc(dailyAggDec22.nc, "latitude")
data.frame(length(lon), length(lat))
  length.lon. length.lat.
1         491         723

Within the eReefs netCDF files, the dimensions \(\texttt{longitude, latitude, k, time}\) have corresponding variables longitude, latitude, zc, time (see summary output). Note that we would extract the dimension \(\texttt{k}\) variable with var.get.nc(..., variable = "zc").

Now we are ready to construct our start and count vectors and extract the data.

# SETUP START AND COUNT VECTORS
# Recall the order of the dimensions: (lon, lat, k , time)
# We start at lon=1, lat=1, k=16, time=10 and get temps for
# every lon and lat while holding depth and time constant
lon_st <- 1
lat_st <- 1
depth_st <- 16  # index k = 16 --> depth = 1.5 m
time_st <- 10   # index time = 10 --> 10th day of month

lon_ct <- length(lon) # get temps for all lons and lats
lat_ct <- length(lat)
time_ct <- 1  # Hold time and depth constant
depth_ct <- 1

start_vector <- c(lon_st, lat_st, depth_st, time_st)  
count_vector <- c(lon_ct, lat_ct, time_ct, depth_ct)

# EXTRACT DATA
temps_10Dec22_1p5m <- var.get.nc(
  ncfile = dailyAggDec22.nc, 
  variable = "temp",
  start = start_vector, 
  count = count_vector
)

# Get the size of our extracted data
dims <- dim(temps_10Dec22_1p5m)
data.frame(nrows = dims[1], ncols = dims[2])
  nrows ncols
1   491   723

Close file connection

Now that our extracted data is stored in memory, we should close the connection to the NetCDF file on the server.

close.nc(dailyAggDec22.nc)

Save the data

Now that we have our extracted data we may wish to save it for future use. To do this we first convert the data from its current form as a matrix array into a raster, and then save the raster as a netCDF file (GeoTIFF and other formats are also possible).

When converting extracted eReefs data to rasters we need to apply the transpose t() and flip() functions in order to get the correct orientation.

temps_raster <- temps_10Dec22_1p5m |>
  t() |>   # transpose temps matrix
  raster(  # create raster
    xmn = min(lon), xmx = max(lon), 
    ymn = min(lat), ymx = max(lat), 
    crs = CRS("+init=epsg:4326")
  ) |>
  flip(direction = 'y') # flip the raster
Warning in CPL_crs_from_input(x): GDAL Message 1: +init=epsg:XXXX syntax is
deprecated. It might return a CRS with a non-EPSG compliant axis order.
temps_raster
class      : RasterLayer 
dimensions : 723, 491, 354993  (nrow, ncol, ncell)
resolution : 0.0299389, 0.02995851  (x, y)
extent     : 142.1688, 156.8688, -28.69602, -7.036022  (xmin, xmax, ymin, ymax)
crs        : +proj=longlat +datum=WGS84 +no_defs 
source     : memory
names      : layer 
values     : 22.08588, 32.48817  (min, max)

In the code chunk above we used pipes |>. Pipes are very useful when passing a dataset through a sequence of functions. In the code above we take our extracted temps, transpose them, turn them into a raster, and then flip the raster. The final result is saved to temps_raster.

Now we have our raster (in the correct orientation), saving it is easy.

# Save the raster file in the /data subdirectory
# Note the '.nc' file extension since we are going to save as netCDF
save_file <- "data/ereefsDailyMeanWaterTemperature_10Dec2022_Depth1p5m.nc"

writeRaster(
  x = temps_raster, # what to save
  filename = save_file, # where to save it
  format = "CDF", # what format to save it as
  overwrite = TRUE # whether to replace any existing file with the same name
)

The raster package uses the ncdf4 package to deal with netCDF files. You can install this package by running install.packages("ncdf4") from an R console.

To prove to ourselves that this worked, lets read the file back in and plot it.

# Read back in the saved netCDF file as a raster
saved_temps <- raster(save_file)

# Plot the data
plot(saved_temps)

Figure 1: Extracted eReefs daily aggregated mean temperature at 1.5m depth for 10 December 2022.

Hooray! We can now see our saved data plotted in Figure 1.