I recently had the opportunity to give a seminar in the Oxford ML and Physics seminar series. I talked about probabilistic reasoning, and why it is not just about uncertainty but can be a superpower that enables us to tackle complex problems without training data.

Do give it a watch! I would also highly recommend browsing the Oxford ML and Physics Seminars YouTube channel, which is a treasure trove of fascinating talks on a huge variety of topics, from generative modelling of quantum states to seasonal sea ice forecasting.

The full abstract for my talk:

In both atmospheric…

In Part I of this series, we discussed the mathematics underpinning Gaussian process (GP) and Gaussian random field (GRF) models, how they can incorporate observations of either points or spatial averages, and how these two kinds of observation have differing effects on the model output.

As we’ll see, things get even more interesting when we combine both types of observation.

Why might we want to do this? Let’s look at a motivating situation. Suppose you have an autonomous vehicle recording a stream of observations relating to the rainfall at its current location. These tell you a lot about the situation…
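To see how both kinds of observation can sit in one model, here is a minimal NumPy sketch. It assumes a 1D field discretised on a fine grid, which is a simplification of ours and not necessarily how Part I formulates things. Every observation is written as a linear functional of the field: a point observation picks out one grid value, while a spatial average weights the grid values in an interval equally. Both then enter the same linear-Gaussian update. The grid, squared-exponential kernel, noise level, and observation values are all hypothetical.

```python
import numpy as np

# Hypothetical 1D domain, discretised finely enough that sums approximate integrals.
grid = np.linspace(0.0, 10.0, 201)

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1D inputs."""
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)

def point_row(x):
    """Observation-operator row selecting f at the grid point nearest x."""
    row = np.zeros(len(grid))
    row[np.argmin(np.abs(grid - x))] = 1.0
    return row

def average_row(a, b):
    """Observation-operator row averaging f over the grid points in [a, b]."""
    mask = (grid >= a) & (grid <= b)
    return mask / mask.sum()

def condition(H, y, noise_var=1e-4, lengthscale=1.0):
    """Posterior mean and variance of f on the grid, given y = H f + Gaussian noise."""
    K = rbf_kernel(grid, grid, lengthscale)
    S = H @ K @ H.T + noise_var * np.eye(len(y))  # covariance of the observations
    gain = np.linalg.solve(S, H @ K).T             # equals K H^T S^{-1}, since K and S are symmetric
    mean = gain @ y
    cov = K - gain @ H @ K
    return mean, np.diag(cov)

# One point observation at x = 2 and one spatial average over [5, 8].
H = np.vstack([point_row(2.0), average_row(5.0, 8.0)])
y = np.array([1.0, 0.5])
mean, var = condition(H, y)
```

With these values, the point observation pins the posterior tightly near x = 2, whereas the spatial average constrains only the mean over [5, 8], so individual values inside that interval remain noticeably uncertain.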

Gaussian processes (GPs) are a class of probabilistic models used widely in machine learning and statistics. They appear in many different contexts, including model emulation, spatial interpolation, Bayesian optimisation, and generative modelling. In a spatial context, they are often referred to as Gaussian random fields.

Most of the time, as with any other ML method, the input data used to train¹ these models consists of a bunch of isolated examples. Given the function values at these input points, the challenge is then to predict the values at unobserved locations. So far, so familiar.

But these models are more flexible…
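As a concrete illustration of the familiar setting, where a GP is conditioned on function values at isolated input points, here is a minimal NumPy sketch of GP regression. It assumes a zero-mean prior with a squared-exponential kernel; the kernel choice, lengthscale, noise level, and toy `sin` data are assumptions for illustration, not taken from the post.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1D inputs."""
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, noise=1e-6, **kernel_args):
    """Posterior mean and variance of a zero-mean GP at x_test."""
    K = rbf_kernel(x_train, x_train, **kernel_args) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, **kernel_args)
    K_ss = rbf_kernel(x_test, x_test, **kernel_args)
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)     # prior variance minus explained part
    return mean, var

# Toy training data: function values at three isolated inputs.
x_train = np.array([-2.0, 0.0, 1.5])
y_train = np.sin(x_train)
x_test = np.linspace(-3, 3, 7)
mean, var = gp_posterior(x_train, y_train, x_test, lengthscale=1.0)
```

The posterior variance shrinks to nearly zero at the training inputs and grows back towards the prior variance away from them, which is the quantified uncertainty that distinguishes a GP prediction from a single best guess.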

Imagine you are a scientist, studying a particular kind of event: perhaps a volcanic eruption or a lightning strike. You have some data in the form of a time series, which you are confident you can use to detect the event you care about. You can even describe with some confidence the characteristic signature by which the event will reveal itself in the data. Problem solved… right?

Unfortunately, real time series are often noisy and difficult to interpret. Even when we know what to look for, separating the signal from the noise can be a challenge. …
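One classical baseline for this problem, and not necessarily the approach the full post develops, is a matched filter: when the signature is known and the noise is additive, white, and Gaussian, correlating the series with the signature template is the linear filter that maximises signal-to-noise ratio. The sketch below uses a hypothetical template (a Gaussian-windowed oscillation), synthetic unit-variance noise, and an arbitrary amplitude and offset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "characteristic signature": a Gaussian-windowed oscillation, 50 samples long.
t = np.arange(50)
template = np.exp(-0.5 * ((t - 25) / 6.0) ** 2) * np.sin(2 * np.pi * t / 10)

# Synthetic series: unit-variance white noise with the signature added at a known offset.
series = rng.normal(scale=1.0, size=1000)
true_start = 600
series[true_start:true_start + len(template)] += 3.0 * template

def matched_filter(x, h):
    """Correlate x with the zero-mean, unit-norm template h; one score per start index.

    With unit-variance white noise, a noise-only score is standard normal,
    so each score reads directly as a signal-to-noise statistic.
    """
    h = h - h.mean()
    h = h / np.linalg.norm(h)
    return np.correlate(x, h, mode="valid")

scores = matched_filter(series, template)
detected_start = int(np.argmax(scores))
print(detected_start)
```

Because an oscillatory template has correlation sidelobes one period away, at low signal-to-noise ratios the argmax can lock onto a neighbouring peak; with coloured rather than white noise, the series would also need whitening first.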

Work in the Informatics Lab often involves very large multidimensional datasets. The Pangeo ecosystem has great tools for working with this kind of data (such as xarray and Iris); however, getting to the point where you can use these tools can be a painful process.

One solution to this problem is a library called Zarr, which is great at providing clean and intuitive cloud-native data access. However, not all the datasets we work with are stored as Zarr. A lot of the datasets we use at the Met Office are stored as NetCDF files. Converting these datasets…