Research + Projects

This page presents a brief summary of some research and projects I have done to date, starting with newer work at the top.

Causal Inference - Machine learning as a tool for evidence-based policy

[Figure: Simple causal diagram]

Machine learning (ML) can be a useful tool for observational causal inference studies, one of the cornerstones of evidence-based policy. ML can help us capture complex relationships in the data, thereby mitigating bias from model mis-specification. Regularisation in machine learning can also lead to causal estimates with less error than unbiased methods when the data contain many related confounding factors. I helped to write a blog post on this subject, and at Gradient Institute we have used machine learning for observational studies such as linking youth well-being to academic success. Reporting non-linear causal effects requires new methodology, for which we developed software that can be found here.
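
As a rough illustration of the first point, here is a minimal sketch of using a flexible ML outcome model to estimate an average treatment effect by G-computation, on simulated data with scikit-learn. It is a generic example of the technique, not the software linked above.

```python
# Sketch: estimating an average treatment effect (ATE) with a flexible ML
# outcome model (G-computation / standardisation). Illustrative only: the
# data here are simulated, and this is not the software linked above.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 3))                                   # confounders
t = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))               # treatment depends on confounders
y = 2.0 * t + np.sin(x[:, 0]) + x[:, 1] ** 2 + rng.normal(scale=0.5, size=n)

# Fit a flexible outcome model E[Y | T, X]; its non-linearity helps reduce
# bias from mis-specifying the confounder-outcome relationship.
model = GradientBoostingRegressor().fit(np.column_stack([t, x]), y)

# Standardise: predict everyone under T=1 and T=0, then average the difference.
y1 = model.predict(np.column_stack([np.ones(n), x]))
y0 = model.predict(np.column_stack([np.zeros(n), x]))
print("Estimated ATE:", (y1 - y0).mean())                     # true effect is 2.0
```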

Algorithmic Fairness - Fair Regression Algorithms

[Figure: Unfairness toy example]

Algorithmic fairness involves expressing notions such as equity, equality, or reasonable treatment as quantifiable measures that a machine learning algorithm can optimise. Mathematising these concepts so they can be inferred from data is challenging, as is deciding on the balance between fairness and other objectives, such as accuracy, in a particular application. My research in this area, along with that of others at the Gradient Institute, has thus far focused on regression algorithms. For many popular fairness criteria, measuring the fairness of a regression algorithm is harder than it is for a classifier. Similarly, adjusting the predictions of a regressor is more complex than doing so for a classifier, so our research has targeted these areas. Here you can read more about measuring, and about adjusting, regression algorithms.
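
To make the measurement problem concrete, here is a toy sketch of one simple way to quantify a demographic-parity-style gap between a regressor's predictions for two groups. The metric and data are illustrative stand-ins, not the specific criteria developed in the linked work.

```python
# Toy sketch: a demographic-parity-style gap for a regressor, comparing the
# distribution of predictions between two groups. Illustrative only; the
# linked papers consider a broader family of regression fairness criteria.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
group = rng.binomial(1, 0.5, size=1000)                         # protected attribute
scores = rng.normal(loc=50 + 5 * group, scale=10, size=1000)    # model predictions

# Coarse check: difference in mean prediction between groups.
gap = scores[group == 1].mean() - scores[group == 0].mean()
print(f"Mean prediction gap between groups: {gap:.2f}")

# Finer-grained check: maximum difference between the groups' empirical CDFs.
print("KS distance between groups:",
      ks_2samp(scores[group == 1], scores[group == 0]).statistic)
```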

Landshark - Large-scale Spatial Inference with Tensorflow

[Figure: Predictive entropy]

Landshark is a set of Python command-line tools for supervised learning problems on large spatial raster datasets. It solves problems in which the user has a set of target point measurements, such as geochemistry, soil classification, or depth to basement, and wants to relate them to a number of raster covariates, like satellite imagery or geophysics, in order to predict the targets on the raster grid.

Landshark fills a particular niche: where we want to efficiently learn models with very large numbers of training points and/or very large covariate images using TensorFlow. Landshark is particularly useful for the case when the training data itself will not fit in memory, and must be streamed to a minibatch stochastic gradient descent algorithm for model learning.
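
The streaming pattern itself looks roughly like the sketch below, written against plain tf.data and Keras rather than Landshark's own commands: samples are yielded from a generator (standing in for reads from large rasters on disk), batched, and fed to minibatch SGD, so the full dataset never needs to be held in memory.

```python
# Sketch: streaming training data that does not fit in memory into minibatch
# SGD with tf.data. This illustrates the pattern, not Landshark's actual CLI/API.
import numpy as np
import tensorflow as tf

def sample_generator():
    """Stand-in for reading (covariate, target) pairs from large rasters on disk."""
    for _ in range(10_000):
        yield np.random.rand(16).astype("float32"), np.random.rand(1).astype("float32")

dataset = (
    tf.data.Dataset.from_generator(
        sample_generator,
        output_signature=(
            tf.TensorSpec(shape=(16,), dtype=tf.float32),
            tf.TensorSpec(shape=(1,), dtype=tf.float32),
        ),
    )
    .shuffle(1_000)          # shuffle within a bounded buffer, not the whole dataset
    .batch(256)
    .prefetch(tf.data.AUTOTUNE)
)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(dataset, epochs=1)
```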

Please see the Landshark project page for more information.

Aboleth - A TensorFlow Framework for Bayesian Deep Learning

[Figure: A Bayesian neural net]

I am one of the primary creators of Aboleth, a bare-bones TensorFlow framework for Bayesian deep learning and Gaussian process approximation.

The purpose of Aboleth is to provide a set of high-performance, lightweight components for building Bayesian neural nets and approximate (deep) Gaussian process computational graphs. We aim for minimal abstraction over pure TensorFlow, so you can still assign parts of the computational graph to different hardware, use your own data feeds/queues, manage your own sessions, and so on.
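
As a flavour of the kind of component involved, here is a compressed sketch of a variational dense layer in raw TensorFlow: the weights are Gaussian distributions sampled with the reparameterisation trick, and the layer adds a KL penalty to the training loss. The class and names here are illustrative only, not Aboleth's actual API.

```python
# Sketch of a variational (Bayesian) dense layer: Gaussian weight posteriors,
# reparameterised sampling, and a KL divergence penalty on the loss.
# Illustrative only; not Aboleth's actual API.
import tensorflow as tf

class VariationalDense(tf.keras.layers.Layer):
    """Dense layer with a factorised Gaussian posterior over its weights."""

    def __init__(self, units):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        d = int(input_shape[-1])
        self.w_loc = self.add_weight(name="w_loc", shape=(d, self.units),
                                     initializer="glorot_uniform")
        self.w_rho = self.add_weight(name="w_rho", shape=(d, self.units),
                                     initializer=tf.keras.initializers.Constant(-3.0))

    def call(self, x):
        scale = tf.nn.softplus(self.w_rho)                                 # positive std. dev.
        w = self.w_loc + scale * tf.random.normal(tf.shape(self.w_loc))    # reparameterised sample
        # Analytic KL( N(loc, scale) || N(0, 1) ), summed over all weights,
        # added to the training objective alongside the data-fit term.
        kl = 0.5 * tf.reduce_sum(tf.square(scale) + tf.square(self.w_loc)
                                 - 1.0 - 2.0 * tf.math.log(scale))
        self.add_loss(kl)
        return tf.matmul(x, w)

# Usage: stack like any Keras layer; Keras folds the KL penalties into the loss.
model = tf.keras.Sequential([VariationalDense(32), tf.keras.layers.ReLU(),
                             VariationalDense(1)])
```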

The project page is on github.

Revrand - Scalable Bayesian Generalised Linear Models

[Figures: Regression demo and GLM demo]

I am the project creator and primary contributor to revrand, a software library that implements Bayesian linear models (Bayesian linear regression) and generalised linear models. A few features of this library are:

  • A basis function/feature composition framework for combining basis functions, such as radial basis functions, sigmoidal basis functions and polynomial basis functions.
  • Basis functions that can be used to approximate Gaussian processes with shift-invariant covariance functions (e.g. the squared exponential) when used with linear models; a minimal sketch of this appears after the list.
  • Non-Gaussian likelihoods via Bayesian generalised linear models, using a modified version of the nonparametric variational inference algorithm, with large-scale learning driven by stochastic gradients (ADADELTA, Adam and others).
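
Here is the sketch promised above: a small numpy example of the second bullet, combining random Fourier features (which approximate a squared exponential kernel) with a conjugate Bayesian linear regression posterior. It illustrates the idea only; revrand's actual basis-composition API differs.

```python
# Sketch: approximate GP regression with a squared exponential kernel by
# (1) mapping inputs through random Fourier features and (2) doing conjugate
# Bayesian linear regression in that feature space. Not revrand's actual API.
import numpy as np

rng = np.random.default_rng(42)

# Toy 1-D data
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)

# Random Fourier features approximating an RBF kernel with lengthscale l
D, l = 200, 1.0
W = rng.normal(scale=1.0 / l, size=(X.shape[1], D))
b = rng.uniform(0, 2 * np.pi, size=D)

def phi(A):
    return np.sqrt(2.0 / D) * np.cos(A @ W + b)

# Conjugate Bayesian linear regression: prior w ~ N(0, 1/alpha I), noise var 1/beta
alpha, beta = 1.0, 1.0 / 0.1 ** 2
Phi = phi(X)
S_inv = alpha * np.eye(D) + beta * Phi.T @ Phi                  # posterior precision
m = beta * np.linalg.solve(S_inv, Phi.T @ y)                    # posterior mean

# Predictive mean and variance on a test grid
Xs = np.linspace(-3, 3, 200)[:, None]
Phis = phi(Xs)
mean = Phis @ m
var = 1.0 / beta + np.einsum("ij,ji->i", Phis, np.linalg.solve(S_inv, Phis.T))
```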

The project page is on github.

Extended and Unscented Kitchen Sinks

[Figure: EKS seismic inversion]

In this work we extended our Bayesian nonparametric algorithms for inverse problems, the unscented and extended Gaussian processes, to work with multiple outputs and over large datasets. The new algorithms are called the extended and unscented kitchen sinks (EKS and UKS), since they use the random kitchen sink (random basis function) approximation for scaling kernel machines. This approximation also lets the EKS and UKS handle multiple outputs in a straightforward way, making them useful for a wide variety of complex nonlinear inversion problems, such as geophysical inversions.
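
The multi-output structure is simple to picture: one shared random feature map with a separate weight vector per output. The tiny sketch below shows just that structure, with a least-squares stand-in for the weights; the actual algorithms use the variational treatment described above to handle nonlinear likelihoods.

```python
# Tiny sketch of the "kitchen sink" step for multiple outputs: one shared random
# feature map, with an independent weight vector per output dimension.
# The least-squares fit here is a stand-in for the paper's variational inference.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
Y = np.column_stack([np.sin(X[:, 0]), np.cos(X[:, 1])])          # two outputs

D = 100
W, b = rng.normal(size=(3, D)), rng.uniform(0, 2 * np.pi, D)
Phi = np.sqrt(2.0 / D) * np.cos(X @ W + b)                       # shared random features

# One ridge-regularised weight vector per output, shape (D, 2).
weights = np.linalg.solve(Phi.T @ Phi + 1e-2 * np.eye(D), Phi.T @ Y)
```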

The impact of computerisation and automation on future employment

[Figure: LGA probability of job loss]

This work is a quantitative study of the susceptibility of jobs in Australia to computerisation and automation over the next 10 to 15 years. The methodology and initial data are based on the much-cited paper by Frey and Osborne, which studied the same problem for the United States (US) and, more recently, for the United Kingdom (UK). The key to this work is understanding and quantifying the impact on jobs and employment of emerging technologies such as artificial intelligence, robotics and machine learning.

The results show that 40 per cent of jobs in Australia have a high probability of being susceptible to computerisation and automation in the next 10 to 15 years. Jobs in administration and some services are particularly susceptible, as are regions that have historically been associated with the mining industry. Jobs in the professions, in technical and creative industries, and in personal service areas (health, for example) are least susceptible to automation. The report can be found here.

Nonparametric Bayesian Inverse Problems

[Figure: UGP signum test]

Nonlinear inversion problems, where we wish to infer the latent inputs to a system given observations of its output and the system’s forward model, have a long history in the natural sciences, dynamical modelling and estimation. An example is the robot-arm inverse kinematics problem, where we wish to infer how to drive the robot’s joints (i.e. the joint torques) in order to place the end-effector in a particular position, given that we can measure its position and know the forward kinematics of the arm. Most existing algorithms either estimate the system inputs at a particular point in time, like the Levenberg-Marquardt algorithm, or in a recursive manner, like the extended and unscented Kalman filters (EKF, UKF). In many inversion problems we have a continuous process, such as the smooth trajectory of a robot arm, so non-parametric regression techniques like Gaussian processes seem applicable, and they have been used in linear inversion problems.

In this work we present two new methods for inference in Gaussian process (GP) models with general nonlinear likelihoods. Inference is based on a variational framework in which a Gaussian posterior is assumed and the likelihood is linearised about the variational posterior mean, using either a Taylor series expansion or statistical linearisation. We show that the parameter updates obtained by these algorithms are equivalent to the state update equations in the iterative extended and unscented Kalman filters respectively, hence we refer to our algorithms as the extended and unscented GPs. The unscented GP treats the likelihood as a ‘black-box’ by not requiring its derivatives for inference, so it also applies to non-differentiable likelihood models. We evaluate the performance of our algorithms on a number of synthetic inversion problems and a binary classification dataset. See our NIPS spotlight paper for more details.
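
The statistical-linearisation step at the heart of the unscented variants can be sketched in a few lines of numpy: propagate sigma points through a black-box forward model and use the resulting moments to form a local linear approximation, with no derivatives required. The function below is an illustrative implementation of the unscented transform, not code from the paper.

```python
# Sketch: statistical linearisation of a black-box forward model g(.) via the
# unscented transform, the building block behind the "unscented" variants.
# No derivatives of g are required. Illustrative only.
import numpy as np

def unscented_linearise(g, mean, cov, kappa=1.0):
    """Return (A, b, residual_cov) such that g(f) ~ A f + b for f ~ N(mean, cov)."""
    d = mean.shape[0]
    L = np.linalg.cholesky((d + kappa) * cov)

    # 2d + 1 sigma points and their weights
    sigma = np.vstack([mean, mean + L.T, mean - L.T])
    w = np.full(2 * d + 1, 1.0 / (2.0 * (d + kappa)))
    w[0] = kappa / (d + kappa)

    Y = np.array([g(s) for s in sigma])                 # push points through g
    y_mean = w @ Y
    P_fy = (sigma - mean).T * w @ (Y - y_mean)          # input-output cross-covariance
    P_yy = (Y - y_mean).T * w @ (Y - y_mean)            # output covariance

    A = P_fy.T @ np.linalg.inv(cov)                     # statistically linearised slope
    b = y_mean - A @ mean
    residual_cov = P_yy - A @ cov @ A.T
    return A, b, residual_cov

# Example: linearise a nonlinear forward model around a Gaussian belief.
g = lambda f: np.array([np.sin(f[0]) + f[1] ** 2])
A, b, R = unscented_linearise(g, mean=np.zeros(2), cov=0.1 * np.eye(2))
```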

Unsupervised Scene “Understanding”

[Figures: MSRC clusters and MSRC segments]

For very large scientific datasets with many image classes and objects, producing the ground-truth data for supervised (trained) algorithms can represent a substantial, and potentially expensive, human effort. In these situations there is scope for the use of unsupervised approaches, such as clustering, which can model collections of images and automatically summarise their content without human training.

To explore how modelling context affects clustering results, I derived several new algorithms that simultaneously cluster images and the segments (super-pixels) within them. These algorithms also model collections of photos, such as photo albums. Images are described by whole-scene descriptors and the distribution of “objects” (segment clusters) within them. The images and segments are clustered using this joint representation, which is also more interpretable by people. The intuition behind this approach is that by knowing something about the type of scene (image cluster), object detection (segment clustering) can be improved: we are likely to find trees in a forest. Conversely, by knowing about the distribution and co-occurrence of objects in an image, we have a better idea of the type of scene (cows and grass most likely make a rural scene).
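
The models in the papers cluster images and segments jointly, but a rough two-stage caricature of the intuition, on synthetic stand-in descriptors, looks like this:

```python
# Rough two-stage caricature of the idea (the actual models cluster images and
# segments jointly): cluster segment descriptors into "objects", summarise each
# image by its object histogram plus a scene descriptor, then cluster images.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_images, segs_per_image = 50, 20
scene_desc = rng.normal(size=(n_images, 8))                    # whole-scene descriptors
seg_desc = rng.normal(size=(n_images, segs_per_image, 5))      # per-segment descriptors

# 1) Cluster all segments into K "object" types.
K = 6
seg_labels = KMeans(n_clusters=K, n_init=10).fit_predict(seg_desc.reshape(-1, 5))
seg_labels = seg_labels.reshape(n_images, segs_per_image)

# 2) Describe each image by the distribution of object types it contains.
hist = np.stack([np.bincount(row, minlength=K) for row in seg_labels]) / segs_per_image

# 3) Cluster images on the joint (scene descriptor + object histogram) representation.
image_labels = KMeans(n_clusters=4, n_init=10).fit_predict(np.hstack([scene_desc, hist]))
```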

These algorithms for unsupervised scene understanding outperform other unsupervised algorithms for segment and scene clustering, precisely because of how they model context. They were even found to be competitive with state-of-the-art supervised and semi-supervised approaches to scene understanding, while also scaling to larger datasets. See my ICCV paper, CVIU article and my thesis (ch. 5 & 6) for more information.

Clustering Images Over Many Datasets

Large image collections are frequently partitioned into distinct but related groups, such as photo albums from different environments that contain similar scenes. For example, a hiking holiday album may contain many images of forests and perhaps a few villages, whereas a conference trip album may have many urban scenes and images of people, with perhaps a few images of park-land. These groups, or albums, can be thought of as providing context for the images they contain.

I have formulated and applied a latent Dirichlet allocation-like algorithm to this problem. It shares image clusters between groups or albums, but keeps the proportion of clusters (mixture weights) specific to each group, thereby modelling the context of the group. By doing this, the algorithm is better at finding clusters than regular mixture-model-based approaches, and is often faster on large datasets. See my thesis (ch. 4) for more information.
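
A toy sketch of the "shared clusters, album-specific mixture weights" idea, using an off-the-shelf Gaussian mixture on synthetic albums: the components are shared across all albums, while each album gets its own cluster proportions. The actual algorithm infers both jointly, LDA-style, which this sketch does not do.

```python
# Toy sketch of "shared clusters, per-album mixture weights": fit one Gaussian
# mixture over all albums pooled, then compute each album's cluster proportions.
# The actual LDA-like algorithm infers clusters and album weights jointly.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
albums = [rng.normal(loc=mu, size=(100, 10)) for mu in (0.0, 0.5, 1.0)]   # 3 toy albums

gmm = GaussianMixture(n_components=5).fit(np.vstack(albums))              # shared clusters

for i, album in enumerate(albums):
    labels = gmm.predict(album)
    weights = np.bincount(labels, minlength=5) / len(labels)              # album-specific weights
    print(f"album {i} cluster proportions: {np.round(weights, 2)}")
```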

[Figure: Album clusters]

Clustering Images of the Seafloor

I have applied a Bayesian non-parametric algorithm, the variational Dirichlet process (with Gaussian clusters), to clustering large quantities of seafloor imagery obtained from an autonomous underwater vehicle (AUV) in an unsupervised manner. The algorithm has the attractive property that it does not require the number of clusters to be specified in advance, which enables truly autonomous sensor-data abstraction. The underlying image representation uses descriptors for colour, texture and 3D structure obtained from stereo cameras. This approach consistently produces easily recognisable clusters that approximately correspond to different habitat types. These clusters are useful for observing spatial patterns, focusing expert analysis on subsets of the seafloor imagery, aiding mission planning, and potentially informing real-time adaptive sampling. See my ISRR paper for more details.
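
The core modelling step can be reproduced on stand-in features with scikit-learn's variational Dirichlet process Gaussian mixture, a readily available analogue of the algorithm used here: the model is given only an upper bound on the number of components and leaves unneeded ones with negligible weight.

```python
# Sketch: variational Dirichlet process mixture of Gaussians on stand-in image
# features (colour/texture/3D descriptors in the real system). The number of
# clusters is not fixed up front; unneeded components receive negligible weight.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
features = rng.normal(size=(2000, 12))       # stand-in for per-image AUV descriptors

dpgmm = BayesianGaussianMixture(
    n_components=30,                                     # upper bound, not the answer
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    max_iter=500,
).fit(features)

labels = dpgmm.predict(features)
print("components actually used:", np.unique(labels).size)
```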

[Figures: Cluster example, survey, and survey labels]