In this section, we take a look at the basic GPy package and use the library to perform regression. We first import the necessary packages. Note that we change the default plotting engine to Plotly so that we get an interactive experience.

In [10]:

```
import GPy
GPy.plotting.change_plotting_library('plotly')
import numpy as np
from IPython.display import display
```

We now experiment with 1-dimensional models, starting with an investigation of RBF kernels. We will declare some convenient functions to help us define our experiments. RBF kernels are also known as squared exponential kernels. They are the de facto kernel for GPs and SVMs, due to their universal and robust nature. The main reason for this behavior is that they implicitly map inputs into an infinite-dimensional feature space.

The RBF kernel is given by:

$$ K(x,x') = \sigma^2 \exp\Big(-\frac{(x-x')^2}{2l^2}\Big) $$

where the parameters are:

- $l$, the lengthscale, determines the length of the 'wiggles' in the function. In effect, we cannot extrapolate much further than $l$ away from the data. The kernel can also be understood as a measure of similarity between vectors; some RBF formulations use $\gamma=\frac{1}{2l^2}$ instead.
- $\sigma^2$, the output variance, determines the average distance of the function from its mean; in short, it sets the scale.

Beyond the lengthscale and output variance, the RBF kernel has no further hyperparameters.
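The effect of these two parameters can be checked with a few lines of plain NumPy (a small sketch; the input values below are arbitrary):

```python
import numpy as np

def rbf(x, xp, variance=1.0, lengthscale=1.0):
    """RBF / squared exponential kernel for scalar inputs."""
    return variance * np.exp(-(x - xp) ** 2 / (2.0 * lengthscale ** 2))

# A longer lengthscale decays more slowly, so points further apart
# still count as "similar".
print(rbf(0.0, 2.0, lengthscale=1.0))  # exp(-2)    ~ 0.135
print(rbf(0.0, 2.0, lengthscale=5.0))  # exp(-0.08) ~ 0.923
```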

To see that the RBF kernel is infinite-dimensional, we can simply expand it. Without loss of generality, we simplify the kernel to $K(x,x') = \exp(-(x-x')^2)$, then expand it using the Taylor series of $e^x$ at $x=0$.

$$ \begin{aligned} \exp(-(x-x')^2) &= \exp(-x^2)\times \exp(-{x'}^2)\times \exp(2xx')\\ &= \exp(-x^2) \exp(-{x'}^2) \Bigg[1+2xx' + \frac{2^2{x'}^2x^2}{2!} + \cdots \Bigg] \\ &= \exp(-x^2) \exp(-{x'}^2) \sum^{\infty}_{k=0}\frac{2^k{x'}^kx^k}{k!}\\ &= \sum^{\infty}_{k=0}\Bigg[\sqrt{\frac{2^{k}}{k!}}\exp(-x^2)x^k \times \sqrt{\frac{2^{k}}{k!}}\exp(-{x'}^2){x'}^k\Bigg] \\ \end{aligned} $$

We observe that this is a dot product over infinitely many dimensions; in other words, the RBF kernel is an infinite sum over polynomial kernels. This explains the power behind RBF kernels: they map each vector into an infinite-dimensional space and compute the distance between every vector pairing. Note that even though this space is infinite-dimensional, each coordinate is highly constrained, so the solution space is not totally unbounded.
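The expansion above can be verified numerically by truncating the feature map (a sketch; the choice of $x$, $x'$, and 20 terms is arbitrary):

```python
import numpy as np
from math import factorial

def rbf_unit(x, xp):
    """Simplified RBF kernel K(x, x') = exp(-(x - x')^2)."""
    return np.exp(-(x - xp) ** 2)

def feature(x, k_max):
    """Truncated feature map: phi_k(x) = sqrt(2^k / k!) * exp(-x^2) * x^k."""
    return np.array([np.sqrt(2.0 ** k / factorial(k)) * np.exp(-x ** 2) * x ** k
                     for k in range(k_max)])

x, xp = 0.7, -0.3
approx = feature(x, 20) @ feature(xp, 20)   # finite dot product
print(abs(rbf_unit(x, xp) - approx))        # truncation error is tiny
```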

We aim to look at the behavior of RBF kernels and investigate the effect of tweaking some of the parameters. We start by declaring some helper functions. We will use $f(x) = \sin(x)$ and Gaussian noise in our experiments.

In [11]:

```
f = lambda x: np.sin(x)
g_noise = lambda sigma, n: np.random.normal(0, sigma, (n, 1))
r_space = lambda l, h, n: np.random.uniform(l, h, (n, 1))

def experiment(n, lower, upper, noise_sigma, f=f):
    # Sample n inputs uniformly from [lower, upper] and add Gaussian noise.
    X = np.random.uniform(lower, upper, (n, 1))
    y = f(X) + g_noise(noise_sigma, n)
    return X, y
```

We first experiment by allowing the GP to learn from the function $f$ with values generated from the range $[-2\pi, 2\pi]$. The $\sigma$ for the Gaussian noise is set at $0.01$, which is quite small relative to the $f(x)$ values.

In [12]:

```
X,y = experiment(100, (-2)*np.pi, (2)*np.pi, 0.01)
m = GPy.models.GPRegression(X,y, GPy.kern.RBF(input_dim=1))
display(m)
GPy.plotting.show(m.plot(), filename='RBF_001')
```

Out[12]:

In [13]:

```
m.optimize_restarts(num_restarts = 10)
display(m)
GPy.plotting.show(m.plot(), filename='RBF_001_Trained')
```

Out[13]:

We observe that the RBF kernel managed to fit the curve well despite the noise, and the model learnt the noise variance we had set: $0.01^2 = 0.0001$. However, the kernel did not learn the true function: values on the curve at about $x=8$ are erroneous, rising to about $1.5$, whereas we know that $f(x)=\sin(x)$ can never produce $1.5$. Still, the lengthscale interpretation holds up fairly well: we cannot extrapolate much further than about $2.4$ from the data. We now try the same experiment with more noise, at $0.1$.
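The "cannot extrapolate beyond the lengthscale" behavior can be reproduced with a minimal posterior-mean computation in plain NumPy (a sketch with hand-picked hyperparameters, not GPy's optimized values):

```python
import numpy as np

def rbf_k(a, b, variance=1.0, lengthscale=1.5):
    # RBF kernel matrix between 1-D arrays a and b (hand-picked hyperparameters).
    return variance * np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * lengthscale ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-2 * np.pi, 2 * np.pi, 100)
y = np.sin(X) + rng.normal(0, 0.01, 100)

K = rbf_k(X, X) + (0.01 ** 2) * np.eye(len(X))   # noisy Gram matrix
X_star = np.array([np.pi / 2, 20.0])             # inside the data vs far outside
mu = rbf_k(X_star, X) @ np.linalg.solve(K, y)    # GP posterior mean
# mu[0] is close to sin(pi/2) = 1; mu[1] falls back to the prior mean 0.
```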

In [14]:

```
X,y = experiment(100, (-2)*np.pi, (2)*np.pi, 0.1)
m = GPy.models.GPRegression(X,y, GPy.kern.RBF(input_dim=1))
display(m)
GPy.plotting.show(m.plot(), filename='RBF_01')
```

Out[14]:

In [15]:

```
m.optimize_restarts(num_restarts = 10)
display(m)
GPy.plotting.show(m.plot(), filename='RBF_01_Trained')
```

Out[15]:

We observe that the GP still does a good job fitting the data within its range, but the lengthscale is reduced drastically. Moreover, the confidence bounds are now wider, covering the range of the observed values. We have tried a smooth function; now we will try a non-smooth function, $f(x)=|x|$.

In [16]:

```
X,y = experiment(100, -5, 5, 0.1, lambda x:np.abs(x))
m = GPy.models.GPRegression(X,y, GPy.kern.RBF(input_dim=1))
display(m)
GPy.plotting.show(m.plot(), filename='RBF_ABS_01')
```

Out[16]:

In [17]:

```
m.optimize_restarts(num_restarts = 10)
display(m)
GPy.plotting.show(m.plot(), filename='RBF_ABS_01_Trained')
```

Out[17]:

We see that the RBF kernel cannot generalize well for the absolute function; it performs poorly when extrapolating beyond the range of the inputs. In short, RBF kernels and their derivatives work very well for interpolating smooth functions, but they may not generalize well beyond the lengthscale.
