Deep QEPs and DSPPs w/ Multiple Outputs¶

Introduction¶

In this example, we will demonstrate how to construct deep QEPs that can model vector-valued functions (e.g. multitask/multi-output QEPs).

This tutorial can also be used to construct multitask deep sigma point processes by replacing DeepQEPLayer/DeepQEP/DeepApproximateMLL with DSPPLayer/DSPP/DeepPredictiveLogLikelihood.

[2]:

import os
import torch
import tqdm
import math
import qpytorch
from torch.nn import Linear
from qpytorch.means import ConstantMean, LinearMean
from qpytorch.kernels import MaternKernel, ScaleKernel
from qpytorch.variational import VariationalStrategy, CholeskyVariationalDistribution, \
    LMCVariationalStrategy
from qpytorch.distributions import MultivariateQExponential
from qpytorch.models.deep_qeps import DeepQEPLayer, DeepQEP
from qpytorch.mlls import DeepApproximateMLL, VariationalELBO
from qpytorch.likelihoods import MultitaskQExponentialLikelihood
from matplotlib import pyplot as plt

smoke_test = ('CI' in os.environ)
%matplotlib inline

Set up training data¶

In the next cell, we set up the training data for this example. We’ll be using 100 regularly spaced points on [0,1] which we evaluate the function on and add Gaussian noise to get the training labels.

We’ll have four functions - all of which are some sort of sinusoid. Our train_targets will actually have two dimensions: with the second dimension corresponding to the different tasks.

[3]:

train_x = torch.linspace(0, 1, 100)

train_y = torch.stack([
    torch.sin(train_x * (2 * math.pi)) + torch.randn(train_x.size()) * 0.2,
    torch.cos(train_x * (2 * math.pi)) + torch.randn(train_x.size()) * 0.2,
    torch.sin(train_x * (2 * math.pi)) + 2 * torch.cos(train_x * (2 * math.pi)) + torch.randn(train_x.size()) * 0.2,
    -torch.cos(train_x * (2 * math.pi)) + torch.randn(train_x.size()) * 0.2,
], -1)

train_x = train_x.unsqueeze(-1)

Structure of a multitask deep QEP¶

The layers of a multitask deep QEP will look identical to the layers of a single-output deep QEP.

[4]:

# Here's a simple standard layer
POWER = 1.0
class DQEPHiddenLayer(DeepQEPLayer):
    def __init__(self, input_dims, output_dims, num_inducing=128, linear_mean=True):
        self.power = torch.tensor(POWER)
        inducing_points = torch.randn(output_dims, num_inducing, input_dims)
        batch_shape = torch.Size([output_dims])

        variational_distribution = CholeskyVariationalDistribution(
            num_inducing_points=num_inducing,
            batch_shape=batch_shape,
            power=self.power
        )
        variational_strategy = VariationalStrategy(
            self,
            inducing_points,
            variational_distribution,
            learn_inducing_locations=True
        )

        super().__init__(variational_strategy, input_dims, output_dims)
        self.mean_module = ConstantMean() if linear_mean else LinearMean(input_dims)
        self.covar_module = ScaleKernel(
            MaternKernel(nu=2.5, batch_shape=batch_shape, ard_num_dims=input_dims),
            batch_shape=batch_shape, ard_num_dims=None
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return MultivariateQExponential(mean_x, covar_x, power=self.power)

The main body of the deep QEP will look very similar to the single-output deep QEP, with a few changes.

Most importantly - the last layer will have output_dims=num_tasks, rather than output_dims=None. As a result, the output of the model will be a MultitaskMultivariateQExponential rather than a standard MultivariateQExponential distribution.

There are two other small changes, which are noted in the comments.

[5]:

num_tasks = train_y.size(-1)
num_hidden_dgp_dims = 3


class MultitaskDeepQEP(DeepQEP):
    def __init__(self, train_x_shape):
        hidden_layer = DQEPHiddenLayer(
            input_dims=train_x_shape[-1],
            output_dims=num_hidden_dgp_dims,
            linear_mean=True
        )
        last_layer = DQEPHiddenLayer(
            input_dims=hidden_layer.output_dims,
            output_dims=num_tasks,
            linear_mean=False
        )

        super().__init__()

        self.hidden_layer = hidden_layer
        self.last_layer = last_layer

        # We're going to use a multitask likelihood instead of the standard QExponentialLikelihood
        self.likelihood = MultitaskQExponentialLikelihood(num_tasks=num_tasks)

    def forward(self, inputs):
        hidden_rep1 = self.hidden_layer(inputs)
        output = self.last_layer(hidden_rep1)
        return output

    def predict(self, test_x):
        with torch.no_grad():

            # The output of the model is a multitask QEP, where both the data points
            # and the tasks are jointly distributed
            # To compute the marginal predictive NLL of each data point,
            # we will call `to_data_uncorrelated_dist`,
            # which removes the data cross-covariance terms from the distribution.
            preds = model.likelihood(model(test_x)).to_data_uncorrelated_dist()

        return preds.mean.mean(0), preds.variance.mean(0)


model = MultitaskDeepQEP(train_x.shape)

Training and making predictions¶

This code should look similar to the DQEP training code.

[6]:

model.train()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
mll = DeepApproximateMLL(VariationalELBO(model.likelihood, model, num_data=train_y.size(0)))

num_epochs = 1 if smoke_test else 200
epochs_iter = tqdm.notebook.tqdm(range(num_epochs), desc="Epoch")
for i in epochs_iter:
    optimizer.zero_grad()
    output = model(train_x)
    loss = -mll(output, train_y)
    epochs_iter.set_postfix(loss=loss.item())
    loss.backward()
    optimizer.step()

[7]:

# Make predictions
model.eval()
with torch.no_grad(), qpytorch.settings.fast_pred_var():
    test_x = torch.linspace(0, 1, 51).unsqueeze(-1)
    mean, var = model.predict(test_x)
    lower = mean - 2 * var.sqrt()
    upper = mean + 2 * var.sqrt()

# Plot results
fig, axs = plt.subplots(1, num_tasks, figsize=(4 * num_tasks, 3))
for task, ax in enumerate(axs):
    ax.plot(train_x.squeeze(-1).detach().numpy(), train_y[:, task].detach().numpy(), 'k*')
    ax.plot(test_x.squeeze(-1).numpy(), mean[:, task].numpy(), 'b')
    ax.fill_between(test_x.squeeze(-1).numpy(), lower[:, task].numpy(), upper[:, task].numpy(), alpha=0.5)
    ax.set_ylim([-3, 3])
    ax.legend(['Observed Data', 'Mean', 'Confidence'])
    ax.set_title(f'Task {task + 1}')
fig.tight_layout()
None

/Users/shiweilan/miniconda/envs/qpytorch/lib/python3.10/site-packages/linear_operator/utils/interpolation.py:71: UserWarning: torch.sparse.SparseTensor(indices, values, shape, *, device=) is deprecated.  Please use torch.sparse_coo_tensor(indices, values, shape, dtype=, device=). (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/utils/tensor_new.cpp:620.)
  summing_matrix = cls(summing_matrix_indices, summing_matrix_values, size)

../../_images/examples_05_Deep_QExponential_Processes_DQEP_Multitask_Regression_10_1.png