{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Variational QEPs w/ Multiple Outputs\n", "\n", "## Introduction\n", "\n", "In this example, we will demonstrate how to construct approximate/variational QEPs that can model vector-valued functions (e.g. multitask/multi-output QEPs).\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import math\n", "import torch\n", "import qpytorch\n", "import tqdm\n", "from matplotlib import pyplot as plt\n", "\n", "%matplotlib inline\n", "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set up training data\n", "\n", "In the next cell, we set up the training data for this example. We use 100 regularly spaced points on [0,1], evaluate each function at those points, and add Gaussian noise to get the training labels.\n", "\n", "We'll have four functions, all of which are some sort of sinusoid. Our `train_y` will have two dimensions, with the second dimension corresponding to the different tasks." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "torch.Size([100]) torch.Size([100, 4])\n" ] } ], "source": [ "train_x = torch.linspace(0, 1, 100)\n", "\n", "train_y = torch.stack([\n", "    torch.sin(train_x * (2 * math.pi)) + torch.randn(train_x.size()) * 0.2,\n", "    torch.cos(train_x * (2 * math.pi)) + torch.randn(train_x.size()) * 0.2,\n", "    torch.sin(train_x * (2 * math.pi)) + 2 * torch.cos(train_x * (2 * math.pi)) + torch.randn(train_x.size()) * 0.2,\n", "    -torch.cos(train_x * (2 * math.pi)) + torch.randn(train_x.size()) * 0.2,\n", "], -1)\n", "\n", "print(train_x.shape, train_y.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define a multitask model\n", "\n", "We are going to construct a batch variational QEP using a `CholeskyVariationalDistribution` and a `VariationalStrategy`. 
Each of the batch dimensions is going to correspond to one of the outputs. In addition, we will wrap the variational strategy to make the output appear as a `MultitaskMultivariateQExponential` distribution. Here are the changes that we'll need to make:\n", "\n", "1. Our inducing points will need to have shape `4 x m x 1` (where `m` is the number of inducing points). This ensures that we learn a different set of inducing points for each output dimension.\n", "1. The `CholeskyVariationalDistribution`, mean module, and covariance modules will all need to include a `batch_shape=torch.Size([4])` argument. This ensures that we learn a different set of variational parameters and hyperparameters for each output dimension.\n", "1. The `VariationalStrategy` object should be wrapped by a variational strategy that handles multitask models. We describe them below:\n", "\n", "\n", "### Types of Variational Multitask Models\n", "\n", "The most general purpose multitask model is the **Linear Model of Coregionalization** (LMC), which assumes that each output dimension (task) is the linear combination of some latent functions $\\mathbf g(\\cdot) = [g^{(1)}(\\cdot), \\ldots, g^{(Q)}(\\cdot)]$:\n", "\n", "$$ f_\\text{task}(\\mathbf x) = \\sum_{i=1}^Q a^{(i)} g^{(i)}(\\mathbf x), $$\n", "\n", "where $a^{(i)}$ are learnable parameters." 
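, "\n", "\n", "As a sketch of what this means per task (the task index $t$, the task count $T$, and the matrix $A$ are notation introduced here for illustration and are not library names), the same model can be written as\n", "\n", "$$ f_t(\\mathbf x) = \\sum_{i=1}^Q a_t^{(i)} g^{(i)}(\\mathbf x), \\qquad t = 1, \\ldots, T, $$\n", "\n", "or, stacking the tasks, $\\mathbf f(\\mathbf x) = A \\, \\mathbf g(\\mathbf x)$ with $A \\in \\mathbb R^{T \\times Q}$ and $A_{ti} = a_t^{(i)}$. With the settings used in the next cell (`num_tasks = 4`, `num_latents = 3`), $A$ would be a $4 \\times 3$ matrix, i.e. 12 mixing coefficients shared across all inputs."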
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "num_latents = 3\n", "num_tasks = 4\n", "POWER = 1.0\n", "\n", "class MultitaskQEPModel(qpytorch.models.ApproximateQEP):\n", "    def __init__(self, num_latents, num_tasks):\n", "        self.power = torch.tensor(POWER)\n", "        # Let's use a different set of inducing points for each latent function\n", "        inducing_points = torch.rand(num_latents, 16, 1)\n", "\n", "        # We have to mark the CholeskyVariationalDistribution as batch\n", "        # so that we learn a variational distribution for each latent function\n", "        variational_distribution = qpytorch.variational.CholeskyVariationalDistribution(\n", "            inducing_points.size(-2), batch_shape=torch.Size([num_latents]), power=self.power\n", "        )\n", "\n", "        # We have to wrap the VariationalStrategy in a LMCVariationalStrategy\n", "        # so that the output will be a MultitaskMultivariateQExponential rather than a batch output\n", "        variational_strategy = qpytorch.variational.LMCVariationalStrategy(\n", "            qpytorch.variational.VariationalStrategy(\n", "                self, inducing_points, variational_distribution, learn_inducing_locations=True\n", "            ),\n", "            num_tasks=num_tasks,\n", "            num_latents=num_latents,\n", "            latent_dim=-1\n", "        )\n", "\n", "        super().__init__(variational_strategy)\n", "\n", "        # The mean and covariance modules should be marked as batch\n", "        # so we learn a different set of hyperparameters for each latent function\n", "        self.mean_module = qpytorch.means.ConstantMean(batch_shape=torch.Size([num_latents]))\n", "        self.covar_module = qpytorch.kernels.ScaleKernel(\n", "            qpytorch.kernels.RBFKernel(batch_shape=torch.Size([num_latents])),\n", "            batch_shape=torch.Size([num_latents])\n", "        )\n", "\n", "    def forward(self, x):\n", "        # The forward function should be written as if we were dealing with each output\n", "        # dimension in batch\n", "        mean_x = self.mean_module(x)\n", "        covar_x = self.covar_module(x)\n", "        return qpytorch.distributions.MultivariateQExponential(mean_x, covar_x, 
power=self.power)\n", "\n", "\n", "model = MultitaskQEPModel(num_latents, num_tasks)\n", "likelihood = qpytorch.likelihoods.MultitaskQExponentialLikelihood(num_tasks=num_tasks, power=model.power)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With all of the `batch_shape` arguments, it may look like we're learning a batch of QEPs. However, `LMCVariationalStrategy` objects convert this batch dimension into a (non-batch) `MultitaskMultivariateQExponential`." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "torch.Size([100, 4])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "likelihood(model(train_x)).rsample().shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The LMC model allows there to be linear dependencies between outputs/tasks. Alternatively, if we want independent output dimensions, we can replace `LMCVariationalStrategy` with `UncorrelatedMultitaskVariationalStrategy`:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "class UncorrelatedMultitaskQEPModel(qpytorch.models.ApproximateQEP):\n", "    def __init__(self, num_tasks):\n", "        self.power = torch.tensor(POWER)\n", "        # Let's use a different set of inducing points for each task\n", "        inducing_points = torch.rand(num_tasks, 16, 1)\n", "\n", "        # We have to mark the CholeskyVariationalDistribution as batch\n", "        # so that we learn a variational distribution for each task\n", "        variational_distribution = qpytorch.variational.CholeskyVariationalDistribution(\n", "            inducing_points.size(-2), batch_shape=torch.Size([num_tasks]), power=self.power\n", "        )\n", "\n", "        variational_strategy = qpytorch.variational.UncorrelatedMultitaskVariationalStrategy(\n", "            qpytorch.variational.VariationalStrategy(\n", "                self, inducing_points, variational_distribution, learn_inducing_locations=True\n", "            ),\n", "            num_tasks=num_tasks,\n", "        )\n", "\n", "        
super().__init__(variational_strategy)\n", "\n", "        # The mean and covariance modules should be marked as batch\n", "        # so we learn a different set of hyperparameters\n", "        self.mean_module = qpytorch.means.ConstantMean(batch_shape=torch.Size([num_tasks]))\n", "        self.covar_module = qpytorch.kernels.ScaleKernel(\n", "            qpytorch.kernels.RBFKernel(batch_shape=torch.Size([num_tasks])),\n", "            batch_shape=torch.Size([num_tasks])\n", "        )\n", "\n", "    def forward(self, x):\n", "        # The forward function should be written as if we were dealing with each output\n", "        # dimension in batch\n", "        mean_x = self.mean_module(x)\n", "        covar_x = self.covar_module(x)\n", "        return qpytorch.distributions.MultivariateQExponential(mean_x, covar_x, power=self.power)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that all the batch sizes for `UncorrelatedMultitaskVariationalStrategy` are now `num_tasks` rather than `num_latents`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Output modes\n", "\n", "By default, `LMCVariationalStrategy` and `UncorrelatedMultitaskVariationalStrategy` produce vector-valued outputs. In other words, they return a `MultitaskMultivariateQExponential` distribution -- containing all task values for each input.\n", "\n", "This is similar to the ExactQEP model described in the [multitask QEP regression tutorial](../03_Multitask_Exact_QEPs/Multitask_QEP_Regression.ipynb)." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MultitaskMultivariateQExponential torch.Size([100, 4])\n" ] } ], "source": [ "output = model(train_x)\n", "print(output.__class__.__name__, output.event_shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, if each input is only associated **with a single task**, passing in the `task_indices` argument will specify which task to return for each input. 
The result will be a standard `MultivariateQExponential` distribution -- where each output corresponds to each input's specified task.\n", "\n", "This is similar to the ExactQEP model described in the [Hadamard multitask QEP regression tutorial](../03_Multitask_Exact_QEPs/Hadamard_Multitask_QEP_Regression.ipynb)." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MultivariateQExponential torch.Size([5])\n" ] } ], "source": [ "x = train_x[..., :5]\n", "task_indices = torch.LongTensor([0, 1, 3, 2, 2])\n", "output = model(x, task_indices=task_indices)\n", "print(output.__class__.__name__, output.event_shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Train the model\n", "\n", "This code should look similar to the SVGP training code." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/4m/ffmpvs751fg60zck9593w5sc0000gn/T/ipykernel_49594/1717849314.py:20: TqdmDeprecationWarning: This function will be removed in tqdm==5.0.0\n", "Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`\n", " epochs_iter = tqdm.tqdm_notebook(range(num_epochs), desc=\"Epoch\")\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "ddb288e9f0d141e8adc89d2a0e63d7de", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Epoch: 0%| | 0/500 [00:00" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Set into eval mode\n", "model.eval()\n", "likelihood.eval()\n", "\n", "# Initialize plots\n", "fig, axs = plt.subplots(1, num_tasks, figsize=(4 * num_tasks, 3))\n", "\n", "# Make predictions\n", "with torch.no_grad(), qpytorch.settings.fast_pred_var():\n", "    test_x = torch.linspace(0, 1, 51)\n", "    predictions = likelihood(model(test_x))\n", "    mean = predictions.mean\n", "    lower, upper = predictions.confidence_region()\n", "\n", "for task, ax in 
enumerate(axs):\n", "    # Plot training data as black stars\n", "    ax.plot(train_x.detach().numpy(), train_y[:, task].detach().numpy(), 'k*')\n", "    # Predictive mean as blue line\n", "    ax.plot(test_x.numpy(), mean[:, task].numpy(), 'b')\n", "    # Shade in confidence\n", "    ax.fill_between(test_x.numpy(), lower[:, task].numpy(), upper[:, task].numpy(), alpha=0.5)\n", "    ax.set_ylim([-3, 3])\n", "    ax.legend(['Observed Data', 'Mean', 'Confidence'])\n", "    ax.set_title(f'Task {task + 1}')\n", "\n", "fig.tight_layout()\n", "None" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.12" } }, "nbformat": 4, "nbformat_minor": 4 }