{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Deep Q-Exponential Processes\n",
    "## Introduction\n",
    "\n",
    "In this notebook, we provide a QPyTorch implementation of [deep q-exponential processes](https://openreview.net/pdf?id=gV82fQTpqq), where training and inference is performed using the method of Salimbeni et al., 2017 (https://arxiv.org/abs/1705.08933) adapted to CG-based inference.\n",
    "\n",
    "We'll be training a simple two layer deep QEP on the `elevators` UCI dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "env: CUDA_VISIBLE_DEVICES=0\n"
     ]
    }
   ],
   "source": [
    "%set_env CUDA_VISIBLE_DEVICES=0\n",
    "\n",
    "import torch\n",
    "import tqdm\n",
    "import qpytorch\n",
    "from qpytorch.means import ConstantMean, LinearMean\n",
    "from qpytorch.kernels import RBFKernel, ScaleKernel\n",
    "from qpytorch.variational import VariationalStrategy, CholeskyVariationalDistribution\n",
    "from qpytorch.distributions import MultivariateQExponential\n",
    "from qpytorch.models import ApproximateQEP, QEP\n",
    "from qpytorch.mlls import VariationalELBO, AddedLossTerm\n",
    "from qpytorch.likelihoods import QExponentialLikelihood\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from qpytorch.models.deep_qeps import DeepQEPLayer, DeepQEP\n",
    "from qpytorch.mlls import DeepApproximateMLL"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Loading Data\n",
    "\n",
    "For this example notebook, we'll be using the `elevators` UCI dataset used in the paper. Running the next cell downloads a copy of the dataset that has already been scaled and normalized appropriately. For this notebook, we'll simply be splitting the data using the first 80% of the data as training and the last 20% as testing.\n",
    "\n",
    "**Note**: Running the next cell will attempt to download a ~400 KB dataset file to the current directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import urllib.request\n",
    "import os\n",
    "from scipy.io import loadmat\n",
    "from math import floor\n",
    "\n",
    "\n",
    "# this is for running the notebook in our testing framework\n",
    "smoke_test = ('CI' in os.environ)\n",
    "\n",
    "\n",
    "if not smoke_test and not os.path.isfile('../elevators.mat'):\n",
    "    print('Downloading \\'elevators\\' UCI dataset...')\n",
    "    urllib.request.urlretrieve('https://drive.google.com/uc?export=download&id=1jhWL3YUHvXIaftia4qeAyDwVxo6j1alk', '../elevators.mat')\n",
    "\n",
    "\n",
    "if smoke_test:  # this is for running the notebook in our testing framework\n",
    "    X, y = torch.randn(1000, 3), torch.randn(1000)\n",
    "else:\n",
    "    data = torch.Tensor(loadmat('../elevators.mat')['data'])\n",
    "    X = data[:, :-1]\n",
    "    X = X - X.min(0)[0]\n",
    "    X = 2 * (X / X.max(0)[0]) - 1\n",
    "    y = data[:, -1]\n",
    "\n",
    "\n",
    "train_n = int(floor(0.8 * len(X)))\n",
    "train_x = X[:train_n, :].contiguous()\n",
    "train_y = y[:train_n].contiguous()\n",
    "\n",
    "test_x = X[train_n:, :].contiguous()\n",
    "test_y = y[train_n:].contiguous()\n",
    "\n",
    "if torch.cuda.is_available():\n",
    "    train_x, train_y, test_x, test_y = train_x.cuda(), train_y.cuda(), test_x.cuda(), test_y.cuda()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "from torch.utils.data import TensorDataset, DataLoader\n",
    "train_dataset = TensorDataset(train_x, train_y)\n",
    "train_loader = DataLoader(train_dataset, batch_size=1024, shuffle=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Defining QEP layers\n",
    "\n",
    "In QPyTorch, defining a QEP involves extending one of our abstract QEP models and defining a `forward` method that returns the prior. For deep QEPs, things are similar, but there are two abstract QEP models that must be overwritten: one for hidden layers and one for the deep QEP model itself.\n",
    "\n",
    "In the next cell, we define an example deep QEP hidden layer. This looks very similar to every other variational QEP you might define. However, there are a few key differences:\n",
    "\n",
    "1. Instead of extending `ApproximateQEP`, we extend `DeepQEPLayer`.\n",
    "2. `DeepQEPLayers` need a number of input dimensions, a number of output dimensions, and a number of samples. This is kind of like a linear layer in a standard neural network -- `input_dims` defines how many inputs this hidden layer will expect, and `output_dims` defines how many hidden QEPs to create outputs for.\n",
    "\n",
    "In this particular example, we make a particularly fancy `DeepQEPLayer` that has \"skip connections\" with previous layers, similar to a ResNet."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "POWER = 1.0\n",
    "class ToyDeepQEPHiddenLayer(DeepQEPLayer):\n",
    "    def __init__(self, input_dims, output_dims, num_inducing=128, mean_type='constant'):\n",
    "        self.power = torch.tensor(POWER)\n",
    "        if output_dims is None:\n",
    "            inducing_points = torch.randn(num_inducing, input_dims)\n",
    "            batch_shape = torch.Size([])\n",
    "        else:\n",
    "            inducing_points = torch.randn(output_dims, num_inducing, input_dims)\n",
    "            batch_shape = torch.Size([output_dims])\n",
    "\n",
    "        variational_distribution = CholeskyVariationalDistribution(\n",
    "            num_inducing_points=num_inducing,\n",
    "            batch_shape=batch_shape,\n",
    "            power=self.power\n",
    "        )\n",
    "\n",
    "        variational_strategy = VariationalStrategy(\n",
    "            self,\n",
    "            inducing_points,\n",
    "            variational_distribution,\n",
    "            learn_inducing_locations=True\n",
    "        )\n",
    "\n",
    "        super(ToyDeepQEPHiddenLayer, self).__init__(variational_strategy, input_dims, output_dims)\n",
    "\n",
    "        if mean_type == 'constant':\n",
    "            self.mean_module = ConstantMean(batch_shape=batch_shape)\n",
    "        else:\n",
    "            self.mean_module = LinearMean(input_dims)\n",
    "        self.covar_module = ScaleKernel(\n",
    "            RBFKernel(batch_shape=batch_shape, ard_num_dims=input_dims),\n",
    "            batch_shape=batch_shape, ard_num_dims=None\n",
    "        )\n",
    "\n",
    "    def forward(self, x):\n",
    "        mean_x = self.mean_module(x)\n",
    "        covar_x = self.covar_module(x)\n",
    "        return MultivariateQExponential(mean_x, covar_x, power=self.power)\n",
    "\n",
    "    def __call__(self, x, *other_inputs, **kwargs):\n",
    "        \"\"\"\n",
    "        Overriding __call__ isn't strictly necessary, but it lets us add concatenation based skip connections\n",
    "        easily. For example, hidden_layer2(hidden_layer1_outputs, inputs) will pass the concatenation of the first\n",
    "        hidden layer's outputs and the input data to hidden_layer2.\n",
    "        \"\"\"\n",
    "        if len(other_inputs):\n",
    "            if isinstance(x, qpytorch.distributions.MultitaskMultivariateQExponential):\n",
    "                x = x.rsample()\n",
    "\n",
    "            processed_inputs = [\n",
    "                inp.unsqueeze(0).expand(qpytorch.settings.num_likelihood_samples.value(), *inp.shape)\n",
    "                for inp in other_inputs\n",
    "            ]\n",
    "\n",
    "            x = torch.cat([x] + processed_inputs, dim=-1)\n",
    "\n",
    "        return super().__call__(x, are_samples=bool(len(other_inputs)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Building the deep QEP\n",
    "\n",
    "Now that we've defined a class for our hidden layers and a class for our output layer, we can build our deep QEP. To do this, we create a `Module` whose forward is simply responsible for forwarding through the various layers.\n",
    "\n",
    "This also allows for various network connectivities easily. For example calling,\n",
    "```\n",
    "hidden_rep2 = self.second_hidden_layer(hidden_rep1, inputs)\n",
    "```\n",
    "in forward would cause the second hidden layer to use both the output of the first hidden layer and the input data as inputs, concatenating the two together."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "num_hidden_dims = 2 if smoke_test else 10\n",
    "\n",
    "\n",
    "class DeepQEP(DeepQEP):\n",
    "    def __init__(self, train_x_shape):\n",
    "        hidden_layer = ToyDeepQEPHiddenLayer(\n",
    "            input_dims=train_x_shape[-1],\n",
    "            output_dims=num_hidden_dims,\n",
    "            mean_type='linear',\n",
    "        )\n",
    "        \n",
    "        last_layer = ToyDeepQEPHiddenLayer(\n",
    "            input_dims=hidden_layer.output_dims,\n",
    "            output_dims=None,\n",
    "            mean_type='constant',\n",
    "        )\n",
    "        \n",
    "        super().__init__()\n",
    "        \n",
    "        self.hidden_layer = hidden_layer\n",
    "        self.last_layer = last_layer\n",
    "        self.likelihood = QExponentialLikelihood(power=torch.tensor(POWER))\n",
    "    \n",
    "    def forward(self, inputs):\n",
    "        hidden_rep1 = self.hidden_layer(inputs)\n",
    "        output = self.last_layer(hidden_rep1)\n",
    "        return output\n",
    "    \n",
    "    def predict(self, test_loader):\n",
    "        with torch.no_grad():\n",
    "            mus = []\n",
    "            variances = []\n",
    "            lls = []\n",
    "            for x_batch, y_batch in test_loader:\n",
    "                preds = self.likelihood(self(x_batch))\n",
    "                mus.append(preds.mean)\n",
    "                variances.append(preds.variance)\n",
    "                lls.append(model.likelihood.log_marginal(y_batch, model(x_batch)))\n",
    "        \n",
    "        return torch.cat(mus, dim=-1), torch.cat(variances, dim=-1), torch.cat(lls, dim=-1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = DeepQEP(train_x.shape)\n",
    "if torch.cuda.is_available():\n",
    "    model = model.cuda()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Objective function (approximate marginal log likelihood/ELBO)\n",
    "\n",
    "Because deep QEPs use some amounts of internal sampling (even in the stochastic variational setting), we need to handle the objective function (e.g. the ELBO) in a slightly different way. To do this, wrap the standard objective function (e.g. `~qpytorch.mlls.VariationalELBO`) with a `qpytorch.mlls.DeepApproximateMLL`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training/Testing\n",
    "\n",
    "The training loop for a deep QEP looks similar to a standard QEP model with stochastic variational inference."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "f0a99adb9a52415a83f11bff024c17da",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Epoch:   0%|          | 0/10 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "d9e34879eaa04e088f43f64b9eb82323",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Minibatch:   0%|          | 0/13 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "d5bc133d5d6e4944a08be60e966ad5eb",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Minibatch:   0%|          | 0/13 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "7a50adf5e2324f28b812b54c6a6363ce",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Minibatch:   0%|          | 0/13 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "dab214423dcd44099a840a8bf7a48f15",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Minibatch:   0%|          | 0/13 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "f128c96a5bb7467ea574dd7dd41df715",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Minibatch:   0%|          | 0/13 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "f9806ded4bb74fbd8552b46fd9bf3664",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Minibatch:   0%|          | 0/13 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "518e5f4a507342d2b1db0589676169f8",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Minibatch:   0%|          | 0/13 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "0437edb76261499c916f1bafa26c4e9d",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Minibatch:   0%|          | 0/13 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "8aec125aefa2468d906099261c4ac3a3",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Minibatch:   0%|          | 0/13 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "cff00e46148a49528d591b9a9d4b4759",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Minibatch:   0%|          | 0/13 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# this is for running the notebook in our testing framework\n",
    "num_epochs = 1 if smoke_test else 10\n",
    "num_samples = 3 if smoke_test else 10\n",
    "\n",
    "\n",
    "optimizer = torch.optim.Adam([\n",
    "    {'params': model.parameters()},\n",
    "], lr=0.01)\n",
    "mll = DeepApproximateMLL(VariationalELBO(model.likelihood, model, train_x.shape[-2]))\n",
    "\n",
    "epochs_iter = tqdm.notebook.tqdm(range(num_epochs), desc=\"Epoch\")\n",
    "for i in epochs_iter:\n",
    "    # Within each iteration, we will go over each minibatch of data\n",
    "    minibatch_iter = tqdm.notebook.tqdm(train_loader, desc=\"Minibatch\", leave=False)\n",
    "    for x_batch, y_batch in minibatch_iter:\n",
    "        with qpytorch.settings.num_likelihood_samples(num_samples):\n",
    "            optimizer.zero_grad()\n",
    "            output = model(x_batch)\n",
    "            loss = -mll(output, y_batch)\n",
    "            loss.backward()\n",
    "            optimizer.step()\n",
    "\n",
    "            minibatch_iter.set_postfix(loss=loss.item())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The output distribution of a deep QEP in this framework is actually a mixture of `num_samples` Gaussians for each output. We get predictions the same way with all QPyTorch models, but we do currently need to do some reshaping to get the means and variances in a reasonable form.\n",
    "\n",
    "Note that you may have to do more epochs of training than this example to get optimal performance; however, the performance on this particular dataset is pretty good after 10."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "RMSE: 0.162799671292305, NLL: 0.09779592603445053\n"
     ]
    }
   ],
   "source": [
    "import qpytorch\n",
    "import math\n",
    "\n",
    "\n",
    "test_dataset = TensorDataset(test_x, test_y)\n",
    "test_loader = DataLoader(test_dataset, batch_size=1024)\n",
    "\n",
    "model.eval()\n",
    "predictive_means, predictive_variances, test_lls = model.predict(test_loader)\n",
    "\n",
    "rmse = torch.mean(torch.pow(predictive_means.mean(0) - test_y, 2)).sqrt()\n",
    "print(f\"RMSE: {rmse.item()}, NLL: {-test_lls.mean().item()}\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}