Multi-output GP with multi-inputs?

Question

I am trying to implement a multi-output GP in GPflow with multi-dimensional input data.

I have seen from this issue in GPflow that multi-dimensional input is possible if you 'define a multidimensional base kernel and then apply the coregion on top of that'.

I know that for isotopic data (where all outputs are observed at every input) one could alternatively use the approach described in this notebook, but I need to try the ICM (intrinsic coregionalization model), so I have written the code below instead.

However, when I try running the following code:

from gpflow.gpr import GPR
import gpflow
import numpy as np
from gpflow.kernels import Coregion


def f(x):
    def _y(_x):
        function_sum = 0
        for i in np.arange(0, len(_x) - 1):
            function_sum += (1 - _x[i]) ** 2 + 100 * ((_x[i + 1] - _x[i] ** 2) ** 2)
        return function_sum
    return np.atleast_2d([_y(_x) for _x in (np.atleast_2d(x))]).T


isotropic_X = np.random.rand(100, 2) * 4 - 2
Y1 = f(isotropic_X)
Y2 = f(isotropic_X) + np.random.normal(loc=2000, size=(100,1))
Y3 = f(isotropic_X) + np.random.normal(loc=-2000, size=(100,1))

# A coregionalization kernel. The base kernel is Matern32 and acts on the first two data dimensions ([0, 1]).
# The 'Coregion' kernel indexes the outputs and is meant to act on the last data dimension (the output index).
k1 = gpflow.kernels.Matern32(2)
coreg = Coregion(1, output_dim=3, rank=1, active_dims=[3]) # gpflow.kernels.Coregion(2, output_dim=2, rank=1)
coreg.W = np.random.rand(3, 1)
kern = k1 * coreg

# Augment the input data with an index column (0, 1 or 2) to indicate the required output dimension
X_augmented = np.vstack((np.hstack((isotropic_X, np.zeros(shape=(isotropic_X.shape[0], 1)))),
                         np.hstack((isotropic_X, np.ones(shape=(isotropic_X.shape[0], 1)))),
                         np.hstack((isotropic_X, 2 * np.ones(shape=(isotropic_X.shape[0], 1))))))

# Augment the Y data to indicate which likelihood we should use
Y_augmented = np.vstack((np.hstack((Y1, np.zeros(shape=(Y1.shape[0], 1)))),
                         np.hstack((Y2, np.ones(shape=(Y2.shape[0], 1)))),
                         np.hstack((Y3, 2 * np.ones(shape=(Y3.shape[0], 1))))))

# now build the GP model as normal
m = GPR(X_augmented, Y_augmented, kern=kern)
m.optimize()

print(m.predict_f(np.array([[0.2, 0.2, 0], [0.4, 0.4, 0]])))

It fails with something like:

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Traceback (most recent call last):
  File "C:\Users\Administrator\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1356, in _do_call
    return fn(*args)
  File "C:\Users\Administrator\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\Users\Administrator\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0] = 3 is not in [0, 3)
     [[{{node name.build_likelihood/name.kern.K/name.kern.coregion.K/GatherV2}}]]

So my questions are:
- What is causing this error, and how can I enable a multi-output GP with multi-dimensional input?
- I don't quite understand the GPflow workflow with coregion. According to this multi-output GP slide, the ICM builds each output GP as a weighted sum of a latent process $u$ sampled from a GP, with the weights given by $W$. In the GPflow notebook demo, however, I can't see any such latent process, and the notebook says 'The 'Coregion' kernel indexes the outputs, and acts on the last ([1]) data dimension (indices) of the augmented X values', which is quite different from the slides. I am confused by these different descriptions. Any hints?

Answer 1:

The issue is simply an off-by-one error in your indexing: the coregionalisation kernel should be

coreg = Coregion(input_dim=1, output_dim=3, rank=1, active_dims=[2])

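This is because active_dims is zero-based, so active_dims=[2] indexes the third column of X_augmented, which holds the output index. X_augmented has only three columns, so active_dims=[3] points past the last one, which is what 'indices[0] = 3 is not in [0, 3)' in the traceback is reporting.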
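The column indexing can be checked without GPflow. The following is a minimal NumPy sketch, assuming a smaller stand-in for the augmented inputs (5 points per output instead of 100); it only mimics the column selection that active_dims performs and is not GPflow's own code.

```python
import numpy as np

# Hypothetical stand-in for the augmented inputs in the question:
# two real input columns plus one output-index column.
X = np.random.rand(5, 2) * 4 - 2
X_augmented = np.vstack([
    np.hstack([X, np.full((X.shape[0], 1), i)]) for i in range(3)
])

print(X_augmented.shape)  # three columns, so valid column indices are 0, 1, 2

# Selecting column 2 (what active_dims=[2] refers to) gives the output indices.
index_column = X_augmented[:, [2]]
print(np.unique(index_column))

# Selecting column 3 (what active_dims=[3] refers to) is out of range.
try:
    X_augmented[:, [3]]
    out_of_range = False
except IndexError as err:
    out_of_range = True
    print("IndexError:", err)
```

NumPy raises an IndexError here, where TensorFlow's gather reports the equivalent InvalidArgumentError shown in the question.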

Thanks for providing a fully reproducible example! I managed to run your code and successfully optimize the model using a few steps of AdamOptimizer and then ScipyOptimizer, to a log-likelihood value of -2023.4.
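On the second question, one way to reconcile the two descriptions is to note that the product kernel k1 * coreg is the ICM covariance with the latent process integrated out: cov(f_i(x), f_j(x')) = k(x, x') * B[i, j], where B = W W^T + diag(kappa) is the coregionalisation matrix. The following is a minimal NumPy sketch of that identity, not GPflow's implementation. The matern32 helper, the values of W and kappa, and the 4-point input set are all assumptions made for illustration.

```python
import numpy as np

def matern32(A, B, lengthscale=1.0, variance=1.0):
    """Hypothetical helper: Matern 3/2 kernel matrix between rows of A and B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    r = np.sqrt(np.maximum(sq_dist, 0.0)) / lengthscale
    return variance * (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)

rng = np.random.default_rng(0)
n_outputs = 3
X = rng.uniform(-2, 2, size=(4, 2))

# Stack the inputs once per output and append the output index as a last column.
X_aug = np.vstack([
    np.hstack([X, np.full((len(X), 1), i)]) for i in range(n_outputs)
])

# Coregionalisation matrix B = W W^T + diag(kappa); the values are arbitrary.
W = rng.random((n_outputs, 1))
kappa = np.ones(n_outputs)
B = W @ W.T + np.diag(kappa)

# Base kernel on the real input columns, coregion lookup on the index column.
idx = X_aug[:, 2].astype(int)
K_base = matern32(X_aug[:, :2], X_aug[:, :2])
K_coreg = B[idx[:, None], idx[None, :]]
K = K_base * K_coreg

# For stacked isotopic data this equals the Kronecker form of the ICM covariance.
K_kron = np.kron(B, matern32(X, X))
print(np.allclose(K, K_kron))
```

Under this reading, the index column is only a lookup into B. The latent process $u$ from the slides never appears explicitly because only its covariance is needed.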

Source: https://stackoverflow.com/questions/57361754/multi-output-gp-with-multi-inputs

Tags

gpflow