
Physics-Informed Gaussian Process Regression
Generalizes Linear PDE Solvers

\name Marvin Pförtner$^{1}$ \email marvin.pfoertner@uni-tuebingen.de
\name Ingo Steinwart$^{2}$ \email ingo.steinwart@mathematik.uni-stuttgart.de
\name Philipp Hennig$^{1}$ \email philipp.hennig@uni-tuebingen.de
\name Jonathan Wenger$^{1}$ \email jonathan.wenger@uni-tuebingen.de
\addr $^{1}$University of Tübingen, Tübingen AI Center
\addr $^{2}$University of Stuttgart
Abstract

Linear partial differential equations (PDEs) are an important, widely applied class of mechanistic models, describing physical processes such as heat transfer, electromagnetism, and wave propagation. In practice, specialized numerical methods based on discretization are used to solve PDEs. They generally use an estimate of the unknown model parameters and, if available, physical measurements for initialization. Such solvers are often embedded into larger scientific models with a downstream application, and thus error quantification plays a key role. However, by ignoring parameter and measurement uncertainty, classical PDE solvers may fail to produce consistent estimates of their inherent approximation error. In this work, we approach this problem in a principled fashion by interpreting solving linear PDEs as physics-informed Gaussian process (GP) regression. Our framework is based on a key generalization of the Gaussian process inference theorem to observations made via an arbitrary bounded linear operator. Crucially, this probabilistic viewpoint allows us to (1) quantify the inherent discretization error; (2) propagate uncertainty about the model parameters to the solution; and (3) condition on noisy measurements. Demonstrating the strength of this formulation, we prove that it strictly generalizes methods of weighted residuals, a central class of PDE solvers including collocation, finite volume, pseudospectral, and (generalized) Galerkin methods such as finite element and spectral methods. This class can thus be directly equipped with a structured error estimate. In summary, our results enable the seamless integration of mechanistic models as modular building blocks into probabilistic models by blurring the boundaries between numerical analysis and Bayesian inference.

Keywords: physics-informed machine learning, probabilistic numerics, partial differential equations, method of weighted residuals, Galerkin methods, Gaussian processes, bounded linear operators

1 Introduction

Partial differential equations (PDEs) are powerful mechanistic models of static and dynamic systems with continuous spatial interactions (Borthwick, 2018). They are widely used in the natural sciences, especially in physics, and in applied fields like engineering, medicine and finance. Linear PDEs form a subclass describing physical phenomena such as heat diffusion (Fourier, 1822), electromagnetism (Maxwell, 1865), and continuum mechanics (Lautrup, 2005). Additionally, they are used in applications as diverse as computer graphics (Kazhdan et al., 2006), medical imaging (Holder, 2005), or option pricing (Black and Scholes, 1973).

Scientific inference with PDEs

Given a mechanistic model of a (physical) system in the form of a linear PDE $\mathcal{D}[\bm{u}] = f$, where $\mathcal{D}$ is a linear differential operator mapping between vector spaces of functions, the system can be simulated by solving the PDE subject to a set of linear boundary conditions (BC), given by a linear operator $\mathcal{B}$ and a function $g$ defined on the boundary of the domain, s.t. $\mathcal{B}[\bm{u}] = g$ (Evans, 2010). For instance, given all material parameters and heat sources involved, a PDE can describe the temperature distribution in an electronic component, while the boundary conditions describe the heat flux out of the component at the surface. Since hardly any practically relevant PDE can be solved analytically (Borthwick, 2018), in practice, specialized numerical methods relying on discretization are employed. Often such solvers are embedded into larger scientific models, where model parameters are inferred from measurement and downstream analyses depend on the resulting simulation. For example, we would like to model whether said electronic component hits critical temperature thresholds during operation to assess its longevity.

Challenges when solving PDEs

When performing scientific inference with PDEs via numerical simulation, one is faced with three fundamental challenges.

  1. (C1) Limited computation. Any numerically computed solution $\hat{\bm{u}} \approx \bm{u}$ suffers from approximation error. In practice, a sufficiently accurate simulation often requires vast amounts of computational resources.

  2. (C2) Partially-known physics. While the underlying physical mechanism is encoded in the formulation of the PDE, in practice, its exact parameters and boundary conditions are often unknown. For example, the position and strength of heat sources $f$ within the aforementioned electronic component are only approximately known. Similarly, material parameters like thermal conductivity, which define $\mathcal{D}$, can often only be estimated. Finally, the initial or boundary conditions $\mathcal{B}[\bm{u}] = g$ are also only partially known. For example, it is often unclear how much heat an electronic component dissipates via its surface.

  3. (C3) Error propagation. Limited computation and partially-known physics inevitably introduce error into the simulation. This resulting bias can fundamentally alter conclusions drawn from downstream analysis steps, in particular if these are sensitive to input variability. For example, an electronic component may be deemed safe based on the simulation, although its true internal temperature hits safety-critical levels repeatedly.

Solving PDEs as a learning problem

The challenges of scientific inference with PDEs are fundamentally issues of partial information. Here, we interpret solving a PDE as a learning problem, specifically as physics-informed regression, in the spirit of probabilistic numerics (Hennig et al., 2015; Cockayne et al., 2019b; Oates and Sullivan, 2019; Owhadi et al., 2019; Hennig et al., 2022). By leveraging the tools of Bayesian inference, we can tackle the challenges (C1), (C2) and (C3). As illustrated in figure 1(a), we model the solution of the PDE with a Gaussian process, which we condition on observations of the boundary conditions, the PDE itself and any physical measurements:

  • Encoding prior knowledge. We can efficiently leverage any available computation by encoding inductive bias about the solution of the PDE. For example, we can identify the solution space by “partial derivative counting”. Moreover, since PDEs typically model physical systems, expert knowledge is often available. This includes known physical properties of the system such as symmetries, as well as more subjective estimates from previous experience with similar systems or computationally cheap approximations.

  • Conditioning on the boundary conditions. The linear boundary conditions can be interpreted as measurements of the solution of the PDE on the boundary. By conditioning on (some of) these measurements, we are not limited to satisfying the boundary conditions exactly, but can directly model uncertain constraints without having to resort to point estimates. Instead, we propagate the uncertainty to the solution estimate. This also allows us to handle cases where we do not have a functional form $g$ of the constraints, but only a discrete set of constraints at boundary points.

  • Conditioning on the PDE. Conditioning a probability measure over the solution on the analytic “observation” that the PDE holds is generally intractable. In the spirit of classic approaches for solving PDEs, we relax the PDE-constraint by requiring only a finite number of projections of the associated PDE residual onto carefully chosen test functions to be zero. This choice of projections defines the discretization and allows for control over the amount of expended computation. The resulting posterior quantifies the algorithm’s uncertainty within a whole set of solution candidates.

  • Conditioning on measurements. Finally, we can also condition on direct measurements of the solution itself. This is especially useful if parameters of the differential operator or boundary conditions are uncertain, or if the computational budget is restrictive.

The resulting posterior belief quantifies the uncertainty about the true solution induced by limited computation and partially-known physics (see figure 1(b)). By quantifying this error probabilistically, we can propagate it to any downstream analysis or decision. For example, to project the longevity of a newly designed electronic component, we want to simulate how likely it is to hit a critical temperature threshold during operation. Given our posterior belief, we can simply compute the marginal probability instead of performing Monte Carlo sampling, which would require repeated PDE solves at significant computational expense.
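As a minimal sketch of this last step, assume that the posterior marginal of the temperature at a single location is Gaussian with known mean and standard deviation. The function name `exceedance_probability` and the numbers below are illustrative assumptions and not part of the framework itself. Under this assumption, the probability of exceeding a threshold at that location follows in closed form from the Gaussian tail, without any sampling:

```python
import math


def exceedance_probability(mean: float, std: float, threshold: float) -> float:
    """Return P(u > threshold) for a Gaussian marginal u ~ N(mean, std**2)."""
    if std < 0.0:
        raise ValueError("std must be non-negative")
    if std == 0.0:
        # Degenerate (noise-free) belief: the event is either certain or impossible.
        return 1.0 if mean > threshold else 0.0
    z = (threshold - mean) / std
    # Gaussian upper tail: 1 - Phi(z) = 0.5 * erfc(z / sqrt(2)).
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical numbers: posterior mean 80 and standard deviation 5 (same units),
# with a critical threshold of 90, i.e. two standard deviations above the mean.
p_critical = exceedance_probability(mean=80.0, std=5.0, threshold=90.0)
```

Note that this only covers a pointwise marginal. The probability that the threshold is exceeded anywhere in the domain depends on the joint posterior and is not given by this formula.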

[Figure 1 panels: Prior, Boundary Conditions, PDE, Measurements]
(a) Learning to solve the Poisson equation. A problem-specific Gaussian process prior $\mathrm{u}$ is conditioned on partially-known physics, given by uncertain boundary conditions (BC) and a linear PDE, as well as on noisy physical measurements from experiment. The boundary conditions and the right-hand side of the PDE are not known but inferred from a small set of noise-corrupted measurements. The plots juxtapose the belief $\mathrm{u} \mid \cdots$ with the true solution $u^{\star}$ of the latent boundary value problem.
(b) Uncertainty quantification. Marginal posterior standard deviation after conditioning on uncertain boundary conditions, a linear PDE, and noisy (physical) measurements.
(c) Generalization of Classical Solvers. For certain priors our framework reproduces any method of weighted residuals, e.g. the finite element method, in its posterior mean.
Figure 1: A physics-informed Gaussian process framework for the solution of linear PDEs.

Contribution

We introduce a probabilistic learning framework for the solution of (systems of) linear PDEs. Our framework can be viewed as physics-informed Gaussian process regression. It is based on a crucial generalization of a popular result on conditioning GPs on linear observations to observations made via an arbitrary bounded linear operator with values in $\mathbb{R}^n$ (theorem 1). This enables combined quantification of uncertainty from the inherent discretization error, uncertain initial or boundary conditions, as well as noisy measurements of the solution. While connections between GP inference and the solution of PDEs were made in the past (see Section 3.5), corresponding methods have largely focused on estimating strong solutions by leveraging finite difference or collocation schemes. In contrast, our framework applies to both weak and strong formulations and generalizes a significantly broader class of existing numerical methods. Our approach is a strict probabilistic generalization of methods of weighted residuals (corollary 4), including collocation, finite volume, (pseudo)spectral, and (generalized) Galerkin methods such as finite element methods. The resulting probabilistic methods thus have the same convergence properties as their classic counterparts, while providing a structured error estimate. Moreover, the probabilistic viewpoint allows us to incorporate partially-known physics and (noisy) experimental measurements.
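To give some intuition for conditioning on linear observations, the following sketch treats a purely finite-dimensional analogue. It is not the operator-valued result of theorem 1. We assume a Gaussian belief $\bm{c} \sim \mathcal{N}(\bm{\mu}, \bm{\Sigma})$ over a coefficient vector and observations $\bm{y} = \bm{A}\bm{c} + \bm{\epsilon}$ with Gaussian noise, and apply the standard Gaussian conditioning formulas. The function name and the example matrix are hypothetical.

```python
import numpy as np


def condition_on_linear_observations(mu, Sigma, A, y, noise_cov=None):
    """Posterior of c ~ N(mu, Sigma) given y = A @ c + eps, eps ~ N(0, noise_cov)."""
    A = np.atleast_2d(A)
    if noise_cov is None:
        noise_cov = np.zeros((A.shape[0], A.shape[0]))  # noise-free observations
    cross = Sigma @ A.T                  # Cov(c, A c)
    gram = A @ Sigma @ A.T + noise_cov   # Cov(y, y), symmetric
    # gain = cross @ inv(gram), computed via a linear solve (gram is symmetric).
    gain = np.linalg.solve(gram, cross.T).T
    post_mean = mu + gain @ (y - A @ mu)
    post_cov = Sigma - gain @ cross.T
    return post_mean, post_cov


# Hypothetical example: a square, invertible A observed without noise.
A = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 3.0]])
c_true = np.array([1.0, -2.0, 0.5])
post_mean, post_cov = condition_on_linear_observations(
    mu=np.zeros(3), Sigma=np.eye(3), A=A, y=A @ c_true
)
```

In this special case the posterior mean equals the solution of the linear system $\bm{A}\bm{c} = \bm{y}$ and the posterior covariance vanishes. With fewer observations than unknowns, or with noise, the covariance stays nonzero and quantifies the remaining uncertainty.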

2 Background

2.1 Linear Partial Differential Equations

A linear partial differential equation (PDE) is an equation of the form

\begin{equation}
  \mathcal{D}[\bm{u}] = f, \tag{2.1}
\end{equation}

where $\mathcal{D} \colon \mathbb{U} \to \mathbb{V}$ is a linear differential operator (see definition 23) between a Banach space $\mathbb{U}$ of $\mathbb{R}^{d'}$-valued functions and a Banach space $\mathbb{V}$ of real-valued functions on a common open and bounded domain $\mathbb{D} \subset \mathbb{R}^{d}$, and $f \in \mathbb{V}$ is the right-hand side function. For simplicity of exposition, we will often focus on the case $d' = 1$, in which case we write $u$ instead of $\bm{u}$. Systems modeled by linear PDEs are often further constrained by linear boundary conditions (BCs) $\mathcal{B}[\bm{u}] = g$ describing the behavior of the system on the boundary $\partial\mathbb{D}$ of the domain, where $\mathcal{B}$ is a linear operator mapping functions $\bm{u} \in \mathbb{U}$ onto functions $\mathcal{B}[\bm{u}] \colon \partial\mathbb{D} \to \mathbb{R}$ defined on the boundary and $g \colon \partial\mathbb{D} \to \mathbb{R}$. Common types of boundary conditions for $d' = 1$ are:

  • Dirichlet: Specify the values of the solution on the boundary, i.e. $\mathcal{B}[u] = u|_{\partial\mathbb{D}}$.

  • Neumann: Specify the exterior normal derivative on the boundary, i.e. $\mathcal{B}[u](\bm{x}) \coloneqq \partial_{\bm{\eta}(\bm{x})} u(\bm{x})$, where $\bm{\eta}(\bm{x})$ is the exterior normal vector at each point of the boundary.

A PDE and a set of boundary conditions is referred to as a boundary value problem (BVP). A prototypical example of a linear PDE, used in thermodynamics, electrostatics and Newtonian gravity, is the Poisson equation $-\Delta u = f$, where $\Delta u = \sum_{i=1}^{d} \frac{\partial^{2} u}{\partial x_i^{2}}$ is the Laplacian.

2.1.1 Weak Formulation

Many models of physical phenomena are expressed as functions $\bm{u}$, which are not (continuously) differentiable or even continuous (Evans, 2010; Borthwick, 2018; von Harrach, 2021). In other words, they are not strong solutions to any PDE. There are also PDEs derived from established physical principles, which do not admit strong solutions at all. To address this, one can weaken the notion of differentiability leading to the concept of weak solutions. Many of the aforementioned physical phenomena are in fact weak solutions. As an example\footnote{Our exposition is a strongly abbreviated version of Evans (2010, Section 6.1.2).}, consider the weak formulation of the stationary heat equation for non-homogeneous media

\begin{equation}
  -\operatorname{div}\left(\kappa \nabla u\right) = \dot{q}_V. \tag{2.2}
\end{equation}

Assume that $u \in C^{2}(\mathbb{D})$, $\kappa \in C^{1}(\mathbb{D})$, and $\dot{q}_V \in C^{0}(\mathbb{D})$. If $u$ is a solution to equation 2.2, then we can integrate both sides of the equation against a test function $v \in C_c^{\infty}(\mathbb{D})$, i.e. an infinitely smooth function with compact support (see definition 24), which results in

\[
  -\int_{\mathbb{D}} \operatorname{div}\left(\kappa \nabla u\right)(\bm{x}) \, v(\bm{x}) \,\mathrm{d}\bm{x} = \int_{\mathbb{D}} \dot{q}_V(\bm{x}) \, v(\bm{x}) \,\mathrm{d}\bm{x}.
\]

Since both $u$ and $v$ are sufficiently differentiable, we can apply integration by parts (Green’s first identity) to the first integral to obtain

\begin{equation}
  \underbrace{\int_{\mathbb{D}} \langle \kappa(\bm{x}) \nabla u(\bm{x}), \nabla v(\bm{x}) \rangle \,\mathrm{d}\bm{x}}_{\eqqcolon B[u, v]} = \int_{\mathbb{D}} \dot{q}_V(\bm{x}) \, v(\bm{x}) \,\mathrm{d}\bm{x}, \tag{2.3}
\end{equation}

since $v|_{\partial\mathbb{D}} = 0$. This expression does not require $u$ to be twice differentiable. Rather, $u$ only needs to be once weakly differentiable (see Evans 2010, Section 5.2.1) with $(\nabla u)_i \in L_2(\mathbb{D})$. Intuitively speaking, a weak derivative of a (classically non-differentiable) function “behaves like a derivative” when integrated against a smooth test function. These relaxed requirements on $u$ are exactly the defining properties of the Sobolev space $H^{1}(\mathbb{D})$, i.e. it suffices that $u \in H^{1}(\mathbb{D})$. Similarly, we can weaken all other assumptions to $v \in H^{1}_{0}(\mathbb{D})$, $\dot{q}_V \in L_2(\mathbb{D})$ and $\kappa \in L_\infty(\mathbb{D})$. Then, for $u \in H^{1}(\mathbb{D})$ and $v \in H^{1}_{0}(\mathbb{D})$, equation 2.3 is equivalent to

\begin{equation}
  \mathcal{D}^{w}[u] = f^{w}, \tag{2.4}
\end{equation}

where $\mathcal{D}^{w} \colon H^{1}(\mathbb{D}) \to H^{1}_{0}(\mathbb{D})', u \mapsto B[u, \cdot]$ and $f^{w} = \langle \dot{q}_V, \cdot \rangle_{L_2(\mathbb{D})} \in H^{1}_{0}(\mathbb{D})'$. Here, $H^{1}_{0}(\mathbb{D})'$ denotes the continuous dual space of $H^{1}_{0}(\mathbb{D})$. We define a weak solution of equation 2.2 as $u \in H^{1}(\mathbb{D})$ such that equation 2.4, known as the weak or variational formulation, holds.

Definition 1.

A weak formulation of a linear PDE $\mathcal{D}[\bm{u}] = f$ is an equation of the form

\begin{equation}
  \mathcal{D}^{w}[\bm{u}] = f^{w}, \tag{2.5}
\end{equation}

where $\mathcal{D}^{w} \colon \mathbb{U} \to \mathbb{V}'$ is a linear operator induced by the differential operator $\mathcal{D}$ and $f^{w} \in \mathbb{V}'$ is a linear functional induced by the right-hand side $f$. A solution to equation 2.5 is called a weak solution of the PDE. In this context, $\mathcal{D}[\bm{u}] = f$ is called the strong formulation of the PDE and any solution to it is called a strong or classical solution.


2.1.2 Methods of Weighted Residuals

Unfortunately, linear PDEs both in weak and strong formulation are in general not analytically solvable, so approximate solutions are sought instead. Methods of weighted residuals (MWR) constitute a large family of popular numerical approximation schemes for linear PDEs, including collocation, finite volume, (pseudo)spectral, and (generalized) Galerkin methods such as finite-element methods (Fletcher, 1984). Loosely speaking, MWRs interpret a linear PDE as a root-finding problem for the associated PDE residual, i.e. $\mathcal{D}[\bm{u}] - f = 0$. Finding the solution of such a system of an uncountably infinite number of equations with infinitely many unknowns is generally intractable. To render the problem tractable, we reduce the number of equations by “projecting” onto $\mathbb{R}^{n}$ using a finite number of continuous linear test functionals $l^{(1)}, \dotsc, l^{(n)} \in \mathbb{V}'$, i.e. we use that the residual being zero implies

\begin{equation}
  l^{(i)}[\mathcal{D}[\bm{u}] - f] = (l^{(i)} \circ \mathcal{D})[\bm{u}] - l^{(i)}[f] = 0 \tag{2.6}
\end{equation}

for all $i = 1, \dotsc, n$. This is a relaxation of the original problem, since the above is not an equivalence but only an implication.\footnote{This means that equation 2.6 will generally have infinitely many solutions and needs regularization to have a unique solution.} A common choice for the test functionals appearing in a large class of MWRs is the integral $l^{(i)}[v] \coloneqq \int_{\mathbb{D}} \psi^{(i)}(x) v(x) \,\mathrm{d}x$, where $\psi^{(i)} \in \mathbb{V}$ is a so-called test function. In this case, the test functionals define a weighted average of the current residual, giving rise to the name of the method.

To reduce the number of unknowns, MWRs also often approximate the unknown solution function $\bm{u}$ via finite linear combinations of trial functions $\bm{\phi}^{(1)}, \dotsc, \bm{\phi}^{(m)} \in \mathbb{U}$, i.e.

\begin{equation}
  \bm{u} \approx \hat{\bm{u}} \coloneqq \sum_{i=1}^{m} c_i \bm{\phi}^{(i)}, \tag{2.7}
\end{equation}

where $\bm{c} \in \mathbb{R}^{m}$ is the coordinate vector of $\hat{\bm{u}}$ in the finite-dimensional subspace $\hat{\mathbb{U}} \coloneqq \operatorname{span}\left(\bm{\phi}^{(1)}, \dotsc, \bm{\phi}^{(m)}\right) \subset \mathbb{U}$. By substituting equation 2.7 into equation 2.6, we arrive at a linear system $\hat{\bm{D}} \bm{c} = \hat{\bm{f}}$, where $\hat{D}_{ij} \coloneqq l^{(i)}[\mathcal{D}[\bm{\phi}^{(j)}]]$ and $\hat{f}_i \coloneqq l^{(i)}[f]$. Hence, the approximate solution function obtained from this method is given by

\begin{equation}
  \bm{u}^{\mathrm{MWR}} = \sum_{i=1}^{m} c^{\mathrm{MWR}}_i \bm{\phi}^{(i)}, \qquad \text{where} \qquad \bm{c}^{\mathrm{MWR}} = \hat{\bm{D}}^{-1} \hat{\bm{f}} \tag{2.8}
\end{equation}

assuming that $\hat{\bm{D}}$ is invertible. Above, we implicitly assume that the trial functions $\bm{\phi}^{(i)}$ satisfy the boundary conditions, i.e. we describe so-called interior methods.\footnote{By stacking the residuals corresponding to the PDE and the boundary conditions, the approach outlined here can be used to realize mixed methods, which solve the boundary value problem without requiring that $\hat{\bm{u}}$ fulfills the boundary conditions by construction.}
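The construction of the linear system can be illustrated with a small collocation example. The sketch below is only one possible instance, under assumptions that are not taken from the text: the model problem $-u'' = f$ on $(0, 1)$ with homogeneous Dirichlet boundary conditions, sine trial functions (which satisfy these boundary conditions, so this is an interior method), and point-evaluation test functionals $l^{(i)}[v] = v(x_i)$. All names are hypothetical.

```python
import numpy as np

# Assumed model problem: -u''(x) = f(x) on (0, 1) with u(0) = u(1) = 0.
m = 3  # number of trial functions (and of test functionals)
freqs = np.arange(1, m + 1)


def trial(x):
    """Trial functions phi_j(x) = sin(j*pi*x); column j-1 holds phi_j."""
    return np.sin(np.pi * np.outer(np.atleast_1d(x), freqs))


def D_trial(x):
    """D[phi_j](x) = -phi_j''(x) = (j*pi)^2 * sin(j*pi*x)."""
    return (np.pi * freqs) ** 2 * trial(x)


def f(x):
    """Right-hand side chosen so that the exact solution is u(x) = sin(pi*x)."""
    return np.pi**2 * np.sin(np.pi * x)


# Point-evaluation test functionals l_i[v] = v(x_i) at interior collocation points.
x_col = np.arange(1, m + 1) / (m + 1)

D_hat = D_trial(x_col)   # D_hat[i, j] = l_i[D[phi_j]]
f_hat = f(x_col)         # f_hat[i]    = l_i[f]
c_mwr = np.linalg.solve(D_hat, f_hat)


def u_mwr(x):
    """Approximate solution u_MWR(x) = sum_j c_j * phi_j(x)."""
    return trial(x) @ c_mwr
```

Because the exact solution lies in the span of the trial functions here, the coefficient vector recovers it up to floating-point error. For a general right-hand side, the result is only an approximation whose quality depends on the chosen trial functions and collocation points.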

The procedure outlined above can also be applied to approximate weak solutions to linear PDEs by simply substituting $\mathcal{D} \leftarrow \mathcal{D}^{w}$, $f \leftarrow f^{w}$, and $\mathbb{V} \leftarrow \mathbb{V}'$. In this case, it is customary to employ test functionals $l^{(i)} \in \mathbb{V}''$ induced by test functions $\psi^{(i)} \in \mathbb{V}$ such that $l^{(i)}[\mathcal{D}^{w}[\bm{u}]] = \mathcal{D}^{w}[\bm{u}][\psi^{(i)}]$ and $l^{(i)}[f^{w}] = f^{w}[\psi^{(i)}]$.\footnote{This uses the fact that there is an isometric embedding $\iota \colon \mathbb{V} \to \mathbb{V}'', v \mapsto (l \mapsto l[v])$, where $\mathbb{V}''$ denotes the strong bidual of $\mathbb{V}$ (Yosida, 1995, Section IV.8).} In particular, in the example from section 2.1.1, this implies $l^{(i)}[\mathcal{D}^{w}[\bm{u}]] = B[\bm{u}, \psi^{(i)}]$ and $l^{(i)}[f^{w}] = \langle \dot{q}_V, \psi^{(i)} \rangle_{L_2(\mathbb{D})}$. Following Fletcher (1984), we will also refer to these methods as methods of weighted residuals.

Table 1 lists the aforementioned examples of MWRs together with the corresponding trial and test function(al)s that induce the method.

2.2 Gaussian Processes

A Gaussian process (GP) ${\mathrm{f}}$ with index set ${\mathbb{X}}$ is a family $\{{\mathrm{f}}_{\bm{x}}\}_{{\bm{x}}\in{\mathbb{X}}}$ of real-valued random variables on a common probability space $(\Omega,\mathcal{F},\mathrm{P})$, such that, for each finite set of indices ${\bm{x}}_{1},\dotsc,{\bm{x}}_{n}\in{\mathbb{X}}$, the joint distribution of ${\mathrm{f}}_{{\bm{x}}_{1}},\dotsc,{\mathrm{f}}_{{\bm{x}}_{n}}$ is Gaussian. We also write ${\mathrm{f}}({\bm{x}})\coloneqq{\mathrm{f}}_{\bm{x}}$ and ${\mathrm{f}}({\bm{x}},\omega)\coloneqq{\mathrm{f}}_{\bm{x}}(\omega)$.
The function ${\bm{x}}\mapsto\operatorname{\mathbb{E}}_{\mathrm{P}}\left[{\mathrm{f}}({\bm{x}})\right]$ is called the mean (function) of ${\mathrm{f}}$ and the function $({\bm{x}}_{1},{\bm{x}}_{2})\mapsto\operatorname{Cov}_{\mathrm{P}}\left[{\mathrm{f}}({\bm{x}}_{1}),{\mathrm{f}}({\bm{x}}_{2})\right]$ is called the covariance function or kernel of ${\mathrm{f}}$. We write ${\mathrm{f}}\sim\operatorname{\mathcal{GP}}\left(m,k\right)$ to indicate that ${\mathrm{f}}$ is a Gaussian process with mean function $m$ and covariance function $k$. For each $\omega\in\Omega$, the function ${\mathrm{f}}(\cdot,\omega)\colon{\mathbb{X}}\to\mathbb{R},{\bm{x}}\mapsto{\mathrm{f}}({\bm{x}},\omega)$ is called a sample or (sample) path of the Gaussian process.
We denote the set of all sample paths of ${\mathrm{f}}$ by $\operatorname{paths}\left({\mathrm{f}}\right)\coloneqq\{{\mathrm{f}}(\cdot,\omega)\colon\omega\in\Omega\}\subset\mathbb{R}^{{\mathbb{X}}}$.

The sample paths of Gaussian processes are always real-valued. However, especially in the context of PDEs, vector-valued functions are ubiquitous, e.g. when dealing with vector fields such as the electric field. Fortunately, the index set of a Gaussian process can be chosen freely, which means that we can “emulate” vector-valued GPs. More precisely, a function ${\bm{f}}\colon{\mathbb{X}}\to\mathbb{R}^{d^{\prime}}$ can be equivalently viewed as a function $\tilde{f}\colon\{1,\dotsc,d^{\prime}\}\times{\mathbb{X}}\to\mathbb{R}$ with $\tilde{f}(i,{\bm{x}})\coloneqq{f}_{i}({\bm{x}})$.
Applying this construction to a Gaussian process leads to the notion of a multi-output Gaussian process: A $d^{\prime}$-output Gaussian process ${\bm{\mathrm{f}}}$ with index set ${\mathbb{X}}$ is a family $\{{\bm{\mathrm{f}}}_{\bm{x}}\}_{{\bm{x}}\in{\mathbb{X}}}$ of $\mathbb{R}^{d^{\prime}}$-valued random variables on $(\Omega,\mathcal{F},\mathrm{P})$ such that $\tilde{{\mathrm{f}}}\coloneqq\{({\bm{\mathrm{f}}}_{\bm{x}})_{i}\}_{(i,{\bm{x}})\in\{1,\dotsc,d^{\prime}\}\times{\mathbb{X}}}$ is a Gaussian process. As before, we define ${\bm{\mathrm{f}}}({\bm{x}})\coloneqq{\bm{\mathrm{f}}}_{\bm{x}}$ and ${\bm{\mathrm{f}}}({\bm{x}},\omega)\coloneqq{\bm{\mathrm{f}}}_{\bm{x}}(\omega)$.
The mean function ${\bm{m}}\colon{\mathbb{X}}\to\mathbb{R}^{d^{\prime}}$ and covariance function ${\bm{k}}\colon{\mathbb{X}}\times{\mathbb{X}}\to\mathbb{R}^{d^{\prime}\times d^{\prime}}$ of ${\bm{\mathrm{f}}}$ are defined by

\[
{\bm{m}}({\bm{x}})=\begin{pmatrix}\tilde{m}(1,{\bm{x}})\\ \vdots\\ \tilde{m}(d^{\prime},{\bm{x}})\end{pmatrix}
\quad\text{and}\quad
{\bm{k}}({\bm{x}}_{1},{\bm{x}}_{2})=\begin{pmatrix}\tilde{k}((1,{\bm{x}}_{1}),(1,{\bm{x}}_{2}))&\ldots&\tilde{k}((1,{\bm{x}}_{1}),(d^{\prime},{\bm{x}}_{2}))\\ \vdots&\ddots&\vdots\\ \tilde{k}((d^{\prime},{\bm{x}}_{1}),(1,{\bm{x}}_{2}))&\ldots&\tilde{k}((d^{\prime},{\bm{x}}_{1}),(d^{\prime},{\bm{x}}_{2}))\end{pmatrix},
\]

where $\tilde{{\mathrm{f}}}\sim\operatorname{\mathcal{GP}}\left(\tilde{m},\tilde{k}\right)$, and we write ${\bm{\mathrm{f}}}\sim\operatorname{\mathcal{GP}}\left({\bm{m}},{\bm{k}}\right)$.
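The correspondence between the matrix-valued kernel ${\bm{k}}$ and the scalar kernel $\tilde{k}$ on the extended index set can be made concrete with a short sketch. It is only an illustration: the separable kernel, the output covariance `B`, and all function names are assumptions introduced here, and indices are 0-based rather than 1-based.

```python
import numpy as np

def scalar_rbf(x1, x2, lengthscale=1.0):
    """Squared-exponential kernel on the input space (assumed base kernel)."""
    return np.exp(-0.5 * np.sum((x1 - x2) ** 2) / lengthscale**2)

# Hypothetical covariance between the d' = 2 outputs.
B = np.array([[1.0, 0.5], [0.5, 2.0]])

def k_matrix(x1, x2):
    """Matrix-valued kernel k(x1, x2) in R^{d' x d'} (separable for illustration)."""
    return B * scalar_rbf(x1, x2)

def k_tilde(i, x1, j, x2):
    """Scalar kernel on the extended index set {0, ..., d'-1} x X."""
    return k_matrix(x1, x2)[i, j]

def gram(points):
    """Gram matrix of the flattened process over all pairs (i, x)."""
    d = B.shape[0]
    index = [(i, x) for i in range(d) for x in points]
    return np.array(
        [[k_tilde(i, x1, j, x2) for (j, x2) in index] for (i, x1) in index]
    )

points = [np.array([0.0]), np.array([0.5]), np.array([2.0])]
K = gram(points)  # shape (d' * N, d' * N) = (6, 6)
```

Any finite collection of outputs of the multi-output process is then jointly Gaussian with a covariance matrix assembled from `k_tilde`, which is all that the definition above requires.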

3 Learning the Solution to a Linear PDE

Consider a linear partial differential equation $\mathcal{D}[{\bm{u}}]=f$ subject to linear boundary conditions $\mathcal{B}[{\bm{u}}]=g$ as in section 2.1. Our goal is to find a solution ${\bm{u}}\in{\mathbb{U}}$ satisfying the PDE for (partially) known $(\mathcal{D},f)$ and $(\mathcal{B},g)$. In general, one cannot find a closed-form expression for the solution ${\bm{u}}$ (Borthwick, 2018). Therefore, we aim to compute an accurate approximation $\hat{{\bm{u}}}\approx{\bm{u}}$ instead. Motivated by the challenges (C1), (C2) and (C3) of partial information inherent to numerically solving PDEs, we approach the problem from a statistical inference perspective. In other words, we will learn the solution of the PDE from multiple heterogeneous sources of information. This way we can quantify the epistemic uncertainty about the solution at any time during the computation, as figure 1(a) illustrates.

Indirectly Observing the Solution of a PDE

Typically, we think of observations as a finite number of direct measurements ${\bm{u}}({\bm{x}}_{i})={\bm{y}}_{i}$ of the latent function ${\bm{u}}$. As it turns out, we can generalize this notion of a measurement and even interpret the PDE itself as an (indirect) observation of ${\bm{u}}$. As an example, consider the important case where ${\bm{u}}$ models the state of a physical system. The laws of physics governing such a system are often formulated as conservation laws in the language of PDEs. For example, they may require physical quantities like mass, momentum, charge or energy to be conserved over time.

Example 1 (Thermal Conduction and the Heat Equation).

Say we want to simulate heat conduction in a solid object with shape ${\mathbb{D}}\subset\mathbb{R}^{3}$, i.e. we want to find the time-varying temperature distribution $u\colon[0,T]\times{\mathbb{D}}\to\mathbb{R}$. Neglecting radiation and convection, $u(t,{\bm{x}})$ is described by a linear PDE known as the heat equation (Lienhard and Lienhard, 2020). Assuming spatially and temporally uniform material parameters $c_{p},\rho,\kappa\in\mathbb{R}$, it reduces to

\[
\left(c_{p}\rho\frac{\partial}{\partial t}-\kappa\Delta\right)u-\dot{q}_{V}=0. \tag{3.1}
\]

Thermal conduction is described by $-\kappa\Delta u$, while $\dot{q}_{V}$ are local heat sources, e.g. from electric currents. Any energy flowing into a region due to conduction or a heat source is balanced by an increase in energy of the material. The net-zero balance shows that energy is conserved.

Notice how a conservation law is an observation of the behavior of the physical system! To formalize this, we begin by rephrasing the classical notion of an observation at a point ${\bm{x}}_{i}$ as measuring the result of a specific linear operator applied to the solution ${\bm{u}}$:

\[
{\bm{u}}({\bm{x}}_{i})={\bm{y}}_{i}\iff\delta_{{\bm{x}}_{i}}[{\bm{u}}]={\bm{y}}_{i}
\]

where $\delta_{{\bm{x}}_{i}}$ is the evaluation functional. Now, the key idea is to generalize the notion of a direct observation to collecting information about the solution via an arbitrary linear operator ${\bm{\mathcal{L}}}$ with values in $\mathbb{R}^{n}$ applied to the solution ${\bm{u}}$, such that ${\bm{\mathcal{L}}}[{\bm{u}}]={\bm{y}}\iff{\bm{\mathcal{L}}}[{\bm{u}}]-{\bm{y}}={\bm{0}}$. The affine operator

\[
{\bm{\mathcal{I}}}[{\bm{u}}]\coloneqq{\bm{\mathcal{L}}}[{\bm{u}}]-{\bm{y}} \tag{3.2}
\]

is a specific kind of information operator (Cockayne et al., 2019b). In this setting the information operator may describe a conservation law as in equation 3.1, a general linear PDE of the form (2.1) or an arbitrary affine operator mapping a function space into $\mathbb{R}^{n}$. This generalized notion of an observation turns out to be a powerful way to incorporate different kinds of mathematical, physical, or experimental properties of the solution. Since PDEs and conservation laws are often assumed to hold exactly, we focused on noise-free observations above. However, generally we are not limited to this case and can also model ${\bm{y}}$ as a random variable, in which case the information operator $\mathcal{I}[({\bm{u}},{\bm{y}})]$ is a (jointly) linear functional of the solution ${\bm{u}}$ and the right-hand side ${\bm{y}}$.

3.1 Solving PDEs as a Bayesian Inference Problem

One of the main challenges (C1), (C2) and (C3) outlined in the beginning is the limited computational budget available to us to approximate the solution. Fortunately, in practice, the solution ${\bm{u}}$ is not hopelessly unconstrained: we usually have a-priori information about it. At the very least, we know the space of functions ${\mathbb{U}}$ in which to search for the solution. Additionally, we might have expert knowledge about its rough shape and value range, or solutions to related PDEs at our disposal. Now, the question becomes: How do we combine this prior knowledge with indirect observations of the solution through the information operator ${\bm{\mathcal{I}}}$ in equation 3.2? To do so, we turn to the Bayesian inference framework. This provides a different perspective on the numerical problem of solving a linear PDE as a learning task.

Gaussian Process Inference

We represent our belief about the solution of the linear PDE via a (multi-output) Gaussian process ${\bm{\mathrm{u}}}\sim\operatorname{\mathcal{GP}}\left({\bm{m}},{\bm{k}}\right)$ with mean function ${\bm{m}}\colon{\mathbb{D}}\to\mathbb{R}^{d^{\prime}}$ and kernel ${\bm{k}}\colon{\mathbb{D}}\times{\mathbb{D}}\to\mathbb{R}^{d^{\prime}\times d^{\prime}}$. Gaussian processes are well-suited for this purpose since:

(i) For an appropriate choice of kernel, the Gaussian process defines a probability measure over the function space in which the PDE’s solution is sought.

(ii) Kernels provide a powerful modeling toolkit to incorporate prior information (e.g. variability, periodicity, multi-scale effects, in- / equivariances, …) in a modular fashion.

(iii) Measurement noise often follows a Gaussian distribution.

(iv) Conditioning a Gaussian process on observations made via a linear map again results in a Gaussian process.

While the result in (iv) is used ubiquitously in the literature, its general form, where observations are made via arbitrary linear operators with values in $\mathbb{R}^{n}$ as opposed to finite-dimensional linear maps, has to the best of our knowledge only been rigorously demonstrated for Gaussian measures on separable Hilbert spaces, not for the Gaussian process perspective. The two perspectives are closely related, but there are thorny technical difficulties to consider. We intentionally frame the problem from the Gaussian process perspective to make use of the expressive modeling capabilities provided by the kernel. Our framework at its core relies on this result, which we explain in detail in section 4 and prove in section B.3.

3.1.1 Encoding Prior Knowledge about the Solution

We can infer the solution of a linear PDE more quickly by specifying inductive biases in the prior, which can encode both provable and approximately known properties of the solution.\footnote{In the special case of GP regression, if the prior smoothness matches the smoothness of the target function ${\bm{u}}$, the convergence rate is optimal in the number of observations (Kanagawa et al., 2018, Thm. 5.1).}

Function Space of the Solution

The most basic known property derived from the PDE is an appropriate choice of function space for the solution. For strong solutions, this can be done by inspecting the differential operator $\mathcal{D}$ and keeping track of the partial derivatives. In fact, in an implementation this can be automatically derived solely from the problem definition, e.g. by compositionally defining differential operators and storing information on the necessary differentiability. Let ${\beta}_{i}\in\mathbb{N}_{0}$ be the maximum number of times any partial derivative in the differential operator $\mathcal{D}$ differentiates w.r.t. the variable ${x}_{i}$.\footnote{Formally, ${\bm{\beta}}\in\mathbb{N}_{0}^{d}$ is the “smallest” multi-index such that ${\bm{\alpha}}\leq{\bm{\beta}}$ for every multi-index ${\bm{\alpha}}$ of a partial derivative occurring in $\mathcal{D}$ (see definitions 22 and 23).} Then a sensible choice of solution space is the space ${\mathbb{U}}=C^{{\bm{\beta}}}(\overline{{\mathbb{D}}})$ (see section B.2).
To define a prior with paths in this solution space, a common choice of prior covariance function is the tensor product of one-dimensional half-integer Matérn kernels $k_{{\nu}_{i}}$ with ${\nu}_{i}={\beta}_{i}+\frac{1}{2}$ (see section B.4). For weak solutions, the Sobolev spaces ${\mathbb{U}}=H^{m}\left({\mathbb{D}}\right)$ are prototypical choices of solution spaces. In this case, a (multivariate) Matérn kernel with smoothness parameter $\nu=m+\frac{1}{2}$ is a useful default prior covariance function. In both cases, a parametric kernel $k({\bm{x}}_{0},{\bm{x}}_{1})={\bm{\phi}}({\bm{x}}_{0})^{\top}{\bm{\Sigma}}{\bm{\phi}}({\bm{x}}_{1})$ is also a valid choice if ${\phi}_{i}\in{\mathbb{U}}$. See section B.4.2 for a detailed account of how to choose priors for physics-informed GP regression.
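As a rough illustration of the tensor-product Matérn prior described above, the following sketch evaluates one-dimensional half-integer Matérn kernels via their standard closed form (exponential times polynomial) and multiplies them across coordinates. The function names, the unit output scale, and the example values of ${\bm{\beta}}$ and the lengthscales are assumptions made for this sketch only.

```python
import math
import numpy as np

def matern_half_integer(r, p, lengthscale=1.0):
    """1D Matérn kernel with nu = p + 1/2, as a function of the distance r."""
    nu = p + 0.5
    s = math.sqrt(2.0 * nu) * np.abs(r) / lengthscale
    poly = sum(
        math.factorial(p + i)
        / (math.factorial(i) * math.factorial(p - i))
        * (2.0 * s) ** (p - i)
        for i in range(p + 1)
    )
    return np.exp(-s) * math.factorial(p) / math.factorial(2 * p) * poly

def tensor_product_matern(x0, x1, beta, lengthscales):
    """Product over coordinates of 1D Matérn kernels with nu_i = beta_i + 1/2."""
    value = 1.0
    for a, b, beta_i, ell in zip(x0, x1, beta, lengthscales):
        value *= matern_half_integer(a - b, beta_i, ell)
    return value

# Example: a 1D heat equation in (t, x) differentiates once in t and twice in x,
# so beta = (1, 2) and the factors are Matérn-3/2 in t and Matérn-5/2 in x.
k_val = tensor_product_matern((0.0, 0.3), (0.1, 0.5), beta=(1, 2), lengthscales=(1.0, 0.5))
```

Choosing one smoothness per coordinate keeps the prior from being smoother than necessary in variables that the differential operator differentiates only a few times.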

Symmetries, In- and Equivariances

Many solutions of PDEs exhibit a-priori known symmetries. For example, to calculate the strength of a magnet rotated by ${\bm{R}}\colon\mathbb{R}^{3}\to\mathbb{R}^{3}$, one can equivalently compute the field ${\bm{B}}$ of the magnet in its original position and rotate the field, i.e. ${\bm{B}}({\bm{R}}{\bm{x}})={\bm{R}}{\bm{B}}({\bm{x}})$. Inductive biases reflecting symmetries can be encoded via kernels that are invariant, ${\bm{k}}({\bm{\rho}}_{g}{\bm{x}}_{0},{\bm{\rho}}_{g}{\bm{x}}_{1})={\bm{k}}({\bm{x}}_{0},{\bm{x}}_{1})$, or equivariant, ${\bm{k}}({\bm{\rho}}_{g}{\bm{x}}_{0},{\bm{\rho}}_{g}{\bm{x}}_{1})={\bm{\rho}}_{g}{\bm{k}}({\bm{x}}_{0},{\bm{x}}_{1}){\bm{\rho}}_{g}^{*}$, where ${\bm{\rho}}_{g}$ is a unitary group representation. The most commonly used kernels are stationary, i.e. translation invariant, but one can also construct invariant (Haasdonk and Burkhardt, 2007; Azangulov et al., 2022), as well as equivariant kernels (Reisert and Burkhardt, 2007; Holderrieth et al., 2021) for many other group actions.
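One simple way to obtain an invariant kernel for a finite group, shown here purely as an illustrative sketch and not necessarily the construction used in the cited works, is to average a base kernel over all pairs of group elements. The anisotropic base kernel, the group of planar rotations by multiples of 90 degrees, and all names below are assumptions.

```python
import numpy as np

LENGTHSCALES = np.array([0.5, 2.0])  # anisotropic, so the base kernel is not rotation invariant

def base_kernel(x0, x1):
    """Anisotropic squared-exponential kernel on R^2."""
    diff = (np.asarray(x0) - np.asarray(x1)) / LENGTHSCALES
    return np.exp(-0.5 * np.sum(diff**2))

def rotation(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

# Hypothetical finite group: rotations of the plane by multiples of 90 degrees.
GROUP = [rotation(k * np.pi / 2) for k in range(4)]

def invariant_kernel(x0, x1):
    """Average the base kernel over all pairs of group elements."""
    return np.mean([[base_kernel(g @ x0, h @ x1) for h in GROUP] for g in GROUP])

x0, x1 = np.array([1.0, 0.0]), np.array([0.5, 0.2])
R = GROUP[1]
same = np.isclose(invariant_kernel(R @ x0, R @ x1), invariant_kernel(x0, x1))
```

Because the group is closed under composition, applying any group element to either argument only permutes the terms of the average, so the averaged kernel is invariant while remaining positive semi-definite.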

Related Problems

Given a set of solutions from related problems, the prior mean function can be set to a combination thereof and the prior kernel can then be chosen to reflect how related the problems are. For example, if we have an approximate solution of a PDE computed on a coarser mesh, we can condition our function space prior on the coarse solution with a noise level reflecting the fidelity of the discretization. Similarly, if we solved the same PDE with different parameters, we can condition on the available solutions with a noise level chosen according to how similar the parameters are to the one of interest.

Domain Expertise

Domain experts often have approximate knowledge of what solutions can be expected, either from experience, previous experiments, or familiarity with the physical interpretation of the solution ${\bm{u}}$. For example, an engineer who designs electrical components is likely able to give realistic temperature ranges for a component for which we aim to simulate the temperature distribution. This can be included by choosing the (initial) kernel hyperparameters, such as the output- and lengthscales, based on this expertise.

3.1.2 (Indirectly) Observing the Solution

From a computational perspective, the most important reason for choosing Gaussian processes is that, when conditioning on linear observations, the resulting posterior is again a Gaussian process with closed-form mean and covariance function (Bishop, 2006). We extend this classic result from observations via a finite-dimensional linear map to general $\mathbb{R}^{n}$-valued linear operators in Theorem 1. This is crucial to condition on the different types of observations, most importantly the PDE itself, made via the information operator in (3.2). Given such an affine observation defined via a linear operator $\bm{\mathcal{L}} \colon \mathbb{U} \to \mathbb{R}^{n}$ and an independent Gaussian random variable $\bm{\mathrm{\epsilon}} \sim \mathcal{N}(\bm{\mu}, \bm{\Sigma})$, we can use theorem 1 to condition our prior belief on the observations and obtain a posterior of the form $\bm{\mathrm{u}} \mid (\bm{\mathcal{L}}[\bm{\mathrm{u}}] + \bm{\mathrm{\epsilon}} = \bm{y}) \sim \mathcal{GP}(\bm{m}^{\bm{\mathrm{u}} \mid \bm{y}}, \bm{k}^{\bm{\mathrm{u}} \mid \bm{y}})$ with mean and covariance function given by

\begin{align}
m^{\bm{\mathrm{u}} \mid \bm{y}}_{i}(\bm{x}) &= m_{i}(\bm{x}) + \bm{\mathcal{L}}[k_{:,i}(\cdot, \bm{x})]^{\top} \left( \bm{\mathcal{L}} \bm{k} \bm{\mathcal{L}}' + \bm{\Sigma} \right)^{\dagger} \left( \bm{y} - (\bm{\mathcal{L}}[\bm{m}] + \bm{\mu}) \right), \tag{3.3} \\
k^{\bm{\mathrm{u}} \mid \bm{y}}_{i,j}(\bm{x}_{1}, \bm{x}_{2}) &= k_{i,j}(\bm{x}_{1}, \bm{x}_{2}) - \bm{\mathcal{L}}[k_{:,i}(\cdot, \bm{x}_{1})]^{\top} \left( \bm{\mathcal{L}} \bm{k} \bm{\mathcal{L}}' + \bm{\Sigma} \right)^{\dagger} \bm{\mathcal{L}}[k_{:,j}(\cdot, \bm{x}_{2})]. \tag{3.4}
\end{align}
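The structure of (3.3) and (3.4) is easiest to see in finite dimensions. The following is a minimal sketch, assuming the process has been replaced by a Gaussian vector with mean `m` and covariance `K`, so that the linear operator becomes a matrix `A`. The function name and the toy numbers are hypothetical and not taken from the paper.

```python
import numpy as np


def condition_on_affine_observation(m, K, A, y, mu=None, Sigma=None):
    """Condition u ~ N(m, K) on the event A @ u + eps = y, eps ~ N(mu, Sigma).

    Finite-dimensional analogue of (3.3) and (3.4): the matrix A plays the
    role of the linear operator L, and K the role of the prior covariance k.
    """
    n_obs = A.shape[0]
    mu = np.zeros(n_obs) if mu is None else mu
    Sigma = np.zeros((n_obs, n_obs)) if Sigma is None else Sigma

    cross = K @ A.T                    # analogue of L[k(., x)]
    gram = A @ K @ A.T + Sigma         # analogue of L k L' + Sigma
    gram_pinv = np.linalg.pinv(gram)   # pseudo-inverse (the dagger in the text)

    m_post = m + cross @ gram_pinv @ (y - (A @ m + mu))
    K_post = K - cross @ gram_pinv @ cross.T
    return m_post, K_post


# Usage: observe the sum of the first two of three correlated coordinates.
m = np.zeros(3)
K = np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.5], [0.2, 0.5, 1.0]])
A = np.array([[1.0, 1.0, 0.0]])
y = np.array([2.0])
m_post, K_post = condition_on_affine_observation(m, K, A, y)
```

With a noise-free observation, the posterior reproduces the observed value exactly and the marginal variances can only shrink, which is why the covariance update subtracts a positive semi-definite term.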

We will now look more closely at how we can condition on the boundary conditions, the PDE itself and direct measurements of the solution.

Observing the Solution via the PDE

The differential operator $\mathcal{D}$ in equation 2.1 is linear and therefore it is tempting to define the information operator $\mathcal{I}[\bm{\mathrm{u}}] = \mathcal{D}[\bm{\mathrm{u}}] - f$ and attempt to condition on $\mathcal{I}[\bm{\mathrm{u}}] = 0$. Under some assumptions on $\mathbb{U}$, $\mathcal{D}$, and $\bm{\mathrm{u}}$, one can even show that this is well-defined. Unfortunately, it turns out that computing the posterior moments is then at least as hard as solving the PDE directly and thus typically intractable in practice. Loosely speaking, this is because $f$ is a function and hence $\mathcal{D}[\bm{\mathrm{u}}] = f$ corresponds to an infinite number of observations. However, by only enforcing the PDE at a finite number of points in the domain, we can immediately give a canonical example of an approximation to this intractable information operator. Concretely, we can condition $\bm{\mathrm{u}}$ on the fact that the PDE holds at a finite sequence of well-chosen domain points $\bm{X}_{\text{PDE}} = (\bm{x}_{i})_{i=1}^{n} \in \mathbb{D}^{n}$, i.e. we compute $\bm{\mathrm{u}} \mid (\mathcal{D}[\bm{\mathrm{u}}](\bm{X}_{\text{PDE}}) - f(\bm{X}_{\text{PDE}}) = 0)$ by choosing $\bm{\mathcal{L}} = \delta_{\bm{X}_{\text{PDE}}} \circ \mathcal{D}$ and $\bm{y} = f(\bm{X}_{\text{PDE}})$. If the set $\bm{X}_{\text{PDE}}$ of domain points is dense enough, we obtain a good approximation to the exact conditional process. This approach, known as the probabilistic meshless method (Cockayne et al., 2017), is analogous to existing non-probabilistic approaches to solving PDEs, commonly referred to as collocation methods, wherein the points $\bm{X}_{\text{PDE}}$ are called collocation points.

Satisfying the PDE at a set of collocation points is far from the only choice within our general framework. For example, we can choose a set of test functions $l^{(i)} \in \mathbb{V}'$ with which we observe the PDE, such that $\mathcal{L}_{i}[\bm{\mathrm{u}}] = l^{(i)}[\mathcal{D}[\bm{\mathrm{u}}]]$ and $\bm{y}_{i} = l^{(i)}[f]$. For efficient evaluation of the differential operator we can further represent the solution in a basis of trial functions from a subspace $\hat{\mathbb{U}}$, resulting in $\mathcal{L}_{i}[\bm{\mathrm{u}}] = l^{(i)}[\mathcal{D}[\mathcal{P}_{\hat{\mathbb{U}}}[\bm{\mathrm{u}}]]]$. This turns out to be very powerful and is analogous to some of the most successful classical PDE solvers. In fact, for certain priors and choices of subspaces, our framework recovers several important classic solvers in the posterior mean (see Section 3.3.4). The above can be applied to both time-dependent and time-independent PDEs and regardless of the type of linear PDE (e.g. elliptic, parabolic, hyperbolic). Moreover, an extension to systems of linear PDEs is straightforward.
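To make the collocation choice concrete, the quantities entering (3.3) and (3.4) can be written out entrywise. The following is a sketch for a scalar-valued prior $\mathrm{u} \sim \mathcal{GP}(m, k)$, assuming $k$ is smooth enough for $\mathcal{D}$ to be applied to each of its arguments. The subscripts on $\mathcal{D}$, which indicate the argument of $k$ it acts on, are notation introduced here for illustration.

\begin{align*}
\left( \bm{\mathcal{L}}[k(\cdot, \bm{x})] \right)_{i} &= (\mathcal{D}_{1} k)(\bm{x}_{i}, \bm{x}), \\
\left( \bm{\mathcal{L}} k \bm{\mathcal{L}}' \right)_{ij} &= (\mathcal{D}_{1} \mathcal{D}_{2} k)(\bm{x}_{i}, \bm{x}_{j}), \\
\left( \bm{\mathcal{L}}[m] \right)_{i} &= \mathcal{D}[m](\bm{x}_{i}), \qquad y_{i} = f(\bm{x}_{i}).
\end{align*}

In this special case, the $n \times n$ matrix $\bm{\mathcal{L}} k \bm{\mathcal{L}}'$ is the covariance matrix of the random vector $\mathcal{D}[\mathrm{u}](\bm{X}_{\text{PDE}})$, so conditioning on the PDE amounts to ordinary Gaussian conditioning on $n$ derivative observations.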

Observing the Solution at the Boundary

As for the PDE, we could attempt to directly condition on the boundary conditions by choosing $\bm{\mathcal{L}} = \mathcal{B}$ and $\bm{y} = g$. However, we are faced with the same intractability issues that we discussed above. Instead, we observe that the boundary conditions hold at a finite set of points $\bm{X}_{\text{BC}} \subset \partial\mathbb{D}$, i.e. $\bm{\mathcal{L}} = \delta_{\bm{X}_{\text{BC}}} \circ \mathcal{B}$ and $\bm{y} = g(\bm{X}_{\text{BC}})$. In practice, the boundary conditions are sometimes only known at a finite set of points, making this a natural choice.

Observing the Solution Directly

Finally, as in standard GP regression, we can directly condition on (noisy) measurements of the solution, for example from a real-world experiment, by choosing $\bm{\mathcal{L}} = \delta_{\bm{X}_{\text{MEAS}}}$ and $\bm{y} = \bm{u}^{\star}(\bm{X}_{\text{MEAS}})$.
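As a consistency check, inserting the evaluation operator into (3.3) and (3.4) recovers the familiar GP regression formulas. The following sketch assumes a scalar-valued prior $\mathrm{u} \sim \mathcal{GP}(m, k)$ and writes $\bm{X} = \bm{X}_{\text{MEAS}}$, so that $\bm{\mathcal{L}}[k(\cdot, \bm{x})] = k(\bm{X}, \bm{x})$ and $\bm{\mathcal{L}} k \bm{\mathcal{L}}' = k(\bm{X}, \bm{X})$:

\begin{align*}
m^{\mathrm{u} \mid \bm{y}}(\bm{x}) &= m(\bm{x}) + k(\bm{x}, \bm{X}) \left( k(\bm{X}, \bm{X}) + \bm{\Sigma} \right)^{\dagger} \left( \bm{y} - m(\bm{X}) - \bm{\mu} \right), \\
k^{\mathrm{u} \mid \bm{y}}(\bm{x}_{1}, \bm{x}_{2}) &= k(\bm{x}_{1}, \bm{x}_{2}) - k(\bm{x}_{1}, \bm{X}) \left( k(\bm{X}, \bm{X}) + \bm{\Sigma} \right)^{\dagger} k(\bm{X}, \bm{x}_{2}).
\end{align*}

Here $\bm{\mu}$ and $\bm{\Sigma}$ are the mean and covariance of the measurement noise, so a zero-mean sensor corresponds to $\bm{\mu} = \bm{0}$.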

In summary, the probabilistic viewpoint allows us to

  • encode prior information about the solution,

  • condition on various kinds of (partial) information, such as the boundary condition, the PDE itself, or direct measurements, and

  • output a structured error estimate, reflecting all obtained information and performed computation.

We will now give concrete examples for some of the possible modeling choices described above in a case study.

3.2 Case Study: Modeling the Temperature Distribution in a CPU

Central processing units (CPUs) are pieces of computing hardware that are constrained by the vast amounts of heat they dissipate under computational load. Surpassing the maximum temperature threshold of a CPU for a prolonged period of time can result in reduced longevity or even permanent hardware damage (Michaud, 2019). To counteract overheating, cooling systems are attached to the CPU, which are controlled by digital thermal sensors (DTS). For simplicity, assume that the CPU is under sustained computational load and that the cooling device is controlled in a way such that the die reaches thermal equilibrium.

Example 2 (Stationary Heat Equation).

The temperature distribution of a solid at thermal equilibrium, i.e. $\frac{\partial u}{\partial t} = 0$ in Example 1, is described by the linear PDE

\begin{equation}
-\kappa \Delta u - \dot{q}_{V} = 0, \tag{3.5}
\end{equation}

known as the stationary heat equation (Lienhard and Lienhard, 2020). For our choice of material parameters, equation 3.5 is equivalent to the Poisson equation $-\Delta u = f$ with $f = \frac{\dot{q}_{V}}{\kappa}$.

While the sensors control cooling, they only provide local, limited-precision measurements of the CPU temperature. This is problematic, since the chip may reach critical temperature thresholds in unmonitored regions. Therefore, our goal will be to infer the temperature in the entire CPU. We will use our framework to integrate the physics of heat flow, the controlled cooling at the boundary, and the noisy temperature measurements from the sensors. See figure 2(b) for an illustration of the result. During manufacturing, the resulting belief over the temperature distribution could then help decide whether the CPU design needs to be changed to avoid premature failure. From here on out, we focus on a 1D slice across the CPU surface, as shown in figure 2(a) (top), to easily visualize uncertainty.

(a) Top: CPU die with CPU cores as heat sources and uniform cooling over the whole surface. Bottom: Magnitude of heat sources and sinks $\dot{q}_{V}$ in the 1D slice shown in the upper subplot.
(b) Gaussian process integrating prior information about the temperature distribution, a mechanistic model of heat conduction in the form of a linear PDE, and empirical measurements $(\bm{X}_{\text{DTS}}, \bm{y}_{\text{DTS}})$ taken by limited-precision sensors (DTS). The plot shows the GP mean and a 1D slice illustrating the posterior uncertainty along with a few samples.
Figure 2: Physics-informed Gaussian process model of the stationary temperature distribution in an idealized hexa-core CPU die under sustained computational load.

Encoding Prior Knowledge

By inspecting the PDE’s differential operator $\mathcal{D} = -\kappa\Delta = -\kappa \sum_{i=1}^{d} \frac{\partial^{2}}{\partial x_{i}^{2}}$, we can deduce that the paths of our Gaussian process need to be twice-differentiable in every input variable $x_{i}$. The construction in section B.4.1 tells us that a GP prior whose covariance function is a tensor product $k_{\bm{\nu}}$ of one-dimensional Matérn kernels $k_{\nu_{i}}$ with $\nu_{i} = 2 + \frac{1}{2} = \frac{5}{2}$ fulfills the desired path properties. Assume we also know which temperature ranges are plausible from similar CPU architectures, which we encode by setting the kernel output scale to $\sigma_{\text{out}}^{2} = 9$. Figure 3 shows the prior process $\mathrm{u}$ along with its image $\mathcal{D}[\mathrm{u}] \sim \mathcal{GP}\left( \mathcal{D}[m], \sigma_{\text{out}}^{2} \mathcal{D} k \mathcal{D}' \right)$ under the differential operator. A draw from $\mathcal{D}[\mathrm{u}]$ can be interpreted as the heat sources and sinks that generated the corresponding temperature distribution draw from $\mathrm{u}$.
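For the 1D slice, the covariance $\sigma_{\text{out}}^{2} \mathcal{D} k \mathcal{D}'$ of the image process can be written in closed form. The following is a worked sketch, assuming the common parameterization of the unit-variance Matérn-$\frac{5}{2}$ kernel with a lengthscale $\ell$; the symbols $\ell$, $a = \sqrt{5}/\ell$, and $r = |x - x'|$ are introduced here and the paper's own parameterization may differ.

\begin{align*}
k(r) &= \left( 1 + a r + \tfrac{1}{3} a^{2} r^{2} \right) e^{-a r}, \\
\frac{\mathrm{d}^{2} k}{\mathrm{d} r^{2}}(r) &= -\tfrac{1}{3} a^{2} \left( 1 + a r - a^{2} r^{2} \right) e^{-a r}, \\
\frac{\mathrm{d}^{4} k}{\mathrm{d} r^{4}}(r) &= \tfrac{1}{3} a^{4} \left( 3 - 5 a r + a^{2} r^{2} \right) e^{-a r}.
\end{align*}

With $\mathcal{D} = -\kappa \frac{\partial^{2}}{\partial x^{2}}$, this gives $(\mathcal{D} k \mathcal{D}')(x, x') = \kappa^{2} \frac{\mathrm{d}^{4} k}{\mathrm{d} r^{4}}(r)$. In particular, the prior variance of $\mathcal{D}[\mathrm{u}](x)$ is $\sigma_{\text{out}}^{2} \kappa^{2} a^{4} = 25 \sigma_{\text{out}}^{2} \kappa^{2} / \ell^{4}$, which is finite because the kernel is four times differentiable at $r = 0$.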

(a) Gaussian process prior with a Matérn-$\frac{5}{2}$ kernel over the temperature distribution of the CPU.
(b) Prior under the differential operator $\mathcal{D} = -\kappa\Delta$ along with heat sources and sinks $\dot{q}_{V}$.
Figure 3: Prior model for the stationary temperature distribution of a CPU die under load.

Conditioning on the PDE

We can now inform our belief about the physics of heat conduction using the mechanistic model defined by the stationary heat equation. We choose a set of collocation points $\bm{X}_{\text{PDE}} \in \mathbb{D}^{n}$ and then condition on the observation that the PDE holds (exactly) at these points. In other words, we compute the physically-informed Gaussian process $\mathrm{u} \mid \text{PDE} \coloneqq \mathrm{u} \mid (\bm{\mathcal{I}}^{\text{PDE}}[\mathrm{u}] = \bm{0})$ with $\bm{\mathcal{I}}^{\text{PDE}}[\mathrm{u}] \coloneqq -\kappa \Delta \mathrm{u}(\bm{X}_{\text{PDE}}) - \dot{q}_{V}(\bm{X}_{\text{PDE}})$, which is visualized in figure 4.

(a) Belief about the solution after conditioning on the PDE at a set of collocation points.
(b) Belief about heat sources and sinks after conditioning on the PDE at collocation points.
Figure 4: We integrate mechanistic knowledge about the system by conditioning on PDE observations $-\kappa \Delta \mathrm{u}(\bm{X}_{\text{PDE}}) - \dot{q}_{V}(\bm{X}_{\text{PDE}}) = \bm{0}$ at the collocation points $\bm{X}_{\text{PDE}}$, resulting in the conditional process $\mathrm{u} \mid \text{PDE}$. The large remaining uncertainty in figure 4(a) illustrates that the PDE by itself does not identify a unique solution.

We can see that the resulting conditional process indeed satisfies the PDE exactly at the collocation points (see figure 4(b)). The remaining uncertainty in figure 4(b) is due to the approximation error introduced by only conditioning on a finite number of collocation points. However, while the samples from our belief about the solution in figure 4(a) exhibit much more similarity to the mean function and less spatial variation, the marginal uncertainty hardly decreases. The latter is explained by the PDE not identifying a unique solution, since adding any affine function to $u^{\star}$ does not alter its image under the differential operator, i.e. $\Delta(\bm{a}^{\top} \bm{x} + b) = 0$. Hence, there is an at least two-dimensional subspace of functions which cannot be observed. This ambiguity can be resolved by introducing boundary conditions.
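The collocation step can be sketched in a few lines for the 1D slice. The code below is an illustration under stated assumptions, not the authors' implementation: it assumes a zero prior mean, a unit-variance Matérn-5/2 kernel with a hypothetical lengthscale `ell`, and a made-up source term `q_v`. The helper names are likewise hypothetical. The closed-form derivatives are those of the Matérn-5/2 kernel with respect to the distance `r`.

```python
import numpy as np


def matern52(r, ell):
    """Unit-variance Matern-5/2 kernel as a function of the distance r."""
    a, r = np.sqrt(5.0) / ell, np.abs(r)
    return (1.0 + a * r + (a * r) ** 2 / 3.0) * np.exp(-a * r)


def matern52_d2(r, ell):
    """Second derivative of the Matern-5/2 kernel with respect to r."""
    a, r = np.sqrt(5.0) / ell, np.abs(r)
    return -(a**2 / 3.0) * (1.0 + a * r - (a * r) ** 2) * np.exp(-a * r)


def matern52_d4(r, ell):
    """Fourth derivative of the Matern-5/2 kernel with respect to r."""
    a, r = np.sqrt(5.0) / ell, np.abs(r)
    return (a**4 / 3.0) * (3.0 - 5.0 * a * r + (a * r) ** 2) * np.exp(-a * r)


def pairwise_diff(x1, x2):
    return x1[:, None] - x2[None, :]


def condition_on_pde(x_pde, q_v, x_eval, kappa=1.0, ell=0.5, sigma2=9.0):
    """Posterior of u ~ GP(0, sigma2 * matern52) given -kappa u''(x_pde) = q_v."""
    # Gram matrix L k L': covariance of -kappa u'' at the collocation points.
    gram = sigma2 * kappa**2 * matern52_d4(pairwise_diff(x_pde, x_pde), ell)
    # Cross-covariance L[k(., x)] between -kappa u''(x_pde) and u(x_eval).
    cross = -sigma2 * kappa * matern52_d2(pairwise_diff(x_pde, x_eval), ell)
    prior = sigma2 * matern52(pairwise_diff(x_eval, x_eval), ell)

    solved = np.linalg.solve(gram, np.column_stack([q_v, cross]))
    mean = cross.T @ solved[:, 0]
    cov = prior - cross.T @ solved[:, 1:]
    return mean, cov


# Usage with a hypothetical source term: heating in the middle, cooling elsewhere.
x_pde = np.linspace(0.0, 1.0, 9)
q_v = np.where(np.abs(x_pde - 0.5) < 0.2, 1.0, -0.25)
x_eval = np.linspace(0.0, 1.0, 101)
mean, cov = condition_on_pde(x_pde, q_v, x_eval)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
```

Plotting `std` against `x_eval` is one way to inspect how much marginal uncertainty remains after conditioning on the PDE alone.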

Conditioning on the Boundary Conditions

We assume that the CPU cooler extracts heat uniformly from all exposed parts of the CPU, in particular also from the sides, rather than just from the top. Instead of directly specifying the value of the temperature distribution at the edge points of the CPU slice, we only know the density $\dot{q}_{A}$ of heat flowing out of each point on the CPU’s boundary based on the cooler specification. We can use another thermodynamical law to turn this assumption into information about the temperature distribution $u$.

Example 3 (continuing the heat equation example).

Fourier’s law states that the local density of heat $\dot{q}_{A}$ flowing through a surface with normal vector $\bm{\eta}$ is proportional to the inner product of the negative temperature gradient and the surface normal $\bm{\eta}$, i.e. $\dot{q}_{A} = -\kappa \langle \bm{\eta}, \nabla u \rangle$, where $\kappa$ is the material’s thermal conductivity in $\mathrm{W}\,\mathrm{m}^{-1}\,\mathrm{K}^{-1}$ (Lienhard and Lienhard, 2020).

Assuming sufficient differentiability of $u$, the inner product above is equal to the directional derivative $\partial_{\eta} u$ of $u$ in direction $\eta$. We can assign an outward-pointing normal vector $\eta(x)$ (almost) everywhere on the boundary of the domain. Since the boundary of the CPU domain is its surface, we can summarize the above in a Neumann boundary condition $-\kappa \partial_{\eta(x)} u(x) = \dot{q}_{A}(x)$ for $x \in \partial\mathbb{D}$. Applying corollary 2 once more, we can inform our estimate of the solution about the boundary conditions by computing $\mathrm{u} \mid \text{PDE}, \text{NBC} \coloneqq (\mathrm{u} \mid \text{PDE}) \mid \left( \bm{\mathcal{I}}^{\text{NBC}}[(\mathrm{u}, \dot{\mathrm{q}}_{A})] = \bm{0} \right)$, where $\bm{\mathcal{I}}^{\text{NBC}}[\mathrm{u}] \coloneqq -\kappa \partial_{\eta(\bm{X}_{\text{NBC}})} \mathrm{u}(\bm{X}_{\text{NBC}}) - \dot{\mathrm{q}}_{A}(\bm{X}_{\text{NBC}})$ with $\bm{X}_{\text{NBC}} = \{0, w_{\text{CPU}}\}$ is the information operator induced by the boundary conditions. The result is visualized in figure 5(a). The structure of the samples illustrates that most of the remaining uncertainty about the solution lies in a one-dimensional subspace of $\mathbb{U}$ corresponding to constant functions. This is due to the fact that two Neumann boundary conditions on both sides of the domain only determine the solution of the PDE up to an additive constant. We need an additional source of information to address the remaining degree of freedom.
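For the 1D slice, the Neumann condition can be spelled out explicitly. As a short worked step, note that the boundary consists of the two points $0$ and $w_{\text{CPU}}$ with outward normals $\eta(0) = -1$ and $\eta(w_{\text{CPU}}) = +1$, so that $-\kappa \partial_{\eta(x)} u(x) = \dot{q}_{A}(x)$ becomes

\begin{align*}
\kappa u'(0) = \dot{q}_{A}(0), \qquad -\kappa u'(w_{\text{CPU}}) = \dot{q}_{A}(w_{\text{CPU}}).
\end{align*}

Both conditions, like the PDE $-\kappa u'' = \dot{q}_{V}$, involve only derivatives of $u$. If $u$ satisfies all three, then so does $u + c$ for every constant $c$, which is exactly the one-dimensional subspace left unobserved.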

Conditioning on Direct Measurements

Fortunately, CPUs are equipped with digital thermal sensors (DTS) located close to each of the cores, which provide (noisy) local measurements of the core temperatures (Michaud, 2019). These measurements can be straightforwardly accounted for in our model by performing standard GP regression using $\mathrm{u} \mid \text{PDE}, \text{NBC}$ from figure 5(a) as a prior. The resulting belief about the temperature distribution is visualized in figure 5(b).
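Using a posterior as the prior for the next update is justified because, for observations with independent noise, sequential Gaussian conditioning yields the same result as conditioning on everything at once. The following finite-dimensional sketch checks this numerically; the random matrices stand in for the actual information operators and are purely hypothetical.

```python
import numpy as np


def condition(m, K, A, y, Sigma):
    """Condition u ~ N(m, K) on A @ u + eps = y with independent eps ~ N(0, Sigma)."""
    gram = A @ K @ A.T + Sigma
    gain = np.linalg.solve(gram, A @ K).T   # equals K A^T gram^{-1} (gram is symmetric)
    return m + gain @ (y - A @ m), K - gain @ A @ K


rng = np.random.default_rng(0)
n = 6
B = rng.normal(size=(n, n))
m, K = np.zeros(n), B @ B.T + np.eye(n)     # stand-in for the current belief

# Two hypothetical batches of linear observations with independent noise.
A1, y1, S1 = rng.normal(size=(2, n)), rng.normal(size=2), 0.1 * np.eye(2)
A2, y2, S2 = rng.normal(size=(3, n)), rng.normal(size=3), 0.5 * np.eye(3)

# Sequential: the posterior of the first update is the prior of the second.
m1, K1 = condition(m, K, A1, y1, S1)
m_seq, K_seq = condition(m1, K1, A2, y2, S2)

# Batch: stack both observations and use a block-diagonal noise covariance.
A = np.vstack([A1, A2])
y = np.concatenate([y1, y2])
S = np.zeros((5, 5))
S[:2, :2], S[2:, 2:] = S1, S2
m_batch, K_batch = condition(m, K, A, y, S)
```

Both routes agree up to floating-point error, so the order in which the PDE, the boundary conditions, and the sensor data are incorporated does not change the final belief.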

(a) Belief about the solution after conditioning on the PDE and boundary conditions (BCs).
(b) Belief about the solution after conditioning on the PDE, BCs and noisy sensor data.
Figure 5: Neumann boundary conditions encoding mechanistic knowledge about the heat flux across the boundary of the CPU and a sparse set of limited-precision measurements of the temperature distribution made by digital thermal sensors (DTS) located at the points $\bm{X}_{\text{DTS}}$ further constrain the solution of the PDE. The remaining uncertainty is due to measurement noise and discretization error.

We can see that integrating the interior measurements effectively reduces the uncertainty due to the remaining degree of freedom, albeit not completely. The remaining uncertainty is due to the model’s consistent accounting for noise in the thermal sensor readings, the uncertainty about the cooling, i.e. the boundary conditions, and the discretization error incurred by only choosing a small set of collocation points.

Uncertainty in the Right-hand Side and the Boundary Function

Above, we assumed the true heat source term $\dot{q}_{V}$, i.e. the right-hand side of the PDE, and the boundary heat flux $\dot{q}_{A}$ to be known exactly. However, in practice, this is rarely the case. Fortunately, our probabilistic viewpoint admits a straightforward relaxation of this assumption. Namely, we can replace $\dot{q}_{V}$ and $\dot{q}_{A}$ by a joint Gaussian process prior $(\dot{\mathrm{q}}_{V}, \dot{\mathrm{q}}_{A})$, whose means are given by estimates of $\dot{q}_{V}$ and $\dot{q}_{A}$. (Technically speaking, if the right-hand side of the PDE is given as a Gaussian process, the PDE turns into a stochastic partial differential equation (SPDE).) Above, we assumed that the cooler is controlled in such a way that the temperature distribution in the CPU does not change over time. However, a naive prior $(\dot{\mathrm{q}}_{V}, \dot{\mathrm{q}}_{A})$ may break this assumption. We need to encode that the amount of heat entering the CPU is equal to the amount of heat leaving the CPU via its boundary, i.e.

\begin{equation}
\mathcal{I}^{\text{STAT}}[(\dot{\mathrm{q}}_{V}, \dot{\mathrm{q}}_{A})] \coloneqq \int_{\mathbb{D}} \dot{\mathrm{q}}_{V}(\bm{x}) \,\mathrm{d}\bm{x} - \int_{\partial\mathbb{D}} \dot{\mathrm{q}}_{A}(\bm{x}) \,\mathrm{d}A = 0. \tag{3.6}
\end{equation}

The (jointly) linear information operator $\mathcal{I}^{\text{STAT}}$ computes the net amount of thermal energy that the CPU gains per unit time. Using theorem 1 we can construct a multi-output GP prior $(\mathrm{u}, \dot{\mathrm{q}}_{V}, \dot{\mathrm{q}}_{A})$, which is consistent with the assumption of thermal stationarity by conditioning on $\mathcal{I}^{\text{STAT}}[(\dot{\mathrm{q}}_{V}, \dot{\mathrm{q}}_{A})] = 0$. Here, we assume a priori that $\mathrm{u}$, $\dot{\mathrm{q}}_{V}$, and $\dot{\mathrm{q}}_{A}$ are pairwise independent. In the one-dimensional model, we can simplify equation 3.6 by assuming that heat is drawn uniformly from the sides of the CPU. By encoding this information in the prior $\dot{\mathrm{q}}_{A}$, the information operator corresponding to thermal stationarity resolves to

\begin{equation}
\mathcal{I}^{\text{STAT}}[(\dot{\mathrm{q}}_{V}, \dot{\mathrm{q}}_{A})] = h_{\text{CPU}} \int_{0}^{w_{\text{CPU}}} \dot{\mathrm{q}}_{V}(x) \,\mathrm{d}x - h_{\text{CPU}} \left( \dot{\mathrm{q}}_{A}(0) + \dot{\mathrm{q}}_{A}(w_{\text{CPU}}) \right). \tag{3.7}
\end{equation}
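That the stationarity constraint is compatible with the PDE and the Neumann boundary conditions can be verified directly in the 1D slice. The following worked step assumes a twice continuously differentiable solution $u$ of $-\kappa u'' = \dot{q}_{V}$ on $[0, w_{\text{CPU}}]$ with $\kappa u'(0) = \dot{q}_{A}(0)$ and $-\kappa u'(w_{\text{CPU}}) = \dot{q}_{A}(w_{\text{CPU}})$, i.e. the Neumann condition with outward normals $-1$ and $+1$ at the two boundary points:

\begin{align*}
\int_{0}^{w_{\text{CPU}}} \dot{q}_{V}(x) \,\mathrm{d}x
= -\kappa \int_{0}^{w_{\text{CPU}}} u''(x) \,\mathrm{d}x
= -\kappa u'(w_{\text{CPU}}) + \kappa u'(0)
= \dot{q}_{A}(w_{\text{CPU}}) + \dot{q}_{A}(0).
\end{align*}

Multiplying by $h_{\text{CPU}}$ shows that the right-hand side of (3.7) vanishes for every such solution. Conversely, a pair $(\dot{q}_{V}, \dot{q}_{A})$ violating this identity admits no stationary solution, which is why the prior over the sources must be conditioned on it.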

As above, we can now use corollary 2 to condition our physically-consistent GP prior $(\mathrm{u}, \dot{\mathrm{q}}_V, \dot{\mathrm{q}}_A) \mid \text{STAT} \coloneqq (\mathrm{u}, \dot{\mathrm{q}}_V, \dot{\mathrm{q}}_A) \mid \left( \mathcal{I}^{\text{STAT}}[(\dot{\mathrm{q}}_V, \dot{\mathrm{q}}_A)] = 0 \right)$ on $\bm{\mathcal{I}}^{\text{PDE}}[(\mathrm{u}, \dot{\mathrm{q}}_V)] = \bm{0}$, as well as $\bm{\mathcal{I}}^{\text{NBC}}[(\mathrm{u}, \dot{\mathrm{q}}_A)] = \bm{0}$ and the noisy measurements of the temperature distribution.
Here, it is important to keep track of the cross-covariances in $(\mathrm{u}, \dot{\mathrm{q}}_V, \dot{\mathrm{q}}_A)$, since the outputs in $(\dot{\mathrm{q}}_V, \dot{\mathrm{q}}_A) \mid \text{STAT}$ become correlated. The resulting process $\mathrm{u} \mid \text{PDE}, \text{NBC}, \text{STAT}, \text{DTS}$ (or rather its marginals) is shown in figure 6.
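The effect of the stationarity constraint on the cross-covariances can be sketched in a discretized setting. The snippet below is only an illustration under assumptions that are not taken from the paper: $\dot{\mathrm{q}}_V$ is represented by its values on a grid with a squared-exponential prior covariance, the integral in equation 3.7 is replaced by a trapezoidal rule, and all numerical values (`w_cpu`, `h_cpu`, prior means and variances) are hypothetical.

```python
import numpy as np

def rbf(x, y, lengthscale=0.3):
    """Squared-exponential covariance, used here as a stand-in prior for q_V."""
    d = x[:, None] - y[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

# Hypothetical geometry and prior parameters (illustrative values only).
w_cpu, h_cpu = 1.0, 0.5
x = np.linspace(0.0, w_cpu, 51)
n = x.size

# Joint prior over z = (q_V(x_1), ..., q_V(x_n), q_A(0), q_A(w_cpu)).
# q_V and q_A are independent a priori, so the cross-covariance blocks are zero.
mean = np.concatenate([np.full(n, 3.0), np.full(2, 1.0)])
cov = np.zeros((n + 2, n + 2))
cov[:n, :n] = rbf(x, x)
cov[n:, n:] = 0.25 * np.eye(2)

# Discretized version of equation 3.7: trapezoidal weights for the integral
# over q_V, minus the two boundary values of q_A, both scaled by h_cpu.
quad_w = np.full(n, x[1] - x[0])
quad_w[[0, -1]] *= 0.5
a = np.concatenate([h_cpu * quad_w, -h_cpu * np.ones(2)])

# Condition the Gaussian on the noise-free linear observation a @ z = 0.
cov_a = cov @ a
gram = a @ cov_a
mean_post = mean - cov_a * (a @ mean) / gram
cov_post = cov - np.outer(cov_a, cov_a) / gram

cross_cov = cov_post[:n, n:]  # posterior covariance between q_V and q_A
print("constraint applied to posterior mean:", a @ mean_post)
print("largest |cross-covariance|:", np.abs(cross_cov).max())
```

In this sketch the prior cross-covariance block is exactly zero, whereas the conditioned one is not, which is why the joint covariance has to be carried along in all subsequent conditioning steps.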

(a) Posterior belief about the temperature distribution physically consistent with the assumption of stationarity.
(b) Posterior belief about the heat sources and sinks after conditioning on the corresponding uncertain right-hand side $\dot{\mathrm{q}}_V$ of the PDE.
Figure 6: We integrate information from the joint prior $(\mathrm{u}, \dot{\mathrm{q}}_V, \dot{\mathrm{q}}_A) \mid \text{STAT}$ over the solution, the right-hand side of the PDE, and the values of the Neumann boundary conditions into our belief about the temperature distribution by conditioning on said PDE and boundary conditions.

Comparing figures 6(a) and 5(b), we can see that, due to the uncertainty in the right-hand side $\dot{\mathrm{q}}_V$ of the PDE, the samples of $\mathrm{u} \mid \text{PDE}, \text{NBC}, \text{STAT}, \text{DTS}$ exhibit much more spatial variation. Moreover, the samples of the GP posterior over $\dot{\mathrm{q}}_V$ fulfill the stationarity constraint we imposed.

[Graphical model with nodes $\mathrm{u}$, $\dot{\mathrm{q}}_V$, $\dot{\mathrm{q}}_A$, $\bm{\mathcal{I}}^{\text{PDE}}$, $\bm{\mathcal{I}}^{\text{NBC}}$, $\mathrm{u}(\bm{X}_{\text{DTS}})$, $+$, $\bm{\epsilon}_{\text{DTS}}$, and $\mathcal{I}^{\text{STAT}}$.]
Figure 7: Representation of the CPU model as a directed graphical model. The inference procedure described in section 3.2 is equivalent to the junction tree algorithm (Bishop, 2006, Section 8.4.6) applied to the graphical model above. This example shows that the language of information operators is a powerful tool for aggregating heterogeneous sources of partial information in a joint probabilistic model.
Summary

Stepping back, we can view the problem of modeling the CPU under computational load as a scientific inference problem, where we need to aggregate heterogeneous sources of information in a joint probabilistic model. This inference task is illustrated as a directed graphical model in figure 7. Our physics-informed regression framework is a local computation in the global inference procedure on the graph. Importantly, its implementation does not change based on what happens to the solution estimate and the input data in either upstream or downstream computations. All this information is already handily encoded in the structured uncertainties of the Gaussian processes.

3.3 A General Class of Tractable Information Operators for Linear PDEs

Recall that conditioning on the linear PDE directly via the information operator $\mathcal{I}[\bm{\mathrm{u}}] = \mathcal{D}[\bm{\mathrm{u}}] - f$ is usually intractable. Instead, in section 3.1.2 we approximated this information operator by $\bm{\mathcal{I}} \colon \mathbb{U} \to \mathbb{R}^n$ with $\mathcal{I}_i[\bm{\mathrm{u}}] \coloneqq \mathcal{D}[\bm{\mathrm{u}}](\bm{x}_i) - f(\bm{x}_i)$ where $\bm{x}_i \in \mathbb{D}$. This implicitly assumes that point evaluation on both $\mathcal{D}[\bm{\mathrm{u}}]$ and $f$ is well-defined, which crucially means that this approach applies only to strong solutions of PDEs. In this section, we extend this approximation scheme to a general class of tractable information operators aimed at approximating both weak and strong solutions to linear PDEs. Our framework is inspired by the method of weighted residuals (MWR) (see section 2.1.2). In fact, in section 3.3.4 we will show that GP inference with information operators in this class reproduces any weighted residual method in the posterior mean while additionally providing an estimate of the inherent approximation error.


In the following, we will consider both weak and strong formulations of linear PDEs, which is why we introduce the unifying notation $\mathcal{D}^{(w)}[\bm{u}] = f^{(w)}$. For a strong formulation, $\mathcal{D}^{(w)} \coloneqq \mathcal{D}$, where $\mathcal{D} \colon \mathbb{U} \to \mathbb{V}$ is a linear differential operator (see definition 23), and $f^{(w)} \coloneqq f \in \mathbb{V}$ is the right-hand side function. In the context of a weak formulation, $\mathcal{D}^{(w)} \coloneqq \mathcal{D}^w$, where $\mathcal{D}^w \colon \mathbb{U} \to \mathbb{V}'$ is the weak differential operator, and $f^{(w)} \coloneqq f^w \in \mathbb{V}'$ is the right-hand side functional (see section 2.1.1). Following section 2.1.2, we will apply linear functionals to the PDE residual.
To facilitate notation, we define the shorthand $\mathbb{L}^{(w)}$ for the space of continuous linear functionals on the image space of $\mathcal{D}^{(w)}$, i.e. $\mathbb{L}^{(w)} \coloneqq \mathbb{V}'$ in the context of a strong formulation and $\mathbb{L}^{(w)} \coloneqq \mathbb{V}''$ in the context of a weak formulation. We additionally require that $l \circ \mathcal{D}^{(w)}$ is continuous for every $l \in \mathbb{L}^{(w)}$.
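As a concrete instance of this notation, consider the one-dimensional Poisson problem $-u'' = f$ on $\mathbb{D} = ]0, 1[$ with homogeneous Dirichlet boundary conditions. This example and the function spaces below are our own illustrative assumptions and need not coincide with the choices made in section 2.1.1. Under these assumptions, the two formulations would read roughly as follows.

```latex
\begin{align*}
  \text{strong:} \quad
    & \mathbb{U} = C^2([0, 1]), \quad \mathbb{V} = C([0, 1]), \quad
      \mathcal{D}^{(w)}[u] = \mathcal{D}[u] = -u'', \quad f^{(w)} = f \in \mathbb{V}, \\
    & \mathbb{L}^{(w)} = \mathbb{V}' \ni \delta_x, \quad
      (\delta_x \circ \mathcal{D})[u] - \delta_x[f] = -u''(x) - f(x), \\
  \text{weak:} \quad
    & \mathbb{U} = \mathbb{V} = H^1_0(]0, 1[), \quad
      \mathcal{D}^w[u][v] = \int_0^1 u'(x) v'(x) \,\mathrm{d}x, \quad
      f^w[v] = \int_0^1 f(x) v(x) \,\mathrm{d}x, \\
    & \mathbb{L}^{(w)} = \mathbb{V}'' \ni l_\psi, \quad
      l_\psi[\varphi] \coloneqq \varphi[\psi] \text{ for } \psi \in \mathbb{V}, \quad
      (l_\psi \circ \mathcal{D}^w)[u] - l_\psi[f^w] = \mathcal{D}^w[u][\psi] - f^w[\psi].
\end{align*}
```

In the weak case, every $\psi \in \mathbb{V}$ induces a functional $l_\psi \in \mathbb{V}''$ through the canonical embedding of $\mathbb{V}$ into its bidual. Since $H^1_0$ is a Hilbert space and hence reflexive, all elements of $\mathbb{V}''$ arise in this way for this particular choice of $\mathbb{V}$.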

Let $\bm{\mathrm{u}} \sim \operatorname{\mathcal{GP}}\left(\bm{m}, \bm{k}\right)$ be a Gaussian process prior over the solution $\bm{\mathrm{u}}$ of the PDE, whose path space can be continuously embedded into the solution space $\mathbb{U}$ (see section B.4 for more details on the latter assumption). It is intractable to condition the GP prior on the full information provided by the PDE via the family $\{\mathcal{I}_l\}_{l \in \mathbb{L}^{(w)}}$ of affine information operators $\mathcal{I}_l[\bm{\mathrm{u}}] \coloneqq (l \circ \mathcal{D}^{(w)})[\bm{\mathrm{u}}] - l[f^{(w)}]$, since $\mathbb{L}^{(w)}$ is typically infinite-dimensional. To identify tractable families of information operators, we take inspiration from the method of weighted residuals.

3.3.1 Infinite-Dimensional Trial Function Spaces

Using theorem 1 we can tractably condition on a finite subfamily $\{\mathcal{I}_{l^{(i)}}\}_{i=1}^n \subset \{\mathcal{I}_l\}_{l \in \mathbb{L}^{(w)}}$ of information operators, where $\{l^{(i)}\}_{i=1}^n \subset \mathbb{L}^{(w)}$ is a finite subset of test functionals, as long as we can compute $\mathcal{I}_{l^{(i)}}[\bm{m}]$, $\mathcal{L}_{l^{(i)}}[\bm{k}_{:,j}(\cdot, \bm{x})]$, and $\mathcal{L}_{l^{(i)}} \bm{k} \mathcal{L}_{l^{(i)}}'$, where $\mathcal{L}_{l^{(i)}} = l^{(i)} \circ \mathcal{D}^{(w)}$. This might not always be possible in closed form, since $\mathcal{L}_{l^{(i)}}$ often involves computing integrals. However, in these cases one could fall back to an efficient numerical quadrature method, since the integrals are often low-dimensional (typically at most four-dimensional). A prominent example of this approach is the probabilistic meshless method used in section 3.

Example 4 (Symmetric Collocation).

If the PDE is in strong formulation, then $l^{(i)} = \delta_{\bm{x}_i} \in \mathbb{V}'$ with $\bm{x}_i \in \mathbb{D}$ is a valid test functional, which induces the information operator

\[
  \mathcal{I}_{l^{(i)}}[\bm{\mathrm{u}}] = \mathcal{D}[\bm{\mathrm{u}}](\bm{x}_i) - f(\bm{x}_i),
\]

i.e. we recover the probabilistic meshless method by Cockayne et al. (2017). They show that the conditional mean of this approach reproduces symmetric collocation (Fasshauer, 1997, 1999), a well-known method to approximate strong solutions of PDEs.
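The collocation information operators above can be illustrated with a small NumPy sketch. It is not the implementation used in the paper: we assume the one-dimensional problem $-u'' = f$ on $]0, 1[$ with $u(0) = u(1) = 0$, a zero-mean prior with squared-exponential kernel (hypothetical lengthscale `ELL`), and a manufactured right-hand side whose exact solution is $\sin(\pi x)$. The kernel derivatives are written out by hand for this specific kernel.

```python
import numpy as np

ELL = 0.2  # hypothetical lengthscale of the squared-exponential prior

def _gauss(x, y):
    r = x[:, None] - y[None, :]
    return r, np.exp(-0.5 * (r / ELL) ** 2)

def k(x, y):
    """Prior covariance Cov(u(x), u(y))."""
    return _gauss(x, y)[1]

def k_L(x, y):
    """Cov(u(x), D[u](y)) for D = -d^2/dx^2, i.e. -d^2 k / dy^2."""
    r, g = _gauss(x, y)
    return (1.0 / ELL**2 - r**2 / ELL**4) * g

def L_k_L(x, y):
    """Cov(D[u](x), D[u](y)), i.e. d^4 k / (dx^2 dy^2)."""
    r, g = _gauss(x, y)
    return (r**4 / ELL**8 - 6.0 * r**2 / ELL**6 + 3.0 / ELL**4) * g

def f(x):
    """Right-hand side chosen so that u(x) = sin(pi x) solves -u'' = f."""
    return np.pi**2 * np.sin(np.pi * x)

x_col = np.linspace(0.1, 0.9, 9)  # collocation points x_i
x_bc = np.array([0.0, 1.0])       # Dirichlet boundary points, u = 0

# Gram matrix of the observations (D[u](x_col), u(x_bc)) under the prior.
gram = np.block([
    [L_k_L(x_col, x_col), k_L(x_bc, x_col).T],
    [k_L(x_bc, x_col), k(x_bc, x_bc)],
])
gram += 1e-10 * np.eye(gram.shape[0])  # small jitter for numerical stability
y = np.concatenate([f(x_col), np.zeros(2)])

# Posterior mean and marginal variance of u on a test grid.
x_test = np.linspace(0.0, 1.0, 101)
cross = np.hstack([k_L(x_test, x_col), k(x_test, x_bc)])
post_mean = cross @ np.linalg.solve(gram, y)
post_var = 1.0 - np.sum(cross * np.linalg.solve(gram, cross.T).T, axis=1)

print("max abs error:", np.abs(post_mean - np.sin(np.pi * x_test)).max())
```

The posterior mean plays the role of the symmetric collocation solution, while `post_var` provides the accompanying marginal uncertainty at the test points.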

The probabilistic meshless method can only be used to approximate strong solutions of linear PDEs, since point evaluation functionals are not well-defined on the image space $\mathbb{V}'$ of $\mathcal{D}^w$. However, other choices of the $l^{(i)}$ lead to approximation schemes for weak solutions.

Example 5 (Weak Formulations).

Consider a linear PDE in weak formulation. As mentioned in section 2.1.2, it is customary to use test functionals $l^{(i)}$, which are induced by test functions $\psi^{(i)} \in \mathbb{V}$, i.e.

\begin{equation}
  \mathcal{I}_{l^{(i)}}[\bm{\mathrm{u}}] = \mathcal{D}^w[\bm{\mathrm{u}}][\psi^{(i)}] - f^w[\psi^{(i)}]. \tag{3.8}
\end{equation}

For instance, if $\mathbb{D} = ]l, r[ \subset \mathbb{R}$, then a valid set of test functions for the weak formulation from section 2.1.1 is given by

\begin{equation}
  \psi^{(i)} =
  \begin{cases}
    \frac{x - x_{i-1}}{x_i - x_{i-1}} & \text{if } x_{i-1} \leq x \leq x_i, \\
    \frac{x_{i+1} - x}{x_{i+1} - x_i} & \text{if } x_i \leq x \leq x_{i+1}, \\
    0 & \text{otherwise}.
  \end{cases}
  \quad \in H^1_0\left(]l, r[\right), \tag{3.9}
\end{equation}

where $l = x_0 < \dotsb < x_{n+1} = r$. The test functions are visualized in figure 8(a). For the weak formulation in section 2.1.1, the information operator from equation 3.8 is equivalent to $\mathcal{I}_{l^{(i)}}[\mathrm{u}] = B[\mathrm{u}, \psi^{(i)}] - \langle f, \psi^{(i)} \rangle_{L_2}$.
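To make the weak-form information operator more tangible, the following sketch evaluates $B[u, \psi^{(i)}] - \langle f, \psi^{(i)} \rangle_{L_2}$ for the hat functions from equation 3.9. It rests on assumptions of ours: the bilinear form is taken to be $B[u, v] = \int u' v' \,\mathrm{d}x$ (the one-dimensional Poisson problem), the load integral is approximated with Gauss-Legendre quadrature, and the helper names (`hat`, `load_entry`, `bilinear_entry`, `weak_residual`) are hypothetical.

```python
import numpy as np

GL_X, GL_W = np.polynomial.legendre.leggauss(5)  # 5-point Gauss-Legendre rule

def hat(i, x, nodes):
    """Piecewise-linear hat function psi^(i) centered at nodes[i]."""
    xl, xm, xr = nodes[i - 1], nodes[i], nodes[i + 1]
    up = (x - xl) / (xm - xl)
    down = (xr - x) / (xr - xm)
    return np.clip(np.minimum(up, down), 0.0, None)

def integrate(g, a, b):
    """Gauss-Legendre quadrature of g over [a, b]."""
    mid, half = 0.5 * (a + b), 0.5 * (b - a)
    return half * np.sum(GL_W * g(mid + half * GL_X))

def load_entry(f, i, nodes):
    """<f, psi^(i)>_{L2}, integrated separately over the two mesh cells."""
    g = lambda x: f(x) * hat(i, x, nodes)
    return integrate(g, nodes[i - 1], nodes[i]) + integrate(g, nodes[i], nodes[i + 1])

def bilinear_entry(u, i, nodes):
    """B[u, psi^(i)] = int u' psi^(i)' dx; exact, as psi^(i)' is piecewise constant."""
    xl, xm, xr = nodes[i - 1], nodes[i], nodes[i + 1]
    return (u(xm) - u(xl)) / (xm - xl) - (u(xr) - u(xm)) / (xr - xm)

def weak_residual(u, f, i, nodes):
    """Information operator of equation 3.8 applied to a fixed function u."""
    return bilinear_entry(u, i, nodes) - load_entry(f, i, nodes)

# Manufactured example: u(x) = sin(pi x) solves -u'' = pi^2 sin(pi x).
nodes = np.linspace(0.0, 1.0, 7)
u = lambda x: np.sin(np.pi * x)
f = lambda x: np.pi**2 * np.sin(np.pi * x)
residuals = np.array([weak_residual(u, f, i, nodes) for i in range(1, nodes.size - 1)])
print("largest weak residual:", np.abs(residuals).max())
```

For the exact solution the residuals vanish up to quadrature and rounding error. In the GP setting, the same functionals would be applied to the prior mean and kernel instead of a fixed function.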

3.3.2 Finite-Dimensional Trial Function Spaces

As opposed to the methods outlined in section 2.1.2, we did not need to choose a finite-dimensional subspace of trial functions to arrive at tractable information operators in section 3.3.1. Nevertheless, in practice, it might still be desirable to specify a finite-dimensional trial function basis $\bm{\phi}^{(1)}, \dotsc, \bm{\phi}^{(m)}$, e.g. because

  • we want to reproduce the output of a classical method in the posterior mean to use the GP solver as an uncertainty-aware drop-in replacement (see corollary 4);

  • the trial basis encompasses problem-specific knowledge, which is difficult to encode in the prior; or

  • we want to solve the problem in a coarse-to-fine scheme, allowing for mesh refinement strategies, which are informed by the GP’s uncertainty estimate.

Naively, one might achieve this goal by defining the prior over $\bm{\mathrm{u}}$ as a parametric Gaussian process with features $\bm{\phi}^{(i)}$. However, this means the posterior cannot quantify the inherent approximation error, since the GP has no support outside of the finite-dimensional subspace of $\mathbb{U}$ spanned by the trial functions. Consequently, we need to take a different approach. Starting from a general, potentially nonparametric prior over $\bm{\mathrm{u}}$, we consider a bounded (potentially oblique) projection $\mathcal{P}_{\hat{\mathbb{U}}} \colon \mathbb{U} \to \hat{\mathbb{U}}$ onto a subspace $\hat{\mathbb{U}} \subset \mathbb{U}$, i.e. $\mathcal{P}_{\hat{\mathbb{U}}}^2 = \mathcal{P}_{\hat{\mathbb{U}}}$, $\lVert \mathcal{P}_{\hat{\mathbb{U}}} \rVert < \infty$, and $\operatorname{ran}(\mathcal{P}_{\hat{\mathbb{U}}}) = \hat{\mathbb{U}}$. In general, this subspace need not be finite-dimensional.
We apply $\mathcal{P}_{\hat{\mathbb{U}}}$ to our GP prior over $\bm{\mathrm{u}}$, which, by corollary 2, results in another GP

\[
  \hat{\bm{\mathrm{u}}} \coloneqq \mathcal{P}_{\hat{\mathbb{U}}}[\bm{\mathrm{u}}] \sim \operatorname{\mathcal{GP}}\left( \mathcal{P}_{\hat{\mathbb{U}}}[\bm{m}], \mathcal{P}_{\hat{\mathbb{U}}} \bm{k} \mathcal{P}_{\hat{\mathbb{U}}}' \right),
\]

with sample paths in $\hat{\mathbb{U}}$. This discards prior information about $\operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})$. Hence, especially in case $\dim \hat{\mathbb{U}} < \infty$, applying the information operators $\mathcal{I}_{l^{(i)}}$ from section 3.3.1 directly to $\hat{\bm{\mathrm{u}}}$ would suffer from similar problems as choosing a parametric prior. However,

\[
  \mathcal{I}_{l^{(i)}, \mathcal{P}_{\hat{\mathbb{U}}}} \coloneqq \mathcal{I}_{l^{(i)}} \circ \mathcal{P}_{\hat{\mathbb{U}}} = (l^{(i)} \circ \mathcal{D}^{(w)} \circ \mathcal{P}_{\hat{\mathbb{U}}})[\cdot] - l^{(i)}[f^{(w)}]
\]

is a valid information operator for $\bm{\mathrm{u}}$, which leads to a probabilistic generalization of the method of weighted residuals. This is why we refer to $\mathcal{I}_{l^{(i)}, \mathcal{P}_{\hat{\mathbb{U}}}}$ as an MWR information operator.
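The projection step described above can be mimicked on a grid. In the following sketch, which is an illustrative discretization and not the construction used in the paper, functions are represented by their values on a fine grid, the trial functions are hat functions on a coarse mesh, and the projection is the (discretized) $L_2$-orthogonal projection onto their span. The prior mean and kernel are arbitrary placeholders.

```python
import numpy as np

def rbf(x, y, lengthscale=0.2):
    """Squared-exponential covariance used as a placeholder prior kernel."""
    d = x[:, None] - y[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

# Fine grid standing in for the function space U.
x = np.linspace(0.0, 1.0, 201)
quad_w = np.full(x.size, x[1] - x[0])
quad_w[[0, -1]] *= 0.5  # trapezoidal weights for the discretized L2 inner product

# Trial basis: m hat functions on a coarse mesh, stored as the columns of Phi.
nodes = np.linspace(0.0, 1.0, 8)
m = nodes.size - 2
Phi = np.stack(
    [np.interp(x, nodes, np.eye(nodes.size)[i]) for i in range(1, m + 1)], axis=1
)

# Coordinate map u -> c (L2-orthogonal projection coefficients) and the
# projection onto span(Phi), obtained by mapping the coefficients back.
gram_phi = Phi.T @ (quad_w[:, None] * Phi)
P_coord = np.linalg.solve(gram_phi, Phi.T * quad_w[None, :])
P_hat = Phi @ P_coord

# Push the (discretized) GP prior through the projection.
mean = np.sin(np.pi * x)
cov = rbf(x, x)
mean_hat = P_hat @ mean
cov_hat = P_hat @ cov @ P_hat.T
cov_coeff = P_coord @ cov @ P_coord.T  # covariance of the m coefficients

print("idempotent:", np.allclose(P_hat @ P_hat, P_hat))
print("projected covariance stays in span(Phi):", np.allclose(P_hat @ cov_hat, cov_hat))
```

The projected covariance `cov_hat` is supported on the span of the trial functions only, which is the discrete counterpart of the statement that the projected process has sample paths in $\hat{\mathbb{U}}$.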

The similarity to the method of weighted residuals is particularly prominent if we choose a finite-dimensional subspace $\hat{\mathbb{U}} = \operatorname{span}\left( \bm{\phi}^{(1)}, \dotsc, \bm{\phi}^{(m)} \right)$ as in section 2.1.2. In this case, there is a bounded linear operator $\bm{\mathcal{P}}_{\mathbb{R}^m} \colon \mathbb{U} \to \mathbb{R}^m$ such that

\[
  \mathcal{P}_{\hat{\mathbb{U}}}[\bm{\mathrm{u}}] = \sum_{i=1}^m \mathrm{c}_i \bm{\phi}^{(i)} \eqqcolon \mathcal{I}_{\mathbb{R}^m}^{\hat{\mathbb{U}}}[\bm{\mathrm{c}}],
\]

where the $\bm{\mathrm{c}} \coloneqq \bm{\mathcal{P}}_{\mathbb{R}^m}[\bm{\mathrm{u}}] \in \mathbb{R}^m$ are the coordinates of $\mathcal{P}_{\hat{\mathbb{U}}}[\bm{\mathrm{u}}]$ in $\hat{\mathbb{U}}$ and $\mathcal{I}_{\mathbb{R}^m}^{\hat{\mathbb{U}}} \colon \mathbb{R}^m \to \hat{\mathbb{U}}$ is the canonical isomorphism between $\mathbb{R}^m$ and $\hat{\mathbb{U}}$. Hence, we get the factorization

\begin{equation}
  \mathcal{P}_{\hat{\mathbb{U}}} = \mathcal{I}_{\mathbb{R}^m}^{\hat{\mathbb{U}}} \bm{\mathcal{P}}_{\mathbb{R}^m}, \tag{3.10}
\end{equation}

which implies that $\hat{\bm{\mathrm{u}}}$ is a parametric Gaussian process. Moreover, $l^{(i)}[f^{(w)}] = \hat{f}_i$ and

\[
  (l^{(i)} \circ \mathcal{D}^{(w)} \circ \mathcal{I}_{\mathbb{R}^m}^{\hat{\mathbb{U}}})[\bm{\mathrm{c}}] = \sum_{j=1}^m \mathrm{c}_j (l^{(i)} \circ \mathcal{D}^{(w)})[\bm{\phi}^{(j)}] = (\hat{\bm{D}} \bm{\mathrm{c}})_i,
\]

where $\hat{\bm{D}}$ and $\hat{\bm{f}}$ are defined as in section 2.1.2. Consequently, the MWR information operator is given by $\mathcal{I}_{l^{(i)}, \mathcal{P}_{\hat{\mathbb{U}}}}[\bm{\mathrm{u}}] = (\bm{\mathcal{I}}_{\mathbb{R}^{m}} \circ \bm{\mathcal{P}}_{\mathbb{R}^{m}})[\bm{\mathrm{u}}]_{i}$, where $\bm{\mathcal{I}}_{\mathbb{R}^{m}}[\bm{\mathrm{c}}] \coloneqq \hat{\bm{D}} \bm{\mathrm{c}} - \hat{\bm{f}}$. This illustrates that we are dealing with the hierarchical model

\begin{align*}
  \bm{\mathrm{u}} &\sim \operatorname{\mathcal{GP}}(\bm{m}, \bm{k}) \\
  \bm{\mathrm{c}} \mid \bm{\mathrm{u}} &\sim \delta_{\bm{\mathcal{P}}_{\mathbb{R}^{m}}[\bm{\mathrm{u}}]}
\end{align*}

with observations $\bm{\mathcal{I}}_{\mathbb{R}^{m}}[\bm{\mathrm{c}}] = \bm{0}$, where $\bm{\mathrm{c}} \sim \mathcal{N}(\bm{\mathcal{P}}_{\mathbb{R}^{m}}[\bm{m}], \bm{\mathcal{P}}_{\mathbb{R}^{m}} \bm{k} \bm{\mathcal{P}}_{\mathbb{R}^{m}}')$. Inference in this model can be broken down into two steps. First, we update our belief about the solution's coordinates in $\hat{\mathbb{U}}$ by computing the conditional random variable $\bm{\mathrm{c}} \mid \bm{\mathcal{I}}_{\mathbb{R}^{m}}[\bm{\mathrm{c}}] = \bm{0}$, which is also Gaussian. If $\hat{\bm{D}}$ is invertible and $\bm{\mathrm{c}}$ has full support on $\mathbb{R}^{m}$, then the law of $\bm{\mathrm{c}} \mid \bm{\mathcal{I}}_{\mathbb{R}^{m}}[\bm{\mathrm{c}}] = \bm{0}$ is a Dirac measure whose mean is given by the coordinates of the MWR approximation $\bm{c}^{\mathrm{MWR}} = \hat{\bm{D}}^{-1} \hat{\bm{f}}$ from equation 2.8. Next, we can reuse precomputed quantities from the conditional moments of $\bm{\mathrm{c}} \mid \bm{\mathcal{I}}_{\mathbb{R}^{m}}[\bm{\mathrm{c}}] = \bm{0}$, such as the representer weights $\bm{w} = (\hat{\bm{D}} \bm{\mathcal{P}}_{\mathbb{R}^{m}} \bm{k} \bm{\mathcal{P}}_{\mathbb{R}^{m}}' \hat{\bm{D}}^{\top})^{\dagger} (\hat{\bm{f}} - \hat{\bm{D}} \bm{\mathcal{P}}_{\mathbb{R}^{m}}[\bm{m}])$ to efficiently compute the conditional random process

\[
  (\bm{\mathrm{u}} \mid (\bm{\mathcal{I}}_{\mathbb{R}^{m}} \circ \bm{\mathcal{P}}_{\mathbb{R}^{m}})[\bm{\mathrm{u}}] = \bm{0}) = (\bm{\mathrm{u}} \mid \{\mathcal{I}_{l^{(i)}, \mathcal{P}_{\hat{\mathbb{U}}}}[\bm{\mathrm{u}}] = 0\}_{i=1}^{n}),
\]

i.e. the main quantity of interest. Assuming once more that $\hat{\bm{D}}$ is invertible and $\bm{\mathrm{c}}$ has full support on $\mathbb{R}^{m}$, the remaining uncertainty of the conditional process lies in the kernel of $\mathcal{P}_{\hat{\mathbb{U}}}$, since the law of $\bm{\mathrm{c}} \mid \bm{\mathcal{I}}_{\mathbb{R}^{m}}[\bm{\mathrm{c}}] = \bm{0}$ is a Dirac measure and

\[
  (\mathcal{P}_{\hat{\mathbb{U}}}[\bm{\mathrm{u}}] \mid \{\mathcal{I}_{l^{(i)}, \mathcal{P}_{\hat{\mathbb{U}}}}[\bm{\mathrm{u}}] = 0\}_{i=1}^{n}) = (\mathcal{I}_{\mathbb{R}^{m}}^{\hat{\mathbb{U}}}[\bm{\mathrm{c}}] \mid \bm{\mathcal{I}}_{\mathbb{R}^{m}}[\bm{\mathrm{c}}] = \bm{0}).
\]

Thus, all remaining uncertainty must be due to $(\operatorname{id}_{\mathbb{U}} - \mathcal{P}_{\hat{\mathbb{U}}})[\bm{\mathrm{u}}] \mid \{\mathcal{I}_{l^{(i)}, \mathcal{P}_{\hat{\mathbb{U}}}}[\bm{\mathrm{u}}] = 0\}_{i=1}^{n}$. Note the striking similarity of this property to the notion of Galerkin orthogonality (Logg et al., 2012, Equation 2.63).
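The collapse of the coordinate posterior to a Dirac measure can be checked on a toy instance. The following is a minimal sketch, assuming a hypothetical $2 \times 2$ system (the values of `D_hat`, `f_hat`, `mu_c` and `Sigma_c` are made up for illustration and do not come from the text); it applies the standard Gaussian conditioning formulas in exact rational arithmetic.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inv2(A):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

# Hypothetical data (not from the paper): invertible D_hat, full-rank Sigma_c.
D_hat = [[F(2), F(-1)], [F(-1), F(2)]]
f_hat = [[F(1)], [F(1)]]
mu_c = [[F(1, 2)], [F(-1)]]
Sigma_c = [[F(2), F(1)], [F(1), F(3)]]

# Condition c ~ N(mu_c, Sigma_c) on the noise-free observation D_hat c - f_hat = 0.
gram = matmul(matmul(D_hat, Sigma_c), transpose(D_hat))
gain = matmul(matmul(Sigma_c, transpose(D_hat)), inv2(gram))
post_mean = add(mu_c, matmul(gain, sub(f_hat, matmul(D_hat, mu_c))))
post_cov = sub(Sigma_c, matmul(matmul(gain, D_hat), Sigma_c))

# Classical MWR coordinates D_hat^{-1} f_hat for comparison.
c_mwr = matmul(inv2(D_hat), f_hat)
```

In this instance `post_mean` coincides with `c_mwr` and `post_cov` vanishes, regardless of the prior mean chosen, which is the finite-dimensional content of the Dirac-measure statement.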

A canonical choice for $\mathcal{P}_{\hat{\mathbb{U}}}$ would arguably be an orthogonal projection w.r.t. the RKHS inner product of the sample space of $\bm{\mathrm{u}}$ (see e.g. Kanagawa et al. 2018). However, this inner product is generally difficult to compute. Fortunately, we can use the $L_{2}$ inner products or Sobolev inner products on the samples to induce a (usually non-orthogonal) projection $\mathcal{P}_{\hat{\mathbb{U}}}$.

Example 6.

If the elements of $\mathbb{U}$ are square-integrable, then the linear operator

\[
  \bm{\mathcal{P}}_{\mathbb{R}^{m}}[\bm{\mathrm{u}}] \coloneqq \bm{P}^{-1} \left( \int_{\mathbb{D}} \langle \bm{\phi}^{(i)}(\bm{x}), \bm{\mathrm{u}}(\bm{x}) \rangle_{\mathbb{R}^{d}} \, \mathrm{d}\bm{x} \right)_{i=1}^{m},
\]

where

\[
  P_{ij} \coloneqq \int_{\mathbb{D}} \langle \bm{\phi}^{(i)}(\bm{x}), \bm{\phi}^{(j)}(\bm{x}) \rangle_{\mathbb{R}^{d}} \, \mathrm{d}\bm{x},
\]

induces a projection $\mathcal{P}_{\hat{\mathbb{U}}} = \mathcal{I}_{\mathbb{R}^{m}}^{\hat{\mathbb{U}}} \bm{\mathcal{P}}_{\mathbb{R}^{m}}$ onto $\hat{\mathbb{U}} \subset \mathbb{U}$, even if $\mathbb{U}$ is not a Hilbert space with inner product $\langle \cdot, \cdot \rangle_{L_{2}(\mathbb{D})}$.
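The projection property of this construction can be illustrated numerically. The sketch below assumes a scalar problem on $]-1, 1[$ with three hat functions on a uniform grid as trial basis (an assumed stand-in for the basis of example 5) and uses a cell-wise Simpson rule; the helper names `phi`, `integrate`, `solve`, `project` and `expand` are hypothetical.

```python
from fractions import Fraction as F

h = F(1, 2)
nodes = [F(-1) + k * h for k in range(5)]  # x_0, ..., x_4 on [-1, 1]
m = 3                                      # one basis function per interior node
idx = range(1, m + 1)

def phi(i, x):
    """Hat function centred at nodes[i] (assumed form of the trial basis)."""
    return max(F(0), 1 - abs(x - nodes[i]) / h)

def integrate(g):
    """Cell-wise Simpson rule; exact for piecewise cubics on this grid."""
    return sum((b - a) / 6 * (g(a) + 4 * g((a + b) / 2) + g(b))
               for a, b in zip(nodes[:-1], nodes[1:]))

def solve(A, b):
    """Gauss-Jordan elimination without pivoting (fine for SPD Gram matrices)."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for k in range(n):
        pivot = M[k][k]
        M[k] = [v / pivot for v in M[k]]
        for r in range(n):
            if r != k:
                factor = M[r][k]
                M[r] = [vr - factor * vk for vr, vk in zip(M[r], M[k])]
    return [row[-1] for row in M]

# Gram matrix P_ij = <phi_i, phi_j>_{L2}.
P = [[integrate(lambda x: phi(i, x) * phi(j, x)) for j in idx] for i in idx]

def project(u):
    """Coordinates P^{-1} (<phi_i, u>_{L2})_i of the projection of u."""
    return solve(P, [integrate(lambda x: phi(i, x) * u(x)) for i in idx])

def expand(c):
    """Map coordinates back to the function sum_j c_j phi_j."""
    return lambda x: sum(cj * phi(j, x) for j, cj in zip(idx, c))

c = [F(2), F(-1), F(3)]
c_rec = project(expand(c))            # element of the span: coordinates recovered
c_sq = project(lambda x: x * x)       # function outside the span
c_sq_again = project(expand(c_sq))    # projecting twice changes nothing
```

Under these assumptions the coordinate map reproduces the coordinates of any function already in the span, and projecting twice gives the same result as projecting once, which is what makes the induced operator a projection.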

At first glance, information operators restricting $\hat{\mathbb{U}}$ to be finite-dimensional might seem fundamentally inferior to the information operators from section 3.3.1. However, the conditional mean of a Gaussian process prior conditioned on $\{\mathcal{I}_{l^{(i)}}[\bm{\mathrm{u}}] = 0\}_{i=1}^{n}$ is updated by a linear combination of $n$ functions, while the covariance function receives an at most rank $n$ downdate. This means that, implicitly, conditioning a Gaussian process on an information operator with $\mathcal{P}_{\hat{\mathbb{U}}} = \operatorname{id}_{\mathbb{U}}$ also constructs a finite-dimensional trial function space, which depends on the test function basis, the bilinear form $B$ and the prior covariance function $\bm{k}$.
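To make the rank statement concrete, the standard conditioning formulas can be sketched as follows. The shorthand $\mathcal{L}_{i}$, $\bm{G}$ and $\bm{r}$ is introduced here purely for illustration, and the display assumes a scalar-valued $\mathrm{u}$ with prior mean $m$ and covariance function $k$ as well as a known, deterministic right-hand side $f^{(w)}$:

\begin{align*}
  \mathcal{L}_{i} &\coloneqq l^{(i)} \circ \mathcal{D}^{(w)}, \qquad
  G_{ij} \coloneqq \mathcal{L}_{i}\big[x \mapsto \mathcal{L}_{j}[k(x, \cdot)]\big], \qquad
  r_{i} \coloneqq l^{(i)}[f^{(w)}] - \mathcal{L}_{i}[m], \\
  m^{\text{post}}(x) &= m(x) + \sum_{i=1}^{n} \mathcal{L}_{i}[k(x, \cdot)] \, (\bm{G}^{\dagger} \bm{r})_{i}, \\
  k^{\text{post}}(x, x') &= k(x, x') - \sum_{i,j=1}^{n} \mathcal{L}_{i}[k(x, \cdot)] \, (\bm{G}^{\dagger})_{ij} \, \mathcal{L}_{j}[k(x', \cdot)].
\end{align*}

Under these assumptions the mean update lies in the span of the $n$ functions $x \mapsto \mathcal{L}_{i}[k(x, \cdot)]$, which play the role of an implicit trial basis, and the term subtracted from the covariance function has rank at most $n$.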

MWR information operators with finite-dimensional trial function bases can be used to realize a GP-based analogue of the finite element method.

Example 7 (A 1D Finite Element Method).

Finite element methods are (generalized) Galerkin methods, where the functions in the test and trial bases have compact support, i.e. they are nonzero only in a highly localized region of the domain. The archetype of a finite element method for the weak formulation from section 2.1.1 uses linear Lagrange elements (Logg et al., 2012, Section 3.3.1) as test and trial functions, i.e. $\phi^{(i)}(x) = \psi^{(i)}(x)$ and $m = n$. Linear Lagrange elements are piecewise linear on a triangulation of the domain. For instance, on a one-dimensional domain $\mathbb{D} = \,]{-1}, 1[$, the linear Lagrange elements are given by equation 3.9 from example 5. Multiplying a coordinate vector $\bm{c} \in \mathbb{R}^{m}$ with these basis functions leads to a piecewise linear interpolation between the points

\[
  (x_{0}, 0), (x_{1}, c_{1}), \dotsc, (x_{n}, c_{n}), (x_{n+1}, 0),
\]

since, for $x \in [x_{i}, x_{i+1}]$,

\[
  \sum_{j=1}^{m} c_{j} \phi^{(j)}(x) = c_{i} \frac{x_{i+1} - x}{x_{i+1} - x_{i}} + c_{i+1} \frac{x - x_{i}}{x_{i+1} - x_{i}} = \left( 1 - \frac{x - x_{i}}{x_{i+1} - x_{i}} \right) c_{i} + \left( \frac{x - x_{i}}{x_{i+1} - x_{i}} \right) c_{i+1}.
\]

The basis functions and an element in their span are visualized in figure 8. The Lagrange elements at the boundary of the domain can also be easily modified such that arbitrary piecewise linear boundary conditions are fulfilled by construction. The effect of MWR information operators based on this set of test and trial functions is visualized in figure 9(a).
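As a concrete instance of the classical point estimate that the MWR information operators are compared against, the sketch below solves a toy problem with linear Lagrange elements. It assumes the Poisson problem $-u'' = 1$ on $]-1, 1[$ with homogeneous Dirichlet boundary conditions and the standard weak form $\int u' v' \,\mathrm{d}x = \int f v \,\mathrm{d}x$, which may differ from the exact setup of equation 2.3 and figure 9; all helper names are hypothetical.

```python
from fractions import Fraction as F

def solve(A, b):
    """Gauss-Jordan elimination without pivoting (fine for SPD matrices)."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for k in range(n):
        pivot = M[k][k]
        M[k] = [v / pivot for v in M[k]]
        for r in range(n):
            if r != k:
                factor = M[r][k]
                M[r] = [vr - factor * vk for vr, vk in zip(M[r], M[k])]
    return [row[-1] for row in M]

m = 3                                          # number of interior nodes
h = F(2, m + 1)                                # uniform grid spacing on [-1, 1]
nodes = [F(-1) + k * h for k in range(m + 2)]  # x_0, ..., x_{m+1}

def phi(i, x):
    """Hat function centred at nodes[i] (assumed form of the Lagrange basis)."""
    return max(F(0), 1 - abs(x - nodes[i]) / h)

# Stiffness matrix: integral of phi_i' * phi_j' equals tridiag(-1, 2, -1) / h.
D_hat = [[F(2) / h if i == j else (F(-1) / h if abs(i - j) == 1 else F(0))
          for j in range(m)] for i in range(m)]
# Load vector for f = 1: the integral of each hat function is h.
f_hat = [h] * m

c_mwr = solve(D_hat, f_hat)

def u_h(x):
    """Finite element approximation: piecewise linear interpolant of c_mwr."""
    return sum(c * phi(i + 1, x) for i, c in enumerate(c_mwr))

def u_exact(x):
    """Closed-form solution of -u'' = 1 with u(-1) = u(1) = 0."""
    return (1 - x * x) / 2
```

For this one-dimensional toy problem the coordinates `c_mwr` agree with the exact solution at the grid nodes, and between nodes `u_h` interpolates linearly, as described by the display above.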

Figure 8: Linear Lagrange test and trial functions as used by the finite element method.
(a) Test/trial functions $\phi^{(i)} = \psi^{(i)}$.
(b) The trial functions $\phi^{(i)}$ span the space of piecewise linear functions on the given grid.
Figure 9: Conditioning two different Gaussian process priors on the MWR information operators $\{\mathcal{I}_{\psi^{(i)}, \mathcal{P}_{\hat{\mathbb{U}}}}\}_{i=1}^{n}$ corresponding to the weak formulation of the Poisson equation, i.e. equation 2.3, and $m = 3$ linear Lagrange elements as test functions $\psi^{(i)}$ and trial functions $\phi^{(i)}$ (see example 7). The trial functions $\phi^{(1)}$ and $\phi^{(m)}$ were modified to fulfill the non-zero boundary conditions exactly.
(a) Posterior process corresponding to a Matérn-$\nicefrac{3}{2}$ prior. The sample paths of the process embed continuously into the Sobolev space $H^{1}(\mathbb{D})$ (see section B.4).
(b) Posterior process corresponding to an MWR Recovery Prior constructed from a Matérn-$\nicefrac{3}{2}$ prior via proposition 5. The posterior mean corresponds to the point estimate produced by the classical MWR.

3.3.3 MWR Information Operators

Even though the class of information operators introduced above is constructed for linear PDEs, it can naturally be applied to the weak form of an arbitrary operator equation. In particular, we can use MWR information operators for the boundary conditions in an (I)BVP. Moreover, it is straightforward to extend $\mathcal{I}_{l, \mathcal{P}_{\hat{\mathbb{U}}}}$ to a joint GP prior over $(\bm{\mathrm{u}}, \mathrm{f}^{(w)})$ if the right-hand side $f^{(w)}$ of the operator equation is unknown as in section 2.1. In this case, $\mathcal{I}_{l, \mathcal{P}_{\hat{\mathbb{U}}}}$ is jointly linear in $(\bm{\mathrm{u}}, \mathrm{f}^{(w)})$. Summarizing sections 3.3.1 and 3.3.2 and incorporating the extensions discussed here, we define an MWR information operator as follows:

Definition 2 (MWR Information Operator).

Let $\mathcal{D}^{(w)}[\bm{\mathrm{u}}] = f^{(w)}$ be an operator equation in strong or weak formulation. An MWR information operator for said operator equation is a continuous affine functional

\[
  \mathcal{I}_{l, \mathcal{P}_{\hat{\mathbb{U}}}} \coloneqq (l \circ \mathcal{D}^{(w)} \circ \mathcal{P}_{\hat{\mathbb{U}}})[\cdot] - l[f^{(w)}]
\]

parameterized by a test functional $l \in \mathbb{L}^{(w)}$ and a bounded (potentially oblique) projection $\mathcal{P}_{\hat{\mathbb{U}}}$ onto a subspace $\hat{\mathbb{U}} \subset \mathbb{U}$. We also write $\mathcal{I}_{l} \coloneqq \mathcal{I}_{l, \operatorname{id}_{\mathbb{U}}}$. The input of $\mathcal{I}_{l, \mathcal{P}_{\hat{\mathbb{U}}}}$ can be extended to the right-hand side $\mathrm{f}^{(w)}$ of the operator equation, i.e.

\[
  \mathcal{I}_{l, \mathcal{P}_{\hat{\mathbb{U}}}}[(\bm{\mathrm{u}}, \mathrm{f}^{(w)})] \coloneqq (l \circ \mathcal{D}^{(w)} \circ \mathcal{P}_{\hat{\mathbb{U}}})[\bm{\mathrm{u}}] - l[\mathrm{f}^{(w)}],
\]

which is jointly linear in $(\bm{\mathrm{u}}, \mathrm{f}^{(w)})$.

\begin{tabular}{l l p{0.25\linewidth} p{0.4\linewidth}}
  \hline
  & Method & Trial Functions $\bm{\phi}^{(i)}$ & Test Functionals $l^{(i)}$ / Functions $\psi^{(i)}$ \\
  \hline
  Strong Solutions & Collocation & arbitrary & $l^{(i)} = \delta_{\bm{x}_{i}}$ for $\bm{x}_{i} \in \mathbb{D}$ $\Rightarrow (l^{(i)} \circ \mathcal{D})[\bm{\mathrm{u}}] = \mathcal{D}[\bm{\mathrm{u}}](\bm{x}_{i})$ \\
  & Subdomain (Finite Volume) & arbitrary & $\psi^{(i)} = \chi_{\mathbb{D}_{i}}$ for $\mathbb{D}_{i} \subset \mathbb{D}$ $\Rightarrow (l^{(i)} \circ \mathcal{D})[\bm{\mathrm{u}}] = \int_{\mathbb{D}_{i}} \mathcal{D}[\bm{\mathrm{u}}](\bm{x}) \, \mathrm{d}\bm{x}$ \\
  & Pseudospectral & orthogonal and globally supported (e.g. Fourier basis or Chebychev polynomials) & $l^{(i)} = \delta_{\bm{x}_{i}}$ for $\bm{x}_{i} \in \mathbb{D}$ $\Rightarrow (l^{(i)} \circ \mathcal{D})[\bm{\mathrm{u}}] = \mathcal{D}[\bm{\mathrm{u}}](\bm{x}_{i})$ \\
  \hline
  Weak \& Strong Solutions & Generalized Galerkin & arbitrary & arbitrary, but in general $\psi^{(i)} \neq \phi^{(i)}$ \\
  & Finite Element & locally supported (e.g. piecewise polynomial) & same class as trial functions, but in general $\psi^{(i)} \neq \phi^{(i)}$ \\
  & Spectral (Galerkin) & orthogonal and globally supported (e.g. Fourier basis or Chebychev polynomials) & same class as trial functions, but in general $\psi^{(i)} \neq \phi^{(i)}$ \\
  & (Ritz-)Galerkin & arbitrary & $\psi^{(i)} = \phi^{(i)}$ \\
  \hline
\end{tabular}
Table 1: Trial and test function(al)s defining commonly used methods of weighted residuals. If used as part of an MWR information operator, the GP posterior mean recovers the corresponding classic method (see corollary 4).
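The difference between rows of Table 1 lies only in how $\hat{\bm{D}}$ and $\hat{\bm{f}}$ are assembled. The following sketch contrasts a pseudospectral and a spectral (Ritz-)Galerkin discretization for a hypothetical toy problem, $-u'' = \sin(x)$ on $]0, \pi[$ with $u(0) = u(\pi) = 0$ and trial functions $\phi^{(j)}(x) = \sin(jx)$; the problem, the collocation points and all helper names are assumptions made for illustration.

```python
import math

def solve2(A, b):
    """Solve a 2x2 linear system with Cramer's rule."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return [(b[0] * a22 - a12 * b[1]) / det, (a11 * b[1] - a21 * b[0]) / det]

modes = (1, 2)  # trial functions phi_j(x) = sin(j x); they vanish at 0 and pi

def D_phi(j, x):
    """Differential operator -d^2/dx^2 applied to phi_j, evaluated at x."""
    return j**2 * math.sin(j * x)

def f(x):
    """Right-hand side of the toy problem."""
    return math.sin(x)

# Pseudospectral row: test functionals are point evaluations l_i = delta_{x_i}.
points = (math.pi / 3, 2 * math.pi / 3)
D_col = [[D_phi(j, x) for j in modes] for x in points]
f_col = [f(x) for x in points]
c_col = solve2(D_col, f_col)

# Spectral (Ritz-)Galerkin row: psi_i = phi_i and l_i[g] is the L2 inner
# product of phi_i and g. The orthogonality relation
# int_0^pi sin(i x) sin(j x) dx = (pi / 2) * [i == j] gives closed forms.
D_gal = [[j**2 * math.pi / 2 if i == j else 0.0 for j in modes] for i in modes]
f_gal = [math.pi / 2, 0.0]
c_gal = solve2(D_gal, f_gal)
```

Because the exact solution $u(x) = \sin(x)$ lies in the span of the trial functions here, both discretizations return coordinates close to $(1, 0)$; in general the two methods produce different approximations.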

3.3.4 Recovery of Classical Methods

In this section we will show that, under certain assumptions, the posterior mean of a GP prior conditioned on a set of MWR information operators is identical to the approximation generated by the corresponding traditional method of weighted residuals, examples of which are given in Table 1. More precisely, we will show that there is a flexible family of GP priors $\bm{\mathrm{u}} \sim \operatorname{\mathcal{GP}}(\bm{m}, \bm{k})$ whose posterior means after conditioning on $\{\mathcal{I}_{l^{(i)}, \mathcal{P}_{\hat{\mathbb{U}}}}\}_{i=1}^{m}$ are identical to the corresponding classical MWR approximation $\bm{c}^{\mathrm{MWR}}$ to the solution of the same weak form linear PDE, where we use the same trial functions $\bm{\phi}^{(1)}, \dotsc, \bm{\phi}^{(m)}$ and test functionals $l^{(1)}, \dotsc, l^{(n)}$ in both cases, i.e. $\hat{\mathbb{U}} = \operatorname{span}\left(\bm{\phi}^{(1)}, \dotsc, \bm{\phi}^{(m)}\right)$. As in section 2.1.2, we assume that the trial functions are already constructed in such a way that the boundary conditions are fulfilled. However, it is possible to extend the results below to the general case by adding MWR information operators corresponding to the boundary conditions and using

\[
\bm{c}^{\mathrm{MWR}} = \begin{pmatrix} \hat{\bm{D}}_{\text{PDE}} \\ \hat{\bm{D}}_{\text{BC}} \end{pmatrix}^{-1} \begin{pmatrix} \hat{\bm{f}}_{\text{PDE}} \\ \hat{\bm{f}}_{\text{BC}} \end{pmatrix}
\]

as coordinates for the reference solution generated by the traditional MWR.

Proposition 3.

If $\hat{\bm{D}} \in \mathbb{R}^{n \times m}$ and $\bm{\Sigma}_{\bm{\mathrm{c}}} \coloneqq \bm{\mathcal{P}}_{\mathbb{R}^{m}} \bm{k} \bm{\mathcal{P}}_{\mathbb{R}^{m}}' \in \mathbb{R}^{m \times m}$ are invertible, then

\[
\bm{\mathrm{c}} \mid \hat{\bm{D}} \bm{\mathrm{c}} - \hat{\bm{f}} = \bm{0} \sim \delta_{\bm{c}^{\mathrm{MWR}}}
\]

and the conditional mean $\bm{m}^{\bm{\mathrm{u}} \mid \hat{\bm{D}}, \hat{\bm{f}}}$ of $\bm{\mathrm{u}} \mid \hat{\bm{D}} \bm{\mathcal{P}}_{\mathbb{R}^{m}}[\bm{\mathrm{u}}] - \hat{\bm{f}} = \bm{0}$ admits a unique additive decomposition

\[
\bm{m}^{\bm{\mathrm{u}} \mid \hat{\bm{D}}, \hat{\bm{f}}} = \bm{u}^{\mathrm{MWR}} + \bm{u}_{\operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})} \tag{3.11}
\]

with $\bm{u}^{\mathrm{MWR}} \in \hat{\mathbb{U}}$ and $\bm{u}_{\operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})} \in \operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})$.

Corollary 4 (MWR Generalization).

If, additionally, $\bm{m} \in \hat{\mathbb{U}}$ and $\mathcal{P}_{\operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})} \bm{k} \bm{\mathcal{P}}_{\mathbb{R}^{m}}' = \bm{0}$, then the conditional mean function $\bm{m}^{\bm{\mathrm{u}} \mid \hat{\bm{D}}, \hat{\bm{f}}}$ is equal to the MWR approximation, i.e.

\[
\bm{m}^{\bm{\mathrm{u}} \mid \hat{\bm{D}}, \hat{\bm{f}}} = \bm{u}^{\mathrm{MWR}}.
\]
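The finite-dimensional statement in proposition 3 can be checked numerically. The following is a minimal sketch, assuming randomly generated stand-ins `D_hat`, `f_hat`, `Sigma_c` and `m_c` for $\hat{\bm{D}}$, $\hat{\bm{f}}$, $\bm{\Sigma}_{\bm{\mathrm{c}}}$ and the prior mean of the coordinates (these values are illustrative and not taken from any experiment in this article). It applies standard Gaussian conditioning to the noise-free observation $\hat{\bm{D}} \bm{\mathrm{c}} = \hat{\bm{f}}$ and compares the result to the classical solution of the linear system.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 5  # number of trial functions (equal to the number of test functionals here)

# Hypothetical stand-ins for the MWR system matrix and right-hand side.
D_hat = rng.standard_normal((m, m)) + m * np.eye(m)
f_hat = rng.standard_normal(m)

# Arbitrary Gaussian prior over the coordinates c with invertible covariance.
A = rng.standard_normal((m, m))
Sigma_c = A @ A.T + np.eye(m)
m_c = rng.standard_normal(m)

# Gaussian conditioning on the noise-free observation D_hat @ c = f_hat.
gram = D_hat @ Sigma_c @ D_hat.T
gain = Sigma_c @ D_hat.T @ np.linalg.inv(gram)
post_mean = m_c + gain @ (f_hat - D_hat @ m_c)
post_cov = Sigma_c - gain @ D_hat @ Sigma_c

# Classical MWR coordinates obtained by solving the linear system directly.
c_mwr = np.linalg.solve(D_hat, f_hat)
```

Under these assumptions the posterior mean coincides with `c_mwr` and the posterior covariance vanishes up to floating-point error, independently of the chosen prior mean and covariance, which is the Dirac posterior $\delta_{\bm{c}^{\mathrm{MWR}}}$ in coordinate form.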

It turns out that it is possible to transform any admissible GP prior over the (weak) solution of the PDE into a prior that fulfills the assumptions of corollary 4.

Proposition 5 (MWR Recovery Prior).

Let $\tilde{\bm{\mathrm{u}}} \sim \operatorname{\mathcal{GP}}(\tilde{\bm{m}}, \tilde{\bm{k}})$ with mean and sample paths in $\mathbb{U}$. Then $\bm{\mathrm{u}} \sim \operatorname{\mathcal{GP}}(\bm{m}, \bm{k})$ with $\bm{m} \coloneqq \mathcal{P}_{\hat{\mathbb{U}}}[\tilde{\bm{m}}]$ and

\begin{align*}
\bm{k} &\coloneqq \mathcal{P}_{\hat{\mathbb{U}}} \tilde{\bm{k}} \mathcal{P}_{\hat{\mathbb{U}}}' + \mathcal{P}_{\operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})} \tilde{\bm{k}} \mathcal{P}_{\operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})}' \\
&= \mathcal{P}_{\hat{\mathbb{U}}} \tilde{\bm{k}} \mathcal{P}_{\hat{\mathbb{U}}}' + (\operatorname{id}_{\mathbb{U}} - \mathcal{P}_{\hat{\mathbb{U}}}) \tilde{\bm{k}} (\operatorname{id}_{\mathbb{U}} - \mathcal{P}_{\hat{\mathbb{U}}})' \\
&= \tilde{\bm{k}} - \mathcal{P}_{\hat{\mathbb{U}}} \tilde{\bm{k}} - \tilde{\bm{k}} \mathcal{P}_{\hat{\mathbb{U}}}' + 2 \mathcal{P}_{\hat{\mathbb{U}}} \tilde{\bm{k}} \mathcal{P}_{\hat{\mathbb{U}}}'
\end{align*}

has sample paths in $\mathbb{U}$, $\bm{m} \in \hat{\mathbb{U}}$, and $\mathcal{P}_{\operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})} \bm{k} \bm{\mathcal{P}}_{\mathbb{R}^{m}}' = \bm{0}$.

Figure 9(b) visualizes how a prior of this form reproduces a 1D finite element method in the posterior mean, and figure 9 as a whole illustrates the difference between $\tilde{\bm{\mathrm{u}}}$ and $\bm{\mathrm{u}}$. Intuitively speaking, the construction for the covariance from proposition 5 enforces statistical independence between the subspaces $\hat{\mathbb{U}}$ and $\operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})$ of the GP's path space. This way, an observation of the GP prior in the subspace $\hat{\mathbb{U}}$ provides no information about $\operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})$, which means that the posterior process will not be updated along $\operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})$. Since $\bm{m} \in \hat{\mathbb{U}}$, i.e. $\mathcal{P}_{\operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})}[\bm{m}] = \bm{0}$, it follows that the posterior mean will also lie in $\hat{\mathbb{U}}$.
Even though this choice of prior is somewhat restrictive, there are good reasons to use it in practice, arguably the most important of which is that the uncertainty quantification provided by the GP can be added on top of traditional MWR solvers in existing pipelines in a plug-and-play fashion. This is because given the MWR recovery prior, the mean estimate of the probabilistic numerical method agrees with the point estimate produced by the classical solver.
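The covariance construction from proposition 5 can be illustrated in a finite-dimensional toy setting. The following is a minimal sketch, assuming that the path space is replaced by $\mathbb{R}^d$, that the projection onto the trial subspace is the orthogonal projection `P` onto the span of a randomly drawn basis `Phi`, and that `K_tilde` is an arbitrary positive-definite matrix standing in for $\tilde{\bm{k}}$. All of these names are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 8, 3  # dimension of the toy "path space" and of the trial subspace

# Hypothetical trial basis (columns) and the orthogonal projection onto its span.
Phi = rng.standard_normal((d, m))
P = Phi @ np.linalg.solve(Phi.T @ Phi, Phi.T)
Q = np.eye(d) - P  # projection onto ker(P)

# Arbitrary prior covariance standing in for k_tilde.
A = rng.standard_normal((d, d))
K_tilde = A @ A.T

# MWR recovery covariance, in its defining and its expanded form.
K = P @ K_tilde @ P.T + Q @ K_tilde @ Q.T
K_expanded = K_tilde - P @ K_tilde - K_tilde @ P.T + 2 * P @ K_tilde @ P.T

# Cross-covariance between the trial subspace and the kernel of the projection.
cross_cov = Q @ K @ P.T
```

In this sketch both forms of the covariance agree and `cross_cov` is zero up to round-off, which is the decoupling between $\hat{\mathbb{U}}$ and $\operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})$ described above. The argument only uses $P^2 = P$, so an oblique projection could be substituted for the orthogonal one.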

3.4 Algorithm

Algorithm 1 summarizes our framework from an algorithmic standpoint. It outlines how a GP prior can be conditioned on heterogeneous sources of information such as mechanistic knowledge given in the form of a linear boundary value problem, and noisy measurement data by leveraging the notion of a linear information operator. All GP posteriors in this article were computed by this algorithm with different choices of prior, PDE, boundary conditions and policy.

Algorithm 1 Solving PDEs via Gaussian Process Inference

Input: Joint GP prior $(\bm{\mathrm{u}}, \mathrm{f}^{(w)}, \mathrm{g}^{(w)}, \bm{\mathrm{\epsilon}}) \sim \operatorname{\mathcal{GP}}(\bm{m}, \bm{k})$, linear PDE $(\mathcal{D}^{(w)}, \mathrm{f}^{(w)})$, boundary conditions $(\mathcal{B}^{(w)}, \mathrm{g}^{(w)})$, (noisy) measurements $(\bm{X}_{\text{MEAS}}, \bm{y}_{\text{MEAS}})$, $\dotsc$
Output: GP posterior $\operatorname{\mathcal{GP}}(\bm{m}^{(i)}, \bm{k}^{(i)})$

procedure LinPDE-GP($\bm{m}, \bm{k}, \mathcal{I}_{\cdot,\cdot}^{\text{PDE}}, \mathcal{I}_{\cdot,\cdot}^{\text{BC}}, \bm{X}_{\text{MEAS}}, \bm{y}_{\text{MEAS}}$)
    $i \leftarrow 0$
    $(\bm{m}^{(0)}, \bm{k}^{(0)}) \leftarrow (\bm{m}, \bm{k})$
    $\bm{w}^{(0)} \leftarrow ()$
    $\bm{G}^{(0)} \leftarrow ()$
    while not StoppingCriterion() do
        $i \leftarrow i + 1$
        $(l_{\text{PDE}}^{(i)}, l_{\text{BC}}^{(i)}, \mathcal{P}_{\hat{\mathbb{U}}}^{(i)}, \dotsc, \bm{v}_{\text{MEAS}}^{(i)}) \leftarrow \textsc{Policy}(\bm{m}^{(i)}, \bm{k}^{(i)})$  $\vartriangleright$ Action
        $\bm{\mathcal{I}}^{(i)} \leftarrow (\bm{u}, f^{(w)}, g^{(w)}, \bm{\epsilon}) \mapsto \begin{pmatrix} \bm{\mathcal{I}}_{l_{\text{PDE}}^{(i)}, \mathcal{P}_{\hat{\mathbb{U}}}^{(i)}}^{\text{PDE}}[(\bm{u}, f^{(w)})] \\ \bm{\mathcal{I}}_{l_{\text{BC}}^{(i)}, \mathcal{P}_{\hat{\mathbb{U}}}^{(i)}}^{\text{BC}}[(\bm{u}, g^{(w)})] \\ \vdots \\ \langle \bm{v}_{\text{MEAS}}^{(i)}, \bm{u}(\bm{X}_{\text{MEAS}}) + \bm{\epsilon} \rangle \end{pmatrix}$  $\vartriangleright$ Information operator
        $\bm{y}^{(i)} \leftarrow \begin{pmatrix} 0 & 0 & \ldots & \langle \bm{v}_{\text{MEAS}}^{(i)}, \bm{y}_{\text{MEAS}} \rangle \end{pmatrix}^{\top}$  $\vartriangleright$ Observations
        $\bm{G}^{(i)} \leftarrow \begin{pmatrix} \bm{G}^{(i-1)} & \bm{\mathcal{I}}^{(1:i-1)} \bm{k} (\bm{\mathcal{I}}^{(i)})' \\ \bm{\mathcal{I}}^{(i)} \bm{k} (\bm{\mathcal{I}}^{(1:i-1)})' & \bm{\mathcal{I}}^{(i)} \bm{k} (\bm{\mathcal{I}}^{(i)})' \end{pmatrix}$  $\vartriangleright$ Update Gram matrix
        $\bm{w}^{(i)} \leftarrow (\bm{G}^{(i)})^{\dagger} (\bm{y}^{(1:i)} - \bm{\mathcal{I}}^{(1:i)}[\bm{m}])$  $\vartriangleright$ Update representer weights
        $m^{(i)}_{j} \leftarrow \bm{x} \mapsto m_{j}(\bm{x}) + \bm{\mathcal{I}}^{(1:i)}[\bm{k}_{:,j}(\cdot, \bm{x})]^{\top} \bm{w}^{(i)}$  $\vartriangleright$ Belief Update
        $k^{(i)}_{j_1,j_2} \leftarrow (\bm{x}_1, \bm{x}_2) \mapsto k_{j_1,j_2}(\bm{x}_1, \bm{x}_2) - \bm{\mathcal{I}}^{(1:i)}[\bm{k}_{:,j_1}(\cdot, \bm{x}_1)]^{\top} (\bm{G}^{(i)})^{\dagger} \bm{\mathcal{I}}^{(1:i)}[\bm{k}_{:,j_2}(\cdot, \bm{x}_2)]$
    return $\operatorname{\mathcal{GP}}(\bm{m}^{(i)}, \bm{k}^{(i)})$

Modeling uncertainty over the right-hand side $f^{(w)}$ of the PDE and the boundary function(al) $g^{(w)}$ is achieved by specifying a joint prior $(\bm{\mathrm{u}}, \mathrm{f}^{(w)}, \mathrm{g}^{(w)}, \bm{\mathrm{\epsilon}})$. Therefore, Algorithm 1 also returns a multi-output Gaussian process posterior over the same objects. This means that our method can be used to solve PDE-constrained Bayesian inverse problems for the right-hand side $f^{(w)}$ and the boundary function $g^{(w)}$, while computing a consistent distributional estimate for the corresponding solution $\bm{u}$ of the forward problem. This is a generalization of a linear latent force model (Alvarez et al., 2009). If $f^{(w)}$ and $g^{(w)}$ are not uncertain, the corresponding covariance functions in the joint prior can simply be set to 0, which (in the absence of measurements) reduces the joint prior to a simple prior over the solution $\bm{\mathrm{u}}$. To condition the GP on the PDE and the boundary conditions, we make use of MWR information operators (see definition 2), where the test functions and projections are chosen by an arbitrary policy in each iteration of the method.
An example of such a policy which reproduces figure 1(c) chooses $\mathcal{P}_{\hat{\mathbb{U}}}$ as the $L_2$ projection onto the basis from example 7 in every iteration, the test functions $l_{\text{BC}} \in \{\delta_{-1}, \delta_{1}\}$, and $l_{\text{PDE}} = 0$ in the first two iterations; and $l_{\text{PDE}}$ is induced by $\psi^{(i-2)} = \phi^{(i-2)}$ (and $l_{\text{BC}} = 0$) from iteration 3 onward. The ellipses in the information operator $\bm{\mathcal{I}}^{(i)}$ and the observations $\bm{y}^{(i)}$ indicate that adding additional information operators is possible in the same fashion. For instance, adding additional PDE information operators enables the solution of systems of linear PDEs.
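The structure of the main loop can be illustrated in a finite-dimensional analogue. The following is a minimal sketch, not the implementation accompanying this article: it assumes that the GP is replaced by a Gaussian vector $\bm{\mathrm{u}} \sim \mathcal{N}(\bm{m}, \bm{K})$ in $\mathbb{R}^d$, that each information operator is a single row vector, and that the policy simply iterates over a fixed list of such rows. The function name `linear_gaussian_inference` and all inputs are hypothetical.

```python
import numpy as np

def linear_gaussian_inference(m, K, actions, observations):
    """Condition N(m, K) on actions[i] @ u = observations[i], one row per iteration."""
    d = m.size
    A = np.empty((0, d))  # stacked information operators, analogue of I^(1:i)
    y = np.empty(0)       # stacked observations, analogue of y^(1:i)
    G = np.empty((0, 0))  # Gram matrix, analogue of G^(i)
    mean, cov = m, K
    for a_i, y_i in zip(actions, observations):
        a_i = np.reshape(a_i, (1, d))
        # Extend the Gram matrix by one block row and column (prior moments only).
        cross = A @ K @ a_i.T
        G = np.vstack([np.hstack([G, cross]),
                       np.hstack([cross.T, a_i @ K @ a_i.T])])
        A = np.vstack([A, a_i])
        y = np.append(y, y_i)
        # Representer weights and belief update, always relative to the prior.
        G_pinv = np.linalg.pinv(G)
        w = G_pinv @ (y - A @ m)
        KA = K @ A.T
        mean = m + KA @ w
        cov = K - KA @ G_pinv @ KA.T
    return mean, cov

rng = np.random.default_rng(2)
d = 6
B = rng.standard_normal((d, d))
K = B @ B.T + np.eye(d)
m = np.zeros(d)
u_true = rng.standard_normal(d)
actions = rng.standard_normal((3, d))
observations = actions @ u_true
post_mean, post_cov = linear_gaussian_inference(m, K, actions, observations)
```

In this toy setting the posterior mean reproduces all observations exactly and the posterior covariance vanishes along the observed directions. Recomputing the pseudoinverse in every iteration is done here only for readability.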

Performance Considerations

Instead of naively conditioning the previous conditional process on the new observation in each iteration, algorithm 1 always conditions the prior on the accumulated observations. This is because the naive expressions for the conditional moments become more and more complex over time. While, in principle, it is possible to use automatic differentiation (AD) to compute $\bm{\mathcal{I}}^{(i)}[\bm{m}^{(i)}]$, $\bm{\mathcal{I}}^{(i)}[\bm{k}^{(i-1)}_{:,j}(\cdot, \bm{x})]$, and $\bm{\mathcal{I}}^{(i)} \bm{k}^{(i-1)} (\bm{\mathcal{I}}^{(i)})'$ in each iteration and then evaluate equations 4.10 and 4.11 naively, we found that this is detrimental to the performance of the algorithm.
In algorithm 1, we only need to compute $\bm{\mathcal{I}}^{(i)}[\bm{m}]$, $\bm{\mathcal{I}}^{(i)}[\bm{k}_{:,j}(\cdot, \bm{x})]$, and $\bm{\mathcal{I}}^{(i)} \bm{k} (\bm{\mathcal{I}}^{(i)})'$ on the prior moments, which are much less complex and cheaper to evaluate. For maximum efficiency, for many information operator / kernel combinations one can compute optimized closed-form expressions for these terms, alleviating the need for automatic differentiation or quadrature. We can avoid unnecessary recomputation of the representer weights at every iteration of the method by means of block-matrix inversion. For instance, if a Cholesky decomposition is used to invert the Gramian $\bm{G}^{(i)}$, we can use a variant of the block Cholesky decomposition (Golub and Van Loan, 2013) to update the Cholesky factor of $\bm{G}^{(i-1)}$.
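The block Cholesky update mentioned above can be sketched as follows. This is an illustrative example under the assumption that the Gramian is symmetric positive definite, so that a Cholesky factor exists (algorithm 1 itself is stated with a pseudoinverse and does not require this). The helper `extend_cholesky` is hypothetical and not part of any library.

```python
import numpy as np

def extend_cholesky(L_old, G_cross, G_new):
    """Lower Cholesky factor of [[G_old, G_cross], [G_cross.T, G_new]],
    given the lower Cholesky factor L_old of G_old."""
    # Solve L_old @ X = G_cross. A general solver is used for brevity;
    # a triangular solver would exploit the structure of L_old.
    L_cross = np.linalg.solve(L_old, G_cross).T
    # Factor the Schur complement of G_old in the extended matrix.
    schur = G_new - L_cross @ L_cross.T
    L_new = np.linalg.cholesky(schur)
    n_old, n_new = L_old.shape[0], G_new.shape[0]
    return np.block([[L_old, np.zeros((n_old, n_new))],
                     [L_cross, L_new]])

rng = np.random.default_rng(3)
B = rng.standard_normal((5, 5))
G = B @ B.T + 5 * np.eye(5)  # stand-in for a positive-definite Gram matrix

# Factor the leading 3x3 block first, then extend by the remaining 2 rows and columns.
L_old = np.linalg.cholesky(G[:3, :3])
L_updated = extend_cholesky(L_old, G[:3, 3:], G[3:, 3:])
```

Only the new off-diagonal block and the Schur complement are factorized, so the cost of the update is governed by the size of the previous factor and the new block rather than by a full refactorization of $\bm{G}^{(i)}$.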

Code

A Python implementation of algorithm 1 based on ProbNum (Wenger et al., 2021) and JAX (Bradbury et al., 2018) is available at:

https://github.com/marvinpfoertner/linpde-gp

3.5 Related Work

The area of physics-informed machine learning (Karniadakis et al., 2021) aims at augmenting machine learning models with mechanistic knowledge about physical phenomena, mostly in the form of ordinary and partial differential equations. Recently, there has been growing interest in deep learning–based approaches (Raissi et al., 2019; Li et al., 2020, 2021). However, this model choice makes it inherently difficult to quantify the uncertainty about the solution induced by noise-corrupted input data and inevitable approximation error. Instead, we approach the problem through the lens of probabilistic numerics (Hennig et al., 2015; Cockayne et al., 2019b; Oates and Sullivan, 2019; Owhadi et al., 2019; Hennig et al., 2022), which frames numerical problems as statistical estimation tasks. Probabilistic numerical methods for the solution of PDEs are predominantly based on Gaussian process priors. Our work builds upon and extends these works. Many existing methods aim to find a strong solution to a linear PDE using a collocation scheme (e.g. Graepel 2003; Cockayne et al. 2017; Raissi et al. 2017). Unfortunately, many practically relevant (linear) PDEs only admit weak solutions. Our framework extends existing collocation approaches to weak formulations. Probabilistic numerical methods approximating weak formulations are primarily based on discretization. For example, Cockayne et al. (2019a); Wenger and Hennig (2020) apply a probabilistic linear solver to the linear system arising from discretization. Girolami et al. (2021) propose a statistical version of the finite element method (statFEM), which uses a specific parametric GP prior. However, these approaches do not quantify the inherent discretization error – often the largest source of uncertainty about the solution. In contrast, our framework models this error and additionally admits a broader class of discretizations. Wang et al. (2021); Krämer et al. 
(2022) propose GP-based solvers for strong formulations of time-dependent nonlinear PDEs by leveraging finite-difference approximations to the differential operator and linearization-based approximate inference. While it is possible to apply such methods to linear PDEs, the finite-difference approximation of the differential operator introduces additional estimation error. In contrast, the evaluation of the differential operator in our method is exact. Cockayne et al. (2017); Raissi et al. (2017); Girolami et al. (2021) also apply their methods to solve PDE-constrained (Bayesian) inverse problems. Särkkä (2011) directly infers the right-hand side of a linear PDE in strong formulation by observing measurements of the solution through the associated Green’s function. Our approach also builds a belief over an unknown right-hand side without requiring access to a Green’s function. The aforementioned methods use the closure of Gaussian processes under conditioning on observations of the sample paths through a linear operator without proof. Owhadi and Scovel (2018) show how to condition Gaussian measures on an orthogonal direct sum of separable Hilbert spaces on observations of one of the summands. However, this result does not apply to separable Banach spaces such as Hölder spaces, which are ubiquitous in the study of strong solutions of linear PDEs. Furthermore, when it can be applied, it does not translate to Gaussian processes without significant effort.\footnote{The theoretical results of an earlier version of this work were based on the result by Owhadi and Scovel (2018). In order to generalize our framework to Banach spaces, we have adopted a different proof strategy.} Our work therefore provides the theoretical basis for conditioning Gaussian processes on observations of their sample paths made through an arbitrary bounded linear operator with values in $\mathbb{R}^{n}$.
Recent results about the sample spaces of GPs (Steinwart, 2019; Kanagawa et al., 2018) ensure the applicability of our work to practical GP regression problems. From a practitioner’s perspective this allows the modeling flexibility of Gaussian processes via the kernel, while ensuring that conditioning on observations of the sample paths through a linear operator is possible. To our knowledge this is the first complete proof of this widely used property of GPs. Thus, theorem 1 provides the theoretical basis for physics-informed GP regression, including the aforementioned methods for the solution of PDEs. In our work, it enables conditioning on information operators constructed from e.g. PDEs, boundary conditions and general integral equations.

4 Gaussian Process Inference with Linear Operator Observations

Our framework fundamentally relies on the fact that when a Gaussian process prior is conditioned on linear observations of its paths, one obtains a closed-form posterior. This section provides the theoretical foundation for this result. This property is used widely in the literature (see e.g. Graepel (2003); Rasmussen and Williams (2006); Särkkä (2011); Särkkä et al. (2013); Cockayne et al. (2017); Raissi et al. (2017); Agrell (2019); Albert (2019); Krämer et al. (2022)). However, to the best of our knowledge, no proof exists for its general form, in which observations are made via bounded linear operators mapping a separable Banach function space into $\mathbb{R}^{n}$ rather than via finite-dimensional linear maps applied to a finite number of point evaluations. Owhadi and Scovel (2018) give a proof of a related property for Gaussian measures on separable Hilbert spaces. Here, we extend their results to the case of Gaussian processes. While these perspectives are closely related, significant technical attention needs to be paid for this result to transfer to the GP case. This is essential for our framework, since it lets us leverage the modeling capabilities provided by specifying a kernel as described in section 3.1.1.

To state the result, let ${\bm{\mathrm{f}}}\sim\operatorname{\mathcal{GP}}\left({\bm{m}},{\bm{k}}\right)$ be a (multi-output) GP prior with index set ${\mathbb{X}}$, ${\bm{\mathcal{L}}}\colon\operatorname{paths}\left({\bm{\mathrm{f}}}\right)\to\mathbb{R}^{n}$ a linear operator acting on the paths of ${\bm{\mathrm{f}}}$, and ${\bm{\mathrm{\epsilon}}}\sim\operatorname{\mathcal{N}}\left({\bm{\mu}},{\bm{\Sigma}}\right)$ a Gaussian random vector in $\mathbb{R}^{n}$ with ${\bm{\mathrm{\epsilon}}}\perp\!\!\!\!\perp{\bm{\mathrm{f}}}$. We need to compute the conditional random process

\[
  {\bm{\mathrm{f}}} \mid {\bm{\mathcal{L}}}[{\bm{\mathrm{f}}}]+{\bm{\mathrm{\epsilon}}}={\bm{y}}
\]

for some ${\bm{y}}\in\mathbb{R}^{n}$. This object is defined as the family
\[
  \left({\bm{\mathrm{f}}} \mid {\bm{\mathcal{L}}}[{\bm{\mathrm{f}}}]+{\bm{\mathrm{\epsilon}}}={\bm{y}}\right)\coloneqq\{{\bm{\mathrm{f}}}(x,\cdot) \mid E\}_{x\in{\mathbb{X}}},
\]
of conditional random variables\footnote{Here, we need to work with regular conditional probability measures (Klenke, 2014), since the event $E$ typically has probability 0.}, where $(\Omega,\mathcal{B}\left(\Omega\right),\mathrm{P})$ is the probability space on which both ${\bm{\mathrm{f}}}$ and ${\bm{\mathrm{\epsilon}}}$ are defined, $E$ is the event $E\coloneqq{\bm{\mathrm{h}}}^{-1}(\{{\bm{y}}\})\in\mathcal{B}\left(\Omega\right)$, and ${\bm{\mathrm{h}}}$ is the random variable

\[
  {\bm{\mathrm{h}}}\colon\Omega\to\mathbb{R}^{n},\quad\omega\mapsto{\bm{\mathcal{L}}}[{\bm{\mathrm{f}}}(\cdot,\omega)]+{\bm{\mathrm{\epsilon}}}(\omega).
\]

We refer to section B for definitions of the objects mentioned above. For instance, in section 3, we use ${\bm{\mathcal{L}}}\coloneqq\left(\mathcal{D}[\cdot]({\bm{x}}_{i})\right)_{i=1}^{n}$, where $\mathcal{D}$ is a linear differential operator, as well as ${\bm{\mathcal{L}}}[{\bm{\mathrm{f}}}]\coloneqq({\bm{\mathrm{f}}}({\bm{x}}_{i}))_{i=1}^{n}$, and, in section 3.2, we additionally use

\[
  {\bm{\mathcal{L}}}[{\bm{\mathrm{f}}}]=\int_{{\mathbb{X}}}{\bm{\mathrm{f}}}({\bm{x}})\,\mathrm{d}{\bm{x}}.
\]

It is well known that ${\bm{\mathrm{h}}}$ is a Gaussian random vector
\[
  {\bm{\mathrm{h}}}\sim\operatorname{\mathcal{N}}\left({\bm{\mathcal{L}}}[{\bm{m}}]+{\bm{\mu}},{\bm{\mathcal{L}}}{\bm{k}}{\bm{\mathcal{L}}}^{\prime}+{\bm{\Sigma}}\right),
\]
where ${\bm{\mathcal{L}}}{\bm{k}}{\bm{\mathcal{L}}}^{\prime}\in\mathbb{R}^{n\times n}$ with
\[
  \left({\bm{\mathcal{L}}}{\bm{k}}{\bm{\mathcal{L}}}^{\prime}\right)_{i_{1},i_{2}}={\mathcal{L}}_{i_{1}}[{\bm{x}}\mapsto({\mathcal{L}}_{i_{2}}[{\bm{k}}_{j_{1},:}({\bm{x}},\cdot)])_{j_{1}=1}^{n}],
\]
and that the conditional random process is a (multi-output) Gaussian process

\[
  {\bm{\mathrm{f}}} \mid {\bm{\mathcal{L}}}[{\bm{\mathrm{f}}}]+{\bm{\mathrm{\epsilon}}}={\bm{y}}\sim\operatorname{\mathcal{GP}}\left({\bm{m}}^{{\bm{\mathrm{f}}} \mid {\bm{y}}},{\bm{k}}^{{\bm{\mathrm{f}}} \mid {\bm{y}}}\right)
\]

with conditional moments given by

\begin{align*}
  {m}^{{\bm{\mathrm{f}}} \mid {\bm{y}}}_{i}({\bm{x}}) &= {m}_{i}({\bm{x}})+{\bm{\mathcal{L}}}[{\bm{k}}_{:,i}(\cdot,{\bm{x}})]^{\top}\left({\bm{\mathcal{L}}}{\bm{k}}{\bm{\mathcal{L}}}^{\prime}+{\bm{\Sigma}}\right)^{-1}\left({\bm{y}}-({\bm{\mathcal{L}}}[{\bm{m}}]+{\bm{\mu}})\right),\qquad\text{and} \\
  {k}^{{\bm{\mathrm{f}}} \mid {\bm{y}}}_{i_{1},i_{2}}({\bm{x}}_{1},{\bm{x}}_{2}) &= {k}_{i_{1},i_{2}}({\bm{x}}_{1},{\bm{x}}_{2})-{\bm{\mathcal{L}}}[{\bm{k}}_{:,i_{1}}(\cdot,{\bm{x}}_{1})]^{\top}\left({\bm{\mathcal{L}}}{\bm{k}}{\bm{\mathcal{L}}}^{\prime}+{\bm{\Sigma}}\right)^{-1}{\bm{\mathcal{L}}}[{\bm{k}}_{:,i_{2}}(\cdot,{\bm{x}}_{2})].
\end{align*}

Since the above are nontrivial claims about potentially ill-behaved infinite-dimensional objects, a proof is important, if only to identify a precise set of assumptions under which the result holds. For instance, the statement that ${\bm{\mathrm{h}}}$ is a random vector, i.e. a measurable function, is highly nontrivial. To remedy this situation, a major contribution of this work is theorems 1 and 2, together with their proofs in section B, which establish the claims above under realistic assumptions. Hence, besides being the theoretical basis for this work, theorems 1 and 2 also provide theoretical backing for many of the publications cited above. Our results identify a set of mild assumptions, which are easy to verify and widely applicable in practice. Assumption 1 constitutes the common set of assumptions shared by theorems 1 and 2.

Assumption 1.

Let ${\mathrm{f}}\sim\operatorname{\mathcal{GP}}\left(m,k\right)$ be a Gaussian process prior with index set ${\mathbb{X}}$ on the probability space $(\Omega,\mathcal{F},\mathrm{P})$, whose paths lie in a real separable reproducing kernel Banach space (RKBS) ${\mathbb{B}}\subset\mathbb{R}^{{\mathbb{X}}}$ such that $\omega\mapsto{\mathrm{f}}(\cdot,\omega)$ is a ${\mathbb{B}}$-valued Gaussian random variable.

For instance, for a 1D domain ${\mathbb{D}}\subset\mathbb{R}$, a GP prior with half-integer Matérn kernel with smoothness parameter $\nu=p+\frac{1}{2}$ fulfills assumption 1 with ${\mathbb{B}}=C^{p}(\overline{{\mathbb{D}}})$, i.e. the space of $p$-times differentiable functions with bounded and uniformly continuous derivatives. Similar results hold in multiple dimensions and for other kernels. See section B.4 for more information on prior selection.

Table 2: Theorem 1 provides the theoretical basis to condition on (affine) observations of a Gaussian process. While results like conditioning on derivative evaluations are used ubiquitously (e.g. for monotonic GPs, Bayesian optimization, probabilistic numerical PDE solvers, …), a complete proof does not exist in the literature, to the best of our knowledge.
\begin{tabular}{llll}
\hline
Observation & Information operator & Proof known? & Reference \\
\hline
Point evaluation & ${\bm{\mathrm{f}}}({\bm{x}})$ & yes & Bishop (2006) \\
Finite-dim. affine map & ${\bm{A}}{\bm{\mathrm{f}}}({\bm{X}})+{\bm{b}}$ & yes & Bishop (2006) \\
Point evaluation of derivative & $\left.\frac{\partial^{\lvert{\bm{\alpha}}\rvert}}{\partial{\bm{x}}^{{\bm{\alpha}}}}{\mathrm{f}}_{i}\right|_{{\bm{x}}={\bm{x}}^{\prime}}$ & no & corollary 2 \\
Integral & $\int_{{\mathbb{X}}}\langle{\bm{\psi}}({\bm{x}}),{\bm{\mathrm{f}}}({\bm{x}})\rangle\,\mathrm{d}\mu\left({\bm{x}}\right)$ & no & theorem 1 \\
General affine functionals & ${\bm{\mathcal{L}}}[{\bm{\mathrm{f}}}]+{\bm{b}}$ & no & theorem 1 \\
\hline
\end{tabular}

Theorem 1 enables affine observations, in which the GP sample paths enter through one or multiple continuous linear functionals. For example, we used theorem 1 in section 3.2 to condition on observations of an integral of a GP’s paths and in section 3.3 to condition on projections of the paths. To state the result conveniently, we introduce some notation.

Notation 1.

Let assumption 1 hold and let ${\bm{\mathcal{L}}}\colon{\mathbb{B}}\to\mathbb{R}^{n}$ and $\tilde{{\bm{\mathcal{L}}}}\colon{\mathbb{B}}\to\mathbb{R}^{\tilde{n}}$ be bounded linear operators. By ${\bm{\mathcal{L}}}k\tilde{{\bm{\mathcal{L}}}}^{\prime}\in\mathbb{R}^{n\times\tilde{n}}$ we denote the matrix with entries

\[
  ({\bm{\mathcal{L}}}k\tilde{{\bm{\mathcal{L}}}}^{\prime})_{ij}\coloneqq{\bm{\mathcal{L}}}[{\bm{x}}\mapsto\tilde{{\bm{\mathcal{L}}}}[k({\bm{x}},\cdot)]_{j}]_{i}.
\]

The order in which the operators ${\bm{\mathcal{L}}}$, $\tilde{{\bm{\mathcal{L}}}}$ are applied to the arguments of $k$ does not matter, i.e.
\[
  ({\bm{\mathcal{L}}}k\tilde{{\bm{\mathcal{L}}}}^{\prime})_{ij}={\bm{\mathcal{L}}}[{\bm{x}}\mapsto\tilde{{\bm{\mathcal{L}}}}[k({\bm{x}},\cdot)]_{j}]_{i}=\tilde{{\bm{\mathcal{L}}}}[{\bm{x}}\mapsto{\bm{\mathcal{L}}}[k(\cdot,{\bm{x}})]_{i}]_{j}.
\]
This motivates the parenthesis-free notation ${\bm{\mathcal{L}}}k\tilde{{\bm{\mathcal{L}}}}^{\prime}$ introduced above.
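As a numerical sanity check of this exchange property, the following sketch compares both orders of application for one concrete pair of functionals. Everything in it is an assumption chosen for illustration: a squared-exponential kernel with lengthscale `ell`, the integral over $[0,1]$ acting on the first argument of $k$, and a derivative evaluation at `x0` acting on the second argument. The integral is approximated by Gauss-Legendre quadrature, and in the second order of application the derivative is approximated by a central finite difference.

```python
import numpy as np

ell, x0 = 0.5, 0.3  # hypothetical lengthscale and derivative location


def k(x, xp):
    """Squared-exponential kernel (an assumption made for this example)."""
    return np.exp(-0.5 * (x - xp) ** 2 / ell**2)


def dk_dxp(x, xp):
    """Analytic derivative of k with respect to its second argument."""
    return (x - xp) / ell**2 * k(x, xp)


# Gauss-Legendre nodes and weights, mapped from [-1, 1] to [0, 1].
z, w = np.polynomial.legendre.leggauss(30)
z, w = 0.5 * (z + 1.0), 0.5 * w


def integrate(g):
    """Quadrature approximation of the integral of g over [0, 1]."""
    return np.sum(w * g(z))


# Order A: derivative functional on the second argument first,
# then the integral functional on the first argument.
order_a = integrate(lambda x: dk_dxp(x, x0))


# Order B: integral functional on the first argument first, then the
# derivative functional on the remaining argument (central finite difference).
def k_integrated(xp):
    return integrate(lambda x: k(x, xp))


h = 1e-5
order_b = (k_integrated(x0 + h) - k_integrated(x0 - h)) / (2.0 * h)

# Closed form for this kernel: the integrand in order A is the x-derivative of -k.
closed_form = np.exp(-0.5 * x0**2 / ell**2) - np.exp(-0.5 * (1.0 - x0) ** 2 / ell**2)
```

Both `order_a` and `order_b` agree with `closed_form` up to quadrature and finite-difference error, which is what the identity above predicts for this pair of functionals.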

Theorem 1.

Let assumption 1 hold and let ${\bm{\mathcal{L}}}\colon{\mathbb{B}}\to\mathbb{R}^{n}$ be a bounded linear operator. Then
\begin{equation}
  {\bm{\mathcal{L}}}[{\mathrm{f}}]\sim\operatorname{\mathcal{N}}\left({\bm{\mathcal{L}}}[m],{\bm{\mathcal{L}}}k{\bm{\mathcal{L}}}^{\prime}\right). \tag{4.1}
\end{equation}

Let ${\bm{\mathrm{\epsilon}}}\sim\operatorname{\mathcal{N}}\left({\bm{\mu}},{\bm{\Sigma}}\right)$ be an $\mathbb{R}^{n}$-valued Gaussian random vector with ${\bm{\mathrm{\epsilon}}}\perp\!\!\!\!\perp{\mathrm{f}}$. Then, for any ${\bm{y}}\in\mathbb{R}^{n}$,
\begin{equation}
  {\mathrm{f}} \mid {\bm{\mathcal{L}}}[{\mathrm{f}}]+{\bm{\mathrm{\epsilon}}}={\bm{y}}\sim\operatorname{\mathcal{GP}}\left(m^{{\mathrm{f}} \mid {\bm{y}}},k^{{\mathrm{f}} \mid {\bm{y}}}\right), \tag{4.2}
\end{equation}

with conditional mean and covariance function given by
\begin{equation}
  m^{{\mathrm{f}} \mid {\bm{y}}}({\bm{x}})=m({\bm{x}})+{\bm{\mathcal{L}}}[k({\bm{x}},\cdot)]^{\top}\left({\bm{\mathcal{L}}}k{\bm{\mathcal{L}}}^{\prime}+{\bm{\Sigma}}\right)^{\dagger}\left({\bm{y}}-\left({\bm{\mathcal{L}}}[m]+{\bm{\mu}}\right)\right), \tag{4.3}
\end{equation}

and
\begin{equation}
  k^{{\mathrm{f}} \mid {\bm{y}}}({\bm{x}}_{1},{\bm{x}}_{2})=k({\bm{x}}_{1},{\bm{x}}_{2})-{\bm{\mathcal{L}}}[k({\bm{x}}_{1},\cdot)]^{\top}\left({\bm{\mathcal{L}}}k{\bm{\mathcal{L}}}^{\prime}+{\bm{\Sigma}}\right)^{\dagger}{\bm{\mathcal{L}}}[k(\cdot,{\bm{x}}_{2})]. \tag{4.4}
\end{equation}
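To make equations (4.3) and (4.4) concrete, here is a minimal NumPy sketch under stated assumptions; it is not the implementation from the linked repository. It assumes a scalar GP on $[0,1]$ with zero prior mean and a squared-exponential kernel, and it observes the two functionals $\mathrm{f}(0)$ and $\int_{0}^{1}\mathrm{f}(x)\,\mathrm{d}x$. The integral is replaced by a Gauss-Legendre quadrature rule, so each functional becomes a row of a weight matrix `W` over a set of `nodes`. The names `condition`, `W`, and `nodes` are hypothetical.

```python
import numpy as np


def k(x1, x2, ell=0.3):
    """Squared-exponential prior kernel (an assumption made for this example)."""
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / ell**2)


def m(x):
    """Zero prior mean function."""
    return np.zeros_like(x)


# Discretize the two functionals as weighted sums over nodes:
# row 0 is the point evaluation at x = 0, row 1 is quadrature on [0, 1].
zq, wq = np.polynomial.legendre.leggauss(40)
zq, wq = 0.5 * (zq + 1.0), 0.5 * wq
nodes = np.concatenate(([0.0], zq))
W = np.zeros((2, nodes.size))
W[0, 0] = 1.0
W[1, 1:] = wq


def condition(x, y, mu, Sigma):
    """Posterior mean and covariance at the points x, following (4.3) and (4.4)."""
    Lk = W @ k(nodes, x)                     # column j approximates L[k(., x_j)]
    G = W @ k(nodes, nodes) @ W.T + Sigma    # L k L' + Sigma
    G_pinv = np.linalg.pinv(G)               # pseudoinverse, as in theorem 1
    mean = m(x) + Lk.T @ G_pinv @ (y - (W @ m(nodes) + mu))
    cov = k(x, x) - Lk.T @ G_pinv @ Lk
    return mean, cov


# Noise-free observations: f(0) = 1.0 and the integral of f over [0, 1] is 0.2.
y = np.array([1.0, 0.2])
mean_nodes, cov_nodes = condition(nodes, y, mu=np.zeros(2), Sigma=np.zeros((2, 2)))
```

With zero noise, applying the discretized functionals to the posterior mean reproduces `y`, and the posterior covariance seen through the functionals vanishes. Closed-form expressions for $\bm{\mathcal{L}}[k(\bm{x},\cdot)]$ and $\bm{\mathcal{L}}k\bm{\mathcal{L}}^{\prime}$, where available, would replace the quadrature used here.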

Finally, we turn to corollary 2, which is the result that is most widely used throughout the literature (Graepel, 2003; Särkkä, 2011; Särkkä et al., 2013; Cockayne et al., 2017; Raissi et al., 2017; Agrell, 2019; Albert, 2019; Krämer et al., 2022). It shows how Gaussian processes can be conditioned on point evaluations of the image of their paths under a linear operator, provided that the linear operator is bounded and maps into a separable Banach function space on which point evaluation is continuous. Moreover, it shows that, under these conditions, the image of the GP under the linear operator is itself a Gaussian process. Again, we introduce some notation to facilitate stating the result.

Notation 2.

Let assumption 1 hold and let $\mathcal{L}_{i}\colon{\mathbb{B}}\to{\mathbb{B}}_{i}$ for $i=1,2$ be bounded linear operators mapping into real separable RKBSs ${\mathbb{B}}_{i}\subset\mathbb{R}^{{\mathbb{X}}_{i}}$, respectively. In analogy to notation 1, we define the bivariate functions

\begin{align}
  k\mathcal{L}_{2}^{\prime}\colon \mathbb{X}\times\mathbb{X}_{2} &\to\mathbb{R}, & ({\bm{x}},{\bm{x}}_{2}) &\mapsto\mathcal{L}_{2}[k({\bm{x}},\cdot)]({\bm{x}}_{2}), \tag{4.5} \\
  \mathcal{L}_{1}k\colon \mathbb{X}_{1}\times\mathbb{X} &\to\mathbb{R}, & ({\bm{x}}_{1},{\bm{x}}) &\mapsto\mathcal{L}_{1}[k(\cdot,{\bm{x}})]({\bm{x}}_{1}),\qquad\text{and} \tag{4.6} \\
  \mathcal{L}_{1}k\mathcal{L}_{2}^{\prime}\colon \mathbb{X}_{1}\times\mathbb{X}_{2} &\to\mathbb{R}, & ({\bm{x}}_{1},{\bm{x}}_{2}) &\mapsto\mathcal{L}_{1}[(k\mathcal{L}_{2}^{\prime})(\cdot,{\bm{x}}_{2})]({\bm{x}}_{1}). \tag{4.7}
\end{align}
Corollary 2.

Let assumption 1 hold and let $\mathcal{L}\colon \mathbb{B} \to \tilde{\mathbb{B}}$ be a linear operator mapping into a real vector space $\tilde{\mathbb{B}} \subset \mathbb{R}^{\tilde{\mathbb{X}}}$ such that $\delta_{\tilde{\bm{x}}} \circ \mathcal{L}$ is bounded for all $\tilde{\bm{x}} \in \tilde{\mathbb{X}}$. Then

\begin{equation}
  \mathcal{L}[\mathrm{f}] \sim \mathcal{GP}\left(\mathcal{L}[m], \mathcal{L} k \mathcal{L}'\right). \tag{4.8}
\end{equation}

Let $\bm{\epsilon} \sim \mathcal{N}\left(\bm{\mu}, \bm{\Sigma}\right)$ with values in $\mathbb{R}^n$ and $\bm{\epsilon} \perp\!\!\!\!\perp \mathrm{f}$. Then, for $\tilde{\bm{X}} = (\tilde{\bm{x}}_i)_{i=1}^n \in \tilde{\mathbb{X}}^n$ and $\bm{y} \in \mathbb{R}^n$,

\begin{equation}
  \mathrm{f} \mid \mathcal{L}[\mathrm{f}](\tilde{\bm{X}}) + \bm{\epsilon} = \bm{y} \sim \mathcal{GP}\left(m^{\mathrm{f} \mid \bm{y}}, k^{\mathrm{f} \mid \bm{y}}\right) \tag{4.9}
\end{equation}

with

\begin{equation}
  m^{\mathrm{f} \mid \bm{y}}(\bm{x}) \coloneqq m(\bm{x}) + (k \mathcal{L}')(\bm{x}, \tilde{\bm{X}})^\top \left( (\mathcal{L} k \mathcal{L}')(\tilde{\bm{X}}, \tilde{\bm{X}}) + \bm{\Sigma} \right)^\dagger \left( \bm{y} - \left( \mathcal{L}[m](\tilde{\bm{X}}) + \bm{\mu} \right) \right) \tag{4.10}
\end{equation}

and

\begin{equation}
  k^{\mathrm{f} \mid \bm{y}}(\bm{x}_1, \bm{x}_2) \coloneqq k(\bm{x}_1, \bm{x}_2) - (k \mathcal{L}')(\bm{x}_1, \tilde{\bm{X}})^\top \left( (\mathcal{L} k \mathcal{L}')(\tilde{\bm{X}}, \tilde{\bm{X}}) + \bm{\Sigma} \right)^\dagger (\mathcal{L} k)(\tilde{\bm{X}}, \bm{x}_2). \tag{4.11}
\end{equation}

If additionally $\tilde{\mathbb{X}} = \mathbb{X}$, then

\begin{equation}
  \begin{pmatrix} \mathrm{f} \\ \mathcal{L}[\mathrm{f}] \end{pmatrix}
  \sim \mathcal{GP}\left(
    \begin{pmatrix} m \\ \mathcal{L}[m] \end{pmatrix},
    \begin{pmatrix} k & k \mathcal{L}' \\ \mathcal{L} k & \mathcal{L} k \mathcal{L}' \end{pmatrix}
  \right). \tag{4.12}
\end{equation}
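
The posterior moments (4.10) and (4.11) can be sketched numerically. The following minimal example is an illustration under stated assumptions, not the implementation used in this work: it assumes a zero prior mean, a one-dimensional squared-exponential kernel with a hypothetical lengthscale `ELL`, and the derivative operator $\mathcal{L} = \mathrm{d}/\mathrm{d}x$, for which the cross-covariances are available in closed form. All function and variable names are made up for this sketch.

```python
import numpy as np

ELL = 0.7  # hypothetical lengthscale, chosen only for this sketch


def k(a, b):
    """Prior kernel matrix k(a, b) for 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * d**2 / ELL**2)


def k_Lt(a, b):
    """(k L')(a, b) = d/db k(a, b) for L = d/dx."""
    d = a[:, None] - b[None, :]
    return d / ELL**2 * np.exp(-0.5 * d**2 / ELL**2)


def L_k_Lt(a, b):
    """(L k L')(a, b) = d^2/(da db) k(a, b) for L = d/dx."""
    d = a[:, None] - b[None, :]
    return (1.0 / ELL**2 - d**2 / ELL**4) * np.exp(-0.5 * d**2 / ELL**2)


def posterior(x, X_obs, y, Sigma, mu=None):
    """Posterior mean (4.10) and covariance (4.11) for a zero prior mean."""
    if mu is None:
        mu = np.zeros(len(X_obs))
    # Pseudo-inverse of the Gram matrix of the observations plus noise.
    gram_pinv = np.linalg.pinv(L_k_Lt(X_obs, X_obs) + Sigma)
    cross = k_Lt(x, X_obs)  # row i holds (k L')(x_i, X_obs)
    mean = cross @ gram_pinv @ (y - mu)  # L[m] = 0 because m = 0
    # By symmetry of k, (L k)(X_obs, x) equals cross.T.
    cov = k(x, x) - cross @ gram_pinv @ cross.T
    return mean, cov


# Nearly noise-free observations of the derivative of sin, i.e. of cos.
X_obs = np.linspace(0.0, 3.0, 5)
y = np.cos(X_obs)
Sigma = 1e-8 * np.eye(len(X_obs))

x_query = np.linspace(0.0, 3.0, 50)
mean, cov = posterior(x_query, X_obs, y, Sigma)
```

Since only derivative information is observed in this sketch, the data constrain the function up to an additive constant, so the posterior variance does not collapse to zero even at the observation locations.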
Remark 6.

The assumptions about $\mathcal{L}$ from corollary 2 are fulfilled if $\tilde{\mathbb{B}}$ is an RKBS and $\mathcal{L}$ is bounded. However, these conditions are not necessary.

Corollary 2 is the theoretical basis for most of section 3.2. For $\mathcal{L} = \operatorname{id}_{\mathbb{B}}$, we recover standard GP regression as a special case. Finally, both Theorem 1 and Corollary 2 also apply to vector-valued Gaussian processes.

Remark 7 (Multi-Output Gaussian Processes).

Theorem 1 and corollary 2 also apply to multi-output GPs $\bm{\mathrm{f}}$. In this case, we interpret the sample paths $\bm{\mathrm{f}}(\cdot, \omega)\colon \mathbb{X} \to \mathbb{R}^{d'}$ of the multi-output GP as sample paths $\tilde{\mathrm{f}}(\cdot, \omega)\colon I \times \mathbb{X} \to \mathbb{R},\ \tilde{\mathrm{f}}((i, \bm{x}), \omega) \coloneqq \mathrm{f}_i(\bm{x}, \omega)$ of a regular GP with index set $I \times \mathbb{X}$, where $I = \{1, \dotsc, d'\}$ (see section 2.2). We also generalize notation like $\bm{\mathcal{L}} \bm{k} \bm{\mathcal{L}}'$ accordingly.
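
The reindexing in the remark above can be made concrete with a small sketch. It assumes, purely for illustration, a separable covariance $\tilde{k}((i, x), (j, x')) = B_{ij}\, k(x, x')$ with a hypothetical positive-definite output covariance matrix `B`; this particular structure is an assumption of the example and is not prescribed by the text. The code uses zero-based output indices, so $I = \{0, \dotsc, d' - 1\}$.

```python
import numpy as np


def k_scalar(x1, x2, ell=0.5):
    """Scalar squared-exponential kernel on the input space."""
    return np.exp(-0.5 * (x1 - x2) ** 2 / ell**2)


# Hypothetical covariance between the d' = 2 outputs (positive definite).
B = np.array([[1.0, 0.6],
              [0.6, 2.0]])


def k_tilde(ix1, ix2):
    """Kernel of the single-output GP on the index set I x X."""
    (i, x1), (j, x2) = ix1, ix2
    return B[i, j] * k_scalar(x1, x2)


# Index points (i, x): every output index paired with every input location.
xs = np.array([0.0, 0.4, 1.1])
index_points = [(i, x) for i in range(B.shape[0]) for x in xs]

# Gram matrix of the reindexed GP, ordered output-major.
K = np.array([[k_tilde(p, q) for q in index_points] for p in index_points])
```

With this ordering, `K` coincides with the Kronecker product of `B` and the scalar Gram matrix on `xs`, so the multi-output prior is an ordinary Gaussian process over the six index points.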

5 Conclusion

In this work, we developed a probabilistic framework for the solution of (systems of) linear partial differential equations, which can be interpreted as physics-informed Gaussian process regression. It enables the seamless fusion of (1) a-priori known, provable properties of the system of interest, (2) exact and partial mechanistic information, (3) subjective domain expertise, and (4) noisy empirical measurements into a unified scientific model. This model outputs a consistent uncertainty estimate, which quantifies the inherent approximation error in addition to the uncertainty arising from partially known physics and limited-precision measurements. Our framework fundamentally relies on the closure of Gaussian processes under conditioning on observations of their sample paths through an arbitrary bounded linear operator. While this result has been used ubiquitously in the literature, to the best of our knowledge a rigorous proof for linear operator observations, as needed in the PDE setting, did not exist prior to this work. Our work generalizes and unifies several related formulations of GP-PDE inference. Importantly, our formulation extends these ideas to virtually all popular methods for PDE simulation, revealing them to be a form of Gaussian process inference and in turn clarifying the underlying (probabilistic) assumptions. More specifically, for a specific choice of prior and information operator, our framework recovers methods of weighted residuals, a popular family of numerical methods for the solution of (linear) PDEs, which includes generalized Galerkin methods such as finite element and spectral methods. This demonstrates that classical linear PDE solvers can be generalized in their functionality to include approximate input data and can be equipped with a structured uncertainty estimate. Our work outlines a general framework for the integration of mechanistic building blocks, in the form of information operators derived from, e.g., linear PDEs, into probabilistic models. Our case study shows that the language of information operators is a powerful toolkit for aggregating heterogeneous sources of partial information in a joint probabilistic model, especially in the context of physics-informed machine learning. This opens up several interesting lines of research. For example, the choices of prior and information operator are not fixed and can be tailored to the problem at hand. The design of adaptive information operators, which actively collect information based on the current belief about the solution, could prove to be a promising research direction. Further, the uncertainty estimate about the solution could be used to inform experimental design choices. For example, in the case study from Section 3.2, the posterior belief can be used to optimize the locations of the digital thermal sensors in future CPU designs. Finally, it remains an open question whether this framework can be adapted to nonlinear partial differential equations, similar to how many classic methods solve a sequence of linearized problems to approximate the solution of a nonlinear PDE.


Acknowledgments

MP, PH and JW gratefully acknowledge financial support by the European Research Council through ERC StG Action 757275 / PANAMA; the DFG Cluster of Excellence “Machine Learning - New Perspectives for Science”, EXC 2064/1, project number 390727645; the German Federal Ministry of Education and Research (BMBF) through the Tübingen AI Center (FKZ: 01IS18039A); and funds from the Ministry of Science, Research and Arts of the State of Baden-Württemberg. The authors thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting MP and JW. IS thanks the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) for supporting this work by funding EXC 2075-390740016 under Germany’s Excellence Strategy. IS also acknowledges support by the Stuttgart Center for Simulation Science (SimTech).

Finally, the authors are grateful to Nathaël Da Costa and Filip Tronarp for many invaluable discussions concerning the theoretical part of this work.

A Proofs for Section 3.3

  • Proof of proposition 3

    By theorem 1, we have

    \begin{align*}
      m_i^{\bm{\mathrm{u}} \mid \hat{\bm{D}}, \hat{\bm{f}}}(\bm{x})
      &= m_i(\bm{x}) + (\hat{\bm{D}} \bm{\mathcal{P}}_{\mathbb{R}^m})[\bm{k}_{:,i}(\cdot, \bm{x})]^\top \left( (\hat{\bm{D}} \bm{\mathcal{P}}_{\mathbb{R}^m}) \bm{k} (\hat{\bm{D}} \bm{\mathcal{P}}_{\mathbb{R}^m})' \right)^{-1} \left( \hat{\bm{f}} - \hat{\bm{D}} \bm{\mathcal{P}}_{\mathbb{R}^m}[\bm{m}] \right) \\
      &= m_i(\bm{x}) + \bm{\mathcal{P}}_{\mathbb{R}^m}[\bm{k}_{:,i}(\cdot, \bm{x})]^\top \hat{\bm{D}}^\top \left( \hat{\bm{D}} \bm{\Sigma}_{\bm{\mathrm{c}}} \hat{\bm{D}}^\top \right)^{-1} \hat{\bm{D}} \left( \hat{\bm{D}}^{-1} \hat{\bm{f}} - \bm{\mathcal{P}}_{\mathbb{R}^m}[\bm{m}] \right) \\
      &= m_i(\bm{x}) + \bm{\mathcal{P}}_{\mathbb{R}^m}[\bm{k}_{:,i}(\cdot, \bm{x})]^\top \bm{\Sigma}_{\bm{\mathrm{c}}}^{-1} \left( \hat{\bm{D}}^{-1} \hat{\bm{f}} - \bm{\mathcal{P}}_{\mathbb{R}^m}[\bm{m}] \right).
    \end{align*}

    Since $\mathcal{P}_{\hat{\mathbb{U}}}$ is a bounded projection, we have $\mathbb{U} = \operatorname{ran}(\mathcal{P}_{\hat{\mathbb{U}}}) \oplus \operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}}) = \hat{\mathbb{U}} \oplus \operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})$ (see Rudin, 1991, Section 5.16), where each $\bm{u} \in \mathbb{U}$ decomposes uniquely into $\bm{u} = \bm{u}_{\hat{\mathbb{U}}} + \bm{u}_{\operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})}$ with $\bm{u}_{\hat{\mathbb{U}}} \in \hat{\mathbb{U}}$ and $\bm{u}_{\operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})} \in \operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})$. It is clear that $\bm{u}_{\hat{\mathbb{U}}} = \mathcal{P}_{\hat{\mathbb{U}}}[\bm{u}]$ and $\bm{u}_{\operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})} = \left( \operatorname{id} - \mathcal{P}_{\hat{\mathbb{U}}} \right)[\bm{u}] = \mathcal{P}_{\operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})}[\bm{u}]$. This implies

    \begin{align*}
      \bm{\mathcal{P}}_{\mathbb{R}^m}[\bm{m}^{\bm{\mathrm{u}} \mid \hat{\bm{D}}, \hat{\bm{f}}}]
      &= \bm{\mathcal{P}}_{\mathbb{R}^m}[\bm{m}] + \underbrace{\bm{\mathcal{P}}_{\mathbb{R}^m} \bm{k} \bm{\mathcal{P}}_{\mathbb{R}^m}'}_{= \bm{\Sigma}_{\bm{\mathrm{c}}}} \bm{\Sigma}_{\bm{\mathrm{c}}}^{-1} \left( \hat{\bm{D}}^{-1} \hat{\bm{f}} - \bm{\mathcal{P}}_{\mathbb{R}^m}[\bm{m}] \right) \\
      &= \bm{\mathcal{P}}_{\mathbb{R}^m}[\bm{m}] + \hat{\bm{D}}^{-1} \hat{\bm{f}} - \bm{\mathcal{P}}_{\mathbb{R}^m}[\bm{m}] \\
      &= \hat{\bm{D}}^{-1} \hat{\bm{f}} = \bm{c}^{\mathrm{MWR}}.
    \end{align*}

    Hence, we have

    \begin{equation*}
      \mathcal{P}_{\hat{\mathbb{U}}}[\bm{m}^{\bm{\mathrm{u}} \mid \hat{\bm{D}}, \hat{\bm{f}}}]
      = \sum_{i=1}^m \left( \bm{\mathcal{P}}_{\mathbb{R}^m}[\bm{m}^{\bm{\mathrm{u}} \mid \hat{\bm{D}}, \hat{\bm{f}}}] \right)_i \bm{\phi}^{(i)}
      = \sum_{i=1}^m c^{\mathrm{MWR}}_i \bm{\phi}^{(i)}
      = \bm{u}^{\mathrm{MWR}} \in \hat{\mathbb{U}}
    \end{equation*}

    and since $\mathbb{U} = \hat{\mathbb{U}} \oplus \operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})$, the statement follows. Moreover, $\bm{\mathcal{P}}_{\mathbb{R}^m}[\bm{m}^{\bm{\mathrm{u}} \mid \hat{\bm{D}}, \hat{\bm{f}}}]$ is the mean of $\bm{\mathrm{c}} \mid \hat{\bm{D}} \bm{\mathrm{c}} - \hat{\bm{f}} = \bm{0}$ and its covariance matrix is given by

    \begin{align*}
      \bm{\Sigma}^{\bm{\mathrm{c}} \mid \hat{\bm{D}}, \hat{\bm{f}}}
      &= \bm{\Sigma}_{\bm{\mathrm{c}}} - \bm{\Sigma}_{\bm{\mathrm{c}}} \hat{\bm{D}}^\top \left( \hat{\bm{D}} \bm{\Sigma}_{\bm{\mathrm{c}}} \hat{\bm{D}}^\top \right)^{-1} \hat{\bm{D}} \bm{\Sigma}_{\bm{\mathrm{c}}} \\
      &= \bm{\Sigma}_{\bm{\mathrm{c}}} - \bm{\Sigma}_{\bm{\mathrm{c}}} \hat{\bm{D}}^\top (\hat{\bm{D}}^\top)^{-1} \bm{\Sigma}_{\bm{\mathrm{c}}}^{-1} \hat{\bm{D}}^{-1} \hat{\bm{D}} \bm{\Sigma}_{\bm{\mathrm{c}}} \\
      &= \bm{\Sigma}_{\bm{\mathrm{c}}} - \bm{\Sigma}_{\bm{\mathrm{c}}} \bm{\Sigma}_{\bm{\mathrm{c}}}^{-1} \bm{\Sigma}_{\bm{\mathrm{c}}} = \bm{0}.
    \end{align*}

    Consequently, $\bm{\mathrm{c}} \mid \hat{\bm{D}} \bm{\mathrm{c}} - \hat{\bm{f}} = \bm{0} \sim \delta_{\bm{c}^{\mathrm{MWR}}}$. ∎
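
The collapse of the coefficient posterior onto $\bm{c}^{\mathrm{MWR}}$ can be checked numerically. The sketch below is only a sanity check under assumed inputs: the matrices `D_hat` and `Sigma_c` and the vectors `mu_c` and `f_hat` are hypothetical values chosen so that `D_hat` is invertible and `Sigma_c` is symmetric positive definite. It applies the standard Gaussian conditioning formulas to a prior on the coefficients that is conditioned on `D_hat @ c == f_hat`.

```python
import numpy as np

# Hypothetical prior over m = 3 coefficients, c ~ N(mu_c, Sigma_c).
mu_c = np.array([0.5, -1.0, 2.0])
Sigma_c = np.array([[1.0, 0.3, 0.1],
                    [0.3, 2.0, 0.2],
                    [0.1, 0.2, 1.5]])  # symmetric, strictly diagonally dominant

# Hypothetical invertible system matrix and right-hand side.
D_hat = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 4.0]])  # strictly diagonally dominant
f_hat = np.array([1.0, 0.0, -2.0])

# Gaussian conditioning on the noise-free observation D_hat @ c = f_hat.
gram = D_hat @ Sigma_c @ D_hat.T
post_mean = mu_c + Sigma_c @ D_hat.T @ np.linalg.solve(gram, f_hat - D_hat @ mu_c)
post_cov = Sigma_c - Sigma_c @ D_hat.T @ np.linalg.solve(gram, D_hat @ Sigma_c)

# Coefficient vector obtained by solving the linear system directly.
c_mwr = np.linalg.solve(D_hat, f_hat)
```

Up to floating-point error, `post_mean` agrees with `c_mwr` and `post_cov` vanishes, independently of the assumed prior mean `mu_c`.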

  • Proof of corollary 4
    \begin{align*}
      &\mathcal{P}_{\operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})}[\bm{m}^{\bm{\mathrm{u}} \mid \hat{\bm{D}}, \hat{\bm{f}}}](\bm{x}) \\
      &\quad= \underbrace{\mathcal{P}_{\operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})}[\bm{m}]}_{= 0}(\bm{x}) + (\delta_{\bm{x}} \circ \mathcal{P}_{\operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})}) \bm{k} (\hat{\bm{D}} \bm{\mathcal{P}}_{\mathbb{R}^m})' \left( (\hat{\bm{D}} \bm{\mathcal{P}}_{\mathbb{R}^m}) \bm{k} (\hat{\bm{D}} \bm{\mathcal{P}}_{\mathbb{R}^m})' \right)^{-1} \left( \hat{\bm{f}} - \hat{\bm{D}} \bm{\mathcal{P}}_{\mathbb{R}^m}[\bm{m}] \right) \\
      &\quad= \delta_{\bm{x}}[\underbrace{\mathcal{P}_{\operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})} \bm{k} \bm{\mathcal{P}}_{\mathbb{R}^m}'}_{= \bm{0}}] \hat{\bm{D}}^\top \left( (\hat{\bm{D}} \bm{\mathcal{P}}_{\mathbb{R}^m}) \bm{k} (\hat{\bm{D}} \bm{\mathcal{P}}_{\mathbb{R}^m})' \right)^{-1} \left( \hat{\bm{f}} - \hat{\bm{D}} \bm{\mathcal{P}}_{\mathbb{R}^m}[\bm{m}] \right) = \bm{0}
    \end{align*}

  • Proof of proposition 5

    The process $\bm{\mathrm{u}}$ can be constructed as the sum of independent samples from the processes $\mathcal{P}_{\hat{\mathbb{U}}}[\tilde{\bm{\mathrm{u}}}]$ and $\mathcal{P}_{\hat{\mathbb{U}}}[\tilde{\bm{\mathrm{u}}} - \tilde{\bm{m}}]$. This proves that the sample paths lie in $\mathbb{U}$. Since $\mathcal{P}_{\hat{\mathbb{U}}}$ is idempotent, we have $\mathcal{P}_{\operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})} \mathcal{P}_{\hat{\mathbb{U}}} = \mathcal{P}_{\hat{\mathbb{U}}} - \mathcal{P}_{\hat{\mathbb{U}}}^2 = \mathcal{P}_{\hat{\mathbb{U}}} - \mathcal{P}_{\hat{\mathbb{U}}} = 0$ and $\mathcal{P}_{\mathbb{R}^n} \mathcal{P}_{\hat{\mathbb{U}}} = (\mathcal{I}_{\mathbb{R}^m}^{\hat{\mathbb{U}}})^{-1} \mathcal{P}_{\hat{\mathbb{U}}}^2 = (\mathcal{I}_{\mathbb{R}^m}^{\hat{\mathbb{U}}})^{-1} \mathcal{P}_{\hat{\mathbb{U}}} = \mathcal{P}_{\mathbb{R}^n}$. It follows that

    \begin{align*}
      \mathcal{P}_{\operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})} \bm{k} \mathcal{P}_{\mathbb{R}^n}^*
      &= \mathcal{P}_{\operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})} \tilde{\bm{k}} \mathcal{P}_{\mathbb{R}^n}^*
         - \underbrace{\mathcal{P}_{\operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})} \mathcal{P}_{\hat{\mathbb{U}}}}_{=0} \tilde{\bm{k}} \mathcal{P}_{\mathbb{R}^n}^* \\
      &\qquad - \mathcal{P}_{\operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})} \tilde{\bm{k}} \mathcal{P}_{\hat{\mathbb{U}}}^* \mathcal{P}_{\mathbb{R}^n}^*
         + 2 \underbrace{\mathcal{P}_{\operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})} \mathcal{P}_{\hat{\mathbb{U}}}}_{=0} \tilde{\bm{k}} \mathcal{P}_{\hat{\mathbb{U}}}^* \mathcal{P}_{\mathbb{R}^n}^* \\
      &= \mathcal{P}_{\operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})} \tilde{\bm{k}} \mathcal{P}_{\mathbb{R}^n}^*
         - \mathcal{P}_{\operatorname{ker}(\mathcal{P}_{\hat{\mathbb{U}}})} \tilde{\bm{k}} \underbrace{\left( \mathcal{P}_{\mathbb{R}^n} \mathcal{P}_{\hat{\mathbb{U}}} \right)^*}_{= \mathcal{P}_{\mathbb{R}^n}^*}
       = 0.
    \end{align*}
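    The two projection identities used above can be checked numerically in a finite-dimensional stand-in. The following is a minimal sketch, not the construction from the paper: the matrices `Phi` and `Psi`, the dimensions, and the random covariance `K_tilde` are hypothetical, and the expansion of the covariance `K` is read off from the displayed derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 6, 3                          # ambient and subspace dimensions (hypothetical)
Phi = rng.standard_normal((N, m))    # columns span the finite-dimensional subspace
Psi = rng.standard_normal((N, m))    # test directions defining an oblique projection

# Coordinate projection (m x N) and the induced projection onto span(Phi).
P_R = np.linalg.solve(Psi.T @ Phi, Psi.T)
P_U = Phi @ P_R                      # idempotent: P_U @ P_U == P_U
P_ker = np.eye(N) - P_U              # complementary projection

# A random symmetric positive semi-definite stand-in for the covariance k-tilde.
B = rng.standard_normal((N, N))
K_tilde = B @ B.T

# Covariance expansion as it appears term by term in the derivation (assumed).
K = K_tilde - P_U @ K_tilde - K_tilde @ P_U.T + 2 * P_U @ K_tilde @ P_U.T

kernel_annihilates = np.allclose(P_ker @ P_U, 0.0)
coordinates_invariant = np.allclose(P_R @ P_U, P_R)
cross_covariance_zero = np.allclose(P_ker @ K @ P_R.T, 0.0)
print(kernel_annihilates, coordinates_invariant, cross_covariance_zero)
```

    In this sketch, the vanishing cross-covariance relies only on idempotence of `P_U` and on `P_R @ P_U == P_R`, mirroring the two identities stated in the proof.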

B Proofs for Section 4

Using the rules of linear-Gaussian inference (Bishop, 2006), we can easily see that

\begin{align*}
  \mathrm{f} &\sim \mathcal{GP}(m, k) \\
  \bm{A} \mathrm{f}(\bm{X}) &\sim \mathcal{N}\left( \bm{A} m(\bm{X}), \bm{A} k(\bm{X}, \bm{X}) \bm{A}^\top \right) \\
  \mathrm{f} \mid \bm{A} \mathrm{f}(\bm{X}) + \bm{\mathrm{b}} = \bm{y} &\sim \mathcal{GP}\left( m^{\mathrm{f} \mid \bm{y}}, k^{\mathrm{f} \mid \bm{y}} \right),
\end{align*}

where $\bm{A} \in \mathbb{R}^{m \times n}$, $\bm{X} = (\bm{x}_i)_{i=1}^n \in \mathbb{X}$, $\bm{\mathrm{b}} \sim \mathcal{N}(\bm{\mu}, \bm{\Sigma})$ with $\bm{\mathrm{b}} \perp\!\!\!\perp \mathrm{f}$ and

\begin{align*}
  m^{\mathrm{f} \mid \bm{y}}(\bm{x}) &\coloneqq m(\bm{x}) + k(\bm{x}, \bm{X}) \bm{A}^\top \left( \bm{A} k(\bm{X}, \bm{X}) \bm{A}^\top + \bm{\Sigma} \right)^\dagger \left( \bm{y} - (\bm{A} m(\bm{X}) + \bm{\mu}) \right) \\
  k^{\mathrm{f} \mid \bm{y}}(\bm{x}_1, \bm{x}_2) &\coloneqq k(\bm{x}_1, \bm{x}_2) - k(\bm{x}_1, \bm{X}) \bm{A}^\top \left( \bm{A} k(\bm{X}, \bm{X}) \bm{A}^\top + \bm{\Sigma} \right)^\dagger \bm{A} k(\bm{X}, \bm{x}_2).
\end{align*}
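The two formulas above translate almost line by line into code. The following is a minimal sketch under stated assumptions: the squared-exponential kernel, the zero prior mean, the one-dimensional inputs, and all function names are illustrative choices and do not come from the paper.

```python
import numpy as np


def rbf_kernel(x1, x2, lengthscale=1.0):
    """Squared-exponential kernel matrix k(x1, x2) for 1-D inputs (illustrative)."""
    x1 = np.atleast_1d(np.asarray(x1, dtype=float))
    x2 = np.atleast_1d(np.asarray(x2, dtype=float))
    diff = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def zero_mean(x):
    """Zero prior mean function (illustrative)."""
    return np.zeros(np.atleast_1d(x).shape[0])


def condition_affine(m, k, X, A, mu, Sigma, y):
    """Posterior mean and covariance functions of f | A f(X) + b = y."""
    gram = A @ k(X, X) @ A.T + Sigma      # A k(X, X) A^T + Sigma
    gram_pinv = np.linalg.pinv(gram)      # Moore-Penrose pseudoinverse
    residual = y - (A @ m(X) + mu)        # y - (A m(X) + mu)

    def post_mean(x):
        return m(x) + k(x, X) @ A.T @ gram_pinv @ residual

    def post_cov(x1, x2):
        return k(x1, x2) - k(x1, X) @ A.T @ gram_pinv @ A @ k(X, x2)

    return post_mean, post_cov


# Observe f(0) and the difference f(1) - f(2) without noise.
X = np.array([0.0, 1.0, 2.0])
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, -1.0]])
mu = np.zeros(2)
Sigma = np.zeros((2, 2))
y = np.array([1.0, 0.5])

post_mean, post_cov = condition_affine(zero_mean, rbf_kernel, X, A, mu, Sigma, y)
print(A @ post_mean(X))  # reproduces y up to floating-point error in this noise-free setting
```

In this noise-free example, the posterior mean satisfies the observed linear constraints and the posterior covariance of $\bm{A} \mathrm{f}(\bm{X})$ collapses to zero, which is a quick way to sanity-check an implementation.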

It is tempting to think that the above also extends to more general linear transformations of $\mathrm{f}$ such as differentiation at a point $\bm{x} \in \mathbb{X}$ and integration. Unfortunately, this is not the case, since the result from (Bishop, 2006) heavily uses the fact that, by definition, evaluations of the Gaussian process at a finite set of points follow a joint Gaussian distribution. However, differentiation at a point and integration are examples of linear functionals, i.e. linear maps from a vector space of functions to the real numbers, which operate on an (uncountably) infinite subset of random variables.

To generalize the result above to general linear operators $\bm{\mathcal{L}}$ mapping the paths of $\mathrm{f}$ into $\mathbb{R}^n$, we will take the following route:

  1. In section B.2, we will show that, under certain conditions on $\mathrm{f}$ and $\mathbb{X}$, the function $\omega \mapsto \mathrm{f}(\cdot, \omega)$ is a Gaussian random variable with values in a separable Banach space $\mathbb{B}$ of real-valued functions on $\mathbb{X}$. We introduce Gaussian random variables on separable Banach spaces and their essential properties in section B.1.

  2. Under the assumption that $\bm{\mathcal{L}}$ is continuous, we can use the transformation properties of Gaussian random variables on separable Banach spaces (see lemma 11) to show that $\bm{\mathcal{L}}[\mathrm{f}]$ and, for $\bm{X} \in \mathbb{X}^m$, also

     \[
       \begin{pmatrix} \mathrm{f}(\bm{X}) \\ \bm{\mathcal{L}}[\mathrm{f}] \end{pmatrix}
     \]

     are Gaussian random variables on $\mathbb{R}^n$ and $\mathbb{R}^{m+n}$, respectively.

  3. Finally, in section B.3, we can then use the well-known linear-Gaussian inference theorem (Bishop, 2006) to show that $\mathrm{f} \mid \bm{\mathcal{L}}[\mathrm{f}] = \bm{y}$ is again a Gaussian process.
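To make the goal of this route concrete, consider conditioning on a single derivative observation $\mathrm{f}'(x_0) = y$, i.e. $\bm{\mathcal{L}}[\mathrm{f}] = \mathrm{f}'(x_0)$. The sketch below assumes a zero-mean prior with a one-dimensional squared-exponential kernel and assumes that the cross-covariances are obtained by differentiating the kernel, which is the kind of statement the route above is meant to justify. The lengthscale `ELL`, the point `x0`, the value `y`, and all function names are hypothetical.

```python
import numpy as np

ELL = 0.7  # kernel lengthscale (illustrative)


def k(x1, x2):
    """Squared-exponential kernel on the real line."""
    return np.exp(-0.5 * ((x1 - x2) / ELL) ** 2)


def k_dx2(x1, x2):
    """Assumed Cov(f(x1), f'(x2)): derivative of k in its second argument."""
    return (x1 - x2) / ELL**2 * k(x1, x2)


def k_dx1_dx2(x1, x2):
    """Assumed Cov(f'(x1), f'(x2)): mixed second derivative of k."""
    return (1.0 / ELL**2 - (x1 - x2) ** 2 / ELL**4) * k(x1, x2)


def posterior_mean(x, x0, y):
    """Posterior mean of a zero-mean f at x, given f'(x0) = y."""
    return k_dx2(x, x0) / k_dx1_dx2(x0, x0) * y


def posterior_var(x, x0):
    """Posterior variance of f(x), given an exact observation of f'(x0)."""
    return k(x, x) - k_dx2(x, x0) ** 2 / k_dx1_dx2(x0, x0)


x0, y = 0.3, 2.0

# The slope of the posterior mean at x0 should match the observed derivative.
h = 1e-5
slope = (posterior_mean(x0 + h, x0, y) - posterior_mean(x0 - h, x0, y)) / (2 * h)
print(slope)
```

The structure is the same as in the finite-dimensional formulas: the matrix $\bm{A} k(\bm{X}, \bm{X}) \bm{A}^\top$ is replaced by the variance of $\bm{\mathcal{L}}[\mathrm{f}]$, and $k(\bm{x}, \bm{X}) \bm{A}^\top$ by the covariance between $\mathrm{f}(\bm{x})$ and $\bm{\mathcal{L}}[\mathrm{f}]$.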

In the following, $\mathcal{B}(\mathbb{B})$ denotes the Borel $\sigma$-algebra generated by the norm topology on a Banach space $\mathbb{B}$.

B.1 Gaussian Measures on Separable Banach Spaces

As stated before, the function $\omega \mapsto \mathrm{f}(\cdot, \omega)$ often turns out to be a Gaussian random variable with values in an infinite-dimensional separable Banach space $\mathbb{B} \supseteq \operatorname{paths}(\mathrm{f})$ of real-valued functions on $\mathbb{X}$ (see proposition 20).

Definition 8.

Let $\mathbb{B}$ be a real separable Banach space. A Borel probability measure $\mu$ on $(\mathbb{B}, \mathcal{B}(\mathbb{B}))$ is called Gaussian if every continuous linear functional $l \in \mathbb{B}'$ is a univariate Gaussian random variable. A $\mathbb{B}$-valued random variable is called Gaussian if its law is Gaussian.

Just as for Gaussian random variables on the Euclidean vector space $\mathbb{R}^n$, we can define a mean and a covariance (operator) for their counterparts on general separable Banach spaces.

Proposition 9.

Let $\mathrm{b}$ be a Gaussian random variable on $(\Omega, \mathcal{F}, \mathrm{P})$ with values in a real separable Banach space $\mathbb{B}$. Then there is a unique $m_{\mathrm{b}} \in \mathbb{B}$ such that $l[m_{\mathrm{b}}] = \mathbb{E}_{\mathrm{b}}[l[\mathrm{b}]]$ for any $l \in \mathbb{B}'$. We refer to $m_{\mathrm{b}}$ as the mean (vector) of $\mathrm{b}$. Moreover, there is a unique bounded linear operator $\mathcal{C}_{\mathrm{b}} \colon \mathbb{B}' \to \mathbb{B}$ such that $l_2[\mathcal{C}_{\mathrm{b}}[l_1]] = \operatorname{Cov}_{\mathrm{b}}[l_1[\mathrm{b}], l_2[\mathrm{b}]]$ for any $l_1, l_2 \in \mathbb{B}'$, the so-called covariance operator of $\mathrm{b}$.
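
As a finite-dimensional sanity check, which is not part of the original argument, take $\mathbb{B} = \mathbb{R}^n$ and $\mathrm{b} \sim \mathcal{N}(\bm{\mu}, \bm{\Sigma})$. Every $l \in \mathbb{B}'$ then has the form $l[\bm{v}] = \bm{a}^\top \bm{v}$ for some $\bm{a} \in \mathbb{R}^n$; the vectors $\bm{a}, \bm{a}_1, \bm{a}_2$ are introduced here for illustration only. Writing $l_i[\bm{v}] = \bm{a}_i^\top \bm{v}$, the defining properties read

\begin{align*}
  l[m_{\mathrm{b}}] &= \bm{a}^\top \bm{\mu} = \mathbb{E}_{\mathrm{b}}[\bm{a}^\top \mathrm{b}], \\
  \mathcal{C}_{\mathrm{b}}[l_1] &= \bm{\Sigma} \bm{a}_1, \\
  l_2[\mathcal{C}_{\mathrm{b}}[l_1]] &= \bm{a}_2^\top \bm{\Sigma} \bm{a}_1 = \operatorname{Cov}_{\mathrm{b}}[\bm{a}_1^\top \mathrm{b}, \bm{a}_2^\top \mathrm{b}],
\end{align*}

so that $m_{\mathrm{b}} = \bm{\mu}$ and $\mathcal{C}_{\mathrm{b}}$ is represented by the covariance matrix $\bm{\Sigma}$ once $(\mathbb{R}^n)'$ is identified with $\mathbb{R}^n$.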

  • Proof

    Fernique’s theorem (Da Prato and Zabczyk, 1992, Theorem 2.7) implies that $\lVert \mathrm{b} \rVert_{\mathbb{B}} \in L_p(\Omega, \mathrm{P})$ for all $p \in \mathbb{N}_{\geq 1}$. By assumption, $\mathrm{b}$ is measurable and $\mathbb{B} \supset \operatorname{ran}(\mathrm{b})$ is separable, which means that $\mathrm{b}$ is strongly measurable (Yosida, 1995, Section V.4, Pettis’ Theorem). Since $\lVert \mathrm{b} \rVert_{\mathbb{B}} \in L_1(\Omega, \mathrm{P})$, it follows that $\mathrm{b}$ is Bochner integrable (Yosida, 1995, Section V.5, Theorem 1). Let $l \in \mathbb{B}'$. By Corollary 2 in Yosida (1995, Section V.5) we then have

    \[
      \mathbb{E}_{\mathrm{b}}[l[\mathrm{b}]]
      = \int_\Omega l[\mathrm{b}(\omega)] \,\mathrm{d}\mathrm{P}(\omega)
      = l\bigg[ \underbrace{\int_\Omega \mathrm{b}(\omega) \,\mathrm{d}\mathrm{P}(\omega)}_{\eqqcolon m_{\mathrm{b}}} \bigg].
    \]

    Now assume that there is another $\tilde{m}_{\mathrm{b}} \in \mathbb{B}$ with $l[\tilde{m}_{\mathrm{b}}] = \mathbb{E}_{\mathrm{b}}[l[\mathrm{b}]]$ for all $l \in \mathbb{B}'$. Then

    \[
      0 = \mathbb{E}_{\mathrm{b}}[l[\mathrm{b}]] - \mathbb{E}_{\mathrm{b}}[l[\mathrm{b}]] = l[\tilde{m}_{\mathrm{b}}] - l[m_{\mathrm{b}}] = l[\tilde{m}_{\mathrm{b}} - m_{\mathrm{b}}]
    \]

    and since this holds for all $l \in \mathbb{B}'$, and the continuous linear functionals separate the points of $\mathbb{B}$ by the Hahn-Banach theorem, it follows that $\tilde{m}_{\mathrm{b}} = m_{\mathrm{b}}$.

    Let $l_1 \in \mathbb{B}'$. Then $\omega \mapsto l_1[\mathrm{b}(\omega) - m_{\mathrm{b}}] (\mathrm{b}(\omega) - m_{\mathrm{b}})$ is clearly weakly measurable and, since $\mathbb{B}$ is separable, also strongly measurable (Yosida, 1995, Section V.4, Pettis’ Theorem). As above, Fernique’s theorem shows that $\lVert \mathrm{b} \rVert_{\mathbb{B}} \in L_2(\Omega, \mathrm{P})$. By the triangle inequality in $\mathbb{B}$ and the fact that $\mathrm{P}$ is a probability measure, we also have $\lVert \mathrm{b}(\cdot) - m_{\mathrm{b}} \rVert_{\mathbb{B}} \in L_2(\Omega, \mathrm{P})$. It follows that

    \begin{align*}
      \int_\Omega \lVert l_1[\mathrm{b}(\omega) - m_{\mathrm{b}}] (\mathrm{b}(\omega) - m_{\mathrm{b}}) \rVert_{\mathbb{B}} \,\mathrm{d}\mathrm{P}(\omega)
      &= \int_\Omega \lvert l_1[\mathrm{b}(\omega) - m_{\mathrm{b}}] \rvert \, \lVert \mathrm{b}(\omega) - m_{\mathrm{b}} \rVert_{\mathbb{B}} \,\mathrm{d}\mathrm{P}(\omega) \\
      &\leq \lVert l_1 \rVert_{\mathbb{B}'} \int_\Omega \lVert \mathrm{b}(\omega) - m_{\mathrm{b}} \rVert_{\mathbb{B}} \, \lVert \mathrm{b}(\omega) - m_{\mathrm{b}} \rVert_{\mathbb{B}} \,\mathrm{d}\mathrm{P}(\omega) \\
      &= \lVert l_1 \rVert_{\mathbb{B}'} \, \big\lVert \omega \mapsto \lVert \mathrm{b}(\omega) - m_{\mathrm{b}} \rVert_{\mathbb{B}} \big\rVert_{L_2(\Omega, \mathrm{P})}^2 < \infty,
    \end{align*}

    where $\lVert l_1 \rVert_{\mathbb{B}'} < \infty$, since $l_1$ is continuous. Hence the map $\omega \mapsto l_1[\mathrm{b}(\omega) - m_{\mathrm{b}}] (\mathrm{b}(\omega) - m_{\mathrm{b}})$ is Bochner integrable. Let $l_2 \in \mathbb{B}'$. Again by Corollary 2 in Yosida (1995, Section V.5), we find that

    \begin{align*}
      \operatorname{Cov}[l_1[\mathrm{b}], l_2[\mathrm{b}]]
      &= \int_\Omega l_1[\mathrm{b}(\omega) - m_{\mathrm{b}}] \, l_2[\mathrm{b}(\omega) - m_{\mathrm{b}}] \,\mathrm{d}\mathrm{P}(\omega) \\
      &= l_2\bigg[ \underbrace{\int_\Omega l_1[\mathrm{b}(\omega) - m_{\mathrm{b}}] (\mathrm{b}(\omega) - m_{\mathrm{b}}) \,\mathrm{d}\mathrm{P}(\omega)}_{\eqqcolon \mathcal{C}_{\mathrm{b}}[l_1]} \bigg].
    \end{align*}

    $\mathcal{C}_{\mathrm{b}}$ is bounded, since

    \begin{align*}
      \lVert \mathcal{C}_{\mathrm{b}}[l_{1}] \rVert_{\mathbb{B}}
      &\leq \int_{\Omega} \lVert l_{1}[\mathrm{b}(\omega) - m_{\mathrm{b}}] \, (\mathrm{b}(\omega) - m_{\mathrm{b}}) \rVert_{\mathbb{B}} \,\mathrm{d}\mathrm{P}(\omega) \\
      &\leq \lVert l_{1} \rVert_{\mathbb{B}'} \underbrace{\big\lVert \omega \mapsto \lVert \mathrm{b}(\omega) - m_{\mathrm{b}} \rVert_{\mathbb{B}} \big\rVert_{L_{2}(\Omega, \mathrm{P})}^{2}}_{\eqqcolon \lVert \mathcal{C}_{\mathrm{b}} \rVert},
    \end{align*}

    where $\lVert \mathcal{C}_{\mathrm{b}} \rVert < \infty$ because $\lVert \mathrm{b}(\cdot) - m_{\mathrm{b}} \rVert_{\mathbb{B}} \in L_{2}(\Omega, \mathrm{P})$. Uniqueness follows from an argument analogous to the one used to prove uniqueness of the mean. ∎

Remark 10.

One can show that the mean and the covariance operator of a Gaussian random variable with values in a separable Banach space identify its law uniquely. Hence, we often write $\operatorname{\mathcal{N}}\left(m, \mathcal{C}\right)$ to denote Gaussian measures on separable Banach spaces.

B.1.1 Continuous Affine Transformations

Just like their finite-dimensional counterparts, Gaussian random variables with values in separable Banach spaces are closed under continuous linear transformations. Moreover, the expressions for the transformed mean and covariance operator are analogous to the finite-dimensional case. For instance, we will use this result to show that $\bm{\mathcal{L}}[\mathrm{f}]$ is an $\mathbb{R}^{n}$-valued random variable.

Lemma 11.

Let $\mathrm{b} \sim \operatorname{\mathcal{N}}\left(m, \mathcal{C}\right)$ be a Gaussian random variable on $(\Omega, \mathcal{F}, \mathrm{P})$ with values in a real separable Banach space $\mathbb{B}$. Let $\mathcal{L}\colon \mathbb{B} \to \tilde{\mathbb{B}}$ be a bounded linear operator mapping into another real separable Banach space $\tilde{\mathbb{B}}$. Then $\mathcal{L}[\mathrm{b}] \sim \operatorname{\mathcal{N}}\left(\mathcal{L}[m], \mathcal{L}\mathcal{C}\mathcal{L}'\right)$.

  • Proof

    $\mathcal{L}$ is continuous and hence Borel measurable, which means that $\mathcal{L}[\mathrm{b}]$ is a $\tilde{\mathbb{B}}$-valued random variable. Moreover, for $\tilde{l} \in \tilde{\mathbb{B}}'$, we have $l \coloneqq \tilde{l} \circ \mathcal{L} \in \mathbb{B}'$ and hence $l[\mathrm{b}] = \tilde{l}[\mathcal{L}[\mathrm{b}]]$ is Gaussian. This implies that $\mathcal{L}[\mathrm{b}]$ is a $\tilde{\mathbb{B}}$-valued Gaussian random variable. Moreover, we have $\operatorname{\mathbb{E}}_{\mathrm{b}}\left[l[\mathrm{b}]\right] = l[m] = \tilde{l}[\mathcal{L}[m]]$, i.e. $\mathcal{L}[m]$ is the mean of $\mathcal{L}[\mathrm{b}]$. Let $\tilde{l}_{1}, \tilde{l}_{2} \in \tilde{\mathbb{B}}'$ and define $l_{i} \coloneqq \tilde{l}_{i} \circ \mathcal{L} \in \mathbb{B}'$ for $i = 1, 2$. Then

    \[
      \operatorname{Cov}_{\mathrm{b}}\left[l_{1}[\mathrm{b}], l_{2}[\mathrm{b}]\right]
      = l_{2}[\mathcal{C}[l_{1}]]
      = (\tilde{l}_{2} \circ \mathcal{L})[\mathcal{C}[\tilde{l}_{1} \circ \mathcal{L}]]
      = \tilde{l}_{2}[(\mathcal{L}\mathcal{C}\mathcal{L}')[\tilde{l}_{1}]],
    \]

    where the last equality uses $\mathcal{L}'[\tilde{l}_{1}] = \tilde{l}_{1} \circ \mathcal{L}$, i.e. $\mathcal{L}\mathcal{C}\mathcal{L}'$ is the covariance operator of $\mathcal{L}[\mathrm{b}]$. ∎
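
As a sanity check, the following sketch specializes Lemma 11 to the finite-dimensional case. The choices $\mathbb{B} = \mathbb{R}^{d}$ and $\tilde{\mathbb{B}} = \mathbb{R}^{n}$, as well as the symbols $\bm{A}$, $\bm{\Sigma}$, $\bm{m}$, and $\bm{v}_{i}$, are assumptions made purely for illustration and do not appear in the surrounding text.

```latex
% Illustrative special case (assumed notation, not from the paper):
%   B = R^d, B~ = R^n, L[b] = A b with A in R^{n x d}.
% Continuous linear functionals on R^n are identified with vectors,
%   \tilde{l}_i = <v_i, . > with v_i in R^n.
\begin{align*}
  \mathrm{b} \sim \mathcal{N}(\bm{m}, \bm{\Sigma}), \quad
  \mathcal{L}[\mathrm{b}] = \bm{A} \mathrm{b}
  \quad &\Longrightarrow \quad
  \mathcal{L}'[\tilde{l}_{i}]
  = \tilde{l}_{i} \circ \mathcal{L}
  = \langle \bm{A}^{\top} \bm{v}_{i}, \cdot \rangle, \\
  \operatorname{Cov}\left[\tilde{l}_{1}[\bm{A} \mathrm{b}], \tilde{l}_{2}[\bm{A} \mathrm{b}]\right]
  &= (\bm{A}^{\top} \bm{v}_{1})^{\top} \bm{\Sigma} \, (\bm{A}^{\top} \bm{v}_{2})
  = \bm{v}_{1}^{\top} \left(\bm{A} \bm{\Sigma} \bm{A}^{\top}\right) \bm{v}_{2}.
\end{align*}
```

Under these assumptions, Lemma 11 reduces to the familiar statement $\bm{A}\mathrm{b} \sim \mathcal{N}(\bm{A}\bm{m}, \bm{A}\bm{\Sigma}\bm{A}^{\top})$, with the matrix transpose playing the role of the dual operator $\mathcal{L}'$.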

B.2 Gaussian Processes as Gaussian Random Functions

We have now introduced all necessary preliminaries to show that, under certain assumptions on $\mathrm{f}$ and $\mathbb{X}$, the function $\omega \mapsto \mathrm{f}(\cdot, \omega)$ is a Gaussian random variable with values in a special kind of separable Banach space, namely a reproducing kernel Banach space (RKBS):

Definition 12 (Lin et al. 2022, Definition 2.1).

A reproducing kernel Banach space (RKBS) $(\mathbb{B}, \lVert \cdot \rVert_{\mathbb{B}})$ is a Banach space of real-valued functions on a nonempty set $\mathbb{X}$, on which the point evaluation functionals $\delta_{\bm{x}}$ for $\bm{x} \in \mathbb{X}$ are continuous.
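
To illustrate Definition 12, the following sketch checks the defining property for the space of continuous functions on a compact metric space. Equipping $C(\mathbb{X})$ with the supremum norm $\lVert \cdot \rVert_{\infty}$ is an assumption made here for illustration.

```latex
% Illustrative example (assumed setup): B = C(X) on a compact metric
% space X, equipped with the supremum norm.
\begin{equation*}
  \lvert \delta_{\bm{x}}[f] \rvert
  = \lvert f(\bm{x}) \rvert
  \leq \sup_{\bm{x}' \in \mathbb{X}} \lvert f(\bm{x}') \rvert
  = \lVert f \rVert_{\infty}
  \qquad \text{for all } f \in C(\mathbb{X}) \text{ and } \bm{x} \in \mathbb{X}.
\end{equation*}
```

Hence every $\delta_{\bm{x}}$ is a bounded linear functional with operator norm at most $1$, so this space satisfies the continuity requirement of Definition 12.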

We formulate theorems 1 and 2 under the following set of assumptions; see assumption 1.

Generalizing an observation from Rajput and Cambanis (1972, Remark 1), it becomes clear that assumption 1 is often not about the GP $\mathrm{f}$ itself, but rather about the topology of the space $\mathbb{B}$. Denote by $\operatorname{scl}_{w*}(\mathbb{L}) \coloneqq \{\ell \in \mathbb{B}' \mid \exists \{\ell_{k}\}_{k \in \mathbb{N}} \subset \mathbb{L} \colon \ell_{k} \to_{w*} \ell\}$ the weak-* sequential closure\footnote{The weak-* sequential closure of $\mathbb{L}$ is not to be confused with the weak-* closure of $\mathbb{L}$. The two notions coincide if $\mathbb{B}'$ equipped with the weak-* topology is a sequential space, but for many of the dual spaces considered in this work this does not hold.} of a subset $\mathbb{L} \subset \mathbb{B}'$ of the continuous dual space of a Banach space $\mathbb{B}$. Moreover, we define the iterated closures recursively by $\operatorname{scl}_{w*}^{n}(\mathbb{L}) \coloneqq \operatorname{scl}_{w*}(\operatorname{scl}_{w*}^{n-1}(\mathbb{L}))$ for $n \in \mathbb{N}$ and $\operatorname{scl}_{w*}^{0}(\mathbb{L}) \coloneqq \mathbb{L}$.

Theorem 13.

Let $\mathbb{B} \subset \mathbb{R}^{\mathbb{X}}$ be a real separable RKBS and define $\mathbb{L}_{\delta} \coloneqq \operatorname{span}\{\delta_{\bm{x}}\}_{\bm{x} \in \mathbb{X}} \subset \mathbb{B}'$. If there is an $n \in \mathbb{N}_{0}$ such that $\mathbb{B}' = \operatorname{scl}_{w*}^{n}(\mathbb{L}_{\delta})$, then assumption 1 holds for any GP $\mathrm{f}$ with paths in $\mathbb{B}$.

  • Proof

    First, we show by induction on $n \in \mathbb{N}_{0}$ that $\omega \mapsto l[\mathrm{f}(\cdot, \omega)]$ is a Gaussian random variable for every $l \in \operatorname{scl}_{w*}^{n}(\mathbb{L}_{\delta})$. For $n = 0$, we have $\operatorname{scl}_{w*}^{0}(\mathbb{L}_{\delta}) = \mathbb{L}_{\delta}$. Hence, for every $l \in \operatorname{scl}_{w*}^{0}(\mathbb{L}_{\delta}) = \mathbb{L}_{\delta}$, there are $m \in \mathbb{N}$, $\{\alpha_{k}\}_{k=1}^{m} \subset \mathbb{R}$, and $\{\bm{x}_{k}\}_{k=1}^{m} \subset \mathbb{X}$ such that

    \[
      l[\mathrm{f}(\cdot, \omega)] = \sum_{k=1}^{m} \alpha_{k} \mathrm{f}(\bm{x}_{k}, \omega).
    \]

    Since $\mathrm{f}$ is a GP, $\mathrm{f}(\bm{x}_{1}, \cdot), \dotsc, \mathrm{f}(\bm{x}_{m}, \cdot)$ are jointly Gaussian, which implies that $\omega \mapsto l[\mathrm{f}(\cdot, \omega)]$ is a Gaussian random variable. Now let $n > 0$ and fix $l \in \operatorname{scl}_{w*}^{n}(\mathbb{L}_{\delta})$. Then there is a sequence $\{l_{k}\}_{k \in \mathbb{N}} \subset \operatorname{scl}_{w*}^{n-1}(\mathbb{L}_{\delta})$ such that $l_{k}[f] \to l[f]$ as $k \to \infty$ for every $f \in \mathbb{B}$. By the inductive hypothesis, we know that $\omega \mapsto l_{k}[\mathrm{f}(\cdot, \omega)]$ is Gaussian for every $k \in \mathbb{N}$. It follows that the function $\omega \mapsto l[\mathrm{f}(\cdot, \omega)] = \lim_{k \to \infty} l_{k}[\mathrm{f}(\cdot, \omega)]$ is measurable (Klenke, 2014, Theorem 1.92). Moreover, as the pointwise limit of Gaussian random variables is again a Gaussian random variable, we conclude that $\omega \mapsto l[\mathrm{f}(\cdot, \omega)]$ is Gaussian.

    Under the assumption that $\mathbb{B}' = \operatorname{scl}_{w*}^{n}(\mathbb{L}_{\delta})$ for some $n \in \mathbb{N}_{0}$, the above shows that $\omega \mapsto l[\mathrm{f}(\cdot, \omega)]$ is a Gaussian random variable for every $l \in \mathbb{B}'$. In particular, the map $\omega \mapsto l[\mathrm{f}(\cdot, \omega)]$ is $\mathcal{F}$-$\mathcal{B}\left(\mathbb{R}\right)$-measurable for all $l \in \mathbb{B}'$, i.e. $\omega \mapsto \mathrm{f}(\cdot, \omega)$ is weakly measurable. By the separability of $\mathbb{B}$, it follows that $\omega \mapsto \mathrm{f}(\cdot, \omega)$ is $\mathcal{F}$-$\mathcal{B}\left(\mathbb{B}\right)$-measurable (Bogachev, 1998, Theorem A.3.7). This shows that $\omega \mapsto \mathrm{f}(\cdot, \omega)$ is a $\mathbb{B}$-valued Gaussian random variable. ∎

Corollary 14.

Let $\mathbb{B} \subset \mathbb{R}^{\mathbb{X}}$ be a real separable RKBS. If $\mathbb{L}_{\delta} \coloneqq \operatorname{span}\{\delta_{\bm{x}}\}_{\bm{x} \in \mathbb{X}}$ is weak-* sequentially dense\footnote{As before, a weak-* sequentially dense set is not to be confused with a weak-* dense set, since the dual spaces considered in this work are not necessarily sequential w.r.t. the weak-* topology.} in $\mathbb{B}'$, then assumption 1 holds for any GP $\mathrm{f}$ with paths in $\mathbb{B}$.

In the following, we show that theorem 13 applies to three important classes of Banach spaces, which often arise in the study of Gaussian processes and PDEs.

Proposition 15.

The set $\mathbb{L}_{\delta} \coloneqq \operatorname{span}\{\delta_{\bm{x}}\}_{\bm{x} \in \mathbb{X}}$ is weak-* sequentially dense in the dual $\mathbb{H}_{k}'$ of any separable RKHS $\mathbb{H}_{k} \subset \mathbb{R}^{\mathbb{X}}$.

  • Proof

    Let $l \in \mathbb{H}_{k}'$. By the Riesz representation theorem, there is $h \in \mathbb{H}_{k}$ such that $l = \langle h, \cdot \rangle_{\mathbb{H}_{k}}$. Since $\operatorname{span}\{k(\cdot, \bm{x})\}_{\bm{x} \in \mathbb{X}}$ is dense in $\mathbb{H}_{k}$ (Steinwart and Christmann, 2008, Theorem 4.21), there is a sequence $\{h_{i}\}_{i \in \mathbb{N}} \subset \operatorname{span}\{k(\cdot, \bm{x})\}_{\bm{x} \in \mathbb{X}}$ such that $h_{i} \to h$. For every $i \in \mathbb{N}$, define $l_{i} \coloneqq \langle h_{i}, \cdot \rangle_{\mathbb{H}_{k}}$ and note that $\{l_{i}\}_{i \in \mathbb{N}} \subset \mathbb{L}_{\delta}$ by the reproducing property. By the continuity of the inner product it follows that

    \[
      l[f]
      = \langle h, f \rangle_{\mathbb{H}_{k}}
      = \left\langle \lim_{i \to \infty} h_{i}, f \right\rangle_{\mathbb{H}_{k}}
      = \lim_{i \to \infty} \langle h_{i}, f \rangle_{\mathbb{H}_{k}}
      = \lim_{i \to \infty} l_{i}[f]
    \]

    for every $f \in \mathbb{H}_{k}$. ∎
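
The step "by the reproducing property" in the proof of Proposition 15 can be spelled out as follows. This is only a sketch: the coefficients $\alpha^{(i)}_{j}$, the points $\bm{x}^{(i)}_{j}$, and the number of terms $m_{i}$ are hypothetical names for the finite expansion of $h_{i}$ and are not notation from the surrounding text.

```latex
% Sketch of why l_i lies in L_delta; \alpha_j^{(i)}, x_j^{(i)} and m_i
% are hypothetical names for the finite expansion of h_i.
\begin{equation*}
  h_{i} = \sum_{j=1}^{m_{i}} \alpha^{(i)}_{j} k(\cdot, \bm{x}^{(i)}_{j})
  \quad \Longrightarrow \quad
  l_{i}[f]
  = \langle h_{i}, f \rangle_{\mathbb{H}_{k}}
  = \sum_{j=1}^{m_{i}} \alpha^{(i)}_{j} \langle k(\cdot, \bm{x}^{(i)}_{j}), f \rangle_{\mathbb{H}_{k}}
  = \sum_{j=1}^{m_{i}} \alpha^{(i)}_{j} f(\bm{x}^{(i)}_{j}).
\end{equation*}
```

In other words, $l_{i} = \sum_{j=1}^{m_{i}} \alpha^{(i)}_{j} \delta_{\bm{x}^{(i)}_{j}}$ is a finite linear combination of point evaluation functionals and therefore an element of $\mathbb{L}_{\delta}$.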

Proposition 16.

The set $\mathbb{L}_{\delta} \coloneqq \operatorname{span}\{\delta_{\bm{x}}\}_{\bm{x} \in \mathbb{X}}$ is weak-* sequentially dense in the dual $\mathbb{B}'$ of the space $\mathbb{B} = C(\mathbb{X})$ of real-valued continuous functions on a compact metric space $(\mathbb{X}, d_{\mathbb{X}})$.

  • Proof

    Let $l \in C(\mathbb{X})'$. By the Riesz-Markov theorem (Aliprantis and Border, 2006, Corollary 14.15), there is a signed Borel measure $\lambda$ on $\mathbb{X}$ such that $l[f] = \int_{\mathbb{X}} f(\bm{x}) \,\mathrm{d}\lambda(\bm{x})$. We need to show that there are $\{l_{k}\}_{k \in \mathbb{N}} \subset \mathbb{L}_{\delta}$ such that $l_{k}[f] \to l[f]$ for every $f \in C(\mathbb{X})$. To do so, we modify a construction from Alt (2012, Paragraph 4.22). For $S \subset \mathbb{X}$, define $\operatorname{diam}(S) \coloneqq \sup_{\bm{x}, \bm{x}_{0} \in S} d_{\mathbb{X}}(\bm{x}, \bm{x}_{0})$. Since $\mathbb{X}$ is compact, for every $k \in \mathbb{N}$ there is a finite open cover $\{\tilde{S}^{(k)}_{i}\}_{i=1}^{n^{(k)}}$ of $\mathbb{X}$ with $\operatorname{diam}(\tilde{S}^{(k)}_{i}) < \frac{1}{k}$ for all $i = 1, \dotsc, n^{(k)}$. Then $\{S^{(k)}_{i}\}_{i=1}^{n^{(k)}} \subset \mathbb{X}$ with

    \[
      S^{(k)}_{i} \coloneqq \tilde{S}^{(k)}_{i} \setminus \bigcup_{j < i} \tilde{S}^{(k)}_{j}
    \]

    is also a cover of $\mathbb{X}$, its elements are pairwise disjoint, and $S^{(k)}_{i} \in \mathcal{B}\left(\mathbb{X}\right)$. Now choose $\bm{x}^{(k)}_{i} \in S^{(k)}_{i}$ for all $i = 1, \dotsc, n^{(k)}$ (w.l.o.g. $S^{(k)}_{i} \neq \emptyset$). For any $f \in C(\mathbb{X})$, we define

    \[
      f_{k} \coloneqq \sum_{i=1}^{n^{(k)}} f(\bm{x}^{(k)}_{i}) \chi_{S^{(k)}_{i}}.
    \]

    Note that, since the $f_{k}$ are simple, we have

    \[
      l[f_{k}]
      = \int_{\mathbb{X}} f_{k}(\bm{x}) \,\mathrm{d}\lambda(\bm{x})
      = \sum_{i=1}^{n^{(k)}} f(\bm{x}^{(k)}_{i}) \lambda(S^{(k)}_{i})
      \eqqcolon l_{k}[f],
    \]

    where $l_{k} \in \mathbb{L}_{\delta}$. We will now show that $f_{k} \to f$ uniformly, since this implies $l_{k}[f] = l[f_{k}] \to l[f]$ by the dominated convergence theorem.

    By the Heine-Cantor theorem, $f \in C(\mathbb{X})$ is uniformly continuous. Thus, for every $\epsilon > 0$, there is $\delta(\epsilon) > 0$ such that $\lvert f(\bm{x}) - f(\bm{x}_0) \rvert < \epsilon$ for $\bm{x}, \bm{x}_0 \in \mathbb{X}$ with $d_{\mathbb{X}}(\bm{x}, \bm{x}_0) < \delta(\epsilon)$. Now fix $\epsilon > 0$. Then for $k > \frac{1}{\delta(\epsilon)}$ and any $\bm{x} \in S^{(k)}_i$, we have

    \[ d_{\mathbb{X}}(\bm{x}, \bm{x}^{(k)}_i) \leq \operatorname{diam}(S^{(k)}_i) < \frac{1}{k} < \delta(\epsilon) \]

    and thus $\lvert f(\bm{x}) - f(\bm{x}^{(k)}_i) \rvert < \epsilon$. Consequently,

    \[ \lVert f - f_k \rVert_\infty = \max_{i=1,\dotsc,n^{(k)}} \sup_{\bm{x} \in S^{(k)}_i} \lvert f(\bm{x}) - f(\bm{x}^{(k)}_i) \rvert \leq \max_{i=1,\dotsc,n^{(k)}} \epsilon = \epsilon. \]
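
As a concrete illustration of the construction above, consider the following sketch. The choices $\mathbb{X} = [0, 1]$, Lebesgue measure $\lambda$, the uniform cells, and the test function $f(x) = x$ are assumptions made only for this example and are not part of the proof. Take $n^{(k)} = k + 1$ cells $S^{(k)}_i = [\tfrac{i-1}{k+1}, \tfrac{i}{k+1})$ for $i \leq k$, with the last cell closed, so that $\operatorname{diam}(S^{(k)}_i) = \tfrac{1}{k+1} < \tfrac{1}{k}$, and choose the left endpoints $x^{(k)}_i = \tfrac{i-1}{k+1}$. Then

\begin{align*}
  l_k[f] &= \sum_{i=1}^{k+1} \frac{i-1}{k+1} \cdot \frac{1}{k+1} = \frac{1}{(k+1)^2} \cdot \frac{k(k+1)}{2} = \frac{k}{2(k+1)} \longrightarrow \frac{1}{2} = \int_0^1 x \,\mathrm{d}x = l[f], \\
  \lVert f - f_k \rVert_\infty &= \frac{1}{k+1} \longrightarrow 0.
\end{align*}

In this special case $l_k$ is a left Riemann sum, i.e. a finite linear combination of point evaluations, and its convergence to the integral is exactly the statement $l_k[f] \to l[f]$.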

The Banach spaces $C^k(\overline{\mathbb{X}})$ of $k$-times differentiable functions on an open and bounded domain $\mathbb{X} \subset \mathbb{R}^d$ with bounded and uniformly continuous partial derivatives and their subspaces (in particular the Hölder spaces) appear naturally in the study of strong solutions to linear PDEs. However, to allow for a flexible prior construction, we define a generalization of these spaces.

Definition 17.

Let $\mathbb{X} \subset \mathbb{R}^d$ be open and bounded and let $A \subset \mathbb{N}_0^d$ be a non-empty downward closed set of multi-indices, i.e. $\bm{\beta} \in A$ implies $\bm{\alpha} \in A$ for every $\bm{\alpha} \in \mathbb{N}_0^d$ with $\bm{\alpha} \leq \bm{\beta}$. We define $C^A(\overline{\mathbb{X}})$ as the space of real-valued functions $f$ on $\mathbb{X}$, for which all partial derivatives $\mathrm{D}^{\bm{\alpha}} f$ with $\bm{\alpha} \in A$ are bounded and uniformly continuous.

Remark 18.

One can show that $C^A(\overline{\mathbb{X}})$ equipped with the norm

\[ \lVert f \rVert_{C^A(\overline{\mathbb{X}})} \coloneqq \max_{\bm{\alpha} \in A} \sup_{\bm{x} \in \mathbb{X}} \lvert \mathrm{D}^{\bm{\alpha}} f(\bm{x}) \rvert \]

is a separable Banach space. Since every partial derivative $\mathrm{D}^{\bm{\alpha}} f$ with $\bm{\alpha} \in A$ is bounded and uniformly continuous, it has a unique, bounded, continuous extension to the closure $\overline{\mathbb{X}}$ of $\mathbb{X}$ (Adams and Fournier, 2003).
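
To make the definition concrete, here are two instances. The specific index sets and the coordinate names $(t, x)$ are illustrative assumptions and are not taken from the text. Choosing $A = \{\bm{\alpha} \in \mathbb{N}_0^d \colon \lvert \bm{\alpha} \rvert \leq k\}$ recovers $C^k(\overline{\mathbb{X}})$, with the norm above reducing to the maximum of the sup norms of all partial derivatives up to order $k$. For $d = 2$ with coordinates $(t, x)$, the set

\[ A = \{(0,0), (1,0), (0,1), (0,2)\} \]

is downward closed and only requires the derivatives $\partial_t f$, $\partial_x f$ and $\partial_x^2 f$ in addition to $f$ itself, so that

\[ \lVert f \rVert_{C^A(\overline{\mathbb{X}})} = \max \left\{ \lVert f \rVert_\infty, \lVert \partial_t f \rVert_\infty, \lVert \partial_x f \rVert_\infty, \lVert \partial_x^2 f \rVert_\infty \right\}. \]

By contrast, $\{(0,0), (0,2)\}$ is not downward closed, because $(0,1) \leq (0,2)$ is missing.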

Proposition 19.

Let $C^A(\overline{\mathbb{X}})$ be the separable Banach space of real-valued functions on an open and bounded domain $\mathbb{X} \subset \mathbb{R}^d$ with bounded and uniformly continuous partial derivatives $\mathrm{D}^{\bm{\alpha}} f$ for all $f \in C^A(\overline{\mathbb{X}})$ and $\bm{\alpha} \in A$. Then $C^A(\overline{\mathbb{X}})' = \operatorname{scl}_{w*}^{m+1}(\mathbb{L}_\delta)$, where $\mathbb{L}_\delta \coloneqq \operatorname{span}\{\delta_{\bm{x}}\}_{\bm{x} \in \mathbb{X}} \subset C^A(\overline{\mathbb{X}})'$ and $m = \max_{\bm{\alpha} \in A} \lvert \bm{\alpha} \rvert$.

  • Proof

    In the following, we adapt the proof of Theorem 3.9 in Adams and Fournier (2003). We choose an arbitrary ordering $\bm{\alpha}_1, \dotsc, \bm{\alpha}_n$ of the multi-indices in $A$, i.e. $A = \{\bm{\alpha}_1, \dotsc, \bm{\alpha}_n\}$, where $n = \lvert A \rvert$. Let

    \[ \mathbb{W} \coloneqq \{(\mathrm{D}^{\bm{\alpha}_1} f, \dotsc, \mathrm{D}^{\bm{\alpha}_n} f) \colon f \in C^A(\overline{\mathbb{X}})\} \subset C(\overline{\mathbb{X}})^n, \]

    where we interpret $\mathrm{D}^{\bm{\alpha}_i} f$ as a function defined on the closure $\overline{\mathbb{X}}$ of $\mathbb{X}$ by the unique continuous extension mentioned in remark 18. We equip $C(\overline{\mathbb{X}})^n$ and $\mathbb{W} \subset C(\overline{\mathbb{X}})^n$ with the norm

    \[ \lVert \bm{f} \rVert_{C(\overline{\mathbb{X}})^n} \coloneqq \max_{i=1,\dotsc,n} \lVert f_i \rVert_{C(\overline{\mathbb{X}})}. \]

    Then $(C(\overline{\mathbb{X}})^n, \lVert \cdot \rVert_{C(\overline{\mathbb{X}})^n})$ is a separable Banach space (Adams and Fournier, 2003, Theorem 1.23). Let $\mathcal{I} \colon C^A(\overline{\mathbb{X}}) \to \mathbb{W}$ be the linear operator defined by $\mathcal{I}[f]_i = \mathrm{D}^{\bm{\alpha}_i} f$. The operator $\mathcal{I}$ is surjective and norm-preserving, and hence an isometric isomorphism. Since $C^A(\overline{\mathbb{X}})$ is complete, it follows that $\mathbb{W}$ is a closed subspace of $C(\overline{\mathbb{X}})^n$.

    Fix $l \in C^A(\overline{\mathbb{X}})'$. Then $l \circ \mathcal{I}^{-1} \in \mathbb{W}'$. By the Hahn-Banach theorem, there is a continuous extension $\tilde{l} \in (C(\overline{\mathbb{X}})^n)'$ of $l \circ \mathcal{I}^{-1}$ to $C(\overline{\mathbb{X}})^n$. This means that there are $\tilde{l}_1, \dotsc, \tilde{l}_n \in C(\overline{\mathbb{X}})'$, such that

    \[ l = (l \circ \mathcal{I}^{-1}) \circ \mathcal{I} = \tilde{l} \circ \mathcal{I} = \sum_{i=1}^n \tilde{l}_i \circ \mathrm{D}^{\bm{\alpha}_i}. \]

    By proposition 16, there are $\{\tilde{l}_{ij}\}_{j \in \mathbb{N}} \subset \mathbb{L}_\delta$ such that $\tilde{l}_{ij}[f] \to \tilde{l}_i[f]$ as $j \to \infty$ for all $f \in C(\overline{\mathbb{X}})$. Consequently, there are $n_{ij} \in \mathbb{N}$, $\{c_{ijk}\}_{k=1}^{n_{ij}} \subset \mathbb{R}$, and $\{\bm{x}_{ijk}\}_{k=1}^{n_{ij}} \subset \overline{\mathbb{X}}$ such that

    \[ l[f] = \lim_{j \to \infty} \sum_{i=1}^n \tilde{l}_{ij}[\mathrm{D}^{\bm{\alpha}_i} f] = \lim_{j \to \infty} \sum_{i=1}^n \sum_{k=1}^{n_{ij}} c_{ijk} \mathrm{D}^{\bm{\alpha}_i} f(\bm{x}_{ijk}) \]

    for all $f \in C^A(\overline{\mathbb{X}})$. We will detail the remainder of the proof only for the case in which $\lvert \bm{\alpha} \rvert \leq 1$ for all $\bm{\alpha} \in A$, since the proof of the general statement is a straightforward yet laborious extension of this special case. Assume without loss of generality that $\bm{\alpha}_1 = \bm{0}$ and $\bm{\alpha}_i = \bm{e}_{i-1}$ for $2 \leq i \leq n \leq d + 1$. Then,

    \begin{align*}
      l[f] &= \lim_{j_0 \to \infty} \sum_{i=1}^n \sum_{k=1}^{n_{ij_0}} c_{ij_0k} \mathrm{D}^{\bm{\alpha}_i} f(\bm{x}_{ij_0k}) \\
      &= \lim_{j_0 \to \infty} \left( \sum_{k=1}^{n_{1j_0}} c_{1j_0k} f(\bm{x}_{1j_0k}) + \sum_{i=2}^n \sum_{k=1}^{n_{ij_0}} c_{ij_0k} \mathrm{D}^{\bm{e}_{i-1}} f(\bm{x}_{ij_0k}) \right) \\
      &= \lim_{j_0 \to \infty} \lim_{j_1 \to \infty} \underbrace{\sum_{k=1}^{n_{1j_0}} c_{1j_0k} f(\bm{x}_{1j_0k}) + \sum_{i=2}^n \sum_{k=1}^{n_{ij_0}} \frac{c_{ij_0k}}{h_{j_1}} \left( f(\bm{x}_{ij_0k} + h_{j_1} \bm{e}_{i-1}) - f(\bm{x}_{ij_0k}) \right)}_{\eqqcolon l_{j_0j_1}[f]}
    \end{align*}

    for any null sequence $\{h_j\}_{j \in \mathbb{N}} \subset \mathbb{R}$ and all $f \in C^A(\overline{\mathbb{X}})$. Since $l_{j_0j_1} \in \mathbb{L}_\delta$, it follows that $l \in \operatorname{scl}_{w*}^2(\mathbb{L}_\delta)$. The general case, i.e. $l \in \operatorname{scl}_{w*}^{m+1}(\mathbb{L}_\delta)$ with $m \coloneqq \max_{i=1,\dotsc,n} \lvert \bm{\alpha}_i \rvert$, can be shown by induction on $m \in \mathbb{N}_0$. Hence, $C^A(\overline{\mathbb{X}})' = \operatorname{scl}_{w*}^{m+1}(\mathbb{L}_\delta)$. ∎
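
The difference-quotient step of the proof can be illustrated in one dimension. As assumptions for this sketch only, take $d = 1$, $A = \{0, 1\}$, a point $x_0 \in \mathbb{X}$, and the derivative evaluation $l[f] = f'(x_0)$, which is continuous on $C^A(\overline{\mathbb{X}})$ because $\lvert l[f] \rvert \leq \lVert f \rVert_{C^A(\overline{\mathbb{X}})}$. We also read $\operatorname{scl}_{w*}$ as the weak-* sequential closure. For a null sequence of nonzero $h_j$ with $x_0 + h_j \in \mathbb{X}$, define

\[ l_j \coloneqq \frac{1}{h_j} \left( \delta_{x_0 + h_j} - \delta_{x_0} \right) \in \mathbb{L}_\delta, \qquad l_j[f] = \frac{f(x_0 + h_j) - f(x_0)}{h_j} \longrightarrow f'(x_0) = l[f] \]

for every $f \in C^A(\overline{\mathbb{X}})$, so $l$ is a pointwise limit of a sequence in $\mathbb{L}_\delta$. For instance, $f(x) = x^2$ gives

\[ l_j[f] = \frac{(x_0 + h_j)^2 - x_0^2}{h_j} = 2 x_0 + h_j \longrightarrow 2 x_0. \]

In the proof, a second limit is needed because the functionals $\tilde{l}_i$ are themselves only limits of finite combinations of point evaluations.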

Having investigated conditions under which $\omega \mapsto \mathrm{f}(\cdot, \omega)$ is a Gaussian random variable, it remains to identify its mean and covariance operator. Perhaps unsurprisingly, they are closely related to the mean and covariance function of the GP.

Proposition 20.

Let assumption 1 hold. Then $m \in \mathbb{B}$, $k(\bm{x}, \cdot) \in \mathbb{B}$ for all $\bm{x} \in \mathbb{X}$, and the mean and covariance operator of $\omega \mapsto \mathrm{f}(\cdot, \omega)$ are given by $m$ and

\begin{equation}
  \mathcal{C}_k \colon \mathbb{B}' \to \mathbb{B}, \quad l \mapsto \mathcal{C}_k[l](\bm{x}) = l[k(\bm{x}, \cdot)], \tag{B.1}
\end{equation}

respectively.

  • Proof

    Since $\omega \mapsto \mathrm{f}(\cdot, \omega)$ is Gaussian, its mean and covariance operator $m_{\mathrm{f}}$ and $\mathcal{C}_{\mathrm{f}}$ exist by proposition 9 and we have $m(\bm{x}) = \operatorname{\mathbb{E}}_{\mathrm{P}}[\mathrm{f}(\bm{x})] = \operatorname{\mathbb{E}}_{\mathrm{f}}[\delta_{\bm{x}}[\mathrm{f}]] = \delta_{\bm{x}}[m_{\mathrm{f}}]$ for all $\bm{x} \in \mathbb{X}$ and

    \[ k(\bm{x}_1, \bm{x}_2) = \operatorname{Cov}_{\mathrm{P}}[\mathrm{f}(\bm{x}_1), \mathrm{f}(\bm{x}_2)] = \operatorname{Cov}_{\mathrm{f}}[\delta_{\bm{x}_1}[\mathrm{f}], \delta_{\bm{x}_2}[\mathrm{f}]] = \mathcal{C}_{\mathrm{f}}[\delta_{\bm{x}_1}](\bm{x}_2) \]

    for all $\bm{x}_1, \bm{x}_2 \in \mathbb{X}$, since all point evaluation functionals are continuous on $\mathbb{B}$. Hence, $m = m_{\mathrm{f}} \in \mathbb{B}$ and $k(\bm{x}, \cdot) = \mathcal{C}_{\mathrm{f}}[\delta_{\bm{x}}] \in \mathbb{B}$ for all $\bm{x} \in \mathbb{X}$. Additionally, for all $l \in \mathbb{B}'$ and $\bm{x} \in \mathbb{X}$,

    \[ \mathcal{C}_{\mathrm{f}}[l](\bm{x}) = \operatorname{Cov}_{\mathrm{f}}[l[\mathrm{f}], \delta_{\bm{x}}[\mathrm{f}]] = \operatorname{Cov}_{\mathrm{f}}[\delta_{\bm{x}}[\mathrm{f}], l[\mathrm{f}]] = l[\mathcal{C}_{\mathrm{f}}[\delta_{\bm{x}}]] = l[k(\bm{x}, \cdot)] = \mathcal{C}_k[l](\bm{x}). \]

    This shows that $\mathcal{C}_{\mathrm{f}} = \mathcal{C}_k$. ∎

B.3 Proof of Theorem 1

Using the results from sections B.1 and B.2, particularly propositions 20 and 11, we can now conduct the proof of theorems 1 and 2 as outlined in the beginning of section B.

The main theorem deals with the case in which we observe the GP through a finite number of linear functionals. This happens when conditioning on integral observations or on (Galerkin) projections as in sections 3.2 and 3.3. See theorem 1.

  • Proof

    By lemma 11, $\bm{\mathcal{L}}[\mathrm{f}]$ is a Gaussian random variable with mean $\bm{\mathcal{L}}[m]$ and covariance matrix $\bm{\Sigma}$ with

    \[
      \Sigma_{ij}
      = \mathcal{L}_i[\mathcal{C}[\mathcal{L}_j]]
      = \mathcal{L}_i[x \mapsto \mathcal{L}_j[k(x, \cdot)]]
      = (\bm{\mathcal{L}} k \bm{\mathcal{L}}')_{ij},
    \]

    where we used propositions 9 and 20. This proves equation 4.1. Now let $\bm{X} = (\bm{x}_i)_{i=1}^{m} \in \mathbb{X}^{m}$ and consider

    \[
      \tilde{\bm{\mathcal{L}}} \colon \mathbb{B} \to \mathbb{R}^{m+n}, \quad
      f \mapsto \begin{pmatrix} f(\bm{X}) \\ \bm{\mathcal{L}}[f] \end{pmatrix}.
    \]

    $\tilde{\bm{\mathcal{L}}}$ is linear and bounded. Hence, by lemma 11 and the stability properties of independent Gaussian random variables on $\mathbb{R}^{m+n}$,

    \[
      \begin{pmatrix} \mathrm{f}(\bm{X}) \\ \bm{\mathcal{L}}[\mathrm{f}] + \bm{\epsilon} \end{pmatrix}
      = \tilde{\bm{\mathcal{L}}}[\mathrm{f}] + \begin{pmatrix} \bm{0}_{m \times n} \\ \bm{I}_n \end{pmatrix} \bm{\epsilon}
      \sim \mathcal{N}\left(
        \begin{pmatrix} m(\bm{X}) \\ \bm{\mathcal{L}}[m] + \bm{\mu} \end{pmatrix},
        \begin{pmatrix}
          k(\bm{X}, \bm{X}) & \bm{\Sigma}^{\bm{X}, \bm{\mathcal{L}}} \\
          \bm{\Sigma}^{\bm{\mathcal{L}}, \bm{X}} & \bm{\mathcal{L}} k \bm{\mathcal{L}}' + \bm{\Sigma}
        \end{pmatrix}
      \right),
    \]

    where

    \[
      \Sigma^{\bm{X}, \bm{\mathcal{L}}}_{i,j}
      = \delta_{\bm{x}_i}[\mathcal{C}[\mathcal{L}_j]]
      = \mathcal{C}[\mathcal{L}_j](\bm{x}_i)
      = \mathcal{L}_j[k(\bm{x}_i, \cdot)]
      = (\delta_{\bm{X}} k \bm{\mathcal{L}}')_{i,j}
    \]

    and $\bm{\Sigma}^{\bm{\mathcal{L}}, \bm{X}} = (\bm{\Sigma}^{\bm{X}, \bm{\mathcal{L}}})^{\top}$. By the well-known conditioning theorem for Gaussian random variables in $\mathbb{R}^{m+n}$, we arrive at

    \[
      \mathrm{f}(\bm{X}) \mid \bm{\mathcal{L}}[\mathrm{f}] + \bm{\epsilon} = \bm{y}
      \sim \mathcal{N}\left(\bm{\mu}^{\mathrm{f}(\bm{X}) \mid \bm{y}}, \bm{\Sigma}^{\mathrm{f}(\bm{X}) \mid \bm{y}}\right),
    \]

    with

    \begin{align*}
      \bm{\mu}^{\mathrm{f}(\bm{X}) \mid \bm{y}}
      &= m(\bm{X}) + (\delta_{\bm{X}} k \bm{\mathcal{L}}') (\bm{\mathcal{L}} k \bm{\mathcal{L}}' + \bm{\Sigma})^{\dagger} (\bm{y} - (\bm{\mathcal{L}}[m] + \bm{\mu}))
      \quad \text{and} \\
      \bm{\Sigma}^{\mathrm{f}(\bm{X}) \mid \bm{y}}
      &= k(\bm{X}, \bm{X}) - (\delta_{\bm{X}} k \bm{\mathcal{L}}') (\bm{\mathcal{L}} k \bm{\mathcal{L}}' + \bm{\Sigma})^{\dagger} (\bm{\mathcal{L}} k \delta_{\bm{X}}').
    \end{align*}

    This shows that $\mathrm{f} = \{\omega \mapsto \mathrm{f}(\bm{x}, \omega)\}_{\bm{x} \in \mathbb{X}}$ is a Gaussian process on the probability space

    \[
      \left(\Omega, \mathcal{F}, \mathrm{P}\left(\cdot \mid \bm{\mathcal{L}}[\mathrm{f}] + \bm{\epsilon} = \bm{y}\right)\right),
    \]

    where $\mathrm{P}\left(\cdot \mid \bm{\mathcal{L}}[\mathrm{f}] + \bm{\epsilon} = \bm{y}\right)$ is a regular conditional probability whose existence is guaranteed, since $\mathbb{R}^{n}$ is Polish. The mean and covariance function of the conditional process evaluated at $\bm{x}_i$ are given by $\mu^{\mathrm{f}(\bm{X}) \mid \bm{y}}_{i}$ and $\Sigma^{\mathrm{f}(\bm{X}) \mid \bm{y}}_{i,i}$. Since the points $\bm{X}$ were chosen arbitrarily, this holds for any $\bm{x} \in \mathbb{X}$, which proves equations 4.3 and 4.4. ∎
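As a concrete illustration of the posterior moments derived in the proof above, the following NumPy sketch conditions a zero-mean GP with a Gaussian kernel on two linear functionals: a point evaluation and a point-evaluated derivative. The kernel, the lengthscale `ELL`, the locations `A` and `B`, and all helper names are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

ELL = 0.3        # lengthscale of the Gaussian kernel (illustrative choice)
A, B = 0.2, 0.7  # L_1[f] = f(A) and L_2[f] = f'(B) (illustrative functionals)


def k(x1, x2):
    """Gaussian kernel k(x1, x2) = exp(-(x1 - x2)^2 / (2 ELL^2))."""
    return np.exp(-0.5 * (x1 - x2) ** 2 / ELL**2)


def k_d2(x1, x2):
    """Partial derivative of k with respect to its second argument."""
    return k(x1, x2) * (x1 - x2) / ELL**2


def k_d1d2(x1, x2):
    """Mixed partial derivative of k with respect to both arguments."""
    return k(x1, x2) * (1.0 / ELL**2 - (x1 - x2) ** 2 / ELL**4)


def cross_cov(X):
    """(delta_X k L')_{ij} = L_j[k(x_i, .)], an array of shape (m, 2)."""
    X = np.asarray(X, dtype=float)
    return np.stack([k(X, A), k_d2(X, B)], axis=1)


# Gram matrix (L k L')_{ij} = L_i[x -> L_j[k(x, .)]] of the two functionals.
L_k_L = np.array([
    [k(A, A), k_d2(A, B)],
    [k_d2(A, B), k_d1d2(B, B)],
])


def posterior(X, y, noise_mean, noise_cov):
    """Posterior mean and covariance of f(X) given L[f] + eps = y."""
    X = np.asarray(X, dtype=float)
    K_XL = cross_cov(X)
    gram_pinv = np.linalg.pinv(L_k_L + noise_cov)
    # The prior mean is zero, so m(X) = 0 and L[m] = 0.
    mean = K_XL @ gram_pinv @ (y - noise_mean)
    cov = k(X[:, None], X[None, :]) - K_XL @ gram_pinv @ K_XL.T
    return mean, cov


y = np.array([1.0, -2.0])  # observed f(A) and f'(B)
X = np.array([0.0, A, 0.5, B, 1.0])
mean, cov = posterior(X, y, noise_mean=np.zeros(2), noise_cov=np.zeros((2, 2)))
```

With noise-free observations, the posterior mean interpolates the observed value at `A` and its slope at `B` matches the observed derivative, while the posterior variance at `A` collapses to (numerically) zero.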

Finally, we address the archetypical case, in which both the prior $\mathrm{f}$ and the prior predictive $\mathcal{L}[\mathrm{f}] + \bm{\epsilon}$ are Gaussian processes. This happens if the linear operator maps into a function space in which point evaluation is continuous. In this article, this case occurs in sections 3.1 and 3.2, where we infer the strong solution of a PDE from observations of the PDE residual at a finite number of domain points. See corollary 2.

  • Proof

    The linear operator $\mathcal{L}[\cdot](\tilde{\bm{X}}) \colon \mathbb{B} \to \mathbb{R}^{n}$ is bounded, since $\mathcal{L}[\cdot](\tilde{\bm{X}})_i = \delta_{\tilde{\bm{x}}_i} \circ \mathcal{L}$ is bounded by assumption. Hence, equations 4.8, 4.9, 4.10 and 4.11 follow directly from theorem 1. Now let $\bm{X} = (\bm{x}_i)_{i=1}^{m+m'} \in \mathbb{X}^{m+m'}$. Then the linear operator

    \[
      \bm{\mathcal{L}}_{\bm{X}} \colon \mathbb{B} \to \mathbb{R}^{m+m'}, \quad
      f \mapsto \begin{pmatrix}
        f(\bm{x}_1) & \ldots & f(\bm{x}_m) & \mathcal{L}[f](\bm{x}_{m+1}) & \ldots & \mathcal{L}[f](\bm{x}_{m+m'})
      \end{pmatrix}^{\top}
    \]

    is bounded and $\bm{\mathcal{L}}_{\bm{X}}[\mathrm{f}]$ is Gaussian by theorem 1. This implies that $\{\omega \mapsto \bm{\mathrm{f}}^{\mathcal{L}}(\bm{x}, \omega)\}_{\bm{x} \in \mathbb{X}}$ with

    \[
      \bm{\mathrm{f}}^{\mathcal{L}}(\bm{x}, \omega) \coloneqq
      \begin{pmatrix} \mathrm{f}(\bm{x}, \omega) \\ \mathcal{L}[\mathrm{f}(\cdot, \omega)](\bm{x}) \end{pmatrix}
    \]

    is a 2-output Gaussian process. By lemmas 11, 9 and 20, its mean function is given by

    \[
      \bm{m}^{\mathcal{L}}(\bm{x})
      = \begin{pmatrix}
          \operatorname{\mathbb{E}}_{\mathrm{P}}\left[\delta_{\bm{x}}[\mathrm{f}]\right] \\
          \operatorname{\mathbb{E}}_{\mathrm{P}}\left[(\delta_{\bm{x}} \circ \mathcal{L})[\mathrm{f}]\right]
        \end{pmatrix}
      = \begin{pmatrix} m(\bm{x}) \\ \mathcal{L}[m](\bm{x}) \end{pmatrix}
    \]

    and its covariance function is given by

    \begin{align*}
      \bm{k}^{\mathcal{L}}(\bm{x}_1, \bm{x}_2)
      &= \begin{pmatrix}
          \operatorname{Cov}_{\mathrm{P}}\left[\delta_{\bm{x}_1}[\mathrm{f}], \delta_{\bm{x}_2}[\mathrm{f}]\right]
          & \operatorname{Cov}_{\mathrm{P}}\left[\delta_{\bm{x}_1}[\mathrm{f}], (\delta_{\bm{x}_2} \circ \mathcal{L})[\mathrm{f}]\right] \\
          \operatorname{Cov}_{\mathrm{P}}\left[(\delta_{\bm{x}_1} \circ \mathcal{L})[\mathrm{f}], \delta_{\bm{x}_2}[\mathrm{f}]\right]
          & \operatorname{Cov}_{\mathrm{P}}\left[(\delta_{\bm{x}_1} \circ \mathcal{L})[\mathrm{f}], (\delta_{\bm{x}_2} \circ \mathcal{L})[\mathrm{f}]\right]
        \end{pmatrix} \\
      &= \begin{pmatrix}
          \delta_{\bm{x}_1}[\mathcal{C}_k[\delta_{\bm{x}_2}]]
          & \delta_{\bm{x}_1}[\mathcal{C}_k[\delta_{\bm{x}_2} \circ \mathcal{L}]] \\
          (\delta_{\bm{x}_1} \circ \mathcal{L})[\mathcal{C}_k[\delta_{\bm{x}_2}]]
          & (\delta_{\bm{x}_1} \circ \mathcal{L})[\mathcal{C}_k[\delta_{\bm{x}_2} \circ \mathcal{L}]]
        \end{pmatrix} \\
      &= \begin{pmatrix}
          k(\bm{x}_1, \bm{x}_2) & (k \mathcal{L}')(\bm{x}_1, \bm{x}_2) \\
          (\mathcal{L} k)(\bm{x}_1, \bm{x}_2) & (\mathcal{L} k \mathcal{L}')(\bm{x}_1, \bm{x}_2)
        \end{pmatrix}.
    \end{align*}

    This proves equation 4.12. ∎
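The block structure of the 2-output covariance function above can be made tangible with a small NumPy sketch. It assumes a one-dimensional Gaussian kernel and the operator $\mathcal{L} = \mathrm{d}^2/\mathrm{d}x^2$, for which all four blocks are available in closed form. Both choices, the lengthscale `ELL`, and the helper names `k_joint` and `joint_cov` are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

ELL = 0.3  # lengthscale of the Gaussian kernel (illustrative choice)


def k_joint(x1, x2):
    """2x2 blocks of the covariance of (f(x1), f''(x1)) with (f(x2), f''(x2)).

    With r = x1 - x2 and s = ELL^2, the Gaussian kernel is
    g(r) = exp(-r^2 / (2 s)). For L = d^2/dx^2 we get
    (L k) = (k L') = g''(r) and (L k L') = g''''(r).
    """
    r = x1 - x2
    s = ELL**2
    g = np.exp(-0.5 * r**2 / s)
    g2 = g * (r**2 / s**2 - 1.0 / s)
    g4 = g * (r**4 / s**4 - 6.0 * r**2 / s**3 + 3.0 / s**2)
    return np.array([[g, g2], [g2, g4]])


def joint_cov(X):
    """Covariance matrix of the stacked vector (f(X), L[f](X))."""
    X = np.asarray(X, dtype=float)
    blocks = k_joint(X[:, None], X[None, :])  # shape (2, 2, n, n)
    return np.block([
        [blocks[0, 0], blocks[0, 1]],
        [blocks[1, 0], blocks[1, 1]],
    ])


X = np.array([0.1, 0.4, 0.6, 0.9])
K = joint_cov(X)

# Sanity check: (L k)(x1, x2) should match a central second difference
# of k in its first argument.
h = 1e-3
x1, x2 = 0.3, 0.5
fd = (k_joint(x1 + h, x2)[0, 0] - 2.0 * k_joint(x1, x2)[0, 0]
      + k_joint(x1 - h, x2)[0, 0]) / h**2
```

The assembled matrix `K` is symmetric and positive semi-definite, as a joint covariance of function values and operator values has to be, and the finite-difference value `fd` agrees with the closed-form block up to discretization error.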

B.4 On Prior Selection

In order to apply theorem 1 in practice, we need to construct our GP prior $\mathrm{f}$ such that

  1. $\omega \mapsto \mathrm{f}(\cdot, \omega)$ is a Gaussian random variable on some suitably chosen RKBS $\mathbb{B}$, and

  2. $\bm{\mathcal{L}} \colon \mathbb{B} \to \mathbb{R}^{n}$ is bounded.

Luckily, we can use existing results about the path spaces of Gaussian processes together with theorems 13, 15, 16 and 19 to verify these assumptions. It is tempting to choose $\mathbb{B} = \mathbb{H}_k$, i.e. the RKHS of the GP's kernel $k$. However, this is only valid if $\mathbb{H}_k$ is finite-dimensional.

Remark 21.

Let $\mathrm{f} \sim \operatorname{\mathcal{GP}}(m, k)$ be a Gaussian process with index set $\mathbb{X}$ and let $\mathbb{H}_k$ be the RKHS of the covariance function $k$. If $\dim \mathbb{H}_k = \infty$, then the sample paths of $\mathrm{f}$ almost surely do not lie in $\mathbb{H}_k$. We refer to (Kanagawa et al., 2018, Section 4) and Steinwart (2019) for more details on RKHS sample spaces.

In the following, we will give example constructions of appropriate priors for GP regression tasks with linear operator observations.

B.4.1 Priors for GP Regression with Linear Operator Observations

The spaces $C^{A}(\overline{\mathbb{X}})$ from definition 17, particularly $C^{\bm{\beta}}(\overline{\mathbb{X}})$ with $A \coloneqq \{\bm{\alpha} \in \mathbb{N}_0^d \mid \bm{\alpha} \leq \bm{\beta}\}$ and $C^{k}(\overline{\mathbb{X}})$ with $A \coloneqq \{\bm{\alpha} \in \mathbb{N}_0^d \mid \lvert \bm{\alpha} \rvert \leq k\}$, are useful sample spaces for many GP regression tasks, since a large number of practically relevant observation functionals, including point evaluations of the paths and their partial derivatives, as well as integrals of the paths, are bounded on these spaces.
Even though the functions in $C^{A}(\overline{\mathbb{X}})$ are, technically speaking, only defined on the open and bounded set $\mathbb{X} \subset \mathbb{R}^{d}$, we can treat them as functions on the closure $\overline{\mathbb{X}}$ of $\mathbb{X}$ by continuous extension (see remark 18 for more details). In other words, we can evaluate functions in $C^{A}(\overline{\mathbb{X}})$ on the boundary $\partial \mathbb{X}$ of $\mathbb{X}$.

To fulfill assumption 1, it remains to verify that the sample paths of a given GP prior (almost surely) lie in $C^{A}(\overline{\mathbb{X}})$. Under the assumption that $m \in C^{A}(\overline{\mathbb{X}})$, Da Costa et al. (2023) show that this can be done by studying the regularity of the covariance function $k$. They also provide readily applicable results for a wide variety of covariance functions used in practice.

Example 8 (Tensor Products of Matérn Covariances).

Tensor products of 1D Matérn covariance functions

\[
  k_{\bm{\nu}}(\bm{x}_1, \bm{x}_2) = \prod_{i=1}^{d} k_{\nu_i}(x_{1,i}, x_{2,i})
\]

are a particularly convenient choice of prior covariance function, since their hyperparameters directly control the differentiability of the sample paths independently for each input dimension. For an open and bounded domain $\mathbb{X} \subset \mathbb{R}^{d}$, Propositions 10 and 21 in Da Costa et al. (2023) imply that samples from a Gaussian process with mean function $m$ and covariance function $k_{\bm{\nu}}$ lie in $C^{\bm{\beta}}(\overline{\mathbb{X}})$ (with probability 1) if $\nu_i > \beta_i$ and $m \in C^{\bm{\beta}}(\overline{\mathbb{X}})$. Any point-evaluated partial derivative $f \mapsto \mathrm{D}^{\bm{\alpha}} f(\bm{x})$ with $\bm{\alpha} \leq \bm{\beta}$ and $\bm{x} \in \overline{\mathbb{X}}$ is continuous on $C^{\bm{\beta}}(\overline{\mathbb{X}})$.

In section 3.2, we use tensor products of Matérn covariance functions to construct the GP priors. In particular, we choose $\nu_i = \frac{5}{2} = 2 + \frac{1}{2}$, which implies that the sample paths of the prior lie in $C^{(2,2)}(\overline{\mathbb{X}})$ and that all point-evaluated differential operators of order $\leq 2$ are continuous linear functionals on the sample space. Hence, the assumptions of corollary 2 are fulfilled, which means that the inference procedure used in that section is supported by our theoretical results above.
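A tensor-product Matérn covariance of this kind can be sketched in a few lines of NumPy. The sketch uses the standard closed forms of the 1D Matérn kernel for half-integer $\nu \in \{\frac{1}{2}, \frac{3}{2}, \frac{5}{2}\}$ with unit output scale; the per-dimension `lengthscales` argument and the function names are assumptions of this example and not the parameterization used in the paper's experiments.

```python
import numpy as np


def matern_1d(x1, x2, nu, lengthscale=1.0):
    """1D Matérn kernel with unit variance for nu in {0.5, 1.5, 2.5}."""
    r = np.abs(x1 - x2) / lengthscale
    if nu == 0.5:
        return np.exp(-r)
    if nu == 1.5:
        a = np.sqrt(3.0) * r
        return (1.0 + a) * np.exp(-a)
    if nu == 2.5:
        a = np.sqrt(5.0) * r
        return (1.0 + a + a**2 / 3.0) * np.exp(-a)
    raise ValueError("only nu in {0.5, 1.5, 2.5} is implemented in this sketch")


def tensor_product_matern(X1, X2, nus, lengthscales):
    """k_nu(x1, x2) = prod_i k_{nu_i}(x1_i, x2_i) for inputs of shape (n, d)."""
    X1 = np.atleast_2d(np.asarray(X1, dtype=float))
    X2 = np.atleast_2d(np.asarray(X2, dtype=float))
    K = np.ones((X1.shape[0], X2.shape[0]))
    for i, (nu, ell) in enumerate(zip(nus, lengthscales)):
        # Each factor only sees the i-th input coordinate.
        K *= matern_1d(X1[:, None, i], X2[None, :, i], nu, ell)
    return K


# nu_i = 5/2 in both dimensions, as chosen in the text above.
X = np.array([[0.0, 0.0], [0.5, 0.2], [1.0, 1.0]])
K = tensor_product_matern(X, X, nus=(2.5, 2.5), lengthscales=(1.0, 0.5))
```

Because the smoothness parameters enter one factor at a time, a rougher or smoother prior along a single input dimension only requires changing the corresponding entry of `nus`.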

The sample paths of Gaussian processes with multivariate Matérn covariance functions kνsubscript𝑘𝜈k_{\nu}italic_k start_POSTSUBSCRIPT italic_ν end_POSTSUBSCRIPT (almost surely) lie in the Banach space 𝔹=Cp(𝕏¯)𝔹superscript𝐶𝑝¯𝕏{\mathbb{B}}=C^{p}(\overline{{\mathbb{X}}})blackboard_B = italic_C start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT ( over¯ start_ARG blackboard_X end_ARG ) if ν>p𝜈𝑝\nu>pitalic_ν > italic_p (Da Costa et al., 2023, Proposition 10).

For Gaussian processes with smooth covariance functions like the Gaussian/exponentiated quadratic or the rational quadratic covariance functions, assumption 1 holds for $\mathbb{B} = C^{p}(\overline{\mathbb{X}})$ for all $p \in \mathbb{N}_0$ (Da Costa et al., 2023, Corollary 13). Informally speaking, for the Gaussian covariance function, this can be seen as a limit of the argument above, since a Matérn covariance function approaches the Gaussian covariance function for $\nu \to \infty$.

Gaussian processes with parametric covariance function $k(\bm{x}_1, \bm{x}_2) = \bm{\phi}(\bm{x}_1)^{\top} \bm{\Sigma} \bm{\phi}(\bm{x}_2)$ with features $\bm{\phi}\colon \mathbb{X} \to \mathbb{R}^{m}$ and $\bm{\Sigma} \in \mathbb{R}^{m \times m}$ positive-(semi)definite have paths in $\mathbb{B}$ if $\phi_i \in \mathbb{B}$. In this case, assumption 1 is also satisfied, since the Gaussian measure can be explicitly constructed as the law of the random function $\bm{\mathrm{w}}^{\top} \bm{\phi}$, where $\bm{\mathrm{w}} \sim \mathcal{N}(\bm{0}, \bm{\Sigma})$.
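The explicit construction of a parametric GP as the law of a weighted sum of features can be sketched as follows. The monomial feature map, the Cholesky factor `SIGMA_CHOL`, and all function names are hypothetical choices made for illustration only; the sketch draws the weights from $\mathcal{N}(\bm{0}, \bm{\Sigma})$ via $\bm{\Sigma} = \bm{L}\bm{L}^{\top}$ and returns the resulting path as an ordinary function.

```python
import random


def features(x):
    """Hypothetical feature map phi: R -> R^3 (monomials up to degree 2)."""
    return [1.0, x, x * x]


# Lower-triangular factor L with Sigma = L L^T (illustrative values).
SIGMA_CHOL = [
    [1.0, 0.0, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.3, 0.7],
]


def sigma():
    """Assemble Sigma = L L^T, which is positive semi-definite by construction."""
    n = len(SIGMA_CHOL)
    return [
        [sum(SIGMA_CHOL[i][k] * SIGMA_CHOL[j][k] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def kernel(x1, x2):
    """Parametric covariance k(x1, x2) = phi(x1)^T Sigma phi(x2)."""
    s = sigma()
    p1, p2 = features(x1), features(x2)
    return sum(p1[i] * s[i][j] * p2[j] for i in range(3) for j in range(3))


def sample_path(rng):
    """Draw w ~ N(0, Sigma) as w = L z with z ~ N(0, I); return x -> w^T phi(x)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(3)]
    w = [sum(SIGMA_CHOL[i][k] * z[k] for k in range(3)) for i in range(3)]
    return lambda x: sum(wi * pi for wi, pi in zip(w, features(x)))


# Usage: every sampled path is a quadratic polynomial, i.e. lies in span{phi_i}.
path = sample_path(random.Random(0))
value_at_half = path(0.5)
```

Each path is a finite linear combination of the features, so it inherits whatever regularity the features have, which is the content of the condition $\phi_i \in \mathbb{B}$.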

B.4.2 Priors for Inferring Weak Solutions of Linear PDEs

Sobolev spaces (Adams and Fournier, 2003) are a typical choice for the solution space $\mathbb{U}$ of a linear PDE in weak formulation (see section 2.1.1). Unfortunately, it is impossible to construct a Gaussian process prior $\mathrm{u}$ whose paths are elements of a Sobolev space $\mathbb{U}$. This is because Sobolev spaces are, technically speaking, not function spaces, but rather spaces of equivalence classes $[u]_{\sim}$ of functions that are equal almost everywhere (Adams and Fournier, 2003). By contrast, the path spaces of Gaussian processes are proper function spaces, which means that, in this setting, $\operatorname{paths}(\mathrm{u}) \subseteq \mathbb{U}$ is impossible.

Fortunately, if the path space $\mathbb{B} \supset \operatorname{paths}(\mathrm{u})$ of $\mathrm{u}$ can be continuously embedded in $\mathbb{U}$, i.e. there is a continuous and injective linear operator $\iota\colon \mathbb{B} \to \mathbb{U}$, commonly referred to as an embedding, then the inference procedure above can still be applied. If such an embedding exists, we can interpret the paths of the GP as elements of $\mathbb{U}$ by applying $\iota$ implicitly. For instance, $B[\mathrm{u}, v]$ is then a shorthand notation for $B[\iota[\mathrm{u}], v]$. Since the embedding is assumed to be continuous, the conditions for GP inference with linear operator observations are still met when applying $\iota$ implicitly. The canonical choice for the embedding in the case of Sobolev spaces is $\iota[u] = [u]_{\sim}$.

Example 9 (Matérn covariances and Sobolev spaces).

Kanagawa et al. (2018) show that, under certain assumptions, RKHS sample spaces of GP priors with Matérn covariance functions are continuously embedded in Sobolev spaces whose smoothness depends on the parameter $\nu$ of the covariance function. To be precise, let $\mathbb{D} \subset \mathbb{R}^{d}$ be open and bounded with Lipschitz boundary such that the cone condition (Adams and Fournier, 2003, Definition 4.6) holds. Denote by $k_{\nu,l}$ the Matérn kernel with smoothness parameter $\nu > 0$ and lengthscale $l > 0$. Then, with probability 1, the sample paths of a Gaussian process $\mathrm{u}$ with covariance function $k_{\nu,l}$ are contained in any RKHS $\mathbb{H}_{k_{\nu',l'}}$ with $l' > 0$ and

\begin{equation}
  0 < \underbrace{\nu' + \frac{d}{2}}_{\eqqcolon m'} < \nu \tag{B.2}
\end{equation}

(Kanagawa et al., 2018, Corollary 4.15 and Remark 4.15), i.e. $\operatorname{paths}(\mathrm{u}) \subset \mathbb{H}_{k_{\nu',l'}}$. Moreover, if $m' \in \mathbb{N}$, then the RKHS $\mathbb{H}_{k_{\nu',l'}}$ is norm-equivalent to the Sobolev space $H^{m'}(\mathbb{D})$ (Kanagawa et al., 2018, Example 2.6). This implies that the canonical embedding

\begin{equation}
  \iota\colon \mathbb{H}_{k_{\nu',l'}} \to H^{m'}(\mathbb{D}), \quad \mathrm{f}(\cdot, \omega) \mapsto [\mathrm{f}(\cdot, \omega)]_{\sim_{H^{m'}(\mathbb{D})}} \tag{B.3}
\end{equation}

is continuous.

For $\mathbb{U} = H^{m'}(\mathbb{D})$, the example above shows that the Matérn covariance function $k_{\nu,l}$ with $\nu = m' + \epsilon$ for any $\epsilon > 0$ leads to an admissible GP prior. The choice $\epsilon = \frac{1}{2}$ makes evaluating the covariance function particularly efficient (Rasmussen and Williams, 2006). For instance, in section 3.3, we used $\nu = \frac{3}{2} = 1 + \frac{1}{2}$ for a weak form linear PDE with solution space $\mathbb{U} = H^{1}(\mathbb{D})$. However, the elements of the Sobolev space $H^{m}(\mathbb{D})$ are only $m$-times weakly differentiable, which means that $H^{2}(\mathbb{D})$ is not an admissible choice in section 3.2.
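To illustrate why half-integer smoothness parameters are cheap to evaluate, recall the standard closed forms of the Matérn covariance function for $\nu = \frac{3}{2}$ and $\nu = \frac{5}{2}$ as given by Rasmussen and Williams (2006). The display below assumes unit output scale and writes $r \coloneqq \lVert \bm{x}_1 - \bm{x}_2 \rVert$; both conventions are assumptions made here for illustration.

```latex
\begin{align*}
  k_{3/2,l}(r) &= \left(1 + \frac{\sqrt{3}\, r}{l}\right) \exp\left(-\frac{\sqrt{3}\, r}{l}\right), \\
  k_{5/2,l}(r) &= \left(1 + \frac{\sqrt{5}\, r}{l} + \frac{5 r^{2}}{3 l^{2}}\right) \exp\left(-\frac{\sqrt{5}\, r}{l}\right).
\end{align*}
```

For $\nu = p + \frac{1}{2}$ with $p \in \mathbb{N}_0$, the covariance function is the product of an exponential and a polynomial of degree $p$ in $r$, so no Bessel function evaluation is needed.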

C Linear Partial Differential Equations

Definition 22 (Multi-index).

Using a $d$-dimensional multi-index $\bm{\alpha} \in \mathbb{N}_0^{d}$, we can represent (mixed) partial derivatives of arbitrary order as

\begin{equation*}
  \frac{\partial^{\lvert \bm{\alpha} \rvert}}{\partial \bm{x}^{\bm{\alpha}}} \coloneqq \frac{\partial^{\lvert \bm{\alpha} \rvert}}{\partial x_1^{(\alpha_1)} \cdots \partial x_d^{(\alpha_d)}},
\end{equation*}

where $\lvert \bm{\alpha} \rvert \coloneqq \sum_{i=1}^{d} \alpha_i$. If the variables w.r.t. which we differentiate are clear from the context, we also denote this (mixed) partial derivative by $\mathrm{D}^{\bm{\alpha}}$. For two multi-indices $\bm{\alpha}, \bm{\alpha}' \in \mathbb{N}_0^{d}$, we write $\bm{\alpha} \leq \bm{\alpha}'$ iff $\alpha_i \leq \alpha'_i$ for all $i = 1, \dotsc, d$.
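The multi-index notation can be made concrete with a small sketch. It represents a multi-index as a tuple of non-negative integers and applies $\mathrm{D}^{\bm{\alpha}}$ to a monomial $\bm{x}^{\bm{\gamma}} = \prod_i x_i^{\gamma_i}$; the function names `order`, `leq`, and `diff_monomial` are hypothetical and chosen only for this illustration.

```python
from math import prod


def order(alpha):
    """|alpha| = sum of the entries of the multi-index."""
    return sum(alpha)


def leq(alpha, beta):
    """Component-wise partial order: alpha <= beta iff alpha_i <= beta_i for all i."""
    return all(a <= b for a, b in zip(alpha, beta))


def diff_monomial(alpha, gamma):
    """Apply D^alpha to the monomial x^gamma.

    Returns (coefficient, exponent) of the resulting monomial. The coefficient
    is 0 if some alpha_i exceeds gamma_i, since the derivative then vanishes.
    """
    if not leq(alpha, gamma):
        return 0, tuple(0 for _ in gamma)
    # d^a/dx^a x^g = g (g - 1) ... (g - a + 1) x^(g - a) in each coordinate.
    coeff = prod(prod(range(g - a + 1, g + 1)) for a, g in zip(alpha, gamma))
    exponent = tuple(g - a for a, g in zip(alpha, gamma))
    return coeff, exponent


# Usage: D^(1,2) applied to x_1^3 x_2^2 gives 6 x_1^2.
result = diff_monomial((1, 2), (3, 2))
```

Note that $\leq$ is only a partial order: for example, $(2, 0)$ and $(1, 2)$ are not comparable.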

Definition 23 (Linear differential operator).

A linear differential operator $\mathcal{D}\colon \mathbb{U} \to \mathbb{V}$ of order $k$ between a space $\mathbb{U}$ of $\mathbb{R}^{d'}$-valued functions and a space $\mathbb{V}$ of real-valued functions defined on some common open domain $\mathbb{D} \subset \mathbb{R}^{d}$ is a linear operator that linearly combines partial derivatives up to $k$-th order of its input function, i.e.

\begin{equation*}
  \mathcal{D}[\bm{u}] \coloneqq \sum_{i=1}^{d'} \sum_{\bm{\alpha} \in \mathbb{N}_0^{d},\, \lvert \bm{\alpha} \rvert \leq k} A_{i,\bm{\alpha}} \mathrm{D}^{\bm{\alpha}} \bm{u}_i,
\end{equation*}

where $A_{i,\bm{\alpha}} \in \mathbb{R}$ for every $i \in \{1, \dotsc, d'\}$ and every multi-index $\bm{\alpha} \in \mathbb{N}_0^{d}$ with $\lvert \bm{\alpha} \rvert \leq k$.
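As a worked instance of Definition 23 (an illustration that is not part of the definition itself), consider the Laplacian acting on scalar functions on a domain $\mathbb{D} \subset \mathbb{R}^{2}$, i.e. $d = 2$, $d' = 1$, and $k = 2$. Only two coefficients are non-zero:

```latex
\begin{equation*}
  \Delta u = \mathrm{D}^{(2,0)} u + \mathrm{D}^{(0,2)} u,
  \qquad
  A_{1,(2,0)} = A_{1,(0,2)} = 1,
  \qquad
  A_{1,\bm{\alpha}} = 0 \ \text{for all other } \lvert \bm{\alpha} \rvert \leq 2.
\end{equation*}
```

Both multi-indices satisfy $\lvert \bm{\alpha} \rvert = 2$ and $\bm{\alpha} \leq (2, 2)$, so the operator only involves derivatives that exist for functions in $C^{(2,2)}(\overline{\mathbb{D}})$.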

C.1 Weak Derivatives and Sobolev Spaces

Definition 24 (Test Function).

Let $\mathbb{D} \subset \mathbb{R}^{d}$ be open and let

\begin{equation*}
  C_c^{\infty}(\mathbb{D}) \coloneqq \{ \phi \in C^{\infty}(\mathbb{D}, \mathbb{R}) \mid \operatorname{supp}(\phi) \subset \mathbb{D} \text{ is compact} \}
\end{equation*}

be the space of smooth functions with compact support in $\mathbb{D}$. A function $\phi \in C_c^{\infty}(\mathbb{D})$ is called a test function and we refer to $C_c^{\infty}(\mathbb{D})$ as the space of test functions.
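A standard example of a test function in one dimension is the bump function $\phi(x) = \exp\bigl(-1 / (1 - t^{2})\bigr)$ for $\lvert t \rvert < 1$ and $\phi(x) = 0$ otherwise, where $t = (x - c) / r$. The sketch below evaluates it; the name `bump` and the parameters `center` and `radius` are illustrative assumptions. The function is a test function on any open $\mathbb{D} \subset \mathbb{R}$ that contains the interval $[c - r, c + r]$.

```python
import math


def bump(x, center=0.0, radius=1.0):
    """Smooth bump function supported on [center - radius, center + radius]."""
    t = (x - center) / radius
    if abs(t) >= 1.0:
        # Outside the open ball the function is identically zero.
        return 0.0
    return math.exp(-1.0 / (1.0 - t * t))


# Usage: the function decays to zero faster than any polynomial
# as x approaches the boundary of the support.
values = [bump(x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
```

All derivatives of this function vanish at $\lvert t \rvert = 1$, which is what makes the piecewise definition smooth.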

Theorem 25 (Sobolev Spaces\footnote{This theorem is a summary of (Adams and Fournier, 2003, Definitions 3.1 and 3.2 and Theorems 3.3 and 3.6).}).

Let $\mathbb{D} \subset \mathbb{R}^{d}$ be open, $k \in \mathbb{N}_{>0}$, and $p \in [1, \infty) \cup \{\infty\}$. The functional

\begin{equation}
  \lVert u \rVert_{k,p,\mathbb{D}} \coloneqq
  \begin{cases}
    \left( \sum_{\lvert \alpha \rvert \leq k} \lVert \mathrm{D}^{\alpha} u \rVert_{L_p(\mathbb{D})}^{p} \right)^{\nicefrac{1}{p}} & \text{if } p < \infty, \\
    \max_{\lvert \alpha \rvert \leq k} \lVert \mathrm{D}^{\alpha} u \rVert_{L_\infty(\mathbb{D})} & \text{if } p = \infty,
  \end{cases}
  \tag{C.1}
\end{equation}

where the $\mathrm{D}^{\alpha}$ are weak partial derivatives, is called a Sobolev norm. A Sobolev norm $\lVert u \rVert_{k,p,\mathbb{D}}$ is a norm on subspaces of $L_p(\mathbb{D})$, on which the right-hand side is well-defined and finite. A Sobolev space of order $k$ is defined as the subspace

\begin{equation*}
  W^{k,p}(\mathbb{D}) \coloneqq \{ u \in L_p(\mathbb{D}) \mid \mathrm{D}^{\alpha} u \in L_p(\mathbb{D}) \ \text{for} \ \lvert \alpha \rvert \leq k \}
\end{equation*}

of $L_p$. Sobolev spaces $W^{k,p}(\mathbb{D})$ are Banach spaces under the Sobolev norm $\lVert \cdot \rVert_{k,p,\mathbb{D}}$. The Sobolev space $H^{k}(\mathbb{D}) \coloneqq W^{k,2}(\mathbb{D})$ is a separable Hilbert space with inner product

\begin{equation}
  \langle u_1, u_2 \rangle_{k,\mathbb{D}} \coloneqq \sum_{\lvert \alpha \rvert \leq k} \langle \mathrm{D}^{\alpha} u_1, \mathrm{D}^{\alpha} u_2 \rangle_{L_2(\mathbb{D})} \tag{C.2}
\end{equation}

and norm $\lVert \cdot \rVert_{k,\mathbb{D}} \coloneqq \sqrt{\langle \cdot, \cdot \rangle_{k,\mathbb{D}}} = \lVert \cdot \rVert_{k,2,\mathbb{D}}$.
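As a numerical illustration of the Sobolev norm for $k = 1$ and $p = 2$ on an interval, the sketch below approximates $\lVert u \rVert_{1,2,(a,b)}$ with the midpoint rule. It assumes that $u$ is smooth, so that the classical derivative supplied by the caller coincides with the weak derivative; the function name `sobolev_h1_norm` and its arguments are hypothetical.

```python
import math


def sobolev_h1_norm(u, du, a=0.0, b=1.0, n=10_000):
    """Approximate sqrt(||u||_{L2}^2 + ||u'||_{L2}^2) on (a, b) by the midpoint rule.

    Assumption: du is the classical derivative of a smooth u, which then
    agrees with the weak derivative appearing in the Sobolev norm.
    """
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += (u(x) ** 2 + du(x) ** 2) * h
    return math.sqrt(total)


# Usage: u(x) = sin(pi x) on (0, 1), for which the squared L2 norms of
# u and u' are 1/2 and pi^2 / 2, respectively.
approx = sobolev_h1_norm(
    lambda x: math.sin(math.pi * x),
    lambda x: math.pi * math.cos(math.pi * x),
)
exact = math.sqrt((1.0 + math.pi ** 2) / 2.0)
```

The two contributions to the sum correspond to the multi-indices $\alpha = 0$ and $\alpha = 1$ in (C.2).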

References

  • Adams and Fournier (2003) Robert A. Adams and John J. F. Fournier. Sobolev Spaces, volume 140 of Pure and Applied Mathematics. Elsevier, second edition, 2003.
  • Agrell (2019) Christian Agrell. Gaussian processes with linear operator inequality constraints. Journal of Machine Learning Research, 20(135):1–36, 2019.
  • Albert (2019) Christopher G. Albert. Gaussian processes for data fulfilling linear differential equations. Proceedings of the 39th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, 33(1), 2019. doi:10.3390/proceedings2019033005.
  • Aliprantis and Border (2006) Charalambos D. Aliprantis and Kim C. Border. Infinite Dimensional Analysis: A Hitchhiker’s Guide. Springer, Berlin, Heidelberg, third edition, 2006. doi:10.1007/3-540-29587-9.
  • Alt (2012) Hans Wilhelm Alt. Lineare Funktionalanalysis: Eine anwendungsorientierte Einführung. Springer, Berlin, Heidelberg, 2012. doi:10.1007/978-3-642-22261-0.
  • Alvarez et al. (2009) Mauricio Alvarez, David Luengo, and Neil D. Lawrence. Latent force models. In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS), volume 5, pages 9–16, Clearwater Beach, Florida, USA, 2009.
  • Azangulov et al. (2022) Iskander Azangulov, Andrei Smolensky, Alexander Terenin, and Viacheslav Borovitskiy. Stationary kernels and Gaussian processes on Lie groups and their homogeneous spaces I: the compact case. arXiv preprint arXiv:2208.14960, 2022.
  • Bishop (2006) Christopher M. Bishop. Pattern Recognition and Machine Learning. Information Science and Statistics. Springer, New York, first edition, 2006.
  • Black and Scholes (1973) Fischer Black and Myron Scholes. The pricing of options and corporate liabilities. Journal of Political Economy, 81(3):637–654, 1973. doi:10.1086/260062.
  • Bogachev (1998) Vladimir Igorevich Bogachev. Gaussian Measures, volume 62 of Mathematical Surveys and Monographs. American Mathematical Society, Providence, Rhode Island, 1998.
  • Borthwick (2018) David Borthwick. Introduction to Partial Differential Equations. Universitext. Springer, first edition, 2018. doi:10.1007/978-3-319-48936-0.
  • Bradbury et al. (2018) James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. JAX: composable transformations of Python+NumPy programs, 2018. URL http://github.com/google/jax.
  • Cockayne et al. (2017) Jon Cockayne, Chris Oates, Tim Sullivan, and Mark Girolami. Probabilistic numerical methods for PDE-constrained Bayesian inverse problems. In Geert Verdoolaege, editor, Proceedings of the 36th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, volume 1853 of AIP Conference Proceedings, pages 060001–1 – 060001–8, 2017. doi:10.1063/1.4985359.
  • Cockayne et al. (2019a) Jon Cockayne, Chris J. Oates, Ilse C.F. Ipsen, and Mark Girolami. A Bayesian conjugate gradient method (with discussion). Bayesian Analysis, 14(3):937–1012, 2019a. doi:10.1214/19-BA1145.
  • Cockayne et al. (2019b) Jon Cockayne, Chris J. Oates, T. J. Sullivan, and Mark Girolami. Bayesian probabilistic numerical methods. SIAM Review, 61(4):756–789, 2019b. doi:10.1137/17M1139357.
  • Da Costa et al. (2023) Nathaël Da Costa, Marvin Pförtner, Lancelot Da Costa, and Philipp Hennig. Sample path regularity of Gaussian processes from the covariance kernel, 2023.
  • Da Prato and Zabczyk (1992) Giuseppe Da Prato and Jerzy Zabczyk. Stochastic Equations in Infinite Dimensions. Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 1992. doi:10.1017/CBO9780511666223.
  • Evans (2010) Lawrence C. Evans. Partial Differential Equations, volume 19 of Graduate Studies in Mathematics. American Mathematical Society, Providence, Rhode Island, second edition, 2010.
  • Fasshauer (1997) Gregory E. Fasshauer. Solving partial differential equations by collocation with radial basis functions. In Alain Le Méhauté, Christophe Rabut, and Larry L. Schumaker, editors, Surface Fitting and Multiresolution Methods, pages 131–138. Vanderbilt University Press, Nashville, TN, 1997.
  • Fasshauer (1999) Gregory E. Fasshauer. Solving differential equations with radial basis functions: multilevel methods and smoothing. Advances in Computational Mathematics, 11:139–159, November 1999. doi:10.1023/A:1018919824891.
  • Fletcher (1984) C. A. J. Fletcher. Computational Galerkin Methods. Scientific Computation. Springer, Berlin, Heidelberg, first edition, 1984. doi:10.1007/978-3-642-85949-6.
  • Fourier (1822) Jean Baptiste Joseph Fourier. Théorie analytique de la chaleur. Firmin Didot, 1822. doi:10.1017/CBO9780511693229.
  • Girolami et al. (2021) Mark Girolami, Eky Febrianto, Yin Ge, and Fehmi Cirak. The statistical finite element method (statFEM) for coherent synthesis of observation data and model predictions. Computer Methods in Applied Mechanics and Engineering, 275:113533, 2021. doi:10.1016/j.cma.2020.113533.
  • Golub and Van Loan (2013) Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins Studies in the Mathematical Sciences. The Johns Hopkins University Press, Baltimore, fourth edition, 2013.
  • Graepel (2003) Thore Graepel. Solving noisy linear operator equations by Gaussian processes: Application to ordinary and partial differential equations. In Proceedings of the 20th International Conference on Machine Learning, pages 234–241. AAAI Press, 2003.
  • Haasdonk and Burkhardt (2007) Bernard Haasdonk and Hans Burkhardt. Invariant kernel functions for pattern analysis and machine learning. Machine learning, 68(1):35–61, 2007.
  • Hennig et al. (2015) Philipp Hennig, Michael A. Osborne, and Mark Girolami. Probabilistic numerics and uncertainty in computations. Proceedings of the Royal Society A, 471(2179), 2015. doi:10.1098/rspa.2015.0142.
  • Hennig et al. (2022) Philipp Hennig, Michael A. Osborne, and Hans P. Kersting. Probabilistic Numerics: Computation as Machine Learning. Cambridge University Press, June 2022. doi:10.1017/9781316681411.
  • Holder (2005) David S. Holder, editor. Electrical Impedance Tomography: Methods, History and Applications. Institute of Physics Medical Physics Series. Institute of Physics Publishing, Bristol, 2005.
  • Holderrieth et al. (2021) Peter Holderrieth, Michael J Hutchinson, and Yee Whye Teh. Equivariant learning of stochastic fields: Gaussian processes and steerable conditional neural processes. In International Conference on Machine Learning, pages 4297–4307. PMLR, 2021.
  • Kanagawa et al. (2018) Motonobu Kanagawa, Philipp Hennig, Dino Sejdinovic, and Bharath K. Sriperumbudur. Gaussian processes and kernel methods: A review on connections and equivalences. arXiv preprint arXiv:1807.02582, 2018.
  • Karniadakis et al. (2021) George Em Karniadakis, Ioannis G. Kevrekidis, Lu Lu, Paris Perdikaris, Sifan Wang, and Liu Yang. Physics-informed machine learning. Nature Reviews Physics, 3(6):422–440, 2021. doi:10.1038/s42254-021-00314-5.
  • Kazhdan et al. (2006) Michael Kazhdan, Matthew Bolitho, and Hugues Hoppe. Poisson surface reconstruction. In Proceedings of the 4th Eurographics Symposium on Geometry Processing, volume 7, 2006.
  • Klenke (2014) Achim Klenke. Probability Theory: A Comprehensive Course. Universitext. Springer, London, second edition, 2014. doi:10.1007/978-1-4471-5361-0.
  • Krämer et al. (2022) Nicholas Krämer, Jonathan Schmidt, and Philipp Hennig. Probabilistic numerical method of lines for time-dependent partial differential equations. In Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS), volume 151, pages 625–639. PMLR, 2022.
  • Lautrup (2005) Benny Lautrup. The PDE’s of continuum physics. In Proceedings of the Workshop on PDE methods in Computer Graphics, 2005.
  • Li et al. (2020) Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: Graph kernel network for partial differential equations. In ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations, 2020. doi:10.48550/arXiv.2003.03485.
  • Li et al. (2021) Zongyi Li, Nikola B. Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew M. Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. In International Conference on Learning Representations, 2021. doi:10.48550/arXiv.2010.08895.
  • Lienhard and Lienhard (2020) John H. Lienhard, IV and John H. Lienhard, V. A Heat Transfer Textbook. Phlogiston Press, Cambridge, MA, fifth edition, 2020.
  • Lin et al. (2022) Rong Rong Lin, Hai Zhang Zhang, and Jun Zhang. On reproducing kernel Banach spaces: Generic definitions and unified framework of constructions. Acta Mathematica Sinica, English Series, 38(8):1459–1483, August 2022. doi:10.1007/s10114-022-1397-7.
  • Logg et al. (2012) Anders Logg, Kent-Andre Mardal, and Garth Wells, editors. Automated Solution of Differential Equations by the Finite Element Method, volume 84 of Lecture Notes in Computational Science and Engineering. Springer, Berlin, Heidelberg, 2012. doi:10.1007/978-3-642-23099-8.
  • Maxwell (1865) James Clerk Maxwell. A dynamical theory of the electromagnetic field. Philosophical transactions of the Royal Society of London, 155:459–512, 1865.
  • Michaud (2019) Pierre Michaud. A simple model of processor temperature for deterministic turbo clock frequency. resreport RR-9308, Inria Rennes, 2019. URL https://hal.inria.fr/hal-02391970.
  • Oates and Sullivan (2019) Chris J. Oates and Tim J. Sullivan. A modern retrospective on probabilistic numerics. Statistics and Computing, 29:1335–1351, 2019. doi:10.1007/s11222-019-09902-z.
  • Owhadi and Scovel (2018) Houman Owhadi and Clint Scovel. Conditioning Gaussian measure on Hilbert space. Journal of Mathematical and Statistical Analysis, 1(109), 2018.
  • Owhadi et al. (2019) Houman Owhadi, Clint Scovel, and Florian Schäfer. Statistical numerical approximation. Notices of the American Mathematical Society, 66(10):1608–1617, 2019. doi:10.1090/noti1963.
  • Raissi et al. (2017) Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Machine learning of linear differential equations using Gaussian processes. Journal of Computational Physics, 348:683–693, 2017. doi:10.1016/j.jcp.2017.07.050.
  • Raissi et al. (2019) Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019. doi:10.1016/j.jcp.2018.10.045.
  • Rajput and Cambanis (1972) Balram S. Rajput and Stamatis Cambanis. Gaussian processes and Gaussian measures. The Annals of Mathematical Statistics, 43(6):1944–1952, 1972. doi:10.1214/aoms/1177690865.
  • Rasmussen and Williams (2006) Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, London, England, 2006.
  • Reisert and Burkhardt (2007) Marco Reisert and Hans Burkhardt. Learning equivariant functions with matrix valued kernels. Journal of Machine Learning Research, 8(3), 2007.
  • Rudin (1991) Walter Rudin. Functional Analysis. International Series in Pure and Applied Mathematics. McGraw-Hill, New York, second edition, 1991.
  • Särkkä (2011) Simo Särkkä. Linear operators and stochastic partial differential equations in Gaussian process regression. In Artificial Neural Networks and Machine Learning – ICANN 2011, pages 151–158, Berlin, Heidelberg, 2011. doi:10.1007/978-3-642-21738-8_20.
  • Särkkä et al. (2013) Simo Särkkä, Arno Solin, and Jouni Hartikainen. Spatiotemporal learning via infinite-dimensional Bayesian filtering and smoothing: A look at Gaussian process regression through Kalman filtering. IEEE Signal Processing Magazine, 30(4):51–61, 2013. doi:10.1109/MSP.2013.2246292.
  • Steinwart (2019) Ingo Steinwart. Convergence types and rates in generic Karhunen-Loève expansions with applications to sample path properties. Potential Analysis, 51:361–395, 2019. doi:10.1007/s11118-018-9715-5.
  • Steinwart and Christmann (2008) Ingo Steinwart and Andreas Christmann. Support Vector Machines. Information Science and Statistics. Springer, New York, first edition, 2008. doi:10.1007/978-0-387-77242-4.
  • von Harrach (2021) Bastian von Harrach. Numerik partieller Differentialgleichungen. Lecture Notes, 2021. URL https://www.math.uni-frankfurt.de/~harrach/lehre/Numerik_PDGL.pdf.
  • Wang et al. (2021) Junyang Wang, Jon Cockayne, Oksana Chkrebtii, Tim J. Sullivan, and Chris J. Oates. Bayesian numerical methods for nonlinear partial differential equations. Statistics and Computing, 31(55), 2021. doi:10.1007/s11222-021-10030-w.
  • Wenger and Hennig (2020) Jonathan Wenger and Philipp Hennig. Probabilistic linear solvers for machine learning. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
  • Wenger et al. (2021) Jonathan Wenger, Nicholas Krämer, Marvin Pförtner, Jonathan Schmidt, Nathanael Bosch, Nina Effenberger, Johannes Zenn, Alexandra Gessner, Toni Karvonen, François-Xavier Briol, Maren Mahsereci, and Philipp Hennig. ProbNum: Probabilistic numerics in Python, 2021.
  • Yosida (1995) Kôsaku Yosida. Functional Analysis, volume 123 of Classics in Mathematics. Springer, sixth edition, 1995. doi:10.1007/978-3-642-61859-8.