Reproducing Kernel Hilbert Spaces

johny00
14_rkhs.pdf

1 Introduction These notes introduces a particular kind of Hilbert space known as a reproducing kernel Hilbert space (RKHS). We will establish connections with kernels, defined previously, and show that kernels and RKHSs are in one-to- one correspondence. This material is largely drawn from Chapter 4 of [1], although some results are presented in a slightly different way to ease digestion.

2 Reproducing Kernel Hilbert Spaces Throughout these notes, we use the term “Hilbert function space over X ” to refer to a Hilbert space whose elements are functions f : X 7→ R.

Definition 1. (Reproducing Kernel) Let F be a Hilbert function space over X . A reproducing kernel of F is a function k : X × X 7→ R which satisfies the following two properties:

1. k (·,x) ∈F for any x ∈X.

2. (Reproducing property) For any f ∈F and any x ∈X, f (x) = 〈f,k (·,x)〉F.

Theorem 1. Let F be a Hilbert function space over X. The following are equivalent:

1. F has a reproducing kernel.

2. For any x ∈X, the function δx : F 7→ R defined by δx (f) = f (x) is continuous.

Definition 2. Let F be a Hilbert function space over X. We say F is a reproducing kernel Hilbert space over X if it satisfies the conditions of the previous theorem.

Proof. ( (1) ⇒ (2) ) . Suppose that F has a reproducing kernel, k : X ×X 7→ R. We need to show that if (fn)

∞ n=1 converges to f ∈F, i.e., ‖fn −f‖F → 0, then for every x ∈X , |δx (fn) −δx (f)| = |fn (x) −f (x)|→

0. Now

δx (f) = f (x) = 〈f,k (·,x)〉F = 〈 lim n→∞

fn,k (·,x)〉F (a) = lim

n→∞ 〈fn,k (·,x)〉F = lim

n→∞ fn (x) = lim

n→∞ δx (fn)

Note that the identity “(a)” is implied by the continuity of inner product operator of its first argument. Before proving the reverse implication, we state a classical result from functional analysis named the Riesz representation theorem.

Theorem 2 (Riesz representation theorem). Let F be a Hilbert space and L : F 7→ R be a linear functional on F. Then L is continuous if and only if there exists Φ ∈F such that L (f) = 〈f, Φ〉F for any f ∈F.

1

2

( (2) ⇒ (1) ) . Let x ∈ X . It’s trivial to see that δx is a linear functional, so due to the assumed continuity of δx, Theorem 2 ensures the existence of a Φx ∈ F such that δx (f) = f (x) = 〈f, Φx〉F. Now define k (x2,x1) := Φx1 (x2) ∈ R. Note that k (·,x) = Φx ∈F (establishing the first property of reproducing kernels). Also,

f (x) = 〈f, Φx〉F = 〈f,k (·,x)〉F which shows that the reproducing property holds.

The next result establishes that kernels and reproducing kernels are the same.

Theorem 3. Let k : X ×X 7→ R. Then k is a kernel if and only if k is a reproducing kernel of some RKHS F over X.

Proof. (⇐) . We first prove the reverse implication. Let k be the reproducing kernel of F and define Φ : X 7→F by Φ (x) = k (·,x). Now for any x′ ∈X , consider the function fx′ = k (·,x′). Using the reproducing property and symmetry of the inner product, we obtain

k (x,x′) = fx′ (x) = 〈fx′,k (·,x)〉F = 〈k (·,x′) ,k (·,x)〉F = 〈Φ (x′) , Φ (x)〉F = 〈Φ (x) , Φ (x′)〉F.

Hence, k is a kernel. (⇒) . The following lemma is needed.

Lemma 1. If k is a kernel, then it has a feature space F̃ and a feature map Φ̃ : X 7→ F̃ such that the map V : F̃ 7→ (X 7→ R) (the range of V is the set of functions from X to R) given by

V (ω) = 〈ω, Φ̃ (·)〉F̃ (1)

is injective.

Proof of Lemma 1. Let F0, Φ0 be a feature space and feature map associated to k, respectively, i.e., k (x,x′) = 〈Φ0 (x) , Φ0 (x′)〉F0 . Now define

F = {f = 〈ω, Φ0 (·)〉F0 | ω ∈F0}

Since Φ0 is not known to be onto, it’s possible to have multiple ω’s mapping to f. Consider the nullspace of V, null (V) = {ω : Vω = 〈ω, Φ0 (·)〉F0 = 0}, where the 0 refers to the zero function. One can easily show using the continuity of an inner product in its first argument that null (V) is closed. The projection theorem then implies that F0 = null (V) ⊕F̃ (⊕ denotes the direct sum) where

F̃ = (null (V))⊥ = {v ∈F0 : 〈ω,v〉F0 = 0 ∀ ω ∈ null (V)}

F̃ is closed by a routine argument, and therefore F̃ is a Hilbert space by inheriting the inner product of F̃. For any ω ∈ null (V) and any x ∈ X , the construction of V leads to V (ω) = 〈ω, Φ0 (x)〉F0 = 0. Thus, Φ0 (x) ∈ (null (V))

⊥ = F̃. Hence, Φ̃ = Φ0 is a valid feature map on X → F̃. Finally, the restriction

V |F̃: F̃ 7→F is such that ker (V |F̃) = {0}, and therefore is injective.

Let (Φ̃,F̃) be as in Lemma 1. Define

F = { f = 〈ω, Φ̃ (·)〉F̃ | ω ∈ F̃

} F is clearly a space of functions. The linear map V : F̃ 7→ F defined by identity (1) (linearity of V is an immediate consequence of linearity an of inner product in its first argument) is bijective (injective and onto) by the constructive definition of F and therefore V is invertible. Now, define an inner product on F by

〈f,f′〉F = 〈V−1f,V−1f′〉F̃.

3

Since V is a bijective linear map, so is its inverse, and so it’s straightforward to check that 〈·, ·〉F defines a valid inner product. Therefore V is also an isometry (distance preserving map). We also need to show that F is complete. Thus, let {fn}n∈N be a Cauchy sequence in F. Since V is an isometry, we have that ‖fn −fm‖F =

∥∥V−1fn −V−1fm∥∥F̃ for any m,n ∈ N. Hence, (V−1fn)n∈N is a Cauchy sequence in the complete space F̃ which means that it converges to a limit f̃ ∈ F̃. Since V is an isometry it is continuous, and therefore fn = V(V−1fn) converges to Vf̃ ∈F. This shows that F is complete.

Next, we show that k is a reproducing kernel of F. If x ∈X , then

k (·,x) = 〈Φ̃ (x) , Φ̃ (·)〉F̃ = VΦ̃ (x) ∈F

This proves the first property of a reproducing kernel. Lastly, for any f ∈F and any x ∈X , observe

f (x) = 〈V−1 (f) , Φ̃ (x)〉F̃ = 〈f,VΦ̃ (x)〉F = 〈f,k (·,x)〉F

which established the reproducing property.

3 Kernels and RKHSs are in one-to-one correspondence

The next two results establish that kernels and RKHSs are in one-to-one correspondence.

Theorem 4. If F is a RKHS, then it has a unique reproducing kernel.

The proof is left as an exercise below.

Theorem 5. Let k be a kernel on X, and let F0 and Φ0 : X 7→F0 be an associated feature space and feature map, respectively, such that the mapping V : F0 7→ F = {f = 〈ω, Φ0 (·)〉F0 | ω ∈F0} is injective, whose existence is guarenteed by Lemma 1. Then,

1. F is the unique RKHS with k as its reproducing kernel.

2. The set Fpre = {

n∑ i=1

αik (·,xi) | n ∈ N, αi ∈ R,xi ∈X }

is dense in F.

3. For f,f′ ∈Fpre with f = n∑

i=1

αik (·,xi) and f′ = n′∑ i=1

α′ik (·,x ′ i),

〈f,f′〉F = n∑

i=1

n′∑ j=1

αiα ′ jk ( xi,x

′ j

) .

Proof. We already have seen in the proof of Theorem 3 that F is an RKHS. The proof of uniqueness is presented after proving the other two parts of the theorem.

Proof of part 2. Let F̃ be any RKHS with k as its reproducing kernel. Since k (·,x) ∈ F̃ for any x ∈X , we see that Fpre ⊆ F̃. Now, suppose towards a contradiction that Fpre is not dense in F̃. Then the orthogonal complement of the closure of Fpre has a non-trivial element, i.e.

( Fpre

)⊥ 6= {0}. Thus, there

exists f ∈ ( Fpre

)⊥ and x ∈X such that f (x) 6= 0. But k (·,x) ∈Fpre, and so

0 = 〈f,k (·,x)〉F = f (x) 6= 0,

a contradiction.

4

Proof of part 3. Let again F̃ be any RKHS with k as its reproducing kernel. Using the reproducing property and bi-linearity of inner product, we get

〈f,f′〉F̃ = n∑

i=1

n′∑ j=1

αiα ′ j〈k (·,xi) ,k

( ·,x′j

) 〉F̃ =

n∑ i=1

n′∑ j=1

αiα ′ jk ( xi,x

′ j

) .

Proof of part 1. Assume that F1 and F2 are two RKHSs with the same reproducing kernel k. Choose an arbitrary f ∈F1. We will show f ∈F2, establishing F1 ⊆F2. The reverse inclusion is established similarly.

From part 2, there is a sequence (fn)n∈N,fn ∈ Fpre, such that ‖f −fn‖F1 → 0. By part 3, ‖·‖F1 and ‖·‖F2 coincide on Fpre, and so (fn)n∈N is Cauchy in F2. Therefore, (fn)n∈N converges to some g ∈ F2. Moreover, for any x ∈X , we have

g(x) = lim n→∞

fn(x) (point evaluation at x is continuous on F2)

= f(x), (point evaluation at x is continuous on F1)

so f = g. This shows that F1 ⊆F2, and the reverse inclusion holds similarly, so F1 = F2 as sets. It remains to show that the inner products agree. Thus, let f,f′ be arbitrary functions in F1 = F2,

expressed as limits of functions in Fpre as f = lim n→∞

fn and f ′ = lim

n→∞ f′n. Appealing to continuity of an

inner product in one argument while the other is held fixed,

〈f,f′〉F1 = lim n→∞

〈fn,f′〉F1 = lim n→∞

lim m→∞

〈fn,f′m〉F1 = lim n→∞

lim m→∞

〈fn,f′m〉Fpre = lim

n→∞ lim

m→∞ 〈fn,f′m〉F2 = lim

n→∞ 〈fn,f′〉F2 = 〈f,f

′〉F2.

So the two RKHSs are the same, and thus F is the unique RKHS with k as its reproducing kernel.

By the above properties, for any kernel k with associated RKHS F, the feature map Φ : X →F defined by Φ(x) = k(·,x) is a valid feature map, and is called the canonical feature map.

For additional background on reproducing kernel Hilbert spaces, see [1].

Exercises

1. Show that convergence of functions in an RKHS implies pointwise convergence. That is, if ‖fn−f‖F → 0 then for all x, |fn(x) − f(x)| → 0.

2. Is the Hilbert space L2(R) a RKHS? If so, what is its reproducing kernel? Hint: This problem is super easy if you understand the definitions of RKHSs and L2(R).

3. Show that if F is a RKHS, then it has a unique reproducing kernel.

4. Elements of an RKHS inherit properties of the associated kernel. Let k be a kernel on a metric space X and F its RKHS. We say k is bounded if supx∈X k(x, x) < ∞. Show that

(a) if k is bounded, then every f ∈ F is a bounded function. (b) if k is bounded, then convergence in F implies convergence in the supremum norm. (c) if k is bounded and for all x, k(·, x) is a continuous function, then every f ∈ F is bounded and

continuous. Hint: Use the denseness of Fpre in F and part (b).

5

References

[1] I. Steinwart and A. Christman, Support Vector Machines, Springer, 2008.

[2] M. Mohri, A. Rostamizadeh, and A. Talwalkar, Foundations of Machine Learning, MIT Press, 2012.