Conceptualizing functions as infinite-dimensional vectors lets us apply the tools of linear algebra to a vast landscape of new problems, from image and geometry processing to curve fitting, light transport, and machine learning.
Prerequisites: introductory linear algebra, introductory calculus, introductory differential equations.
This article received an honorable mention in 3Blue1Brown’s Summer of Math Exposition 3!
Functions as Vectors
Vector Spaces
Linear Operators
Diagonalization
Inner Product Spaces
The Spectral Theorem
Applications
Fourier Series
Image Compression
Geometry Processing
Further Reading
Functions as Vectors
Vectors are often first introduced as lists of real numbers—i.e. the familiar notation we use for points, directions, and more.
$$ \mathbf{v} = \begin{bmatrix}x\\y\\z\end{bmatrix} $$
You may recall that this representation is only one example of an abstract vector space.
There are many other types of vectors, such as lists of complex numbers, graph cycles, and even magic squares.
However, all of these vector spaces have one thing in common: a finite number of dimensions.
That is, each kind of vector can be represented as a collection of \(N\) numbers, though the definition of “number” varies.
If any \(N\)-dimensional vector is essentially a length-\(N\) list, we could also consider a vector to be a mapping from an index to a value.
\[\begin{align*}
\mathbf{v}_1 &= x\\
\mathbf{v}_2 &= y\\
\mathbf{v}_3 &= z
\end{align*}\ \iff\ \mathbf{v} = \begin{bmatrix}x \\ y \\ z\end{bmatrix}\]
What does this perspective hint at as we increase the number of dimensions?
In higher dimensions, vectors start to look more like functions!
Countably Infinite Indices
Of course, a finite-length vector only specifies a value at a limited number of indices.
Could we instead define a vector that contains infinitely many values?
Writing down a vector representing a function on the natural numbers (\(\mathbb{N}\))—or any other countably infinite domain—is straightforward: just extend the list indefinitely.
$$ \begin{align*}\mathbf{v}_1 &= 1\\\mathbf{v}_2 &= 2\\ &\vdots \\ \mathbf{v}_i &= i\end{align*}\ \iff\ \mathbf{v} = \begin{bmatrix}1 \\ 2 \\ 3 \\ \vdots \end{bmatrix} $$
This vector could represent the function \(f(x) = x\), where \(x \in \mathbb{N}\).1
Uncountably Infinite Indices
Many interesting functions are defined on the real numbers (\(\mathbb{R}\)), so they cannot be represented as a countably infinite vector.
Therefore, we will have to make a larger conceptual leap: not only will our set of indices be infinite, it will be uncountably infinite.
That means we can’t write down vectors as lists at all—it is impossible to assign an integer index to each element of an uncountable set.
So, how can we write down a vector mapping a real index to a certain value?
Now, a vector really is just an arbitrary function:
$$ \mathbf{v}_{x} = x^2\ \iff\ \mathbf{v} = \begin{bmatrix} x \mapsto x^2 \end{bmatrix} $$
Precisely defining how and why we can represent functions as infinite-dimensional vectors is the purview of functional analysis.
In this post, we won’t attempt to prove our results in infinite dimensions: we will focus on building intuition via analogies to finite-dimensional linear algebra.
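Before moving on, here is a rough Python sketch of the index-to-value view of vectors (the names and types are purely illustrative assumptions, not part of the formal development):

```python
# A vector is a mapping from an index to a value.

# Finite dimensions: three values indexed by 1, 2, 3.
v_finite = {1: 1.0, 2: 2.0, 3: 3.0}

# Countably infinite dimensions: a rule giving a value for every natural index.
def v_countable(i: int) -> float:
    return float(i)                  # represents f(x) = x on the naturals

# Uncountably infinite dimensions: a rule for every real index, i.e. a function.
def v_real(x: float) -> float:
    return x ** 2                    # represents f(x) = x^2 on the reals

print(v_finite[2], v_countable(2), v_real(2.0))   # 2.0 2.0 4.0
```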
Vector Spaces
Review: Abstract vector spaces | Chapter 16, Essence of linear algebra.
Formally, a vector space is defined by choosing a set of vectors \(\mathcal{V}\), a scalar field \(\mathbb{F}\), and a zero vector \(\mathbf{0}\).
The field \(\mathbb{F}\) is often the real numbers (\(\mathbb{R}\)), complex numbers (\(\mathbb{C}\)), or a finite field such as the integers modulo a prime (\(\mathbb{Z}_p\)).
Additionally, we must specify how to add two vectors and how to multiply a vector by a scalar.
\[\begin{align*}
(+)\ &:\ \mathcal{V}\times\mathcal{V}\mapsto\mathcal{V}\\
(\cdot)\ &:\ \mathbb{F}\times\mathcal{V} \mapsto \mathcal{V}
\end{align*}\]
To describe a vector space, our definitions must satisfy several vector space axioms.
A Functional Vector Space
In the following sections, we’ll work with the vector space of real functions.
To avoid ambiguity, square brackets are used to denote function application.
The scalar field \(\mathbb{F}\) is the real numbers \(\mathbb{R}\).
The set of vectors \(\mathcal{V}\) contains functions from \(\mathbb{R}\) to \(\mathbb{R}\).2
\(\mathbf{0}\) is the zero function, i.e. \(\mathbf{0}[x] = 0\).
Adding functions corresponds to applying the functions separately and summing the results.
$$ (f + g)[x] = f[x] + g[x] $$
This definition generalizes the typical element-wise addition rule—it’s like adding the two values at each index.
\[f+g = \begin{bmatrix}f_1 + g_1 \\ f_2 + g_2 \\ \vdots \end{bmatrix}\]
Multiplying a function by a scalar corresponds to applying the function and scaling the result.
$$ (\alpha f)[x] = \alpha f[x] $$
This rule similarly generalizes element-wise multiplication—it’s like scaling the value at each index.
\[\alpha f = \begin{bmatrix}\alpha f_1 \\ \alpha f_2 \\ \vdots \end{bmatrix}\]
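To make the pointwise definitions concrete, here is a minimal Python sketch, assuming we model real functions as callables:

```python
# Pointwise vector space operations on real functions, modeled as callables.

def add(f, g):
    """(f + g)[x] = f[x] + g[x]"""
    return lambda x: f(x) + g(x)

def scale(alpha, f):
    """(alpha f)[x] = alpha * f[x]"""
    return lambda x: alpha * f(x)

zero = lambda x: 0.0                 # the zero function

f = lambda x: x ** 2
g = lambda x: 3.0 * x

h = add(scale(2.0, f), g)            # h[x] = 2x^2 + 3x
print(h(1.0))                        # 5.0
print(add(f, zero)(4.0) == f(4.0))   # True: adding the zero function changes nothing
```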
Proofs
Given these definitions, we can now prove all necessary vector space axioms.
We will illustrate the analog of each property in \(\mathbb{R}^2\), the familiar vector space of two-dimensional arrows.
Vector Addition is Commutative
For all vectors $$\mathbf{u}, \mathbf{v} \in \mathcal{V}$$:
$$\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$$
Since real addition is commutative, this property follows directly from our definition of vector addition:
$$\begin{align*}
(f + g)[x] &= f[x] + g[x]\\
&= g[x] + f[x]\\
&= (g + f)[x]
\end{align*}$$
Vector Addition is Associative
For all vectors $$\mathbf{u}, \mathbf{v}, \mathbf{w} \in \mathcal{V}$$:
$$(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$$
This property also follows from our definition of vector addition:
$$\begin{align*}
((f + g) + h)[x] &= (f + g)[x] + h[x]\\
&= f[x] + g[x] + h[x]\\
&= f[x] + (g[x] + h[x])\\
&= f[x] + (g + h)[x]\\
&= (f + (g + h))[x]
\end{align*}$$
$$\mathbf{0}$$ is an Additive Identity
For all vectors $$\mathbf{u} \in \mathcal{V}$$:
$$\mathbf{0} + \mathbf{u} = \mathbf{u} $$
This one is easy:
$$\begin{align*}
(\mathbf{0} + f)[x] &= \mathbf{0}[x] + f[x]\\
&= 0 + f[x]\\
&= f[x]
\end{align*}$$
Additive Inverses Exist
For all vectors $$\mathbf{u} \in \mathcal{V}$$, there exists a vector $$-\mathbf{u} \in \mathcal{V}$$ such that:
$$\mathbf{u} + (-\mathbf{u}) = \mathbf{0}$$
Negation is defined as applying $$f$$ and negating the result: $$(-f)[x] = -f[x]$$.
Clearly, $$-f$$ is also in $$\mathcal{V}$$.
$$\begin{align*}
(f + (-f))[x] &= f[x] + (-f)[x]\\
&= f[x] - f[x]\\
&= 0\\
&= \mathbf{0}[x]
\end{align*}$$
$$1$$ is a Multiplicative Identity
For all vectors $$\mathbf{u} \in \mathcal{V}$$:
$$1\mathbf{u} = \mathbf{u}$$
Note that $$1$$ is specified by the choice of $$\mathbb{F}$$.
In our case, it is simply the real number $$1$$.
$$\begin{align*}
(1 f)[x] &= 1 f[x]\\
&= f[x]
\end{align*}$$
Scalar Multiplication is Associative
For all vectors $$\mathbf{u} \in \mathcal{V}$$ and scalars $$\alpha, \beta \in \mathbb{F}$$:
$$(\alpha \beta)\mathbf{u} = \alpha(\beta\mathbf{u})$$
This property follows from our definition of scalar multiplication:
$$\begin{align*}
((\alpha\beta) f)[x] &= (\alpha\beta)f[x]\\
&= \alpha(\beta f[x])\\
&= \alpha(\beta f)[x]
\end{align*}$$
Scalar Multiplication Distributes Over Vector Addition
For all vectors $$\mathbf{u}, \mathbf{v} \in \mathcal{V}$$ and scalars $$\alpha \in \mathbb{F}$$:
$$\alpha(\mathbf{u} + \mathbf{v}) = \alpha\mathbf{u} + \alpha\mathbf{v}$$
Again using our definitions of vector addition and scalar multiplication:
$$\begin{align*}
(\alpha (f + g))[x] &= \alpha(f + g)[x]\\
&= \alpha(f[x] + g[x])\\
&= \alpha f[x] + \alpha g[x]\\
&= (\alpha f)[x] + (\alpha g)[x]\\
&= (\alpha f + \alpha g)[x]
\end{align*}$$
Scalar Multiplication Distributes Over Scalar Addition
For all vectors $$\mathbf{u} \in \mathcal{V}$$ and scalars $$\alpha, \beta \in \mathbb{F}$$:
$$(\alpha + \beta)\mathbf{u} = \alpha\mathbf{u} + \beta\mathbf{u}$$
Again using our definitions of vector addition and scalar multiplication:
$$\begin{align*}
((\alpha + \beta)f)[x] &= (\alpha + \beta)f[x]\\
&= \alpha f[x] + \beta f[x] \\
&= (\alpha f)[x] + (\beta f)[x]
\end{align*}$$
Therefore, we’ve built a vector space of functions!3
It may not be immediately obvious why this result is useful, but bear with us through a few more definitions—we will spend the rest of this post exploring powerful techniques arising from this perspective.
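As a quick sanity check (purely illustrative, not a proof), we can spot-check a couple of these axioms at randomly sampled points, using the same pointwise definitions as above:

```python
import random

add   = lambda f, g: (lambda x: f(x) + g(x))    # pointwise addition
scale = lambda a, f: (lambda x: a * f(x))       # pointwise scaling

f = lambda x: x ** 3 - x
g = lambda x: 2.5 * x + 1.0

for _ in range(1000):
    x = random.uniform(-10.0, 10.0)
    a = random.uniform(-10.0, 10.0)
    # vector addition is commutative
    assert add(f, g)(x) == add(g, f)(x)
    # scalar multiplication distributes over vector addition (up to rounding)
    assert abs(scale(a, add(f, g))(x) - add(scale(a, f), scale(a, g))(x)) < 1e-9
```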
A Standard Basis for Functions
Review: Linear combinations, span, and basis vectors | Chapter 2, Essence of linear algebra.
Unless specified otherwise, vectors are written down with respect to the standard basis.
In \(\mathbb{R}^2\), the standard basis consists of the two coordinate axes.
$$ \mathbf{e}_1 = \begin{bmatrix}1 \\ 0\end{bmatrix},\,\, \mathbf{e}_2 = \begin{bmatrix}0 \\ 1\end{bmatrix} $$
Hence, vector notation is shorthand for a linear combination of the standard basis vectors.
$$ \mathbf{u} = \begin{bmatrix}\alpha \\ \beta\end{bmatrix} = \alpha\mathbf{e}_1 + \beta\mathbf{e}_2 $$
Above, we represented functions as vectors by assuming each dimension of an infinite-length vector contains the function’s result for that index.
This construction points to a natural generalization of the standard basis.
Just like the coordinate axes, each standard basis function contains a \(1\) at one index and \(0\) everywhere else.
More precisely, for every \(\alpha \in \mathbb{R}\):
\[\mathbf{e}_\alpha[x] = \begin{cases} 1 & \text{if } x = \alpha \\ 0 & \text{otherwise} \end{cases}\]
Ideally, we could express an arbitrary function \(f\) as a linear combination of these basis functions.
However, there are uncountably many of them—and we can’t simply write down a sum over the reals.
Still, considering their linear combination is illustrative:
\[\begin{align*} f[x] &= f[\alpha]\mathbf{e}_\alpha[x] \\ &= f[1]\mathbf{e}_1[x] + f[2]\mathbf{e}_2[x] + f[\pi]\mathbf{e}_\pi[x] + \dots \end{align*}\]
If we evaluate this “sum” at \(x\), we’ll find that all terms are zero—except \(\mathbf{e}_x\), making the result \(f[x]\).
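Although we can't carry out this uncountable sum, its finite, sampled analogue is easy to demonstrate. The sketch below restricts attention to a handful of sample points (an assumption made purely for illustration) and rebuilds a sampled function from one-hot basis vectors:

```python
import numpy as np

xs = np.linspace(-1.0, 1.0, 5)        # a few sample "indices"
f = xs ** 2                           # samples of f[x] = x^2

E = np.eye(len(xs))                   # e_i: a 1 at index i, 0 everywhere else

# f equals the linear combination  sum_i f[x_i] * e_i
recombined = sum(f[i] * E[i] for i in range(len(xs)))
print(np.allclose(recombined, f))     # True
```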
Linear Operators
Review: Change of basis | Chapter 13, Essence of linear algebra.
Now that we can manipulate functions as vectors, let’s start transferring the tools of linear algebra to the functional perspective.
One ubiquitous operation on finite-dimensional vectors is transforming them with matrices.
A matrix \(\mathbf{A}\) encodes a linear transformation, meaning multiplication preserves linear combinations.
\[\mathbf{A}(\alpha \mathbf{x} + \beta \mathbf{y}) = \alpha \mathbf{A}\mathbf{x} + \beta \mathbf{A}\mathbf{y}\]
Multiplying a vector by a matrix can be intuitively interpreted as defining a new set of coordinate axes from the matrix’s column vectors.
The result is a linear combination of the columns:
\[\mathbf{Ax} = \begin{bmatrix} \vert & \vert & \vert \\ \mathbf{u} & \mathbf{v} & \mathbf{w} \\ \vert & \vert & \vert \end{bmatrix} \begin{bmatrix}x_1 \\ x_2 \\ x_3\end{bmatrix} = x_1\mathbf{u} + x_2\mathbf{v} + x_3\mathbf{w}\]
When all vectors can be expressed as a linear combination of \(\mathbf{u}\), \(\mathbf{v}\), and \(\mathbf{w}\), the columns form a basis for the underlying vector space.
Here, the matrix \(\mathbf{A}\) transforms a vector from the \(\mathbf{uvw}\) basis into the standard basis.
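A quick numerical check of this column-wise reading of matrix multiplication (the columns and coefficients below are arbitrary choices):

```python
import numpy as np

u = np.array([1.0, 0.0, 2.0])
v = np.array([0.0, 1.0, 1.0])
w = np.array([1.0, 1.0, 0.0])

A = np.column_stack([u, v, w])        # columns define the new "axes"
x = np.array([2.0, -1.0, 3.0])

# Ax is exactly the linear combination x_1*u + x_2*v + x_3*w
print(np.allclose(A @ x, 2.0 * u - 1.0 * v + 3.0 * w))   # True
```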
Since functions are vectors, we could imagine transforming a function by a matrix.
Such a matrix would be infinite-dimensional, so we will instead call it a linear operator and denote it with \(\mathcal{L}\).
\[\mathcal{L}f = \begin{bmatrix} \vert & \vert & \vert & \\ \mathbf{f} & \mathbf{g} & \mathbf{h} & \cdots \\ \vert & \vert & \vert & \end{bmatrix} \begin{bmatrix}f_1\\ f_2 \\ f_3\\ \vdots\end{bmatrix} = f_1\mathbf{f} + f_2\mathbf{g} + f_3\mathbf{h} + \cdots\]
This visualization isn’t very accurate—we’re dealing with uncountably infinite-dimensional vectors, so we can’t actually write out an operator in matrix form.
Nonetheless, the structure is suggestive: each “column” of the operator describes a new basis function for our functional vector space.
Just like we saw with finite-dimensional vectors, \(\mathcal{L}\) represents a change of basis.
Differentiation
Review: Derivative formulas through geometry | Chapter 3, Essence of calculus.
So, what’s an example of a linear operator on functions?
You might recall that differentiation is linear:
\[\frac{\partial}{\partial x} \left(\alpha f[x] + \beta g[x]\right) = \alpha\frac{\partial f}{\partial x} + \beta\frac{\partial g}{\partial x}\]
It’s hard to visualize differentiation on general functions, but it’s feasible for the subspace of polynomials, \(\mathcal{P}\).
Let’s take a slight detour to examine this smaller space of functions.
\[\mathcal{P} = \{ p[x] = a + bx + cx^2 + dx^3 + \cdots \}\]
We typically write down polynomials as a sequence of powers, i.e. \(1, x, x^2, x^3\), etc.
All polynomials are linear combinations of the functions \(\mathbf{e}_i[x] = x^i\), so these powers constitute a countably infinite basis for \(\mathcal{P}\).4
This basis provides a convenient vector notation:
\[\begin{align*} p[x] &= a + bx + cx^2 + dx^3 + \cdots \\ &= a\mathbf{e}_0 + b\mathbf{e}_1 + c \mathbf{e}_2 + d\mathbf{e}_3 + \dots \end{align*}\ \iff\ \mathbf{p} = \begin{bmatrix}a\\ b\\ c\\ d\\ \vdots\end{bmatrix}\]
Since differentiation is linear, we’re able to apply the rule \(\frac{\partial}{\partial x} x^n = nx^{n-1}\) to each term.
\[\begin{align*}\frac{\partial}{\partial x}p[x] &= \vphantom{\Bigg\vert}a\frac{\partial}{\partial x}1 + b\frac{\partial}{\partial x}x + c\frac{\partial}{\partial x}x^2 + d\frac{\partial}{\partial x}x^3 + \dots \\ &= b + 2cx + 3dx^2 + \cdots\\ &= b\mathbf{e}_0 + 2c\mathbf{e}_1 + 3d\mathbf{e}_2 + \dots\end{align*} \ \iff\ \frac{\partial}{\partial x}\mathbf{p} = \begin{bmatrix}b\\ 2c\\ 3d\\ \vdots\end{bmatrix}\]
We’ve performed a linear transformation on the coefficients, so we can represent differentiation as a matrix!
\[\frac{\partial}{\partial x}\mathbf{p} = \begin{bmatrix}0 & 1 & 0 & 0 & \cdots\\ 0 & 0 & 2 & 0 & \cdots\\ 0 & 0 & 0 & 3 & \cdots\\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix}\begin{bmatrix}a\\ b\\ c\\ d\\ \vdots\end{bmatrix} = \begin{bmatrix}b\\ 2c\\ 3d\\ \vdots\end{bmatrix}\]
Each column of the differentiation operator is itself a polynomial, so this matrix represents a change of basis.
\[\frac{\partial}{\partial x} = \begin{bmatrix} \vert & \vert & \vert & \vert & \vert & \\ 0 & 1 & 2x & 3x^2 & 4x^3 & \cdots \\ \vert & \vert & \vert & \vert & \vert & \end{bmatrix}\]
As we can see, the differentiation operator simply maps each basis function to its derivative.
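A computer can only store a finite matrix, but a truncation of this operator already behaves as expected. The sketch below (truncating the power basis at degree 5, an arbitrary choice) differentiates \(p[x] = 1 + 2x + 3x^2\) with a matrix-vector product:

```python
import numpy as np

N = 6                                  # truncate the basis {1, x, ..., x^5}

# Column j holds the coefficients of d/dx x^j = j x^(j-1).
D = np.zeros((N, N))
for j in range(1, N):
    D[j - 1, j] = j

p = np.array([1.0, 2.0, 3.0, 0.0, 0.0, 0.0])   # p[x] = 1 + 2x + 3x^2

print(D @ p)                           # [2. 6. 0. 0. 0. 0.]  ->  p'[x] = 2 + 6x
```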
This result also applies to the larger space of analytic real functions, which includes polynomials, exponential functions, trigonometric functions, logarithms, and other familiar names.
By definition, an analytic function can be expressed as a Taylor series about \(0\):
\[f[x] = \sum_{n=0}^\infty \frac{f^{(n)}[0]}{n!}x^n = \sum_{n=0}^\infty \alpha_n x^n\]
This series is a linear combination of our polynomial basis functions.
That means a Taylor expansion is essentially a change of basis into the sequence of powers, where our differentiation operator is quite simple.5
Diagonalization
Review: Eigenvectors and eigenvalues | Chapter 14, Essence of linear algebra.
Matrix decompositions are arguably the crowning achievement of linear algebra.
To get started, let’s review what diagonalization means for a \(3\times3\) real matrix \(\mathbf{A}\).
Eigenvectors
A vector \(\mathbf{u}\) is an eigenvector of the matrix \(\mathbf{A}\) when the following condition holds:
$$ \mathbf{Au} = \lambda \mathbf{u} $$
The eigenvalue \(\lambda\) may be computed by solving the characteristic polynomial of \(\mathbf{A}\).
Eigenvalues may be real or complex.
The matrix \(\mathbf{A}\) is diagonalizable when it admits three linearly independent eigenvectors, each with a corresponding real eigenvalue.
This set of eigenvectors constitutes an eigenbasis for the underlying vector space, indicating that we can express any vector \(\mathbf{x}\) via their linear combination.
$$ \mathbf{x} = \alpha\mathbf{u}_1 + \beta\mathbf{u}_2 + \gamma\mathbf{u}_3 $$
To multiply \(\mathbf{x}\) by \(\mathbf{A}\), we just have to scale each component by its corresponding eigenvalue.
$$ \begin{align*} \mathbf{Ax} &= \alpha\mathbf{A}\mathbf{u}_1 + \beta\mathbf{A}\mathbf{u}_2 + \gamma\mathbf{A}\mathbf{u}_3 \\
&= \alpha\lambda_1\mathbf{u}_1 + \beta\lambda_2\mathbf{u}_2 + \gamma\lambda_3\mathbf{u}_3 \end{align*} $$
Finally, re-combining the eigenvectors expresses the result in the standard basis.
Intuitively, we’ve shown that multiplying by \(\mathbf{A}\) is equivalent to a change of basis, a scaling, and a change back.
That means we can write \(\mathbf{A}\) as the product of an invertible matrix \(\mathbf{U}\) and a diagonal matrix \(\mathbf{\Lambda}\).
\[\begin{align*} \mathbf{A} &= \mathbf{U\Lambda U^{-1}} \\
&= \begin{bmatrix}\vert & \vert & \vert \\ \mathbf{u}_1 & \mathbf{u}_2 & \mathbf{u}_3 \\ \vert & \vert & \vert \end{bmatrix}
\begin{bmatrix}\lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix}
\begin{bmatrix}\vert & \vert & \vert \\ \mathbf{u}_1 & \mathbf{u}_2 & \mathbf{u}_3 \\ \vert & \vert & \vert \end{bmatrix}^{-1}
\end{align*}\]
Note that \(\mathbf{U}\) is invertible because its columns (the eigenvectors) form a basis for \(\mathbb{R}^3\).
When multiplying by \(\mathbf{x}\), \(\mathbf{U}^{-1}\) converts \(\mathbf{x}\) to the eigenbasis, \(\mathbf{\Lambda}\) scales by the corresponding eigenvalues, and \(\mathbf{U}\) takes us back to the standard basis.
In the presence of complex eigenvalues, \(\mathbf{A}\) may still be diagonalizable if we allow \(\mathbf{U}\) and \(\mathbf{\Lambda}\) to include complex entries.
In this case, the decomposition as a whole still maps real vectors to real vectors, but the intermediate values become complex.
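Here is a small numerical illustration of the decomposition, using an arbitrarily chosen symmetric matrix (symmetric so that a real eigenbasis is guaranteed):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

eigvals, U = np.linalg.eig(A)          # columns of U are eigenvectors
Lam = np.diag(eigvals)

# A = U Lambda U^{-1}
print(np.allclose(A, U @ Lam @ np.linalg.inv(U)))   # True

# Multiplying by A: change to the eigenbasis, scale, change back
x = np.array([1.0, -2.0, 0.5])
coords = np.linalg.solve(U, x)         # U^{-1} x, i.e. x in the eigenbasis
print(np.allclose(A @ x, U @ (eigvals * coords)))   # True
```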
Eigenfunctions
Review: What’s so special about Euler’s number e? | Chapter 5, Essence of calculus.
So, what does diagonalization mean in a vector space of functions?
Given a linear operator \(\mathcal{L}\), you might imagine a corresponding definition for eigenfunctions:
\[\mathcal{L}f = \psi f\]
The scalar \(\psi\) is again known as an eigenvalue.
Since \(\mathcal{L}\) is infinite-dimensional, it doesn’t have a characteristic polynomial—there’s not a straightforward method for computing \(\psi\).
Nevertheless, let’s attempt to diagonalize differentiation on analytic functions.
The first step is to find the eigenfunctions.
Start by applying the above condition to our differentiation operator in the power basis:
\[\begin{align*}
&& \frac{\partial}{\partial x}\mathbf{p} = \psi \mathbf{p} \vphantom{\Big|}& \\
&\iff& \begin{bmatrix}0 & 1 & 0 & 0 & \cdots\\ 0 & 0 & 2 & 0 & \cdots\\ 0 & 0 & 0 & 3 & \cdots\\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix}\begin{bmatrix}p_0\\ p_1\\ p_2\\ p_3\\ \vdots\end{bmatrix}
&= \begin{bmatrix}\psi p_0\\ \psi p_1 \\ \psi p_2 \\ \psi p_3 \\ \vdots \end{bmatrix} \\
&\iff& \begin{cases} p_1 &= \psi p_0 \\ p_2 &= \frac{\psi}{2} p_1 \\ p_3 &= \frac{\psi}{3} p_2 \\ &\dots \end{cases} &
\end{align*}\]
This system of equations implies that all coefficients are determined solely by our choice of constants \(p_0\) and \(\psi\).
We can explicitly write down their relationship as \(p_i = \frac{\psi^i}{i!}p_0\).
Now, let’s see what this class of polynomials actually looks like.
\[p[x] = p_0 + p_0\psi x + p_0\frac{\psi^2}{2}x^2 + p_0\frac{\psi^3}{6}x^3 + p_0\frac{\psi^4}{24}x^4 + \dots\]
Differentiation shows that this function is, in fact, an eigenfunction for the eigenvalue \(\psi\).
\[\begin{align*} \frac{\partial}{\partial x} p[x] &= 0 + p_0\psi + p_0 \psi^2 x + p_0\frac{\psi^3}{2}x^2 + p_0\frac{\psi^4}{6}x^3 + \dots \\
&= \psi p[x] \end{align*}\]
With a bit of algebraic manipulation, the definition of \(e^{x}\) pops out:
\[\begin{align*} p[x] &= p_0 + p_0\psi x + p_0\frac{\psi^2}{2}x^2 + p_0\frac{\psi^3}{6}x^3 + p_0\frac{\psi^4}{24}x^4 + \dots \\
&= p_0\left(1 + (\psi x) + \frac{1}{2!}(\psi x)^2 + \frac{1}{3!}(\psi x)^3 + \frac{1}{4!}(\psi x)^4 + \dots\right) \\
&= p_0 e^{\psi x} \end{align*}\]
Therefore, functions of the form \(p_0e^{\psi x}\) are eigenfunctions for the eigenvalue \(\psi\), including when \(\psi=0\).
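We can also check this numerically with a central-difference approximation of the derivative (the constants and grid below are arbitrary choices):

```python
import numpy as np

psi, p0 = -0.7, 2.0
x = np.linspace(0.0, 1.0, 10001)
f = p0 * np.exp(psi * x)

# Central-difference estimate of f' on the interior points
df = (f[2:] - f[:-2]) / (x[2:] - x[:-2])

# Eigenfunction condition: f' = psi * f (up to discretization error)
print(np.max(np.abs(df - psi * f[1:-1])))   # small, roughly 1e-9
```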
Diagonalizing Differentiation
We’ve found the eigenfunctions of the derivative operator, but can we diagonalize it?
Ideally, we would express differentiation as the combination of an invertible operator \(\mathcal{L}\) and a diagonal operator \(\mathcal{D}\).
\[\begin{align*} \frac{\partial}{\partial x} &= \mathcal{L} \mathcal{D} \mathcal{L}^{-1} \\
&=
\begin{bmatrix} \vert & \vert & & \\ \alpha e^{\psi_1 x} & \beta e^{\psi_2 x} & \dots \\ \vert & \vert & \end{bmatrix}
\begin{bmatrix} \psi_1 & 0 & \dots \\ 0 & \psi_2 & \dots \\ \vdots & \vdots & \ddots \end{bmatrix}
{\color{red} \begin{bmatrix} \vert & \vert & & \\ \alpha e^{\psi_1 x} & \beta e^{\psi_2 x} & \dots \\ \vert & \vert & \end{bmatrix}^{-1} }
\end{align*}\]
Diagonalization is only possible when our eigenfunctions form a basis.
This would be true if every analytic function were expressible as a linear combination of exponentials.
However…
Counterexample: $$f[x] = x$$
First assume that \(f[x] = x\) can be represented as a linear combination of exponentials.
Since analytic functions have countably infinite dimensionality, we should only need a countably infinite sum:
\[f[x] = x = \sum_{n=0}^\infty \alpha_n e^{\psi_n x}\]
Differentiating both sides:
\[\begin{align*} f^{\prime}[x] &= 1 = \sum_{n=0}^\infty \psi_n\alpha_n e^{\psi_n x} \\
f^{\prime\prime}[x] &= 0 = \sum_{n=0}^\infty \psi_n^2\alpha_n e^{\psi_n x} \end{align*}\]
Since \(e^{\psi_n x}\) and \(e^{\psi_m x}\) are linearly independent when \(n\neq m\), the final equation implies that every \(\alpha_n = 0\), except possibly the \(\alpha_\xi\) corresponding to \(\psi_\xi = 0\).
Therefore:
\[\begin{align*}
1 &= \sum_{n=0}^\infty \psi_n\alpha_n e^{\psi_n x}\\
&= \psi_\xi \alpha_\xi e^{\psi_\xi x} + \sum_{n\neq \xi} \psi_n \cdot 0 \cdot e^{\psi_n x} \\
&= 0
\end{align*}\]
The first term vanishes because \(\psi_\xi = 0\), and every other term vanishes because its \(\alpha_n = 0\).
That’s a contradiction—the linear combination representing \(f[x] = x\) does not exist.
A similar argument shows that we can’t represent any non-constant function whose \(n\)th derivative is zero, nor periodic functions like sine and cosine.
Real exponentials don’t constitute a basis, so we cannot construct an invertible \(\mathcal{L}\).
The Laplace Transform
We previously mentioned that more matrices can be diagonalized if we allow the decomposition to contain complex numbers.
Analogously, more linear operators are diagonalizable in the larger vector space of functions from \(\mathbb{R}\) to \(\mathbb{C}\).
Differentiation works the same way in this space; we’ll still find that its eigenfunctions are exponential.
\[\frac{\partial}{\partial x} e^{(a+bi)x} = (a+bi)e^{(a+bi)x}\]
However, the new eigenfunctions have complex eigenvalues, so we still can’t diagonalize.
We’ll need to consider the still larger space of functions from \(\mathbb{C}\) to \(\mathbb{C}\).
\[\frac{\partial}{\partial x} : (\mathbb{C}\mapsto\mathbb{C}) \mapsto (\mathbb{C}\mapsto\mathbb{C})\]
In this space, differentiation can be diagonalized via the Laplace transform.
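As a brief illustration, a small SymPy check (the specific function below is an arbitrary choice) shows that under the transform, differentiation becomes multiplication by \(s\), up to a boundary term:

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)
f = sp.exp(-2 * t) * sp.sin(3 * t)

F  = sp.laplace_transform(f, t, s, noconds=True)
dF = sp.laplace_transform(sp.diff(f, t), t, s, noconds=True)

# Differentiation becomes multiplication by s, minus the boundary term f(0):
print(sp.simplify(dF - (s * F - f.subs(t, 0))))   # 0
```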
Although useful for solving differential equations, the Laplace transform is non-trivial to invert, so we won’t discuss it further.
In the following sections, we’ll delve into an operator that can be easily diagonalized in \(\mathbb{R}\mapsto\mathbb{C}\): the Laplacian.
Inner Product Spaces
Review: Dot products and duality | Chapter 9, Essence of linear algebra.
Before we get to the spectral theorem, we’ll need to understand one more topic: inner products.
You’re likely already familiar with one example of an inner product—the Euclidean dot product.
\[\begin{bmatrix}x\\ y\\ z\end{bmatrix} \cdot \begin{bmatrix}a\\ b\\ c\end{bmatrix} = ax + by + cz\]
An inner product describes how to measure a vector along another vector.
For example, \(\mathbf{u}\cdot\mathbf{v}\) is proportional to the length of the projection of \(\mathbf{u}\) onto \(\mathbf{v}\).
$$ \mathbf{u} \cdot \mathbf{v} =\|\mathbf{u}\|\|\mathbf{v}\|\cos[\theta] $$
With a bit of trigonometry, we can show that the dot product is equivalent to multiplying the vectors’ lengths with the cosine of their angle.
This relationship suggests that the product of a vector with itself produces the square of its length.
\[\begin{align*} \mathbf{u}\cdot\mathbf{u} &= \|\mathbf{u}\|\|\mathbf{u}\|\cos[0] \\
&= \|\mathbf{u}\|^2
\end{align*}\]
Similarly, when two vectors form a right angle (are orthogonal), their dot product is zero.
$$ \begin{align*} \mathbf{u} \cdot \mathbf{v} &= \|\mathbf{u}\|\|\mathbf{v}\|\cos[90^\circ] \\ &= 0 \end{align*} $$
Of course, the Euclidean dot product is only one example of an inner product.
In more general spaces, the inner product is denoted using angle brackets, such as \(\langle \mathbf{u}, \mathbf{v} \rangle\).
The length (also known as the norm) of a vector is defined as \(\|\mathbf{u}\| = \sqrt{\langle \mathbf{u}, \mathbf{u} \rangle}\).
Two vectors are orthogonal if their inner product is zero: \(\ \mathbf{u} \perp \mathbf{v}\ \iff\ \langle \mathbf{u}, \mathbf{v} \rangle = 0\).
A vector space augmented with an inner product is known as an inner product space.
A Functional Inner Product
We can’t directly apply the Euclidean dot product to our space of real functions, but its \(N\)-dimensional generalization is suggestive.
\[\begin{align*} \mathbf{u} \cdot \mathbf{v} &= u_1v_1 + u_2v_2 + \dots + u_Nv_N \\ &= \sum_{i=1}^N u_iv_i \end{align*}\]
Given countable indices, we simply match up the values, multiply them, and add the results.
When indices are uncountable, we can convert the discrete sum to its continuous analog: an integral!
\[\langle f, g \rangle = \int_a^b f[x]g[x] \, dx\]
When \(f\) and \(g\) are similar, multiplying them produces a larger function; when they’re different, they cancel out.
Integration measures their product over some domain to produce a scalar result.
Of course, not all functions can be integrated.
Our inner product space will only contain functions that are square integrable over the domain \([a, b]\), which may be \((-\infty, \infty)\).
Luckily, the important properties of our inner product do not depend on the choice of integration domain.
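Here is a minimal numerical version of this inner product using SciPy's quad routine; the domain \([-\pi, \pi]\) and the test functions are arbitrary choices. It also previews orthogonality: sine and cosine cancel out over a full period.

```python
import numpy as np
from scipy.integrate import quad

def inner(f, g, a, b):
    """<f, g> = integral of f(x) g(x) over [a, b] for real-valued f, g."""
    value, _ = quad(lambda x: f(x) * g(x), a, b)
    return value

a, b = -np.pi, np.pi
print(inner(np.sin, np.cos, a, b))    # ~0: sin and cos are orthogonal here
print(inner(np.sin, np.sin, a, b))    # ~pi: ||sin||^2 on this domain
```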
Proofs
Below, we’ll briefly cover functions from \(\mathbb{R}\) to \(\mathbb{C}\).
In this space, our intuitive notion of similarity still applies, but we’ll use a slightly more general inner product:
\[\langle f,g \rangle = \int_a^b f[x]\overline{g[x]}\, dx\]
Where \(\overline{x}\) denotes conjugation, i.e. \(\overline{a + bi} = a - bi\).
Like other vector space operations, an inner product must satisfy several axioms:
Conjugate Symmetry
For all vectors $$\mathbf{u}, \mathbf{v} \in \mathcal{V}$$:
$$\langle \mathbf{u}, \mathbf{v} \rangle = \overline{\langle \mathbf{v}, \mathbf{u} \rangle}$$
Conjugation may be taken outside the integral, making this one easy:
$$\begin{align*} \langle f, g \rangle &= \int_a^b f[x]\overline{g[x]} \, dx \\
&= \int_a^b \overline{g[x]\overline{f[x]}} \, dx \\
&= \overline{\int_a^b g[x]\overline{f[x]} \, dx} \\
&= \overline{\langle g, f \rangle}
\end{align*}$$
Note that we require conjugate symmetry because it implies $$\langle\mathbf{u}, \mathbf{u}\rangle = \overline{\langle\mathbf{u}, \mathbf{u}\rangle}$$, i.e. the inner product of a vector with itself is real.
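A quick numerical check of conjugate symmetry for complex-valued functions (the grid and test functions below are arbitrary, and a plain Riemann sum stands in for the integral):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 200001)
dx = x[1] - x[0]

f = np.exp(2j * np.pi * x)            # f[x] = e^{2 pi i x}
g = x * (1.0 + 1j * x)                # g[x] = x + i x^2

fg = np.sum(f * np.conj(g)) * dx      # <f, g>
gf = np.sum(g * np.conj(f)) * dx      # <g, f>

print(np.allclose(fg, np.conj(gf)))   # True: <f, g> equals the conjugate of <g, f>
```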
Linearity in the First Argument
For all vectors $$\mathbf{u}, \mathbf{v}, \mathbf{w} \in \mathcal{V}$$ and scalars $$\alpha, \beta \in \mathbb{F}$$:
$$\langle \alpha \mathbf{u} + \beta \mathbf{v}, \mathbf{w} \rangle = \alpha\langle \mathbf{u}, \mathbf{w} \rangle + \beta\langle \mathbf{v}, \mathbf{w} \rangle $$
The proof follows from linearity of integration, as well as our vector space axioms:
\[\begin{align*} \langle \alpha f + \beta g, h \rangle &= \int_a^b (\alpha f + \beta g)[x]\overline{h[x]} \, dx \\
&= \int_a^b (\alpha f[x] + \beta g[x])\overline{h[x]} \, dx \\
&= \int_a^b \alpha f[x]\overline{h[x]} + \beta g[x]\overline{h[x]} \, dx \\
&= \alpha\int_a^b f[x]\overline{h[x]}\, dx + \beta\int_a^b g[x]\overline{h[x]} \, dx \\
&= \alpha\langle f, h \rangle + \beta\langle g, h \rangle
\end{align*}\]
Given conjugate symmetry, an inner product is also antilinear in the second argument.
Positive-Definiteness
For all $$\mathbf{u} \in \mathcal{V}$$:
$$ \begin{cases} \langle \mathbf{u}, \mathbf{u} \rangle = 0 & \text{if } \mathbf{u} = \mathbf{0} \\ \langle \mathbf{u}, \mathbf{u} \rangle > 0 & \text{otherwise} \end{cases} $$
By conjugate symmetry, we know $$\langle f, f \rangle$$ is real, so we can compare it with zero.
However, rigorously proving this result requires measure-theoretic concepts beyond the scope of this post.
In brief, we redefine $$\mathbf{0}$$ not as specifically $$\mathbf{0}[x] = 0$$, but as the equivalence class of functions that are zero almost everywhere, so that any function satisfying $$\langle f, f \rangle = 0$$ is identified with $$\mathbf{0}$$.