Conceptualizing functions as infinite-dimensional vectors lets us apply the tools of linear algebra to a vast landscape of new problems, from image and geometry processing to curve fitting, light transport, and machine learning.
Prerequisites: introductory linear algebra, introductory calculus, introductory differential equations.
This article received an honorable mention in 3Blue1Brown’s Summer of Math Exposition 3!
Functions as Vectors
Vector Spaces
Linear Operators
Diagonalization
Inner Product Spaces
The Spectral Theorem
Applications
Fourier Series
Image Compression
Geometry Processing
Further Reading
Functions as Vectors
Vectors are often first introduced as lists of real numbers—i.e. the familiar notation we use for points, directions, and more.
$$ \mathbf{v} = \begin{bmatrix}x\\y\\z\end{bmatrix} $$
You may recall that this representation is only one example of an abstract vector space.
There are many other types of vectors, such as lists of complex numbers, graph cycles, and even magic squares.
However, all of these vector spaces have one thing in common: a finite number of dimensions.
That is, each kind of vector can be represented as a collection of \(N\) numbers, though the definition of “number” varies.
If any \(N\)-dimensional vector is essentially a length-\(N\) list, we could also consider a vector to be a mapping from an index to a value.
\[\begin{align*}
\mathbf{v}_1 &= x\\
\mathbf{v}_2 &= y\\
\mathbf{v}_3 &= z
\end{align*}\ \iff\ \mathbf{v} = \begin{bmatrix}x \\ y \\ z\end{bmatrix}\]
What does this perspective hint at as we increase the number of dimensions?
In higher dimensions, vectors start to look more like functions!
Countably Infinite Indices
Of course, a finite-length vector only specifies a value at a limited number of indices.
Could we instead define a vector that contains infinitely many values?
Writing down a vector representing a function on the natural numbers (\(\mathbb{N}\))—or any other countably infinite domain—is straightforward: just extend the list indefinitely.
$$ \begin{align*}\mathbf{v}_1 &= 1\\\mathbf{v}_2 &= 2\\ &\vdots \\ \mathbf{v}_i &= i\end{align*}\ \iff\ \mathbf{v} = \begin{bmatrix}1 \\ 2 \\ 3 \\ \vdots \end{bmatrix} $$
This vector could represent the function \(f(x) = x\), where \(x \in \mathbb{N}\).1
Uncountably Infinite Indices
Many interesting functions are defined on the real numbers (\(\mathbb{R}\)), so they cannot be represented as a countably infinite vector.
Therefore, we will have to make a larger conceptual leap: not only will our set of indices be infinite, it will be uncountably infinite.
That means we can’t write down vectors as lists at all—it is impossible to assign an integer index to each element of an uncountable set.
So, how can we write down a vector mapping a real index to a certain value?
Now, a vector really is just an arbitrary function:
$$ \mathbf{v}_{x} = x^2\ \iff\ \mathbf{v} = \begin{bmatrix} x \mapsto x^2 \end{bmatrix} $$
Precisely defining how and why we can represent functions as infinite-dimensional vectors is the purview of functional analysis.
In this post, we won’t attempt to prove our results in infinite dimensions: we will focus on building intuition via analogies to finite-dimensional linear algebra.
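Before moving on, here is a rough Python sketch of the index-to-value view of vectors (the names and types are purely illustrative assumptions, not part of the formal development):

```python
# A vector is a mapping from an index to a value.

# Finite dimensions: three values indexed by 1, 2, 3.
v_finite = {1: 1.0, 2: 2.0, 3: 3.0}

# Countably infinite dimensions: a rule giving a value for every natural index.
def v_countable(i: int) -> float:
    return float(i)                  # represents f(x) = x on the naturals

# Uncountably infinite dimensions: a rule for every real index, i.e. a function.
def v_real(x: float) -> float:
    return x ** 2                    # represents f(x) = x^2 on the reals

print(v_finite[2], v_countable(2), v_real(2.0))   # 2.0 2.0 4.0
```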
Vector Spaces
Review: Abstract vector spaces | Chapter 16, Essence of linear algebra.
Formally, a vector space is defined by choosing a set of vectors \(\mathcal{V}\), a scalar field \(\mathbb{F}\), and a zero vector \(\mathbf{0}\).
The field \(\mathbb{F}\) is often the real numbers (\(\mathbb{R}\)), complex numbers (\(\mathbb{C}\)), or a finite field such as the integers modulo a prime (\(\mathbb{Z}_p\)).
Additionally, we must specify how to add two vectors and how to multiply a vector by a scalar.
\[\begin{align*}
(+)\ &:\ \mathcal{V}\times\mathcal{V}\mapsto\mathcal{V}\\
(\cdot)\ &:\ \mathbb{F}\times\mathcal{V} \mapsto \mathcal{V}
\end{align*}\]
To describe a vector space, our definitions must satisfy several vector space axioms.
A Functional Vector Space
In the following sections, we’ll work with the vector space of real functions.
To avoid ambiguity, square brackets are used to denote function application.
The scalar field \(\mathbb{F}\) is the real numbers \(\mathbb{R}\).
The set of vectors \(\mathcal{V}\) contains functions from \(\mathbb{R}\) to \(\mathbb{R}\).2
\(\mathbf{0}\) is the zero function, i.e. \(\mathbf{0}[x] = 0\).
Adding functions corresponds to applying the functions separately and summing the results.
$$ (f + g)[x] = f[x] + g[x] $$
This definition generalizes the typical element-wise addition rule—it’s like adding the two values at each index.
\[f+g = \begin{bmatrix}f_1 + g_1 \\ f_2 + g_2 \\ \vdots \end{bmatrix}\]
Multiplying a function by a scalar corresponds to applying the function and scaling the result.
$$ (\alpha f)[x] = \alpha f[x] $$
This rule similarly generalizes element-wise multiplication—it’s like scaling the value at each index.
\[\alpha f = \begin{bmatrix}\alpha f_1 \\ \alpha f_2 \\ \vdots \end{bmatrix}\]
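To make the pointwise definitions concrete, here is a minimal Python sketch, assuming we model real functions as callables:

```python
# Pointwise vector space operations on real functions, modeled as callables.

def add(f, g):
    """(f + g)[x] = f[x] + g[x]"""
    return lambda x: f(x) + g(x)

def scale(alpha, f):
    """(alpha f)[x] = alpha * f[x]"""
    return lambda x: alpha * f(x)

zero = lambda x: 0.0                 # the zero function

f = lambda x: x ** 2
g = lambda x: 3.0 * x

h = add(scale(2.0, f), g)            # h[x] = 2x^2 + 3x
print(h(1.0))                        # 5.0
print(add(f, zero)(4.0) == f(4.0))   # True: adding the zero function changes nothing
```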
Proofs
Given these definitions, we can now prove all necessary vector space axioms.
We will illustrate the analog of each property in \(\mathbb{R}^2\), the familiar vector space of two-dimensional arrows.
Vector Addition is Commutative
For all vectors $$\mathbf{u}, \mathbf{v} \in \mathcal{V}$$:
$$\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$$
Since real addition is commutative, this property follows directly from our definition of vector addition:
$$\begin{align*}
(f + g)[x] &= f[x] + g[x]\\
&= g[x] + f[x]\\
&= (g + f)[x]
\end{align*}$$
Vector Addition is Associative
For all vectors $$\mathbf{u}, \mathbf{v}, \mathbf{w} \in \mathcal{V}$$:
$$(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$$
This property also follows from our definition of vector addition:
$$\begin{align*}
((f + g) + h)[x] &= (f + g)[x] + h[x]\\
&= f[x] + g[x] + h[x]\\
&= f[x] + (g[x] + h[x])\\
&= f[x] + (g + h)[x]\\
&= (f + (g + h))[x]
\end{align*}$$
$$\mathbf{0}$$ is an Additive Identity
For all vectors $$\mathbf{u} \in \mathcal{V}$$:
$$\mathbf{0} + \mathbf{u} = \mathbf{u} $$
This one is easy:
$$\begin{align*}
(\mathbf{0} + f)[x] &= \mathbf{0}[x] + f[x]\\
&= 0 + f[x]\\
&= f[x]
\end{align*}$$
Additive Inverses Exist
For all vectors $$\mathbf{u} \in \mathcal{V}$$, there exists a vector $$-\mathbf{u} \in \mathcal{V}$$ such that:
$$\mathbf{u} + (-\mathbf{u}) = \mathbf{0}$$
Negation is defined as applying $$f$$ and negating the result: $$(-f)[x] = -f[x]$$.
Clearly, $$-f$$ is also in $$\mathcal{V}$$.
$$\begin{align*}
(f + (-f))[x] &= f[x] + (-f)[x]\\
&= f[x] - f[x]\\
&= 0\\
&= \mathbf{0}[x]
\end{align*}$$
$$1$$ is a Multiplicative Identity
For all vectors $$\mathbf{u} \in \mathcal{V}$$:
$$1\mathbf{u} = \mathbf{u}$$
Note that $$1$$ is specified by the choice of $$\mathbb{F}$$.
In our case, it is simply the real number $$1$$.
$$\begin{align*}
(1 f)[x] &= 1 f[x]\\
&= f[x]
\end{align*}$$
Scalar Multiplication is Associative
For all vectors $$\mathbf{u} \in \mathcal{V}$$ and scalars $$\alpha, \beta \in \mathbb{F}$$:
$$(\alpha \beta)\mathbf{u} = \alpha(\beta\mathbf{u})$$
This property follows from our definition of scalar multiplication:
$$\begin{align*}
((\alpha\beta) f)[x] &= (\alpha\beta)f[x]\\
&= \alpha(\beta f[x])\\
&= \alpha(\beta f)[x]
\end{align*}$$
Scalar Multiplication Distributes Over Vector Addition
For all vectors $$\mathbf{u}, \mathbf{v} \in \mathcal{V}$$ and scalars $$\alpha \in \mathbb{F}$$:
$$\alpha(\mathbf{u} + \mathbf{v}) = \alpha\mathbf{u} + \alpha\mathbf{v}$$
Again using our definitions of vector addition and scalar multiplication:
$$\begin{align*}
(\alpha (f + g))[x] &= \alpha(f + g)[x]\\
&= \alpha(f[x] + g[x])\\
&= \alpha f[x] + \alpha g[x]\\
&= (\alpha f)[x] + (\alpha g)[x]\\
&= (\alpha f + \alpha g)[x]
\end{align*}$$
Scalar Multiplication Distributes Over Scalar Addition
For all vectors $$\mathbf{u} \in \mathcal{V}$$ and scalars $$\alpha, \beta \in \mathbb{F}$$:
$$(\alpha + \beta)\mathbf{u} = \alpha\mathbf{u} + \beta\mathbf{u}$$
Again using our definitions of vector addition and scalar multiplication:
$$\begin{align*}
((\alpha + \beta)f)[x] &= (\alpha + \beta)f[x]\\
&= \alpha f[x] + \beta f[x] \\
&= (\alpha f)[x] + (\beta f)[x]
\end{align*}$$
Therefore, we’ve built a vector space of functions!3
It may not be immediately obvious why this result is useful, but bear with us through a few more definitions—we will spend the rest of this post exploring powerful techniques arising from this perspective.
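As a quick sanity check (purely illustrative, not a proof), we can spot-check a couple of these axioms at randomly sampled points, using the same pointwise definitions as above:

```python
import random

add   = lambda f, g: (lambda x: f(x) + g(x))    # pointwise addition
scale = lambda a, f: (lambda x: a * f(x))       # pointwise scaling

f = lambda x: x ** 3 - x
g = lambda x: 2.5 * x + 1.0

for _ in range(1000):
    x = random.uniform(-10.0, 10.0)
    a = random.uniform(-10.0, 10.0)
    # vector addition is commutative
    assert add(f, g)(x) == add(g, f)(x)
    # scalar multiplication distributes over vector addition (up to rounding)
    assert abs(scale(a, add(f, g))(x) - add(scale(a, f), scale(a, g))(x)) < 1e-9
```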
A Standard Basis for Functions
Review: Linear combinations, span, and basis vectors | Chapter 2, Essence of linear algebra.
Unless specified otherwise, vectors are written down with respect to the standard basis.
In \(\mathbb{R}^2\), the standard basis consists of the two coordinate axes.
$$ \mathbf{e}_1 = \begin{bmatrix}1 \\ 0\end{bmatrix},\,\, \mathbf{e}_2 = \begin{bmatrix}0 \\ 1\end{bmatrix} $$
Hence, vector notation is shorthand for a linear combination of the standard basis vectors.
$$ \mathbf{u} = \begin{bmatrix}\alpha \\ \beta\end{bmatrix} = \alpha\mathbf{e}_1 + \beta\mathbf{e}_2 $$
Above, we represented functions as vectors by assuming each dimension of an infinite-length vector contains the function’s result for that index.
This construction points to a natural generalization of the standard basis.
Just like the coordinate axes, each standard basis function contains a \(1\) at one index and \(0\) everywhere else.
More precisely, for every \(\alpha \in \mathbb{R}\):
\[\mathbf{e}_\alpha[x] = \begin{cases} 1 & \text{if } x = \alpha \\ 0 & \text{otherwise} \end{cases}\]
Ideally, we could express an arbitrary function \(f\) as a linear combination of these basis functions.
However, there are uncountably many of them—and we can’t simply write down a sum over the reals.
Still, considering their linear combination is illustrative:
\[\begin{align*} f[x] &= f[\alpha]\mathbf{e}_\alpha[x] \\ &= f[1]\mathbf{e}_1[x] + f[2]\mathbf{e}_2[x] + f[\pi]\mathbf{e}_\pi[x] + \dots \end{align*}\]
If we evaluate this “sum” at \(x\), we’ll find that all terms are zero—except \(\mathbf{e}_x\), making the result \(f[x]\).
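Although we can't carry out this uncountable sum, its finite, sampled analogue is easy to demonstrate. The sketch below restricts attention to a handful of sample points (an assumption made purely for illustration) and rebuilds a sampled function from one-hot basis vectors:

```python
import numpy as np

xs = np.linspace(-1.0, 1.0, 5)        # a few sample "indices"
f = xs ** 2                           # samples of f[x] = x^2

E = np.eye(len(xs))                   # e_i: a 1 at index i, 0 everywhere else

# f equals the linear combination  sum_i f[x_i] * e_i
recombined = sum(f[i] * E[i] for i in range(len(xs)))
print(np.allclose(recombined, f))     # True
```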
Linear Operators
Review: Change of basis | Chapter 13, Essence of linear algebra.
Now that we can manipulate functions as vectors, let’s start transferring the tools of linear algebra to the functional perspective.
One ubiquitous operation on finite-dimensional vectors is transforming them with matrices.
A matrix \(\mathbf{A}\) encodes a linear transformation, meaning multiplication preserves linear combinations.
\[\mathbf{A}(\alpha \mathbf{x} + \beta \mathbf{y}) = \alpha \mathbf{A}\mathbf{x} + \beta \mathbf{A}\mathbf{y}\]
Multiplying a vector by a matrix can be intuitively interpreted as defining a new set of coordinate axes from the matrix’s column vectors.
The result is a linear combination of the columns:
\[\mathbf{Ax} = \begin{bmatrix} \vert & \vert & \vert \\ \mathbf{u} & \mathbf{v} & \mathbf{w} \\ \vert & \vert & \vert \end{bmatrix} \begin{bmatrix}x_1 \\ x_2 \\ x_3\end{bmatrix} = x_1\mathbf{u} + x_2\mathbf{v} + x_3\mathbf{w}\]
When all vectors can be expressed as a linear combination of \(\mathbf{u}\), \(\mathbf{v}\), and \(\mathbf{w}\), the columns form a basis for the underlying vector space.
Here, the matrix \(\mathbf{A}\) transforms a vector from the \(\mathbf{uvw}\) basis into the standard basis.
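A quick numerical check of this column-wise reading of matrix multiplication (the columns and coefficients below are arbitrary choices):

```python
import numpy as np

u = np.array([1.0, 0.0, 2.0])
v = np.array([0.0, 1.0, 1.0])
w = np.array([1.0, 1.0, 0.0])

A = np.column_stack([u, v, w])        # columns define the new "axes"
x = np.array([2.0, -1.0, 3.0])

# Ax is exactly the linear combination x_1*u + x_2*v + x_3*w
print(np.allclose(A @ x, 2.0 * u - 1.0 * v + 3.0 * w))   # True
```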
Since functions are vectors, we could imagine transforming a function by a matrix.
Such a matrix would be infinite-dimensional, so we will instead call it a linear operator and denote it with \(\mathcal{L}\).
\[\mathcal{L}f = \begin{bmatrix} \vert & \vert & \vert & \\ \mathbf{f} & \mathbf{g} & \mathbf{h} & \cdots \\ \vert & \vert & \vert & \end{bmatrix} \begin{bmatrix}f_1\\ f_2 \\ f_3\\ \vdots\end{bmatrix} = f_1\mathbf{f} + f_2\mathbf{g} + f_3\mathbf{h} + \cdots\]
This visualization isn’t very accurate—we’re dealing with uncountably infinite-dimensional vectors, so we can’t actually write out an operator in matrix form.
Nonetheless, the structure is suggestive: each “column” of the operator describes a new basis function for our functional vector space.
Just like we saw with finite-dimensional vectors, \(\mathcal{L}\) represents a change of basis.
Differentiation
Review: Derivative formulas through geometry | Chapter 3, Essence of calculus.
So, what’s an example of a linear operator on functions?
You might recall that differentiation is linear:
\[\frac{\partial}{\partial x} \left(\alpha f[x] + \beta g[x]\right) = \alpha\frac{\partial f}{\partial x} + \beta\frac{\partial g}{\partial x}\]
It’s hard to visualize differentiation on general functions, but it’s feasible for the subspace of polynomials, \(\mathcal{P}\).
Let’s take a slight detour to examine this smaller space of functions.
\[\mathcal{P} = \{ p[x] = a + bx + cx^2 + dx^3 + \cdots \}\]
We typically write down polynomials as a sequence of powers, i.e. \(1, x, x^2, x^3\), etc.
All polynomials are linear combinations of the functions \(\mathbf{e}_i[x] = x^i\), so these powers constitute a countably infinite basis for \(\mathcal{P}\).4
This basis provides a convenient vector notation:
\[\begin{align*} p[x] &= a + bx + cx^2 + dx^3 + \cdots \\ &= a\mathbf{e}_0 + b\mathbf{e}_1 + c \mathbf{e}_2 + d\mathbf{e}_3 + \dots \end{align*}\ \iff\ \mathbf{p} = \begin{bmatrix}a\\ b\\ c\\ d\\ \vdots\end{bmatrix}\]
Since differentiation is linear, we’re able to apply the rule \(\frac{\partial}{\partial x} x^n = nx^{n-1}\) to each term.
\[\begin{align*}\frac{\partial}{\partial x}p[x] &= \vphantom{\Bigg\vert}a\frac{\partial}{\partial x}1 + b\frac{\partial}{\partial x}x + c\frac{\partial}{\partial x}x^2 + d\frac{\partial}{\partial x}x^3 + \dots \\ &= b + 2cx + 3dx^2 + \cdots\\ &= b\mathbf{e}_0 + 2c\mathbf{e}_1 + 3d\mathbf{e}_2 + \dots\end{align*} \ \iff\ \frac{\partial}{\partial x}\mathbf{p} = \begin{bmatrix}b\\ 2c\\ 3d\\ \vdots\end{bmatrix}\]
We’ve performed a linear transformation on the coefficients, so we can represent differentiation as a matrix!
\[\frac{\partial}{\partial x}\mathbf{p} = \begin{bmatrix}0 & 1 & 0 & 0 & \cdots\\ 0 & 0 & 2 & 0 & \cdots\\ 0 & 0 & 0 & 3 & \cdots\\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix}\begin{bmatrix}a\\ b\\ c\\ d\\ \vdots\end{bmatrix} = \begin{bmatrix}b\\ 2c\\ 3d\\ \vdots\end{bmatrix}\]
Each column of the differentiation operator is itself a polynomial, so this matrix represents a change of basis.
\[\frac{\partial}{\partial x} = \begin{bmatrix} \vert & \vert & \vert & \vert & \vert & \\ 0 & 1 & 2x & 3x^2 & 4x^3 & \cdots \\ \vert & \vert & \vert & \vert & \vert & \end{bmatrix}\]
As we can see, the differentiation operator simply maps each basis function to its derivative.
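A computer can only store a finite matrix, but a truncation of this operator already behaves as expected. The sketch below (truncating the power basis at degree 5, an arbitrary choice) differentiates \(p[x] = 1 + 2x + 3x^2\) with a matrix-vector product:

```python
import numpy as np

N = 6                                  # truncate the basis {1, x, ..., x^5}

# Column j holds the coefficients of d/dx x^j = j x^(j-1).
D = np.zeros((N, N))
for j in range(1, N):
    D[j - 1, j] = j

p = np.array([1.0, 2.0, 3.0, 0.0, 0.0, 0.0])   # p[x] = 1 + 2x + 3x^2

print(D @ p)                           # [2. 6. 0. 0. 0. 0.]  ->  p'[x] = 2 + 6x
```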
This result also applies to the larger space of analytic real functions, which includes polynomials, exponential functions, trigonometric functions, logarithms, and other familiar names.
By definition, an analytic function can be expressed as a Taylor series about \(0\):
\[f[x] = \sum_{n=0}^\infty \frac{f^{(n)}[0]}{n!}x^n = \sum_{n=0}^\infty \alpha_n x^n\]
This series is a linear combination of our polynomial basis functions.
That means a Taylor expansion is essentially a change of basis into the sequence of powers, where our differentiation operator is quite simple.5
Diagonalization
Review: Eigenvectors and eigenvalues | Chapter 14, Essence of linear algebra.
Matrix decompositions are arguably the crowning achievement of linear algebra.
To get started, let’s review what diagonalization means for a \(3\times3\) real matrix \(\mathbf{A}\).
Eigenvectors
A vector \(\mathbf{u}\) is an eigenvector of the matrix \(\mathbf{A}\) when the following condition holds:
$$ \mathbf{Au} = \lambda \mathbf{u} $$
The eigenvalue \(\lambda\) may be computed by solving the characteristic polynomial of \(\mathbf{A}\).
Eigenvalues may be real or complex.
The matrix \(\mathbf{A}\) is diagonalizable when it admits three linearly independent eigenvectors, each with a corresponding real eigenvalue.
This set of eigenvectors constitutes an eigenbasis for the underlying vector space, indicating that we can express any vector \(\mathbf{x}\) via their linear combination.
$$ \mathbf{x} = \alpha\mathbf{u}_1 + \beta\mathbf{u}_2 + \gamma\mathbf{u}_3 $$
To multiply \(\mathbf{x}\) by \(\mathbf{A}\), we just have to scale each component by its corresponding eigenvalue.
$$ \begin{align*} \mathbf{Ax} &= \alpha\mathbf{A}\mathbf{u}_1 + \beta\mathbf{A}\mathbf{u}_2 + \gamma\mathbf{A}\mathbf{u}_3 \\
&= \alpha\lambda_1\mathbf{u}_1 + \beta\lambda_2\mathbf{u}_2 + \gamma\lambda_3\mathbf{u}_3 \end{align*} $$
Finally, re-combining the eigenvectors expresses the result in the standard basis.
Intuitively, we’ve shown that multiplying by \(\mathbf{A}\) is equivalent to a change of basis, a scaling, and a change back.
That means we can write \(\mathbf{A}\) as the product of an invertible matrix \(\mathbf{U}\) and a diagonal matrix \(\mathbf{\Lambda}\).
\[\begin{align*} \mathbf{A} &= \mathbf{U\Lambda U^{-1}} \\
&= \begin{bmatrix}\vert & \vert & \vert \\ \mathbf{u}_1 & \mathbf{u}_2 & \mathbf{u}_3 \\ \vert & \vert & \vert \end{bmatrix}
\begin{bmatrix}\lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix}
\begin{bmatrix}\vert & \vert & \vert \\ \mathbf{u}_1 & \mathbf{u}_2 & \mathbf{u}_3 \\ \vert & \vert & \vert \end{bmatrix}^{-1}
\end{align*}\]
Note that \(\mathbf{U}\) is invertible because its columns (the eigenvectors) form a basis for \(\mathbb{R}^3\).
When multiplying by \(\mathbf{x}\), \(\mathbf{U}^{-1}\) converts \(\mathbf{x}\) to the eigenbasis, \(\mathbf{\Lambda}\) scales by the corresponding eigenvalues, and \(\mathbf{U}\) takes us back to the standard basis.
In the presence of complex eigenvalues, \(\mathbf{A}\) may still be diagonalizable if we allow \(\mathbf{U}\) and \(\mathbf{\Lambda}\) to include complex entries.
In this case, the decomposition as a whole still maps real vectors to real vectors, but the intermediate values become complex.
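Here is a small numerical illustration of the decomposition, using an arbitrarily chosen symmetric matrix (symmetric so that a real eigenbasis is guaranteed):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

eigvals, U = np.linalg.eig(A)          # columns of U are eigenvectors
Lam = np.diag(eigvals)

# A = U Lambda U^{-1}
print(np.allclose(A, U @ Lam @ np.linalg.inv(U)))   # True

# Multiplying by A: change to the eigenbasis, scale, change back
x = np.array([1.0, -2.0, 0.5])
coords = np.linalg.solve(U, x)         # U^{-1} x, i.e. x in the eigenbasis
print(np.allclose(A @ x, U @ (eigvals * coords)))   # True
```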
Eigenfunctions
Review: What’s so special about Euler’s number e? | Chapter 5, Essence of calculus.
So, what does diagonalization mean in a vector space of functions?
Given a linear operator \(\mathcal{L}\), you might imagine a corresponding definition for eigenfunctions:
\[\mathcal{L}f = \psi f\]
The scalar \(\psi\) is again known as an eigenvalue.
Since \(\mathcal{L}\) is infinite-dimensional, it doesn’t have a characteristic polynomial—there’s not a straightforward method for computing \(\psi\).
Nevertheless, let’s attempt to diagonalize differentiation on analytic functions.
The first step is to find the eigenfunctions.
Start by applying the above condition to our differentiation operator in the power basis:
\[\begin{align*}
&& \frac{\partial}{\partial x}\mathbf{p} = \psi \mathbf{p} \vphantom{\Big|}& \\
&\iff& \begin{bmatrix}0 & 1 & 0 & 0 & \cdots\\ 0 & 0 & 2 & 0 & \cdots\\ 0 & 0 & 0 & 3 & \cdots\\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix}\begin{bmatrix}p_0\\ p_1\\ p_2\\ p_3\\ \vdots\end{bmatrix}
&= \begin{bmatrix}\psi p_0\\ \psi p_1 \\ \psi p_2 \\ \psi p_3 \\ \vdots \end{bmatrix} \\
&\iff& \begin{cases} p_1 &= \psi p_0 \\ p_2 &= \frac{\psi}{2} p_1 \\ p_3 &= \frac{\psi}{3} p_2 \\ &\dots \end{cases} &
\end{align*}\]
This system of equations implies that all coefficients are determined solely by our choice of constants \(p_0\) and \(\psi\).
We can explicitly write down their relationship as \(p_i = \frac{\psi^i}{i!}p_0\).
Now, let’s see what this class of polynomials actually looks like.
\[p[x] = p_0 + p_0\psi x + p_0\frac{\psi^2}{2}x^2 + p_0\frac{\psi^3}{6}x^3 + p_0\frac{\psi^4}{24}x^4 + \dots\]
Differentiation shows that this function is, in fact, an eigenfunction for the eigenvalue \(\psi\).
\[\begin{align*} \frac{\partial}{\partial x} p[x] &= 0 + p_0\psi + p_0 \psi^2 x + p_0\frac{\psi^3}{2}x^2 + p_0\frac{\psi^4}{6}x^3 + \dots \\
&= \psi p[x] \end{align*}\]
With a bit of algebraic manipulation, the definition of \(e^{x}\) pops out:
\[\begin{align*} p[x] &= p_0 + p_0\psi x + p_0\frac{\psi^2}{2}x^2 + p_0\frac{\psi^3}{6}x^3 + p_0\frac{\psi^4}{24}x^4 + \dots \\
&= p_0\left(1 + (\psi x) + \frac{1}{2!}(\psi x)^2 + \frac{1}{3!}(\psi x)^3 + \frac{1}{4!}(\psi x)^4 + \dots\right) \\
&= p_0 e^{\psi x} \end{align*}\]
Therefore, functions of the form \(p_0e^{\psi x}\) are eigenfunctions for the eigenvalue \(\psi\), including when \(\psi=0\).
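We can also check this numerically with a central-difference approximation of the derivative (the constants and grid below are arbitrary choices):

```python
import numpy as np

psi, p0 = -0.7, 2.0
x = np.linspace(0.0, 1.0, 10001)
f = p0 * np.exp(psi * x)

# Central-difference estimate of f' on the interior points
df = (f[2:] - f[:-2]) / (x[2:] - x[:-2])

# Eigenfunction condition: f' = psi * f (up to discretization error)
print(np.max(np.abs(df - psi * f[1:-1])))   # small, roughly 1e-9
```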
Diagonalizing Differentiation
We’ve found the eigenfunctions of the derivative operator, but can we diagonalize it?
Ideally, we would express differentiation as the combination of an invertible operator \(\mathcal{L}\) and a diagonal operator \(\mathcal{D}\).
\[\begin{align*} \frac{\partial}{\partial x} &= \mathcal{L} \mathcal{D} \mathcal{L}^{-1} \\
&=
\begin{bmatrix} \vert & \vert & & \\ \alpha e^{\psi_1 x} & \beta e^{\psi_2 x} & \dots \\ \vert & \vert & \end{bmatrix}
\begin{bmatrix} \psi_1 & 0 & \dots \\ 0 & \psi_2 & \dots \\ \vdots & \vdots & \ddots \end{bmatrix}
{\color{red} \begin{bmatrix} \vert & \vert & & \\ \alpha e^{\psi_1 x} & \beta e^{\psi_2 x} & \dots \\ \vert & \vert & \end{bmatrix}^{-1} }
\end{align*}\]
Diagonalization is only possible when our eigenfunctions form a basis.
This would be true if every analytic function were expressible as a linear combination of exponentials.
However…
Counterexample: $$f[x] = x$$
First assume that \(f[x] = x\) can be represented as a linear combination of exponentials.
Since analytic functions have countably infinite dimensionality, we should only need a countably infinite sum:
\[f[x] = x = \sum_{n=0}^\infty \alpha_n e^{\psi_n x}\]
Differentiating both sides:
\[\begin{align*} f^{\prime}[x] &= 1 = \sum_{n=0}^\infty \psi_n\alpha_n e^{\psi_n x} \\
f^{\prime\prime}[x] &= 0 = \sum_{n=0}^\infty \psi_n^2\alpha_n e^{\psi_n x} \end{align*}\]
Since \(e^{\psi_n x}\) and \(e^{\psi_m x}\) are linearly independent when \(n\neq m\), the final equation implies that every \(\alpha_n = 0\), except possibly the \(\alpha_\xi\) corresponding to \(\psi_\xi = 0\).
Therefore:
\[\begin{align*}
1 &= \sum_{n=0}^\infty \psi_n\alpha_n e^{\psi_n x}\\
&= \psi_\xi \alpha_\xi e^{\psi_\xi x} + \sum_{n\neq \xi} \psi_n \cdot 0 \cdot e^{\psi_n x} \\
&= 0
\end{align*}\]
The first term vanishes because \(\psi_\xi = 0\), and every other term vanishes because its \(\alpha_n = 0\).
That’s a contradiction—the linear combination representing \(f[x] = x\) does not exist.
A similar argument shows that we can’t represent any non-constant function whose \(n\)th derivative is zero, nor periodic functions like sine and cosine.
Real exponentials don’t constitute a basis, so we cannot construct an invertible \(\mathcal{L}\).
The Laplace Transform
We previously mentioned that more matrices can be diagonalized if we allow the decomposition to contain complex numbers.
Analogously, more linear operators are diagonalizable in the larger vector space of functions from \(\mathbb{R}\) to \(\mathbb{C}\).
Differentiation works the same way in this space; we’ll still find that its eigenfunctions are exponential.
\[\frac{\partial}{\partial x} e^{(a+bi)x} = (a+bi)e^{(a+bi)x}\]
However, the new eigenfunctions have complex eigenvalues, so we still can’t diagonalize.
We’ll need to consider the still larger space of functions from \(\mathbb{C}\) to \(\mathbb{C}\).
\[\frac{\partial}{\partial x} : (\mathbb{C}\mapsto\mathbb{C}) \mapsto (\mathbb{C}\mapsto\mathbb{C})\]
In this space, differentiation can be diagonalized via the Laplace transform.
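As a brief illustration, a small SymPy check (the specific function below is an arbitrary choice) shows that under the transform, differentiation becomes multiplication by \(s\), up to a boundary term:

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)
f = sp.exp(-2 * t) * sp.sin(3 * t)

F  = sp.laplace_transform(f, t, s, noconds=True)
dF = sp.laplace_transform(sp.diff(f, t), t, s, noconds=True)

# Differentiation becomes multiplication by s, minus the boundary term f(0):
print(sp.simplify(dF - (s * F - f.subs(t, 0))))   # 0
```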
Although useful for solving differential equations, the Laplace transform is non-trivial to invert, so we won’t discuss it further.
In the following sections, we’ll delve into an operator that can be easily diagonalized in \(\mathbb{R}\mapsto\mathbb{C}\): the Laplacian.
Inner Product Spaces
Review: Dot products and duality | Chapter 9, Essence of linear algebra.
Before we get to the spectral theorem, we’ll need to understand one more topic: inner products.
You’re likely already familiar with one example of an inner product—the Euclidean dot product.
\[\begin{bmatrix}x\\ y\\ z\end{bmatrix} \cdot \begin{bmatrix}a\\ b\\ c\end{bmatrix} = ax + by + cz\]
An inner product describes how to measure a vector along another vector.
For example, \(\mathbf{u}\cdot\mathbf{v}\) is proportional to the length of the projection of \(\mathbf{u}\) onto \(\mathbf{v}\).
$$ \mathbf{u} \cdot \mathbf{v} =\|\mathbf{u}\|\|\mathbf{v}\|\cos[\theta] $$
With a bit of trigonometry, we can show that the dot product is equivalent to multiplying the vectors’ lengths with the cosine of their angle.
This relationship suggests that the product of a vector with itself produces the square of its length.
\[\begin{align*} \mathbf{u}\cdot\mathbf{u} &= \|\mathbf{u}\|\|\mathbf{u}\|\cos[0] \\
&= \|\mathbf{u}\|^2
\end{align*}\]
Similarly, when two vectors form a right angle (are orthogonal), their dot product is zero.
$$ \begin{align*} \mathbf{u} \cdot \mathbf{v} &= \|\mathbf{u}\|\|\mathbf{v}\|\cos[90^\circ] \\ &= 0 \end{align*} $$
Of course, the Euclidean dot product is only one example of an inner product.
In more general spaces, the inner product is denoted using angle brackets, such as \(\langle \mathbf{u}, \mathbf{v} \rangle\).
The length (also known as the norm) of a vector is defined as \(\|\mathbf{u}\| = \sqrt{\langle \mathbf{u}, \mathbf{u} \rangle}\).
Two vectors are orthogonal if their inner product is zero: \(\ \mathbf{u} \perp \mathbf{v}\ \iff\ \langle \mathbf{u}, \mathbf{v} \rangle = 0\).
A vector space augmented with an inner product is known as an inner product space.
A Functional Inner Product
We can’t directly apply the Euclidean dot product to our space of real functions, but its \(N\)-dimensional generalization is suggestive.
\[\begin{align*} \mathbf{u} \cdot \mathbf{v} &= u_1v_1 + u_2v_2 + \dots + u_Nv_N \\ &= \sum_{i=1}^N u_iv_i \end{align*}\]
Given countable indices, we simply match up the values, multiply them, and add the results.
When indices are uncountable, we can convert the discrete sum to its continuous analog: an integral!
\[\langle f, g \rangle = \int_a^b f[x]g[x] \, dx\]
When \(f\) and \(g\) are similar, multiplying them produces a larger function; when they’re different, they cancel out.
Integration measures their product over some domain to produce a scalar result.
Of course, not all functions can be integrated.
Our inner product space will only contain functions that are square integrable over the domain \([a, b]\), which may be \((-\infty, \infty)\).
Luckily, the important properties of our inner product do not depend on the choice of integration domain.
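Here is a minimal numerical version of this inner product using SciPy's quad routine; the domain \([-\pi, \pi]\) and the test functions are arbitrary choices. It also previews orthogonality: sine and cosine cancel out over a full period.

```python
import numpy as np
from scipy.integrate import quad

def inner(f, g, a, b):
    """<f, g> = integral of f(x) g(x) over [a, b] for real-valued f, g."""
    value, _ = quad(lambda x: f(x) * g(x), a, b)
    return value

a, b = -np.pi, np.pi
print(inner(np.sin, np.cos, a, b))    # ~0: sin and cos are orthogonal here
print(inner(np.sin, np.sin, a, b))    # ~pi: ||sin||^2 on this domain
```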
Proofs
Below, we’ll briefly cover functions from \(\mathbb{R}\) to \(\mathbb{C}\).
In this space, our intuitive notion of similarity still applies, but we’ll use a slightly more general inner product:
\[\langle f,g \rangle = \int_a^b f[x]\overline{g[x]}\, dx\]
Where \(\overline{x}\) denotes conjugation, i.e. \(\overline{a + bi} = a - bi\).
Like other vector space operations, an inner product must satisfy several axioms:
Conjugate Symmetry
For all vectors $$\mathbf{u}, \mathbf{v} \in \mathcal{V}$$:
$$\langle \mathbf{u}, \mathbf{v} \rangle = \overline{\langle \mathbf{v}, \mathbf{u} \rangle}$$
Conjugation may be taken outside the integral, making this one easy:
$$\begin{align*} \langle f, g \rangle &= \int_a^b f[x]\overline{g[x]} \, dx \\
&= \int_a^b \overline{g[x]\overline{f[x]}} \, dx \\
&= \overline{\int_a^b g[x]\overline{f[x]} \, dx} \\
&= \overline{\langle g, f \rangle}
\end{align*}$$
Note that we require conjugate symmetry because it implies $$\langle\mathbf{u}, \mathbf{u}\rangle = \overline{\langle\mathbf{u}, \mathbf{u}\rangle}$$, i.e. the inner product of a vector with itself is real.
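A quick numerical check of conjugate symmetry for complex-valued functions (the grid and test functions below are arbitrary, and a plain Riemann sum stands in for the integral):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 200001)
dx = x[1] - x[0]

f = np.exp(2j * np.pi * x)            # f[x] = e^{2 pi i x}
g = x * (1.0 + 1j * x)                # g[x] = x + i x^2

fg = np.sum(f * np.conj(g)) * dx      # <f, g>
gf = np.sum(g * np.conj(f)) * dx      # <g, f>

print(np.allclose(fg, np.conj(gf)))   # True: <f, g> equals the conjugate of <g, f>
```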
Linearity in the First Argument
For all vectors $$\mathbf{u}, \mathbf{v}, \mathbf{w} \in \mathcal{V}$$ and scalars $$\alpha, \beta \in \mathbb{F}$$:
$$\langle \alpha \mathbf{u} + \beta \mathbf{v}, \mathbf{w} \rangle = \alpha\langle \mathbf{u}, \mathbf{w} \rangle + \beta\langle \mathbf{v}, \mathbf{w} \rangle $$
The proof follows from linearity of integration, as well as our vector space axioms:
\[\begin{align*} \langle \alpha f + \beta g, h \rangle &= \int_a^b (\alpha f + \beta g)[x]\overline{h[x]} \, dx \\
&= \int_a^b (\alpha f[x] + \beta g[x])\overline{h[x]} \, dx \\
&= \int_a^b \alpha f[x]\overline{h[x]} + \beta g[x]\overline{h[x]} \, dx \\
&= \alpha\int_a^b f[x]\overline{h[x]}\, dx + \beta\int_a^b g[x]\overline{h[x]} \, dx \\
&= \alpha\langle f, h \rangle + \beta\langle g, h \rangle
\end{align*}\]
Given conjugate symmetry, an inner product is also antilinear in the second argument.
Positive-Definiteness
For all $$\mathbf{u} \in \mathcal{V}$$:
$$ \begin{cases} \langle \mathbf{u}, \mathbf{u} \rangle = 0 & \text{if } \mathbf{u} = \mathbf{0} \\ \langle \mathbf{u}, \mathbf{u} \rangle > 0 & \text{otherwise} \end{cases} $$
By conjugate symmetry, we know $$\langle f, f \rangle$$ is real, so we can compare it with zero.
However, rigorously proving this result requires measure-theoretic concepts beyond the scope of this post.
In brief, we redefine $$\mathbf{0}$$ not as specifically $$\mathbf{0}[x] = 0$$, but as the equivalence class of functions that are zero almost everywhere, so that any function satisfying $$\langle f, f \rangle = 0$$ is identified with $$\mathbf{0}$$.