奇异值分解是数学矩阵分析的一种方法，能够将任意形状的（不仅包含方阵）矩阵分解为正交阵和（近似）对角阵的乘积的形式。该方法在很多领域都有应用，如图像压缩、推荐系统等。本篇简要介绍。

理论

假设 $A_{m\times n}$ 为一个 $m\times n$ 实矩阵，则 $A^TA$ 为一个 $n$ 阶实对称矩阵，所以存在正交矩阵 $V=(v_1, v_2, \cdots, v_n)$ 使得

$$ V^T(A^TA)V = \text{diag}(\lambda_1, \lambda_2, \cdots, \lambda_n), $$

其中 $v_i$ 为 $A^TA$ 的对应于特征值 $\lambda_i$ 的单位特征向量，有

$$ \lambda_i = \lambda_i v^T_i v_i = v^T_i(\lambda_i v_i) = v^T_i(A^T Av_i) = (Av_i)^T(Av_i) = \parallel Av_i \parallel^2 \geq 0. $$

对特征值从大到小排序，$\lambda_1 \geq \lambda_i \geq \cdots \geq \lambda_r > 0, \lambda_{r+1} = \lambda_{r+2} = \cdots = \lambda_n = 0$，其中，

$$ r = r(A^TA) = r(A), \\ \parallel Av_i \parallel = \sqrt{\lambda_i}, i=1,2,\cdots,r, \\ Av_{r+1} = Av_{r+2} = \cdots = Av_n = 0. $$

下面说明对于不同的 $i \neq j$，有：

$$ (Av_i)^T(Av_j) = v^T_i(A^TAv_j) = v^T_i(\lambda_j v_j) = \lambda_j(v^T_i v_j) = 0. $$

所以，$Av_1, Av_2, \cdots, Av_r$ 是两两正交的非零向量。令

$$ u_i = \frac{Av_i}{\sqrt{\lambda_i}}, i=1,2,\cdots,r, $$

那么 $u_1, u_2, \cdots, u_r$ 是 $\mathbb{R}^m$ 上的标准正交向量组。因为 $r$ 是矩阵 $A$ 的秩，所以有 $r \leq m$。扩充

$$ u_1, u_2, \cdots, u_r, u_{r+1}, \cdots, u_m, $$

为 $\mathbb{R}^m$ 的标注正交基。因此，

$$ AV = (Av_1, \cdots, Av_r, Av_{r+1}, \cdots, Av_n) = (\sqrt{\lambda_1}u1, \cdots, \sqrt{\lambda_r}u_r, 0, \cdots, 0) $$ $$ = (u_1, \cdots, u_r, u_{r+1}, \cdots, u_m) \begin{bmatrix} \sqrt{\lambda_1} & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & \sqrt{\lambda_r} & 0 \\ 0 & \cdots & 0 & O \end{bmatrix} = U\Sigma, $$ $$ A = U\Sigma V^T. $$

SVD 图像压缩

使用 SVD 进行图像压缩。对彩色 RGB 图像每个通道的矩阵进行奇异值分解，提取主特征近似原图，得到近似图像。近似图保留了原图的主要特征，能够有效降低噪声，压缩图像大小。

ENV 环境配置

如下命令在终端中执行(以 Ubuntu 18.04 为例)：

# install miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod +x Miniconda3-latest-Linux-x86_64.sh
/bin/bash Miniconda3-latest-Linux-x86_64.sh -b -p ~/miniconda
conda init
source ~/.bashrc

# install conda env maths-ai
conda create -n maths-ai python=3.11 -y
conda activate maths-ai

# install common packages
pip install ipykernel ipywidgets
python -m ipykernel install --user --name maths-ai --display-name maths-ai
pip install opencv-python pillow numpy matplotlib scipy sympy
pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu118

安装完成后，打开 jupyter 继续执行如下命令

加载包

import os

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

计算特征

1
2
3

img_path = (
    "/disk0/documents/personal/报告/2024/20240420_maths-for-ai/archive/SVD/sd.jpeg"
)

1
2
3

# 查看当前图像大小，单位为字节（B）
original_size = os.stat(img_path).st_size
original_size

1 2	# 读取图像，分别获取每一个通道 img = Image.open(img_path)

1	r, g, b = img.split()

1	r.size, g.size, b.size

((960, 1440), (960, 1440), (960, 1440))

1
2
3

r_arr = np.asarray(r)
g_arr = np.asarray(g)
b_arr = np.asarray(b)

fig, ax = plt.subplots(1, 3)
ax[0].imshow(r_arr, cmap="gray")
ax[0].set_title("red channel")
ax[1].imshow(g_arr, cmap="gray")
ax[1].set_title("green channel")
ax[2].imshow(b_arr, cmap="gray")
ax[2].set_title("blue channel")

ax[0].xaxis.set_major_locator(plt.NullLocator())
ax[0].yaxis.set_major_locator(plt.NullLocator())
ax[1].xaxis.set_major_locator(plt.NullLocator())
ax[1].yaxis.set_major_locator(plt.NullLocator())
ax[2].xaxis.set_major_locator(plt.NullLocator())
ax[2].yaxis.set_major_locator(plt.NullLocator())
plt.show()

png

1 2	# 分别对 R,G,B 三通道灰度图进行奇异值分解 r_u, r_sigma, r_vt = np.linalg.svd(r_arr)

1	len(r_sigma)

1 2	# 通过下面验证，奇异值是从大到小已经排序好的 np.sum(np.sort(r_sigma)[::-1] == r_sigma) == len(r_sigma)

True

# 选取前10个奇异值
num1 = 10
r_svd_1 = np.zeros_like(r_arr)
for i in range(num1):
    # u 矩阵的第i列，vt 矩阵的第i行
    svd_img = r_sigma[i] * (r_u[:, i].reshape(-1, 1) * r_vt[i, :].reshape(1, -1))
    r_svd_1 += svd_img.astype(np.uint8)

fig, ax = plt.subplots(1, 2)
ax[0].imshow(r_arr, cmap="gray")
ax[1].imshow(r_svd_1, cmap="gray")
ax[0].xaxis.set_major_locator(plt.NullLocator())
ax[1].xaxis.set_major_locator(plt.NullLocator())
ax[0].yaxis.set_major_locator(plt.NullLocator())
ax[1].yaxis.set_major_locator(plt.NullLocator())
ax[0].set_title(f"red, original pic")
ax[1].set_title(f"red, top {num1} SVD")
plt.show()

png

# 选取前100个奇异值
num2 = 100
r_svd_2 = np.zeros_like(r_arr)
for i in range(num2):
    # u 矩阵的第i列，vt 矩阵的第i行
    svd_img = r_sigma[i] * (r_u[:, i].reshape(-1, 1) * r_vt[i, :].reshape(1, -1))
    r_svd_2 += svd_img.astype(np.uint8)

fig, ax = plt.subplots(1, 2)
ax[0].imshow(r_arr, cmap="gray")
ax[1].imshow(r_svd_2, cmap="gray")
ax[0].xaxis.set_major_locator(plt.NullLocator())
ax[1].xaxis.set_major_locator(plt.NullLocator())
ax[0].yaxis.set_major_locator(plt.NullLocator())
ax[1].yaxis.set_major_locator(plt.NullLocator())
ax[0].set_title(f"red, original pic")
ax[1].set_title(f"red, top {num2} SVD")
plt.show()

png

# 选取前500个奇异值
num3 = 500
r_svd_3 = np.zeros_like(r_arr)
for i in range(num3):
    # u 矩阵的第i列，vt 矩阵的第i行
    svd_img = r_sigma[i] * (r_u[:, i].reshape(-1, 1) * r_vt[i, :].reshape(1, -1))
    r_svd_3 += svd_img.astype(np.uint8)

fig, ax = plt.subplots(1, 2)
ax[0].imshow(r_arr, cmap="gray")
ax[1].imshow(r_svd_3, cmap="gray")
ax[0].xaxis.set_major_locator(plt.NullLocator())
ax[1].xaxis.set_major_locator(plt.NullLocator())
ax[0].yaxis.set_major_locator(plt.NullLocator())
ax[1].yaxis.set_major_locator(plt.NullLocator())
ax[0].set_title(f"red, original pic")
ax[1].set_title(f"red, top {num3} SVD")
plt.show()

png

1 2	# 分别对 R,G,B 三通道灰度图进行奇异值分解 g_u, g_sigma, g_vt = np.linalg.svd(g_arr)

1 2	# 通过下面验证，奇异值是从大到小已经排序好的 np.sum(np.sort(g_sigma)[::-1] == g_sigma) == len(g_sigma)

True

# 选取前10个奇异值
num1 = 10
g_svd_1 = np.zeros_like(g_arr)
for i in range(num1):
    # u 矩阵的第i列，vt 矩阵的第i行
    svd_img = g_sigma[i] * (g_u[:, i].reshape(-1, 1) * g_vt[i, :].reshape(1, -1))
    g_svd_1 += svd_img.astype(np.uint8)

fig, ax = plt.subplots(1, 2)
ax[0].imshow(g_arr, cmap="gray")
ax[1].imshow(g_svd_1, cmap="gray")
ax[0].xaxis.set_major_locator(plt.NullLocator())
ax[1].xaxis.set_major_locator(plt.NullLocator())
ax[0].yaxis.set_major_locator(plt.NullLocator())
ax[1].yaxis.set_major_locator(plt.NullLocator())
ax[0].set_title(f"green, original pic")
ax[1].set_title(f"green, top {num1} SVD")
plt.show()

png

# 选取前100个奇异值
num2 = 100
g_svd_2 = np.zeros_like(g_arr)
for i in range(num2):
    # u 矩阵的第i列，vt 矩阵的第i行
    svd_img = g_sigma[i] * (g_u[:, i].reshape(-1, 1) * g_vt[i, :].reshape(1, -1))
    g_svd_2 += svd_img.astype(np.uint8)

fig, ax = plt.subplots(1, 2)
ax[0].imshow(g_arr, cmap="gray")
ax[1].imshow(g_svd_2, cmap="gray")
ax[0].xaxis.set_major_locator(plt.NullLocator())
ax[1].xaxis.set_major_locator(plt.NullLocator())
ax[0].yaxis.set_major_locator(plt.NullLocator())
ax[1].yaxis.set_major_locator(plt.NullLocator())
ax[0].set_title(f"green, original pic")
ax[1].set_title(f"green, top {num2} SVD")
plt.show()

png

# 选取前500个奇异值
num3 = 500
g_svd_3 = np.zeros_like(g_arr)
for i in range(num3):
    # u 矩阵的第i列，vt 矩阵的第i行
    svd_img = g_sigma[i] * (g_u[:, i].reshape(-1, 1) * g_vt[i, :].reshape(1, -1))
    g_svd_3 += svd_img.astype(np.uint8)

fig, ax = plt.subplots(1, 2)
ax[0].imshow(g_arr, cmap="gray")
ax[1].imshow(g_svd_3, cmap="gray")
ax[0].xaxis.set_major_locator(plt.NullLocator())
ax[1].xaxis.set_major_locator(plt.NullLocator())
ax[0].yaxis.set_major_locator(plt.NullLocator())
ax[1].yaxis.set_major_locator(plt.NullLocator())
ax[0].set_title(f"green, original pic")
ax[1].set_title(f"green, top {num3} SVD")
plt.show()

png

1 2	# 分别对 R,G,B 三通道灰度图进行奇异值分解 b_u, b_sigma, b_vt = np.linalg.svd(b_arr)

1 2	# 通过下面验证，奇异值是从大到小已经排序好的 np.sum(np.sort(b_sigma)[::-1] == b_sigma) == len(b_sigma)

True

# 选取前10个奇异值
num1 = 10
b_svd_1 = np.zeros_like(b_arr)
for i in range(num1):
    # u 矩阵的第i列，vt 矩阵的第i行
    svd_img = b_sigma[i] * (b_u[:, i].reshape(-1, 1) * b_vt[i, :].reshape(1, -1))
    b_svd_1 += svd_img.astype(np.uint8)

fig, ax = plt.subplots(1, 2)
ax[0].imshow(b_arr, cmap="gray")
ax[1].imshow(b_svd_1, cmap="gray")
ax[0].xaxis.set_major_locator(plt.NullLocator())
ax[1].xaxis.set_major_locator(plt.NullLocator())
ax[0].yaxis.set_major_locator(plt.NullLocator())
ax[1].yaxis.set_major_locator(plt.NullLocator())
ax[0].set_title(f"blue, original pic")
ax[1].set_title(f"blue, top {num1} SVD")
plt.show()

png

# 选取前100个奇异值
num2 = 100
b_svd_2 = np.zeros_like(b_arr)
for i in range(num2):
    # u 矩阵的第i列，vt 矩阵的第i行
    svd_img = b_sigma[i] * (b_u[:, i].reshape(-1, 1) * b_vt[i, :].reshape(1, -1))
    b_svd_2 += svd_img.astype(np.uint8)

fig, ax = plt.subplots(1, 2)
ax[0].imshow(b_arr, cmap="gray")
ax[1].imshow(b_svd_2, cmap="gray")
ax[0].xaxis.set_major_locator(plt.NullLocator())
ax[1].xaxis.set_major_locator(plt.NullLocator())
ax[0].yaxis.set_major_locator(plt.NullLocator())
ax[1].yaxis.set_major_locator(plt.NullLocator())
ax[0].set_title(f"blue, original pic")
ax[1].set_title(f"blue, top {num2} SVD")
plt.show()

png

# 选取前500个奇异值
num3 = 500
b_svd_3 = np.zeros_like(b_arr)
for i in range(num3):
    # u 矩阵的第i列，vt 矩阵的第i行
    svd_img = b_sigma[i] * (b_u[:, i].reshape(-1, 1) * b_vt[i, :].reshape(1, -1))
    b_svd_3 += svd_img.astype(np.uint8)

fig, ax = plt.subplots(1, 2)
ax[0].imshow(b_arr, cmap="gray")
ax[1].imshow(b_svd_3, cmap="gray")
ax[0].xaxis.set_major_locator(plt.NullLocator())
ax[1].xaxis.set_major_locator(plt.NullLocator())
ax[0].yaxis.set_major_locator(plt.NullLocator())
ax[1].yaxis.set_major_locator(plt.NullLocator())
ax[0].set_title(f"blue, original pic")
ax[1].set_title(f"blue, top {num3} SVD")
plt.show()

png

合并图像

# 合并 10 个奇异值的图像
r_svd_1_pil = Image.fromarray(r_svd_1).convert("L")
g_svd_1_pil = Image.fromarray(g_svd_1).convert("L")
b_svd_1_pil = Image.fromarray(b_svd_1).convert("L")
svd_1 = Image.merge("RGB", [r_svd_1_pil, g_svd_1_pil, b_svd_1_pil])

svd_1_path = (
    "/disk0/documents/personal/报告/2024/20240420_maths-for-ai/archive/SVD/svd_1.jpg"
)
svd_1.save(svd_1_path)

1 2	svd_1_size = os.stat(svd_1_path).st_size svd_1_size, original_size

(77855, 248908)

1	original_size - svd_1_size

可见，图像大小变小了很多，减小了大约 17KB

# 看下图像质量
fig, ax = plt.subplots(1, 2)
ax[0].imshow(np.asarray(img))
ax[1].imshow(np.asarray(svd_1))
ax[0].xaxis.set_major_locator(plt.NullLocator())
ax[0].yaxis.set_major_locator(plt.NullLocator())
ax[1].xaxis.set_major_locator(plt.NullLocator())
ax[1].yaxis.set_major_locator(plt.NullLocator())
ax[0].set_title("Originial Pic")
ax[1].set_title(f"SVD {num1}")
plt.show()

png

图像质量相对原图较为模糊

# 合并 100 个奇异值的图像
r_svd_2_pil = Image.fromarray(r_svd_2).convert("L")
g_svd_2_pil = Image.fromarray(g_svd_2).convert("L")
b_svd_2_pil = Image.fromarray(b_svd_2).convert("L")
svd_2 = Image.merge("RGB", [r_svd_2_pil, g_svd_2_pil, b_svd_2_pil])

svd_2_path = (
    "/disk0/documents/personal/报告/2024/20240420_maths-for-ai/archive/SVD/svd_2.jpg"
)
svd_2.save(svd_2_path)

1 2	svd_2_size = os.stat(svd_2_path).st_size svd_2_size, original_size

(114734, 248908)

1	original_size - svd_2_size

图像大小减少了接近 13.4KB

# 看下图像质量
fig, ax = plt.subplots(1, 2)
ax[0].imshow(np.asarray(img))
ax[1].imshow(np.asarray(svd_2))
ax[0].xaxis.set_major_locator(plt.NullLocator())
ax[0].yaxis.set_major_locator(plt.NullLocator())
ax[1].xaxis.set_major_locator(plt.NullLocator())
ax[1].yaxis.set_major_locator(plt.NullLocator())
ax[0].set_title("Originial Pic")
ax[1].set_title(f"SVD {num2}")
plt.show()

png

图像质量已经和原图差别不大

# 合并 500 个奇异值的图像
r_svd_3_pil = Image.fromarray(r_svd_3).convert("L")
g_svd_3_pil = Image.fromarray(g_svd_3).convert("L")
b_svd_3_pil = Image.fromarray(b_svd_3).convert("L")
svd_3 = Image.merge("RGB", [r_svd_3_pil, g_svd_3_pil, b_svd_3_pil])

svd_3_path = (
    "/disk0/documents/personal/报告/2024/20240420_maths-for-ai/archive/SVD/svd_3.jpg"
)
svd_3.save(svd_3_path)

1 2	svd_3_size = os.stat(svd_3_path).st_size svd_3_size, original_size

(117333, 248908)

1	original_size - svd_3_size

图像大小减少了接近 13.1KB

# 看下图像质量
fig, ax = plt.subplots(1, 2)
ax[0].imshow(np.asarray(img))
ax[1].imshow(np.asarray(svd_3))
ax[0].xaxis.set_major_locator(plt.NullLocator())
ax[0].yaxis.set_major_locator(plt.NullLocator())
ax[1].xaxis.set_major_locator(plt.NullLocator())
ax[1].yaxis.set_major_locator(plt.NullLocator())
ax[0].set_title("Originial Pic")
ax[1].set_title(f"SVD {num3}")
plt.show()

png

图像质量已经和原图差别更小了

SVD 推荐系统

假设 $A_{m\times n}$ 为一个数字矩阵，每一列代表一个产品或电影，每一行代表一个用户对不同列（产品或电影）的评分。那么当某一行 $k$ 的用户对某一列 $j$ 未打分，我们想要通过这个矩阵找到与用户 $k$ 最接近的用户对电影 $j$ 的打分情况，来绝对是否为用户 $k$ 推荐电影 $j$。

解决思路：将矩阵 $A$ 进行奇异值分解，得到

$$ A_{m\times n} = U_{m\times m}\Sigma_{m \times n} V^T_{n \times n} = (u_1, u_2, \cdots, u_m) \begin{bmatrix} \lambda_i & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & \lambda_r & 0 \\ 0 & \cdots & 0 & O \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} $$

假设我们选择主奇异值 $\lambda_1, \lambda_2$（这里为了简单，只选择2个，实际使用时根据情况选择）构成主奇异值矩阵

$ \Sigma^2 = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} $

，那么矩阵

$$ \hat{U} = (u_1, u_2) = \begin{bmatrix} u^1_1 & u^2_1 \\ u^1_2 & u^2_2 \\ \vdots & \vdots \\ u^1_m & u^2_m \end{bmatrix} $$

的每一行可以看做每一个用户的特征。而矩阵

$$ \hat{V} = \begin{bmatrix} v^1 \\ v^2 \end{bmatrix}= \begin{bmatrix} v^1_1 & v^1_2 & \cdots & v^1_n \\ v^2_1 & v^2_2 & \cdots & v^2_n \end{bmatrix} $$

的每一列可以看做产品或电影的特征。

推荐时，只需要计算 $\hat{U}$ 中第 $k$ 行的向量 $(u^1_k, u^2_k)$ 与其他行向量最相似（相似度可以用余弦、欧氏距离等）的用户，根据这个用户的打分来决定是否推荐给用户 $k$ 电影 $j$。

对于新用户（未在训练数据集内的用户），目前打分向量假设为 $a = (s_1, s_2, \cdots, s_n)$，那么只需要计算

$$ u = s\cdot \hat{V} \cdot \Sigma^2 $$

得到其在 $\hat{U}$ 空间中的投影特征，然后在分别于 $\hat{U}$ 中的每一行向量计算相似度，找到最相似的用户，根据这个用户的偏好对其推荐。

使用上面公式，是因为

$$ A_{m\times n} = U_{m\times m}\Sigma_{m \times n} V^T_{n \times n} \\\\ \\\\ \begin{bmatrix} A_1 \\ A_2 \\ \vdots \\ A_m \end{bmatrix} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_m \end{bmatrix} \Sigma_{m\times n} V^T_{n \times n} \\\\ \\\\ A_i = u_i \Sigma_{m\times n} V^T_{n\times n} \\\\ \\\\ u_i = A_i \cdot (V^T)^{-1}\Sigma^{-1} = A_i V\Sigma^{-1} $$