# Linear Regression

$h_\theta(x) = \theta_0 + \theta_1 x_1 + \cdots + \theta_d x_d$

$h_\theta(x) = \sum_{i=0}^d \theta_i x_i = \boldsymbol{\theta}^T \boldsymbol{x}$
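The convention $x_0 = 1$ folds the intercept $\theta_0$ into the dot product. A minimal NumPy check (the parameter and input values here are arbitrary, for illustration only):

```python
import numpy as np

# Hypothetical parameters and input; x_0 = 1 absorbs the intercept theta_0.
theta = np.array([2.0, 0.5, -1.0])  # theta_0, theta_1, theta_2
x = np.array([1.0, 3.0, 4.0])       # x_0 = 1, x_1, x_2

# h_theta(x) = theta_0 + theta_1 * x_1 + theta_2 * x_2 = theta^T x
h_explicit = theta[0] + theta[1] * x[1] + theta[2] * x[2]
h_dot = theta @ x
print(h_explicit, h_dot)  # both equal 2.0 + 1.5 - 4.0 = -0.5
```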

$y^{(i)} = \theta^T x^{(i)} + \epsilon^{(i)}.$

$p\left( \epsilon^{(i)} \right) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left( - \frac{\left( \epsilon^{(i)} \right)^2}{2\sigma ^2} \right)$

- Suppose we have two random variables $X$ and $Y$.
- $X$ and $Y$ are identically distributed if and only if $P[x \ge X] = P[x \ge Y],\ \forall x \in \mathcal{I}$.
- $X$ and $Y$ are independent if and only if $(P[y \geq Y]=P[y \geq Y \mid x \geq X]) \wedge (P[x \geq X]=P[x \geq X \mid y \geq Y]),\ \forall x, y \in \mathcal{I}$.

https://zh.wikipedia.org/wiki/独立同分布

When $\sigma = 1$, the function looks like this:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-10, 10, 100)
sigma = 1
# Gaussian density: exp(-x^2 / (2 sigma^2)) / (sqrt(2 pi) * sigma)
y = np.exp(-x ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
plt.plot(x, y)
plt.savefig("gaussian.svg")
```

![[gaussian.svg]]

$p\left(y^{(i)} \mid x^{(i)} ; \boldsymbol{\theta}\right)=\frac{1}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{\left(y^{(i)}-\boldsymbol{\theta}^T x^{(i)}\right)^2}{2 \sigma^2}\right),$

$L(\theta) = L(\theta; X, \vec{y}) = p(\vec{y} \mid X; \theta).$

$$\begin{aligned} L(\theta) & =\prod_{i=1}^n p\left(y^{(i)} \mid x^{(i)} ; \theta\right) \\ & =\prod_{i=1}^n \frac{1}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{\left(y^{(i)}-\theta^T x^{(i)}\right)^2}{2 \sigma^2}\right) . \end{aligned}$$

$$\begin{aligned} \ell(\theta) & =\log L(\theta) \\ & =\log \prod_{i=1}^n \frac{1}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{\left(y^{(i)}-\theta^T x^{(i)}\right)^2}{2 \sigma^2}\right) \\ & =\sum_{i=1}^n \log \frac{1}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{\left(y^{(i)}-\theta^T x^{(i)}\right)^2}{2 \sigma^2}\right) \\ & =\underbrace{n \log \frac{1}{\sqrt{2 \pi} \sigma}}_\text{constant}-\frac{1}{\sigma^2} \cdot \frac{1}{2} \sum_{i=1}^n\left(y^{(i)}-\theta^T x^{(i)}\right)^2 . \end{aligned}$$

$J(\theta) = \frac{1}{2} \sum_{i=1}^n\left( y^{(i)} - \theta^T x^{(i)} \right)^2.$

$$\begin{aligned} \nabla_\theta J(\theta) & =\nabla_\theta \frac{1}{2}(X \theta-\vec{y})^T(X \theta-\vec{y}) \\ & =\frac{1}{2} \nabla_\theta\left((X \theta)^T X \theta-(X \theta)^T \vec{y}-\vec{y}^T(X \theta)+\vec{y}^T \vec{y}\right) \\ & =\frac{1}{2} \nabla_\theta\left(\theta^T\left(X^T X\right) \theta-\vec{y}^T(X \theta)-\vec{y}^T(X \theta)\right) \\ & =\frac{1}{2} \nabla_\theta\left(\theta^T\left(X^T X\right) \theta-2\left(X^T \vec{y}\right)^T \theta\right) \\ & =\frac{1}{2}\left(2 X^T X \theta-2 X^T \vec{y}\right) \\ & =X^T X \theta-X^T \vec{y} \end{aligned}$$

$\theta = (X^T X)^{-1}X^T\vec{y}.$
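A quick numerical check of the normal equation on synthetic data (the dimensions, noise level, and true $\theta$ below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
# Design matrix with an x_0 = 1 column for the intercept.
X = np.column_stack([np.ones(n), rng.normal(size=(n, d - 1))])
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + rng.normal(scale=0.1, size=n)  # y = theta^T x + epsilon

# Normal equation: theta = (X^T X)^{-1} X^T y
# (np.linalg.solve is preferred over forming the explicit inverse).
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(theta_hat)  # close to theta_true
```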

### Regularized Linear Regression

$\hat{\theta}_\mathit{MAP} = \arg \underset{\theta}{\max}\ p(\theta | S),$

$p(\theta | S) \propto p(S|\theta)p(\theta),$

$$\begin{aligned} \hat{\theta}_\mathit{MAP} &= \arg \underset{\theta}{\max}\ p(S|\theta)p(\theta)\\ &= \arg \underset{\theta}{\max}\ \prod_{i=1}^Np(y^{(i)} | x^{(i)}, \theta)p(\theta). \end{aligned}$$

$\begin{gathered} \hat{\theta}_\mathit{MAP}=\arg \max _\theta Q(\theta)=\arg \max _\theta q(\theta)\\ Q(\theta) \equiv\left({\color{orange} \prod_{i=1}^N }\frac{1}{\sqrt{2 \pi} \sigma} \exp \left(\frac{{ \color{orange} -\left(y^{(i)}-\theta^T x^{(i)}\right)^2}}{2 \sigma^2}\right)\right) \sqrt{\frac{\lambda}{2 \pi}} \frac{1}{\sigma} \exp \left(-\frac{{\color{blue} \lambda \theta^T \theta}}{2 \sigma^2}\right) \\ q(\theta)=\log (Q(\theta))=N \log \frac{1}{\sqrt{2 \pi} \sigma}+\frac{1}{2} \log \frac{\lambda}{2 \pi}-\log \sigma-\frac{1}{2\sigma^2} \left\{\color{orange} {\left[\sum_{i=1}^N\left(y^{(i)}-\theta^T x^{(i)}\right)^2\right]}+{\color{blue} \lambda \theta^T \theta} \right\} \\ \end{gathered}$

$J(\theta) \equiv \left[ \sum_{i=1}^N \left( y^{(i)} - \theta^T x^{(i)} \right)^2\right] + \lambda \theta^T\theta.$

$\hat{\theta}_\mathit{MAP} = \arg \underset{\theta}{\min} \left[ \sum_{i=1}^N \left( y^{(i)} - \theta^T x^{(i)} \right)^2\right] + \lambda \theta^T\theta.$
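This MAP objective with a Gaussian prior is exactly ridge regression, which also has a closed form, $\hat{\theta} = (X^TX + \lambda I)^{-1}X^T\vec{y}$. A minimal sketch (the data and $\lambda$ are arbitrary), illustrating that the penalty shrinks the parameter norm relative to ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 4
X = rng.normal(size=(n, d))
theta_true = np.array([3.0, 0.0, -1.5, 0.5])
y = X @ theta_true + rng.normal(scale=0.5, size=n)

lam = 10.0
# Ridge / Gaussian-prior MAP closed form: (X^T X + lambda I)^{-1} X^T y
theta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
# Unregularized least squares, for comparison.
theta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(np.linalg.norm(theta_ridge), np.linalg.norm(theta_ols))  # ridge norm is smaller
```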

### Linear Regression Regularized with Other Norms

- For other kinds of prior knowledge, what regularization term do they correspond to in linear regression?
- How can a specific regularization term, such as $\ell_1$ regularization, be interpreted probabilistically?

An $\ell_1$ penalty corresponds to a Laplace prior on $\theta$; the Laplace density is

$f(x) = \frac{1}{2\sigma} \exp\left( - \left| \frac{x-\mu}{\sigma} \right|\right),$
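Analogous to the Gaussian plot above, this regenerates the Laplace density figure with $\mu = 0$, $\sigma = 1$:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-10, 10, 100)
mu, sigma = 0, 1
# Laplace density: exp(-|x - mu| / sigma) / (2 * sigma)
y = np.exp(-np.abs((x - mu) / sigma)) / (2 * sigma)
plt.plot(x, y)
plt.savefig("laplace.svg")
```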

![[laplace.svg]]

$$\begin{aligned} \hat{\theta}_\mathit{MAP} &= \arg \underset{\theta}{\max}\ p(S|\theta)p(\theta)\\ &= \arg \underset{\theta}{\max}\ \prod_{i=1}^Np(y^{(i)} | x^{(i)}, \theta)p(\theta)\\ &=\arg \max _\theta Q(\theta) \\ &=\arg \max _\theta q(\theta) \\ \end{aligned}$$
$\begin{gathered} Q(\theta) \equiv\left({\color{orange} \prod_{i=1}^N }\frac{1}{\sqrt{2 \pi} \sigma} \exp \left(\frac{{ \color{orange} -\left(y^{(i)}-\theta^T x^{(i)}\right)^2}}{2 \sigma^2}\right)\right) \frac{1}{2t} \exp \left(- \left|\frac{\theta}{t}\right|\right) \\ q(\theta)=\log (Q(\theta))= N \log \frac{1}{\sqrt{2 \pi} \sigma} -\log\left( 2t \right) -\frac{1}{2\sigma^2} \left\{{\color{orange} \left[\sum_{i=1}^N\left(y^{(i)}-\theta^T x^{(i)}\right)^2\right]}+{\color{blue} \lambda |\theta|} \right\} \\ \end{gathered}$
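Unlike the Gaussian case, the $\ell_1$-penalized objective $\sum_i \left(y^{(i)} - \theta^T x^{(i)}\right)^2 + \lambda|\theta|$ has no closed-form minimizer because the penalty is not differentiable at zero. A minimal proximal-gradient (ISTA) sketch, with the step size, $\lambda$, and data all chosen arbitrarily for illustration:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize ||y - X theta||^2 + lam * ||theta||_1 by proximal gradient (ISTA)."""
    lr = 1.0 / (2 * np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = 2 * X.T @ (X @ theta - y)  # gradient of the squared loss
        theta = soft_threshold(theta - lr * grad, lr * lam)
    return theta

rng = np.random.default_rng(2)
X = rng.normal(size=(80, 5))
theta_true = np.array([4.0, 0.0, 0.0, -2.0, 0.0])  # sparse ground truth
y = X @ theta_true + rng.normal(scale=0.1, size=80)
theta_l1 = lasso_ista(X, y, lam=5.0)
print(theta_l1)  # nonzero mainly in the positions where theta_true is nonzero
```

The soft-thresholding step is what drives small coefficients exactly to zero, which is the sparsity effect the Laplace prior encodes.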

1. I really dislike theorems, constants, and data structures named after people. But since everyone uses those names, I go along with them; after all, communicating with others matters more than pleasing myself.