Derivative
This article discusses derivatives as they arise in Neural Networks.
Activation Function
We first give the derivatives of several common activation functions:
Sigmoid & Tanh

- Sigmoid

    \[ \sigma(x) = \frac{1}{1 + e^{-x}} \]

    \[ \sigma'(x) = \sigma(x)\left(1 - \sigma(x)\right) \]

- Tanh

    \[ \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \]

    \[ \tanh'(x) = 1 - \tanh^2(x) \]
ReLU & Leaky ReLU
- ReLU

    \[ f(x) = \max(0, x) \]

    \[ f'(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } x > 0 \\ \text{undefined} & \text{if } x = 0 \end{cases} \]

- Leaky ReLU

    \[ f(x) = \begin{cases} x & \text{if } x > 0 \\ 0.01x & \text{if } x \leq 0 \end{cases} \]

    \[ f'(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0.01 & \text{if } x < 0 \\ \text{undefined} & \text{if } x = 0 \end{cases} \]
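As a sanity check on these closed-form derivatives, here is a minimal sketch assuming NumPy is available; the helper names (`numerical_derivative`, the sample points, and the tolerances) are illustrative choices, not part of the notes above. It compares each analytic derivative against central finite differences at a few points away from the kink at \(x = 0\).

```python
import numpy as np

# Closed-form activations and derivatives, as listed above.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2

def relu(x):
    return np.maximum(0.0, x)

def d_relu(x):
    return (x > 0).astype(float)            # derivative undefined at x = 0

def leaky_relu(x, a=0.01):
    return np.where(x > 0, x, a * x)

def d_leaky_relu(x, a=0.01):
    return np.where(x > 0, 1.0, a)          # derivative undefined at x = 0

def numerical_derivative(f, x, eps=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Sample points chosen away from the kink at x = 0.
x = np.array([-2.0, -0.5, 0.3, 1.7])
pairs = [(sigmoid, d_sigmoid), (np.tanh, d_tanh),
         (relu, d_relu), (leaky_relu, d_leaky_relu)]
for f, df in pairs:
    assert np.allclose(df(x), numerical_derivative(f, x), atol=1e-5)
print("all closed-form derivatives agree with finite differences")
```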
Softmax

Softmax maps \(\mathbb{R}^n \rightarrow \mathbb{R}^n\):

\[ \mathbf{f}_i(\mathbf{x}) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}} \]

\[ \nabla_{x_j} \mathbf{f}_i(\mathbf{x}) = \mathbf{f}_i(\mathbf{x})\left(\delta_{ij} - \mathbf{f}_j(\mathbf{x})\right) \]

Matrix format
Consider \(\mathbf{f}(\mathbf{x}) = \left(f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_n(\mathbf{x})\right)^T\) as a column vector, then
\[ \begin{aligned} \left(\frac {\partial \mathbf{f} }{\partial \mathbf{x}}\right)_{ij} &= \mathbf{f}_i(\mathbf{x})\left(\delta_{ij} - \mathbf{f}_j(\mathbf{x})\right) \\ &= \mathbf{f}_i(\mathbf{x})\delta_{ij} - \mathbf{f}_i(\mathbf{x})\mathbf{f}_j(\mathbf{x}) \end{aligned} \]

The first term is the diagonal matrix \(\operatorname{diag}(\mathbf{f})\) and the second is the outer product \(\mathbf{f}\mathbf{f}^T\), so in matrix form:

\[ \frac {\partial \mathbf{f} }{\partial \mathbf{x}} = \operatorname{diag}(\mathbf{f}) - \mathbf{f}\,\mathbf{f}^T \]

Written out in full, the Jacobian is:
\[ \mathbf{J}(\mathbf{f}) = \begin{pmatrix} f_1(1 - f_1) & -f_1 f_2 & -f_1 f_3 & \cdots & -f_1 f_n \\ -f_2 f_1 & f_2(1 - f_2) & -f_2 f_3 & \cdots & -f_2 f_n \\ -f_3 f_1 & -f_3 f_2 & f_3(1 - f_3) & \cdots & -f_3 f_n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -f_n f_1 & -f_n f_2 & -f_n f_3 & \cdots & f_n(1 - f_n) \end{pmatrix} \]
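The identity \(\partial \mathbf{f} / \partial \mathbf{x} = \operatorname{diag}(\mathbf{f}) - \mathbf{f}\mathbf{f}^T\) can likewise be verified numerically. Below is a minimal sketch assuming NumPy; the helper names and the test vector are illustrative, and the Jacobian is checked column by column against central differences.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability; this does not change the result.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def softmax_jacobian(x):
    """Analytic Jacobian: J_ij = f_i (delta_ij - f_j) = diag(f) - f f^T."""
    f = softmax(x)
    return np.diag(f) - np.outer(f, f)

def numerical_jacobian(func, x, eps=1e-6):
    """Column j holds the central difference of func with respect to x_j."""
    n = x.size
    J = np.zeros((n, n))
    for j in range(n):
        d = np.zeros(n)
        d[j] = eps
        J[:, j] = (func(x + d) - func(x - d)) / (2 * eps)
    return J

x = np.array([0.3, -1.2, 2.0, 0.7])
assert np.allclose(softmax_jacobian(x), numerical_jacobian(softmax, x), atol=1e-6)
print("diag(f) - f f^T matches the numerical Jacobian")
```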
Loss Function
- Mean Squared Error (MSE): \(\mathbb{R}^n \times \mathbb{R}^n \rightarrow \mathbb{R}\)

    \[ L(\mathbf{y}, \mathbf{\hat{y}}) = \frac{1}{n} \left\lVert \mathbf{y} - \mathbf{\hat{y}} \right\rVert^2 \]

    \[ \frac{\partial L}{\partial \hat{y}_i} = \frac 2n \left(\hat{y}_i - y_i\right) \]

    Then:

    \[ \frac{\partial L}{\partial \mathbf{\hat{y}}} = \frac 2n \left(\mathbf{\hat{y}} - \mathbf{y}\right) \]

    (Both loss gradients are checked numerically in the sketch after this list.)
- Cross Entropy: \(\mathbb{R}^n \times \mathbb{R}^n \rightarrow \mathbb{R}\)

    \[ L(\mathbf{y}, \mathbf{\hat{y}}) = -\frac{1}{n} \sum_{i=1}^{n} y_i \log(\hat{y}_i) \]

    \[ \frac{\partial L}{\partial \hat{y}_i} = -\frac{1}{n}\frac{y_i}{\hat{y}_i} \]

    Then:

    \[ \frac{\partial L}{\partial \mathbf{\hat{y}}} = -\frac{1}{n}\frac{\mathbf{y}}{\mathbf{\hat{y}}} \]

    Even in \(\mathbb{R}^3\), vectors under the cross product do not form a group, let alone admit inverses, so there is no genuine notion of dividing one vector by another. The quotient above is purely formal notation for the element-wise division \(y_i / \hat{y}_i\); it must not be read as an actual vector division.
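To close the section, here is a small sketch, again assuming NumPy, that checks both loss gradients against finite differences; the target and prediction vectors, the one-hot choice of \(\mathbf{y}\), and the helper names are illustrative. The cross-entropy quotient is computed element-wise, matching the formal notation above, and both gradients carry the \(1/n\) factor from the definitions.

```python
import numpy as np

y     = np.array([0.0, 1.0, 0.0, 0.0])       # target (one-hot here)
y_hat = np.array([0.10, 0.70, 0.15, 0.05])   # prediction; strictly positive for the log
n = y.size

def mse(y_hat):
    return np.sum((y - y_hat) ** 2) / n

def grad_mse(y_hat):
    return 2.0 / n * (y_hat - y)              # (2/n)(y_hat - y)

def cross_entropy(y_hat):
    return -np.sum(y * np.log(y_hat)) / n

def grad_cross_entropy(y_hat):
    return -(y / y_hat) / n                   # element-wise quotient, scaled by 1/n

def numerical_grad(loss, y_hat, eps=1e-6):
    """Central finite differences of a scalar loss with respect to y_hat."""
    g = np.zeros_like(y_hat)
    for i in range(y_hat.size):
        d = np.zeros_like(y_hat)
        d[i] = eps
        g[i] = (loss(y_hat + d) - loss(y_hat - d)) / (2 * eps)
    return g

assert np.allclose(grad_mse(y_hat), numerical_grad(mse, y_hat), atol=1e-6)
assert np.allclose(grad_cross_entropy(y_hat), numerical_grad(cross_entropy, y_hat), atol=1e-5)
print("MSE and cross-entropy gradients agree with finite differences")
```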