虽然吴恩达机器学习课整个课堂都是在推公式、时间很紧张、有难度，但是这个Coursera Deep Learning(Ng, Katanforoosh, and Mourri 2018)的课，真的是通俗易懂，三张图，就大概明白最基本的RNNs原理，再在基础上深化。

normalization，图示

library(tidyverse)

## Warning: 程辑包'tidyverse'是用R版本3.6.3 来建造的

## -- Attaching packages --------------------------------------------------------------------------- tidyverse 1.3.0 --

## √ ggplot2 3.3.2     √ purrr   0.3.4
## √ tibble  3.0.3     √ dplyr   1.0.2
## √ tidyr   1.1.2     √ stringr 1.4.0
## √ readr   1.3.1     √ forcats 0.5.0

## Warning: 程辑包'ggplot2'是用R版本3.6.3 来建造的

## Warning: 程辑包'tibble'是用R版本3.6.3 来建造的

## Warning: 程辑包'tidyr'是用R版本3.6.3 来建造的

## Warning: 程辑包'readr'是用R版本3.6.3 来建造的

## Warning: 程辑包'purrr'是用R版本3.6.3 来建造的

## Warning: 程辑包'dplyr'是用R版本3.6.3 来建造的

## Warning: 程辑包'stringr'是用R版本3.6.3 来建造的

## Warning: 程辑包'forcats'是用R版本3.6.3 来建造的

## -- Conflicts ------------------------------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

data_frame(x = rnorm(100,mean=2,sd=5),
           y = 1:100) %>% 
  mutate(x_bar = x - mean(x),
         x_under_sigma = x_bar/sd(x)) %>% 
  gather(key,value,-y) %>% 
  ggplot(aes(x = y, y = value)) +
  geom_point() +
  facet_wrap(~key, scales = "free_y")

## Warning: `data_frame()` is deprecated as of tibble 1.1.0.
## Please use `tibble()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.

basic RNNs

这里解释了CNNs模型的原理，横轴表示时间推进，每次时间推进，产生一个参数\(a^i\)， \(a^i\)的产生来自于\(a^i \sim a^{i-1} + x^{i}\)，同时\(y^i \sim a^{i-1} + x^{i}\)。因此图中产生三种变量\(y,x,a\)，分别对应参数，\(w_y,w_x,w_a\)。

这个图更清晰的描述了关系。因此CNNs，考虑时间的递进关系，因此可以做时间序列的分析。

这里主要涉及矩阵相乘的内容，简单理解是，假设\(x\)变量为10000个，时间是100，那么产生了100个\(a\)，那么一共有变量10000+100=10100个，同时\(N(w_x)+N(w_a) = 10000+100=10100\)。

vanishing gradients

因为当backpropogation时，越久远的\(w\)会变小，结果消失，导致我们不能够抓住long range dependency。当然也可以使用exploding gradients，但是越久远的\(w\)会变大，最好设置一个clipping，也就是最大值。

GRU

gated recurrent gru

Cho et al. (2014),Chung et al. (2014),Ng, Katanforoosh, and Mourri (2018) 增加了一个\(\Gamma\)来控制短期和长期的效应，

\[\Gamma = \begin{cases} 1 &\text{ long term;}\\ 1 \times 10^{-16} &\text{ short term}\\ \end{cases}\]

LTSM

GRU是LTSM的简化版本，Hochreiter and Schmidhuber (1997) LTSM增加了一个\(\Gamma_f\)forget gate而已，大同小异。

deep RNNs

__deep__的理解就是多加几层。

参考文献

Cho, Kyunghyun, B van Merrienboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. “On the Properties of Neural Machine Translation: Encoder-Decoder Approaches.” In Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation (Ssst-8), 2014.

Chung, Junyoung, Caglar Gulcehre, Kyung Hyun Cho, and Yoshua Bengio. 2014. “Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling.” Eprint Arxiv.

Hochreiter, Sepp, and Jürgen Schmidhuber. 1997. “Long Short-Term Memory.” Neural Computation 9 (8): 1735–80.

Ng, Andrew, Kian Katanforoosh, and Younes Bensouda Mourri. 2018. “Https://Www.coursera.org/Learn/Neural-Networks-Deep-Learning.” deeplearning.ai.

CNNs 理解