
Deep Learning in Python: Study Notes

New additions

  • Plots and formulas for the activation functions

Deep Learning in Python

The course site is here: Dan Becker | DataCamp, that's the guy teaching it.

It's just neural networks.

Interactions are the whole point of neural networks.

Haha, there are two sections here just on concepts. Nice.

So keras is that rigid package??? Let's get it going.

  • input layer
  • output layer: the number of transactions. Pretty blunt.
  • hidden layer

The hidden layers are what give the ability to capture interactions.

Forward propagation | Python

Finally understand how a neural network actually computes.

import numpy as np to start.

input_data = np.array([2, 3]) is the input layer.

weights = {'node_0': np.array([1, 1]),
           'node_1': np.array([-1, 1]),
           'output': np.array([2, -1])}
In [4]: node_0_value = (input_data * weights['node_0']).sum()
In [5]: node_1_value = (input_data * weights['node_1']).sum()

The hidden layer:

hidden_layer_values = np.array([node_0_value, node_1_value])

In [7]: print(hidden_layer_values)
[5 1]
In [8]: output = (hidden_layer_values * weights['output']).sum()
In [9]: print(output)
9

And the output layer.

Coding the forward propagation algorithm | Python

This whole part is just numpy array arithmetic.

# Calculate node 0 value: node_0_value
node_0_value = (input_data * weights['node_0']).sum()

\[input\_data_{1 \times 2} \times weight(node\_0)_{2 \times 1} = value_{1 \times 1}\]

\(input\_data_{1 \times 2}\) is one user (one row) with two features. \(weight(node\_0)_{2 \times 1}\) has one weight per feature, stacked in a column. So the resulting \(value_{1 \times 1}\) has these dimensions.

# Calculate node 0 value: node_0_value
node_0_value = (input_data * weights['node_0']).sum()

# Calculate node 1 value: node_1_value
node_1_value = (input_data * weights['node_1']).sum()

# Put node values into array: hidden_layer_outputs
hidden_layer_outputs = np.array([node_0_value, node_1_value])

# Calculate output: output
output = (hidden_layer_outputs * weights['output']).sum()

# Print output
print(output)

Activation functions | Python

Rizwan (2018) 给出了三种比较常用的情况。

So what we were using before was the tanh function? The np.tanh function.

ReLU = rectified1 linear activation

\[relu(3) = 3\]

\[relu(-3) = 0\]

\[ReLU(x) = \begin{cases} 0, & \mbox{if } x < 0 \\ x, & \mbox{if } x \geq 0 \end{cases}\]

\(\Box\) I don't get it: why does the final printed value have a decimal point?
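A minimal sketch (mine, not from the course) of the same forward pass but with np.tanh as the activation; tanh returns floats in (-1, 1), which is presumably why the printed value has decimals:

import numpy as np

input_data = np.array([2, 3])
weights = {'node_0': np.array([1, 1]),
           'node_1': np.array([-1, 1]),
           'output': np.array([2, -1])}

# Same forward pass as before, but apply tanh at the hidden layer
node_0_output = np.tanh((input_data * weights['node_0']).sum())
node_1_output = np.tanh((input_data * weights['node_1']).sum())
hidden_layer_outputs = np.array([node_0_output, node_1_output])

output = (hidden_layer_outputs * weights['output']).sum()
print(output)  # roughly 1.238, a float instead of the integer 9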

The Rectified Linear Activation Function | Python

def relu(input):
    '''Define your relu activation function here'''
    # Calculate the value for the output of the relu function: output
    output = max(input, 0)
    
    # Return the value just calculated
    return(output)

# Calculate node 0 value: node_0_output
node_0_input = (input_data * weights['node_0']).sum()
node_0_output = relu(node_0_input)

# Calculate node 1 value: node_1_output
node_1_input = (input_data * weights['node_1']).sum()
node_1_output = relu(node_1_input)

# Put node values into array: hidden_layer_outputs
hidden_layer_outputs = np.array([node_0_output, node_1_output])

# Calculate model output (do not apply relu)
model_output = (hidden_layer_outputs * weights['output']).sum()

# Print model output
print(model_output)

def relu(input): define the function yourself; then it's just applying it on the way from the input layer to the hidden layer, and everything else stays the same.

Great work! You predicted 52 transactions. Without this activation function, you would have predicted a negative number! The real power of activation functions will come soon when you start tuning model weights.

This still feels a bit crude; let's wait and see, they say it gets more powerful later.

Applying the network to many observations/rows of data | Python

In [1]: weights
Out[1]: {'node_0': array([2, 4]), 'node_1': array([ 4, -5]), 'output': array([2, 7])}

Good to know. It's a dict.

# Define predict_with_network()
def predict_with_network(input_data_row, weights):

    # Calculate node 0 value
    node_0_input = (input_data_row * weights['node_0']).sum()
    node_0_output = relu(node_0_input)

    # Calculate node 1 value
    node_1_input = (input_data_row * weights['node_1']).sum()
    node_1_output = relu(node_1_input)

    # Put node values into array: hidden_layer_outputs
    hidden_layer_outputs = np.array([node_0_output, node_1_output])
    
    # Calculate model output
    input_to_final_layer = (hidden_layer_outputs * weights['output']).sum()
    model_output = relu(input_to_final_layer)
    
    # Return model output
    return(model_output)


# Create empty list to store prediction results
results = []
for input_data_row in input_data:
    # Append prediction to results
    results.append(predict_with_network(input_data_row, weights))

# Print results
print(results)

Deeper networks | Python

Just add more hidden layers.

Even though the whole point is interactions, you never actually specify them, which is clever; the network finds the most important patterns on its own.

Multi-layer neural networks | Python

def predict_with_network(input_data):
    # Calculate node 0 in the first hidden layer
    node_0_0_input = (input_data * weights['node_0_0']).sum()
    node_0_0_output = relu(node_0_0_input)

    # Calculate node 1 in the first hidden layer
    node_0_1_input = (input_data * weights['node_0_1']).sum()
    node_0_1_output = relu(node_0_1_input)

    # Put node values into array: hidden_0_outputs
    hidden_0_outputs = np.array([node_0_0_output, node_0_1_output])
    
    # Calculate node 0 in the second hidden layer
    node_1_0_input = (hidden_0_outputs * weights['node_1_0']).sum()
    node_1_0_output = relu(node_1_0_input)

    # Calculate node 1 in the second hidden layer
    node_1_1_input = (hidden_0_outputs * weights['node_1_1']).sum()
    node_1_1_output = relu(node_1_1_input)

    # Put node values into array: hidden_1_outputs
    hidden_1_outputs = np.array([node_1_0_output, node_1_1_output])

    # Calculate model output: model_output
    model_output = (hidden_1_outputs * weights['output']).sum()
    
    # Return model_output
    return(model_output)

output = predict_with_network(input_data)
print(output)

Do not apply the relu() function to the final output.

The need for optimization | Python

In a neural network, picking the weights going into the last layer is also done with a loss function:

\[L = L(weight1, weight2)\]

And it's also solved with gradient descent.

Gradient descent.

Think of the slope of a surface in 3D: when the slope goes uphill in every direction, you're probably at a (local) minimum. Check the partial derivatives; if they're not 0, or not close to 0, keep going.
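A toy sketch (mine, not from the course): gradient descent on f(w) = w², whose slope is 2w; each step subtracts learning_rate * slope.

w = 4.0                      # start somewhere away from the minimum at 0
learning_rate = 0.1
for i in range(20):
    slope = 2 * w            # derivative of w**2
    w = w - learning_rate * slope
print(w)                     # close to 0, the minimum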

Coding how weight changes affect accuracy | Python

# The data point you will make a prediction for
input_data = np.array([0, 3])

# Sample weights
weights_0 = {'node_0': [2, 1],
             'node_1': [1, 2],
             'output': [1, 1]
            }

# The actual target value, used to calculate the error
target_actual = 3

# Make prediction using original weights
model_output_0 = predict_with_network(input_data, weights_0)

# Calculate error: error_0
error_0 = model_output_0 - target_actual

# Create weights that cause the network to make perfect prediction (3): weights_1
weights_1 = {'node_0': [2, 1],
             'node_1': [1, 2],
             'output': [1, 0]
            }
# Make prediction using new weights: model_output_1
model_output_1 = predict_with_network(input_data, weights_1)

# Calculate error: error_1
error_1 = model_output_1 - target_actual

# Print error_0 and error_1
print(error_0)
print(error_1)

Scaling up to multiple data points | Python

from sklearn.metrics import mean_squared_error

# Create model_output_0 
model_output_0 = []
# Create model_output_0
model_output_1 = []

# Loop over input_data
for row in input_data:
    # Append prediction to model_output_0
    model_output_0.append(predict_with_network(row,weights_0))
    
    # Append prediction to model_output_1
    model_output_1.append(predict_with_network(row,weights_1))

# Calculate the mean squared error for model_output_0: mse_0
mse_0 = mean_squared_error(target_actuals, model_output_0)

# Calculate the mean squared error for model_output_1: mse_1
mse_1 = mean_squared_error(target_actuals, model_output_1)

# Print mse_0 and mse_1
print("Mean squared error with weights_0: %f" %mse_0)
print("Mean squared error with weights_1: %f" %mse_1)
<script.py> output:
    Mean squared error with weights_0: 294.000000
    Mean squared error with weights_1: 395.062500

Gradient descent | Python

Bookmark; continue at home.

But at least I'm getting good Python practice.

So gradient descent starts now; I'm guessing it's a function or a package.

If the slope is positive:

  • Going opposite the slope means moving to lower numbers
  • Subtract the slope from the current value
  • Too big a step might lead us astray2

Solution: learning rate

  • Update each weight by subtracting \(learning \space rate \times slope\)

What exactly is the slope of the loss function?

Calculating slopes | Python

I honestly don't see why you can just do it like this.

The slope is

\[2\, X\,(X\beta - Y)\]

Why? I only roughly see that it has this form. Here \(X\) and \(\beta\) are both vectors (the sign is written to match the code below, where error = preds - target).
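A one-line derivation as I understand it, for a single row \(x\): the prediction is \(x \cdot \beta\), the error is \(x \cdot \beta - y\), and the loss is the squared error, so by the chain rule

\[\frac{\partial}{\partial \beta}\,(x \cdot \beta - y)^2 = 2\,(x \cdot \beta - y)\,x = 2 \times input\_data \times error\]

which is exactly slope = 2 * input_data * error in the code below.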

# Calculate the predictions: preds
preds = (weights * input_data).sum()

# Calculate the error: error
error = preds - target

# Calculate the slope: slope
slope = 2 * input_data * error

# Print the slope
print(slope)
In [2]: print(slope)
[14 28 42]

Three slopes, one per weight.

Improving model weights | Python

If you add the slopes to your weights, you will move in the right direction. However, it’s possible to move too far in that direction. So you will want to take a small step in that direction first, using a lower learning rate, and verify that the model is improving.

In [1]: weights
Out[1]: array([0, 2, 1])

In [2]: (weights * input_data)
Out[2]: array([0, 4, 3])

In [3]: (weights * input_data).sum()
Out[3]: 7

Element-wise multiply, then sum; got it.

This example is kept very simple: input_data goes straight through the weights to the output, no hidden layer.

# Set the learning rate: learning_rate
learning_rate = 0.01

# Calculate the predictions: preds
preds = (weights * input_data).sum()

# Calculate the error: error
error = preds - target

# Calculate the slope: slope
slope = 2 * input_data * error

# Update the weights: weights_updated
weights_updated = weights - learning_rate * slope

# Get updated predictions: preds_updated
preds_updated = (weights_updated * input_data).sum()

# Calculate updated error: error_updated
error_updated = preds_updated - target

# Print the original error
print(error)

# Print the updated error
print(error_updated)
<script.py> output:
    7
    5.04

So with a small learning rate, one update already improves things!

weights - learning_rate * slope: so when the slope is \(> 0\) the weight goes down, otherwise it goes up?

Making multiple updates to weights | Python

n_updates = 20
mse_hist = []

# Iterate over the number of updates
for i in range(n_updates):
    # Calculate the slope: slope
    slope = get_slope(input_data, target, weights)
    
    # Update the weights: weights
    weights = weights - 0.01 * slope
    
    # Calculate mse with new weights: mse
    mse = get_mse(input_data, target, weights)
    
    # Append the mse to mse_hist
    mse_hist.append(mse)

# Plot the mse history
plt.plot(mse_hist)
plt.xlabel('Iterations')
plt.ylabel('Mean Squared Error')
plt.show()

The iteration here might be the tricky bit: weights = weights - 0.01 * slope runs once per loop pass, which is the clever part. get_slope and get_mse are simple; you could write them yourself (a guess at them below).
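get_slope and get_mse are supplied by the exercise; here is my guess at what they look like, based on the single-data-point formulas above:

import numpy as np

def get_slope(input_data, target, weights):
    # Gradient of the squared error for one data point: 2 * x * (pred - target)
    error = (input_data * weights).sum() - target
    return 2 * input_data * error

def get_mse(input_data, target, weights):
    # Squared error for one data point
    error = (input_data * weights).sum() - target
    return error ** 2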

the mean squared error decreases as the number of iterations goes up.

Looks like it hasn't fully converged yet; keep iterating.

Backpropagation | Python3

First run forward propagation, get the error, then run backpropagation.

Mind-bending stuff; I didn't follow it.

Thinking about backward propagation | Python

If your predictions were all exactly right, and your errors were all exactly 0, the slope of the loss function with respect to your predictions would also be 0.

Because \(y = \hat y\).
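In symbols (my own note): with a squared-error loss,

\[\frac{\partial}{\partial \hat y}\,(y - \hat y)^2 = -2\,(y - \hat y) = 0 \quad \mbox{when } y = \hat y\]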

Backpropagation in practice | Python

Still not really getting it.

A round of backpropagation | Python

In the network shown below, we have done forward propagation, and node values calculated as part of forward propagation are shown in white.

Right, the value at every node has already been computed.

And that's apparently the end of the theory part, which is awkward. I really need to study neural network theory properly; this feels unfinished. I'm lost.

Creating a keras model | Python

On with the course.

This part is about building the network.

I like step-by-step instructions. I like learning; learning makes me happy and makes me feel like I'm getting better.

  • data.shape[1] is the number of columns, i.e. the number of variables in the input layer.
  • Sequential is that easy? It's a linear stack of layers.
  • What does Dense mean?
  • In model.add, input_shape = (n_cols,) pins down how many input variables there are.
  • model.add(Dense(1)) is the output layer.
  • model.add(Dense(100)): 100 or even 1000 nodes in a layer is perfectly common.
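A minimal skeleton (my own, layer sizes are placeholders and data is an assumed NumPy array or DataFrame) of what that list describes:

from keras.models import Sequential
from keras.layers import Dense

n_cols = data.shape[1]  # number of columns = number of input variables

model = Sequential()    # a linear stack of layers
model.add(Dense(100, activation='relu', input_shape=(n_cols,)))  # hidden layer; input_shape fixes the number of inputs
model.add(Dense(100, activation='relu'))                          # another hidden layer
model.add(Dense(1))                                               # output layer: one predicted number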

Understanding your data | Python

.head() and .describe() let you look at the data and do a bit of EDA.
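For example (my own sketch; the exercise already has the DataFrame loaded as df):

print(df.head())      # peek at the first rows
print(df.describe())  # summary statistics: quick EDA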

Specifying a model | Python

Let's see what Keras actually is.

To start, you’ll take the skeleton of a neural network and add a hidden layer and an output layer. You’ll then fit that model and see Keras do the optimization so your model continually gets better.

So for now, it's just three layers.

As a start, you’ll predict workers’ wages based on characteristics like their industry, education and level of experience.

So it's just supervised learning, making a prediction. A bit boring; let's move fast.

help is easier to use locally, so open Jupyter.

the input_shape parameter to be the tuple (n_cols,) which means it has n_cols items in each row of data, and any number of rows of data are acceptable as inputs.

input_shape = (n_cols,) fixes how many variables come in. This is where the input layer is defined.

# Import necessary modules
import keras
from keras.layers import Dense
from keras.models import Sequential

# Save the number of columns in predictors: n_cols
n_cols = predictors.shape[1]

# Set up the model: model
model = Sequential()

# Add the first layer
model.add(Dense(50, activation='relu', input_shape=(n_cols,)))

# Add the second layer
model.add(Dense(32, activation='relu'))

# Add the output layer
model.add(Dense(1))

So what is Dense? I really didn't get it. (It seems to just mean a fully connected layer: every node is connected to all the outputs of the previous layer, and the first argument is the number of nodes.)

Now that you’ve specified the model, the next step is to compile it.

Compiling and fitting a model | Python

So there are now two things to choose: the learning rate and the loss function.

Fitting the model is just backpropagation plus gradient descent, i.e. updating the weights.

So yes, neural networks also want the inputs scaled/normalized.
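For instance, a common way to standardize the inputs (my own note, not part of the exercise; assumes predictors is a NumPy array):

predictors_scaled = (predictors - predictors.mean(axis=0)) / predictors.std(axis=0)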

Compiling the model | Python

What is 'adam'? Honestly I haven't figured it out either; leave it for when there's time.

Optimizers - Keras Documentation

model.compile is how you compile the model.

# Import necessary modules
import keras
from keras.layers import Dense
from keras.models import Sequential

# Specify the model
n_cols = predictors.shape[1]
model = Sequential()
model.add(Dense(50, activation='relu', input_shape = (n_cols,)))
model.add(Dense(32, activation='relu'))
model.add(Dense(1))

# Compile the model
model.compile(optimizer='adam', loss = 'mean_squared_error')

# Verify that model contains information from compiling
print("Loss function: " + model.loss)
<script.py> output:
    Loss function: mean_squared_error

Fitting the model | Python

Recall that the data to be used as predictive features is loaded in a NumPy matrix called predictors and the data to be predicted is stored in a NumPy matrix called target.

The features and the target are both here: predictors and target.

# Import necessary modules
import keras
from keras.layers import Dense
from keras.models import Sequential

# Specify the model
n_cols = predictors.shape[1]
model = Sequential()
model.add(Dense(50, activation='relu', input_shape = (n_cols,)))
model.add(Dense(32, activation='relu'))
model.add(Dense(1))

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Fit the model
model.fit(predictors, target)

Classification models | Python

Now it's a binary classification problem, with metrics=['accuracy']. I remember logistic regression uses maximum likelihood!

So a neural network can use a softmax activation, ha.

Use .drop() plus .as_matrix() to turn the data frame into the X (predictor) matrix.
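Roughly like this (my guess, using the survived column name from this chapter's Titanic data; .as_matrix() is the older pandas spelling of .values):

predictors = df.drop(['survived'], axis=1).as_matrix()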

Last steps in classification models | Python

df.variable1 (attribute access) is one way to grab a column. to_categorical turns it into a categorical (one-hot) target.

  • model.compile(optimizer = 'sgd'): not adam this time.
  • loss = 'categorical_crossentropy': also changed, no longer minimizing the sum of squared errors.
  • metrics=['accuracy'] reports accuracy in the results.
  • model.fit(x, y) is the calling format.

# Import necessary modules
import keras
from keras.layers import Dense
from keras.models import Sequential
from keras.utils import to_categorical

# Convert the target to categorical: target
target = to_categorical(df.survived)

# Set up the model
model = Sequential()

# Add the first layer
model.add(Dense(32, activation='relu', input_shape=(n_cols,)))

# Add the output layer
model.add(Dense(2, activation='softmax'))

# Compile the model
model.compile(optimizer = 'sgd', loss = 'categorical_crossentropy', metrics=['accuracy'])

# Fit the model
model.fit(predictors, target)
891/891 [==============================] - 0s - loss: 3.4210 - acc: 0.6521

This simple model is generating an accuracy of 68%!

Using models | Python

Time to make predictions.

\[save \to reload \to predict\]

Clever: save the model as an .h5 file. You can use .summary() to inspect it.
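A sketch of the save → reload → predict workflow (the file name and pred_data are my placeholders):

from keras.models import load_model

model.save('model_file.h5')                 # save the fitted model to disk
my_model = load_model('model_file.h5')      # reload it later
predictions = my_model.predict(pred_data)   # pred_data: new rows with the same columns
my_model.summary()                          # inspect the architecture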

Making predictions | Python

Honestly, this simplest kind of model, without any nonlinear activation, really is just linear regression. Pretty underwhelming.

In [6]: predictions[:5,:]
Out[6]: 
array([[  9.99999881e-01,   1.28950361e-07],
       [  9.99793470e-01,   2.06520112e-04],
       [  9.48479218e-25,   1.00000000e+00],
       [  3.72184999e-02,   9.62781489e-01],
       [  9.99999285e-01,   6.83408246e-07]], dtype=float32)

So there's a reason predictions takes the second column: each row sums to 1, so the second column is the predicted probability of the positive class, a bit like the vote-frequency idea in a random forest.

# Specify, compile, and fit the model
model = Sequential()
model.add(Dense(32, activation='relu', input_shape = (n_cols,)))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='sgd', 
              loss='categorical_crossentropy', 
              metrics=['accuracy'])
model.fit(predictors, target)

# Calculate predictions: predictions
predictions = model.predict(pred_data)

# Calculate predicted probability of survival: predicted_prob_true
predicted_prob_true = predictions[:,1]

# print predicted_prob_true
print(predicted_prob_true)

Understanding model optimization | Python

Now we learn to tune the parameters!!!

You can use a for loop to pick a better learning rate.

The dying neuron problem: if a node keeps getting negative input, it's stuck. In that case you can switch to the tanh function, which feels a lot like the link function in logistic regression.
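A tiny illustration (mine) of why negative inputs are a problem for ReLU but not for tanh:

import numpy as np

x = np.array([-3.0, -1.0, 0.0, 2.0])
print(np.maximum(x, 0))  # relu: all negative inputs map to 0 (and their gradient is 0 -> "dying" nodes)
print(np.tanh(x))        # tanh: negative inputs still produce distinct, nonzero outputs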

Changing optimization parameters | Python

# Import the SGD optimizer
from keras.optimizers import SGD

# Create list of learning rates: lr_to_test
lr_to_test = [.000001, 0.01, 1]

# Loop over learning rates
for lr in lr_to_test:
    print('\n\nTesting model with learning rate: %f\n'%lr )
    
    # Build new model to test, unaffected by previous models
    model = get_new_model()
    
    # Create SGD optimizer with specified learning rate: my_optimizer
    my_optimizer = SGD(lr=lr)
    
    # Compile the model
    model.compile(optimizer=my_optimizer, loss='categorical_crossentropy')
    
    # Fit the model
    model.fit(predictors, target)

But still, only three learning rates tested; come on.

Model validation | Python

Adding validation_split=0.3 inside .fit does the validation. Simple: a 70/30 split of the samples, then look at the loss and accuracy on each part. The optimization itself still iterates on the loss function.

EarlyStopping stops when there's no more improvement, and it even has a patience parameter, which cracks me up. The optimization can get stuck near a local extremum, so allowing a few more epochs may get it over the hill. Since loss is being minimized: if the last two epochs aren't better than the third-to-last, the third-to-last is the one we keep. It's passed to .fit through the callbacks argument.

Neural networks really are maddening: tuning with no clear direction is exhausting.

Evaluating model accuracy on validation dataset | Python

# Save the number of columns in predictors: n_cols
n_cols = predictors.shape[1]
input_shape = (n_cols,)

# Specify the model
model = Sequential()
model.add(Dense(100, activation='relu', input_shape = input_shape))
model.add(Dense(100, activation='relu'))
model.add(Dense(2, activation='softmax'))

# Compile the model
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics=['accuracy'])

# Fit the model
hist = model.fit(predictors, target, validation_split=0.3)

Early stopping: Optimizing the optimization | Python

you can use early stopping to stop optimization when it isn’t helping any more.

you can also set a high value for epochs in your call to .fit(), as Dan showed in the video.

epochs = 30 just means how many passes ("rounds") over the data are made during training, that's it. Reference: 一些基本概念 - Keras中文文档.

Wonderful work! Because optimization will automatically stop when it is no longer helpful, it is okay to specify the maximum number of epochs as 30 rather than using the default of 10 that you’ve used so far. Here, it seems like the optimization stopped after 7 epochs.

# Import EarlyStopping
from keras.callbacks import EarlyStopping

# Save the number of columns in predictors: n_cols
n_cols = predictors.shape[1]
input_shape = (n_cols,)

# Specify the model
model = Sequential()
model.add(Dense(100, activation='relu', input_shape = input_shape))
model.add(Dense(100, activation='relu'))
model.add(Dense(2, activation='softmax'))

# Compile the model
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics=['accuracy'])

# Define early_stopping_monitor
early_stopping_monitor = EarlyStopping(patience=2)

# Fit the model
model.fit(predictors, target, epochs = 30, validation_split=0.3,callbacks=[early_stopping_monitor])
In [2]: help(EarlyStopping)
Help on class EarlyStopping in module keras.callbacks:

class EarlyStopping(Callback)
 |  Stop training when a monitored quantity has stopped improving.
 |  
 |  # Arguments
 |      monitor: quantity to be monitored.
 |      min_delta: minimum change in the monitored quantity
 |          to qualify as an improvement, i.e. an absolute
 |          change of less than min_delta, will count as no
 |          improvement.
 |      patience: number of epochs with no improvement
 |          after which training will be stopped.

Go back and figure this out properly!

Experimenting with wider networks | Python

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 10)                110       
_________________________________________________________________
dense_2 (Dense)              (None, 10)                110       
_________________________________________________________________
dense_3 (Dense)              (None, 2)                 22        
=================================================================
Total params: 242.0
Trainable params: 242
Non-trainable params: 0.0

Here each hidden layer has only 10 nodes.
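The Param # column is just (number of inputs + 1) × number of nodes, where the +1 is the bias. Assuming 10 input features (which the 110 implies), the counts check out:

\[(10 + 1) \times 10 = 110,\qquad (10 + 1) \times 10 = 110,\qquad (10 + 1) \times 2 = 22,\qquad 110 + 110 + 22 = 242\]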

# Define early_stopping_monitor
early_stopping_monitor = EarlyStopping(patience=2)

# Create the new model: model_2
model_2 = Sequential()

# Add the first and second layers
model_2.add(Dense(100, activation='relu', input_shape=input_shape))
model_2.add(Dense(100, activation='relu'))

# Add the output layer
model_2.add(Dense(2,activation = 'softmax'))

# Compile model_2
model_2.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics=['accuracy'])

# Fit model_1
model_1_training = model_1.fit(predictors, target, epochs=15, validation_split=0.2, callbacks=[early_stopping_monitor], verbose=False)

# Fit model_2
model_2_training = model_2.fit(predictors, target, epochs=15, validation_split=0.2, callbacks=[early_stopping_monitor], verbose=False)

# Create the plot
plt.plot(model_1_training.history['val_loss'], 'r', model_2_training.history['val_loss'], 'b')
plt.xlabel('Epochs')
plt.ylabel('Validation score')
plt.show()

Adding layers to a network | Python

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 50)                550       
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 102       
=================================================================
Total params: 652.0
Trainable params: 652
Non-trainable params: 0.0
_________________________________________________________________
None

deeper network!

# The input shape to use in the first hidden layer
input_shape = (n_cols,)

# Create the new model: model_2
model_2 = Sequential()

# Add the first, second, and third hidden layers
model_2.add(Dense(50,activation='relu',input_shape=input_shape))
model_2.add(Dense(50,activation='relu'))
model_2.add(Dense(50,activation='relu'))

# Add the output layer
model_2.add(Dense(2,activation='softmax'))

# Compile model_2
model_2.compile(optimizer = 'adam', loss='categorical_crossentropy',metrics=['accuracy'])

# Fit model 1
model_1_training = model_1.fit(predictors, target, epochs=20, validation_split=0.4, callbacks=[early_stopping_monitor], verbose=False)

# Fit model 2
model_2_training = model_2.fit(predictors, target, epochs=20, validation_split=0.4, callbacks=[early_stopping_monitor], verbose=False)

# Create the plot
plt.plot(model_1_training.history['val_loss'], 'r', model_2_training.history['val_loss'], 'b')
plt.xlabel('Epochs')
plt.ylabel('Validation score')
plt.show()

I don't actually see this getting any better; are you kidding me?

Note: the validation score is the thing to watch.

Thinking about model capacity | Python

# Create the model: model
model = Sequential()

# Add the first hidden layer
model.add(Dense(50,activation = 'relu', input_shape = (784,)))

# Add the second hidden layer
model.add(Dense(50,activation = 'relu'))

# Add the output layer
model.add(Dense(10, activation='softmax'))

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Fit the model
model.fit(X, y, validation_split=0.3)

Congrats! You’ve done something pretty amazing. You should see better than 90% accuracy recognizing handwritten digits, even while using a small training set of only 1750 images!

Final thoughts | Python

Go compete on Kaggle.

Go read the keras.io docs!

Statement of Accomplishment

The certificate.

Rizwan, Muhammad. 2018. “How to Select Activation Function for Deep Neural Network.” https://engmrk.com/activation-function-for-dnn/.


  1. rectify /ˈrɛktɪfaɪ/ v. to correct; (electronics) to rectify; (chemistry) to purify by distillation

  2. astray /əˈstre/ adv. off the right path; lost; led astray

  3. propagation /ˌprɑpəˈgeʃən/ n. propagation; reproduction; spreading