## 1. Concepts

1. Model

• A composition of layers that maps inputs to outputs (the forward computation)

• A set of parameter variables that are updated continuously during training

2. Layer

• One or more concrete operators that carry out the corresponding computation

• The variables the computation needs, held as members of the layer, either as temporaries or as parameters

## 2. Data Processing

### 2.1 Loading the MNIST Dataset

```
import math

import paddle
from paddle.vision.datasets import MNIST
from paddle.vision.transforms import Compose, Normalize

# map pixel values from [0, 255] to [-1, 1]
transform = Compose([Normalize(mean=[127.5],
                               std=[127.5],
                               data_format='CHW')])

train_dataset = MNIST(mode='train', transform=transform)
```
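The `Normalize` transform above, with `mean` and `std` both 127.5, simply rescales each pixel. A minimal plain-Python sketch of the same arithmetic (the `normalize` helper is illustrative, not a Paddle API):

```python
# Normalize with mean=127.5, std=127.5 computes (pixel - mean) / std,
# which maps the original [0, 255] pixel range onto [-1, 1]
def normalize(pixel, mean=127.5, std=127.5):
    return (pixel - mean) / std

print(normalize(0))    # darkest pixel  -> -1.0
print(normalize(255))  # brightest pixel -> 1.0
```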

### 2.2 Preprocessing the Dataset

```
train_data0 = train_dataset[0]
# each sample is an (image, label) pair; flatten the [1, 28, 28] image
x_data = paddle.to_tensor(train_data0[0]).reshape([1, 784])
print("x_data's shape is:", x_data.shape)
```
```
x_data's shape is: [1, 784]
```
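The `[1, 28, 28]` image becomes a `[1, 784]` row vector because 28 × 28 = 784. A plain-Python sketch of the flattening, using a zero-filled stand-in image:

```python
# a 28x28 MNIST image is flattened into a single 784-element vector
# before it is fed to the fully connected layer (28 * 28 = 784)
image = [[0.0] * 28 for _ in range(28)]   # stand-in for one MNIST image
flat = [pixel for row in image for pixel in row]
print(len(flat))  # 784
```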

## 3. Building a Complete Deep Learning Network

### 3.1 Parameter Initialization

```
# scale by 1/sqrt(fan_in) so pre-activations stay O(1)
weight = paddle.randn([784, 10]) * (1 / math.sqrt(784))
weight.stop_gradient = False
bias = paddle.zeros([10])
bias.stop_gradient = False
```
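The `1 / math.sqrt(784)` factor is a LeCun-style initialization: with 784 inputs, each product in a dot product has variance 1/784, so their sum keeps variance near 1. A pure-Python sketch (not Paddle code) illustrating this:

```python
import math
import random

random.seed(0)
n_in = 784
# weights drawn from N(0, 1) and scaled by 1/sqrt(n_in), as in the tutorial
w = [random.gauss(0, 1) / math.sqrt(n_in) for _ in range(n_in)]
x = [random.gauss(0, 1) for _ in range(n_in)]  # a random input vector
# each product has variance 1/n_in, so the sum of n_in of them has variance ~1
pre_activation = sum(wi * xi for wi, xi in zip(w, x))
print(abs(pre_activation))  # typically O(1), not O(sqrt(784))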

### 3.2 Defining the Network Structure

```
def log_softmax(x):
    return x - x.exp().sum(-1).log().unsqueeze(-1)

def model(x):
    # a single linear layer followed by log_softmax
    return log_softmax(paddle.matmul(x, weight) + bias)
```
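`log_softmax` returns log-probabilities, so exponentiating its output yields a distribution that sums to 1. A pure-Python check of this identity (`log_softmax_list` is an illustrative helper, not part of Paddle):

```python
import math

# log_softmax(x)_i = x_i - log(sum_j exp(x_j)); exponentiating the result
# therefore gives a probability distribution that sums to 1
def log_softmax_list(xs):
    log_sum = math.log(sum(math.exp(x) for x in xs))
    return [x - log_sum for x in xs]

logits = [2.0, 1.0, 0.1]
log_probs = log_softmax_list(logits)
print(sum(math.exp(lp) for lp in log_probs))  # ~1.0
```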

### 3.3 Forward Computation

```
batch_size = 64
train_batch_data_x = []
train_batch_data_y = []
for i in range(batch_size):
    train_batch_data_x.append(train_dataset[i][0])
    train_batch_data_y.append(train_dataset[i][1])

# stack the samples and flatten each image to a 784-dim vector
x_batch_data = paddle.to_tensor(train_batch_data_x).reshape([batch_size, 784])
print("x_batch_data's shape is:", x_batch_data.shape)

y = model(x_batch_data)

print("y[0]: {} \ny.shape: {}".format(y[0], y.shape))
```
```
x_batch_data's shape is: [64, 784]
y[0]: Tensor(shape=[10], dtype=float32, place=Place(gpu:0), stop_gradient=False,
      [-1.20662355, -4.20237827, -2.47686505, -0.78191900, -5.13888979,
      -3.07260418, -2.94610834, -4.91643810, -3.71131158, -4.85082626])
y.shape: [64, 10]
```

### 3.4 Backpropagation

```
loss_func = paddle.nn.functional.nll_loss

# stack the labels collected in the previous step into a tensor
y_batch_data = paddle.to_tensor(train_batch_data_y)
print("y_batch_data's shape is:", y_batch_data.shape)
y_standard = y_batch_data[0:batch_size]
loss = loss_func(y, y_standard)
print("loss: ", loss)
```
```
loss:  Tensor(shape=[1], dtype=float32, place=Place(gpu:0), stop_gradient=False,
       [2.85819387])
```
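`nll_loss` applied to log-probabilities is just the negative mean of each sample's log-probability at its true label. A pure-Python sketch (`nll` is an illustrative helper, not the Paddle API):

```python
import math

# negative log-likelihood over a batch: take each sample's log-probability
# at its true label, negate, and average
def nll(log_probs_batch, labels):
    return -sum(lp[y] for lp, y in zip(log_probs_batch, labels)) / len(labels)

# with 10 classes and uniform predictions every log-probability is log(1/10),
# so the loss is -log(0.1) ~= 2.3026 -- the same order as the loss printed
# above for an untrained network
uniform = [[math.log(0.1)] * 10] * 4
print(nll(uniform, [3, 1, 4, 1]))  # ~2.3026
```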

### 3.5 Computing Accuracy to Monitor Convergence

```
def accuracy(out, y):
    # the class with the highest log-probability is the prediction
    preds = paddle.argmax(out, axis=-1)
    return (preds == y).cast("float32").mean()

acc = accuracy(y, y_standard)
print("accuracy:", acc)
```
```
accuracy: Tensor(shape=[1], dtype=float32, place=Place(gpu:0), stop_gradient=True,
          [0.09375000])
```
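The printed 0.09375 is roughly 1/10, i.e. chance level for 10 classes, which is what an untrained model should score. A plain-Python sketch of the metric (`accuracy_from_preds` is illustrative, not a Paddle API):

```python
# accuracy = fraction of samples whose predicted class matches the label
def accuracy_from_preds(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

print(accuracy_from_preds([1, 2, 3, 3], [1, 0, 3, 2]))  # 0.5
# 6 correct out of a batch of 64 gives exactly the value printed above
print(6 / 64)  # 0.09375
```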

### 3.6 Computing Gradients with Automatic Differentiation and Updating Parameters

```
loss.backward()

def OptimizeNetwork(lr=0.5):
    # plain SGD: param <- param - lr * grad, then clear the gradients
    with paddle.no_grad():
        weight.set_value(weight - lr * weight.grad)
        bias.set_value(bias - lr * bias.grad)
        weight.clear_gradient()
        bias.clear_gradient()

print("weight: ", weight)
print("bias: ", bias)
OptimizeNetwork()
print("weight after optimize: ", weight)
print("bias after optimize: ", bias)
```
```weight:  Tensor(shape=[784, 10], dtype=float32, place=Place(cpu), stop_gradient=False,
[[-0.02580861,  0.03132926,  0.07240372, ...,  0.05494612,
-0.03443871, -0.00228449],
[-0.01263286, -0.03029860,  0.04301141, ...,  0.02060869,
-0.00263721, -0.01837303],
[ 0.02355293, -0.06277876, -0.03418431, ...,  0.03847973,
0.02322033,  0.08055742],
...,
[-0.02945464,  0.00892299, -0.07298648, ...,  0.04788664,
0.03856503,  0.07544740],
[ 0.06136639, -0.00014994,  0.00933051, ..., -0.00939863,
0.06214209, -0.01135642],
[-0.01522523, -0.04802566,  0.01832000, ...,  0.01538999,
0.04224478,  0.01449125]])
bias:  Tensor(shape=[10], dtype=float32, place=Place(cpu), stop_gradient=False,
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
weight after optimize:  Tensor(shape=[784, 10], dtype=float32, place=Place(cpu), stop_gradient=False,
[[-0.05760278,  0.03702446,  0.06256686, ...,  0.13622762,
-0.01372341, -0.04273041],
[-0.04442703, -0.02460339,  0.03317455, ...,  0.10189019,
0.01807809, -0.05881895],
[-0.00824124, -0.05708356, -0.04402117, ...,  0.11976123,
0.04393563,  0.04011151],
...,
[-0.06124880,  0.01461819, -0.08282334, ...,  0.12916814,
0.05928034,  0.03500149],
[ 0.02957222,  0.00554527, -0.00050635, ...,  0.07188287,
0.08285740, -0.05180233],
[-0.04701940, -0.04233045,  0.00848314, ...,  0.09667149,
0.06296009, -0.02595467]])
bias after optimize:  Tensor(shape=[10], dtype=float32, place=Place(cpu), stop_gradient=False,
[ 0.03179417, -0.00569520,  0.00983686, -0.02128297,  0.00566411,
0.02163870,  0.01959525, -0.08128151, -0.02071531,  0.04044591])
```
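The update performed in `OptimizeNetwork` is plain gradient descent: param ← param − lr · grad. A scalar sketch of the same rule on the toy loss L(w) = (w − 3)²:

```python
# one-parameter SGD: w <- w - lr * dL/dw
# for L(w) = (w - 3)^2 the gradient is dL/dw = 2 * (w - 3)
w, lr = 0.0, 0.5
for _ in range(10):
    grad = 2 * (w - 3)
    w = w - lr * grad
print(w)  # 3.0 -- the minimum of L
```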

## 4. Building the Network with paddle.nn.Layer

### 4.1 Rewriting the Linear Layer with Layer

```
class MyLayer(paddle.nn.Layer):
    def __init__(self):
        super().__init__()
        self.weight = self.create_parameter([784, 10])
        self.bias = self.create_parameter([10], is_bias=True)

    def forward(self, inputs):
        return log_softmax(paddle.matmul(inputs, self.weight) + self.bias)
```

#### 4.1.1 Calling the Parent Class Constructor

```
    def __init__(self):
        super().__init__()
```

#### 4.1.2 Performing the Initialization

```
    def __init__(self):
        super().__init__()
        self.weight = self.create_parameter([784, 10])
        self.bias = self.create_parameter([10], is_bias=True)
```

### 4.2 Accessing Parameters and Automatically Tracking Their Updates

```
my_layer = MyLayer()
for name, param in my_layer.named_parameters():
    print("Parameters: {}, {}".format(name, param))
```
```Parameters: weight, Parameter containing:
[[-0.03399023, -0.02405306, -0.06372951, ..., -0.05039166,
0.05060801,  0.05453540],
[ 0.01788948, -0.06409007,  0.02617371, ...,  0.08341692,
-0.01115795,  0.06199412],
[-0.07155208,  0.01988612,  0.03681165, ..., -0.00741174,
0.03892786,  0.03055505],
...,
[-0.01735171, -0.05819885, -0.05768500, ...,  0.04783282,
0.05039406, -0.04458937],
[ 0.08272233,  0.02620430, -0.00838694, ...,  0.03075657,
-0.05368494,  0.03899705],
[-0.06041612, -0.05808754, -0.07175658, ..., -0.07276732,
0.08097268, -0.00280717]])
Parameters: bias, Parameter containing:
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
```

### 4.3 Executing the Defined Layer

#### 4.3.1 Switching to Training Mode and Running

```
my_layer = MyLayer()
my_layer.train()
# my_layer.eval()
y = my_layer(x_batch_data)
print("y[0]", y[0])
```

```y[0] Tensor(shape=[10], dtype=float32, place=Place(gpu:0), stop_gradient=False,
[-2.78626776, -2.75923157, -3.15698314, -2.98575473, -5.58894873,
-5.03897095, -1.63698268, -0.70400816, -6.44660282, -2.51351619])
```

#### 4.3.2 Computing the Loss

```loss_func = paddle.nn.functional.nll_loss
y = my_layer(x_batch_data)
loss = loss_func(y, y_standard)
print("loss: ", loss)
```

#### 4.3.3 Building an SGD Optimizer, Passing Parameters, and Computing

```
my_layer = MyLayer()
# hand the layer's parameters to the optimizer
opt = paddle.optimizer.SGD(learning_rate=0.5, parameters=my_layer.parameters())
y = my_layer(x_batch_data)
loss = loss_func(y, y_standard)
print("loss: ", loss)

loss.backward()
opt.step()
opt.clear_grad()
```
```
loss:  Tensor(shape=[1], dtype=float32, place=Place(gpu:0), stop_gradient=False,
       [2.76338077])
```

```
class MyLayer(paddle.nn.Layer):
    def __init__(self):
        super().__init__()
        self.linear = paddle.nn.Linear(784, 10)

    def forward(self, inputs):
        return self.linear(inputs)
```

## 5. Working with Sublayers

### 5.1 Listing All Layers of the Model

```
mylayer = MyLayer()
print(mylayer.sublayers())

print("----------------------")

for item in mylayer.named_sublayers():
    print(item)
```
```[Linear(in_features=784, out_features=10, dtype=float32)]
----------------------
('linear', Linear(in_features=784, out_features=10, dtype=float32))
```

### 5.2 Adding a Sublayer to the Model

```
my_layer = MyLayer()
fc = paddle.nn.Linear(10, 3)
# register fc as a sublayer named "fc"
my_layer.add_sublayer("fc", fc)
print(my_layer.sublayers())
```
```[Linear(in_features=784, out_features=10, dtype=float32), Linear(in_features=10, out_features=3, dtype=float32)]
```

### 5.3 Applying a Custom Function to All Sublayers

```
def function(layer):
    print(layer)

my_layer.apply(function)
```
```
Linear(in_features=784, out_features=10, dtype=float32)
Linear(in_features=10, out_features=3, dtype=float32)
MyLayer(
  (linear): Linear(in_features=784, out_features=10, dtype=float32)
  (fc): Linear(in_features=10, out_features=3, dtype=float32)
)
```

### 5.4 Iterating over All Sublayers

```
sublayer_iter = my_layer.children()
for sublayer in sublayer_iter:
    print(sublayer)
```
```Linear(in_features=784, out_features=10, dtype=float32)
Linear(in_features=10, out_features=3, dtype=float32)
```

## 6. Managing a Layer's Variables

### 6.1 Adding Parameter Variables in Batch

```
class MyLayer(paddle.nn.Layer):
    def __init__(self):
        super().__init__()
        for i in range(10):
            # register ten parameters named param_0 .. param_9
            # (the shape here is chosen only for illustration)
            self.add_parameter("param_" + str(i), self.create_parameter([2, 2]))

    def forward(self, inputs):
        pass

my_layer = MyLayer()
for name, item in my_layer.named_parameters():
    print(name)
```

### 6.2 Adding Temporary Intermediate Variables

```
class Model(paddle.nn.Layer):

    def __init__(self):
        super().__init__()
        self.saved_tensor = self.create_tensor(name="saved_tensor0")
        self.flatten = paddle.nn.Flatten()
        self.fc = paddle.nn.Linear(784, 10)

    def forward(self, input):
        y = self.flatten(input)
        # save the intermediate tensor
        paddle.assign(y, self.saved_tensor)
        y = self.fc(y)
        return y
```

### 6.3 Adding Buffer Variables for Dynamic-to-Static Conversion

The notion of a buffer only matters for converting a dynamic graph into a static graph. The previous section created a temporary variable to hold an intermediate value, but such a variable is not recorded in the static computation graph during conversion. To make the variable part of the static graph, you additionally need to call the register_buffer() interface.

```
class Model(paddle.nn.Layer):

    def __init__(self):
        super().__init__()
        self.flatten = paddle.nn.Flatten()
        self.fc = paddle.nn.Linear(784, 10)
        saved_tensor = self.create_tensor(name="saved_tensor0")
        # register the tensor as a buffer so it survives dynamic-to-static conversion
        self.register_buffer("saved_tensor", saved_tensor, persistable=True)

    def forward(self, input):
        y = self.flatten(input)
        # save the intermediate tensor
        paddle.assign(y, self.saved_tensor)
        y = self.fc(y)
        return y
```

```
model = Model()
print(model.buffers())
for item in model.named_buffers():
    print(item)
```
```[Tensor(Not initialized)]
('saved_tensor', Tensor(Not initialized))
```

## 7. Saving Model Parameters

state_dict is a simple Python dictionary that maps each layer to its corresponding parameters, and it can be used to save a Layer or an Optimizer. Layer.state_dict captures the weights and biases learned during training; the recommended file suffix is `.pdparams`. To save the model structure together with the parameters, see paddle.jit.save().
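Conceptually, a state_dict is exactly such a name-to-values mapping. A plain-Python sketch with stand-in values (the names and shapes are chosen for illustration, mirroring the tutorial's 784 → 10 linear layer):

```python
# a state_dict maps parameter names to their values; real Paddle state_dicts
# hold tensors, here plain lists stand in for them
state_dict = {
    "linear.weight": [[0.0] * 10 for _ in range(784)],  # stand-in values
    "linear.bias": [0.0] * 10,
}
for name, value in state_dict.items():
    print(name, len(value))
```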

```
model = Model()
state_dict = model.state_dict()
# persist the parameters; ".pdparams" is the recommended suffix
paddle.save(state_dict, "model.pdparams")
```