Data scientist Prakash Jay explains the principles behind transfer learning, a Keras-based implementation, and the common transfer learning scenarios.
[Figure: Inception-v3 architecture]
What is transfer learning?
Transfer learning in machine learning is concerned with storing the knowledge gained while solving one problem and applying it to a different but related problem.
Why transfer learning?
In practice, very few people train a convolutional network from scratch, because it is hard to obtain a large enough dataset. Using a pre-trained network helps solve most problems at hand.
Training a deep network is expensive. Even with hundreds of machines equipped with costly GPUs, training the most complex models takes many weeks.
Choosing a deep network's topology/features/training method/hyperparameters is black magic with little theory to guide it.
My experience
Don't try to be a hero.
—— Andrej Karpathy
Most of the computer vision problems I face do not come with very large datasets (5,000-40,000 images). Even with extreme data augmentation strategies, it is hard to reach decent accuracy, and training a network with millions of parameters on a small dataset usually leads to overfitting. So transfer learning is my savior.
Why does transfer learning work?
Let's look at what a deep network learns: the early layers try to detect edges, the middle layers try to detect shapes, and the later layers try to detect high-level, task-specific features. These trained layers are generally useful for other computer vision problems.
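One quick way to see this hierarchy for yourself is to probe a pre-trained network at several depths; below is a minimal sketch (the probe layers and the random stand-in image are illustrative choices, not part of the original article):

from keras import applications
from keras.models import Model
import numpy as np

# Load a pre-trained VGG19 without its classifier head
base = applications.VGG19(weights="imagenet", include_top=False,
                          input_shape=(256, 256, 3))

# Truncated models that stop at an early, middle, and late layer
probe_layers = ["block1_conv2", "block3_conv4", "block5_conv4"]
probes = [Model(inputs=base.input, outputs=base.get_layer(name).output)
          for name in probe_layers]

# Run an image (here a random stand-in) through each probe
img = np.random.rand(1, 256, 256, 3).astype(np.float32)
for name, probe in zip(probe_layers, probes):
    print(name, probe.predict(img).shape)
# Early layers keep fine spatial detail (edge-like features); later
# layers trade resolution for many, more abstract channels.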
Below, let's look at how to implement transfer learning in Keras, and at the common transfer learning scenarios.
A simple implementation in Keras
from keras import applications
from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers
from keras.models import Sequential, Model
from keras.layers import Dropout, Flatten, Dense, GlobalAveragePooling2D
from keras import backend as K
from keras.callbacks import ModelCheckpoint, LearningRateScheduler, TensorBoard, EarlyStopping

img_width, img_height = 256, 256
train_data_dir = "data/train"
validation_data_dir = "data/val"
nb_train_samples = 4125
nb_validation_samples = 466
batch_size = 16
epochs = 50

model = applications.VGG19(weights="imagenet", include_top=False, input_shape=(img_width, img_height, 3))
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 256, 256, 3)       0
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 256, 256, 64)      1792
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 256, 256, 64)      36928
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 128, 128, 64)      0
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 128, 128, 128)     73856
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 128, 128, 128)     147584
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 64, 64, 128)       0
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 64, 64, 256)       295168
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 64, 64, 256)       590080
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 64, 64, 256)       590080
_________________________________________________________________
block3_conv4 (Conv2D)        (None, 64, 64, 256)       590080
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 32, 32, 256)       0
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 32, 32, 512)       1180160
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 32, 32, 512)       2359808
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 32, 32, 512)       2359808
_________________________________________________________________
block4_conv4 (Conv2D)        (None, 32, 32, 512)       2359808
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 16, 16, 512)       0
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 16, 16, 512)       2359808
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 16, 16, 512)       2359808
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 16, 16, 512)       2359808
_________________________________________________________________
block5_conv4 (Conv2D)        (None, 16, 16, 512)       2359808
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 8, 8, 512)         0
=================================================================
Total params: 20,024,384
Trainable params: 20,024,384
Non-trainable params: 0
# Freeze the layers we don't want to train. Here I froze the first 5 layers.
for layer in model.layers[:5]:
    layer.trainable = False

# Add custom layers
x = model.output
x = Flatten()(x)
x = Dense(1024, activation="relu")(x)
x = Dropout(0.5)(x)
x = Dense(1024, activation="relu")(x)
predictions = Dense(16, activation="softmax")(x)

# Create the final model
model_final = Model(inputs=model.input, outputs=predictions)

# Compile the final model
model_final.compile(loss="categorical_crossentropy", optimizer=optimizers.SGD(lr=0.0001, momentum=0.9), metrics=["accuracy"])
# Data augmentation
train_datagen = ImageDataGenerator(
    rescale=1./255,
    horizontal_flip=True,
    fill_mode="nearest",
    zoom_range=0.3,
    width_shift_range=0.3,
    height_shift_range=0.3,
    rotation_range=30)

test_datagen = ImageDataGenerator(
    rescale=1./255,
    horizontal_flip=True,
    fill_mode="nearest",
    zoom_range=0.3,
    width_shift_range=0.3,
    height_shift_range=0.3,
    rotation_range=30)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode="categorical")

validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_height, img_width),
    class_mode="categorical")

# Save the model with the best validation accuracy
checkpoint = ModelCheckpoint("vgg16_1.h5", monitor='val_acc', verbose=1, save_best_only=True, save_weights_only=False, mode='auto', period=1)
early = EarlyStopping(monitor='val_acc', min_delta=0, patience=10, verbose=1, mode='auto')

# Train the model
model_final.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size,
    callbacks=[checkpoint, early])
Common scenarios for transfer learning
Don't forget: convolutional features in the earlier layers are more generic, while those in the later layers are more specific to the original dataset. There are four main transfer learning scenarios:
1. The new dataset is small and similar to the original dataset
If we try to train the entire network, we are prone to overfitting. Since the new data is similar to the original data, we expect the higher-level features in the network to be relevant to the new dataset as well. Therefore, the recommendation is to freeze all the convolutional layers and train only the classifier (for example, a linear classifier):
for layer in model.layers:
    layer.trainable = False
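To make that concrete, here is a minimal sketch of a linear (softmax) classifier on top of the frozen base; the 16-class output and learning rate are assumptions carried over from the earlier example:

from keras import applications, optimizers
from keras.models import Model
from keras.layers import Flatten, Dense

# Pre-trained convolutional base, entirely frozen
base = applications.VGG19(weights="imagenet", include_top=False,
                          input_shape=(256, 256, 3))
for layer in base.layers:
    layer.trainable = False

# A single softmax Dense layer acts as the linear classifier
x = Flatten()(base.output)
predictions = Dense(16, activation="softmax")(x)

clf = Model(inputs=base.input, outputs=predictions)
clf.compile(loss="categorical_crossentropy",
            optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
            metrics=["accuracy"])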
2. The new dataset is large and similar to the original dataset
Since we have more data, we can be more confident that fine-tuning the entire network will not lead to overfitting.
for layer in model.layers:
    layer.trainable = True
Actually, the default is already True; the code above explicitly marks every layer as trainable to make the point clear.
Since the first few layers detect edges, you can also choose to freeze them. For example, the following code freezes the first 5 layers of VGG19:
for layer in model.layers[:5]:
    layer.trainable = False
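Note that in Keras, changes to layer.trainable only take effect once the model is compiled again, and fine-tuning usually pairs with a small learning rate so the pre-trained weights are only nudged. A minimal sketch, reusing model_final from the earlier example:

# Unfreeze everything (or all but the first few layers) ...
for layer in model_final.layers:
    layer.trainable = True

# ... then re-compile: trainable flags are captured at compile time.
# A small learning rate keeps the pre-trained weights from being destroyed.
model_final.compile(loss="categorical_crossentropy",
                    optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
                    metrics=["accuracy"])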
3. The new dataset is small and very different from the original data
Since the dataset is small, we probably want to extract features from an earlier layer and train a classifier on top of them (this assumes some familiarity with h5py):
from keras import applications
from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers
from keras.models import Sequential, Model
from keras.layers import Input, Conv2D, MaxPooling2D, Dropout, Flatten, Dense, GlobalAveragePooling2D
from keras import backend as K
from keras.callbacks import ModelCheckpoint, LearningRateScheduler, TensorBoard, EarlyStopping

img_width, img_height = 256, 256

### Build the network
img_input = Input(shape=(256, 256, 3))

# Block 1
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(img_input)
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)

# Block 2
x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1')(x)
x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2')(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x)

model = Model(inputs=img_input, outputs=x)
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 256, 256, 3)       0
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 256, 256, 64)      1792
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 256, 256, 64)      36928
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 128, 128, 64)      0
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 128, 128, 128)     73856
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 128, 128, 128)     147584
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 64, 64, 128)       0
=================================================================
Total params: 260,160
Trainable params: 260,160
Non-trainable params: 0
layer_dict = dict([(layer.name, layer) for layer in model.layers])
[layer.name for layer in model.layers]
['input_1',
'block1_conv1',
'block1_conv2',
'block1_pool',
'block2_conv1',
'block2_conv2',
'block2_pool']
import h5py
weights_path = 'vgg19_weights.h5'  # (https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg19_weights_tf_dim_ordering_tf_kernels.h5)
f = h5py.File(weights_path, 'r')
list(f["model_weights"].keys())
['block1_conv1',
'block1_conv2',
'block1_pool',
'block2_conv1',
'block2_conv2',
'block2_pool',
'block3_conv1',
'block3_conv2',
'block3_conv3',
'block3_conv4',
'block3_pool',
'block4_conv1',
'block4_conv2',
'block4_conv3',
'block4_conv4',
'block4_pool',
'block5_conv1',
'block5_conv2',
'block5_conv3',
'block5_conv4',
'block5_pool',
'dense_1',
'dense_2',
'dense_3',
'dropout_1',
'global_average_pooling2d_1',
'input_1']
# List the names of all the layers in the model
layer_names = [layer.name for layer in model.layers]

# Extract each layer's weight names from the `.h5` file
>>> f["model_weights"]["block1_conv1"].attrs["weight_names"]
array([b'block1_conv1/kernel:0', b'block1_conv1/bias:0'],
      dtype='|S21')

# Assign this array to weight_names
>>> weight_names = f["model_weights"]["block1_conv1"].attrs["weight_names"]
>>> f["model_weights"]["block1_conv1"]["block1_conv1/kernel:0"]

# A list comprehension (weights) stores the layer's weights and biases
>>> weights = [f["model_weights"]["block1_conv1"][j] for j in weight_names]
>>> layer_names.index("block1_conv1")
1
>>> model.layers[1].set_weights(weights)
# Sets the weights for that specific layer.
Using a for loop, we can set the weights for the entire network:
for i in layer_dict.keys():
    weight_names = f["model_weights"][i].attrs["weight_names"]
    weights = [f["model_weights"][i][j] for j in weight_names]
    index = layer_names.index(i)
    model.layers[index].set_weights(weights)
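As a quick sanity check (assuming the same f, model, and layer_names as above), a loaded kernel can be compared against the file directly:

import numpy as np

# Compare one layer's kernel in the model with the value stored in the file
name = "block1_conv1"
w_model = model.layers[layer_names.index(name)].get_weights()[0]
w_file = np.array(f["model_weights"][name][name + "/kernel:0"])
print(np.allclose(w_model, w_file))  # expect True if the copy succeeded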
import cv2
import numpy as np
import pandas as pd
from tqdm import tqdm
import itertools
import glob

files_location = glob.glob("data/train/*/*.jpg")  # assumed file pattern; adjust to your data
features = []
for i in tqdm(files_location):
    im = cv2.imread(i)
    im = cv2.resize(cv2.cvtColor(im, cv2.COLOR_BGR2RGB), (256, 256)).astype(np.float32) / 255.0
    im = np.expand_dims(im, axis=0)
    outcome = model.predict(im)
    features.append(outcome)

## Collect these features, create a DataFrame, and train a classifier on them
The code above extracts block2_pool features. In general, since this layer has 64 x 64 x 128 features, training a classifier directly on top of them may not help much. Instead, we can add a few fully connected layers and train a neural network on top, as sketched after the following steps:
Add a few fully connected layers and one output layer.
Set the weights of the earlier layers and freeze them.
Train the network.
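Put together, a minimal sketch of those three steps on top of the two-block model built above (the head sizes and the 16-class output are assumptions carried over from the first example):

from keras.models import Model
from keras.layers import Flatten, Dense, Dropout

# 1. Add a few fully connected layers and an output layer on top of block2_pool
x = Flatten()(model.output)
x = Dense(512, activation="relu")(x)
x = Dropout(0.5)(x)
predictions = Dense(16, activation="softmax")(x)

# 2. The early layers already carry the weights copied from the .h5 file; freeze them
for layer in model.layers:
    layer.trainable = False

small_model = Model(inputs=model.input, outputs=predictions)

# 3. Compile and train only the new fully connected head
small_model.compile(loss="categorical_crossentropy", optimizer="sgd",
                    metrics=["accuracy"])
# small_model.fit_generator(...)  # train as in the first example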
4. The new dataset is large and very different from the original data
Since you have a large dataset, you can design your own network or use an existing one.
You can train the network initialized with either random weights or pre-trained network weights; the latter is usually preferred.
You can use a different network, or modify an existing one.
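For example, the only difference between the two initializations is the weights argument (a minimal sketch):

from keras import applications

# Initialize from pre-trained ImageNet weights (usually preferred)
pretrained = applications.VGG19(weights="imagenet", include_top=False,
                                input_shape=(256, 256, 3))

# Or initialize with random weights and train from scratch
scratch = applications.VGG19(weights=None, include_top=False,
                             input_shape=(256, 256, 3))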