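Model Introduction
The Attention Cluster model was the best-performing sequence model in the ActivityNet Kinetics Challenge 2017. It processes pre-extracted RGB, Flow, and Audio features with Attention Clusters that include a Shifting Operation. The Attention Cluster structure is shown in the figure below.
The Shifting Operation applies an independent, learnable linear transformation to the output of each attention unit and then L2-normalizes the result. This encourages the attention units to learn different components of the features, so the Attention Cluster can better model data with different distributions, which improves the representational capacity of the whole network.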
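The description above can be made concrete with a minimal NumPy sketch. It is not the PaddleVideo implementation: the linear scoring function, the scalar per-unit parameters `alpha` and `beta`, and the function names are assumptions chosen for illustration, and the paper's exact formulation may include additional scaling. The shapes follow the configuration used later in this example (100 segments, 1024-dimensional RGB features, 32 attention units).

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_cluster(x, w, alpha, beta, eps=1e-12):
    """Sketch of an attention cluster with a shifting operation.

    x:     (T, D) local features, e.g. T frames of D-dim RGB features
    w:     (N, D) one scoring vector per attention unit (assumed linear scoring)
    alpha: (N,)   learnable scale per unit (assumption: scalar per unit)
    beta:  (N,)   learnable shift per unit (assumption: scalar per unit)
    """
    weights = softmax(w @ x.T, axis=-1)        # (N, T) attention over the T features
    pooled = weights @ x                       # (N, D) one weighted sum per unit
    shifted = alpha[:, None] * pooled + beta[:, None]   # independent linear transform
    norm = np.linalg.norm(shifted, axis=-1, keepdims=True)
    normalized = shifted / np.maximum(norm, eps)        # L2-normalize each unit
    return normalized.reshape(-1)              # concatenate the N unit outputs

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1024))
w = rng.normal(size=(32, 1024)) * 0.01
alpha = np.ones(32)
beta = np.zeros(32)
out = attention_cluster(x, w, alpha, beta)
print(out.shape)
```

Because each unit has its own `alpha` and `beta`, two units that start with similar attention weights can still produce different outputs after normalization, which is the intuition behind the diversity claim.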
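For details, see Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification.
This example uses the YouTube-8M dataset as updated in 2018. Starting from the official dataset, the TFRecord files are converted to pickle files so that PaddlePaddle can read them. The official YouTube-8M release provides both frame-level and video-level features. The dataset mounted for this example is a preprocessed subset of YouTube-8M that contains only 5 video files, and the same data is used for training and testing. Its main purpose is to demonstrate the model.
To train on the full dataset, follow the steps below.
Data download
Use the official YouTube-8M links to download the training set and the validation set separately. Each link provides download addresses for 3844 files; you can also use the official download script. When the download completes, you will have 3844 training files and 3844 validation files in TFRecord format. Assuming the root directory of the video model code base is Code_Root, enter the dataset/youtube8m directory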
cd dataset/youtube8m
Under youtube8m, create the directories tf/train and tf/val
mkdir tf && cd tf
mkdir train && mkdir val
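and place the downloaded train and validate data in them, respectively.
Data format conversion
To train with PaddlePaddle, the downloaded TFRecord files must be converted offline to pickle format. Use the conversion script PaddleVideo/tf2pkl.py.
Under the dataset/youtube8m directory (starting again from Code_Root), create the directories pkl/train and pkl/val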
cd dataset/youtube8m
mkdir pkl && cd pkl
mkdir train && mkdir val
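The directory layout built by the shell commands above can also be created in one step from Python. This is a small sketch, not part of PaddleVideo: the helper name `make_youtube8m_dirs` is hypothetical, and the demo runs in a temporary directory that stands in for Code_Root.

```python
import tempfile
from pathlib import Path

def make_youtube8m_dirs(code_root):
    """Create tf/{train,val} and pkl/{train,val} under dataset/youtube8m."""
    base = Path(code_root) / "dataset" / "youtube8m"
    created = []
    for fmt in ("tf", "pkl"):
        for split in ("train", "val"):
            d = base / fmt / split
            d.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
            created.append(d)
    return created

# Demo in a temporary directory standing in for Code_Root.
with tempfile.TemporaryDirectory() as code_root:
    created = make_youtube8m_dirs(code_root)
    relative = [d.relative_to(code_root).as_posix() for d in created]
    all_exist = all(d.is_dir() for d in created)
print(relative)
```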
To convert the file format (TFRecord -> pkl), enter the dataset/youtube8m directory and run the script
python tf2pkl.py ./tf/train ./pkl/train
and
python tf2pkl.py ./tf/val ./pkl/val
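to convert the train and validate datasets to pkl files, respectively. tf2pkl.py takes two arguments: the directory containing the source TFRecord files and the directory in which to store the converted pkl files.
Note: reading TFRecord files requires TensorFlow, so either install TensorFlow first, or convert the data in an environment that already has TensorFlow and then copy the results to dataset/youtube8m/pkl. To avoid conflicts with the PaddlePaddle environment, converting the data elsewhere and copying it over afterwards is recommended.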
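After conversion, it can be useful to check that a pkl file loads before copying thousands of them into place. The sketch below only summarizes whatever object the file contains. The record layout written by tf2pkl.py is not described here, so the demo record and its keys (`video`, `label`) are hypothetical placeholders, and `summarize_pkl` is a helper name introduced for illustration.

```python
import os
import pickle
import tempfile

def summarize_pkl(path):
    """Load one pickle file and report its top-level structure."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    summary = {"type": type(data).__name__}
    if hasattr(data, "__len__"):
        summary["num_records"] = len(data)
    # If the file holds a sequence of dicts, list the keys of the first one.
    if isinstance(data, (list, tuple)) and data and isinstance(data[0], dict):
        summary["first_record_keys"] = sorted(data[0].keys())
    return summary

# Demo with a hypothetical record; real files come from tf2pkl.py.
records = [{"video": b"Eu4t", "label": [5]}]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.pkl")
    with open(demo_path, "wb") as f:
        pickle.dump(records, f)
    summary = summarize_pkl(demo_path)
print(summary)
```

If the files were written under Python 2 and are read under Python 3, `pickle.load` may need `encoding="bytes"` to decode them.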
Generating the file lists
Enter the dataset/youtube8m directory
ls $Code_Root/dataset/youtube8m/pkl/train/* > train.list
ls $Code_Root/dataset/youtube8m/pkl/val/* > val.list
This generates two files in the dataset/youtube8m directory, train.list and val.list. Each line holds the absolute path of one pkl file.
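Where a shell is not available, a roughly equivalent file list can be produced from Python. This is a sketch under stated assumptions: `write_file_list` is a hypothetical helper, and the demo uses empty placeholder files in a temporary directory rather than real converted data.

```python
import glob
import os
import tempfile

def write_file_list(pkl_dir, list_path):
    """Write the absolute path of every file in pkl_dir, one per line."""
    paths = sorted(os.path.abspath(p) for p in glob.glob(os.path.join(pkl_dir, "*")))
    with open(list_path, "w") as f:
        f.writelines(p + "\n" for p in paths)
    return paths

# Demo with two empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    pkl_dir = os.path.join(tmp, "pkl", "train")
    os.makedirs(pkl_dir)
    for name in ("b.pkl", "a.pkl"):
        open(os.path.join(pkl_dir, name), "wb").close()
    list_path = os.path.join(tmp, "train.list")
    written = write_file_list(pkl_dir, list_path)
    with open(list_path) as f:
        lines = f.read().splitlines()
print(len(lines))
```

Sorting the paths makes the list reproducible across runs, which helps when comparing training logs.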
# Unzip the dataset
!cd data/data10073/ && unzip -qo youtube8m.zip
# Install wget
!pip install wget
Looking in indexes: https://pypi.mirrors.ustc.edu.cn/simple/
Collecting wget
Downloading https://mirrors.tuna.tsinghua.edu.cn/pypi/web/packages/47/6a/62e288da7bcda82b935ff0c6cfe542970f04e29c756b0e147251b2fb251f/wget-3.2.zip
Building wheels for collected packages: wget
Running setup.py bdist_wheel for wget ... done
Stored in directory: /home/aistudio/.cache/pip/wheels/26/28/0d/cd5205dcdeaca81bf62909a7cfd449eaf6698e8ab18992f71a
Successfully built wget
Installing collected packages: wget
Successfully installed wget-3.2
# Train the model; model parameters are saved in checkpoints, and the frozen model is saved in freeze_model
!python PaddleVideo/train.py --model_name=AttentionCluster \
--config=PaddleVideo/configs/attention_cluster.txt \
--save_dir=PaddleVideo/checkpoints \
--log_interval=20 \
--use_gpu='True' \
--valid_interval=1
[INFO: train.py: 284]: Namespace(batch_size=None, config='PaddleVideo/configs/attention_cluster.txt', enable_ce=False, epoch=1, learning_rate=None, log_interval=20, model_name='AttentionCluster', no_memory_optimize=True, no_use_pyreader=True, pretrain=None, resume=None, save_dir='PaddleVideo/checkpoints', use_gpu=True, valid_interval=1)
[INFO: config.py: 66]: ---------------- Train Arguments ----------------
[INFO: config.py: 68]: TEST:
[INFO: config.py: 70]: batch_size:5
[INFO: config.py: 70]: filelist:data/data10073/youtube8m/infer.list
[INFO: config.py: 68]: TRAIN:
[INFO: config.py: 70]: num_gpus:1
[INFO: config.py: 70]: use_gpu:True
[INFO: config.py: 70]: learning_rate:0.001
[INFO: config.py: 70]: batch_size:5
[INFO: config.py: 70]: filelist:data/data10073/youtube8m/train.list
[INFO: config.py: 70]: epoch:1
[INFO: config.py: 70]: pretrain_base:None
[INFO: config.py: 68]: INFER:
[INFO: config.py: 70]: batch_size:1
[INFO: config.py: 70]: filelist:data/data10073/youtube8m/infer.list
[INFO: config.py: 68]: MODEL:
[INFO: config.py: 70]: drop_rate:0.5
[INFO: config.py: 70]: bone_network:None
[INFO: config.py: 70]: feature_dims:[1024, 128]
[INFO: config.py: 70]: topk:20
[INFO: config.py: 70]: num_classes:3862
[INFO: config.py: 70]: cluster_nums:[32, 32]
[INFO: config.py: 70]: feature_num:2
[INFO: config.py: 70]: name:AttentionCluster
[INFO: config.py: 70]: dataset:YouTube-8M
[INFO: config.py: 70]: feature_names:['rgb', 'audio']
[INFO: config.py: 70]: seg_num:100
[INFO: config.py: 68]: VALID:
[INFO: config.py: 70]: batch_size:5
[INFO: config.py: 70]: filelist:data/data10073/youtube8m/val.list
[INFO: config.py: 71]: -------------------------------------------------
W0902 17:43:55.617861 608 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
W0902 17:43:55.621732 608 device_context.cc:267] device: 0, cuDNN Version: 7.3.
[WARNING: compiler.py: 239]:
You can try our memory optimize feature to save your memory usage:
# create a build_strategy variable to set memory optimize option
build_strategy = compiler.BuildStrategy()
build_strategy.enable_inplace = True
build_strategy.memory_optimize = True
# pass the build_strategy to with_data_parallel API
compiled_prog = compiler.CompiledProgram(main).with_data_parallel(
loss_name=loss.name, build_strategy=build_strategy)
!!! Memory optimize is our experimental feature !!!
some variables may be removed/reused internal to save memory usage,
in order to fetch the right value of the fetch_list, please set the
persistable property to true for each variable in fetch_list
# Sample
conv1 = fluid.layers.conv2d(data, 4, 5, 1, act=None)
# if you need to fetch conv1, then:
conv1.persistable = True
I0902 17:43:55.677418 608 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0902 17:43:55.730741 608 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
[WARNING: compiler.py: 239]:
You can try our memory optimize feature to save your memory usage:
# create a build_strategy variable to set memory optimize option
build_strategy = compiler.BuildStrategy()
build_strategy.enable_inplace = True
build_strategy.memory_optimize = True
# pass the build_strategy to with_data_parallel API
compiled_prog = compiler.CompiledProgram(main).with_data_parallel(
loss_name=loss.name, build_strategy=build_strategy)
!!! Memory optimize is our experimental feature !!!
some variables may be removed/reused internal to save memory usage,
in order to fetch the right value of the fetch_list, please set the
persistable property to true for each variable in fetch_list
# Sample
conv1 = fluid.layers.conv2d(data, 4, 5, 1, act=None)
# if you need to fetch conv1, then:
conv1.persistable = True
share_vars_from is set, scope is ignored.
I0902 17:43:55.769202 608 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0902 17:43:55.786227 608 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
[INFO: train_utils.py: 30]: ------- learning rate [0.001], learning rate counter [-] -----
[INFO: metrics_util.py: 67]: [TRAIN] Epoch 0, iter 0 , loss = 2678.043701, Hit@1 = 0.00, PERR = 0.00, GAP = 0.00
[INFO: metrics_util.py: 67]: [TRAIN] Epoch 0, iter 20 , loss = 27.297926, Hit@1 = 0.00, PERR = 0.05, GAP = 0.02
[INFO: metrics_util.py: 67]: [TRAIN] Epoch 0, iter 40 , loss = 54.099422, Hit@1 = 0.00, PERR = 0.00, GAP = 0.00
# Evaluate the model using the weights saved in checkpoints
!python PaddleVideo/test.py --model_name="AttentionCluster" --config=PaddleVideo/configs/attention_cluster.txt \
--log_interval=10 --weights=PaddleVideo/checkpoints/ --use_gpu='True'
[INFO: test.py: 151]: Namespace(batch_size=None, config='PaddleVideo/configs/attention_cluster.txt', log_interval=10, model_name='AttentionCluster', use_gpu=True, weights='PaddleVideo/checkpoints/')
[INFO: config.py: 66]: ---------------- Test Arguments ----------------
[INFO: config.py: 68]: TRAIN:
[INFO: config.py: 70]: learning_rate:0.001
[INFO: config.py: 70]: epoch:5
[INFO: config.py: 70]: filelist:data/data10073/youtube8m/train.list
[INFO: config.py: 70]: use_gpu:True
[INFO: config.py: 70]: num_gpus:1
[INFO: config.py: 70]: pretrain_base:None
[INFO: config.py: 70]: batch_size:5
[INFO: config.py: 68]: MODEL:
[INFO: config.py: 70]: name:AttentionCluster
[INFO: config.py: 70]: bone_network:None
[INFO: config.py: 70]: feature_names:['rgb', 'audio']
[INFO: config.py: 70]: seg_num:100
[INFO: config.py: 70]: num_classes:3862
[INFO: config.py: 70]: feature_dims:[1024, 128]
[INFO: config.py: 70]: feature_num:2
[INFO: config.py: 70]: dataset:YouTube-8M
[INFO: config.py: 70]: cluster_nums:[32, 32]
[INFO: config.py: 70]: topk:20
[INFO: config.py: 70]: drop_rate:0.5
[INFO: config.py: 68]: VALID:
[INFO: config.py: 70]: filelist:data/data10073/youtube8m/val.list
[INFO: config.py: 70]: batch_size:5
[INFO: config.py: 68]: INFER:
[INFO: config.py: 70]: filelist:data/data10073/youtube8m/infer.list
[INFO: config.py: 70]: batch_size:1
[INFO: config.py: 68]: TEST:
[INFO: config.py: 70]: filelist:data/data10073/youtube8m/infer.list
[INFO: config.py: 70]: batch_size:5
[INFO: config.py: 71]: -------------------------------------------------
W0902 17:40:24.814287 470 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
W0902 17:40:24.817791 470 device_context.cc:267] device: 0, cuDNN Version: 7.3.
[INFO: metrics_util.py: 67]: [EVAL] Batch 0 , loss = 16.431852, Hit@1 = 0.60, PERR = 0.27, GAP = 0.35
[INFO: metrics_util.py: 67]: [EVAL] Batch 10 , loss = 17.586128, Hit@1 = 0.20, PERR = 0.12, GAP = 0.20
[INFO: metrics_util.py: 67]: [EVAL] Batch 20 , loss = 9.226382, Hit@1 = 0.60, PERR = 0.68, GAP = 0.55
[INFO: metrics_util.py: 67]: [EVAL] Batch 30 , loss = 11.062404, Hit@1 = 0.80, PERR = 0.62, GAP = 0.48
[INFO: metrics_util.py: 67]: [EVAL] Batch 40 , loss = 11.580819, Hit@1 = 0.60, PERR = 0.40, GAP = 0.48
[INFO: metrics_util.py: 67]: [EVAL] Batch 50 , loss = 12.862601, Hit@1 = 0.80, PERR = 0.49, GAP = 0.56
[INFO: metrics_util.py: 67]: [EVAL] Batch 60 , loss = 14.932129, Hit@1 = 0.40, PERR = 0.32, GAP = 0.33
^C
current pid is 470, group id is 469
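The Hit@1 column in the log above is commonly defined for YouTube-8M as the fraction of samples whose highest-scoring class is one of the ground-truth labels. The sketch below follows that definition with NumPy. It is not taken from metrics_util.py, whose implementation may differ in details, and the function name `hit_at_one` and the toy data are assumptions for illustration.

```python
import numpy as np

def hit_at_one(scores, labels):
    """Fraction of samples whose top-scoring class is a ground-truth label.

    scores: (B, C) predicted class scores
    labels: (B, C) multi-hot ground-truth labels (0 or 1)
    """
    top = scores.argmax(axis=1)                  # best class per sample
    hits = labels[np.arange(len(top)), top]      # 1 where that class is correct
    return float(hits.mean())

# Toy batch of 3 samples and 4 classes.
scores = np.array([[0.1, 0.9, 0.0, 0.0],
                   [0.8, 0.1, 0.1, 0.0],
                   [0.2, 0.2, 0.5, 0.1]])
labels = np.array([[0, 1, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 1, 1]])
result = hit_at_one(scores, labels)
print(round(result, 2))
```

With a batch size of 5, as in the configuration above, Hit@1 for a single batch can only take values in steps of 0.20, which matches the granularity seen in the log.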
# Run inference with the frozen model; only 10 results are printed here, each consisting of video_id, predicted class, and probability
!python PaddleVideo/freeze_infer.py --use_gpu='True'
W0902 17:42:27.264341 539 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
W0902 17:42:27.267810 539 device_context.cc:267] device: 0, cuDNN Version: 7.3.
[b'Eu4t', [5], [0.9748388528823853]]
[b'nC4t', [5], [0.8345569968223572]]
[b'0i4t', [8], [0.6524688005447388]]
[b'kB4t', [1], [0.8780305981636047]]
[b'V04t', [0], [0.8229674696922302]]
[b'mQ4t', [8], [0.2174115777015686]]
[b'kI4t', [1], [0.5383145213127136]]
[b'xr4t', [5], [0.3262545168399811]]
[b'oz4t', [0], [0.5421494841575623]]
[b'1E4t', [2], [0.6699605584144592]]
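Each result line printed above is a valid Python literal, so it can be parsed back into structured data for further processing. The following sketch assumes the lines have been captured as text, one result per line. The helper name `parse_result` and the 0.5 confidence threshold are illustrative choices, not part of freeze_infer.py.

```python
import ast

def parse_result(line):
    """Parse one printed result into (video_id, [(class_id, probability), ...])."""
    video_id, classes, probs = ast.literal_eval(line)
    return video_id.decode("ascii"), list(zip(classes, probs))

# Two lines copied from the output above.
lines = [
    "[b'Eu4t', [5], [0.9748388528823853]]",
    "[b'mQ4t', [8], [0.2174115777015686]]",
]
parsed = [parse_result(line) for line in lines]

# Keep only predictions above an arbitrary confidence threshold.
confident = [(vid, preds) for vid, preds in parsed if preds[0][1] >= 0.5]
print(confident)
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for text read from a log.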
Click the link to try this project hands-on in AI Studio: https://aistudio.baidu.com/aistudio/projectdetail/205013
Source: oschina
Link: https://my.oschina.net/u/4067628/blog/3285386