BERT Notes (Part 1)

2020-10-08 10:00:00 study Visits 0

#BERT Notes (Part 1)

BERT builds on the Transformer architecture, so I mainly reviewed the underlying fundamentals.

The first two entries under Related links are very well written; reading them side by side deepens understanding.

#Related links

PAN validation results - 20201006

2020-10-06 14:09:32 study Visits 0

Test results:

The Ubuntu subsystem on Windows

2020-10-06 11:31:28 software Visits 0

Using Linux tools on Windows runs into all sorts of problems; installing the Ubuntu subsystem solves them neatly.

TEA Temporal Excitation and Aggregation for Action Recognition

2020-10-04 19:54:12 study Visits 0

#TEA: Temporal Excitation and Aggregation for Action Recognition

#Problem statement

Temporal modeling in action recognition networks faces two problems: short-range motion encoding and long-range temporal aggregation.

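The former is mostly handled with optical flow, which is computationally expensive and cannot meet real-time requirements, so the authors propose motion excitation.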

For the latter, there are two existing approaches:

adopt 2D CNN backbones to extract frame-wise features and then utilize a simple temporal max/average pooling to obtain the whole video representation.
adopt local 3D/(2+1)D convolutional operations to process local temporal window

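The problem with these approaches is that spatiotemporal information is fused only at the top of the network and then has to propagate back through it, which can make optimization difficult. So the authors propose multiple temporal aggregation.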

Architecture diagram

#Related links

Lunar-calendar birthday generation script

2020-10-04 09:39:51 idea Visits 0

Google Calendar has no built-in lunar calendar support, so lunar dates can only be added by generating the events and then importing them.

PAN Lite training results

2020-10-02 13:17:42 study Visits 0

#PAN Lite training results

Training ran from 00:00 on the 26th to 18:00 on the 29th; a power outage in the lab cost a few hours along the way.

Weekly report - 20201002

2020-10-02 12:07:32 study Visits 0

2020-10-02	Weekly report #06	刘潘

Disabling Acrobat's automatic updates

2020-10-02 11:07:49 software Visits 0

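I have actually been doing a lot lately, but I really have not had time to write it up on the blog 😔; graduating is really hard 😭.
Hopefully from now on I can post weekly or every three days, and post more of my own material.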

Temporal Segment Networks for Action Recognition in Videos

2020-09-28 10:53:12 study Visits 0

#Temporal Segment Networks for Action Recognition in Videos

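This is the paper that started the line of work I have been studying recently. It proposes a segment-based sampling strategy. Conventional approaches, whether two-stream or 3D-convolutional, are constrained by GPU resources and network architecture, so they can only process the frames within a short time window and cannot sample over long durations.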

As discussed in Sec. 1, long-range temporal modeling is important for action understanding in videos. The existing deep architectures such as two-stream ConvNets [1] and 3D convolutional networks [16] are designed to operate on a single frame or a stack of frames (e.g., 16 frames) with limited temporal durations. Therefore, these structures lack capacity of incorporating long-range temporal information of videos into the learning of action models.

There are two directions for solving this problem: the first is stacking more consecutive frames, and the second is sampling more frames at a fixed rate. In other words, either stack more frames or subsample them.

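However, the former makes computational cost rise sharply, and the latter keeps the model from representing the complete information well.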

At the same time, the authors observe that content changes very little across consecutive frames, so they propose a segment based sampling strategy.

although the frames are densely recorded in the videos, the content changes relatively slowly.

The idea behind this strategy is actually quite simple; the figure makes it clear:

Network architecture diagram

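In essence, the video is first divided into equal segments. From each segment, one RGB frame, optical flow, or RGB difference is chosen to represent that segment. A CNN then extracts features from each segment, and the results are fused.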
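A minimal sketch of the segment-based sampling described above, in plain Python. The function name `sample_segment_indices`, the `training` flag, and the choice of a random frame during training versus the center frame otherwise are assumptions of this sketch, not details taken from these notes.

```python
import random


def sample_segment_indices(num_frames, num_segments, training=True, rng=None):
    """Pick one frame index from each of `num_segments` equal-length segments.

    Hypothetical helper for illustration only.
    """
    if num_segments <= 0:
        raise ValueError("num_segments must be positive")
    if num_frames < num_segments:
        raise ValueError("need at least one frame per segment")

    rng = rng or random
    indices = []
    for k in range(num_segments):
        # Segment k covers frame indices [start, end).
        start = k * num_frames // num_segments
        end = (k + 1) * num_frames // num_segments
        if training:
            # Assumption: a random frame within the segment during training.
            indices.append(rng.randrange(start, end))
        else:
            # Assumption: the center frame of the segment otherwise.
            indices.append((start + end - 1) // 2)
    return indices


if __name__ == "__main__":
    # A 90-frame video split into 3 segments: [0, 30), [30, 60), [60, 90).
    print(sample_segment_indices(90, 3, training=False))  # [14, 44, 74]
```

Whatever the video length, the number of sampled frames stays at `num_segments`, so the cost of processing one video does not grow with its duration.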

#Consensus (fusion) function

So the whole network really has three stages: segment representation, segment feature extraction, and multi-segment fusion.

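Looking back, segment representation converts the information in multiple frames into a form that can be computed on, namely the RGB, Optical Flow, or RGB Differences mentioned earlier.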

Segment feature extraction is the job of the backbone network, which we cannot modify either.

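So the most important part is how to fuse the information extracted from the multiple segments, which matters a great deal for the model's representational power.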

As analyzed above, the consensus (aggregation) function is an important component in our temporal segment network framework.

The paper gives five fusion methods: max pooling, average pooling, top-K pooling, weighted average, and attention weighting.

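The first two need no explanation. The third takes the K largest values and averages them instead of keeping only the maximum: K=1 is max pooling, and K=[number of segments] is average pooling.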
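To make the relationship between the first three methods concrete, here is a small sketch in plain Python. The function name `top_k_pooling` is hypothetical, and the sketch assumes the input is the list of per-segment scores for a single class.

```python
def top_k_pooling(scores, k):
    """Average the k largest per-segment scores (for one class).

    Hypothetical helper for illustration only.
    """
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of segments")
    top = sorted(scores, reverse=True)[:k]
    return sum(top) / k


if __name__ == "__main__":
    segment_scores = [1.0, 4.0, 2.0, 3.0]  # made-up scores for 4 segments

    # k = 1 reduces to max pooling.
    assert top_k_pooling(segment_scores, 1) == max(segment_scores)

    # k = number of segments reduces to average pooling.
    mean = sum(segment_scores) / len(segment_scores)
    assert top_k_pooling(segment_scores, len(segment_scores)) == mean

    # Anything in between averages only the strongest segments.
    print(top_k_pooling(segment_scores, 2))  # 3.5
```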

The fourth and fifth essentially add a per-segment weight on top of average pooling.

(One of PAN's innovations is a new weighting scheme.)
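The two weighted variants can be sketched the same way. This is only an illustration under stated assumptions: the function names are hypothetical, the weights are normalized so that equal weights recover average pooling, and the attention weights come from a softmax over per-segment logits. In a real model the weights or logits would be learned; here they are plain inputs. It is not PAN's weighting scheme.

```python
import math


def weighted_average(scores, weights):
    """Weighted mean of per-segment scores (weights normalized to sum to 1)."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * s for w, s in zip(weights, scores)) / total


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weighting(scores, attention_logits):
    """Weight each segment by a softmax over its attention logit."""
    return weighted_average(scores, softmax(attention_logits))


if __name__ == "__main__":
    segment_scores = [1.0, 3.0]  # made-up scores for 2 segments

    # Equal weights fall back to plain average pooling.
    print(weighted_average(segment_scores, [1.0, 1.0]))  # 2.0

    # A larger weight pulls the result toward that segment's score.
    print(weighted_average(segment_scores, [1.0, 3.0]))  # 2.5

    # Equal attention logits also give the plain average.
    print(attention_weighting(segment_scores, [0.0, 0.0]))  # 2.0
```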