#TEA: Temporal Excitation and Aggregation for Action Recognition
#Problem Statement
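Temporal modeling in action recognition networks faces two problems: short-range motion encoding and long-range temporal aggregation.
The former has mostly been handled with optical flow, which is computationally expensive and cannot meet real-time requirements, so the authors propose motion excitation.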
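As a rough illustration of what replacing optical flow with feature-level motion can look like, the sketch below gates each frame's channels by the difference to the next frame. It is a simplified reading of the idea, not the authors' implementation: the learned convolutions and spatial pooling of the real module are omitted, each frame is assumed to be already reduced to a plain channel vector, and the function name `motion_excitation` and the toy input are hypothetical.

```python
import math

def motion_excitation(frames):
    """frames: list of T per-frame feature vectors (each a list of C floats).

    Simplified sketch (assumption): no learned convolutions, no spatial
    dimensions; only the difference-then-gate structure is shown.
    """
    T = len(frames)
    C = len(frames[0])
    out = []
    for t in range(T):
        if t + 1 < T:
            # Feature-level motion: difference between adjacent frames.
            motion = [frames[t + 1][c] - frames[t][c] for c in range(C)]
        else:
            # The last frame has no successor, so its motion is set to zero.
            motion = [0.0] * C
        # Attention in (-1, 1): 2 * sigmoid(m) - 1 equals tanh(m / 2),
        # so zero motion yields zero attention.
        attn = [math.tanh(m / 2.0) for m in motion]
        # Residual excitation: keep the input and re-weight motion-sensitive channels.
        out.append([x + x * a for x, a in zip(frames[t], attn)])
    return out

# Toy input: 3 frames, 2 channels; only channel 1 changes between frames 0 and 1.
frames = [[1.0, 1.0], [1.0, 3.0], [1.0, 3.0]]
excited = motion_excitation(frames)
```

In this toy input only the second channel of the first frame is amplified; channels and frames with no motion pass through unchanged because of the residual connection.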
For the latter, there are two existing solutions:
- Adopt 2D CNN backbones to extract frame-wise features and then use a simple temporal max/average pooling to obtain the whole-video representation.
- Adopt local 3D/(2+1)D convolutional operations to process local temporal windows.
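The resulting problem is that spatiotemporal information is fused only at the top of the network and must then be propagated back, which may make optimization difficult. The authors therefore propose multiple temporal aggregation.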
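The first existing solution listed above (frame-wise features followed by temporal pooling) can be sketched in a few lines. This is a minimal illustration, assuming the 2D CNN backbone has already produced one feature vector per frame; `temporal_pool` and the toy `features` are hypothetical names and values, not taken from the paper.

```python
def temporal_pool(frame_features, mode="avg"):
    """Collapse T per-frame feature vectors into one video-level vector."""
    if mode == "avg":
        T = len(frame_features)
        return [sum(channel) / T for channel in zip(*frame_features)]
    if mode == "max":
        return [max(channel) for channel in zip(*frame_features)]
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical backbone output: 3 frames, 2 channels each.
features = [[0.0, 1.0], [2.0, 1.0], [4.0, 4.0]]
video_avg = temporal_pool(features, "avg")
video_max = temporal_pool(features, "max")

# Reversing the frame order leaves the pooled representation unchanged.
reversed_avg = temporal_pool(features[::-1], "avg")
```

Because the pooled vector is identical for any frame order, this baseline only mixes temporal information at the very end and cannot model the order of events on its own.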
#Related Links
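- [CVPR 2020] Nanjing University / Tencent PCG: TEA, a lightweight action recognition model for temporal modeling
- TEA: Temporal Excitation and Aggregation for Action Recognition (reading notes)
- CVPR 2020, Nanjing University + Tencent: TEA, a lightweight video action recognition model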
- TEA: Temporal Excitation and Aggregation for Action Recognition
- Phoenix1327/tea-action-recognition
- Paper Browsing (3): TEA: Temporal Excitation and Aggregation for Action Recognition